Google’s AMIE beats doctors on key simulated disease-management tasks

In a blinded virtual study, Google’s AMIE matched primary care doctors overall and outperformed them on several management-reasoning measures, but researchers caution that the system remains experimental and untested in real clinical care.

Study: Towards Conversational AI for Disease Management. Image Credit: Krot_Studio / Shutterstock

New Google research, published as an Accelerated Article Preview in the journal Nature, describes the potential clinical value of a large language model-based research artificial intelligence system, Articulate Medical Intelligence Explorer (AMIE), for simulated, multi-visit disease-management reasoning.

Background

Large language model (LLM)-based artificial intelligence (AI) systems are showing growing promise in clinical settings, not only for accurate diagnosis but also for collecting medical history through conversations in a natural, empathetic style that helps build trustworthy relationships with patients.

Although several AI models have been developed for diagnostic reasoning, their capabilities in multi-visit disease management, such as monitoring disease progression and therapeutic response across clinical visits and prescribing medication safely, remain largely unexplored.

A team of researchers at Google DeepMind and Google Research, California, USA, evaluated how well AMIE, an LLM-based research AI system that has shown physician-like performance on conversational diagnostic tasks, handles disease management over time.

To advance AMIE for management reasoning, the team developed an LLM-based agentic system comprising an empathetic dialogue agent for synchronous text-chat patient conversations and a management reasoning agent that performs more extensive inference-time reasoning and cross-references up-to-date clinical practice guidelines and drug formularies.
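The two-agent split described above can be pictured with a toy sketch. This is only an illustration under stated assumptions, not AMIE's implementation: the names `dialogue_agent`, `management_agent`, `PatientRecord`, and `GUIDELINES` are hypothetical, the guideline text is a placeholder, and both functions are stubs standing in for model calls.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a guideline store; the entry is a placeholder,
# not clinical content.
GUIDELINES = {"example-condition": "placeholder guideline summary"}


@dataclass
class PatientRecord:
    # One list of patient messages per visit, kept across visits.
    visits: list = field(default_factory=list)


def dialogue_agent(message: str) -> str:
    """Fast conversational turn (stub standing in for a model call)."""
    return f"Thanks for telling me that. Can you say more about: {message!r}?"


def management_agent(record: PatientRecord, condition: str) -> dict:
    """Slower step: reads the whole multi-visit history and looks up a guideline."""
    history = [msg for visit in record.visits for msg in visit]
    guideline = GUIDELINES.get(condition, "no guideline found")
    return {"messages_considered": len(history), "grounded_in": guideline}


record = PatientRecord()
record.visits.append(["I have had a cough for two weeks."])
record.visits.append(["The cough is better but I feel tired."])

reply = dialogue_agent(record.visits[-1][-1])
plan = management_agent(record, "example-condition")
```

The point of the split in this sketch is that the conversational turn stays quick, while the planning step is free to read every earlier visit and attach a guideline reference to its output.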

The disease-management version of AMIE used the Gemini models' long-context capabilities to track longitudinal patient data across follow-up visits.

To benchmark medication reasoning, the team developed RxQA, a multiple-choice question benchmark derived from two national drug formularies (OpenFDA and the British National Formulary) and validated by board-certified pharmacists.

The team next conducted a randomized, blinded, virtual Objective Structured Clinical Examination study to compare the multi-visit disease-management reasoning capabilities of AMIE with those of 21 primary care physicians across 100 multi-visit case scenarios designed to reflect UK NICE Guidance and BMJ Best Practice clinical practice guidelines.

Key findings

The comparative analysis revealed that AMIE was non-inferior to primary care physicians on overall management reasoning and scored significantly higher than physicians on appropriateness of the overall plan and treatment recommendations across all three visits.

In addition to treatment precision, AMIE’s precision in recommending investigations was significantly higher than that of physicians across all three visits.

For at least one of the three visits, AMIE scored significantly higher than physicians on being free of significant errors, providing appropriate follow-up recommendations, and avoiding inappropriate treatments.

Regarding the use of clinical guidelines, both AMIE and physicians scored similarly high on selecting applicable guidelines. However, AMIE scored significantly higher than physicians in recommending treatments and investigations that aligned with the guidelines and in explicitly grounding recommendations in guideline references.

To compare medication reasoning accuracy, the research team used the RxQA benchmark at lower and higher difficulty levels, in both an “open-book” and a “closed-book” setting. The “open-book” setting allowed both AMIE and physicians to search for relevant information. In the “closed-book” setting, neither physicians nor AMIE had access to external knowledge resources.

The comparative analysis revealed that access to external drug information was beneficial for both physicians and AMIE. However, AMIE outperformed physicians on the higher-difficulty questions in both “open-book” and “closed-book” settings.
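A comparison like this comes down to multiple-choice accuracy broken out by question difficulty and by setting. The sketch below shows one way such a tally could be computed. It is an assumption for illustration only: the function name `accuracy_by_group`, the record fields, and the sample records are all made up and are not taken from the study or from RxQA.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return the fraction of correct answers per (difficulty, setting) pair."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        group = (r["difficulty"], r["setting"])
        total[group] += 1
        if r["answer"] == r["key"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Made-up placeholder records, not data from the study.
records = [
    {"difficulty": "higher", "setting": "closed-book", "answer": "B", "key": "B"},
    {"difficulty": "higher", "setting": "closed-book", "answer": "A", "key": "C"},
    {"difficulty": "higher", "setting": "open-book", "answer": "C", "key": "C"},
    {"difficulty": "lower", "setting": "closed-book", "answer": "D", "key": "D"},
]

scores = accuracy_by_group(records)
```

Running the same tally separately for each respondent group would give the per-setting, per-difficulty numbers that a comparison of this kind rests on.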

Study significance

The study highlights the potential of the LLM-based research AI system, AMIE, as a promising future tool for multi-visit disease management. The findings suggest that AMIE can perform as well as, or in some cases better than, physicians across a variety of disease-management reasoning challenges.

Globally, health care systems are experiencing increased care fragmentation, meaning that a patient’s care is spread across several physicians, settings, or systems that share little or no information with one another. Such care fragmentation is associated with worsened morbidity for patients with chronic diseases. Based on current findings, the Google research team suggests that AMIE may one day serve as a point of continuity in otherwise fragmented health systems, either independently or in collaboration with physicians.

The team also believes that, with rigorous clinical testing, such systems can address the growing unmet clinical needs caused by global shortages and inequalities in physician availability, physician burnout, and increasingly complex patient populations.

However, the study was conducted in simulated, text-chat consultations with trained patient actors, not in real clinical care, and the authors state that AMIE is not ready for clinical use. The scenarios were constructed for evaluation, the case mix was not representative of routine primary care, and the study did not test effects on patient outcomes.

The observed capabilities of AMIE reflect the rapid advancement of LLMs in clinical conversation and reasoning. The rapid improvement of state-of-the-art LLMs may help mitigate current limitations, such as confabulations (the generation of false, misleading, or entirely fabricated responses), which otherwise pose considerable risks in clinical medicine.

Overall, the study demonstrates the evolution of Google’s AMIE research system from conversational diagnostic AI toward a multi-visit disease-management reasoning system. Although the model system has been tested using global measures of management reasoning, the researchers urge that it be seen as a first step in measuring management reasoning and highlight the need for future work to explore the reasoning traces of medical AI systems in a comprehensive, quantitative manner.


Written by

Dr. Sanchari Sinha Dutta

Dr. Sanchari Sinha Dutta is a science communicator who believes in spreading the power of science in every corner of the world. She has a Bachelor of Science (B.Sc.) degree and a Master of Science (M.Sc.) degree in biology and human physiology. Following her Master's degree, Sanchari went on to pursue a Ph.D. in human physiology. She has authored more than 10 original research articles, all of which have been published in world-renowned international journals.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Dutta, Sanchari Sinha. (2026, June 21). Google’s AMIE beats doctors on key simulated disease-management tasks. News-Medical. Retrieved on June 21, 2026 from https://www.news-medical.net/news/20260621/Googlee28099s-AMIE-beats-doctors-on-key-simulated-disease-management-tasks.aspx.

  • MLA

    Dutta, Sanchari Sinha. "Google’s AMIE beats doctors on key simulated disease-management tasks". News-Medical. 21 June 2026. <https://www.news-medical.net/news/20260621/Googlee28099s-AMIE-beats-doctors-on-key-simulated-disease-management-tasks.aspx>.

  • Chicago

    Dutta, Sanchari Sinha. "Google’s AMIE beats doctors on key simulated disease-management tasks". News-Medical. https://www.news-medical.net/news/20260621/Googlee28099s-AMIE-beats-doctors-on-key-simulated-disease-management-tasks.aspx. (accessed June 21, 2026).

  • Harvard

    Dutta, Sanchari Sinha. 2026. Google’s AMIE beats doctors on key simulated disease-management tasks. News-Medical, viewed 21 June 2026, https://www.news-medical.net/news/20260621/Googlee28099s-AMIE-beats-doctors-on-key-simulated-disease-management-tasks.aspx.
