AMIE AI system outshines doctors in simulated medical consultations

By Pooja Toshniwal Paharia
Reviewed by Susha Cheriyedath, M.Sc.
Jan 16 2024

In a recent study posted to the arXiv preprint* server, researchers at Google Research and Google DeepMind introduced Articulate Medical Intelligence Explorer (AMIE), a large language model (LLM)-based artificial intelligence (AI) system optimized for diagnostic dialogue.

The physician-patient interaction is at the core of medicine, where skilled history-taking sets the path for correct diagnosis, successful care, and long-term trust. AI systems capable of diagnostic dialogue could improve the accessibility, consistency, and quality of care. However, approximating the skills of an experienced physician remains a significant challenge.

Study: Towards Conversational Diagnostic AI

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

About the study

The researchers of the present study developed the AMIE framework for conversational AI applications.

The team created a self-play-based simulated dialogue environment with automated feedback to scale AMIE's learning across diverse medical conditions, settings, and specialties. They also implemented an inference-time chain-of-reasoning strategy to improve AMIE's conversation quality and diagnostic accuracy. During online inference, this strategy progressively refined AMIE's response based on the current conversation, yielding an accurate and grounded reply to the patient at each dialogue turn.

The team used an iterative self-improvement strategy comprising two self-play loops. In the inner loop, AMIE refined its behavior in simulated conversations with AI patient agents using in-context critic feedback; in the outer loop, the refined simulated dialogues were incorporated into subsequent fine-tuning iterations. To illustrate the improvement, the researchers applied the auto-evaluation approach to simulated dialogues before and after the self-play procedure.
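The two-loop structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`inner_self_play`, `outer_self_play`), the number of rounds and cycles, and the toy stand-ins for the dialogue model, the critic, and the fine-tuning step are all hypothetical.

```python
def inner_self_play(scenario, simulate, critique, rounds=2):
    """Inner loop: regenerate a simulated dialogue using critic feedback.

    `simulate(scenario, feedback)` and `critique(dialogue)` are hypothetical
    callables standing in for the dialogue model and the in-context critic.
    """
    dialogue = simulate(scenario, None)
    for _ in range(rounds):
        feedback = critique(dialogue)
        dialogue = simulate(scenario, feedback)
    return dialogue


def outer_self_play(scenarios, simulate, critique, fine_tune, cycles=2):
    """Outer loop: feed refined dialogues into successive fine-tuning cycles."""
    training_set = []
    for _ in range(cycles):
        refined = [inner_self_play(s, simulate, critique) for s in scenarios]
        training_set.extend(refined)
        # `fine_tune` returns the (hypothetically) improved dialogue model.
        simulate = fine_tune(simulate, refined)
    return simulate, training_set


# Toy stand-ins so the sketch runs end to end (assumptions, not real models).
def toy_simulate(scenario, feedback):
    note = f" [revised using: {feedback}]" if feedback else ""
    return f"dialogue about {scenario}{note}"


def toy_critique(dialogue):
    return "ask about symptom onset"


def toy_fine_tune(simulate, dialogues):
    return simulate  # placeholder: no actual training happens here


model, data = outer_self_play(
    ["chest pain", "cough"], toy_simulate, toy_critique, toy_fine_tune
)
print(len(data), "refined dialogues collected")
```

Passing the model, critic, and fine-tuning step in as plain callables keeps the control flow of the two loops visible without committing to any particular training library.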

The team developed a framework for evaluating clinically meaningful performance axes such as history-taking, diagnostic accuracy, management reasoning, communication, and empathy. The researchers created a pilot evaluation rubric to assess the history-taking, communication skills, diagnostic reasoning, and empathy of diagnostic conversational AI, covering both clinician- and patient-centered metrics.

The team conducted a remote, double-blinded, randomized crossover study involving 149 clinical case scenarios sourced from health providers in the United Kingdom, India, and Canada. Randomization enabled counterbalanced comparisons of AMIE with 20 primary care physicians (PCPs) in consultations with validated patient actors.
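To make the idea of a counterbalanced crossover concrete, here is a minimal Python sketch in which every scenario is consulted once with each arm and the order of the arms is randomized so that about half of the scenarios see AMIE first. The article does not describe the actual randomization procedure, so the function `assign_crossover`, the seed, and the half-and-half split are assumptions made for illustration.

```python
import random


def assign_crossover(scenario_ids, seed=0):
    """Assign each scenario an arm order, balancing which arm comes first.

    Hypothetical helper: shuffles the scenarios, then gives the first half
    the order (AMIE, PCP) and the remainder the order (PCP, AMIE).
    """
    rng = random.Random(seed)
    ids = list(scenario_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    schedule = {}
    for position, scenario_id in enumerate(ids):
        schedule[scenario_id] = ("AMIE", "PCP") if position < half else ("PCP", "AMIE")
    return schedule


schedule = assign_crossover(range(149))
amie_first = sum(order[0] == "AMIE" for order in schedule.values())
print(amie_first, "scenarios start with AMIE;", len(schedule) - amie_first, "start with a PCP")
```

Because each scenario is seen by both arms, scenario difficulty affects both equally, and balancing the order limits any advantage from a patient actor having already played the case once.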

The study followed the format of an Objective Structured Clinical Examination (OSCE): patient actors consulted through an online multi-turn synchronous text chat, after which the OSCE agent (AMIE or a PCP) answered a post-questionnaire. Specialist physicians and patient actors then rated the consultations. The researchers performed several analyses to better understand AMIE's capabilities, identified key limitations, and outlined essential next steps for the real-world clinical translation of AMIE. They assessed conversational qualities using the General Medical Council Patient Questionnaire (GMCPQ), the Practical Assessment of Clinical Examination Skills (PACES), and a narrative analysis of Patient-Centered Communication Best Practice (PCCBP).

Results

The study showed that AMIE outperformed PCPs on 28 of 32 assessment axes from the perspective of specialist physicians and on 24 of 26 assessment axes from the perspective of patient actors. Under specialist physician evaluation, AMIE demonstrated higher differential diagnosis (DDx) accuracy than PCPs, with the largest gains in the cardiovascular and respiratory specialties. According to auto-evaluation, AMIE was as efficient as PCPs in acquiring information. The team repeated the DDx accuracy analysis with model-based auto-evaluators in place of specialist raters and found that the auto-evaluator's performance trends aligned with the specialist evaluations, despite minor differences in the computed accuracy values.
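One common way to score a ranked differential diagnosis is top-k accuracy: the fraction of cases in which the correct diagnosis appears among the first k entries of the list. The article does not say exactly how DDx accuracy was computed, so the sketch below is an assumption for illustration only. The helper `top_k_accuracy` is hypothetical, the data are made up, and exact string matching stands in for the clinical judgment that human raters would apply.

```python
def top_k_accuracy(ranked_ddx_lists, ground_truths, k):
    """Fraction of cases whose true diagnosis is in the top k of the DDx list.

    Simplification: a case-insensitive exact string match is used here in
    place of a rater deciding whether a listed diagnosis is correct.
    """
    hits = 0
    for ddx, truth in zip(ranked_ddx_lists, ground_truths):
        top_k = [diagnosis.lower() for diagnosis in ddx[:k]]
        if truth.lower() in top_k:
            hits += 1
    return hits / len(ground_truths)


# Made-up example data, not taken from the study.
ddx_lists = [
    ["angina", "GERD", "costochondritis"],
    ["asthma", "COPD"],
]
truths = ["GERD", "pneumonia"]

for k in (1, 3):
    print(f"top-{k} accuracy:", top_k_accuracy(ddx_lists, truths, k))
```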

Superiority of large language model #AI randomized vs 20 primary care physicians on 149 case scenarios for diagnostic accuracy, conversation, communication, clinical exam skills, empathy, & management plan https://t.co/RdmBrrXwzb @taotu831 @alan_karthi @vivnat @GoogleDeepMind pic.twitter.com/9MuZLG9BUc
— Eric Topol (@EricTopol) January 12, 2024

Using the DDx auto-evaluator, the study also compared the DDx that AMIE produced from its own dialogues with the DDx that AMIE produced from the PCP consultations and found comparable performance. The findings indicated consistent diagnostic performance regardless of whether AMIE processed data from its own dialogues or from those of the PCPs. In both settings, AMIE's DDx considerably outperformed the DDx produced by the PCPs.

AMIE was more verbose than the PCPs in terms of the total number of words in its replies over a consultation. However, the number of conversational turns and the word counts elicited from patient actors were comparable across OSCE agents, implying that AMIE and the PCPs gathered similar amounts of patient information during the interaction.

According to specialists and patient actors, AMIE outperformed PCPs in conversation quality. Patient actors rated AMIE consultations considerably higher than PCP consultations on 24 of 26 axes. For scenarios within their area of expertise, specialist physicians evaluated both the conversational quality and the responses to the post-questionnaire. The findings indicate that AMIE was as effective as PCPs in eliciting relevant information during simulated consultations and was more accurate than PCPs in forming a comprehensive differential diagnosis when given the same amount of information.

Overall, the study findings highlighted the potential of the AMIE conversational AI system for clinical history-taking and diagnostic dialogue. AMIE, which was trained on a combination of real-world and simulated medical conversations, scored higher than PCPs on multiple axes. The study, however, had limitations: clinicians were restricted to synchronous text chat, a format unfamiliar to them. AMIE's performance in simulated consultations is a notable step forward, but translating it into real-world tools requires further research to ensure safety, reliability, fairness, efficacy, and privacy.

Journal reference:

Preliminary scientific report. Tao Tu et al., Towards Conversational Diagnostic AI, arXiv:2401.05654, 2024, DOI: 10.48550/arXiv.2401.05654, https://arxiv.org/abs/2401.05654
