AMIE AI system outshines doctors in simulated medical consultations

In a recent study posted to the arXiv preprint* server, researchers at Google Research and Google DeepMind introduced Articulate Medical Intelligence Explorer (AMIE), a large language model (LLM)-based artificial intelligence (AI) system optimized for diagnostic dialogue.

The physician-patient interaction is at the core of medicine, where skilled history-taking sets the path for correct diagnosis, successful care, and long-term trust. AI systems capable of diagnostic dialogue could improve the accessibility, consistency, and quality of care. However, approximating the skills of a clinician remains a significant challenge.

Study: Towards Conversational Diagnostic AI

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behavior, or treated as established information.

About the study

The researchers of the present study developed the AMIE framework for conversational AI applications.

The team created a self-play-based simulated dialogue environment with automated feedback to scale AMIE's learning across multiple medical conditions, settings, and specialties. They also used an inference-time chain-of-reasoning strategy to improve AMIE's conversation quality and diagnostic accuracy. During online inference, this strategy progressively refined AMIE's answer based on the conversation so far, resulting in an accurate and grounded response to the patient at each dialogue turn.

The team used an iterative self-improvement strategy comprising two self-play loops. In the inner loop, AMIE refined its behavior in simulated conversations with AI patient agents based on in-context critic feedback, while the outer loop incorporated the refined conversations into subsequent fine-tuning cycles. To illustrate the improvement, they applied an auto-evaluation approach to simulated dialogues before and after the self-play procedure.
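The two-loop structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the `Model` class and the `simulate`, `critique`, and `fine_tune` callables are hypothetical stand-ins, and the toy versions below involve no language model at all.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical types: a dialogue is a list of text turns, and a "model" is
# represented only by the pool of dialogues it has been fine-tuned on.
Dialogue = List[str]


@dataclass
class Model:
    training_set: List[Dialogue] = field(default_factory=list)
    version: int = 0


def inner_loop(simulate: Callable[[Model, str], Dialogue],
               critique: Callable[[Dialogue], str],
               model: Model, rounds: int) -> Dialogue:
    """Re-run one simulated consultation, feeding critic feedback back in-context."""
    feedback = ""
    dialogue: Dialogue = []
    for _ in range(rounds):
        dialogue = simulate(model, feedback)  # doctor agent talks to an AI patient agent
        feedback = critique(dialogue)         # critic comments on the transcript
    return dialogue


def outer_loop(simulate, critique, fine_tune, model: Model,
               cycles: int, dialogues_per_cycle: int, inner_rounds: int) -> Model:
    """Fold the refined dialogues into the next fine-tuning cycle."""
    for _ in range(cycles):
        refined = [inner_loop(simulate, critique, model, inner_rounds)
                   for _ in range(dialogues_per_cycle)]
        model = fine_tune(model, refined)
    return model


# Toy stand-ins (assumptions) so the sketch runs on its own.
def toy_simulate(model: Model, feedback: str) -> Dialogue:
    turns = ["Doctor: What brings you in today?", "Patient: A cough."]
    if feedback:
        turns.append(f"Doctor (revised after critic): {feedback}")
    return turns


def toy_critique(dialogue: Dialogue) -> str:
    return "ask about symptom duration"


def toy_fine_tune(model: Model, dialogues: List[Dialogue]) -> Model:
    return Model(model.training_set + dialogues, model.version + 1)


final = outer_loop(toy_simulate, toy_critique, toy_fine_tune,
                   Model(), cycles=2, dialogues_per_cycle=3, inner_rounds=2)
print(final.version, len(final.training_set))
```

The point of the separation is that the inner loop improves a single conversation without changing the model, whereas the outer loop changes the model by training on the improved conversations.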

The team also developed a framework to evaluate clinically meaningful performance axes such as history-taking, diagnostic accuracy, management reasoning, communication, and comprehension. They created a pilot evaluation rubric for diagnostic conversational AI that covers history-taking, communication skills, diagnostic reasoning, and comprehension, and that includes both clinician- and patient-centered metrics.

The team conducted a remote, double-blinded, randomized crossover study of 149 clinical case scenarios sourced from health providers in the United Kingdom, India, and Canada. Randomization enabled counterbalanced comparisons of AMIE with 20 primary care physicians (PCPs) in consultations with validated patient actors.
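One way to picture counterbalancing in a crossover design is shown below. The sketch assumes that every scenario is consulted once by AMIE and once by a PCP and that only the order is randomized; the function name, the seed, and the even split between the two orders are illustrative assumptions, not details reported in the article.

```python
import random
from collections import Counter
from typing import List, Tuple


def counterbalanced_orders(n_scenarios: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Give each scenario a consultation order, split as evenly as possible."""
    orders = [("AMIE", "PCP"), ("PCP", "AMIE")]
    # Alternate the two possible orders, then shuffle which scenario gets which.
    schedule = [orders[i % 2] for i in range(n_scenarios)]
    random.Random(seed).shuffle(schedule)
    return schedule


schedule = counterbalanced_orders(149)
first_arm = Counter(order[0] for order in schedule)
print(first_arm["AMIE"], first_arm["PCP"])
```

Because each scenario is seen by both arms and neither arm systematically goes first, order effects such as a patient actor becoming familiar with the case are spread evenly across AMIE and the PCPs.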

The study followed the format of an Objective Structured Clinical Examination (OSCE): consultations took place over online multi-turn synchronous text chat, after which AMIE and the PCPs answered post-consultation questionnaires. Specialist physicians and patient actors then rated the consultations. The researchers performed several analyses to better understand AMIE's capabilities, identified its main limitations, and outlined essential next steps for real-world clinical translation. They assessed conversational quality using the General Medical Council Patient Questionnaire (GMCPQ), the Practical Assessment of Clinical Examination Skills (PACES), and a narrative review of Patient-Centered Communication Best Practice (PCCBP).

Results

The study showed that AMIE outperformed PCPs on 28 of 32 assessment axes from the perspective of specialist physicians and on 24 of 26 axes from the perspective of patient actors. In specialist physician ratings, AMIE demonstrated higher differential diagnosis (DDx) accuracy than PCPs, with the largest gains in the cardiovascular and respiratory specialties. According to auto-evaluation, AMIE was as efficient as PCPs in data collection. The team repeated the DDx accuracy analysis with model-based auto-evaluators in place of specialist raters and found that the auto-evaluator's performance trends aligned with the specialist ratings despite minor differences in the computed accuracy values.

Using the DDx auto-evaluator, the study also compared the differential diagnoses AMIE produced from its own consultations with those it produced from the PCPs' consultations and found comparable performance. AMIE's diagnostic performance was therefore consistent irrespective of whether it processed data from its own dialogues or from those of the PCPs. In both settings, AMIE's DDx considerably outperformed the differential diagnoses of the PCPs.
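A common way to score a ranked differential diagnosis list is top-k accuracy: a case counts as correct if the true diagnosis appears among the first k entries. The sketch below is illustrative only. The cases are invented toy data, the function name is hypothetical, and simple string matching stands in for the specialist and auto-evaluator judgments used in the study.

```python
from typing import List


def top_k_accuracy(ddx_lists: List[List[str]], truths: List[str], k: int) -> float:
    """Fraction of cases whose true diagnosis is among the first k DDx entries."""
    hits = sum(
        truth.lower() in [d.lower() for d in ddx[:k]]
        for ddx, truth in zip(ddx_lists, truths)
    )
    return hits / len(truths)


# Invented toy data: three ranked DDx lists and their ground-truth diagnoses.
ddx_lists = [
    ["pneumonia", "bronchitis", "asthma"],
    ["GERD", "angina", "costochondritis"],
    ["migraine", "tension headache", "sinusitis"],
]
truths = ["bronchitis", "angina", "cluster headache"]

top1 = top_k_accuracy(ddx_lists, truths, k=1)
top3 = top_k_accuracy(ddx_lists, truths, k=3)
print(top1, round(top3, 3))
```

Reporting accuracy at several values of k shows whether a system merely includes the correct diagnosis somewhere in its list or also ranks it near the top.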

AMIE was more verbose than PCPs, producing more words overall in its replies throughout the consultation. However, the number of conversational turns and the word counts obtained from patient actors were comparable across the two OSCE agents, implying that AMIE and the PCPs gathered equivalent amounts of patient information during the interaction.

According to specialists and patient actors, AMIE outperformed PCPs in conversation quality. Patient actors rated AMIE consultations considerably higher than those of PCPs on 24 of 26 axes. For scenarios within their area of expertise, specialist physicians evaluated both the conversational quality and the responses to the post-consultation questionnaire. The findings indicate that AMIE was as effective as PCPs in eliciting relevant information during simulated consultations and was more accurate than PCPs in forming a comprehensive differential diagnosis when given the same amount of information.

Overall, the study findings highlighted the potential of the AMIE conversational AI system for clinical history-taking and diagnostic dialogue. AMIE, which was trained on a combination of real-world and simulated medical conversations, scored higher than PCPs on multiple axes. The study, however, had limitations, since clinicians were restricted to an unfamiliar synchronous text-chat interface. AMIE's success in simulated consultations is a major step forward, but translating it into real-world tools requires further research to ensure safety, reliability, fairness, efficacy, and privacy.

Journal reference:
  • Preliminary scientific report. Tao Tu et al., Towards Conversational Diagnostic AI, arXiv:2401.05654, 2024, DOI: 10.48550/arXiv.2401.05654, https://arxiv.org/abs/2401.05654

Written by

Pooja Toshniwal Paharia


Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Toshniwal Paharia, Pooja. (2024, January 16). AMIE AI system outshines doctors in simulated medical consultations. News-Medical. Retrieved on April 18, 2024 from https://www.news-medical.net/news/20240116/AMIE-AI-system-outshines-doctors-in-simulated-medical-consultations.aspx.

  • MLA

    Toshniwal Paharia, Pooja. "AMIE AI system outshines doctors in simulated medical consultations". News-Medical. 18 April 2024. <https://www.news-medical.net/news/20240116/AMIE-AI-system-outshines-doctors-in-simulated-medical-consultations.aspx>.

  • Chicago

    Toshniwal Paharia, Pooja. "AMIE AI system outshines doctors in simulated medical consultations". News-Medical. https://www.news-medical.net/news/20240116/AMIE-AI-system-outshines-doctors-in-simulated-medical-consultations.aspx. (accessed April 18, 2024).

  • Harvard

    Toshniwal Paharia, Pooja. 2024. AMIE AI system outshines doctors in simulated medical consultations. News-Medical, viewed 18 April 2024, https://www.news-medical.net/news/20240116/AMIE-AI-system-outshines-doctors-in-simulated-medical-consultations.aspx.
