In a real-world urgent care trial with 100 patients, Google’s conversational AI system, AMIE, safely conducted pre-visit medical interviews and generated diagnostic insights comparable to those of physicians, offering an early glimpse of how AI assistants could transform everyday clinical workflows.


In a recent study published on the arXiv preprint* server, researchers carried out a prospective feasibility study evaluating the real-world performance of AMIE (Articulate Medical Intelligence Explorer) in pre-visit history-taking with 100 adult patients at an urgent care clinic.
With a physician supervising each patient–AI interaction in real time, and clinicians evaluating the model at multiple stages of the study, the LLM-based conversational AI performed safely, with no predefined safety stops triggered during the interactions.
Furthermore, the model was reported to improve patients' attitudes toward medical AI and to generate differential diagnoses of comparable quality to those of human primary care providers when evaluated by blinded physician reviewers.
AI enters the clinic as healthcare systems face growing physician shortages
Despite significant technological advances in modern medicine, many of which, such as robotic-assisted surgery, are designed to reduce direct physician input, global healthcare systems face a growing shortage of primary care physicians.
Studies have found that the widening gap between physician supply and patient demand is already driving significantly heavier workloads and unprecedented burnout rates among physicians.
To alleviate these structural strains, researchers increasingly seek to leverage modern computational advances and digital solutions, particularly large language models (LLMs).
In controlled pre-clinical environments, these sophisticated artificial intelligence (AI) algorithms have shown promise in engaging in nuanced clinical reasoning and simulating realistic patient interactions.
For example, previous laboratory tests using trained Objective Structured Clinical Examination (OSCE) actors showed that AMIE could gather patient histories and generate diagnostic reports comparable to those produced by human doctors.
Critics, however, argue that real clinical practice is far messier than standardized simulations. Actual patients bring diverse communication styles, varying levels of health literacy, and unpredictable emotions such as anxiety, which are rarely represented in LLM training data.
Consequently, before these tools can be safely integrated into routine medical practice, their real-world performance must be carefully evaluated to ensure they can navigate unexpected clinical complexities without causing harm.
Researchers test AMIE in real urgent-care visits rather than simulated patient encounters
The present study aimed to validate the safety and performance of AMIE in a live clinical workflow.
It was designed as a prospective, single-arm feasibility study conducted at Healthcare Associates, an ambulatory primary care practice within Beth Israel Deaconess Medical Center.
The study participants were 100 adult patients already scheduled for non-emergency urgent care visits.
Up to five days before their scheduled appointment, participants engaged in a secure text-based chat with AMIE.
During intake, AMIE gathered each patient's medical history while dynamically adapting its questions to suspected conditions and remaining information gaps, rather than following a static questionnaire.
All patient–AI interactions were monitored in real time by a board-certified internal medicine physician via screen sharing.
Following the intake interaction, participants completed surveys assessing their experience.
The chat transcript, an automatically generated clinical summary, and the participant's survey results were then forwarded to the clinician scheduled to see the patient ahead of the urgent care visit.
Finally, an independent panel of physicians performed a blinded chart review eight weeks later, comparing the accuracy and safety of management plans generated by both AMIE and human clinicians against the patient’s finalized clinical assessment documented in the medical record after the visit and follow-up.
AI safely handled patient history-taking and produced diagnoses comparable to clinicians
AMIE met the trial's primary safety outcome and was judged safe under supervision: the physicians overseeing the patient–AI interactions did not trigger a single safety stop across all 100 encounters, although they occasionally provided minor clarifications.
Interacting with the chatbot also significantly improved patients' attitudes toward medical AI.
Survey scores on the General Attitudes toward AI Scale (GAAIS) shifted positively after the chat (p < 0.001) and remained elevated even after the patient saw their physician.
When evaluating AMIE’s clinical reasoning capabilities, blinded evaluators found no significant difference in the overall quality of differential diagnoses (p = 0.6) between AMIE and human clinicians.
Furthermore, the appropriateness (p = 0.1) and safety (p = 1.0) of the AI’s proposed management plans were comparable to those of human clinicians in blinded evaluations of standardized case summaries.
However, human clinicians significantly outperformed AMIE in designing management plans that were both practical (p = 0.003) and cost-effective (p = 0.004).
These differences likely reflect clinicians' access to contextual information that was not fully available to the AI during the study, such as longitudinal medical records, as well as their familiarity with real-world healthcare constraints and workflow considerations.
Early results position conversational AI as a supervised clinical assistant
The study demonstrates that a conversational diagnostic AI system can safely and effectively gather clinical histories from real patients in a busy primary care clinic when used within a supervised research setting.
While AI is not yet ready to practice medicine autonomously, these findings support its emerging role as a collaborative clinical tool and physician assistant. The results highlight the need for larger multi-site studies to confirm safety, effectiveness, and generalizability across diverse patient populations.

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.
Journal reference:
- Preliminary scientific report. Brodeur, P., et al. (2026). A prospective clinical feasibility study of a conversational diagnostic AI in an ambulatory primary care clinic (Version 2). arXiv. DOI: 10.48550/arXiv.2603.08448, https://arxiv.org/abs/2603.08448v2