Medical AI can repeat false claims in clinical contexts

Medical artificial intelligence (AI) is often described as a way to make patient care safer by helping clinicians manage information. A new study by the Icahn School of Medicine at Mount Sinai and collaborators confronts a critical vulnerability: when a medical lie enters the system, can AI pass it on as if it were true?

Analyzing more than a million prompts across nine leading language models, the researchers found that these systems can repeat false medical claims when they appear in realistic hospital notes or social-media health discussions.

The findings, published in the February 9 online issue of The Lancet Digital Health [10.1016/j.landig.2025.100949], suggest that current safeguards do not reliably distinguish fact from fabrication once a claim is wrapped in familiar clinical or social-media language.

To test this systematically, the team exposed the models to three types of content: real hospital discharge summaries from the Medical Information Mart for Intensive Care (MIMIC) database with a single fabricated recommendation added; common health myths collected from Reddit; and 300 short clinical scenarios written and validated by physicians. Each case was presented in multiple versions, from neutral wording to emotionally charged or leading phrasing similar to what circulates on social platforms.

In one example, a discharge note falsely advised patients with esophagitis-related bleeding to "drink cold milk to soothe the symptoms." Several models accepted the statement rather than flagging it as unsafe. They treated it like ordinary medical guidance.

Our findings show that current AI systems can treat confident medical language as true by default, even when it's clearly wrong. A fabricated recommendation in a discharge note can slip through. It can be repeated as if it were standard care. For these models, what matters is less whether a claim is correct than how it is written."

Eyal Klang, MD, co-senior and co-corresponding author, Chief of Generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai

The authors say the next step is to treat "can this system pass on a lie?" as a measurable property, using large-scale stress tests and external evidence checks before AI is built into clinical tools.

"Hospitals and developers can use our dataset as a stress test for medical AI," says physician-scientist and first author Mahmud Omar, MD, who consults with the research team. "Instead of assuming a model is safe, you can measure how often it passes on a lie, and whether that number falls in the next generation."

"AI has the potential to be a real help for clinicians and patients, offering faster insights and support," says co-senior and co-corresponding author Girish N. Nadkarni, MD, MPH, Chair of the Windreich Department of Artificial Intelligence and Human Health, Director of the Hasso Plattner Institute for Digital Health, Irene and Dr. Arthur M. Fishberg Professor of Medicine at the Icahn School of Medicine at Mount Sinai, and Chief AI Officer of the Mount Sinai Health System. "But it needs built-in safeguards that check medical claims before they are presented as fact. Our study shows where these systems can still pass on false information, and points to ways we can strengthen them before they are embedded in care."

The paper is titled "Mapping LLM Susceptibility to Medical Misinformation Across Clinical Notes and Social Media." 

The study's authors, as listed in the journal, are Mahmud Omar, Vera Sorin, Lothar H Wieler, Alexander W Charney, Patricia Kovatch, Carol R Horowitz, Panagiotis Korfiatis, Benjamin S Glicksberg, Robert Freeman, Girish N Nadkarni, and Eyal Klang.

This work was supported by the Clinical and Translational Science Awards (CTSA) grant UL1TR004419 from the National Center for Advancing Translational Sciences. Research reported in this publication was also supported by the Office of Research Infrastructure of the National Institutes of Health under award number S10OD026880 and S10OD030463.

Source:
Journal reference:

Omar, M., et al. (2026) Mapping the susceptibility of large language models to medical misinformation across clinical notes and social media: a cross-sectional benchmarking analysis. The Lancet Digital Health. DOI: 10.1016/j.landig.2025.100949. https://www.thelancet.com/journals/landig/article/PIIS2589-7500(25)00131-1/fulltext

Comments

The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of News Medical.
Post a new comment
Post

While we only use edited and approved content for Azthena answers, it may on occasions provide incorrect responses. Please confirm any data provided with the related suppliers or authors. We do not provide medical advice, if you search for medical information you must always consult a medical professional before acting on any information provided.

Your questions, but not your email details will be shared with OpenAI and retained for 30 days in accordance with their privacy principles.

Please do not ask questions that use sensitive or confidential information.

Read the full Terms & Conditions.

You might also like...
Social media support may ease anxiety in young adults