Large language model outperforms human doctors in clinical reasoning tasks

A cutting-edge large language model (LLM) outperformed human doctors in common clinical reasoning tasks, including making emergency room decisions, identifying likely diagnoses, and choosing next steps in management, according to a new study that used real emergency department data. The authors of the study – one of the largest to date to compare artificial intelligence (AI) and physicians on a wide array of clinical reasoning tasks – are clear that their results do not mean AI systems are ready to practice medicine on their own, or that doctors can be removed from the diagnostic process. The results do, however, raise urgent questions about the future evaluation and implementation of AI tools in clinical care.

For more than 65 years, difficult clinical diagnostic cases have been the gold standard for evaluating medical computing systems. Most recently, LLMs have surpassed earlier computational approaches on these complex cases. However, despite this progress, most medical studies of LLMs have examined narrow or highly controlled scenarios and often lacked direct comparison to the performance of human physicians in real-world clinical reasoning tasks. The rapid advancement of LLM-based medical tools now necessitates more rigorous evaluation.

Here, Peter Brodeur and colleagues comprehensively evaluated the diagnostic and treatment-planning abilities of an advanced LLM – the OpenAI o1 series – by comparing its performance to that of hundreds of physicians and earlier AI systems, across a range of clinical reasoning tasks. These included both standardized clinical cases and a real-world study involving randomly selected emergency room patients at a major emergency medical center in Massachusetts. Brodeur et al. found that, across all six experiments, the LLM consistently matched or exceeded human performance in diagnostic and management reasoning. Notably, its advantage was most pronounced in early-stage emergency department triage, where clinicians must make rapid decisions with minimal information. While both humans and AI improved as more clinical data became available, the model demonstrated a distinct strength under conditions of uncertainty, using even fragmented, unstructured health record data effectively.

According to the authors, LLMs are rapidly approaching, and in some areas surpassing, human-level clinical reasoning. Although AI-assisted decision-making is often viewed as risky, the findings suggest such tools – when used in collaboration with physicians' assessments – could reduce diagnostic errors, delays, and disparities in access to care. However, the authors also note several important limitations of the study. For example, its focus was confined to text-based reasoning, whereas clinical practice depends heavily on visual and auditory cues, areas where current AI remains less capable. "Accuracy on a defined task is only one dimension of deployment readiness. Clinical AI must also deliver equitable, cost-effective, and safe outcomes, supported by accountability, transparency, and ongoing monitoring," write Ashley Hopkins and Erik Cornelisse in a related Perspective. "Without robust demonstrated effectiveness, equity, and safety, many AI systems will remain insufficient for clinical use."
