AI diagnostic reasoning nears physician performance

Advanced reasoning-based AI systems are showing physician-level performance on select diagnostic tasks, but researchers warn that real-world safety, bias, and clinical accountability remain major barriers to healthcare deployment. 

Study: AI can reason like a physician - what comes next? Image Credit: Thandon88 / Shutterstock.com

A recent Perspective article published in Science explores whether advanced artificial intelligence (AI) systems are approaching physician-level reasoning, while considering the implications and safety of their integration into clinical practice.

Progress in AI and diagnostic reasoning

Large language models (LLMs) are AI algorithms trained on substantial amounts of data to learn patterns that are then used to generate human-like responses. Reasoning models add to these capabilities by evaluating possible approaches before generating a response, thereby mimicking structured cognitive processing.

Numerous studies have evaluated healthcare applications of LLMs, including their performance on medical licensing examinations and other relevant assessments. These evaluations often extend beyond standard tests to include simulated clinical scenarios such as diagnostic case vignettes, specialty-specific exams, and problem-solving tasks designed to approximate clinical decision-making processes.

Discussing findings from Brodeur et al., the authors note that OpenAI’s GPT-4 reached the exact or a very close diagnosis in up to 73% of cases, while the company’s first reasoning model, o1-preview, exceeded that performance at 88.6% on clinicopathological cases.

Moreover, o1-preview reached the exact or a very close diagnosis in 67% of emergency department (ED) cases at initial triage, exceeding the accuracy of two expert physicians in specific text-based diagnostic scenarios.

Since reasoning models were first introduced, their reasoning capabilities, deliberation times, and processing of multimodal inputs have improved significantly. Whereas o1-preview accepted only text inputs, recent models can increasingly process combinations of text, images, audio, and video to support more complex clinical assessments.

How AI is being integrated into clinical practice

It is important to emphasize that AI systems are not being proposed as replacements for physicians. Rather, research in this area considers LLMs and other advanced models as collaborative tools, with clinicians providing accountability, oversight, and contextual judgment.

However, the authors also note that some well-defined healthcare tasks may ultimately be performed more effectively by AI systems operating independently. AI applications in healthcare have the potential to significantly reduce the human and financial costs associated with diagnostic errors, delays, and limited access.

The Medical Holistic Evaluation of Language Models (Med-HELM) defines five healthcare domains for AI use: administrative workflows, clinical note generation, clinical decision support, patient communication, and medical research assistance. Across these domains, AI has evolved to analyze patient records, monitor clinical encounters, and interact with predictive models, thereby minimizing delays, reducing diagnostic errors, and improving access to care.

Nevertheless, it remains unclear whether advanced AI models are better suited to specific, well-defined tasks or to operating independently across healthcare. As clinicians increasingly integrate AI tools into their practice, some already without institutional oversight, randomized trials are urgently needed to establish whether and how these models improve real-world care.

Requiring clinical certification of AI models has also been proposed to expand the role of AI in medicine while ensuring transparency and accountability. The proposed pathway would gradually advance AI systems from medical knowledge assistants to supervised clinical practice and, potentially, to broader autonomous responsibilities. Robust monitoring frameworks can complement these initiatives to further support the safety, efficiency, and cost-effectiveness of AI clinical decision support systems.

Despite these efforts, AI has had limited real-world success due to inadequate benchmark performance and unclear clinical benefits. Although newer multimodal systems can now integrate images, audio, and video, many medical AI evaluations remain focused on text-only tasks, which limits their ability to fully capture the complexity of clinical decision-making.

The authors also highlight concerns surrounding the rapid deployment of consumer-facing health AI systems. In one example, an independent evaluation found that a publicly available health-focused AI tool under-triaged more than half of emergency cases presented to it.

Beyond diagnostic accuracy, the Perspective emphasizes that clinical AI systems must demonstrate real-world effectiveness, equity, safety, transparency, and accountability before they can be widely adopted. The authors also note that previous healthcare algorithms have exhibited racial bias and that biased AI systems can negatively affect clinician decision-making.

Without robustly demonstrated effectiveness, equity, and safety, many AI systems will remain insufficient for clinical use.

Written by

Tarun Sai Lomte

Tarun is a writer based in Hyderabad, India. He has a Master’s degree in Biotechnology from the University of Hyderabad and is enthusiastic about scientific research. He enjoys reading research papers and literature reviews and is passionate about writing.

Citations

Sai Lomte, Tarun. (2026, May 08). AI diagnostic reasoning nears physician performance. News-Medical. Retrieved on May 08, 2026 from https://www.news-medical.net/news/20260508/AI-diagnostic-reasoning-nears-physician-performance.aspx.
