Advanced AI models outperform pediatricians in diagnosing rare diseases

Artificial intelligence (AI) is increasingly being explored as a tool to support clinical decision-making, yet its real-world performance in pediatric diagnosis remains unclear. A Pediatric Investigation study using authentic clinical cases now reports that advanced AI models outperform clinicians in diagnostic accuracy, particularly for rare diseases, and that an estimated combined human-AI approach achieves the highest overall accuracy. The findings highlight the potential of AI as a complementary tool to improve diagnostic precision and patient outcomes.

Accurate diagnosis in pediatric care can be particularly challenging, especially when rare diseases present with subtle or overlapping symptoms. Early uncertainty in diagnosis may delay treatment and increase the risk of complications. While artificial intelligence (AI) has shown potential in healthcare, most previous studies have relied on simplified or curated cases rather than real-world clinical data. This leaves an important gap in understanding how large language models perform in everyday clinical settings, where decisions are often made with limited information.

Against this backdrop, a team of researchers led by Dr. Cristian Launes from Hospital Sant Joan de Déu in Barcelona, Spain, evaluated the performance of AI models using real pediatric clinical cases. The study, published in the journal Pediatric Investigation on 25 March 2026, compared four advanced language models with 78 pediatric clinicians across 50 cases, including both common conditions and rare diseases.

Dr. Launes is a Clinical Professor and pediatrician at Hospital Sant Joan de Déu, Barcelona. His expertise spans pediatric infectious diseases, with a particular focus on respiratory viral infections and pediatric epidemiology.

To reflect real clinical practice, the researchers used patient summaries based on the first 72 hours of presentation. Each case was assessed multiple times to examine both diagnostic accuracy and consistency. Performance was evaluated based on whether the correct diagnosis appeared as the top prediction or within the top five suggestions.
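The Top-1 / Top-5 scoring described above can be sketched in a few lines. This is an illustrative reconstruction, not the study's code; the case data and diagnosis names are invented for the example.

```python
# Minimal sketch of Top-1 / Top-5 diagnostic scoring.
# Cases and diagnoses below are illustrative, not from the study.

def top_k_hit(ranked_predictions, correct_diagnosis, k):
    """True if the correct diagnosis appears among the top k suggestions."""
    return correct_diagnosis in ranked_predictions[:k]

cases = [
    # (ranked differential diagnosis, ground-truth diagnosis)
    (["bronchiolitis", "pneumonia", "asthma"], "bronchiolitis"),
    (["gastroenteritis", "appendicitis", "intussusception"], "intussusception"),
]

top1 = sum(top_k_hit(preds, truth, 1) for preds, truth in cases) / len(cases)
top5 = sum(top_k_hit(preds, truth, 5) for preds, truth in cases) / len(cases)
print(top1, top5)  # fraction of cases solved at Top-1 and at Top-5
```

Scoring at both cutoffs separates "named the diagnosis outright" from "included it in the differential", which matters when AI output is used to broaden a clinician's hypothesis list.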

The results showed that the most advanced AI models achieved higher diagnostic accuracy than clinicians overall. This advantage was particularly evident in rare disease cases, where AI systems were more likely to identify correct diagnoses that clinicians initially missed. However, clinicians demonstrated strengths in certain complex or context-dependent scenarios, highlighting differences in how humans and AI approach diagnostic reasoning.

Importantly, the study did not evaluate a real-time, interactive "human-plus-AI" diagnostic workflow. Instead, the researchers estimated potential complementarity using a prespecified "union" approach, asking whether the correct diagnosis appeared in the Top-5 list of either clinicians or model runs. Under this estimate, the best-performing pairing reached 94.3% Top-5 union accuracy, suggesting that clinicians and AI may contribute different correct hypotheses in difficult cases, particularly for rare diseases. "Our results suggest that AI can be evaluated as a clinician-supervised second opinion, especially in difficult cases where rare diseases are involved," said Dr. Launes. "Rather than replacing clinicians, these tools may help broaden the differential diagnosis and reduce the likelihood of missed diagnoses - as long as outputs are interpreted critically and within robust oversight frameworks."
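The "union" estimate above can be sketched as follows: a case counts as solved if the correct diagnosis appears in either the clinician's or the model's Top-5 list. Again, the cases are invented for illustration and the 94.3% figure comes from the paper, not from this sketch.

```python
# Sketch of the prespecified Top-5 "union" complementarity estimate.
# Example cases are hypothetical, not taken from the study.

def union_top5_hit(clinician_top5, model_top5, correct):
    """Counted correct if EITHER Top-5 list contains the true diagnosis."""
    return correct in clinician_top5 or correct in model_top5

cases = [
    # (clinician Top-5, model Top-5, ground-truth diagnosis)
    (["pneumonia", "bronchiolitis"], ["Kawasaki disease", "sepsis"], "Kawasaki disease"),
    (["appendicitis"], ["gastroenteritis"], "intussusception"),
]

union_acc = sum(union_top5_hit(c, m, t) for c, m, t in cases) / len(cases)
print(union_acc)
```

Note that this is an upper-bound style estimate of complementarity: it assumes a supervising clinician would recognize the correct hypothesis when either source proposes it, which a real-time interactive workflow would still need to confirm.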

From a governance perspective, medical diagnostic decision-support systems are generally considered high-risk applications under the European Union AI Act. This classification implies expectations around risk management, data governance, transparency, human oversight, and cybersecurity. The authors emphasize that any clinical use should remain advisory, with clear accountability, monitoring, and safeguards to address variability and the risk of misleading outputs.

The researchers also observed that additional clinical information improved diagnostic performance for both groups. When more detailed data, such as laboratory or imaging results, were included, accuracy increased. This finding underscores the importance of continuous clinical assessment and suggests that AI systems may be most effective when integrated into evolving, information-rich workflows.

"The interaction between data quality and diagnostic performance is critical. AI systems perform best when they are part of a continuous clinical process, where clinicians iteratively gather, verify, and curate the evolving clinical picture to feed the model, with ongoing reassessment and human oversight - not a one-time input-output tool."

Dr. Cristian Launes from Hospital Sant Joan de Déu, Barcelona

These findings highlight the potential of AI-assisted tools to support earlier and more accurate diagnosis, particularly for rare diseases where expertise may be limited. In the longer term, integrating AI into clinical workflows could enable more collaborative and data-driven decision-making, while also encouraging closer collaboration between clinicians, engineers, and policymakers.

Overall, this study demonstrates that advanced AI models can outperform clinicians in certain pediatric diagnostic tasks, particularly for rare conditions, with the greatest estimated benefit when AI outputs are combined with human expertise. Although challenges such as variability in responses and the need for appropriate oversight remain, the findings point to a promising role for AI as a supportive tool in pediatric healthcare.

Journal reference:

Launes, C., et al. (2026). Large-language-models for pediatric diagnosis: Performance evaluation using real-world clinical notes from common and rare cases. Pediatric Investigation. DOI: 10.1002/ped4.70053. https://onlinelibrary.wiley.com/doi/10.1002/ped4.70053
