Can tomorrow’s doctors learn with AI without losing their critical thinking? A NEJM review offers DEFT-AI and new collaboration models to help educators harness AI while protecting clinical skills.

Review: Educational Strategies for Clinical Supervision of Artificial Intelligence Use. Image Credit: Antonio Marca / Shutterstock
In a recent review published in The New England Journal of Medicine, researchers examine the challenges of supervising early-career medical learners who use powerful large language models (LLMs) as educational aids. The review highlights three dangers: "deskilling," where overreliance on AI erodes fundamental clinical reasoning skills; "mis-skilling," where trainees adopt AI-generated errors; and "never-skilling," the failure to develop essential competencies in the first place.
The review proposes a structured educational framework called "diagnosis, evidence, feedback, teaching, and recommendation for AI engagement (DEFT-AI)" to counter these potential AI demerits by scaffolding critical thinking. The review also introduces the "cyborg" and "centaur" models of human-AI collaboration, urging clinicians to adopt an adaptive practice where they learn to engage critically with AI-generated outputs rather than unquestioningly trusting them.
Background
Artificial intelligence (AI) is advancing at an astonishing rate, and large language models (LLMs) such as OpenAI's ChatGPT and Google's Gemini are increasingly used in medical learning, raising both opportunities and risks for clinical reasoning. A growing body of literature suggests that these tools are fundamentally reshaping medical education and practice.
However, integrating AI into clinical practice presents both unprecedented opportunities and significant risks for medical education. Rapid access to information and the ability to consolidate vast amounts of data into easily accessible summaries may become integral to future medical education and practice. Yet LLMs also simulate human-like reasoning, creating an "appearance of agency" where none actually exists, which can be especially dangerous for inexperienced medical trainees.
Medical educators thus face a novel and urgent challenge: guiding and supervising trainees who may be more proficient at leveraging AI than the educators themselves, creating an "inversion of expertise" in which teachers become learners too. The present review highlights three specific hurdles ("deskilling," "never-skilling," and "mis-skilling") that must be overcome before AI can cement its role in ensuring a safer and healthier future.
About the Review
This review addresses that need through a comprehensive examination of the scientific literature on the challenges and opportunities AI presents for medical education. It synthesizes more than 70 prior publications spanning educational theory, cognitive science, and emerging research on human-AI interaction, and uses these insights to develop two conceptual frameworks for the clinical supervision of AI:
- Diagnosis, evidence, feedback, teaching, and recommendation for AI engagement (DEFT-AI): an adapted framework for promoting critical thinking during educational conversations about AI use.
- Cyborg vs. centaur models: a typology describing two distinct modes of human-AI collaboration, designed to help educators and learners match their use of AI to the specific clinical task and its associated risk.
Review Findings
The review identifies several cognitive traps that today's AI tools set for medical education. "Cognitive offloading," the delegation of complex cognitive tasks such as clinical reasoning to AI, is linked to "automation bias," the tendency to over-rely on the AI's output and fail to catch its mistakes.
Alarmingly, cognitive offloading and automation bias are not just theoretical concerns. A study found that more than a third of advanced medical students failed to identify erroneous LLM answers to clinical scenarios. Another study reported a significant negative correlation between the frequent use of AI tools and critical thinking abilities, mediated by increased offloading, and this effect was especially pronounced among younger participants.
To address these concerns, the review proposes the DEFT-AI framework, a structured approach educators can use when a trainee leans on AI. It centers on a critical conversation that moves beyond the AI's answer to probe the learner's reasoning, with key questions such as: "What prompts did you use?", "How did you verify the AI-generated output?", and "How did the AI's suggestion influence or change your diagnostic approach?" Educators are also encouraged to teach evidence-based appraisal of AI outputs using Sackett's framework (ask, acquire, appraise, apply, assess) and effective prompt-engineering techniques, such as chain-of-thought reasoning.
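The chain-of-thought prompting mentioned above can be illustrated with a minimal sketch. The function name and prompt wording below are hypothetical examples for illustration, not taken from the review; the idea is simply that asking the model to show its intermediate reasoning gives the learner concrete steps to verify rather than a bare answer to accept.

```python
# Minimal sketch of a chain-of-thought prompt template for clinical vignettes.
# The function name and wording are illustrative assumptions, not from the review.

def build_cot_prompt(vignette: str) -> str:
    """Assemble a prompt that asks the model to reason step by step,
    so its output can be appraised rather than accepted wholesale."""
    return (
        "You are assisting a medical trainee.\n"
        f"Clinical vignette: {vignette}\n"
        "Think step by step: list the key findings, give a ranked "
        "differential diagnosis with the evidence for and against each "
        "item, and state what additional information would change your "
        "ranking."
    )

prompt = build_cot_prompt("55-year-old with acute chest pain and diaphoresis.")
print(prompt)
```

A prompt built this way exposes the model's reasoning chain, which is what DEFT-AI-style questions ("How did you verify the output?") ask the trainee to interrogate.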
The review further stresses that supervision must distinguish between evaluating the AI tool itself and evaluating its specific output. For example, institutional scorecards or model leaderboards may be used to judge tools, while evidence-based medicine appraisal steps should be applied to each individual output.
Finally, the review presents the "cyborg" and "centaur" modes of clinician-AI interaction. In centaur mode, tasks are strategically divided so that the clinician delegates low-risk, well-defined tasks (such as summarizing data or drafting communications) to the AI, while retaining complete control over high-stakes clinical judgment and decision-making. This mode is recommended when addressing complex or uncertain cases.
In contrast, the cyborg mode assumes that the clinician and AI co-construct a solution to the task at hand. This mode is efficient for low-risk, routine tasks but carries a higher risk of automation bias if not used with ongoing reflective oversight and justification.
The review also warns that performance heterogeneity and bias in LLMs can exacerbate health inequities. AI systems may underperform for certain populations, and uncritical adoption could widen disparities rather than close them.
Conclusions
The present review concludes that while the integration of AI into medicine and medical education is inevitable (and largely beneficial), its successful and safe adoption is not. It highlights that medical education must proactively address the risks of deskilling, never-skilling, and mis-skilling by fundamentally changing how clinical reasoning is taught, particularly against the backdrop of AI. Critical thinking remains foundational for "adaptive practice" – the ability to shift between efficient routines and innovative problem-solving when faced with the unpredictability of AI.
In summary, this review demonstrates that the ultimate goal is not to create doctors who are dependent on AI, but to cultivate clinicians who can skilfully and safely leverage it as a powerful tool to augment, but not replace, their own expertise through a "verify and trust" paradigm.
Journal reference:
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational Strategies for Clinical Supervision of Artificial Intelligence Use. New England Journal of Medicine, 393(8), 786–797. DOI: 10.1056/NEJMra2503232. https://www.nejm.org/doi/full/10.1056/NEJMra2503232