Date: 2025-10-10

Except for one key aspect, the setup is a familiar one in medicine: An expert diagnostician presents a particularly challenging case to a roomful of colleagues, carefully walking them through the patient's symptoms and initial test results. The physician explains her reasoning in detail as she breaks down the case and every possibility she considered, aided by a slide deck. At the end of the five-minute talk, she reveals her diagnosis and the next steps she would recommend.

The twist? This time, the physician in question is an artificial intelligence system called Dr. CaBot.

Researchers at Harvard Medical School are developing Dr. CaBot as a medical education tool. The system, which operates in both presentation and written formats, shows how it reasons through a case, offering what's called a differential diagnosis - a comprehensive list of possible conditions that explain what's going on - and narrowing down the possibilities until it reaches a final diagnosis.

Dr. CaBot's ability to spell out its "thought process" rather than focusing solely on reaching an accurate answer distinguishes it from other AI diagnostic tools. It is also one of only a few models designed to tackle more complex medical cases.

"We wanted to create an AI system that could generate a differential diagnosis and explain its detailed, nuanced reasoning at the level of an expert diagnostician," said Arjun (Raj) Manrai, assistant professor of biomedical informatics in the Blavatnik Institute at HMS. Manrai created Dr. CaBot with Thomas Buckley, a Harvard Kenneth C. Griffin School of Arts and Sciences doctoral student and a member of the Manrai lab.

Although the system is not yet ready for use in the clinic, Manrai and his team have been providing demonstrations of Dr. CaBot at Boston-area hospitals. Now, Dr. CaBot has a chance to prove itself by going head-to-head with an expert diagnostician in The New England Journal of Medicine's famed Case Records of the Massachusetts General Hospital, also known as clinicopathological conferences, or CPCs. It marks the first time the journal is publishing an AI-generated diagnosis.

The resulting medical case discussion, published Oct. 8 in NEJM, offers a window into Dr. CaBot's capabilities, showcasing its usefulness for medical educators and students - and hinting at its potential for physicians in the clinic. As the researchers continue to improve Dr. CaBot, they hope that it will serve as a useful model for other medical-AI teams around the world.

One hundred years of medical cases

The concept of CPCs dates back to the late 1800s, when physicians at Massachusetts General Hospital began using patient case studies for medical education. In 1900, Mass General pathologist Richard Cabot - for whom Dr. CaBot is named - formalized these as part of the curriculum for HMS doctors-in-training. Since 1923, NEJM has been continuously publishing the cases as CPCs to teach physicians how other physicians reason through complex cases.

"The cases are pretty legendary. They're known to be extremely challenging, filled with distractions and red herrings." - Arjun (Raj) Manrai, Assistant Professor, Biomedical Informatics, Blavatnik Institute, HMS

Each CPC consists of a detailed presentation of the case from the patient's doctors. Then, an expert not involved in the case is invited to give a presentation to colleagues at Mass General explaining their reasoning, step-by-step, and providing a differential diagnosis before homing in on the most likely possibility. After that, the patient's doctors reveal the actual diagnosis. The diagnostician's write-up is published in NEJM along with the case presentation.

The Oct. 8 NEJM article includes a typical case presentation along with a carefully reasoned differential diagnosis from expert diagnostician Gurpreet Dhaliwal of San Francisco Veterans Affairs Medical Center and the University of California, San Francisco, whom Manrai describes as "a real, modern Dr. House." After that, Dr. CaBot's differential diagnosis appears.

Manrai and Buckley were encouraged to see that although Dr. CaBot reasoned through the case differently than Dhaliwal, it reached a comparable final diagnosis.

From Dr. Cabot to Dr. CaBot

During graduate school, Manrai became fascinated by how CPCs demystify the process that physicians use to arrive at a diagnosis. They reminded him of the mystery novels he enjoyed growing up.

More recently, his lab and others have studied the accuracy of AI models for providing patient diagnoses. Manrai wondered whether it was possible to design a system that could go further.

The core of Dr. CaBot is OpenAI's o3 large language reasoning model. In building the system, Buckley, who is a Dunleavy Fellow in HMS' AI in Medicine track, needed to augment o3 with new abilities.

One is Dr. CaBot's ability to efficiently search millions of clinical abstracts from high-impact journals, which helps it properly cite its work and avoid factual hallucinations. Dr. CaBot can also search its "brain" of several thousand CPCs and use these examples to replicate the style of an expert diagnostician in NEJM. The team is working closely with clinician collaborators at Beth Israel Deaconess Medical Center and other Harvard-affiliated hospitals to continue refining the system.
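For readers curious about what "searching abstracts to ground an answer" can look like in general, here is a toy sketch of the retrieve-then-cite pattern. It is not Dr. CaBot's implementation, which this article does not describe in detail. The abstracts are invented placeholders, the names `ABSTRACTS`, `rank`, and `build_prompt` are hypothetical, and a simple TF-IDF word-overlap score stands in for whatever search method the real system uses.

```python
import math
import re
from collections import Counter

# Invented placeholder abstracts (NOT real publications), keyed by a made-up id.
ABSTRACTS = {
    "abs-1": "Fever and rash in a returning traveler",
    "abs-2": "Progressive dyspnea in an older adult with heart failure",
    "abs-3": "Skin lesions in a newborn",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def rank(query, documents, top_k=2):
    """Return up to top_k document ids, best match first, scored by TF-IDF overlap."""
    doc_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}
    n_docs = len(documents)

    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for counts in doc_counts.values():
        df.update(counts.keys())

    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, counts in doc_counts.items():
        score = 0.0
        for term in query_terms:
            if term in counts:
                idf = math.log(n_docs / df[term]) + 1.0
                score += counts[term] * idf
        if score > 0:  # drop documents that share no words with the query
            scores[doc_id] = score

    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:top_k]


def build_prompt(case_summary, documents, doc_ids):
    """Assemble a prompt that asks a language model to cite the retrieved sources by id."""
    sources = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in doc_ids)
    return (
        f"Case: {case_summary}\n\n"
        f"Sources:\n{sources}\n\n"
        "Give a differential diagnosis and cite sources by their bracketed ids."
    )


if __name__ == "__main__":
    case = "older man with progressive dyspnea"
    hits = rank(case, ABSTRACTS)
    print(build_prompt(case, ABSTRACTS, hits))
```

The design point is that the model is handed specific passages with identifiers, so each claim in its answer can be traced to a source rather than recalled from memory. A production system would likely use a far larger index and a more capable search method.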

Dr. CaBot delivers two main products.

The first is a roughly five-minute, narrated, slide-based video presentation of a case, in which the system explains how it reasoned through the possibilities to come to a diagnosis. The presentations are "surprisingly lifelike," Buckley said, complete with filler words like "um," "uh," and "you know" as well as colloquial phrases.

During the team's demonstrations, "the realness of the narrated presentation seems to connect with physicians," Manrai said.

The other is a detailed written version of Dr. CaBot's reasoning and diagnosis.

Taking Dr. CaBot on the road

The researchers are eager for physicians to engage with Dr. CaBot and provide expert feedback. To this end, they are planning more demonstrations at local hospitals, and they published a paper describing the system on a preprint server. They see the NEJM CPC as another opportunity for input.

"Dr. CaBot's AI-generated discussion has not been analyzed for correctness; any factual errors present have been retained so that the reader can observe the strengths and limitations of the system," the editor's note on the CPC reads, concluding, "whether AI has a legitimate use in clinical decision making is up to the reader to determine."

Dr. CaBot is also available online, where users can test the system on new cases for educational and research purposes, and review presentations and write-ups for 15 existing cases ranging from "A Newborn Girl With Skin Lesions" to "An 89-Year-Old Man With Progressive Dyspnea."

"We're really trying to stick our necks out," Manrai said. "There's great potential to be embarrassed, but you learn a lot by playing a video for actual clinicians for five minutes. We're getting so much feedback that way."

Although the primary use case for Dr. CaBot is as an educational tool, its ability to rapidly sift through millions of clinical abstracts could also make it a valuable research aid.

According to Manrai and Buckley, the tool would need further improvement, validation, and the addition of patient privacy protections before it could be considered for implementation in real-world settings. However, the team noted that physicians are already expressing interest.

The advantages of an AI system are that it is always available, doesn't get tired, isn't juggling responsibilities, and can quickly search vast quantities of medical literature, they said.

Manrai added that there's evidence physicians are using AI tools "in amounts that I think would surprise a lot of folks," including ChatGPT and a physician-specific platform called OpenEvidence.

"We're very nascent in human-AI collaboration," Manrai said, but the field is evolving rapidly. Eventually, Dr. CaBot might join the AI toolbox that physicians are already exploring as they determine how to best help their patients.