AI outperforms doctors in summarizing health records, study shows

By Dr. Chinta Sidharthan
Reviewed by Susha Cheriyedath, M.Sc.
Feb 28 2024

In a recent study published in the journal Nature Medicine, an international team of scientists identified the best large language models and adaptation methods for clinically summarizing large amounts of electronic health record data and compared the performance of these models to that of medical experts.

Study: Adapted large language models can outperform medical experts in clinical text summarization. Image Credit: takasu / Shutterstock

Background

A laborious but essential part of medical practice is documenting patient health records, which contain progress reports, diagnostic tests, and treatment histories across specialists. Clinicians often spend a substantial portion of their time compiling vast amounts of textual data, and even for very experienced physicians this process can introduce errors that translate into serious medical and diagnostic problems.

The transition from paper records to electronic health records appears only to have expanded the clinical documentation workload: reports suggest that clinicians spend approximately two hours on documentation for every hour of patient interaction. Nurses spend close to 60% of their time on clinical documentation, and these time demands often result in considerable stress and burnout, decreasing job satisfaction among clinicians and eventually leading to worse patient outcomes.

Although large language models present an excellent option for the summarization of clinical data, and these models have been evaluated for general natural language processing tasks, their efficiency and accuracy in summarizing clinical data have not been evaluated extensively.

About the study

In the present study, the researchers evaluated eight large language models across four clinical summarization tasks, namely, patient questions, radiology reports, dialogue between doctor and patient, and progress notes.

They first used quantitative natural language processing metrics to determine which model and adaptation method performed the best across the four summarization tasks. Ten physicians then conducted a clinical reader study where they compared the best summaries from the large language models with those from medical experts along parameters such as conciseness, correctness, and completeness.
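As an illustration of what a quantitative summarization metric can look like, the sketch below computes a ROUGE-L style F1 score from the longest common subsequence of words shared by a model summary and a reference summary. The article does not name the metrics used in the study, so ROUGE-L is chosen here only as a widely used example. The function names and the whitespace tokenization are assumptions made for brevity.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L style F1 between two texts (simplified: lowercase, split on whitespace)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate words that are in the LCS
    recall = lcs / len(ref)      # share of reference words that are in the LCS
    return 2 * precision * recall / (precision + recall)


# Hypothetical example texts, not taken from the study.
score = rouge_l_f1("no acute cardiopulmonary process", "no acute process")
```

A score of 1.0 means the two texts contain the same words in the same order, and 0.0 means they share no words. Word-overlap metrics of this kind cannot judge clinical correctness, which is one reason a reader study with physicians is a useful complement.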

Finally, the researchers carried out a safety analysis to characterize challenges, such as fabricated information and the potential for medical harm, that arise when clinical data are summarized by either medical experts or large language models.

The eight large language models fell into two broad language-generation approaches: autoregressive and sequence-to-sequence (seq2seq). Seq2seq models use an encoder-decoder architecture that maps an input to an output, so training them requires paired datasets. These models perform well in tasks such as summarization and machine translation.

Autoregressive models, on the other hand, do not require paired datasets and are suited to tasks such as dialogue, question answering, and text generation. The study evaluated open-source autoregressive and seq2seq large language models as well as some proprietary autoregressive models, together with two techniques, in-context learning and quantized low-rank adaptation (QLoRA), for adapting general-purpose, pre-trained large language models to domain-specific tasks.
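The difference between the two model families can be made concrete by looking at how a single summarization example might be laid out for each. The sketch below is illustrative only: the class names, the separator string, and the sample text are hypothetical and are not taken from the study. It assumes the common convention of giving an encoder-decoder model a separate input and target, while giving an autoregressive model one concatenated sequence in which the summary follows the source text.

```python
from dataclasses import dataclass


@dataclass
class Seq2SeqExample:
    encoder_input: str   # text read by the encoder
    decoder_target: str  # text the decoder learns to produce


@dataclass
class AutoregressiveExample:
    text: str            # one continuous sequence: source, separator, summary
    target_start: int    # character offset where the summary begins


def to_seq2seq(source, summary):
    """Keep source and summary as an explicit input/output pair."""
    return Seq2SeqExample(encoder_input=source, decoder_target=summary)


def to_autoregressive(source, summary, separator="\nSummary: "):
    """Concatenate source and summary so the model continues the prompt."""
    prompt = source + separator
    return AutoregressiveExample(text=prompt + summary, target_start=len(prompt))


# Hypothetical example text, not from the study's datasets.
pair = to_seq2seq("Findings: lungs are clear.", "Normal chest.")
single = to_autoregressive("Findings: lungs are clear.", "Normal chest.")
```

The `target_start` offset marks where the summary begins, which lets training code score only the summary portion of the sequence if it chooses to.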

The four tasks used to evaluate the large language models were: summarizing radiology reports from the detailed findings of radiology analyses; condensing patient questions into short queries; producing a list of medical problems and diagnoses from progress notes; and summarizing doctor-patient interactions into a paragraph on the assessment and plan.

Results

The results showed that 45% of the summaries from the best-adapted large language models were rated equivalent to those from medical experts and 36% were rated superior, so 81% were judged at least as good as the expert summaries. In the clinical reader study, the large language model summaries also scored higher than the medical expert summaries on all three parameters: conciseness, correctness, and completeness.

The scientists also found that ‘prompt engineering’, the process of tuning or modifying the input prompts, greatly improved model performance. This was especially apparent for conciseness: prompts that instructed the model to summarize patient questions into queries of a specific word count helped condense the information meaningfully.
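A prompt with an explicit word limit could look something like the sketch below. The instruction wording, the default limit of 15 words, and the function names are assumptions made for illustration; they are not the prompts used in the study. The second function is a simple check that a returned summary respects the limit.

```python
def build_prompt(patient_question, max_words=15):
    """Build a summarization prompt that states an explicit word limit (hypothetical wording)."""
    return (
        f"Summarize the patient health query in {max_words} words or fewer.\n\n"
        f"Query: {patient_question}\n"
        "Summary:"
    )


def within_limit(summary, max_words=15):
    """Return True if the summary has at most max_words whitespace-separated words."""
    return len(summary.split()) <= max_words


# Hypothetical patient question, not from the study's datasets.
prompt = build_prompt(
    "I have had a dry cough for three weeks and it gets worse at night. "
    "Should I be worried, and what could be causing it?"
)
```

Stating the limit in the instruction gives the model a concrete target, and checking the output afterwards catches cases where the limit is ignored.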

Radiology reports were the one task in which the large language model summaries were less concise than those of medical experts. The scientists suggested that this could be due to the vagueness of the input prompt, since the prompts for summarizing radiology reports did not specify a word limit. They also believe that incorporating checks from other large language models or model ensembles, as well as from human operators, can greatly improve the accuracy of this process.

Conclusions

Overall, the study found that large language models summarized patient health record data as well as or better than medical experts. Most of these models scored higher than human experts on the natural language processing metrics, summarizing the data concisely, correctly, and completely. With further modifications and improvements, this approach could potentially be implemented to help clinicians save valuable time and improve patient care.

Journal reference:

Van Veen, D., Van Uden, C., Blankemeier, L., Delbrouck, J., Aali, A., Bluethgen, C., Pareek, A., Polacin, M., Reis, E. P., Seehofnerová, A., Rohatgi, N., Hosamani, P., Collins, W., Ahuja, N., Langlotz, C. P., Hom, J., Gatidis, S., Pauly, J., & Chaudhari, A. S. (2024). Adapted large language models can outperform medical experts in clinical text summarization. Nature Medicine. DOI: 10.1038/s41591-024-02855-5, https://www.nature.com/articles/s41591-024-02855-5
