Doctors and nurses are better at triaging patients in emergency departments than artificial intelligence (AI), according to research presented at the European Emergency Medicine Congress today (Tuesday).
However, Dr Renata Jukneviciene, a postdoctoral researcher at Vilnius University, Lithuania, who presented the study, said that AI could be useful in conjunction with clinical staff, but should not be relied on as a stand-alone triage tool.
"We conducted this study to address the growing issue of overcrowding in the emergency department and the escalating workload of nurses," said Dr Jukneviciene. "Given the rapid development of AI tools like ChatGPT, we aimed to explore whether AI could support triage decision-making, improve efficiency and reduce the burden on staff in emergency settings."
The researchers distributed a paper and digital questionnaire to six emergency medicine doctors and 51 nurses working in the emergency department of Vilnius University Hospital Santaros Klinikos. Participants were asked to triage clinical cases selected at random from 110 reports cited in the PubMed database. The clinical staff were required to classify the patients according to urgency, placing them in one of five categories from most to least urgent, using the Manchester Triage System. The same cases were analysed by ChatGPT (version 3.5).
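For context, the Manchester Triage System assigns each patient to one of five colour-coded urgency levels, from "Immediate" to "Non-urgent". The sketch below shows one plausible way the ratings from staff and from ChatGPT could be recorded per case for later comparison; the record structure, field names and example values are illustrative assumptions, not the researchers' actual instrument or data.

```python
# The five Manchester Triage System levels, from most to least urgent.
MTS_CATEGORIES = {
    1: "Immediate (red)",
    2: "Very urgent (orange)",
    3: "Urgent (yellow)",
    4: "Standard (green)",
    5: "Non-urgent (blue)",
}

# One plausible record per clinical case: the triage level chosen by a doctor,
# a nurse, and ChatGPT. Purely illustrative; not the study's data.
case_ratings = [
    {"case_id": "case-001", "doctor": 1, "nurse": 2, "chatgpt": 1},
    {"case_id": "case-002", "doctor": 3, "nurse": 3, "chatgpt": 2},
]

for case in case_ratings:
    labels = {rater: MTS_CATEGORIES[level]
              for rater, level in case.items() if rater != "case_id"}
    print(case["case_id"], labels)
```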
A total of 44 nurses (86.3%) and six doctors (100%) completed the questionnaire.
"Overall, AI underperformed compared to both nurses and doctors across most of the metrics we measured," said Dr Jukneviciene. "For example, AI's overall accuracy was 50.4%, compared to 65.5% for nurses and 70.6% for doctors. Sensitivity – how well it identified true urgent cases – for AI was also lower at 58.3% compared to nurses, who scored 73.8%, and doctors, who scored 83.0%."
Doctors had the highest scores in all the areas and categories of urgency that the researchers analysed.
"However, AI did outperform nurses in the first triage category, which are the most urgent cases; it showed better accuracy and specificity, meaning that it identified the truly life-threatening cases. For accuracy, AI scored 27.3% compared to 9.3% for nurses, and for the specificity AI scored 27.8% versus 8.3%."
The distribution of cases across the five categories of urgency was as follows:
"These results suggest that while AI generally tends to over-triage, it may be somewhat more cautious in flagging critical cases, which can be both a strength and a drawback," said Dr Jukneviciene.
Doctors also performed better than AI in cases that required or involved surgery, and in cases that required treatment with medication or other non-invasive therapies. For surgical cases, doctors scored 68.4%, nurses scored 63% and AI scored 39.5% for reliability. For therapeutic cases, doctors scored 65.9%, nurses scored 44.5% and AI did better than nurses, scoring 51.9% for reliability.
"While we anticipated that AI might not outperform experienced clinicians and nurses, we were surprised that in some areas AI performed quite well. In fact, in the most urgent triage category, it demonstrated higher accuracy than nurses. This indicates that AI should not replace clinical judgement, but could serve as a decision-support tool in specific clinical contexts and in overwhelmed emergency departments.
"AI may assist in prioritising the most urgent cases more consistently and in supporting new or less experienced staff. However, excessive triaging could lead to inefficiencies, so careful integration and human oversight are crucial. Hospitals should approach AI implementation with caution and focus on training staff to critically interpret AI suggestions," concluded Dr. Jukneviciene.
The researchers are planning follow-up studies using newer AI versions and models fine-tuned for medical purposes. They want to test them in larger groups of participants, include ECG interpretation, and explore how AI can be integrated into nurse training, particularly for triage and mass casualty incidents.
Limitations of the study include its small number of participants, its single-centre setting, and the fact that the AI analysis took place outside a real-time hospital environment, so it was not possible to assess how it would fit into daily workflow, interact with patients, assess vital signs or gather follow-up data. In addition, ChatGPT 3.5 was not trained specifically for medical use.
Strengths of the study were that it used real clinical cases for comparison by a multidisciplinary group of doctors and nurses, as well as AI; its accessibility and flexibility were increased by distributing the questionnaire both digitally and on paper; it was clinically relevant to current healthcare challenges such as overcrowding and staff shortages in the emergency department; and it identified that AI over-triages many patients, assigning them a higher urgency than warranted, which is crucial knowledge for the safe implementation of AI in emergency departments.
Dr Barbra Backus is chair of the EUSEM abstract selection committee. She is an emergency physician in Amsterdam, The Netherlands, and was not involved in the study. She said: "AI has the potential to be a useful tool for many aspects of medical care and it is already proving its worth in areas such as interpreting X-rays. However, it has its limitations, and this study shows very clearly that it cannot replace trained medical staff for triaging patients coming into emergency departments. This does not mean it should not be used, as it could help speed up decision-making. However, it needs to be applied with caution and with oversight from doctors and nurses. I expect AI will improve in the future, but it should be tested at every stage of development."
On Monday 29 September, a colleague of Dr Jukneviciene's, assistant professor Rakesh Jalali, from the University of Warmia and Mazury (Olsztyn, Poland), gave a presentation at the congress on the use of virtual reality to train clinical staff in how to treat patients who have sustained multiple traumatic injuries.