A recent study published in The British Medical Journal tested whether artificial intelligence (AI) could pass the examination for the Fellowship of the Royal College of Radiologists (FRCR).
Radiologists in the United Kingdom (UK) must pass the FRCR examination before completing their training. If AI can pass the same test, the argument goes, it could one day replace radiologists. The final FRCR exam has three components, and candidates require a passing mark in each component to pass the exam overall.
In the rapid reporting component, candidates must analyze and interpret 30 radiographs in 35 minutes and correctly report at least 90% of them (27 of 30) to pass this part of the exam. This session gauges candidates on both accuracy and speed. AI is often argued to excel at exactly this combination: fast, accurate, binary (normal versus abnormal) interpretation of radiographs. As such, the rapid reporting session of the FRCR exam is an ideal setting to test the prowess of AI.
Study: Can artificial intelligence pass the Fellowship of the Royal College of Radiologists examination? Multi-reader diagnostic accuracy study.
About the study
In the present study, researchers evaluated whether an AI candidate could pass the FRCR exam and outperform human radiologists taking the same examination. The authors used 10 FRCR mock examinations for analysis, since the RCR declined to share retired FRCR rapid reporting examination cases. The radiographs were selected to reflect the same difficulty level as an actual exam.
Each mock exam comprised 30 radiographs, covering all body parts from adults and children; approximately half contained one pathology, and the remainder had no abnormalities. Radiologists who had passed the FRCR examination within the previous 12 months (the radiologist readers) were recruited via social media, word of mouth, and email.
Radiologist readers completed a short survey that captured information on demographics and previous FRCR exam attempts. Anonymized radiographs were provided in Digital Imaging and Communications in Medicine (DICOM) format via an online image-viewing platform. Radiologists were given one month (May 2022) to record their interpretations for the ten mock examinations on an online sheet.
Radiologists provided ratings on 1) how representative the mock exams were relative to the actual FRCR examination, 2) their own performance, and 3) how well they thought AI would have performed. The same 300 anonymized radiographs were provided to the AI candidate, Smarturgences, developed by the French AI company Milvue.
The AI tool was not certified to analyze abdominal and axial skeleton radiographs; still, it was provided with these radiographs for fairness across participants. The score for the AI tool was calculated in four ways. In the first scenario, only the AI-interpretable radiographs were scored, excluding the non-interpretable ones. In the second, third, and fourth scenarios, the non-interpretable radiographs were scored as if the AI had reported them as normal, as abnormal, or as simply incorrect, respectively.
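The four scoring scenarios can be illustrated with a short sketch. This is a hypothetical reconstruction for illustration only: the function name, case structure, and field labels are assumptions, and only the 90% pass mark and the four scenario rules come from the study description.

```python
# Hypothetical sketch of the study's four scoring scenarios for the AI candidate.
# Field names ('truth', 'ai_report') and the helper itself are illustrative assumptions.

PASS_MARK = 0.90  # rapid reporting requires at least 90% correct


def score_exam(cases, scenario):
    """Score one mock exam for the AI candidate under a given scenario.

    Each case is a dict with:
      'truth'     -- 'normal' or 'abnormal' (ground truth)
      'ai_report' -- the AI's output, or None if the radiograph was
                     outside the tool's certified scope (non-interpretable)
    Returns (score, passed).
    """
    correct, total = 0, 0
    for case in cases:
        if case["ai_report"] is None:  # non-interpretable radiograph
            if scenario == 1:
                continue               # scenario 1: exclude from scoring
            elif scenario == 2:
                verdict = "normal"     # scenario 2: treat as reported normal
            elif scenario == 3:
                verdict = "abnormal"   # scenario 3: treat as reported abnormal
            else:
                total += 1             # scenario 4: always marked wrong
                continue
        else:
            verdict = case["ai_report"]
        total += 1
        if verdict == case["truth"]:
            correct += 1
    score = correct / total if total else 0.0
    return score, score >= PASS_MARK
```

Scenario 1 is the most lenient (the AI is only marked on images it can read), while scenario 4 mirrors the strict conditions a human candidate faces, since a human cannot skip images they were not trained on.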
In total, 26 radiologists, including 16 females, were recruited, and most participants were aged 31–40. Sixteen radiologists had completed their FRCR exam in the past three months, and most had cleared it on their first attempt. The AI tool would have passed two mock exams in the first scenario and one mock examination in scenario 2.
In scenarios 3 and 4, the AI candidate would have failed every examination. In scenario 1, the overall sensitivity, specificity, and accuracy for AI were 83.6%, 75.2%, and 79.5%, respectively. For radiologists, the summary estimates of sensitivity, specificity, and accuracy were 84.1%, 87.3%, and 84.8%, respectively. AI was the highest-performing candidate in one examination but ranked second to last overall.
Under the strictest scoring criteria, which best reflect the actual examination (scenario 4), AI's overall sensitivity, specificity, and accuracy stood at 75.2%, 62.3%, and 68.7%, respectively. In comparison, radiologists' summary estimates of sensitivity, specificity, and accuracy were 84%, 87.5%, and 85.2%, respectively.
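For readers unfamiliar with these metrics, sensitivity, specificity, and accuracy follow directly from the standard confusion-matrix counts. The sketch below uses made-up counts purely for illustration; it is not data from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts.

    tp -- abnormal radiographs correctly flagged as abnormal
    fp -- normal radiographs wrongly flagged as abnormal
    tn -- normal radiographs correctly reported as normal
    fn -- abnormal radiographs missed (reported as normal)
    """
    sensitivity = tp / (tp + fn)               # share of abnormal cases caught
    specificity = tn / (tn + fp)               # share of normal cases cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all cases correct
    return sensitivity, specificity, accuracy
```

The pattern in the study's numbers (AI's specificity trailing its sensitivity) corresponds to the tool raising more false alarms on normal radiographs than it misses abnormalities.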
No radiologist passed all the mock examinations. The highest-ranked radiologist passed nine mock exams, while the three lowest-ranked radiologists passed only one. On average, radiologists passed four mock examinations. The radiologists rated the mock examinations as marginally more difficult than the actual FRCR examination. They rated their own performance 5.8–7.0 on a 10-point Likert-type scale and the performance of AI 6.0–6.6.
The researchers say: “On this occasion, the artificial intelligence candidate was unable to pass any of the 10 mock examinations when marked against similarly strict criteria to its human counterparts, but it could pass two of the mock examinations if special dispensation was made by the RCR to exclude images that it had not been trained on.”
Of the 42 non-interpretable radiographs in the dataset, the AI candidate yielded a result for one, mislabeling a normal abdominal radiograph as showing a basal pneumothorax. More than half of the radiologists wrongly diagnosed 20 radiographs; of these, the AI tool incorrectly diagnosed 10 but correctly interpreted the remaining 10. Overall, almost all radiologists correctly analyzed 148 radiographs, 134 of which were also correctly interpreted by the AI candidate.
To summarize, AI passed two mock examinations when granted the special dispensation of excluding non-interpretable images; without that dispensation, it passed none. Although AI did not outperform radiologists, its accuracy remained high, given the complexity and case mix.
Moreover, AI ranked the highest in one mock exam, outperforming three radiologists. Notably, AI correctly diagnosed half of the radiographs that most of its human peers interpreted wrongly. Nonetheless, the AI candidate still requires more training to match the performance and skills of an average radiologist, especially on the cases it currently cannot interpret.