New research at ACR Convergence 2023, the American College of Rheumatology's (ACR) annual meeting, shows that a deep learning system could accurately identify and predict joint space narrowing and erosions in hand radiographs of patients with rheumatoid arthritis (RA) (Abstract #0745).
Radiographs are the most commonly used imaging technique for detecting and monitoring RA in the hand. Radiologists frequently use the well-validated Sharp/van der Heijde (SvH) method to evaluate joint space narrowing and erosions by grading specific locations in each hand and wrist. However, SvH scoring is time-consuming and requires expertise that isn't always available. This has led to increasing use of deep learning, a subset of machine learning, to analyze hand X-ray data in RA.
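As a rough illustration of how SvH grading adds up, the sketch below sums per-joint grades into erosion, joint space narrowing (JSN), and total scores. It assumes the commonly cited hand ranges of 0 to 5 for erosion and 0 to 4 for joint space narrowing per joint. The `JointGrade` type, the `svh_hand_score` function, and the joint labels are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass

# Assumed per-joint maximums for hand/wrist sites in the SvH method.
MAX_EROSION = 5
MAX_JSN = 4

@dataclass
class JointGrade:
    joint: str    # hypothetical joint label, e.g. "MCP2_left"
    erosion: int  # 0..MAX_EROSION
    jsn: int      # 0..MAX_JSN (joint space narrowing)

def svh_hand_score(grades):
    """Return (erosion_total, jsn_total, total) for a list of JointGrade."""
    erosion_total = 0
    jsn_total = 0
    for g in grades:
        # Reject grades outside the assumed SvH ranges.
        if not 0 <= g.erosion <= MAX_EROSION:
            raise ValueError(f"erosion grade out of range for {g.joint}: {g.erosion}")
        if not 0 <= g.jsn <= MAX_JSN:
            raise ValueError(f"JSN grade out of range for {g.joint}: {g.jsn}")
        erosion_total += g.erosion
        jsn_total += g.jsn
    return erosion_total, jsn_total, erosion_total + jsn_total

grades = [
    JointGrade("MCP2_left", erosion=2, jsn=1),
    JointGrade("PIP3_left", erosion=0, jsn=2),
    JointGrade("distal_radius_left", erosion=3, jsn=0),
]
print(svh_hand_score(grades))  # (5, 3, 8)
```

Checking each grade against its range keeps a mislabeled or out-of-scale entry from silently inflating the total.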
According to Carol Hitchon, M.D., FRCPC, MSc, an associate professor at the University of Manitoba and a clinician scientist in rheumatology and lead co-author of the study, "Machine learning offers a powerful and complementary approach to traditional RA detection and diagnosis methods. It enhances the accuracy, efficiency, and objectivity of RA radiograph assessment, while providing the potential for early damage detection and valuable insights into the disease."
For the current study, Hitchon and colleagues aimed to develop and validate a deep learning system for the automated detection of joints and prediction of SvH scores in hand X-rays of patients with RA.
They used a convolutional neural network (CNN)-based algorithm called You Only Look Once (YOLO). A CNN is a type of deep learning neural network widely used in computer vision and recognition tasks, and it has been applied successfully to medical image classification. YOLO is a CNN model designed specifically for real-time object detection in images and videos and is known for its speed and efficiency. Hitchon and colleagues used a recent version, YOLOv5, which they have shown is more than 90% accurate at detecting hand joints.
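To make the CNN description concrete, the minimal sketch below shows the core operation of a convolutional layer: sliding a small kernel over an image and taking a weighted sum at each position. As in most deep learning libraries, it is implemented as cross-correlation, without flipping the kernel. The toy image, the hand-picked kernel, and the `conv2d_valid` name are illustrative assumptions; a real network such as YOLOv5 stacks many learned kernels with nonlinearities, and this is not the study's code.

```python
def conv2d_valid(image, kernel):
    """2D 'valid' cross-correlation of a grayscale image with a kernel (lists of lists)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    # Only positions where the kernel fits entirely inside the image ("valid").
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

# Toy 4x4 "radiograph" with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Simple vertical-edge kernel (hand-picked here; a CNN learns its kernels).
kernel = [
    [-1, 1],
    [-1, 1],
]
print(conv2d_valid(image, kernel))
# [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The output responds only where the kernel straddles the edge, which is the kind of local feature early CNN layers pick up before deeper layers combine them into object-level detections.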
The YOLO model was trained to detect joints using 240 training and 89 evaluation pediatric hand radiographs from the Radiological Society of North America (RSNA) database.
The researchers drew bounding boxes around the joints of interest and labeled them: proximal interphalangeal, metacarpophalangeal, wrist, distal radius, and distal ulna. The joint detection model was then validated on 54 clinician-labeled X-rays from four adult RA patients who had been followed for more than a decade.
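Validating a joint detector against clinician-labeled X-rays generally means matching each predicted box to a labeled box of the same joint type. The sketch below does this with a common convention: intersection over union (IoU) with a 0.5 threshold and greedy matching in order of confidence. The matching rule the researchers used isn't described here, so the threshold, the (x1, y1, x2, y2) box format, and the `iou` and `match_detections` helpers are assumptions. The resulting true-positive, false-positive, and false-negative counts are the inputs to a detection F1 score.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_detections(preds, labels, thresh=0.5):
    """Greedily match predictions to labels; return (tp, fp, fn).

    preds:  list of (joint_type, confidence, box)
    labels: list of (joint_type, box)
    """
    unmatched = list(range(len(labels)))
    tp = fp = 0
    # Highest-confidence predictions get first pick of the labeled boxes.
    for joint, _conf, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best, best_iou = None, thresh
        for k in unmatched:
            lab_joint, lab_box = labels[k]
            if lab_joint != joint:
                continue  # only match boxes of the same joint type
            score = iou(box, lab_box)
            if score >= best_iou:
                best, best_iou = k, score
        if best is None:
            fp += 1  # no labeled joint overlaps enough
        else:
            tp += 1
            unmatched.remove(best)
    fn = len(unmatched)  # labeled joints the model missed
    return tp, fp, fn

labels = [("MCP", (0, 0, 10, 10)), ("PIP", (20, 0, 30, 10))]
preds = [("MCP", 0.9, (1, 1, 11, 11)), ("PIP", 0.8, (40, 0, 50, 10))]
print(match_detections(preds, labels))  # (1, 1, 1)
```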
Researchers then applied a vision transformer model (VTM) to predict each joint's erosion and joint space narrowing scores. Hitchon explains that a VTM is a deep learning architecture designed to efficiently process and understand sequences of data.
"It works by splitting an image into small patches, transforming or flattening the patches into a sequence, making low-dimensional linear embeddings from the flattened patches, adding the positional embeddings, then feeding the encoded sequence into a standard transformer encoder for the remaining prediction task," she says.
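The patch-embedding steps Hitchon describes can be sketched in a few lines. This toy example is not the study's model. It splits a small grayscale image into non-overlapping patches, flattens each patch, projects it to a lower-dimensional vector with a linear map, and adds a positional embedding, producing the token sequence a transformer encoder would consume. The image size, patch size, embedding dimension, random (untrained) weights, and helper names are arbitrary assumptions. In a real vision transformer, the projection and positional embeddings are learned during training.

```python
import random

def image_to_patches(image, p):
    """Split an H x W image (list of lists) into flattened p x p patches, row-major."""
    h, w = len(image), len(image[0])
    assert h % p == 0 and w % p == 0, "image size must be divisible by patch size"
    patches = []
    for r in range(0, h, p):
        for c in range(0, w, p):
            # Flatten the p x p block into a vector of length p*p.
            patches.append([image[r + i][c + j] for i in range(p) for j in range(p)])
    return patches

def linear(vec, weights):
    """Project vec (length n) with a weight matrix of shape d x n."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def embed_patches(image, p, d, seed=0):
    """Patch embedding plus positional embedding, using random (untrained) weights."""
    rng = random.Random(seed)
    patches = image_to_patches(image, p)
    n = p * p
    proj = [[rng.gauss(0, 0.02) for _ in range(n)] for _ in range(d)]
    pos = [[rng.gauss(0, 0.02) for _ in range(d)] for _ in range(len(patches))]
    tokens = []
    for idx, patch in enumerate(patches):
        e = linear(patch, proj)                           # low-dimensional embedding
        tokens.append([a + b for a, b in zip(e, pos[idx])])  # add position info
    return tokens  # sequence of len(patches) vectors, each of length d

image = [[(r * 8 + c) / 63 for c in range(8)] for r in range(8)]  # toy 8x8 image
tokens = embed_patches(image, p=4, d=8)
print(len(tokens), len(tokens[0]))  # 4 8
```

Each 16-pixel patch becomes an 8-dimensional token, and the positional embedding lets the encoder know where in the image each patch came from.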
The VTM was validated using more than 2,200 hand X-rays from 381 RA patients with physician-assigned SvH scores. The patients were drawn from the Canadian Early Arthritis Cohort, a multicenter Canadian research study, and these scored radiographs served as the gold standard for this study.
The joint detection model was trained to detect the wrist as a whole, but the researchers had SvH scores for individual wrist joints, so they trained a separate model to predict joint space narrowing and erosion scores for each individual wrist joint.
When they evaluated the accuracy of their models, they found:
- The joint detection model accurately identified target joints, with an F1 score of 0.991 on the pediatric data and 0.812 on the adult data. (In machine learning, the F1 score combines precision and recall into a single measure of a model's accuracy.)
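- VTM predictions for joint space narrowing and erosion were highly accurate, with root mean squared errors (RMSE) of 0.91 and 0.93, respectively. (RMSE measures the typical size of the difference between predicted and reference scores; lower values indicate more accurate predictions.)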
- The multi-task models predicted SvH erosion and joint space narrowing scores for individual wrist joints with moderate accuracy (0.6 to 0.91).
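For readers unfamiliar with the two metrics above, the sketch below shows how they are conventionally computed. F1 is the harmonic mean of precision and recall, calculated from detection counts. RMSE is the square root of the average squared difference between predicted and reference scores, so lower is better. The function names and the numbers in the usage lines are made up for illustration and are not the study's data.

```python
import math

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def rmse(predicted, reference):
    """Root mean squared error between two equal-length score sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))

print(f1_score(tp=8, fp=2, fn=2))  # approximately 0.8
print(rmse([1, 2, 0, 3], [0, 3, 1, 2]))  # 1.0
```

Because RMSE is expressed in the same units as the scores being predicted, an RMSE near 1 means predictions typically differ from the physician-assigned score by about one grade point.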
Hitchon says they were not surprised by their model's performance.
"The AI technologies we applied to this study have been successfully and widely used in other domains, some of which have been commercialized. Compared with the performance of the model in other domains, our performance is relatively low in predicting radiograph scoring for some joint types, such as the wrist. [This] may be due to the relatively small sample size in our study or the complexity of wrist joint anatomy," she notes.
Hitchon also says the model's performance does not match that of human radiologists for joints such as the wrist.
"The AI models cannot replace human radiologists at this stage, but they will be excellent complementary tools that can enhance the overall quality and efficiency of radiograph scoring analysis when used in conjunction with radiologist judgment. In addition, [these models] may be applicable to the interpretation of large volumes of X-rays in clinical trials."
The study has two main limitations. First, the radiographs came from cohorts composed almost entirely of white women, so the findings may not apply to racial and ethnic groups traditionally underrepresented in research studies; Hitchon acknowledges that the findings need to be replicated in other groups. Second, the model cannot learn and become more accurate with subsequent images, although Hitchon says the team is developing a new deep learning framework that will give the model continual learning ability when new data become available.
This study received local funding from the Health Science Centre Foundation, a hospital-based charity in Winnipeg, Manitoba, Canada. One of the co-authors, Pingzhao Hu, is supported by the Canada Research Chair Program. The Canadian Early Arthritis Cohort, which provided one set of radiographs, is funded by multiple sources.