In a recent study posted to the medRxiv* preprint server, researchers performed a comprehensive search for randomized controlled trials (RCTs) involving artificial intelligence (AI) algorithms published between 2018 and 2023 on PubMed and the International Clinical Trials Registry Platform (ICTRP).
Specifically, the current scoping review evaluated study endpoints, intervention features, and RCT outcomes to inform stakeholders about the clinical relevance of AI, thereby helping to improve care management and medical decision-making while identifying areas that require further work in this rapidly evolving research domain.
Study: Randomized Controlled Trials Evaluating AI in Clinical Practice: A Scoping Evaluation.
*Important notice: medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.
Background
The Food and Drug Administration (FDA) has approved ~300 AI-enabled medical devices after several research studies reported that these models outperformed clinicians; however, only a few AI-enabled medical devices have been evaluated in prospective RCTs.
For instance, a widely deployed sepsis prediction model was found to perform worse than its developer had reported, generating numerous false alerts.
AI-based devices often perform worse when deployed prospectively, and premature adoption in clinical practice could diminish their potential benefits.
About the study
In the present study, researchers used keywords related to artificial intelligence, clinicians, and clinical trials, among others, to identify RCTs published in English on PubMed and the ICTRP between January 1, 2018, and August 18, 2023, that met the following criteria:
i) used a non-linear computational model based on AI as an intervention;
ii) integrated AI-based intervention into clinical practice, such that it impacted patient health; and
iii) published as a full-text peer-reviewed article.
Two independent investigators performed initial screening using the Covidence systematic review software, followed by full-text screening, while a third reviewer resolved any discrepancies through discussion.
The team retrieved information regarding the study site, clinical task, results, and the type of AI used from all eligible RCTs.
Additionally, they categorized studies by their primary endpoints, e.g., care management, medical specialty, and the data modality used by the AI. Finally, they presented simple descriptive statistics to provide an overview of all the eligible trials.
The current study adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
Results
A total of 84 RCTs constituted this study's analytical dataset, which revealed several notable trends with implications for the development of AI in real-world clinical settings.
Of these 84 studies, 71 and 13 were sourced through primary and reference screening, respectively.
Most RCTs were gastroenterology-related (35/84), followed by radiology (13), surgery (five), and cardiology (five).
Four research groups, from Wuhan University, Wision AI, Medtronic, and Fujifilm, conducted most of the gastroenterology-related RCTs (24/35), which were notable for their uniformity: all tested video-based machine learning (ML) algorithms that assist clinicians.
The United States (US) contributed the most RCTs, followed by China; however, most trials were single-site studies. Indeed, there is a need for multi-center international trials to ensure that AI systems are valid across diverse populations and healthcare systems.
China's RCTs were predominantly gastroenterology-related (19/24), while RCTs conducted in the US covered multiple medical specialties. Multi-center RCTs mainly involved European nations, while single-site RCTs, which enrolled an average of 359 patients, predominated (52/84) in the final study set.
Compared to success rates observed in historical reviews of RCTs for AI in healthcare, most RCTs evaluating AI-based medical devices in clinical practice reported positive results for all primary endpoints evaluated (69/84).
Such a high success rate lends credibility to clinical AI; however, it is also possible that the nascency of the field and publication bias inflated these observations.
Furthermore, most RCTs evaluated interventions aimed at diagnostic accuracy; while such trials offer convincing prospective evidence of the performance of clinical AI, improved diagnostic accuracy does not necessarily translate into improved patient outcomes.
Thus, RCTs assessing AI algorithms in healthcare should focus on incorporating clinically meaningful endpoints, e.g., patient symptoms, survival, and treatment needs.
Conclusions
Overall, the existing RCTs on AI in clinical practice demonstrate increasing interest in AI applications across wide-ranging medical specialties and locations.
However, given AI's limitations in the healthcare domain, further research focused on multi-center RCTs incorporating diverse clinically meaningful endpoints is needed.