AI in healthcare shows promise in trials but needs real-world testing to ensure effectiveness

In a recent study published in the journal The Lancet Digital Health, scientists in the United States evaluated the efficacy and challenges of artificial intelligence (AI) in clinical practice by analyzing randomized controlled trials, emphasizing the need for more diverse and comprehensive research approaches.

Review: Randomised controlled trials evaluating artificial intelligence in clinical practice: a scoping review. Image Credit: Kundra / Shutterstock


AI's role in healthcare has significantly expanded in the last five years, showing potential to match or exceed clinician performance in various specialties. However, most AI models have undergone retrospective rather than real-world testing. Out of nearly 300 AI-enabled medical devices approved by the United States (US) Food and Drug Administration (FDA), only a few have been evaluated through prospective randomized controlled trials (RCTs). This gap in real-world testing highlights concerns about AI's reliability and effectiveness, with issues like alert fatigue from faulty AI predictions, as demonstrated by a sepsis model. Further research is needed to validate AI's real-world efficacy, address biases, and ensure its safe, equitable, and effective integration into clinical practice.

About the study 

From January 1, 2018, to November 14, 2023, a systematic search was conducted across databases such as SCOPUS, PubMed, CENTRAL, and the International Clinical Trials Registry Platform, targeting the rise of modern AI in clinical trials. Search terms included "artificial intelligence," "clinician," and "clinical trial," with further studies identified through a manual review of relevant publication references.

The inclusion criteria were specific to RCTs utilizing significant AI components, defined as non-linear computational models like decision trees or neural networks, which must integrate into clinical practice and influence patient management. Exclusions included studies using linear models, secondary studies, abstracts, and non-integrated interventions. This methodology follows Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines for scoping reviews and is registered with the International Prospective Register of Systematic Reviews (PROSPERO).

The publications were initially screened using the Covidence Review software, focusing on titles and abstracts. Two independent investigators performed the screening, with subsequent full-text reviews. Data extraction was completed in Google Sheets by one investigator and verified by another, with any disagreements resolved by a third. Information was collected on study location, participant characteristics, clinical tasks, primary endpoints, time efficiency, comparators, results, AI type, and origin. Studies were categorized by primary endpoint group, clinical area or specialty, and AI data modality.

No contact was made with study authors for additional information, and due to the varied nature of tasks and endpoints across studies, no meta-analyses were performed. Instead, descriptive statistics were used to provide an overview of the characteristics of the trials included in this review.

Study results 

After deduplication, the electronic search for the scoping review produced 10,484 unique records spanning from January 1, 2018, to November 14, 2023. This process included retrieving 6,219 study records and 4,299 trial registrations. The initial screening of titles and abstracts narrowed the selection to 133 articles subjected to a full-text review. Subsequent exclusions left 73 studies, supplemented by an additional 13 articles identified through secondary reference screening, totaling 86 unique RCTs for inclusion.
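As a quick check on the screening arithmetic, the following minimal Python sketch recomputes the totals from the counts reported above. The variable names are illustrative and are not taken from the review itself.

```python
# Screening counts as reported in this article.
full_text_reviewed = 133        # articles that went to full-text review
included_after_full_text = 73   # studies remaining after full-text exclusions
added_from_references = 13      # articles found via secondary reference screening

# Articles dropped at the full-text stage.
excluded_at_full_text = full_text_reviewed - included_after_full_text

# Final number of unique RCTs in the scoping review.
total_included = included_after_full_text + added_from_references

print(f"Excluded at full-text review: {excluded_at_full_text}")
print(f"Total RCTs included: {total_included}")
```

Under these figures, 60 articles were excluded at full-text review, and the final total matches the 86 RCTs reported.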

Of these 86 RCTs, a substantial proportion (43%) focused on gastroenterology, followed by radiology (13%), surgery (6%), and cardiology (6%). Gastroenterology trials predominantly utilized video-based deep learning algorithms to assist clinicians, mainly evaluating diagnostic yield or performance. Most gastroenterology trials were concentrated among four research groups, highlighting a lack of diversity in trial conduct. Geographically, 92% of the trials were conducted within single countries, with the US and China leading in the number of trials but focusing on different specialties.

The trials typically involved single centers and a median of 359 participants. Participant demographics like age and sex were consistently reported, but race or ethnicity was less frequently included. 

Diagnostic effectiveness was the most common primary endpoint, followed by metrics related to care management, patient behavior and symptoms, and clinical decision-making. Notably, AI interventions in insulin dosing and hypotension monitoring demonstrated improvements in clinical management by optimizing time within target ranges. Other AI applications influenced patient behavior positively, as seen in trials that increased adherence to referral recommendations through immediate AI-generated predictions.

The majority of the trials evaluated deep learning systems for medical imaging, specifically video-based systems used in endoscopy. The use of AI varied across different data types, including structured data from electronic health records and waveform data. In terms of development, most AI models originated from the industry, with academia also playing a significant role.

Outcome analyses revealed that a substantial number of the trials achieved significant improvements in their primary endpoints when AI was used to assist clinicians or was compared with routine care. In addition, a small group of trials used non-inferiority designs to demonstrate that AI systems could match the performance of unassisted clinicians or routine care.

Operational time measurements varied across trials, with some reporting significant reductions while others saw increases or no change. Gastroenterology was notably the most studied specialty in terms of operational time effects, with mixed results regarding the impact of AI on operational efficiency.


Written by

Vijay Kumar Malesu

Vijay holds a Ph.D. in Biotechnology and possesses a deep passion for microbiology. His academic journey has allowed him to delve deeper into understanding the intricate world of microorganisms. Through his research and studies, he has gained expertise in various aspects of microbiology, which includes microbial genetics, microbial physiology, and microbial ecology. Vijay has six years of scientific research experience at renowned research institutes such as the Indian Council for Agricultural Research and KIIT University. He has worked on diverse projects in microbiology, biopolymers, and drug delivery. His contributions to these areas have provided him with a comprehensive understanding of the subject matter and the ability to tackle complex research challenges.    


Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Kumar Malesu, Vijay. (2024, April 26). AI in healthcare shows promise in trials but needs real-world testing to ensure effectiveness. News-Medical. Retrieved on June 12, 2024 from
