Do deep learning tools outperform humans in diagnosing breast cancer via ultrasound imaging?

In a recent study published in the journal npj Precision Oncology, researchers conducted a systematic review to examine the accuracy of deep learning (DL) in diagnosing breast cancer using ultrasound (US) compared to human readers in clinical settings.

They found that there isn’t enough evidence to determine whether DL performs better than human readers or increases the accuracy of diagnostic breast US in clinical settings.

Study: Diagnostic performance of deep learning in ultrasound diagnosis of breast cancer: a systematic review. Image Credit: Gorodenkoff/Shutterstock.com


Breast cancer, the most prevalent cancer globally, caused 685,000 deaths in 2020. Early and accurate diagnosis is crucial.

US is a low-cost, radiation-free, and effective diagnostic tool, especially in women with dense breast tissue or occult lesions, and it can guide biopsy procedures. However, its diagnostic efficacy and reproducibility are limited by operator-dependent factors.

DL is a potent artificial intelligence technology shown to perform well in image-related tasks, enhancing the efficiency and accuracy of medical imaging workflows, especially in the diagnosis of diseases such as cancer.

Recent reports suggest that DL-based analysis of breast US may be equivalent to or surpass human radiologists, but its clinical application remains debated.

Therefore, researchers in the present review focused on the general diagnostic performance of DL in breast US, comparing standalone DL systems to radiologists and assessing the assistive role of DL alongside human readers.

About the study

In the present study, a database search followed by the application of stringent inclusion and exclusion criteria ultimately yielded 16 studies involving 9,238 women from various countries.

These studies were selected using the PICO (population, intervention, comparison, outcome) framework. All used DL systems based on convolutional neural networks, and 14 of them employed commercial DL systems.

Most of the included studies were conducted in a diagnostic setting, and pathology served as the gold standard in all of them. Study quality was assessed using tailored versions of the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) and QUADAS-C tools.

DL could be used as a standalone tool or may be employed to assist radiologists with the aim of enhancing diagnostic capabilities.

Four studies assessed DL as a standalone tool, two as an assistive tool, and ten explored both roles. Human readers with different levels of clinical experience in breast US were recruited as comparators for DL performance.

Results and discussion

Fourteen studies evaluated DL as a standalone system in breast US and compared it with human readers. One study found that DL had a lower area under the curve (AUC) than human readers, two showed equivalent AUC, and one reported a higher AUC for DL.
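For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case. The following is a minimal sketch of that interpretation only; the function name and all scores are hypothetical and are not taken from the review or from any of the included studies.

```python
def auc_from_scores(malignant_scores, benign_scores):
    """Estimate AUC as the fraction of (malignant, benign) pairs in which
    the malignant case has the higher score. Ties count as half a win."""
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))


# Hypothetical malignancy scores, invented purely for illustration.
malignant = [0.9, 0.8, 0.55, 0.4]
benign = [0.7, 0.5, 0.3, 0.2, 0.1]

auc = auc_from_scores(malignant, benign)
print(f"AUC = {auc:.2f}")  # prints "AUC = 0.85"
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why studies can report a DL system as lower than, equivalent to, or higher than human readers on this single number.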

In three studies, DL achieved a higher AUC than less experienced human readers but was comparable to experienced readers. Regarding accuracy, DL outperformed all human readers in two studies; in another study, it outperformed less experienced readers but was comparable to experienced readers.

DL showed lower sensitivity than human readers in five studies and higher specificity in five studies, with varied results in the remaining studies.
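Sensitivity, specificity, and accuracy are all derived from the same four counts, so a system can gain on one measure while losing on another. The sketch below shows the arithmetic with pathology as the reference standard. The class name and every count are hypothetical, chosen only to mirror the pattern described above (lower sensitivity, higher specificity); they are not results from the review.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # malignant on pathology, called malignant
    fn: int  # malignant on pathology, called benign
    tn: int  # benign on pathology, called benign
    fp: int  # benign on pathology, called malignant

    @property
    def sensitivity(self) -> float:
        # Share of cancers that were detected.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of benign lesions correctly called benign.
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Share of all lesions classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)


# Hypothetical counts for 100 malignant and 100 benign lesions.
human = ConfusionCounts(tp=90, fn=10, tn=60, fp=40)
dl = ConfusionCounts(tp=85, fn=15, tn=75, fp=25)

for name, c in [("human", human), ("DL", dl)]:
    print(f"{name}: sensitivity={c.sensitivity:.2f} "
          f"specificity={c.specificity:.2f} accuracy={c.accuracy:.2f}")
```

In this invented example the DL system misses more cancers (sensitivity 0.85 versus 0.90) yet produces fewer false alarms (specificity 0.75 versus 0.60), which is why a single headline metric cannot settle which reader is better.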

Twelve studies evaluated assistive DL systems in breast US. Three reported improved AUC when DL was combined with human readers, and one reported an AUC comparable to that of human readers. Assistive DL systems raised the AUC of less experienced readers but had no positive impact on experienced readers.

During accuracy testing, assistive DL systems showed higher accuracy than human readers in three studies. However, no improvement in overall sensitivity was observed when combining DL with human readers.

The specificity of human readers increased in seven studies using assistive DL systems, although the effect on specificity differed between experienced and less experienced readers.

In the quality assessment, the included studies showed a high risk of bias across several domains. Most had a high risk of bias in patient selection because cancer prevalence in the study samples significantly exceeded that seen in real-world settings.

Additionally, the study designs did not fully replicate clinical pathways, as DL systems were used for reading images but were not integrated into final clinical decisions. Testing pathways of human readers lacked access to patient clinical information, and reference standards varied among the studies.

Notably, some studies had a short follow-up time for women with negative tests, potentially impacting the assessment of missed cancers and overall diagnostic accuracy.
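One way to see why the inflated cancer prevalence noted in the quality assessment matters is to compute the positive predictive value (PPV), the probability that a positive call is truly cancer, at different prevalences. This is an illustrative sketch using Bayes' rule, not an analysis performed in the review. The sensitivity, specificity, and prevalence values are hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive test result is a true cancer (Bayes' rule)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical test characteristics, held fixed across both settings.
SENSITIVITY = 0.85
SPECIFICITY = 0.75

# Hypothetical prevalences: an enriched study sample vs. a lower-prevalence clinic.
ppv_enriched = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.50)
ppv_clinic = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.05)

print(f"PPV at 50% prevalence: {ppv_enriched:.2f}")
print(f"PPV at 5% prevalence:  {ppv_clinic:.2f}")
```

With identical sensitivity and specificity, the PPV in this example falls from about 0.77 to about 0.15 as prevalence drops from 50% to 5%, so results from cancer-enriched samples may not carry over directly to routine practice.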


In conclusion, this comprehensive review of the diagnostic performance of DL systems in breast US revealed substantial variability in outcomes.

While DL systems showed potential advantages in specificity, no consensus emerged on AUC, accuracy, or sensitivity, whether DL was used as a standalone tool or as an aid to human readers.

Concerns were raised about biases, study heterogeneity, and limitations in generalizability, particularly in Asian-centric studies. The review emphasizes the need for standardized DL research guidelines, consistent benchmarks, and multicenter trials to ensure reproducibility and clinical applicability.

The current evidence does not support broad clinical recommendations for DL systems in breast US and calls for further research and development in the field.


Written by

Dr. Sushama R. Chaphalkar

Dr. Sushama R. Chaphalkar is a senior researcher and academician based in Pune, India. She holds a PhD in Microbiology and has extensive experience in research and education in Biotechnology. In a career spanning three and a half decades, she has held prominent leadership positions in academia and industry. As the Founder-Director of a renowned Biotechnology institute, she worked extensively on high-end research projects of industrial significance, fostering a stronger bond between industry and academia.


Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Chaphalkar, Sushama R. (2024, January 31). Do deep learning tools outperform humans in diagnosing breast cancer via ultrasound imaging? News-Medical. Retrieved on April 14, 2024 from

  • MLA

    Chaphalkar, Sushama R. "Do deep learning tools outperform humans in diagnosing breast cancer via ultrasound imaging?" News-Medical. 14 April 2024. <>.

  • Chicago

    Chaphalkar, Sushama R. "Do deep learning tools outperform humans in diagnosing breast cancer via ultrasound imaging?" News-Medical. (accessed April 14, 2024).

  • Harvard

    Chaphalkar, Sushama R. 2024. Do deep learning tools outperform humans in diagnosing breast cancer via ultrasound imaging? News-Medical, viewed 14 April 2024,


The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of News Medical.