Can synthetic data boost fairness in medical imaging AI?


In a recent study published in the journal Nature Medicine, researchers used diffusion models for data augmentation to increase the robustness and fairness of medical machine learning (ML) models in three medical imaging contexts: histopathology, chest X-rays, and dermatological images.

Study: Generative models improve fairness of medical classifiers under distribution shifts

Domain generalization has become a primary issue for the use of ML in healthcare, since model performance can fall short of expectations when the data used during development differ from the data encountered during deployment. Underrepresentation of specific groups or diseases is a common problem, and even experienced clinicians can struggle with it because of disease rarity or limited clinical knowledge. Few ML initiatives have gained widespread acceptance or had a measurable effect on clinical outcomes at scale, and 'out-of-distribution' data remain a significant hurdle to implementation.

About the study

In the present study, researchers applied diffusion models to three medical imaging settings: histopathology, chest X-rays, and dermatology. They used images generated by these models to improve the robustness and fairness of medical machine-learning models, and they used unlabeled data to capture the underlying data distribution and supplement the real samples. The aim was to expand the training dataset in a steerable, controllable manner.

The researchers trained a generative model on both labeled and unlabeled data: labels were available only for a single source domain, while additional unlabeled data could come from any domain (in or out of distribution). The model could be conditioned on diagnostic labels alone or on diagnostic labels together with a property (for instance, a sensitive attribute label or a hospital ID). All data were de-identified before analysis. Conditioning on one or both attributes allowed the researchers to specify which synthetic samples were used to supplement the training set. A low-resolution generative model and an upsampler were trained using the same conditioning vector.
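To make the idea of a conditioning vector concrete, here is a minimal sketch of one common way to build it: a one-hot diagnostic label, optionally concatenated with a one-hot property such as a hospital ID. The encoding, the vocabulary sizes, and the convention of using an all-zero block for missing information (for example, unlabeled images) are assumptions for illustration, not details confirmed by the study.

```python
import numpy as np

# Hypothetical sizes for illustration only; the study's actual label and
# property vocabularies are not given in this article.
NUM_LABELS = 5       # e.g. diagnostic classes
NUM_PROPERTIES = 3   # e.g. hospital IDs or sensitive-attribute groups


def one_hot(index, size):
    """Return a one-hot vector, or all zeros when the value is unknown (None)."""
    vec = np.zeros(size, dtype=np.float32)
    if index is not None:
        vec[index] = 1.0
    return vec


def conditioning_vector(label=None, prop=None, use_property=True):
    """Build a conditioning vector from a diagnostic label and an optional property.

    Assumption: missing information (an unlabeled image, or an unknown
    property) is encoded as an all-zero block, so labeled and unlabeled
    images share one input format.
    """
    parts = [one_hot(label, NUM_LABELS)]
    if use_property:
        parts.append(one_hot(prop, NUM_PROPERTIES))
    return np.concatenate(parts)


# Label and property conditioning: diagnostic class 2 from hospital 1
print(conditioning_vector(label=2, prop=1))
# Unlabeled image whose hospital (property) is known
print(conditioning_vector(label=None, prop=0))
# Label-only conditioning
print(conditioning_vector(label=4, use_property=False))
```

Switching `use_property` on or off mirrors the two conditioning modes described above: label only, or label plus property.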

Before training the diagnostic models, the team added synthetic images produced by the generative model to the real training data from the source domain. They tested this strategy in each medical setting using denoising diffusion probabilistic models (DDPMs), tracking diagnostic performance and fairness both in distribution and out of distribution (OOD). In-distribution data were defined as images acquired with the same imaging technique as the training data and drawn from similar demographic and disease distributions.
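For readers unfamiliar with DDPMs, the sketch below shows the standard forward (noising) process and the simplified noise-prediction training objective from the original DDPM formulation: a clean image is mixed with Gaussian noise according to a schedule, and a conditional network learns to predict that noise. The linear schedule, the toy image size, and the `denoiser` placeholder are assumptions; the article does not report the study's architecture or schedule.

```python
import numpy as np

# Linear noise schedule from the original DDPM paper (Ho et al., 2020);
# the schedule used in this study is not stated in the article.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative product, decreasing from ~1 toward 0


def q_sample(x0, t, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise


def ddpm_loss(denoiser, x0, cond, rng):
    """Simplified DDPM objective: mean squared error on the predicted noise.

    `denoiser(x_t, t, cond)` is a hypothetical stand-in for the conditional
    network (for example a U-Net); `cond` is the conditioning vector.
    """
    t = rng.integers(0, T)
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, noise)
    predicted = denoiser(x_t, t, cond)
    return float(np.mean((predicted - noise) ** 2))


rng = np.random.default_rng(0)
image = rng.uniform(-1.0, 1.0, size=(64, 64, 3))  # toy low-resolution image in [-1, 1]
cond = np.array([0, 1, 0, 0, 0], dtype=np.float32)

# Placeholder "network" that always predicts zero noise, only to exercise the code
zero_denoiser = lambda x_t, t, c: np.zeros_like(x_t)
print(ddpm_loss(zero_denoiser, image, cond, rng))
```

A trained network drives this loss down by learning to denoise; sampling then runs the process in reverse, and in a cascaded setup an upsampler refines the low-resolution output.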

The researchers used two sets of metrics to compare the baseline models with the proposed technique. One set focused on diagnostic performance, such as top-1 accuracy for classifying histopathology images and the area under the receiver operating characteristic curve (ROC-AUC) for radiology; the other focused on fairness. For dermatology, high-risk sensitivity was used because expert dermatologists consider it the most clinically useful diagnostic metric.
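The metrics named above can be sketched in a few lines. This illustration computes top-1 accuracy, a binary ROC-AUC via its rank (Mann-Whitney) formulation, high-risk sensitivity, and a fairness gap. The gap is assumed here to be the difference between the best- and worst-performing subgroup; the article does not spell out the study's exact definition, and the toy data are made up.

```python
import numpy as np


def top1_accuracy(probs, labels):
    """Fraction of samples whose highest-scoring class is the true class."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))


def roc_auc(scores, labels):
    """Binary ROC-AUC: probability that a random positive scores higher than a
    random negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def high_risk_sensitivity(pred_high_risk, true_high_risk):
    """Share of truly high-risk cases (boolean mask) that the model flags as high-risk."""
    return float(np.mean(pred_high_risk[true_high_risk]))


def fairness_gap(metric_fn, groups, *arrays):
    """Difference between the best and worst subgroup values of a metric.

    Assumed definition (max minus min over subgroups); returns the gap and
    the per-group values.
    """
    per_group = {g: metric_fn(*(a[groups == g] for a in arrays)) for g in np.unique(groups)}
    return max(per_group.values()) - min(per_group.values()), per_group


# Toy data: one disease score per image, true labels, and a subgroup attribute
scores = np.array([0.9, 0.8, 0.3, 0.6, 0.2, 0.7])
labels = np.array([1, 1, 0, 1, 0, 0])
groups = np.array(["A", "A", "A", "B", "B", "B"])

print(roc_auc(scores, labels))                        # 8/9, about 0.889
print(fairness_gap(roc_auc, groups, scores, labels))  # gap of 0.5: group A scores 1.0, group B 0.5
```

For several chest X-ray conditions, a per-condition AUC of this kind would typically be averaged, and the same `fairness_gap` helper can wrap any per-sample metric.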

The researchers used two large public radiology datasets, CheXpert and ChestX-ray, to build generative and diagnostic models for chest X-rays, training on 201,055 chest X-ray examples. In the dermatology setting, dermatologists evaluated 488 synthetic images from regular and high-risk classes to assess whether the model captured the primary characteristics of each condition. They rated whether image quality was sufficient to make a diagnosis and gave up to three diagnoses from a list of approximately 20,000 conditions.


The study shows that diffusion models can learn realistic augmentations from data in a label-efficient way, making the resulting diagnostic models more robust and statistically fairer both in and out of distribution. Combining synthetic and real data can considerably increase diagnostic accuracy and reduce the fairness gap between subgroups under distribution shifts.

Generated images in the dermatology setting. Each row of images corresponds to a different condition. a, Generated images for cyst, melanocytic nevus and seborrheic dermatitis. b, Generated images for folliculitis, hidradenitis and alopecia areata.

Although synthetic data are not a substitute for representative, high-quality data collection, this approach can enable clinicians to make use of both unlabeled and labeled data and to close potentially harmful gaps in diagnostic accuracy between underrepresented and overrepresented populations without a performance penalty. The researchers found that training with synthetic data outperformed in-distribution baselines in both more and less skewed settings and narrowed fairness gaps between hospitals.
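Because the generator can be conditioned on both a diagnostic label and a property, one can choose which (label, group) combinations to fill with synthetic images. The sketch below shows a simple, hypothetical balancing rule: request enough synthetic samples for each cell to match the largest real cell, then append them to the real data. The rule, the toy counts, and the `generate` placeholder are illustrative assumptions; the article does not describe the study's exact sampling strategy.

```python
from collections import Counter


def synthetic_budget(real_pairs, labels, groups):
    """Count how many synthetic images to request for each (label, group) cell
    so that every cell reaches the size of the largest real cell.

    Hypothetical balancing rule for illustration only.
    """
    counts = Counter(real_pairs)
    target = max(counts.values())
    return {(y, g): target - counts[(y, g)] for y in labels for g in groups}


def build_training_set(real_pairs, budget, generate):
    """Tag real samples, then append generated ones.

    `generate(label, group)` stands in for a call to a generator conditioned
    on both the diagnostic label and the property (hypothetical API).
    """
    real = [(y, g, "real") for y, g in real_pairs]
    synthetic = [generate(y, g) for (y, g), n in budget.items() for _ in range(n)]
    return real + synthetic


# Toy, made-up counts: group "B" and the high-risk class are underrepresented
real_pairs = (
    [("benign", "A")] * 6
    + [("benign", "B")] * 2
    + [("high_risk", "A")] * 3
    + [("high_risk", "B")] * 1
)
budget = synthetic_budget(real_pairs, ["benign", "high_risk"], ["A", "B"])
fake_generate = lambda y, g: (y, g, "synthetic")  # placeholder for the diffusion model
training_set = build_training_set(real_pairs, budget, fake_generate)
print(budget)
print(Counter((y, g) for y, g, _ in training_set))
```

Full balancing is only one option; a smaller synthetic share per cell is an equally valid design choice.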

Applying color augmentations on top of the generated samples performed best overall, with a 49% relative improvement over the baseline model and a 3.2% improvement over models trained with color augmentation in the test hospital. For chest X-rays, synthetic images considerably increased the average AUC across five diseases, with notable gains for cardiomegaly and in the OOD setting. The fairness gap for sex narrowed by 45%, while the fairness gap for race shrank by 32%. Combining heuristic augmentations with synthetic data-based techniques such as 'Label conditioning' and 'Label and property conditioning' increased model sensitivity without sacrificing fairness, resulting in considerable gains in OOD scenarios.
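As an illustration of applying heuristic color augmentation on top of generated samples, the sketch below jitters the brightness, contrast, and saturation of every image in a training batch that mixes real and synthetic images. In histopathology, color augmentation is commonly used to mimic staining differences between sites. The specific transforms and strengths here are generic assumptions, not the augmentation pipeline reported by the study.

```python
import numpy as np


def color_jitter(image, rng, brightness=0.2, contrast=0.2, saturation=0.2):
    """Randomly perturb brightness, contrast and saturation of an RGB image in
    [0, 1]. Generic heuristic augmentation; strengths are illustrative."""
    out = image * rng.uniform(1 - brightness, 1 + brightness)           # brightness
    mean = out.mean()
    out = (out - mean) * rng.uniform(1 - contrast, 1 + contrast) + mean  # contrast
    gray = out @ np.array([0.299, 0.587, 0.114])                          # per-pixel luma
    s = rng.uniform(1 - saturation, 1 + saturation)
    out = gray[..., None] + (out - gray[..., None]) * s                   # saturation
    return np.clip(out, 0.0, 1.0)


rng = np.random.default_rng(0)
real_batch = rng.uniform(0, 1, size=(4, 32, 32, 3))
synthetic_batch = rng.uniform(0, 1, size=(4, 32, 32, 3))  # stand-in for generated images

# Augment real and synthetic images alike before they reach the classifier
train_batch = np.concatenate([real_batch, synthetic_batch])
augmented = np.stack([color_jitter(img, rng) for img in train_batch])
print(augmented.shape)
```

The point of the design is that the two techniques are complementary: the generator adds new, label-controlled content, while color jitter adds cheap appearance variation to both real and synthetic images.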

Label and property conditioning improved high-risk sensitivity by 27% and, out of distribution, by 63.5%, and narrowed the fairness gap by a factor of 7.5. In dermatology, the generative model produced realistic, canonical images that captured the features of numerous conditions, including rare ones. Synthetic images also reduced spurious correlations and compressed the learned representations, lowering the model's reliance on correlations that do not generalize OOD and that can leave some individuals underserved.
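The results above mix relative improvements ("improved by 27%"), relative reductions ("narrowed by 45%"), and multiplicative factors ("7.5×"). The short sketch below shows how each is computed, using made-up values chosen only to reproduce the kinds of figures quoted; they are not numbers from the study.

```python
def relative_change(new, old):
    """Relative change of a metric versus its baseline, e.g. 0.27 means +27%."""
    return (new - old) / old


def reduction_factor(old_gap, new_gap):
    """How many times smaller the new fairness gap is than the old one."""
    return old_gap / new_gap


# Made-up values, for illustration only
baseline_sensitivity, new_sensitivity = 0.40, 0.508
print(relative_change(new_sensitivity, baseline_sensitivity))  # about 0.27, i.e. +27%

old_gap, new_gap = 0.15, 0.02
print(reduction_factor(old_gap, new_gap))  # about 7.5, i.e. a 7.5x smaller gap

# "Narrowed by 45%" means the new gap is 55% of the old one
print(relative_change(0.55 * old_gap, old_gap))  # about -0.45
```

A 7.5-fold reduction corresponds to a relative reduction of about 87%, so the two ways of reporting are not interchangeable.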

The study shows that diffusion models can generate synthetic images that are useful in medical applications such as histopathology, radiology, and dermatology, improving statistical fairness, balanced accuracy, and high-risk sensitivity. These synthetic samples are realistic, canonical images that expert clinicians considered diagnosable. However, the researchers point out potential risks and limitations of relying on generated data, such as overconfidence in AI systems, limited insight, and the perpetuation of biases present in the original training data.

Journal reference:
Generative models improve fairness of medical classifiers under distribution shifts. Nature Medicine.

Written by

Pooja Toshniwal Paharia



Please use the following format to cite this article in your essay, paper or report:

  • APA

    Toshniwal Paharia, Pooja. (2024, April 12). Can synthetic data boost fairness in medical imaging AI? News-Medical. Retrieved on May 25, 2024 from


The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of News Medical.