Demographic inaccuracies and biases in the depiction of patients by artificial intelligence text-to-image generators

AI’s Patient Images Show Demographic Biases

One-Sentence Summary

This study reveals that leading AI text-to-image generators produce patient depictions with significant demographic inaccuracies, over-representing White and normal-weight individuals while failing to reflect real-world disease epidemiology.

Overview

As artificial intelligence (AI) text-to-image generators become widely used for creating visual content, their application in medical contexts raises concerns about accuracy and bias. This research systematically evaluated four popular AI models (Adobe Firefly, Bing Image Generator, Meta Imagine, and Midjourney) to assess how accurately they depict patients across 29 different diseases. The researchers generated 9,060 images in total, and 12 independent raters assessed the depicted sex, age, race/ethnicity, and weight. These AI-generated demographics were then compared against established, real-world epidemiological data for each disease. The findings indicate a consistent failure across all platforms to accurately represent patient populations. A pronounced bias was observed toward the over-representation of White individuals, who constituted 87% of images from Adobe and 78% from Midjourney, compared with a pooled real-world average of 20%. Similarly, normal-weight individuals were over-represented, making up 96% of Adobe's and 93% of Midjourney's outputs, far exceeding the general population average of 63%.
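The paper's analysis pipeline is not reproduced here, but the core comparison it describes (rater-assigned demographic labels tested against real-world prevalence) can be sketched in a few lines of Python. The counts, categories, and expected proportions below are illustrative placeholders, not the study's published figures:

```python
from scipy.stats import chisquare

# Illustrative rater tallies for one model/disease pair (hypothetical
# numbers): counts of images per rated race/ethnicity category.
observed = {"White": 87, "Black": 5, "Hispanic": 4, "Asian": 3, "Other": 1}

# Hypothetical real-world epidemiological proportions for the same disease.
expected_props = {"White": 0.20, "Black": 0.30, "Hispanic": 0.25,
                  "Asian": 0.15, "Other": 0.10}

total = sum(observed.values())
categories = list(observed)
f_obs = [observed[c] for c in categories]
f_exp = [expected_props[c] * total for c in categories]  # expected counts

# Chi-square goodness-of-fit: does the generated demographic mix
# deviate from the epidemiological benchmark?
stat, p = chisquare(f_obs, f_exp=f_exp)
print(f"chi2 = {stat:.1f}, p = {p:.2g}")
for c in categories:
    print(f"{c:>8}: generated {observed[c]/total:.0%} "
          f"vs expected {expected_props[c]:.0%}")
```

At the scale of 9,060 images, even modest percentage gaps of this kind would register as highly significant deviations from the epidemiological benchmark.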

Novelty

While previous studies have identified general demographic biases in AI image generators, this paper provides a specific and systematic analysis within the medical domain. Its novelty lies in directly comparing the outputs of multiple leading AI models against concrete, real-world epidemiological data for a wide range of diseases. The study moves beyond general observations of bias by quantifying the inaccuracies in depictions of disease-specific populations. For instance, it evaluates whether the AI correctly generates images of children for pediatric diseases or males for male-specific conditions. This rigorous, disease-contextualized approach provides specific evidence of the models’ shortcomings in a field where accuracy is critical, highlighting a significant gap between the technology’s capabilities and the requirements for responsible medical use.

My Perspective

The inaccuracies documented in this paper likely stem not only from biased training data but also from the AI developers’ attempts to mitigate bias, which can lead to unintended consequences. For example, the depiction of both men and women for sex-specific diseases like prostate cancer suggests an over-correction, where a general directive to ensure gender balance overrides specific, context-dependent facts. This reveals a lack of nuanced understanding within the models. Furthermore, the perpetuation of these biases in medical illustrations or educational materials is particularly concerning. It could inadvertently reinforce stereotypes among healthcare students and professionals, potentially leading them to associate certain diseases with specific demographics. This could subtly influence clinical judgment and contribute to diagnostic delays for patients who do not fit the stereotypical image presented by these tools.

Potential Clinical / Research Applications

Clinically, these findings serve as a strong caution against the uncritical use of AI-generated images for patient education or medical training. Healthcare professionals and educators who use these tools must be aware of their current limitations and should manually curate or edit images to ensure they reflect accurate patient diversity. For research, this study opens several important avenues. It highlights the need for developing and fine-tuning AI models on more diverse and medically relevant datasets that include a wider range of patient demographics. Future research could explore the effectiveness of advanced “prompt engineering”—using highly detailed text commands to specify patient characteristics—in reducing these biases. Additionally, this work can inform the development of standards and guidelines for the use of generative AI in healthcare, pushing developers to prioritize demographic accuracy and transparency in their models.
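As a concrete illustration of the prompt-engineering direction the authors suggest, one could sample patient attributes from a disease's known epidemiology and state them explicitly in the prompt, rather than leaving demographics to the model's defaults. The distributions and prompt template below are illustrative assumptions, and `generate_image` is a placeholder for whichever text-to-image API is being tested:

```python
import random

# Hypothetical demographic distributions for one disease; in practice
# these would come from published epidemiological data.
EPIDEMIOLOGY = {
    "type 2 diabetes": {
        "sex": {"male": 0.52, "female": 0.48},
        "age": {"middle-aged": 0.45, "older": 0.45, "young adult": 0.10},
        "race": {"White": 0.20, "Black": 0.30, "Hispanic": 0.30, "Asian": 0.20},
        "weight": {"overweight": 0.55, "normal-weight": 0.25, "obese": 0.20},
    },
}

def sample(dist):
    """Draw one category according to its probability weights."""
    categories, weights = zip(*dist.items())
    return random.choices(categories, weights=weights, k=1)[0]

def build_prompt(disease):
    """Embed sampled demographics directly in the prompt so the
    generator cannot fall back on its default depiction."""
    d = EPIDEMIOLOGY[disease]
    return (f"A {sample(d['age'])}, {sample(d['weight'])} "
            f"{sample(d['race'])} {sample(d['sex'])} patient "
            f"with {disease}, clinical photography style")

for _ in range(3):
    prompt = build_prompt("type 2 diabetes")
    print(prompt)
    # image = generate_image(prompt)  # placeholder for any text-to-image API
```

Whether such explicit prompting actually closes the demographic gap is precisely the question this follow-up research would need to answer, using the same rater-based evaluation the study applied to the default outputs.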

