Demographic inaccuracies and biases in the depiction of patients by artificial intelligence text-to-image generators

AI’s Patient Images Show Demographic Biases

One-Sentence Summary

This study reveals that leading AI text-to-image generators produce patient depictions with significant demographic inaccuracies, over-representing White and normal-weight individuals while failing to reflect real-world disease epidemiology.

Overview

As artificial intelligence (AI) text-to-image generators become widely used for creating visual content, their application in medical contexts raises concerns about accuracy and bias. This research systematically evaluated four popular AI models—Adobe Firefly, Bing Image Generator, Meta Imagine, and Midjourney—to assess how accurately they depict patients for 29 different diseases. Researchers generated a total of 9,060 images and had twelve independent raters assess the depicted sex, age, race/ethnicity, and weight. These AI-generated demographics were then compared against established, real-world epidemiological data for each disease. The findings indicate a consistent failure across all platforms to accurately represent patient populations. A pronounced bias was observed toward the over-representation of White individuals, who constituted 87% of images from Adobe and 78% from Midjourney, compared to a pooled real-world average of 20%. Similarly, normal-weight individuals were over-represented, making up 96% of Adobe’s and 93% of Midjourney’s outputs, far exceeding the general population average of 63%.
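As a rough illustration of the comparison step, the over-representation gap can be expressed in percentage points between the AI-depicted share of a demographic group and its real-world share. The figures below are the ones reported in the study for Adobe Firefly; the function and variable names are illustrative, not from the paper:

```python
def representation_gap(ai_shares: dict, real_shares: dict) -> dict:
    """Percentage-point over- (positive) or under- (negative) representation
    of each demographic group in AI-generated images vs. real-world data."""
    return {group: ai_shares[group] - real_shares[group] for group in ai_shares}

# Shares (percent) reported in the study for Adobe Firefly's outputs,
# compared with pooled real-world averages.
adobe_firefly = {"White": 87.0, "normal weight": 96.0}
real_world = {"White": 20.0, "normal weight": 63.0}

gaps = representation_gap(adobe_firefly, real_world)
# White individuals over-represented by 67 percentage points,
# normal-weight individuals by 33.
```

The same calculation applies per disease, which is what distinguishes this study: the expected shares are disease-specific epidemiological values rather than general population averages.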

Novelty

While previous studies have identified general demographic biases in AI image generators, this paper provides a specific and systematic analysis within the medical domain. Its novelty lies in directly comparing the outputs of multiple leading AI models against concrete, real-world epidemiological data for a wide range of diseases. The study moves beyond general observations of bias by quantifying the inaccuracies in depictions of disease-specific populations. For instance, it evaluates whether the AI correctly generates images of children for pediatric diseases or males for male-specific conditions. This rigorous, disease-contextualized approach provides specific evidence of the models’ shortcomings in a field where accuracy is critical, highlighting a significant gap between the technology’s capabilities and the requirements for responsible medical use.

My Perspective

The inaccuracies documented in this paper likely stem not only from biased training data but also from the AI developers’ attempts to mitigate bias, which can lead to unintended consequences. For example, the depiction of both men and women for sex-specific diseases like prostate cancer suggests an over-correction, where a general directive to ensure gender balance overrides specific, context-dependent facts. This reveals a lack of nuanced understanding within the models. Furthermore, the perpetuation of these biases in medical illustrations or educational materials is particularly concerning. It could inadvertently reinforce stereotypes among healthcare students and professionals, potentially leading them to associate certain diseases with specific demographics. This could subtly influence clinical judgment and contribute to diagnostic delays for patients who do not fit the stereotypical image presented by these tools.

Potential Clinical / Research Applications

Clinically, these findings serve as a strong caution against the uncritical use of AI-generated images for patient education or medical training. Healthcare professionals and educators who use these tools must be aware of their current limitations and should manually curate or edit images to ensure they reflect accurate patient diversity. For research, this study opens several important avenues. It highlights the need for developing and fine-tuning AI models on more diverse and medically relevant datasets that include a wider range of patient demographics. Future research could explore the effectiveness of advanced “prompt engineering”—using highly detailed text commands to specify patient characteristics—in reducing these biases. Additionally, this work can inform the development of standards and guidelines for the use of generative AI in healthcare, pushing developers to prioritize demographic accuracy and transparency in their models.
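One way to operationalize the prompt-engineering idea above is to sample demographic attributes in proportion to a disease's real-world prevalence and state them explicitly in the prompt. A minimal sketch follows; the epidemiological shares shown are placeholder values for illustration only, not figures from the paper:

```python
import random

# Hypothetical disease-specific demographic distributions (illustrative
# shares only; real use would draw these from epidemiological sources).
epidemiology = {
    "type 2 diabetes": {
        "sex": {"male": 0.52, "female": 0.48},
        "race/ethnicity": {"White": 0.55, "Black": 0.20,
                           "Hispanic": 0.17, "Asian": 0.08},
        "weight": {"overweight or obese": 0.85, "normal-weight": 0.15},
    },
}

def build_prompt(disease: str, rng: random.Random) -> str:
    """Draw each demographic attribute weighted by real-world prevalence
    and spell it out explicitly in the text-to-image prompt."""
    descriptors = []
    for attribute, distribution in epidemiology[disease].items():
        groups, weights = zip(*distribution.items())
        descriptors.append(rng.choices(groups, weights=weights)[0])
    return (f"A photorealistic portrait of a {' '.join(descriptors)} "
            f"patient with {disease}")

rng = random.Random(0)  # seeded for reproducibility
prompt = build_prompt("type 2 diabetes", rng)
```

Generating many images from prompts sampled this way would, in expectation, match the target distribution; whether the models actually honor such detailed demographic instructions is exactly the kind of question the proposed follow-up research would need to test.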

