A study of 691 FDA-cleared AI/ML devices reveals significant reporting gaps in efficacy, safety, and bias, calling for better regulation.

Original Title: Benefit-Risk Reporting for FDA-Cleared Artificial Intelligence-Enabled Medical Devices

Journal: JAMA Health Forum

DOI: 10.1001/jamahealthforum.2025.3351

FDA AI/ML Device Reporting Lacks Transparency

Overview

A comprehensive analysis of 691 artificial intelligence and machine learning (AI/ML) medical devices cleared by the US Food and Drug Administration (FDA) between 1995 and 2023 reveals significant deficiencies in benefit-risk reporting. The cross-sectional study examined FDA decision summaries and postmarket surveillance databases and found that crucial information was frequently missing. For instance, 95.5% of device summaries lacked demographic data for the populations on which the AI was tested, 53.3% did not report the training sample size, and 46.7% omitted the study design. The evidence supporting clearance was often not robust; only 1.6% of devices were backed by data from randomized clinical trials. Postmarket issues were also identified, with 5.2% of devices linked to 489 adverse events, including one death, and 5.8% of devices subject to recalls.
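The headline figures are essentially field-level completeness rates across device summaries. As a minimal illustration (not the authors' actual pipeline), the sketch below assumes a hypothetical CSV of fields extracted from 510(k) decision summaries, with column names invented for the example, and computes the share of summaries missing each field.

```python
import pandas as pd

# Hypothetical extraction of FDA decision-summary fields; the file and
# column names are illustrative assumptions, not the study's dataset schema.
summaries = pd.read_csv("aiml_device_summaries.csv")  # one row per cleared device

fields_of_interest = {
    "test_population_demographics": "demographic data for test population",
    "training_sample_size": "training sample size",
    "study_design": "study design",
}

# Share of device summaries in which each field was not reported.
for column, label in fields_of_interest.items():
    missing_rate = summaries[column].isna().mean()
    print(f"{label}: missing in {missing_rate:.1%} of summaries")
```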

Novelty

This research provides a uniquely comprehensive assessment by linking premarket clearance data with postmarket safety information from adverse event and recall databases for all FDA-cleared AI/ML devices over a 28-year period. While previous studies have examined aspects of AI/ML device approvals, this work is distinct in its scale and its integrated analysis of the full device lifecycle. The study also introduces a temporal analysis, comparing devices cleared before 2021 with those cleared in or after 2021. This comparison showed that while reporting of demographic bias and efficacy has improved recently (19.3% vs 2.5% and 35.8% vs 23.8%, respectively), reporting of safety assessments and association with peer-reviewed publications has declined (13.4% vs 36.8% and 31.9% vs 43.7%, respectively).

My Perspective

These results highlight a critical tension between the rapid advancement of AI technology and the existing regulatory frameworks designed for more conventional medical devices. The FDA’s reliance on the 510(k) pathway, which clears most AI/ML devices based on “substantial equivalence” to a prior device, seems ill-suited to this technology. An AI algorithm is fundamentally different from a physical instrument; its performance is highly dependent on the data it was trained on. The finding that 95.5% of submissions lack demographic details is deeply concerning. This practice risks creating and perpetuating health disparities, as a tool validated on one population may not perform safely or effectively on another. It suggests a systemic failure to prioritize equity in the development and approval process for these influential clinical tools.

Potential Clinical / Research Applications

In clinical practice, these findings should prompt healthcare providers and institutions to exercise caution when adopting new AI/ML technologies. Clinicians should critically evaluate the evidence provided by manufacturers, specifically questioning the diversity of the training and validation data and demanding transparency on performance metrics before integrating these tools into patient care. For the research community, this study underscores the need to develop and advocate for standardized reporting guidelines for AI/ML device evaluations, analogous to frameworks used for clinical trials. Future research could also focus on creating robust, independent postmarket surveillance systems, perhaps by leveraging real-world data from electronic health records to monitor the performance of these devices after deployment and detect performance degradation or biases not apparent in premarket testing.
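As one illustration of what such postmarket surveillance might look like, the sketch below is a minimal example rather than a validated system: it assumes periodic batches of model scores and adjudicated outcomes drawn from EHR data, and flags a monitoring window whose AUROC falls meaningfully below the premarket baseline. The function name, tolerance, and simulated data are all assumptions made for the example.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def flag_performance_drift(scores, outcomes, baseline_auroc, tolerance=0.05):
    """Compare one postmarket window's AUROC against the premarket baseline.

    scores:   model risk scores for one monitoring window (e.g., a month of EHR cases)
    outcomes: adjudicated binary outcomes for the same cases
    Returns (window_auroc, drift_flag).
    """
    window_auroc = roc_auc_score(outcomes, scores)
    return window_auroc, window_auroc < baseline_auroc - tolerance

# Hypothetical usage with simulated data; a real deployment would pull scores
# and outcomes from the EHR and repeat the check within demographic subgroups
# to surface biases not apparent in premarket testing.
rng = np.random.default_rng(0)
outcomes = rng.integers(0, 2, size=500)
scores = np.clip(outcomes * 0.6 + rng.normal(0.2, 0.25, size=500), 0, 1)

auroc, drifted = flag_performance_drift(scores, outcomes, baseline_auroc=0.85)
print(f"window AUROC = {auroc:.3f}, drift flagged: {drifted}")
```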

