A study of 691 FDA-cleared AI devices found that reporting on efficacy, safety, and bias is inadequate, urging stronger regulatory oversight.

Original Title: Benefit-Risk Reporting for FDA-Cleared Artificial Intelligence-Enabled Medical Devices

Journal: JAMA Health Forum

DOI: 10.1001/jamahealthforum.2025.3351

Title

FDA Reporting for AI Medical Devices

One-Sentence Summary

An analysis of 691 FDA-cleared AI medical devices reveals significant gaps in reporting on efficacy, safety, and bias, highlighting a need for improved regulatory oversight.

Overview

This study investigated the comprehensiveness of benefit-risk reporting for artificial intelligence (AI) and machine learning (ML) medical devices cleared by the US Food and Drug Administration (FDA). Researchers conducted a cross-sectional analysis of all 691 AI/ML devices cleared from September 1995 to July 2023. The findings reveal substantial gaps in the information provided to regulators and the public. For instance, device summaries frequently failed to report fundamental details such as study design (46.7%), training sample size (53.3%), and patient demographic information (95.5%). The evidence supporting clearance was often limited: only 1.6% of devices presented data from randomized clinical trials, and fewer than 1% reported on patient outcomes. The researchers also examined postmarket surveillance data and identified 489 adverse events associated with 36 devices (5.2%), including 30 injuries and one death. Furthermore, 40 devices (5.8%) were recalled a total of 113 times, with most recalls (75.2%) attributed to software issues. The study concludes that current reporting standards for AI/ML devices are inadequate and suggests that more rigorous regulatory pathways are needed to ensure patient safety.
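
As a quick sanity check on the postmarket figures above, the short Python sketch below recomputes the quoted percentages from the quoted counts. The helper name `share` is an illustration only, and the count of software-related recalls is not stated in this summary; it is inferred here from the 75.2% figure and should be read as an assumption.

```python
# Counts quoted in the overview above.
TOTAL_DEVICES = 691
DEVICES_WITH_ADVERSE_EVENTS = 36
DEVICES_RECALLED = 40
TOTAL_RECALLS = 113
SOFTWARE_RECALL_SHARE_PCT = 75.2  # share of recalls attributed to software issues


def share(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


adverse_event_pct = share(DEVICES_WITH_ADVERSE_EVENTS, TOTAL_DEVICES)
recalled_pct = share(DEVICES_RECALLED, TOTAL_DEVICES)

# Assumption: the summary gives only the percentage, so the underlying
# number of software-related recalls is inferred by rounding.
implied_software_recalls = round(TOTAL_RECALLS * SOFTWARE_RECALL_SHARE_PCT / 100)

print(f"Devices with adverse events: {adverse_event_pct}%")
print(f"Devices recalled: {recalled_pct}%")
print(f"Implied software-related recalls: {implied_software_recalls} of {TOTAL_RECALLS}")
```

The recomputed shares match the reported 5.2% and 5.8%, and the 75.2% figure is consistent with roughly 85 of the 113 recalls being software-related.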

Novelty

The study’s contribution lies in its comprehensive scope and integrated analysis. It is the first to examine the entire cohort of 691 AI/ML devices cleared by the FDA through mid-2023. Unlike previous work, this research connects premarket clearance data with postmarket surveillance information by linking FDA decision summaries to databases for adverse events (MAUDE) and recalls. This approach provides a holistic view of a device’s lifecycle, from its initial clearance to its performance in real-world settings. Additionally, the paper introduces a temporal analysis by comparing devices cleared before and after 2021, when the FDA issued new recommendations. This comparison shows whether reporting practices have improved over time, and the results are mixed: some areas, such as bias reporting, improved, while others, such as safety assessments, did not.
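
The linkage idea described above can be illustrated with a minimal Python sketch. This summary does not describe the study's actual linkage procedure, so everything here is an assumption: the `submission_id` key, the record fields, and the toy records are hypothetical and are not real FDA data.

```python
from collections import defaultdict


def link_postmarket(devices, adverse_events, recalls):
    """Attach adverse events and recalls to each device via a shared identifier.

    Assumes every record carries a hypothetical 'submission_id' field.
    """
    events_by_id = defaultdict(list)
    for event in adverse_events:
        events_by_id[event["submission_id"]].append(event)

    recalls_by_id = defaultdict(list)
    for recall in recalls:
        recalls_by_id[recall["submission_id"]].append(recall)

    linked = []
    for device in devices:
        sid = device["submission_id"]
        linked.append({
            **device,
            "adverse_events": events_by_id.get(sid, []),
            "recalls": recalls_by_id.get(sid, []),
        })
    return linked


# Toy records, invented purely for illustration.
devices = [
    {"submission_id": "DEV-A", "cleared_year": 2019},
    {"submission_id": "DEV-B", "cleared_year": 2022},
    {"submission_id": "DEV-C", "cleared_year": 2022},
]
adverse_events = [
    {"submission_id": "DEV-A", "type": "injury"},
    {"submission_id": "DEV-A", "type": "malfunction"},
]
recalls = [
    {"submission_id": "DEV-B", "cause": "software"},
]

linked = link_postmarket(devices, adverse_events, recalls)
devices_with_events = sum(1 for d in linked if d["adverse_events"])
devices_with_recalls = sum(1 for d in linked if d["recalls"])
print(devices_with_events, devices_with_recalls)
```

Keeping devices with no matching records in the output (with empty lists) matters for this kind of analysis, because the denominator for figures such as "36 of 691 devices" is the full cleared cohort, not only the devices that appear in postmarket databases.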

My Perspective

These findings underscore a critical disconnect between the pace of AI innovation and the evolution of regulatory science. The fact that 96.7% of these complex software devices were cleared through the 510(k) pathway, which relies on demonstrating “substantial equivalence” to a previously cleared device, is concerning. This pathway was not designed for adaptive, data-driven algorithms whose performance can vary significantly across different populations or degrade over time. I believe this creates a form of regulatory debt, where potential risks accumulate as new devices are benchmarked against predecessors that may also lack robust clinical validation. The near-total absence of demographic data (95.5% of devices) is particularly troubling, as it suggests we are deploying tools without knowing if they are safe and effective for diverse patient groups, potentially worsening health disparities.

Potential Clinical / Research Applications

Clinically, these results should prompt healthcare organizations to conduct more thorough due diligence before adopting AI-enabled devices. Administrators and clinicians should press manufacturers for transparent data on performance, safety, and the demographic makeup of validation cohorts. For researchers, this study highlights an urgent need to develop and validate standardized reporting guidelines and evaluation frameworks specifically tailored for AI/ML medical technologies. Future research should focus on real-world performance monitoring of these FDA-cleared devices to assess for performance drift and identify adverse events not captured by current voluntary reporting systems. This could involve creating automated surveillance systems linked to electronic health records to provide a more dynamic and complete picture of device safety and efficacy in practice.