A study of 691 FDA-cleared AI devices found that reporting on efficacy, safety, and bias is inadequate, urging stronger regulatory oversight.

Original Title: Benefit-Risk Reporting for FDA-Cleared Artificial Intelligence-Enabled Medical Devices

Journal: JAMA Health Forum

DOI: 10.1001/jamahealthforum.2025.3351

Title

FDA Reporting for AI Medical Devices

One-Sentence Summary

An analysis of 691 FDA-cleared AI medical devices reveals significant gaps in reporting on efficacy, safety, and bias, highlighting a need for improved regulatory oversight.

Overview

This study investigated the comprehensiveness of benefit-risk reporting for artificial intelligence (AI) and machine learning (ML) medical devices cleared by the US Food and Drug Administration (FDA). Researchers conducted a cross-sectional analysis of all 691 AI/ML devices cleared from September 1995 to July 2023. The findings reveal substantial gaps in the information provided to regulators and the public. For instance, device summaries frequently failed to report fundamental details such as study design (46.7%), training sample size (53.3%), and patient demographic information (95.5%). The quality of evidence supporting clearance was often limited; only 1.6% of devices presented data from randomized clinical trials, and very few reported on patient outcomes (<1%). Postmarket surveillance data was also examined, identifying 489 adverse events associated with 36 devices (5.2%), which included 30 injuries and one death. Furthermore, 40 devices (5.8%) were recalled a total of 113 times, with most recalls (75.2%) attributed to software issues. The study concludes that current reporting standards for AI/ML devices are inadequate, suggesting a need for more rigorous regulatory pathways to ensure patient safety.
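The device-level percentages above follow directly from the reported counts and the cohort size of 691. A minimal Python check, using only figures quoted in this summary (the helper name `share_of_cohort` is our own label, not something from the study):

```python
# Cohort size as quoted in the overview above.
TOTAL_DEVICES = 691


def share_of_cohort(count: int, total: int = TOTAL_DEVICES) -> float:
    """Return count as a percentage of the cohort, rounded to one decimal."""
    return round(100 * count / total, 1)


# Counts as quoted in the overview above.
devices_with_adverse_events = 36
devices_recalled = 40

print(share_of_cohort(devices_with_adverse_events))  # 5.2
print(share_of_cohort(devices_recalled))             # 5.8
```

Both results match the percentages reported in the summary (5.2% and 5.8%).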

Novelty

The study’s contribution lies in its comprehensive scope and integrated analysis. It is the first to examine the entire cohort of 691 AI/ML devices cleared by the FDA through mid-2023. Unlike previous work, this research connects premarket clearance data with postmarket surveillance information by linking FDA decision summaries to databases for adverse events (MAUDE) and recalls. This approach provides a holistic view of a device’s lifecycle, from its initial clearance to its performance in real-world settings. Additionally, the paper introduces a temporal analysis by comparing devices cleared before and after 2021, when the FDA issued new recommendations. This comparison shows whether reporting practices have improved over time, and the results are mixed: some areas, such as bias reporting, improved, while others, such as safety assessments, did not.
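To make the linkage idea concrete, here is a rough sketch of joining premarket records to postmarket reports on a shared identifier. It is an illustration under stated assumptions, not the authors' method: the summary does not say which key the study used, and the `submission_id` field, the device names, and every record below are hypothetical.

```python
from collections import Counter

# Hypothetical records; identifiers and counts are made up for illustration.
cleared_devices = [
    {"submission_id": "K000001", "name": "Device A"},
    {"submission_id": "K000002", "name": "Device B"},
    {"submission_id": "K000003", "name": "Device C"},
]
adverse_event_reports = ["K000001", "K000001", "K000003"]  # one entry per report
recall_records = ["K000003"]  # one entry per recall


def link_lifecycle(devices, adverse_events, recalls):
    """Attach postmarket counts to each premarket record by shared ID."""
    event_counts = Counter(adverse_events)
    recall_counts = Counter(recalls)
    return [
        {
            **device,
            "adverse_events": event_counts[device["submission_id"]],
            "recalls": recall_counts[device["submission_id"]],
        }
        for device in devices
    ]


linked = link_lifecycle(cleared_devices, adverse_event_reports, recall_records)
for row in linked:
    print(row)
```

Devices with no matching postmarket records get a count of zero rather than being dropped, which keeps the denominator (all cleared devices) intact when computing rates.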

My Perspective

These findings underscore a critical disconnect between the pace of AI innovation and the evolution of regulatory science. The fact that 96.7% of these complex software devices were cleared through the 510(k) pathway, which relies on demonstrating “substantial equivalence” to a previously cleared device, is concerning. This pathway was not designed for adaptive, data-driven algorithms whose performance can vary significantly across different populations or degrade over time. I believe this creates a form of regulatory debt, where potential risks accumulate as new devices are benchmarked against predecessors that may also lack robust clinical validation. The near-total absence of demographic data (95.5% of devices) is particularly troubling, as it suggests we are deploying tools without knowing if they are safe and effective for diverse patient groups, potentially worsening health disparities.

Potential Clinical / Research Applications

Clinically, these results should prompt healthcare organizations to conduct more thorough due diligence before adopting AI-enabled devices. Administrators and clinicians should press manufacturers for transparent data on performance, safety, and the demographic makeup of validation cohorts. For researchers, this study highlights an urgent need to develop and validate standardized reporting guidelines and evaluation frameworks specifically tailored for AI/ML medical technologies. Future research should focus on real-world performance monitoring of these FDA-cleared devices to assess for performance drift and identify adverse events not captured by current voluntary reporting systems. This could involve creating automated surveillance systems linked to electronic health records to provide a more dynamic and complete picture of device safety and efficacy in practice.
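As one possible shape for such monitoring, the sketch below tracks a rolling accuracy over recent cases and flags when it falls below a baseline by more than a chosen margin. Everything here is an assumption made for illustration: the `DriftMonitor` class, the window size, the tolerance, and the sample data are hypothetical, and a real system would need clinically confirmed labels and a validated metric.

```python
from collections import deque


class DriftMonitor:
    """Flag when rolling accuracy falls below a baseline by a set margin."""

    def __init__(self, baseline_accuracy, window=100, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        # Keep only the most recent `window` outcomes.
        self.outcomes = deque(maxlen=window)

    def record(self, prediction, confirmed_label):
        """Store whether the device output matched the confirmed label."""
        self.outcomes.append(prediction == confirmed_label)

    def rolling_accuracy(self):
        """Return accuracy over the current window, or None if empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drifted(self):
        """Return True only once the window is full and accuracy has dropped."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance


# Hypothetical usage: 7 of the last 10 outputs match the confirmed label.
monitor = DriftMonitor(baseline_accuracy=0.90, window=10, tolerance=0.05)
for prediction, label in [(1, 1)] * 7 + [(1, 0)] * 3:
    monitor.record(prediction, label)

print(monitor.rolling_accuracy())
print(monitor.drifted())
```

Waiting for a full window before flagging avoids alerts driven by a handful of early cases; the trade-off is slower detection when case volume is low.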

