AI Model Variability in EGFR Prediction by Ancestry

Original Title: Ancestry-Associated Performance Variability of Open-Source AI Models for EGFR Prediction in Lung Cancer

Journal: JAMA Oncology

DOI: 10.1001/jamaoncol.2025.6430

Overview

This study evaluates the performance and generalizability of two open-source artificial intelligence (AI) models, EAGLE and DeepGEM, for predicting EGFR mutation status in lung adenocarcinoma from routine hematoxylin-eosin pathology slides. Researchers analyzed 2098 patients across two independent cohorts: one from the Dana-Farber Cancer Institute and one from the European TNM-I trial. The primary objective was to determine whether these AI tools maintain accuracy across ancestral backgrounds and anatomical sample types. EAGLE achieved an area under the receiver operating characteristic curve (AUC) of 0.83 in the first cohort and 0.81 in the second, outperforming DeepGEM, which recorded AUCs of 0.68 and 0.75, respectively. However, EAGLE's AUC dropped to 0.68 in Asian patients, compared with 0.84 in European patients. Performance was also lower in pleural samples (AUC 0.66) than in lung specimens (AUC 0.86).
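To make the subgroup comparison concrete, here is a minimal Python sketch of how an AUC can be computed separately for each ancestry group or sample type. It uses the pairwise (Mann-Whitney) definition of AUC. The function names `auc` and `auc_by_group` and the toy values are illustrative assumptions; they are not the study's code or data.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a mutation-positive case (label 1)
    scores higher than a mutation-negative case (label 0); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(
    labels: Sequence[int],
    scores: Sequence[float],
    groups: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Compute AUC separately within each subgroup (e.g. ancestry or specimen site)."""
    by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    return {
        g: auc([labels[i] for i in idx], [scores[i] for i in idx])
        for g, idx in by_group.items()
    }


# Toy example with made-up values (not study data).
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.1, 0.4, 0.6, 0.8, 0.2]
toy_groups = ["A", "A", "A", "A", "B", "B"]
toy_result = auc_by_group(toy_labels, toy_scores, toy_groups)
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch; a rank-based implementation would be preferable for large cohorts.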

Novelty

The study provides an independent validation of open-source AI pathology models across diverse populations and anatomical sites. Unlike previous work focused on single-center datasets, it uses a multicohort design to assess the impact of genetic ancestry on model reliability. A distinct contribution is the identification of a performance gap in Asian populations, despite the higher prevalence of EGFR mutations in this group. The study also highlights differences between model architectures, showing that fine-tuned foundation models generalize better than frozen feature-based approaches. Additionally, it implements a dual-threshold triage strategy that reduced the need for rapid molecular testing by 57% while maintaining a sensitivity of 0.84 and a specificity of 0.99, showing how AI can optimize resource allocation without compromising diagnostic standards.
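The dual-threshold idea can be sketched in a few lines of Python. This sketch assumes that scores below a lower cutoff are treated as likely mutation-negative, scores above an upper cutoff as likely mutation-positive, and everything in between is referred for rapid molecular testing. The cutoffs, labels, and function names are hypothetical; the summary does not report the study's actual thresholds or implementation.

```python
from typing import List, Sequence


def triage(scores: Sequence[float], low: float, high: float) -> List[str]:
    """Assign each model score to one of three triage bins.

    low and high are hypothetical cutoffs chosen for illustration only.
    """
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("expected 0 <= low <= high <= 1")
    decisions = []
    for s in scores:
        if s < low:
            decisions.append("rule_out")    # likely EGFR-negative, no rapid test
        elif s > high:
            decisions.append("rule_in")     # likely EGFR-positive, no rapid test
        else:
            decisions.append("rapid_test")  # uncertain, send for rapid testing
    return decisions


def rapid_test_reduction(decisions: Sequence[str]) -> float:
    """Fraction of cases that avoid rapid testing under the triage rule."""
    if not decisions:
        raise ValueError("no decisions to summarize")
    tested = sum(1 for d in decisions if d == "rapid_test")
    return 1.0 - tested / len(decisions)


# Toy example with made-up scores and cutoffs.
toy_decisions = triage([0.05, 0.50, 0.95, 0.10], low=0.2, high=0.8)
toy_reduction = rapid_test_reduction(toy_decisions)
```

Widening the gap between the two cutoffs sends more cases to rapid testing and makes the rule-in and rule-out calls more conservative; narrowing it saves more tests at the cost of more AI-only calls.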

Potential Clinical / Research Applications

These AI models can be implemented as preliminary screening tools to prioritize patients for rapid molecular testing, especially in resource-limited settings where genomic sequencing is delayed. The dual-threshold triage system provides a framework for identifying cases with a high probability of being mutation-negative, allowing clinicians to start alternative therapies sooner. In research, these tools can screen large retrospective pathology archives to identify cohorts with specific genomic alterations for epidemiological studies. The findings on ancestry-related variability underscore the need to develop site-specific recalibration protocols. Clinical laboratories should use the reported performance metrics to establish internal validation standards, ensuring that the AI adjunct provides reliable results across their own patient demographics and biopsy types.
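As an illustration of what an internal validation check might look like, the Python sketch below computes sensitivity and specificity from a laboratory's own labeled cases and compares them with the values reported in the study (0.84 and 0.99). The tolerance `margin` and the function names are hypothetical assumptions; an acceptable margin is a local policy decision and is not specified by the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity and specificity for binary calls (1 = EGFR-mutant, 0 = wild type)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def meets_benchmark(value: float, reported: float, margin: float) -> bool:
    """True if a local metric is within a hypothetical margin of the reported value."""
    return value >= reported - margin


# Toy local validation set with made-up calls (not study data).
toy_truth = [1, 1, 1, 0, 0, 0, 0]
toy_calls = [1, 1, 0, 0, 0, 0, 1]
toy_sens, toy_spec = sensitivity_specificity(toy_truth, toy_calls)

# Values reported in the study for the triage workflow.
REPORTED_SENSITIVITY = 0.84
REPORTED_SPECIFICITY = 0.99
toy_ok = (
    meets_benchmark(toy_sens, REPORTED_SENSITIVITY, margin=0.05)
    and meets_benchmark(toy_spec, REPORTED_SPECIFICITY, margin=0.05)
)
```

Running the same check separately for each ancestry group and biopsy type would show whether a laboratory's local population shows the kind of variability the study describes.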

