Robust CRC Diagnosis via Causal and Uncertainty-Aware AI

Original Title: Uncertainty-aware and causal test-time adaptive foundation model for robust colorectal cancer pathology diagnosis

Journal: npj Digital Medicine

DOI: 10.1038/s41746-025-02149-1

Overview

Colorectal cancer remains a major global health challenge, and effective treatment depends on precise histopathological analysis. Computational pathology has advanced with large-scale foundation models, but these systems often falter when deployed in real-world clinical settings. Two issues stand out: domain shift caused by variation in staining protocols and scanner hardware, and a tendency for models to make overconfident yet incorrect predictions. This paper introduces UAD-FM, an uncertainty-aware and causally adaptive foundation model designed to address both. The framework combines a variational Bayesian approach, which decomposes uncertainty into epistemic (model-related) and aleatoric (data-related) components, with a causal test-time adaptation mechanism. Evaluating the model on five datasets, including TCGA-COAD/READ and DigestPath 2019, the researchers report that UAD-FM maintains its performance and reliability when faced with unfamiliar data distributions. The aim is to narrow the gap between experimental AI performance and the requirements of clinical diagnostic environments.
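
To make the epistemic/aleatoric split concrete, here is a minimal sketch of one widely used entropy-based decomposition. It is not taken from the paper: the function names and the assumption that the model yields several sampled class-probability vectors per input (for example, one per draw from an approximate posterior) are illustrative only, and UAD-FM's variational head may compute these quantities differently.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose_uncertainty(sampled_probs):
    """Split predictive uncertainty for a single input.

    sampled_probs: list of class-probability vectors, one per stochastic
    forward pass (hypothetical input format, assumed for this sketch).
    Returns (total, aleatoric, epistemic), all in nats.
    """
    n = len(sampled_probs)
    num_classes = len(sampled_probs[0])
    # Average the sampled predictions class by class.
    mean_probs = [
        sum(sample[c] for sample in sampled_probs) / n
        for c in range(num_classes)
    ]
    # Total uncertainty: entropy of the averaged prediction.
    total = entropy(mean_probs)
    # Aleatoric part: average entropy of the individual predictions.
    aleatoric = sum(entropy(sample) for sample in sampled_probs) / n
    # Epistemic part: what remains, i.e. disagreement between samples.
    epistemic = total - aleatoric
    return total, aleatoric, epistemic
```

Under this decomposition, samples that all return the same ambiguous prediction give a high aleatoric value and zero epistemic value, while samples that are individually confident but disagree with one another give the opposite pattern.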

Novelty

The technical contribution of UAD-FM lies in combining three methods within a single foundation model architecture. First, it employs a variational uncertainty decomposition head that distinguishes model-related (epistemic) uncertainty from inherent data noise (aleatoric uncertainty). Second, it introduces causal test-time adaptation, using do-interventions to separate biological content from non-causal style variables such as staining artifacts. This lets the model ignore spurious correlations that often mislead standard deep learning systems. Third, the framework applies post-hoc clinical calibration to align prediction confidence with empirical accuracy. Quantitatively, UAD-FM achieves an AUROC of 0.945 on the TCGA dataset, outperforming established models such as UNI and Virchow2, which achieved 0.923 and 0.931 respectively. It also reduces the Expected Calibration Error (ECE) to 0.031, about 65% lower than the 0.089 observed with traditional Monte Carlo dropout. Together, these components make the model accurate while also giving a reliable measure of its own limitations.
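
For readers unfamiliar with Expected Calibration Error, the sketch below shows a common way to compute it: predictions are grouped into equal-width confidence bins, and the gap between average confidence and accuracy in each bin is averaged, weighted by bin size. The function name, the choice of 10 bins, and the bin-edge convention are assumptions made for illustration; the paper's exact evaluation protocol is not described in this summary.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE.

    confidences: predicted confidence per case, each in [0, 1].
    correct: matching booleans, True where the prediction was right.
    n_bins: number of confidence bins (10 is a common default, assumed here).
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of cases.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece
```

A model that reports 90% confidence and is right 90% of the time scores close to zero; one that reports 90% confidence but is right only half the time scores about 0.4.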

Potential Clinical / Research Applications

The framework is well suited to clinical triage, where it could automatically flag the 10% most uncertain cases for expert review. In simulations, this strategy improved diagnostic accuracy from 0.881 to 0.907 and reduced high-confidence errors by 32%. In multi-institutional research networks, UAD-FM could make it easier to pool data from centers with different scanning technologies without extensive manual normalization. The model also provides fine-grained gland segmentation, achieving a Dice score of 0.872, which is useful for automated grading and prognosis modeling. In addition, the interpretable uncertainty maps it generates can serve as educational tools for pathology trainees by highlighting regions of diagnostic ambiguity that need closer inspection. By reporting a transparent measure of confidence, the model supports a collaborative workflow in which AI handles routine cases and humans focus on complex, high-uncertainty samples.
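
The triage strategy described above can be sketched in a few lines. This is an illustrative outline, not the authors' simulation: the function names are hypothetical, and the second helper assumes that an expert resolves every flagged case correctly, which is an idealization.

```python
def flag_for_review(uncertainties, fraction=0.10):
    """Return indices of the most uncertain cases, most uncertain first.

    The number of flagged cases is fraction * N rounded to the nearest
    integer, with at least one case flagged.
    """
    n_flag = max(1, int(round(fraction * len(uncertainties))))
    ranked = sorted(
        range(len(uncertainties)),
        key=lambda i: uncertainties[i],
        reverse=True,
    )
    return ranked[:n_flag]


def accuracy_after_review(predictions, labels, flagged):
    """Accuracy when flagged cases are assumed to be corrected by an expert.

    Assumption for this sketch: expert review always yields the true label.
    """
    flagged = set(flagged)
    correct = sum(
        1
        for i, (pred, label) in enumerate(zip(predictions, labels))
        if i in flagged or pred == label
    )
    return correct / len(labels)
```

Accuracy only improves under this scheme when errors are concentrated among the flagged cases, which is why well-calibrated uncertainty matters for triage.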
