Interpretable Deep Learning for Gastric Cancer T Staging

Original Title: Interpretable deep learning for multicenter gastric cancer T staging from CT images

Journal: npj Digital Medicine

DOI: 10.1038/s41746-025-02002-5

Overview

Gastric cancer remains a significant global health challenge, requiring precise preoperative T staging to determine the appropriate therapeutic strategy, such as neoadjuvant chemotherapy or direct surgical intervention. Standard contrast-enhanced computed tomography (CT) is the primary tool for this evaluation, yet its accuracy often ranges between 65% and 75% due to subjective interpretation and the difficulty of identifying subtle serosal invasion. This study introduces GTRNet, an automated deep-learning framework designed to classify gastric cancer into four T stages from routine portal venous phase CT images. Developed using a retrospective multicenter dataset of 1,792 patients, the system utilizes a modified ResNet-152 backbone to analyze the largest axial tumor cross-section. In internal testing, the model achieved an accuracy of 89.9% and an area under the curve (AUC) of 0.97. External validation across two independent cohorts demonstrated consistent performance, with accuracies between 87% and 94% and AUC values ranging from 0.91 to 0.95. Compared to expert radiologists, who achieved independent accuracies of 55.3% to 59.7%, GTRNet showed superior discrimination and consistency.
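Because GTRNet operates on the single axial slice carrying the largest tumor cross-section, the preprocessing reduces to picking the slice with the most tumor area. A minimal sketch of that selection step, assuming a binary 3D tumor mask as input (the paper's exact preprocessing details are not reproduced here, so names and shapes are illustrative):

```python
import numpy as np

def largest_axial_slice(mask_3d: np.ndarray) -> int:
    """Return the index of the axial slice with the largest tumor cross-section.

    mask_3d: binary array of shape (slices, height, width), 1 = tumor voxel.
    """
    # Tumor area per axial slice = count of foreground voxels in that slice.
    areas = mask_3d.reshape(mask_3d.shape[0], -1).sum(axis=1)
    return int(np.argmax(areas))

# Toy volume: slice 1 carries the biggest cross-section.
vol = np.zeros((3, 4, 4), dtype=np.uint8)
vol[0, 1:2, 1:2] = 1   # area 1
vol[1, 0:3, 0:3] = 1   # area 9
vol[2, 2:4, 2:4] = 1   # area 4
print(largest_axial_slice(vol))  # → 1
```

The selected slice would then be fed to the classification backbone; working from one representative 2D section rather than the full volume is what keeps the pipeline free of volumetric segmentation.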

Novelty

The framework distinguishes itself by implementing an end-to-end pipeline that eliminates the need for time-consuming manual tumor segmentation or annotation, which are common bottlenecks in clinical AI applications. While previous research often focused on binary classifications like early versus advanced stages, GTRNet provides a complete four-category T-staging output. The architecture incorporates parallel max-pooling and center-cropping streams to capture both local tumor details and broader contextual information of the gastric wall. Furthermore, the researchers developed a comprehensive nomogram by integrating a deep-learning-derived Rad-score with clinical variables, including tumor size, differentiation status, and Lauren classification. This multimodal approach significantly improved model fit and clinical utility. To address the opaque nature of neural networks, Gradient-weighted Class Activation Mapping was utilized to visualize model attention. These heatmaps showed a high degree of spatial overlap with expert-annotated regions, specifically targeting the mucosa in T1 lesions and the organ interface in T4 cases, with Dice similarity coefficients ranging from 0.56 to 0.63.
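The spatial-overlap figures quoted above come from the Dice similarity coefficient, which compares a binarized Grad-CAM heatmap against the expert-annotated region. A minimal sketch of that comparison (the threshold value and array names are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum() + eps)

# Binarize a (hypothetical) Grad-CAM heatmap at a fixed threshold,
# then compare it with an expert-annotated mask.
heatmap = np.array([[0.1, 0.8],
                    [0.7, 0.2]])
expert  = np.array([[0, 1],
                    [1, 1]])
attention = heatmap > 0.5
print(round(dice_coefficient(attention, expert), 2))  # → 0.8
```

A Dice value of 1.0 would mean the model's attention exactly matches the annotated region; the reported 0.56–0.63 indicates substantial but imperfect overlap.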

Potential Clinical / Research Applications

This technology has direct implications for refining neoadjuvant therapy selection. By accurately identifying T3 and T4 cases, the system can ensure that patients who require preoperative chemotherapy receive it, while sparing T1 and T2 patients from unnecessary toxicity. Decision curve analysis indicated a higher net benefit for the AI model compared to endoscopic ultrasound, showing lower over-treatment (2.09% vs. 12.97%) and under-treatment (2.51% vs. 17.57%) rates. In research settings, the automated nature of GTRNet allows for the rapid processing of large-scale imaging datasets in retrospective studies or clinical trials. Additionally, the interpretable heatmaps can serve as an educational resource for junior radiologists, helping them recognize the subtle radiological signs of serosal invasion and transmural spread. The framework could eventually be expanded into a unified system covering the entire TNM staging protocol, offering a more comprehensive auxiliary diagnostic tool for gastric cancer management.
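The over- and under-treatment rates cited above follow from mapping the four-class T-stage prediction onto a binary treatment decision (neoadjuvant therapy for T3/T4 only). A minimal sketch of that tabulation on toy labels (the dichotomization convention is an assumption based on the text, not the paper's exact analysis code):

```python
def treatment_error_rates(true_stages, pred_stages):
    """Over-treatment: T1/T2 predicted as T3/T4 (needless preoperative chemo).
    Under-treatment: T3/T4 predicted as T1/T2 (missed preoperative chemo)."""
    advanced = {"T3", "T4"}
    n = len(true_stages)
    over = sum(t not in advanced and p in advanced
               for t, p in zip(true_stages, pred_stages))
    under = sum(t in advanced and p not in advanced
                for t, p in zip(true_stages, pred_stages))
    return over / n, under / n

truth = ["T1", "T2", "T3", "T4", "T2"]
preds = ["T1", "T3", "T3", "T4", "T2"]  # one T2 over-staged as T3
over, under = treatment_error_rates(truth, preds)
print(over, under)  # → 0.2 0.0
```

In decision curve analysis these misclassification rates translate into net benefit at a chosen threshold probability, which is how the comparison against endoscopic ultrasound was framed.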

