Interpretable Deep Learning for Gastric Cancer T Staging

Original Title: Interpretable deep learning for multicenter gastric cancer T staging from CT images

Journal: npj Digital Medicine

DOI: 10.1038/s41746-025-02002-5

Overview

Gastric cancer remains a significant global health challenge, and precise preoperative T staging is needed to choose between therapeutic strategies such as neoadjuvant chemotherapy and direct surgical intervention. Standard contrast-enhanced computed tomography (CT) is the primary tool for this evaluation, yet its accuracy often ranges between 65% and 75% because interpretation is subjective and subtle serosal invasion is difficult to identify. This study introduces GTRNet, an automated deep-learning framework that classifies gastric cancer into four T stages from routine portal venous phase images. Developed on a retrospective multicenter dataset of 1,792 patients, the system uses a modified ResNet-152 backbone to analyze the largest axial tumor cross-section. In internal testing, the model achieved an accuracy of 89.9% and an area under the curve (AUC) of 0.97. External validation in two independent cohorts showed consistent performance, with accuracies between 87% and 94% and AUC values from 0.91 to 0.95. Compared with expert radiologists, who achieved independent accuracies of 55.3% to 59.7%, GTRNet showed superior discrimination and consistency.
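The two headline metrics above, accuracy and AUC, can be illustrated for a four-class problem with a small sketch. This is not the study's evaluation code: the function names, the toy labels and probabilities, and the choice of a macro-averaged one-vs-rest AUC are all assumptions made for illustration, since the summary does not say how the multiclass AUC was aggregated.

```python
# Hedged sketch: accuracy and macro one-vs-rest AUC for a four-class T-stage
# classifier. All names and numbers are illustrative, not from the study.

STAGES = ("T1", "T2", "T3", "T4")


def predict(probs):
    """Pick the index of the highest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted stage equals the true stage."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, probs):
    """Average the one-vs-rest AUC over the four stages (assumed aggregation)."""
    aucs = []
    for k in range(len(STAGES)):
        labels = [1 if t == k else 0 for t in y_true]
        scores = [row[k] for row in probs]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Toy data: true stage indices (0 = T1 ... 3 = T4) and predicted probabilities.
y_true = [0, 1, 2, 3, 0, 1, 2, 3]
probs = [
    [0.70, 0.20, 0.05, 0.05],
    [0.20, 0.60, 0.10, 0.10],
    [0.10, 0.20, 0.60, 0.10],
    [0.05, 0.05, 0.20, 0.70],
    [0.50, 0.30, 0.10, 0.10],
    [0.10, 0.30, 0.50, 0.10],  # a T2 case misread as T3
    [0.10, 0.10, 0.50, 0.30],
    [0.10, 0.10, 0.30, 0.50],
]

y_pred = predict(probs)
print("accuracy:", accuracy(y_true, y_pred))
print("macro one-vs-rest AUC:", round(macro_ovr_auc(y_true, probs), 4))
```

Accuracy only checks the single predicted stage, whereas AUC uses the full probability ranking, which is why a model can report both a high accuracy and a separate, often higher, AUC.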

Novelty

The framework distinguishes itself by implementing an end-to-end pipeline that eliminates the need for time-consuming manual tumor segmentation or annotation, both common bottlenecks in clinical AI applications. While previous research often focused on binary classifications such as early versus advanced stage, GTRNet provides a complete four-category T-staging output. The architecture incorporates parallel max-pooling and center-cropping streams to capture both local tumor details and broader contextual information about the gastric wall. Furthermore, the researchers developed a nomogram that integrates a deep-learning-derived Rad-score with clinical variables, including tumor size, differentiation status, and Lauren classification. This multimodal approach significantly improved model fit and clinical utility. To address the opaque nature of neural networks, Gradient-weighted Class Activation Mapping (Grad-CAM) was used to visualize model attention. The resulting heatmaps overlapped spatially with expert-annotated regions, with Dice similarity coefficients ranging from 0.56 to 0.63, and focused on the mucosa in T1 lesions and the organ interface in T4 cases.
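The Dice similarity coefficient mentioned above measures overlap as twice the number of shared pixels divided by the total number of pixels in the two masks. The sketch below shows that calculation on a toy 4x4 example. The function names, the grid values, and the 0.5 threshold used to turn a heatmap into a binary mask are assumptions for illustration; the summary does not state how the study binarized its heatmaps.

```python
# Hedged sketch: Dice overlap between a thresholded attention heatmap and an
# expert-annotated mask. Names, values, and the threshold are illustrative.


def binarize(heatmap, threshold=0.5):
    """Turn a heatmap with values in [0, 1] into a 0/1 mask (assumed cutoff)."""
    return [[1 if v >= threshold else 0 for v in row] for row in heatmap]


def dice(mask_a, mask_b):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two equally sized 0/1 masks."""
    intersection = sum(
        a & b for row_a, row_b in zip(mask_a, mask_b) for a, b in zip(row_a, row_b)
    )
    total = sum(map(sum, mask_a)) + sum(map(sum, mask_b))
    if total == 0:
        return 1.0  # two empty masks are treated as identical
    return 2 * intersection / total


# Toy heatmap (for example, normalized attention) and toy expert annotation.
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.8, 0.7, 0.1],
    [0.1, 0.9, 0.6, 0.2],
    [0.0, 0.3, 0.2, 0.1],
]
expert_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

attention_mask = binarize(heatmap)
print("Dice:", dice(attention_mask, expert_mask))
```

A Dice value of 1 means the two regions coincide exactly and 0 means they do not touch, so the reported range of 0.56 to 0.63 indicates partial rather than complete agreement with the expert regions.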

Potential Clinical / Research Applications

This technology has direct implications for refining neoadjuvant therapy selection. By accurately identifying T3 and T4 cases, the system can help ensure that patients who require preoperative chemotherapy receive it, while sparing T1 and T2 patients from unnecessary toxicity. Decision curve analysis indicated a higher net benefit for the AI model than for endoscopic ultrasound, with lower over-treatment (2.09% vs. 12.97%) and under-treatment (2.51% vs. 17.57%) rates. In research settings, the automated nature of GTRNet allows rapid processing of large-scale imaging datasets in retrospective studies or clinical trials. Additionally, the interpretable heatmaps can serve as an educational resource for junior radiologists, helping them recognize the subtle radiological signs of serosal invasion and transmural spread. The framework could eventually be expanded into a unified system covering the entire TNM staging protocol, offering a more comprehensive auxiliary diagnostic tool for gastric cancer management.
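To make the decision-curve terms above concrete, the sketch below computes over-treatment and under-treatment rates and the standard net benefit, TP/N - (FP/N) * pt/(1 - pt), where pt is the threshold probability at which treatment is considered worthwhile. The definitions used here are assumptions, since the summary does not give the study's exact ones: a case counts as over-treated when a true T1 or T2 tumor is staged as T3 or T4, and as under-treated in the opposite case, with both rates taken over all patients. The labels are toy data, not study data.

```python
# Hedged sketch: over-/under-treatment rates and decision-curve net benefit for
# a "T3/T4 receives neoadjuvant therapy" rule. Definitions and data are assumed.

ADVANCED = (2, 3)  # stage indices for T3 and T4 (0 = T1 ... 3 = T4)


def treatment_errors(y_true, y_pred, advanced=ADVANCED):
    """Return (over_treatment_rate, under_treatment_rate) over all patients."""
    n = len(y_true)
    over = sum(t not in advanced and p in advanced for t, p in zip(y_true, y_pred))
    under = sum(t in advanced and p not in advanced for t, p in zip(y_true, y_pred))
    return over / n, under / n


def net_benefit(y_true, y_pred, threshold, advanced=ADVANCED):
    """Net benefit = TP/N - (FP/N) * threshold / (1 - threshold)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(t in advanced and p in advanced for t, p in zip(y_true, y_pred))
    fp = sum(t not in advanced and p in advanced for t, p in zip(y_true, y_pred))
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy cohort of ten patients: true and predicted stage indices.
y_true = [0, 1, 2, 3, 2, 3, 0, 1, 2, 3]
y_pred = [0, 1, 2, 3, 1, 3, 0, 2, 2, 3]

over_rate, under_rate = treatment_errors(y_true, y_pred)
print("over-treatment rate:", over_rate)
print("under-treatment rate:", under_rate)
print("net benefit at pt = 0.2:", net_benefit(y_true, y_pred, threshold=0.2))
```

Plotting the net benefit across a range of threshold probabilities for each staging method yields the decision curves that such an analysis compares; the method whose curve lies higher offers more benefit at that threshold.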

