Assessing ChatGPT in Diagnosing Degenerative Diseases

Original Title: Clinical Manifestations

Journal: Alzheimer's & dementia : the journal of the Alzheimer's Association

DOI: 10.1002/alz70857_101996

Overview

This study evaluates the clinical performance of ChatGPT version 3.5 in diagnosing neurodegenerative diseases. Building on previous research in which the model achieved a 45.1% accuracy rate on neurology residency exams, this investigation uses nine case reports from the journal Dementia and Neurocognitive Disorders. The methodology involved a two-stage interaction to simulate the diagnostic process. First, the model received patient symptoms, medical histories, and physical findings and was asked to generate differential diagnoses and suggest diagnostic procedures. Second, it was given the specific laboratory and imaging results and asked for a final diagnosis. This approach assesses how the model processes incremental clinical information. The model included the correct diagnosis in its initial differential list for only 33.3% of cases (three of nine). However, it identified appropriate diagnostic methods in 88.9% of cases (eight of nine).
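The two-stage interaction described above can be sketched as follows. This is a minimal illustration, not the authors' protocol: the `CaseReport` fields, the function names, and the prompt wording are all assumptions made for the example, and no model is actually called.

```python
from dataclasses import dataclass


@dataclass
class CaseReport:
    """Hypothetical container for one case report (field names are assumed)."""
    symptoms: str
    history: str
    physical_findings: str
    test_results: str  # laboratory and imaging results, withheld until stage 2


def build_stage1_prompt(case: CaseReport) -> str:
    """Stage 1: clinical presentation only; ask for differentials and a workup."""
    return (
        f"Symptoms: {case.symptoms}\n"
        f"Medical history: {case.history}\n"
        f"Physical findings: {case.physical_findings}\n"
        "List the differential diagnoses and the diagnostic procedures "
        "you would suggest."
    )


def build_stage2_prompt(case: CaseReport) -> str:
    """Stage 2: add the objective test results and ask for a final diagnosis."""
    return (
        f"Laboratory and imaging results: {case.test_results}\n"
        "Given these results and the presentation above, "
        "what is the final diagnosis?"
    )


if __name__ == "__main__":
    # Placeholder text only; real content would come from a case report.
    case = CaseReport(
        symptoms="<symptoms from the report>",
        history="<medical history from the report>",
        physical_findings="<physical findings from the report>",
        test_results="<laboratory and imaging results from the report>",
    )
    print(build_stage1_prompt(case))
    print(build_stage2_prompt(case))
```

Keeping `test_results` out of the first prompt is the point of the design: the stage 1 answer then reflects only what a clinician would know before ordering tests.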

Novelty

This research moves from evaluating general knowledge through standardized testing to assessing clinical reasoning using peer-reviewed case reports. Unlike studies based on multiple-choice questions, this design requires the model to synthesize clinical descriptions and suggest logical steps in a workup. The study highlights a marked improvement once the model receives objective test results: final diagnostic accuracy rose to 77.8%, with the model identifying the disease in seven of nine cases after receiving laboratory and imaging data. This demonstrates the model's capacity to refine its output based on clinical evidence, reflecting the iterative nature of the medical diagnostic process. By focusing specifically on dementia, the research provides a specialized benchmark for performance in chronic conditions that present with complex, overlapping symptoms.
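The reported percentages follow directly from counts out of nine cases, as the short calculation below shows. The counts of eight and seven are stated in the summary; the count of three for the initial differential is inferred from the reported 33.3% and is an assumption of this sketch, as are the variable names.

```python
N_CASES = 9

# Correct outcomes per measure. Eight and seven are reported directly;
# three is inferred from the reported 33.3% (assumption).
counts = {
    "initial_differential": 3,
    "diagnostic_methods": 8,
    "final_diagnosis": 7,
}


def pct(correct: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


rates = {name: pct(k, N_CASES) for name, k in counts.items()}

# Gain from the initial differential to the final diagnosis, in percentage points.
gain = pct(counts["final_diagnosis"] - counts["initial_differential"], N_CASES)

print(rates)  # {'initial_differential': 33.3, 'diagnostic_methods': 88.9, 'final_diagnosis': 77.8}
print(gain)   # 44.4
```

With only nine cases, each individual case moves a rate by about 11.1 percentage points, so these figures are best read as indicative rather than precise.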

Potential Clinical / Research Applications

These findings suggest several applications in medical education and clinical support. The model could help students practice formulating diagnostic plans and selecting appropriate laboratory tests. Given that it recommended appropriate diagnostic methods in 88.9% of cases, it could serve as a digital checklist to support adherence to standard protocols. In primary care settings, it could assist practitioners in identifying necessary tests before making a specialist referral. Future research could scale this methodology to evaluate how artificial intelligence handles atypical dementia cases. By automating case history analysis, researchers could identify patterns of diagnostic error and use them to refine decision support systems. Used this way, the model could offer a consistent framework for processing clinical data in the management of degenerative diseases.

