LLMs for De-identifying Sensitive Health Information

Original Title: Leveraging large language models for the deidentification and temporal normalization of sensitive health information in electronic health records

Journal: npj Digital Medicine

DOI: 10.1038/s41746-025-01921-7

Overview

Sharing electronic health records (EHRs) for research is vital but requires removing sensitive health information (SHI) to protect patient privacy. This process, known as de-identification, is paired here with temporal normalization, which standardizes date and time expressions so that a coherent patient timeline is preserved. This paper evaluates how well large language models (LLMs) perform these two tasks. It presents a detailed analysis based on the SREDH/AI CUP 2023 competition, which challenged 291 teams to develop systems for SHI recognition and temporal normalization using a dataset of 3,244 pathology reports. The study systematically compares in-context learning, full model fine-tuning, and more efficient tuning methods across a range of LLM sizes to establish performance baselines and identify effective strategies.
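To make temporal normalization concrete, here is a minimal Python sketch that maps a few date spellings to ISO 8601. It is an illustration only, not the method used by the paper or by any competition team. The list of accepted formats, the day-first reading of numeric dates, and the function name normalize_date are all assumptions made for this example.

```python
from datetime import datetime
from typing import Optional

# Hypothetical set of input formats (day-first for numeric dates).
# Real pathology reports vary widely; this list is an assumption.
_FORMATS = (
    "%d/%m/%Y",
    "%d.%m.%Y",
    "%d-%m-%Y",
    "%d %B %Y",
    "%d %b %Y",
    "%Y-%m-%d",
)


def normalize_date(text: str) -> Optional[str]:
    """Return the date as an ISO 8601 string, or None if no format matches."""
    cleaned = text.strip()
    for fmt in _FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            # This format did not fit; try the next one.
            continue
    return None


if __name__ == "__main__":
    for raw in ["12/03/2015", "3 March 2015", "2015-03-12", "last Tuesday"]:
        print(raw, "->", normalize_date(raw))
```

A fixed format list like this only covers fully specified dates. Relative or partial expressions fall through to None, which is the kind of case where a contextual model would be needed.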

Novelty

The study provides a comprehensive performance analysis of the Pythia suite of LLMs, ranging from 70 million to 12 billion parameters, on de-identification tasks. A key finding is an "inverse scaling issue," where model performance plateaus or even degrades with fine-tuning on models larger than 6 billion parameters, likely due to overfitting on the moderately sized dataset. Optimal performance was observed with a 2.8 billion parameter model. The research systematically compares different training strategies, showing that parameter-efficient fine-tuning (LoRA) can outperform traditional full-parameter fine-tuning, especially for larger models. The analysis of the competition results, where 77.2% of teams used LLMs, reveals that the most successful systems were often hybrid models. These combined LLMs for contextual understanding with pattern-based rules for precision, particularly for structured information. The top-performing system for SHI recognition achieved a macro-F1 score of 0.881, while the best for temporal normalization scored 0.869.
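The hybrid idea can be sketched roughly as follows in Python. This is not any team's actual system: the labels, the regular expressions, and the merge policy (rule matches take precedence over overlapping model spans) are assumptions chosen for illustration, and the LLM output is represented by a hand-written list rather than a real model call.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str
    source: str  # "llm" or "rule"


# Hypothetical patterns for structured SHI; real rule sets would be broader.
RULES = {
    "PHONE": re.compile(r"\b\d{4} \d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def rule_spans(text: str) -> List[Span]:
    """Find structured SHI with regular expressions."""
    spans = []
    for label, pattern in RULES.items():
        for match in pattern.finditer(text):
            spans.append(Span(match.start(), match.end(), label, "rule"))
    return spans


def merge(llm_spans: List[Span], rule_based: List[Span]) -> List[Span]:
    """Keep every rule span; keep an LLM span only if it overlaps no rule span."""
    kept = list(rule_based)
    for span in llm_spans:
        overlaps = any(span.start < r.end and r.start < span.end for r in rule_based)
        if not overlaps:
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


if __name__ == "__main__":
    text = "Dr Smith saw the patient on 12/03/2015."
    # Stand-in for model predictions; the second span is deliberately truncated.
    llm_output = [Span(3, 8, "DOCTOR", "llm"), Span(28, 33, "DATE", "llm")]
    for span in merge(llm_output, rule_spans(text)):
        print(text[span.start:span.end], span.label, span.source)
```

In this sketch the model supplies the contextual entity (a name), while the pattern corrects the boundary of the structured one (a date), which mirrors the division of labor described above.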

Potential Clinical / Research Applications

The methods evaluated in this paper can directly facilitate the creation of large, safe, and high-quality datasets from clinical notes for research purposes. Automated and reliable de-identification tools can accelerate studies on disease patterns, treatment outcomes, and social determinants of health by making more data accessible without compromising patient confidentiality. Clinically, robust temporal normalization is critical for accurately reconstructing patient histories from unstructured text. This capability can enhance clinical decision support systems by providing a clear timeline of events, which is fundamental for diagnosis and treatment planning. The study's findings also provide practical guidance for healthcare institutions on selecting cost-effective AI tools, showing that smaller, fine-tuned models can outperform larger, more resource-intensive ones for this specific task. This research paves the way for developing more advanced, trustworthy AI pipelines for processing sensitive medical information.
