ChatGPT-5 Generates Clinically Relevant Psychiatric Cases—But Safety Framing Lags Behind
Artificial intelligence can generate clinically relevant psychiatric case vignettes for medical education, but substantial gaps in safety framing and protective factors require expert modification before classroom deployment.
Background
Clinical case vignettes are essential teaching tools in psychiatric education. This study evaluated whether ChatGPT-5 Pro could reliably generate educational vignettes depicting patient use of psychiatric chatbots across nine diagnostic conditions, assessing their educational quality and safety.
Key Findings
- Chatbot relevance and diagnostic sufficiency scored high (3.60/4 for both), with OCD vignettes strongest for relevance and major depressive disorder with suicidality strongest for diagnostic features.
- Safety and ethics scores were significantly lower (2.99/4, p < 0.001) than all other domains, with insufficient protective factors and help-seeking resources across many vignettes.
- Psychosis-related conditions (schizophrenia, schizoaffective disorder, bipolar psychosis) showed weaker diagnostic sufficiency (3.33–3.44/4) compared to anxiety and depression conditions.
- Interrater reliability was low (ICC < 0.30 across all domains), but adjacent agreement was high (92.6–100%): raters rarely matched exactly yet almost always scored within one point of each other.
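The gap between exact and adjacent agreement can be made concrete with a small sketch. The ratings below are hypothetical (not the study's data) and assume the 1–4 scale described above: two raters who never match exactly, yet always land within one point of each other.

```python
# Hypothetical ratings from two raters scoring ten vignettes on a 1-4 scale.
# These values are illustrative only, chosen so the raters always differ by
# exactly one point.
rater_a = [4, 3, 4, 3, 4, 2, 3, 4, 3, 4]
rater_b = [3, 4, 3, 4, 3, 3, 4, 3, 4, 3]

pairs = list(zip(rater_a, rater_b))

# Exact agreement: fraction of vignettes where both raters gave the same score.
exact = sum(a == b for a, b in pairs) / len(pairs)

# Adjacent agreement: fraction where the scores differ by at most one point.
adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)

print(f"exact agreement:    {exact:.0%}")     # 0%: the raters never match exactly
print(f"adjacent agreement: {adjacent:.0%}")  # 100%: always within one point
```

Because ICC penalizes these systematic one-point disagreements while adjacent agreement does not, a rubric can show low ICC and near-perfect adjacent agreement at the same time, as reported here.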
Why It Matters
The findings demonstrate AI’s potential to accelerate production of teaching cases, particularly for conditions where quality is already high. However, the critical safety gap is concerning: inadequate framing of protective factors or risk assessment could inadvertently model unsafe clinical reasoning to trainees in chatbot-involved psychiatric encounters.
Limitations
Only ChatGPT-5 Pro was evaluated across 27 total vignettes. Generalizability to other AI models, additional psychiatric conditions, and different educational contexts remains unknown. Low interrater reliability suggests the rating rubric may need standardization.
Original paper: Evaluation of artificial intelligence-generated vignettes depicting patient chatbot use in psychiatric contexts. npj Digital Medicine. doi:10.1038/s41746-026-02605-6