AI-Generated Psychiatric Teaching Cases: Promise Tempered by Safety Gaps

ChatGPT-5 Pro can generate clinically realistic psychiatric teaching cases, but safety concerns require expert review before classroom use.

Background

AI tools increasingly support medical education. This study evaluated whether ChatGPT-5 Pro could generate realistic psychiatric diagnostic vignettes depicting patient-chatbot interactions for teaching purposes.

Key Findings

  • Generated 27 vignettes across nine conditions, rated high in clinical relevance (3.60/4) and diagnostic sufficiency
  • Safety and ethics scores were significantly lower than the other domains (2.99 ± 0.51, p < 0.001); the vignettes avoided stigmatizing content but often lacked adequate risk assessment
  • OCD cases scored highest on chatbot relevance (4.0); psychosis conditions scored lower (3.33–3.44)
  • Interrater reliability was low (ICC < 0.30), yet adjacent-rating agreement was high (92.6–100%); the sketch below illustrates how the two can diverge
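
To make the last finding concrete, here is a minimal Python sketch of how a near-zero intraclass correlation can coexist with perfect adjacent agreement. The ratings are hypothetical, not the study's data, and the one-way random-effects ICC(1,1) used here is one common formulation; the paper may report a different ICC variant.

    import numpy as np

    # Hypothetical ratings (NOT the study's data): 8 vignettes, 2 raters,
    # 1-4 scale. Raters rarely match exactly but never differ by more than
    # one point, and the vignettes themselves vary little in mean quality.
    rater_a = np.array([3, 4, 3, 4, 2, 4, 3, 4], dtype=float)
    rater_b = np.array([3, 3, 4, 4, 3, 3, 4, 3], dtype=float)
    ratings = np.stack([rater_a, rater_b], axis=1)  # shape: (items, raters)

    n, k = ratings.shape
    item_means = ratings.mean(axis=1)
    grand_mean = ratings.mean()

    # One-way random-effects ICC(1,1) = (MSB - MSW) / (MSB + (k-1) * MSW)
    ms_between = k * np.sum((item_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((ratings - item_means[:, None]) ** 2) / (n * (k - 1))
    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

    # Adjacent agreement: share of items where raters differ by <= 1 point.
    adjacent = np.mean(np.abs(rater_a - rater_b) <= 1)

    print(f"ICC(1,1):           {icc:.2f}")       # ~0.02: "low" by any cutoff
    print(f"Adjacent agreement: {adjacent:.0%}")  # 100%

Because every rating falls within one point of its pair, adjacent agreement is perfect, while the small between-vignette variance drives the ICC toward zero: the same pattern the study reports.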

Why It Matters

AI can produce educationally valuable psychiatric cases, streamlining curriculum development. However, the safety gap is critical: expert faculty review is essential before classroom use, particularly for high-risk presentations such as psychosis and suicidality. Implementation should include structured debriefing to contextualize the cases for learners.

Limitations

The evaluation covered a single AI model and only 27 vignettes. Low global interrater reliability raises questions about the robustness of the ratings. The cross-sectional design did not assess learning outcomes.

Original paper: Evaluation of artificial intelligence-generated vignettes depicting patient chatbot use in psychiatric contexts. npj Digital Medicine. doi:10.1038/s41746-026-02605-6