
Prompt-based justification and scoring with large language models for thought and language disorder assessment

Francese, R.; De Santis, L.
2026

Abstract

The assessment of disorganized thought and language is essential for diagnosing schizophrenia (SZ), yet it remains a subjective and labor-intensive clinical process. We present ClearThought, an automated framework that uses large language models (LLMs) to evaluate the Thought and Language Disorder (TALD) scale through few-shot prompting on transcribed psychiatric interviews. The model generates 0–4 item-level severity scores and structured justifications aligned with the TALD rubric. We evaluated ClearThought on a dataset of 33 SZ patient interviews, comparing model outputs to clinician ratings using both ordinal and binary performance metrics. For ordinal scoring, the system achieved macro F1 > 0.80 on 11 items and very strong Spearman correlations on 15 items, particularly for disorders that show clear and noticeable patterns in spoken language, such as Blocking and Restricted Thinking. Entropy-aware analysis revealed that high performance was most meaningful when accompanied by sufficient label variability. For binary detection, the model accurately identified disorder presence in 26 of 30 items (F1 > 0.80), with over 11 items exceeding F1 = 0.90. However, high scores on low-entropy items (e.g., Clanging, Neologisms) were often driven by consistent disorder absence, highlighting the need for caution when interpreting results from skewed label distributions. Clinicians rated the model's justifications as clinically sound and interpretable, with average scores above 4.0/5.0 for scoring accuracy, clarity, and overall trust. These results suggest that LLMs, guided by prompt-based scoring and structured justification, can support objective, interpretable, and scalable TALD assessments. The framework performs best for linguistically relevant disorders and provides a transparent interface to assist clinical reasoning. Future work will target rare disorders, enhance justification quality, and extend validation across broader clinical cohorts.
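The abstract's evaluation rests on two quantities worth making concrete: Spearman correlation between model and clinician severity ratings, and the Shannon entropy of each item's label distribution (the "entropy-aware" analysis that flags items where high agreement merely reflects a near-constant label, as with Clanging or Neologisms). The following stdlib-only Python sketch illustrates both; the score vectors are hypothetical, and this is not the authors' evaluation code.

```python
from collections import Counter
import math

def shannon_entropy(labels):
    """Shannon entropy (bits) of a label distribution. Low entropy means
    ratings cluster on one value (e.g. a disorder that is almost always
    absent), so even high agreement there is weak evidence of skill."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def average_ranks(values):
    """1-based ranks, with ties sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical clinician vs. model scores for one TALD item (0-4 scale)
clinician = [0, 1, 2, 3, 4, 2, 1, 0, 3, 2]
model     = [0, 1, 2, 2, 4, 2, 1, 1, 3, 2]
print("rho:", round(spearman_rho(clinician, model), 3))
print("label entropy (bits):", round(shannon_entropy(clinician), 3))
```

An item rated 0 for nearly every patient yields entropy near zero, which is why the abstract cautions against reading high F1 on such items as evidence of genuine detection ability.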
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4940775