From 'Help' to Helpful: A Hierarchical Assessment of LLMs in Mental e-Health Applications

• Evaluated 11 LLMs generating six-word subject lines for German counselling emails.
• Used hierarchical assessment: first categorize outputs, then rank within categories.
• Nine assessors (counselling professionals and AI systems) took part; their agreement was measured via Krippendorff’s α, Spearman’s ρ, Pearson’s r, and Kendall’s τ.
• Proprietary services outperformed open-source models, but German fine-tuning improved all models.
• Trade-offs highlighted between performance, privacy, and bias in mental health AI.
• Study emphasizes ethical considerations for e-health deployments: privacy, bias, and accountability.
• Provides a framework for efficient case prioritisation in psychosocial online counselling.

Article Summaries:

A recent study examined how eleven large language models (LLMs) generate concise six-word subject lines for German mental-health counselling emails. The evaluation was hierarchical: nine assessors (counselling professionals and AI systems) first classified the outputs into categories, then ranked them within each category. Agreement among the assessors was quantified with Krippendorff’s α, Spearman’s ρ, Pearson’s r, and Kendall’s τ. Proprietary services outperformed open-source models, which points to a trade-off between performance and privacy, while German-specific fine-tuning consistently improved performance. The paper highlights key ethical issues for deploying mental-health AI, including privacy, bias, and accountability.
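
To make the categorize-then-rank idea concrete, here is a minimal Python sketch of how two assessors' hierarchical judgements could be flattened into overall rankings and compared with Spearman’s ρ and Kendall’s τ. The category names, model labels, and judgements are hypothetical and chosen only for illustration; they are not taken from the study, and the paper's own procedure may differ.

```python
from itertools import combinations

# Hypothetical category order, best first (the study's real categories are not stated here).
CATEGORY_ORDER = ["good", "fair", "poor"]


def hierarchical_ranking(judgements):
    """Flatten {model: (category, rank_within_category)} into {model: overall_rank}.

    Models are ordered by category first, then by their rank inside the category.
    """
    ordered = sorted(
        judgements,
        key=lambda m: (CATEGORY_ORDER.index(judgements[m][0]), judgements[m][1]),
    )
    return {model: position for position, model in enumerate(ordered, start=1)}


def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two tie-free rankings of the same items."""
    n = len(rank_a)
    d_squared = sum((rank_a[m] - rank_b[m]) ** 2 for m in rank_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def kendall_tau(rank_a, rank_b):
    """Kendall's tau for two tie-free rankings of the same items."""
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(rank_a)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical judgements from two assessors for four models.
assessor_1 = {"A": ("good", 1), "B": ("good", 2), "C": ("fair", 1), "D": ("poor", 1)}
assessor_2 = {"A": ("good", 1), "B": ("fair", 1), "C": ("good", 2), "D": ("poor", 1)}

ranks_1 = hierarchical_ranking(assessor_1)
ranks_2 = hierarchical_ranking(assessor_2)

print(f"Spearman rho: {spearman_rho(ranks_1, ranks_2):.3f}")
print(f"Kendall tau:  {kendall_tau(ranks_1, ranks_2):.3f}")
```

In this toy case the two assessors disagree only on the order of models B and C, so both coefficients stay well above zero. The closed-form ρ used here assumes rankings without ties; real assessment data with ties would need tie-corrected variants.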

Sources:

https://arxiv.org/abs/2602.18443