Evaluating Text-based Conversational Agents for Mental Health: A Systematic Review of Metrics, Methods and Usage Contexts

Computer Science > Human-Computer Interaction [Submitted on 8 Jan 2026]

Abstract: Text-based conversational agents (CAs) are increasingly used in mental health, yet evaluation practices remain fragmented. We conducted a PRISMA-guided systematic review (May-June 2024) across ACM Digital Library, Scopus, and PsycINFO. From 613 records, 132 studies were included, with dual-coder extraction achieving substantial agreement (Cohen’s kappa = 0.77-0.92). We synthesized evaluation approaches across three dimensions: metrics, methods, and usage contexts. Metrics were classified into CA-centric attributes (e.g., reliability, safety, empathy) and user-centric outcomes (experience, knowledge, psychological state, health behavior). Methods included automated analyses, standardized psychometric scales, and qualitative inquiry.
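The agreement statistic cited in the abstract, Cohen's kappa, compares the agreement two coders actually reached with the agreement expected by chance from each coder's label frequencies. The following is a minimal sketch of that standard formula, not the review's own tooling. The function name `cohens_kappa` and the ten example labels are hypothetical and were made up for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items that received identical labels.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal label proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both coders used one identical label
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for ten studies (illustrative only, not data from the review).
coder_a = ["scale", "scale", "automated", "qualitative", "scale",
           "automated", "qualitative", "scale", "automated", "scale"]
coder_b = ["scale", "scale", "automated", "qualitative", "automated",
           "automated", "qualitative", "scale", "automated", "scale"]

kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")  # prints "kappa = 0.84"
```

In this toy data the coders agree on 9 of 10 items (0.90 observed) against 0.36 expected by chance, so kappa = (0.90 - 0.36) / (1 - 0.36) ≈ 0.84.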

Article Summaries:

A systematic review submitted on 8 January 2026 examined how text-based conversational agents (CAs) used in mental-health contexts are evaluated. The authors followed PRISMA guidelines, screening 613 records and including 132 studies. They mapped evaluation practices across three dimensions: metrics (CA-centric attributes such as reliability and empathy versus user-centric outcomes such as experience and psychological state), methods (automated analysis, psychometric scales, and qualitative inquiry), and usage contexts (temporal designs ranging from momentary to follow-up). The findings highlighted heavy reliance on Western-developed scales, limited cultural adaptation, small samples studied over short periods, and weak links between automated performance and user well-being. The review calls for methodological triangulation, temporal rigor, and equitable measurement to improve CA evaluation.

Sources:

https://arxiv.org/abs/2602.17669