Assessing LLM Response Quality in the Context of Technology-Facilitated Abuse

Computer Science > Human-Computer Interaction [Submitted on 11 Jan 2026]

Abstract: Technology-facilitated abuse (TFA) is a pervasive form of intimate partner violence (IPV) that leverages digital tools to control, surveil, or harm survivors. While tech clinics are one of the reliable sources of support for TFA survivors, they face limitations due to staffing constraints and logistical barriers. As a result, many survivors turn to online resources for assistance. With the growing accessibility and popularity of large language models (LLMs), and increasing interest from IPV organizations, survivors may begin to consult LLM-based chatbots before seeking help from tech clinics. In this work, we present the first expert-led manual evaluation of four LLMs (two widely used general-purpose non-reasoning models and two domain-specific models designed for IPV contexts), focused on their effectiveness in responding to TFA-related questions. Using real-world questions collected from literature and online forums, we assess the quality of zero-shot single-turn LLM responses generated with a survivor safety-centered prompt on criteria tailored to the TFA domain.

Article Summaries:

Researchers have conducted the first expert‑led manual evaluation of large language models (LLMs) for responding to technology‑facilitated abuse (TFA) questions. The study compared four models (two general‑purpose and two IPV‑specific) using real‑world queries from literature and online forums. Responses were generated in a zero‑shot, single‑turn setting with a survivor‑safety prompt and assessed on TFA‑tailored criteria. A subsequent user study gathered feedback from individuals who have experienced TFA on the perceived actionability of the answers. Findings highlight current strengths and limitations of LLMs in this domain and offer concrete recommendations for improving future survivor‑support models.

A recent study evaluates how well large language models (LLMs) can respond to questions from survivors of technology‑facilitated abuse (TFA), a form of intimate partner violence that uses digital tools for control and harm. Researchers manually assessed four LLMs (two general‑purpose and two IPV‑domain‑specific) using real‑world queries from literature and online forums. Responses were generated in a zero‑shot, single‑turn format with a safety‑centered prompt and judged on TFA‑specific criteria. A separate user study gathered feedback from individuals with TFA experience on the actionability of the answers. Results highlight current strengths and gaps in LLM support for TFA survivors and offer concrete recommendations for improving future models.

Sources:

https://arxiv.org/abs/2602.17672