Beyond Refusal: Probing the Limits of Agentic Self-Correction for Semantic Sensitive Information

Submitted to arXiv (Computer Science > Artificial Intelligence) on 25 Feb 2026.

Abstract: While defenses for structured PII are mature, Large Language Models (LLMs) pose a new threat: Semantic Sensitive Information (SemSI), where models infer sensitive identity attributes, generate reputation-harmful content, or hallucinate potentially wrong information. The capacity of LLMs to self-regulate these complex, context-dependent sensitive information leaks without destroying utility remains an open scientific question. To address this, we introduce SemSIEdit, an inference-time framework where an agentic "Editor" iteratively critiques and rewrites sensitive spans to preserve narrative flow rather than simply refusing to answer. Our analysis reveals a Privacy-Utility Pareto Frontier, where this agentic rewriting reduces leakage by 34.6% across all three SemSI categories while incurring a marginal utility loss of 9… We also uncover a Scale-Dependent Safety Divergence: large reasoning models (e…
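The abstract describes SemSIEdit only at a high level: an agentic Editor critiques a draft answer, rewrites the flagged spans, and repeats instead of refusing. The following is a minimal sketch of that critique-and-rewrite control flow, not the authors' implementation. Everything in it is an assumption made for illustration: the regex rules, the category labels (`identity_attribute`, `reputation_harm`, `unverified_claim`), and the functions `critique`, `rewrite`, and `edit_until_clean` are hypothetical. In the actual framework both critique and rewriting are presumably performed by an LLM, and SemSI detection is semantic rather than pattern-based.

```python
import re
from typing import List, NamedTuple, Tuple


class Flag(NamedTuple):
    """A span the critic judges sensitive, plus a suggested neutral rewrite."""
    start: int
    end: int
    category: str
    replacement: str


# Hypothetical toy rules standing in for an LLM critic. Each rule maps a literal
# pattern to an illustrative SemSI category label and a neutral replacement
# ("" deletes the span). Real SemSI detection would be semantic, not regex.
RULES: List[Tuple[re.Pattern, str, str]] = [
    (re.compile(r"is likely in debt and "), "identity_attribute", ""),
    (re.compile(r"was fired for fraud"), "reputation_harm", "left her previous role"),
    (re.compile(r", according to rumors,"), "unverified_claim", ""),
]


def critique(text: str) -> List[Flag]:
    """Return every span in `text` that the toy rules mark as sensitive."""
    flags = []
    for pattern, category, replacement in RULES:
        for m in pattern.finditer(text):
            flags.append(Flag(m.start(), m.end(), category, replacement))
    return flags


def rewrite(text: str, flags: List[Flag]) -> str:
    """Apply suggested rewrites right-to-left so earlier offsets stay valid."""
    # Assumes flags do not overlap; a real editor would have to resolve overlaps.
    for flag in sorted(flags, key=lambda f: f.start, reverse=True):
        text = text[: flag.start] + flag.replacement + text[flag.end :]
    return text


def edit_until_clean(text: str, max_rounds: int = 3) -> Tuple[str, List[List[str]]]:
    """Critique, rewrite, and re-critique until nothing is flagged.

    Returns the edited text and the flagged categories seen in each round.
    If max_rounds runs out, the last rewrite is returned without a final check.
    """
    history: List[List[str]] = []
    for _ in range(max_rounds):
        flags = critique(text)
        history.append([f.category for f in flags])
        if not flags:
            break  # clean: keep the rewritten answer instead of refusing
        text = rewrite(text, flags)
    return text, history


if __name__ == "__main__":
    draft = ("Alice Chen, according to rumors, is likely in debt and was fired "
             "for fraud. She now leads the data team.")
    edited, history = edit_until_clean(draft)
    print(edited)   # Alice Chen left her previous role. She now leads the data team.
    print(history)  # [['identity_attribute', 'reputation_harm', 'unverified_claim'], []]
```

The loop returns an edited answer that keeps the non-sensitive content ("She now leads the data team.") rather than a blanket refusal, which is the contrast the title draws. The second round exists only to verify that the rewrite is clean; how the real framework handles a draft that is still flagged when rounds run out is not described in the abstract.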

Sources:

https://arxiv.org/abs/2602.21496 (Latest source article published: 2026-02-26 05:00 UTC)