When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks

Computer Science > Cryptography and Security [Submitted on 21 Feb 2026]

Abstract: Standard evaluations of backdoor attacks on text-to-image (T2I) models primarily measure trigger activation and visual fidelity. We challenge this paradigm, demonstrating that encoder-side poisoning induces persistent, trigger-free semantic corruption that fundamentally reshapes the representation manifold. We trace this vulnerability to a geometric mechanism: a Jacobian-based analysis reveals that backdoors act as low-rank, target-centered deformations that amplify local sensitivity, causing distortion to propagate coherently across semantic neighborhoods. To rigorously quantify this structural degradation, we introduce SEMAD (Semantic Alignment and Drift), a diagnostic framework that measures both internal embedding drift and downstream functional misalignment. Our findings, validated across diffusion and contrastive paradigms, expose the deep structural risks of encoder poisoning and highlight the necessity of geometric audits beyond simple attack success rates.
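The idea of trigger-free embedding drift under a low-rank deformation can be illustrated with a toy sketch. It does not reproduce SEMAD, whose actual metrics are not given in the abstract. It assumes drift is measured as the mean cosine distance between clean and poisoned embeddings of the same prompts, and it models the poisoned encoder as a rank-one perturbation of the clean one. The names `cosine_distance`, `mean_drift`, and `rank_one_poison`, and all vectors used, are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mean_drift(clean_embs, poisoned_embs):
    """Average cosine distance over paired clean/poisoned embeddings."""
    dists = [cosine_distance(c, p) for c, p in zip(clean_embs, poisoned_embs)]
    return sum(dists) / len(dists)


def rank_one_poison(emb, direction, target, strength):
    """Toy poisoned encoder (an assumption, not the paper's attack).

    Adds strength * (emb . direction) * target to the clean embedding,
    so the map differs from the identity by the rank-one matrix
    strength * target * direction^T.
    """
    coeff = strength * sum(x * d for x, d in zip(emb, direction))
    return [x + coeff * t for x, t in zip(emb, target)]


if __name__ == "__main__":
    # Hypothetical clean embeddings of two trigger-free prompts.
    clean = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    direction = [1.0, 0.0, 0.0]  # sensitive input direction
    target = [0.0, 0.0, 1.0]     # attacker's target direction
    poisoned = [rank_one_poison(e, direction, target, 1.0) for e in clean]
    print("mean drift:", mean_drift(clean, poisoned))
```

In this toy setup, a prompt whose embedding has a component along `direction` is pulled toward `target` even though no trigger is present, while a prompt orthogonal to `direction` is left unchanged. That is the sense in which a low-rank deformation can spread across a semantic neighborhood.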

Sources:

https://arxiv.org/abs/2602.20193 (Latest source article published: 2026-02-25 05:00 UTC)