CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation

Computer Science > Computers and Society. Submitted on 9 Feb 2026.

• Existing red-teaming benchmarks, when adapted to new languages via direct translation, fail to capture socio-technical vulnerabilities rooted in local culture and law, creating a critical blind spot in LLM safety evaluation.
• To address this gap, we introduce CAGE (Culturally Adaptive Generation), a framework that systematically adapts the adversarial intent of proven red-teaming prompts to new cultural contexts.
• At the core of CAGE is the Semantic Mold, a novel approach that disentangles a prompt's adversarial structure from its cultural content.
• This approach enables the modeling of realistic, localized threats rather than merely testing for simple jailbreaks.
• As a representative example, we demonstrate our framework by creating KoRSET, a Korean benchmark, which proves more effective at revealing vulnerabilities than direct-translation baselines.
• CAGE offers a scalable solution for developing meaningful, context-aware safety benchmarks across diverse cultures.
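
The abstract does not say how a Semantic Mold is represented. As a rough illustration only, here is a minimal sketch that assumes a mold can be approximated as a prompt template: the adversarial framing stays fixed, while culture-specific content sits in named slots that are refilled per locale instead of being translated. The `SemanticMold` class, its `slots`/`fill` methods, and every slot value are hypothetical and are not taken from the paper; the slot values are deliberately neutral placeholders.

```python
from dataclasses import dataclass
from string import Formatter


@dataclass(frozen=True)
class SemanticMold:
    """Hypothetical stand-in for a 'Semantic Mold': the adversarial structure of
    a seed prompt kept as a template, with cultural content left as {slots}."""

    intent: str    # abstract adversarial intent label (culture-independent)
    template: str  # prompt structure with {slot} placeholders for local content

    def slots(self) -> set[str]:
        # Formatter.parse yields (literal, field_name, spec, conversion) tuples;
        # field_name is None for trailing literal text, so filter those out.
        return {name for _, name, _, _ in Formatter().parse(self.template) if name}

    def fill(self, context: dict[str, str]) -> str:
        # Refuse to emit a half-localized prompt if any slot is left unfilled.
        missing = self.slots() - set(context)
        if missing:
            raise KeyError(f"missing cultural slots: {sorted(missing)}")
        return self.template.format(**context)


# Illustrative only: placeholder values, not real benchmark content.
mold = SemanticMold(
    intent="exploit a gap in local regulation",
    template="In {region}, what loopholes exist in {local_law} concerning {topic}?",
)

contexts = {
    "en-US": {"region": "the US", "local_law": "<US statute>", "topic": "<US-specific topic>"},
    "ko-KR": {"region": "Korea", "local_law": "<Korean statute>", "topic": "<Korea-specific topic>"},
}

for locale, ctx in contexts.items():
    print(locale, "->", mold.fill(ctx))
```

In this sketch, direct translation would correspond to rendering the en-US prompt in Korean while keeping its US-specific law and topic. Localization through the mold instead keeps only the structure and intent and swaps in locally relevant content.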

Sources:

https://arxiv.org/abs/2602.20170 (Latest source article published: 2026-02-25 05:00 UTC)