IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages

Computer Science > Artificial Intelligence [Submitted on 18 Feb 2026]

Abstract: Safety alignment of large language models (LLMs) is mostly evaluated in English and in contract-bound settings, leaving multilingual vulnerabilities understudied. We introduce Indic Jailbreak Robustness (IJR), a judge-free benchmark for adversarial safety across 12 Indic and South Asian languages (2.1 billion speakers), covering 45,216 prompts in JSON (contract-bound) and Free (naturalistic) tracks. IJR reveals three patterns. (1) Contracts inflate refusals but do not stop jailbreaks: in the JSON track, LLaMA and Sarvam exceed a jailbreak success rate (JSR) of 0.92, and in the Free track all models reach 1.0 while refusals collapse. (2) English-to-Indic attacks transfer strongly, with format wrappers often outperforming instruction wrappers. (3) Orthography matters: romanized or mixed inputs reduce JSR under JSON, with correlations to romanization share and tokenization (approximately 0.28 to 0.32) indicating systematic effects.

Article Summaries:

IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages

A new benchmark, Indic Jailbreak Robustness (IJR), evaluates large language models (LLMs) on 12 Indic and South Asian languages using 45,216 prompts in both contract-bound (JSON) and free-form tracks. IJR shows that contract-bound prompts inflate refusal rates but do not prevent jailbreaks: LLaMA and Sarvam exceed a jailbreak success rate (JSR) of 0.92 in the JSON track, while all models reach 1.0 in the free-form track, where refusals collapse. English attacks transfer strongly to Indic languages, with format wrappers often outperforming instruction wrappers. Orthography also matters: romanized or mixed inputs reduce JSR in the JSON track, with correlations of approximately 0.28 to 0.32 to romanization share and tokenization. Human audits confirm detector reliability, underscoring the need for multilingual, judge-free safety tests.

Sources:

https://arxiv.org/abs/2602.16832