How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

• When we began studying jailbreak evaluations, we found a fascinating paper claiming that you could jailbreak frontier LLMs simply by translating forbidden prompts into obscure la

Research · August 28, 2024 (updated February 19, 2026) · 2 min · 219 words