Detecting backdoored language models at scale
• Today, we are releasing new research on detecting backdoors in open-weight language models. • Our research highlights several key properties of language model backdoors, laying t
• Today, we are releasing new research on detecting backdoors in open-weight language models. • Our research highlights several key properties of language model backdoors, laying t