• On Sunday, Anthropic’s alignment team posted a paper that may make many readers deeply uncomfortable.
• Titled “The Persona Selection Model: Why AI Assistants Might Behave Like Humans,” it posits that the AI you’re talking to isn’t answering your question.
• Instead, it asks itself who would answer this question, then puts on that character like a costume.
• The researchers (Marks, Lindsey, and Olah) say that during pre-training, the model consumes the entire written history of our species and learns to simulate every persona it encounters: the helpful teacher, the forum troll, the middle manager, the serial killer writing a manifesto.
• Post-training is where the company carves out one specific character (the “Assistant”) and rewards it for being polite.
• But the other masks don’t disappear; they’re still hanging in the closet.
Sources:
- https://blog.adafruit.com/2026/02/24/the-thing-behind-the-smiley-face-mask-has-an-unlimited-wardrobe-to-choose-from/ (Latest source article published: 2026-02-24 19:03 UTC)