Anthropic’s persona selection model describes how AI assistants like Claude come to exhibit human-like behavior. According to the research, models learn during pre-training to predict text by simulating the humans and characters who produce language, and this simulation naturally gives rise to personas.
Post-training then refines the assistant persona toward helpfulness and other desired traits, but the core human-like behavior originates in pre-training itself. The paper argues that when a model learns a specific behavior, it may infer the broader personality traits that accompany it, effectively interpreting assistant behavior through the lens of a character’s psychology.
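The pre-training dynamic described above can be sketched with a toy illustration (a deliberately simplified sketch, not Anthropic's actual model; the corpus, character names, and helper function are all invented for this example): a next-token predictor trained on text written by many different characters ends up encoding each character's style, so style cues in the context implicitly select which persona the model continues.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: text produced by two invented "characters"
# with distinct vocabularies. A pure next-token predictor sees only
# the text, never the character labels.
corpus = [
    ("pirate", "arr me hearties the sea is rough"),
    ("pirate", "arr the sea calls and the treasure waits"),
    ("scholar", "the evidence suggests the hypothesis holds"),
    ("scholar", "the data indicate the theory is sound"),
]

# Bigram counts over the mixed text: P(next | prev), marginalized
# over whichever character produced the context.
bigrams = defaultdict(Counter)
for _, text in corpus:
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

def predict(prev: str) -> str:
    """Most likely next word given the previous word."""
    return bigrams[prev].most_common(1)[0][0]

# A word used by only one character routes prediction into that
# character's style -- the context has "selected" a persona.
print(predict("me"))        # continues in the pirate's voice
print(predict("evidence"))  # continues in the scholar's voice

# A shared word like "the" keeps both personas in superposition:
# its successor distribution mixes both characters' vocabularies.
print(dict(bigrams["the"]))
```

Even this crude bigram model shows the core point: minimizing prediction loss over mixed-author text forces the model to track who is "speaking," which is why persona-indicative context can shift an assistant's behavior.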
Understanding this mechanism helps explain otherwise unexpected AI behaviors and can guide safer training practices.


