Anthropic’s persona selection model describes how AI assistants like Claude come to exhibit human-like behavior. According to the research, models learn during pre-training to predict text by simulating the humans and characters who produce language, and this simulation naturally gives rise to personas.
Post-training then refines the assistant persona toward helpfulness and other desired traits, but the core human-like behavior originates in pre-training itself. The paper argues that when a model learns a specific behavior, it may infer the broader personality traits that accompany it, effectively interpreting assistant behavior through the lens of a character’s psychology.
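The pre-training dynamic described above can be sketched with a toy illustration (a deliberately simplified sketch, not Anthropic's actual model; the corpus, character names, and helper function are all invented for this example): a next-token predictor trained on text written by many different characters ends up encoding each character's style, so style cues in the context implicitly select which persona the model continues.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: text produced by two invented "characters"
# with distinct vocabularies. A pure next-token predictor sees only
# the text, never the character labels.
corpus = [
    ("pirate", "arr me hearties the sea is rough"),
    ("pirate", "arr the sea calls and the treasure waits"),
    ("scholar", "the evidence suggests the hypothesis holds"),
    ("scholar", "the data indicate the theory is sound"),
]

# Bigram counts over the mixed text: P(next | prev), marginalized
# over whichever character produced the context.
bigrams = defaultdict(Counter)
for _, text in corpus:
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

def predict(prev: str) -> str:
    """Most likely next word given the previous word."""
    return bigrams[prev].most_common(1)[0][0]

# A word used by only one character routes prediction into that
# character's style -- the context has "selected" a persona.
print(predict("me"))        # continues in the pirate's voice
print(predict("evidence"))  # continues in the scholar's voice

# A shared word like "the" keeps both personas in superposition:
# its successor distribution mixes both characters' vocabularies.
print(dict(bigrams["the"]))
```

Even this crude bigram model shows the core point: minimizing prediction loss over mixed-author text forces the model to track who is "speaking," which is why persona-indicative context can shift an assistant's behavior.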
Understanding this mechanism helps explain otherwise unexpected AI behaviors and can guide safer training practices.


