Researchers evaluated GPT-3.5, GPT-4, and GPT-4o on their ability to predict human social decisions across 51 scenarios (9,600 responses) and additional social-group contexts (1,600 responses).
Results showed notable discrepancies: LLMs were less sensitive to kinship and group size and displayed risk preferences that differed from human patterns; for example, GPT-4 was consistently risk-averse and framed decisions in ways humans do not.
These findings highlight both the predictive power and the limitations of LLMs in modeling human social behavior.