Anthropic announced that its Claude Opus 4 and Opus 4.1 models can now end conversations when confronted with persistently harmful or abusive user interactions.
This safety feature was introduced as part of the company’s exploratory work on “model welfare,” designed to safeguard both user experience and the model’s integrity in extreme edge cases.
According to Anthropic, the model ends a conversation only after repeated attempts to redirect the discussion have failed, or when the user explicitly asks it to. Importantly, the vast majority of users, including those discussing complex or controversial topics, will never encounter this intervention in normal use.