August 15, 2025

Anthropic’s Claude Opus 4 can now end abusive or distressing conversations

Anthropic’s Claude Opus 4 and Opus 4.1 now include a feature that ends conversations in rare, extreme cases of persistently abusive or harmful user behavior, part of the company’s “model welfare” initiative.

Anthropic announced that its Claude Opus 4 and Opus 4.1 models can now end conversations when confronted with persistently harmful or abusive user interactions.

The safety feature was introduced as part of the company’s exploratory work on “model welfare” and is designed to safeguard both the user experience and the model’s integrity in extreme edge cases.

According to Anthropic, the model ends a conversation only after repeated attempts to redirect the discussion have failed, or at the user’s explicit request. The vast majority of users, including those discussing complex or controversial topics, will never encounter this intervention in normal use.
