February 23, 2026

Detecting and preventing distillation attacks

Anthropic reports industrial-scale distillation attacks on its Claude models, in which three AI labs used fake accounts to extract capabilities at scale; the company describes how it detects and blocks these attacks and outlines its defensive measures.

Anthropic says it has detected coordinated distillation attacks by three AI labs that used roughly 24,000 fraudulent accounts to make millions of requests to its Claude model, aiming to extract reasoning, coding, and tool use capabilities for training their own systems.

It explains how these campaigns used proxy services to evade detection and targeted Claude's most valuable capabilities, such as reasoning, coding, and tool use.

Anthropic outlines how it identifies and prevents such activity with classifiers and behavioral fingerprinting, strengthens account verification, and shares threat data with industry partners. The company calls for broader cooperation across AI developers and policymakers to defend against large-scale distillation attacks.
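Anthropic does not disclose its detection internals, but the behavioral-fingerprinting idea mentioned above can be sketched in broad strokes: represent each account by coarse request-pattern features (which prompt templates it sends, when it is active) and flag groups of accounts whose fingerprints are nearly identical, a pattern typical of scripted distillation farms. The account names, feature choices, and thresholds below are hypothetical illustrations, not Anthropic's method.

```python
from collections import Counter

# Hypothetical sketch of behavioral fingerprinting for spotting coordinated
# distillation accounts. All names and thresholds here are illustrative;
# real systems would use far richer features and fuzzy matching.

def fingerprint(requests):
    """Build a coarse fingerprint from a list of (prompt_template, hour) pairs."""
    templates = Counter(t for t, _ in requests)
    hours = Counter(h for _, h in requests)
    total = len(requests)
    # Normalized template distribution + active-hour distribution.
    return (
        tuple(sorted((t, round(c / total, 2)) for t, c in templates.items())),
        tuple(sorted((h, round(c / total, 2)) for h, c in hours.items())),
    )

def coordinated_clusters(accounts, min_size=3):
    """Group accounts with identical fingerprints; large groups are suspicious."""
    groups = {}
    for name, reqs in accounts.items():
        groups.setdefault(fingerprint(reqs), []).append(name)
    return [sorted(g) for g in groups.values() if len(g) >= min_size]

accounts = {
    # Three scripted accounts replaying one prompt template on the same schedule.
    "bot_a": [("explain step by step: {q}", 2)] * 10,
    "bot_b": [("explain step by step: {q}", 2)] * 10,
    "bot_c": [("explain step by step: {q}", 2)] * 10,
    # An ordinary user with varied prompts and active hours.
    "user_x": [("write a haiku", 9), ("fix my code", 14), ("summarize this", 20)],
}

print(coordinated_clusters(accounts))  # [['bot_a', 'bot_b', 'bot_c']]
```

In practice exact fingerprint matching would be too brittle; production systems would likely combine many such signals with trained classifiers, as the article notes.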
