Moonshot AI Uncovers Industrial-Scale Distillation Attacks Targeting Major Language Models
Moonshot AI, along with DeepSeek and MiniMax, launched industrial‑scale distillation attacks that used some 24,000 fake accounts to log over 16 million Claude interactions and siphon the model's capabilities, a recent report says.
Quick Summary
- Anthropic reportedly traced industrial‑scale distillation attacks to Moonshot AI, DeepSeek and MiniMax, involving roughly 24,000 fake accounts and more than 16 million Claude interactions.
- Key company: Moonshot AI
- Also mentioned: DeepSeek, MiniMax, Anthropic
Moonshot AI, DeepSeek, and MiniMax orchestrated what the report from Anthropic describes as “industrial‑scale distillation attacks,” leveraging more than 24,000 fraudulent accounts to harvest over 16 million interactions with Claude, the company’s flagship large language model (LLM). The attackers used these accounts to issue prompts, capture the model’s responses, and feed the resulting data into their own training pipelines, effectively “stealing” Claude’s capabilities without direct access to its weights. According to the Anthropic security bulletin, the scale of the operation—tens of thousands of accounts and millions of API calls—exceeds any previously documented attempt to reverse‑engineer a commercial LLM, moving the threat from a research‑grade proof of concept to a production‑level data exfiltration campaign.
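The harvesting step itself requires little sophistication. The sketch below is a hypothetical reconstruction, not code from the report: it shows how prompt–response pairs could be collected through a public API into a training file using the standard anthropic Python SDK. The prompt list, output file name, and model identifier are all illustrative, and the account rotation that made the real operation possible is omitted.

```python
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Illustrative prompt set; a real campaign would generate millions of
# prompts spanning code, reasoning, and domain-specific tasks.
PROMPTS = [
    "Explain the CAP theorem in two sentences.",
    "Write a Python function that reverses a linked list.",
]

with open("distillation_pairs.jsonl", "w") as f:
    for prompt in PROMPTS:
        resp = client.messages.create(
            model="claude-3-5-sonnet-latest",  # model id illustrative
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        # Each prompt-response pair becomes one supervised training
        # example for the surrogate model.
        pair = {"prompt": prompt, "response": resp.content[0].text}
        f.write(json.dumps(pair) + "\n")
```

At single-account scale this is indistinguishable from legitimate use; it is the multiplication across tens of thousands of identities that turns it into exfiltration.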
The methodology mirrors classic model‑distillation attacks, in which an adversary queries a target model repeatedly, aggregates the input‑output pairs, and trains a surrogate model to mimic the original’s behavior. In this case, the three labs automated account creation and prompt generation, bypassing rate‑limit safeguards by spreading traffic across a sprawling pool of fake identities. Anthropic’s internal logs show that the malicious traffic accounted for a measurable fraction of Claude’s total API usage during the attack window, suggesting the perpetrators sustained high query volumes without triggering immediate throttling. By amassing 16 million exchanges, the attackers obtained a dataset rich enough to capture nuanced reasoning patterns, code‑generation abilities, and domain‑specific knowledge that would otherwise require months of fine‑tuning on proprietary data.
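For readers unfamiliar with distillation, the surrogate‑training step looks much like ordinary supervised fine‑tuning. The following sketch assumes the harvested pairs from the previous example and uses the Hugging Face transformers library with a small stand‑in model; nothing here reflects the labs’ actual pipelines, which the report does not describe.

```python
# Sketch: fine-tuning a surrogate on harvested pairs (standard
# supervised distillation). Model choice and hyperparameters are
# illustrative stand-ins.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

ds = load_dataset("json", data_files="distillation_pairs.jsonl")["train"]

def to_features(ex):
    # The surrogate learns to reproduce the target model's response
    # conditioned on the original prompt.
    text = ex["prompt"] + "\n" + ex["response"] + tok.eos_token
    enc = tok(text, truncation=True, max_length=512, padding="max_length")
    enc["labels"] = enc["input_ids"].copy()
    return enc

ds = ds.map(to_features, remove_columns=ds.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="surrogate", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=ds,
).train()
```

With 16 million such examples, even a generic recipe like this can transfer a substantial slice of the target model’s behavior.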
Anthropic’s response highlights two technical gaps that the attack exploited. First, the platform’s authentication flow allowed the creation of new accounts with minimal friction, a weakness that enabled the rapid provisioning of the 24,000 fake identities. Second, the API’s per‑account rate limits were insufficient to detect coordinated abuse when the load was spread thinly across many accounts. The report notes that Anthropic has since instituted stricter identity verification, tightened rate‑limit thresholds, and deployed anomaly‑detection models that flag unusual query patterns across the user base. These mitigations aim to raise the cost of scaling a distillation campaign to the point where the return on investment diminishes for adversaries.
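Anthropic has not published its detection logic, but the general shape of a cross‑account defense is well understood: aggregate request rates over shared infrastructure signals rather than over individual accounts, so that thousands of low‑volume identities still sum to one visible spike. The sketch below illustrates that idea with an invented fingerprint (an IP /24 prefix) and invented thresholds; a production system would combine many more signals.

```python
# Sketch of a cross-account rate aggregator. Fields, window size, and
# threshold are illustrative, not Anthropic's actual detection logic.
import time
from collections import defaultdict, deque

WINDOW_S = 60      # sliding window length in seconds
THRESHOLD = 1000   # max requests per fingerprint per window

windows: dict[str, deque] = defaultdict(deque)

def fingerprint(request: dict) -> str:
    # Collapse many accounts into one bucket via shared infrastructure
    # signals; here, the client IP's /24 prefix.
    return request["ip"].rsplit(".", 1)[0]

def is_coordinated_abuse(request: dict) -> bool:
    now = time.monotonic()
    q = windows[fingerprint(request)]
    q.append(now)
    # Drop timestamps that have fallen out of the sliding window.
    while q and now - q[0] > WINDOW_S:
        q.popleft()
    return len(q) > THRESHOLD
```

Keying the limit on the fingerprint rather than the account is what defeats the “spread thinly across many identities” evasion the paragraph above describes.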
The implications extend beyond Anthropic’s ecosystem. As Claude’s capabilities have become a benchmark for enterprise AI—particularly in code assistance and multi‑modal reasoning—its stolen knowledge can accelerate the development of competing models that might otherwise lag behind. DeepSeek, Moonshot AI and MiniMax have each announced plans to launch next‑generation LLMs in the coming months, and the stolen data could give them a shortcut to performance levels that would normally require extensive proprietary data collection and compute budgets. Industry analysts, citing the Anthropic bulletin, warn that such “capability leakage” could reshape competitive dynamics, especially for smaller labs that lack the resources to train models from scratch.
Finally, the episode underscores a broader security challenge for the AI industry: protecting the intellectual property embedded in model outputs. Unlike traditional software, an LLM’s value is encoded in the statistical relationships it has learned, which can be approximated through enough high‑quality query‑response pairs. Anthropic’s disclosure, amplified by coverage on outlets such as The Verge and TechCrunch, is a rare, data‑driven glimpse into how adversaries can weaponize API access at scale. The company’s forthcoming technical blog promises to detail the detection algorithms and policy changes it has deployed, offering a potential playbook for other AI providers grappling with the same threat vector.
This article was created using AI technology and reviewed by the SectorHQ editorial team for accuracy and quality.