Anthropic Scales Back Its Flagship Safety Pledge, Cutting AI Safety Commitments, WSJ
TIME reports Anthropic has scrapped the core promise of its Responsible Scaling Policy—its 2023 pledge to only train AI after guaranteeing safety—signaling a major retreat from its self‑styled safety leadership.
Quick Summary
- TIME reports Anthropic has scrapped the core promise of its Responsible Scaling Policy—its 2023 pledge to only train AI after guaranteeing safety—signaling a major retreat from its self‑styled safety leadership.
- Key company: Anthropic
Anthropic’s revised Responsible Scaling Policy (RSP) replaces the 2023 “no‑train‑without‑guaranteed‑safety” clause with a set of softer, disclosure‑focused commitments, according to a TIME exclusive that reviewed the updated document. The company’s chief science officer, Jared Kaplan, told TIME that the original pledge “wouldn’t actually help anyone” given the pace of competitive development, and that unilateral constraints were no longer defensible. The new RSP still obliges Anthropic to publish more granular safety‑testing results and to “match or surpass” the safety efforts of peers, but it removes the hard stop that previously barred the firm from training models above a defined risk threshold unless pre‑emptive mitigations were proven.
The policy shift also introduces a conditional “delay” mechanism: Anthropic may pause development only if its leadership collectively judges the organization to be the “leader of the AI race” and deems the existential risk “significant.” This language, which appears in the revised RSP, is markedly less prescriptive than the original guarantee that no model would be released without a proven safety envelope. The change aligns Anthropic’s public stance with the broader industry trend of emphasizing transparency over pre‑emptive shutdowns, a move echoed in a VentureBeat article that highlighted the company’s effort to make rogue behavior “harder” through updated safety testing rather than outright training bans.
Industry observers note that the amendment could have material implications for Anthropic’s market positioning. The firm has long marketed Claude as the safest chatbot among the leading commercial LLMs, leveraging its original RSP as a differentiator in enterprise sales pitches. By softening its commitments, Anthropic may lose some of that perceived safety moat, especially as rivals such as OpenAI and Google continue to invest heavily in alignment research while maintaining more aggressive rollout schedules. The WSJ report on the policy downgrade underscores that investors and partners will now have to assess Anthropic’s safety posture on the basis of disclosed test metrics rather than a binding pre‑training promise.
Defenders of the change argue that the revised RSP does not materially increase risk, because the company still pledges to be “transparent about the safety risks of AI” and to align its safety investments with competitors. Critics counter that the absence of a concrete prohibition on training high‑risk models means that Anthropic could, in theory, accelerate development cycles to keep pace with market leaders, potentially exposing downstream users to untested failure modes. The policy’s conditional delay clause hinges on internal judgments that lack external accountability, a point underscored by the TIME interview in which Kaplan admitted the company “didn’t really feel… that it made sense for us to make unilateral commitments” in the face of rapid industry progress.
From a technical standpoint, the updated RSP shifts the burden of safety verification from a pre‑deployment gate to post‑deployment monitoring and public reporting. Anthropic now commits to publishing additional safety‑testing data, which could include failure‑rate statistics, adversarial prompt robustness scores, and alignment benchmark results. While this increased transparency may aid external researchers in assessing model behavior, it does not replace the preventative safeguards that the original pledge enforced. As the AI field moves toward ever larger models and more open deployment pipelines, the efficacy of disclosure‑only approaches will likely be tested in real‑world incidents, making Anthropic’s policy revision a pivotal case study in the trade‑off between speed and safety.
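To make the reporting shift concrete, here is a minimal sketch of the kind of failure‑rate statistic a disclosure‑focused safety report might publish: the fraction of adversarial test prompts a model fails, broken out by risk category. All names, categories, and data below are illustrative assumptions for exposition; nothing here reflects Anthropic’s actual eval format or results.

```python
from collections import defaultdict

def failure_rates(results):
    """Compute per-category failure rates from safety-eval outcomes.

    results: list of (category, passed) tuples, one per test prompt,
    e.g. ("jailbreak", False) for a failed jailbreak-resistance probe.
    Returns a dict mapping each category to its failure rate in [0, 1].
    """
    totals = defaultdict(int)    # prompts run per category
    failures = defaultdict(int)  # prompts failed per category
    for category, passed in results:
        totals[category] += 1
        if not passed:
            failures[category] += 1
    return {c: failures[c] / totals[c] for c in totals}

# Hypothetical eval run: 1 failure out of 4 jailbreak probes,
# 0 failures out of 2 bio-risk probes.
runs = [
    ("jailbreak", True), ("jailbreak", False),
    ("jailbreak", True), ("jailbreak", True),
    ("bio", True), ("bio", True),
]
rates = failure_rates(runs)  # e.g. {"jailbreak": 0.25, "bio": 0.0}
```

A real disclosure would pair such point estimates with sample sizes and confidence intervals, since a 25% failure rate over four probes says far less than the same rate over four thousand.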
This article was created using AI technology and reviewed by the SectorHQ editorial team for accuracy and quality.