
Anthropic launches 300K‑query audit tool as Claude adds visual responses, Palantir keeps using it

Written by
Maren Kessler
AI News

300,000+ test queries revealed thousands of contradictions and ambiguities across Claude, GPT‑4o, Gemini and Grok, prompting Anthropic to roll out Petri, an automated behavioral‑audit tool, as it expands Claude’s visual response capabilities.

Key Facts

  • Key company: Anthropic

Anthropic’s rollout of Petri, an automated behavioral‑audit system, follows a massive internal test in which more than 300,000 queries were run against Claude, OpenAI’s GPT‑4o, Google’s Gemini and Grok. According to a post by Jamie Cole on the “report” blog, the audit uncovered “thousands of direct contradictions and interpretive ambiguities” across the models, prompting Anthropic to ship the production version of Petri on the same day the Pentagon’s chief technology officer labeled Claude a supply‑chain risk (report, Mar 13). Petri continuously monitors how model outputs shift across training runs and version updates, giving engineers a systematic way to flag regressions before they reach customers. The tool is now part of Anthropic’s standard deployment pipeline for Claude 3, including the newly announced “Haiku” variant, which the company touts as its fastest model yet (Anthropic).
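The article does not describe Petri's internals, but the regression-flagging workflow it attributes to the tool can be sketched in a few lines. Everything below is an illustrative assumption: `get_response` is a hypothetical stand-in for a model API call, the model names are made up, and a real audit would use semantic comparison rather than exact string matching.

```python
# Minimal sketch of a behavioral regression check across two model versions.
# This is NOT Petri's actual implementation, which is not detailed in the article.

def flag_regressions(queries, old_model, new_model, get_response):
    """Run the same queries against two versions and collect answers that changed."""
    regressions = []
    for q in queries:
        old_answer = get_response(old_model, q)
        new_answer = get_response(new_model, q)
        if old_answer != new_answer:  # a production audit would compare meaning, not strings
            regressions.append({"query": q, "old": old_answer, "new": new_answer})
    return regressions

# Toy usage: canned responses stand in for live model calls.
canned = {
    ("model-v1", "Is the Earth round?"): "Yes.",
    ("model-v2", "Is the Earth round?"): "Yes.",
    ("model-v1", "Capital of Australia?"): "Canberra.",
    ("model-v2", "Capital of Australia?"): "Sydney.",
}
flagged = flag_regressions(
    ["Is the Earth round?", "Capital of Australia?"],
    "model-v1", "model-v2",
    lambda model, q: canned[(model, q)],
)
```

Run over every query in the audit set after each training run or version bump, this kind of diff is what turns "thousands of direct contradictions" from an anecdote into a reviewable report.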

At the same time, Claude is gaining multimodal capabilities that let it generate charts, diagrams and other visual artifacts directly in its responses. The Verge reported that the enhancement enables users to request structured visual output without resorting to external plotting libraries, effectively turning Claude into an end‑to‑end data‑visualization assistant. This move mirrors OpenAI’s recent release of GPT‑4o, which also supports image generation, and underscores Anthropic’s effort to keep pace in the increasingly visual AI market. Internally, the visual upgrade was timed to coincide with the Petri launch, allowing the audit system to evaluate not just textual consistency but also the correctness of generated graphics.

Despite the technical advances, Claude remains entangled in a geopolitical controversy. CNBC disclosed that the Department of Defense officially placed Anthropic on a supply‑chain blacklist last week, citing concerns that the model’s “training constitution” is baked into its behavior and could pose security risks (CNBC). Yet the Pentagon continues to rely on Claude for ongoing operations, including the war‑games scenario in Iran, according to Palantir CEO Alex Karp, who confirmed that his firm is still using Claude for defense‑related workloads (CNBC). The dichotomy highlights a broader tension: while the U.S. government is tightening procurement standards, it simultaneously depends on the very technology it deems risky.

Independent estimates of Claude’s scale suggest a model size that is large but not the “10‑trillion‑parameter” myth circulating online. A technical analysis on Unexcitedneurons argued that token‑generation throughput is limited by the number of active parameters loaded per forward pass, and that Claude Opus 4.5/4.6 likely falls well below the speculative upper bounds (Unexcitedneurons, Mar 12, 2026). This assessment aligns with Anthropic’s public positioning of Claude 3 Haiku as a performance‑optimized variant rather than a sheer size increase, reinforcing the company’s focus on efficiency and alignment over raw scale.
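The bandwidth argument behind that analysis can be made concrete with a back-of-envelope calculation: during memory-bound decoding, every generated token requires streaming each active parameter from memory once, so throughput is roughly bandwidth divided by model bytes. The parameter count and bandwidth figures below are illustrative assumptions, not Anthropic's actual numbers.

```python
def max_tokens_per_second(active_params, bytes_per_param, memory_bandwidth_bytes):
    """Upper bound on per-sequence decode speed when memory-bandwidth-bound.

    Each token read requires loading all active parameters once, so:
        tokens/s ~= bandwidth / (active_params * bytes_per_param)
    """
    return memory_bandwidth_bytes / (active_params * bytes_per_param)

# Hypothetical figures: 100B active parameters at 2 bytes each (bf16)
# on hardware with 3 TB/s of aggregate memory bandwidth.
rate = max_tokens_per_second(100e9, 2, 3e12)  # -> 15.0 tokens/s
```

Working the formula backward is the analysis's point: an observed generation speed caps how many active parameters the model can plausibly be loading per token, which is why multi-trillion-parameter claims are hard to square with Claude's real-world latency.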

The convergence of Petri, visual response capabilities, and ongoing defense usage paints a picture of a model that is both technically evolving and politically sensitive. By institutionalizing behavioral audits, Anthropic aims to mitigate the very contradictions that triggered the Pentagon’s scrutiny, while the visual upgrade seeks to broaden Claude’s appeal in enterprise and consumer contexts. Whether these steps will satisfy regulators or simply buy time remains uncertain, but the rapid deployment of Petri signals that Anthropic is treating model reliability as a core product feature rather than an afterthought.


This article was created using AI technology and reviewed by the SectorHQ editorial team for accuracy and quality.

