Anthropic Challenges Pentagon as Sonnet 4.6 and Gemini 3.1 Pro Debut This Week
Photo by Compare Fibre on Unsplash
While the Pentagon threatens to cut off Anthropic over AI safeguards, the startup rolls out Claude Sonnet 4.6 and Google unveils Gemini 3.1 Pro this week, according to Lastweekin.
Quick Summary
- •While the Pentagon threatens to cut off Anthropic over AI safeguards, the startup rolls out Claude Sonnet 4.6 and Google unveils Gemini 3.1 Pro this week, according to Lastweekin.
- •Key company: Anthropic
- •Also mentioned: Anthropic
Anthropic’s rollout of Claude Sonnet 4.6 marks the company’s fastest‑paced model upgrade in a year, arriving just 12 days after the Opus 4.6 release. The midsized model now serves as the default for both the Free and Pro tiers, with pricing unchanged, and introduces a 1 million‑token context window—four times larger than its predecessor. According to Lastweekin, the expanded window enables users to feed entire codebases, lengthy contracts, or dozens of research papers into a single session, improving long‑context reasoning and reducing the need for compaction resets. Early testers preferred Sonnet 4.6 over Sonnet 4.5 in roughly 70 percent of cases and even beat Opus 4.5 about 60 percent of the time, citing stronger instruction‑following, fewer hallucinations, and more consistent multi‑step execution. Benchmark results show new records on OS World (computer‑use tasks) and SWE‑Bench (software engineering), and a 60.4 percent score on ARC‑AGI‑2, positioning Sonnet 4.6 “close to Opus‑level intelligence” for many real‑world workloads, though it still trails Google’s Gemini 3 Deep Think and a refined GPT‑5.2 variant (Lastweekin).
Google’s Gemini 3.1 Pro, unveiled the same week, pushes the frontier on logic and knowledge benchmarks. The model posted a 77.1 percent score on ARC‑AGI‑2—more than double Gemini 3 Pro’s 31.1 percent and ahead of Anthropic’s Opus 4.6 (68.8 percent) and OpenAI’s GPT‑5.2 (52.9 percent), according to Lastweekin. Additional results include 44.4 percent on Humanity’s Last Exam, 94.3 percent on GPQA Diamond, 92.6 percent on MMLU, and 80.6 percent on SWE‑Bench Verified. While Gemini 3.1 Pro trails OpenAI’s GPT‑5.3‑Codex on the SWE‑Bench Pro variant (54.2 percent vs. 56.8 percent) and sits just behind GPT‑5.2 on the same test (55.6 percent), Google emphasizes its “advanced reasoning” capabilities for structured explanations, data synthesis, and creative coding. The model is being rolled out across the Gemini app (free tier and higher‑usage AI Pro/Ultra plans), NotebookLM for paid users, and the Gemini API via AI Studio, Vertex AI, Gemini Enterprise, Gemini CLI, Antigravity, and Android Studio, making it the core engine for both consumer and developer surfaces (Lastweekin).
The timing of these releases coincides with a sharp escalation in Anthropic’s relationship with the U.S. Department of Defense. The Pentagon has warned it may cut off Anthropic’s access to defense contracts unless the startup implements stricter AI safeguards, a dispute highlighted in Lastweekin’s “Anthropic vs Pentagon” segment. The Department of Defense’s concerns stem from recent security lapses in Anthropic’s Model Context Protocol (MCP) server, which The Register reported as having been shipped without authentication, exposing the system to potential exploitation. VentureBeat previously detailed how a tool named Clawdbot demonstrated the practical risks of the unauthenticated MCP stack, underscoring the urgency of remediation. Anthropic has since “quietly fixed flaws” in its Git MCP server, according to The Register, but the Pentagon’s ultimatum suggests that compliance gaps remain a sticking point in the company’s bid for government business.
Analysts see the dual pressure of a high‑stakes government dispute and an accelerated model‑release cadence as a test of Anthropic’s operational resilience. The company’s strategy of offering frontier‑level capabilities at unchanged pricing aims to lock in a broad user base, especially among free‑tier developers who now gain access to larger context windows and enhanced coding tools (Lastweekin). However, the Pentagon’s stance could curtail a lucrative revenue stream, given that defense contracts have become a growing share of AI vendors’ top lines. In contrast, Google’s expansive deployment of Gemini 3.1 Pro across both consumer and enterprise channels may capture market share from firms wary of regulatory friction, reinforcing its position as the primary non‑OpenAI challenger on the “core reasoning” front (Lastweekin).
The week’s twin model launches therefore illustrate a broader industry shift: rapid, incremental upgrades are now the norm, and performance gains are measured in incremental benchmark jumps rather than order‑of‑magnitude breakthroughs. As Lastweekin’s editor notes, “the frontier labs may have gotten to the point of continuously post‑training their models via RL, and I wouldn’t be surprised if we see more impressive gains in just a few months.” Whether Anthropic can sustain its momentum while navigating the Pentagon’s safeguards demand will be a key indicator of how AI startups balance technical ambition with the growing imperative for robust security and compliance.
Sources
This article was created using AI technology and reviewed by the SectorHQ editorial team for accuracy and quality.