Cloudflare launches Workers AI with Kimi K2.5, bringing large‑model power to its edge
Photo by ThisisEngineering RAEng on Unsplash
Cloudflare announced Workers AI now runs large models, debuting with Kimi K2.5, according to its blog on March 19, 2026, positioning the platform as a premier edge environment for building and deploying AI agents.
Key Facts
- •Key company: Cloudflare
Cloudflare’s new Workers AI service now hosts Moonshot AI’s open‑source Kimi K2.5 model, giving developers access to a 256 k‑token context window, multi‑turn tool calling, vision inputs and structured outputs directly at the edge. The company says the integration lets an entire agent lifecycle—state persistence via Durable Objects, long‑running workflows, and secure sandbox execution—run on a single platform, eliminating the need to stitch together disparate cloud services (Cloudflare blog, March 19, 2026).
In internal testing, Cloudflare engineers used Kimi K2.5 as the “daily driver” for code‑generation tasks inside their OpenCode environment and for an automated code‑review pipeline exposed publicly through the Bonk agent on GitHub. The model handled more than 7 billion tokens per day for a security‑review agent that flagged 15 confirmed issues in a single codebase. Cloudflare estimates that running the same workload on a mid‑tier proprietary model would have cost roughly $2.4 million annually, whereas Kimi K2.5 reduced that expense by 77 percent (Cloudflare blog).
The cost advantage is central to Cloudflare’s market positioning. As enterprises and individual users proliferate personal agents that process hundreds of thousands of tokens per hour, the price of proprietary APIs becomes a scaling blocker. By offering a frontier‑scale open‑source model at serverless edge locations, Workers AI aims to capture that demand. The company frames the move as a “price‑performance sweet spot” that delivers reasoning power comparable to closed‑source offerings without the premium price tag (Cloudflare blog).
Industry observers note that the shift toward edge‑hosted AI aligns with broader trends in internet architecture. ZDNet’s 2025 outlook warned that the web is becoming “fundamentally rewired” by AI, increasing both its scale and fragility. Cloudflare’s edge‑first approach could mitigate latency and reliability concerns by keeping inference close to end users, a point echoed in the blog’s emphasis on a unified developer platform for agents.
TechCrunch highlighted Cloudflare’s parallel launch of an AI‑bot marketplace, where sites can monetize bot interactions. While the marketplace is still nascent, the availability of Kimi K2.5 on Workers AI provides a ready‑made model for developers looking to build monetizable agents without incurring high inference costs (TechCrunch). CNET’s coverage of the broader AI‑bot ecosystem underscores the growing tension between content owners and AI developers, a dynamic that could drive more companies toward self‑hosted, open‑source models like Kimi K2.5 to avoid reliance on third‑party APIs (CNET).
Sources
Reporting based on verified sources and public filings. Sector HQ editorial standards require multi-source attribution.