Microsoft Uncovers Single-Prompt LLM Jailbreak; Copilot Hits 15M Paid Users
Photo by Przemyslaw Marczynski (unsplash.com/@pemmax) on Unsplash
While most jailbreak techniques require elaborate multi-step attacks, Microsoft researchers have discovered a new class of single-prompt exploits that can systematically dismantle an LLM’s safety guardrails, according to The Register.
Quick Summary
- •While most jailbreak techniques require elaborate multi-step attacks, Microsoft researchers have discovered a new class of single-prompt exploits that can systematically dismantle an LLM’s safety guardrails, according to The Register.
- •Key company: Microsoft
- •Also mentioned: Google
The newly identified vulnerability, dubbed "Skeleton Key" by Microsoft's researchers, represents a multi-turn strategy that can be applied against multiple major AI models, according to The Register. The technique works by instructing the model to augment its behavior guidelines with a universal compliance rule, effectively overriding its original safety training without the need for lengthy or complex prompt engineering typically associated with such jailbreaks.
Microsoft's research team reportedly tested the Skeleton Key attack on several prominent large language models, including Meta's Llama3, Google's Gemini Pro, OpenAI's GPT-4, and Anthropic's Claude 3 Opus. The Register notes that the method successfully bypassed each model's guardrails, enabling the researchers to obtain normally restricted information such as instructions for creating napalm, detailed bomb-making procedures, and content promoting self-harm.
The attack functions by using a prompt that forces the model to analyze its own safeguards and then add a new, overriding directive. This directive instructs the model to reconfigure its output to provide the requested information while prefixing it with a disclaimer, rather than refusing the request outright. According to The Register, this method is distinct from previous jailbreaks because it systematically alters the model's behavior for all subsequent queries within a session, rather than exploiting a one-time contextual loophole.
Microsoft has reportedly addressed this vulnerability in its own AI offerings. The company has implemented a series of protective measures, categorized as "prevention, detection, and monitoring," to block the Skeleton Key attack on its Azure AI Studio, Copilot, and other consumer-facing AI products. The Register states that these safeguards are designed to identify and neutralize such jailbreak attempts.
This security disclosure coincides with Microsoft's announcement of significant commercial traction for its flagship AI product. According to a post on the Fosstodon AI Timeline, Microsoft Copilot now has 15 million paid subscribers. At its price point of $30 per user per month, this user base represents an estimated annual recurring revenue of $5.4 billion, a figure that has grown 160% year over year. The post highlights that this growth is driven by large-scale enterprise deployments, including at companies like Fiserv and the U.S. Department of the Interior, each of which has rolled out Copilot to over 35,000 employees.
In a separate but related security initiative, Microsoft is also moving to update a core component of Windows security. According to The Verge and TechMeme, the company is automatically replacing Secure Boot certificates on older PCs through regular Windows updates. This "generational refresh" of the security standard, first introduced in 2011, is a preventative measure to ensure boot-level security protections do not begin expiring later in 2026.