Encyclopedia Britannica Sues OpenAI for Unauthorized Use of 100,000 Articles in AI

Published by
SectorHQ Editorial

Photo by Clay Banks (unsplash.com/@claybanks) on Unsplash

100,000 articles. That is how many Britannica and Merriam‑Webster entries OpenAI allegedly used without permission to train its models, according to The‑Decoder, prompting a federal lawsuit in Manhattan.

Key Facts

  • Key company: OpenAI

The complaint, filed in the U.S. District Court for the Southern District of New York, alleges that OpenAI scraped roughly 100,000 Britannica and Merriam‑Webster entries from the publishers’ publicly available websites and incorporated them into the training data for GPT‑4 and earlier models without obtaining a license (The‑Decoder; Reuters). Britannica’s lawyers argue that the language model has “memorized” large swaths of the copyrighted text, reproducing near‑verbatim passages when users query topics that overlap with the encyclopedia’s content. In several documented instances, ChatGPT’s responses included entire paragraphs that match Britannica articles word‑for‑word, effectively diverting traffic that would otherwise land on Britannica’s own platform (The‑Decoder).

Beyond copyright infringement, Britannica is also pursuing trademark claims. The suit contends that the AI’s output frequently cites “Encyclopedia Britannica” as a source, creating the false impression that the publisher has endorsed or collaborated with OpenAI. Because these citations are often attached to inaccurate or out‑of‑date information, the plaintiffs argue that the brand’s reputation is being tarnished and that users are misled into believing they are receiving authoritative content directly from the trusted reference work (The‑Decoder). The complaint seeks both monetary damages and a preliminary injunction that would force OpenAI to halt further training on the disputed material and to remove any infringing outputs from its services (The‑Decoder).

OpenAI’s response to the allegations has been limited to standard legal filings, but the company has previously defended its data‑collection practices on the grounds that publicly accessible web content can be used for model training under the doctrine of “fair use.” In a 2024 blog post, OpenAI argued that its models transform raw text into statistical representations that do not retain the expressive elements of the original works, a position that has been tested in prior copyright cases involving large language models (TechCrunch). However, the Britannica suit underscores a growing trend of legacy content providers pushing back against what they view as wholesale appropriation of their intellectual property by AI developers.

Industry observers note that the lawsuit could have far‑reaching implications for the AI ecosystem. If the court sides with Britannica, it may compel OpenAI and other firms to renegotiate data‑licensing agreements with publishers, potentially reshaping the economics of model training. Analysts at The Next Web point out that the case highlights a “fundamental tension” between the open‑web training paradigm that has powered rapid AI advances and the commercial interests of content creators who rely on subscription revenue (The Next Web). A ruling that limits the use of unlicensed text could slow the pace of model improvement, especially for niche domains where high‑quality curated data is scarce.

The timing of the lawsuit also coincides with heightened regulatory scrutiny of AI practices. The European Union’s AI Act, slated for implementation later this year, includes provisions that could classify unlicensed data usage as a “high‑risk” activity subject to compliance audits. Meanwhile, the U.S. Federal Trade Commission has opened an inquiry into “deceptive AI claims,” which may intersect with Britannica’s trademark allegations (Reuters). As the legal battle unfolds, both parties have indicated they are prepared for a protracted fight, with OpenAI likely to appeal any adverse decision and Britannica signaling its intent to protect its digital assets aggressively.

Regardless of the outcome, the case serves as a bellwether for how the AI industry will balance open‑data innovation against the rights of traditional publishers. For now, ChatGPT users may continue to see Britannica‑sourced excerpts in their answers, but the lawsuit could soon force OpenAI either to filter out such content or to obtain explicit licenses, an adjustment that would reshape both the user experience and the competitive dynamics of AI‑driven knowledge services.

Sources

Primary source
  • PYMNTS.com

Reporting based on verified sources and public filings. SectorHQ editorial standards require multi-source attribution.
