Apple Accused of Scraping Millions of YouTube Videos to Train Its AI, 9to5Mac Reports

Published by
SectorHQ Editorial

While Apple touts its breakthrough AI research, a proposed class action alleges the tech giant secretly scraped millions of YouTube videos—bypassing anti‑scraping safeguards—to build its model, 9to5Mac reports.

The lawsuit, filed by Ted Entertainment, Matt Fisher and Golfholics, alleges that Apple’s research team built a video‑generation model on a corpus the plaintiffs call “Panda‑70M,” a dataset that allegedly maps millions of YouTube URLs, identifiers and timestamps to individual clips. According to the complaint, each clip was extracted by repeatedly accessing the source video and isolating the designated segment, a process the plaintiffs describe as “a separate act of circumvention for each clip retrieved.” The plaintiffs claim that content from their own channels appears more than 500 times in the index and that Apple bypassed YouTube’s anti‑scraping safeguards to download the underlying footage for training purposes. The complaint seeks class‑action certification, statutory damages under 17 U.S.C. § 1203, injunctive relief to halt further use of the material, and attorneys’ fees (9to5Mac).

Apple’s defense rests on the fact that the Panda‑70M dataset, as described in the paper titled STIV: Scalable Text and Image Conditioned Video Generation, contains only links to publicly available YouTube videos, not the video files themselves. The researchers, who published the study in late 2024, argue that the dataset functions as an “index file” that identifies specific clips by URL, video identifier and timestamp, rather than storing the media. The lawsuit, however, contends that Apple’s alleged circumvention of YouTube’s technical protections to retrieve the clips transforms the index into a de facto copy of the copyrighted material, thereby violating the Digital Millennium Copyright Act (9to5Mac).

If the plaintiffs succeed, Apple could face exposure comparable to the recent litigation against Amazon and OpenAI, which also cites the Panda‑70M dataset in alleging copyright infringement. The parallel filings suggest a broader industry pattern: major AI developers are being scrutinized for leveraging large‑scale, scraped video corpora to train generative models. Legal analysts note that the outcome could set a precedent for how “link‑only” datasets are treated under copyright law, potentially forcing AI firms to redesign their data‑collection pipelines or obtain explicit licenses for every clip used in training (9to5Mac). The stakes are amplified by the fact that Apple has positioned its video‑generation capabilities as a flagship feature of its upcoming AI suite, promising developers tools that can synthesize high‑quality video from text prompts.

Beyond the immediate financial liability, the case raises strategic questions for Apple’s AI roadmap. The company has been relatively quiet about the specifics of its generative‑video research, yet the STIV paper signals a concerted effort to compete with rivals such as Google’s Imagen Video and Meta’s Make‑A‑Video. A court‑ordered injunction that bars Apple from using the Panda‑70M dataset, or from scraping YouTube altogether, could delay product rollouts and erode the first‑mover advantage the firm hopes to claim in the nascent video‑generation market. Moreover, a finding of willful circumvention would likely attract heightened scrutiny from regulators concerned about the broader implications of large‑scale data scraping on platform ecosystems (9to5Mac).

The plaintiffs also request a trial by jury on all claims, emphasizing the desire for a public reckoning of the alleged infringement. Their demand for “prejudgment and post‑judgment interest” and “the fullest extent available” of statutory damages underscores the potential monetary exposure. Under 17 U.S.C. § 1203, statutory damages range from $200 to $2,500 per act of circumvention, so a theory that treats each retrieved clip as a separate act could multiply quickly across a large dataset. While the complaint lists specific exhibits (A, B and C) that allegedly contain the infringing content, the public record does not yet disclose the full scope of those materials, leaving the court to weigh the merits of the alleged 500‑plus instances against Apple’s claim of merely indexing publicly available links (9to5Mac).

In sum, the case pits Apple’s AI ambitions against a growing wave of copyright enforcement actions targeting the data‑intensive foundations of generative models. The resolution will likely hinge on whether courts interpret a dataset of URLs and timestamps as a permissible “reference” or as an unlawful copy of protected audiovisual works. As the litigation proceeds, investors and industry observers will be watching for signals about the durability of Apple’s AI pipeline and the broader legal landscape that could shape the future of large‑scale video‑generation research.

