
OpenAI defends fair-use scraping of copyrighted works, yet threatens action over the scraping of its own model outputs

CASE A
The New York Times v. Microsoft, OpenAI, et al. (1:23-cv-11195)
United States District Court, Southern District of New York · 2024-01-08
Training large language models on the entire public web — including paywalled, copyrighted news articles — is transformative fair use. The model's outputs do not substitute for the original works, and the public benefit of advanced AI outweighs any incidental copying during training.
CASE B
OpenAI Terms of Use — Section 2 (Restrictions) and DeepSeek model-distillation allegations
Contractual; multiple public statements (Bloomberg, Financial Times) · 2025-01-29
Users may not 'use Output to develop models that compete with OpenAI'; OpenAI publicly accused DeepSeek of using its model outputs to train competing models in violation of those terms, and is preparing legal action.
// THE PARADOX

OpenAI's fair-use defense in the NYT case turns on the principle that copying copyrighted material to train a new system is non-infringing because the purpose is transformative, even though the Times alleges that the resulting system competes with its journalism. When the same act is done to OpenAI (extracting outputs from GPT-4 to train a competing model), OpenAI treats it as a contractual and arguably tortious wrong. The principle defended in court when OpenAI is the defendant disappears when OpenAI is the would-be plaintiff.
