OpenAI defends fair-use scraping of every copyrighted work while preparing legal action over distillation of its own model outputs

CASE A

The New York Times v. Microsoft, OpenAI, et al. (1:23-cv-11195)

United States District Court, Southern District of New York · 2024-01-08

Training large language models on the entire public web — including paywalled, copyrighted news articles — is transformative fair use. The model's outputs do not substitute for the original works, and the public benefit of advanced AI outweighs any incidental copying during training.

CASE B

OpenAI Terms of Use — Section 2 (Restrictions) and DeepSeek model-distillation allegations

Contractual; multiple public statements (Bloomberg, Financial Times) · 2025-01-29

Users may not 'use Output to develop models that compete with OpenAI'; OpenAI publicly accused DeepSeek of using its model outputs to train competing models in violation of those terms, and is preparing legal action.

// THE PARADOX

OpenAI's fair-use defense in the NYT case rests on the principle that ingesting copyrighted material to train a new system is non-infringing when the purpose is transformative, even if that system ends up competing with the original. When the same is done to OpenAI, with outputs extracted from GPT-4 to train a competing model, OpenAI treats it as a contractual and arguably tortious wrong. The principle OpenAI defends in court as a defendant disappears when OpenAI is the would-be plaintiff.

// Evidence
