OpenAI defends fair-use scraping of every copyrighted work, while preparing legal action over scraping of its own model outputs
CASE A
The New York Times v. Microsoft, OpenAI, et al. (1:23-cv-11195)
Training large language models on the entire public web, including paywalled, copyrighted news articles, is transformative fair use. The model's outputs do not substitute for the original works, and the public benefit of advanced AI outweighs any incidental copying during training.
CASE B
OpenAI Terms of Use — Section 2 (Restrictions) and DeepSeek model-distillation allegations
Users may not 'use Output to develop models that compete with OpenAI'. OpenAI has publicly accused DeepSeek of using its model outputs to train competing models in violation of those terms, and is preparing legal action.
// THE PARADOX
OpenAI's fair-use defense in the NYT case turns on the principle that ingesting copyrighted material to train a derivative system that competes with the original is non-infringing because of transformative purpose. When the same act is done to OpenAI — extracting outputs from GPT-4 to train a competing model — OpenAI treats it as a contractual and arguably tortious wrong. The principle defended in court when OpenAI is the defendant disappears when OpenAI is the would-be plaintiff.
// Evidence
- https://www.courtlistener.com/docket/68117049/the-new-york-times-company-v-microsoft-corporation/
- https://www.ft.com/content/a0dfedd1-5255-4fa9-8ccc-1fe01de87ea6
- https://openai.com/policies/terms-of-use/
- https://www.bloomberg.com/news/articles/2025-01-29/microsoft-probing-if-deepseek-improperly-took-openai-data