Anthropic Scores Partial Victory in Copyright Case Over AI Training Data

Decrypt
11 hours ago

AI firm Anthropic has won a key legal victory in a copyright battle over how artificial intelligence companies use copyrighted material to train their models, but the fight is far from over.


U.S. District Judge William Alsup found that Anthropic’s use of copyrighted books to train its AI chatbot Claude qualifies as “fair use” under U.S. copyright law, in a ruling late Monday.



“Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different,” Alsup wrote in his ruling.


But the judge also faulted the Amazon- and Google-backed firm for building and maintaining a massive “central library” of pirated books, calling that part of its operations a clear copyright violation.


“No carveout” from Copyright Act


The case, brought last August by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, accused Anthropic of building Claude using millions of pirated books downloaded from notorious sites like Library Genesis and Pirate Library Mirror.


The lawsuit, which seeks damages and a permanent injunction, alleges that Anthropic “built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books” to train Claude, its family of AI models.


Alsup said that AI training can be “exceedingly transformative,” noting that Claude’s outputs do not reproduce or regurgitate the authors’ works but instead generate new text “orthogonal” to the originals.


Court records reveal that Anthropic downloaded at least seven million pirated books, including copies of each author’s works, to assemble its library.


Internal emails showed that Anthropic’s co-founders sought to avoid the “legal/practice/business slog” of licensing books, while employees described the goal as creating a digital collection of “all the books in the world” to be kept “forever.”


“There is no carveout, however, from the Copyright Act for AI companies,” Alsup said, noting that maintaining a permanent library of stolen works, even if only some were used for training, would “destroy the academic publishing market” if allowed.


Alsup’s ruling is the first substantive decision by a U.S. federal court to apply the fair use doctrine directly to the use of copyrighted material for training generative AI models.


The court distinguished between copies used directly for AI training, which were deemed fair use, and the retained pirated copies, which will now be subject to further legal proceedings, including potential damages.


AI copyright cases


Several other lawsuits have been filed, including high-profile cases against OpenAI, Meta, and others, but those cases are still at earlier stages, with motions to dismiss pending or discovery ongoing.


OpenAI and Meta both face lawsuits from groups of authors alleging their copyrighted works were exploited without consent to train large language models such as ChatGPT and LLaMA.


The New York Times sued OpenAI and Microsoft in 2023, accusing them of using millions of Times articles without permission to develop AI tools.


Reddit also recently sued Anthropic, alleging that the company scraped Reddit’s platform more than 100,000 times to train Claude despite claiming to have stopped.


