Wikipedia Reveals Multiple Deals with AI Giants to Use Its Content

Decrypt
3 hours ago

The Wikimedia Foundation has announced a series of new partnerships with artificial intelligence companies that will allow them to use Wikipedia content to train and power their AI models, as the nonprofit seeks to shore up its long-term sustainability amid changing online behavior.


The agreements were signed through Wikimedia Enterprise, the foundation’s commercial product designed for large-scale reusers and distributors of content from Wikimedia projects. New signups include Ecosia, Microsoft, Mistral AI, Perplexity, Pleias and ProRata. They join existing partners such as Amazon, Google and Meta.


“In the AI era, Wikipedia and its human-created and curated knowledge has never been more valuable,” the foundation said in a statement.


“Its knowledge power[s] generative AI chatbots, search engines, voice assistants and more. Wikipedia is one of the highest-quality datasets used in training Large Language Models.”


The announcement was made as part of an update tied to Wikipedia’s 25th anniversary.


The online encyclopedia is among the top ten most-visited websites globally and is the only one in that group operated by a nonprofit organization. Its more than 65 million articles, published in over 300 languages, are viewed nearly 15 billion times each month, according to the foundation.


However, the foundation has warned that traffic patterns are shifting. In October, it said human visits to Wikipedia fell 8% year over year, attributing the decline to users relying on AI-generated summaries rather than visiting the site directly. Nearly 60% of Google searches now end without a click, with on-page responses often powered by Wikipedia content.





AI vs publishers


The deals come amid a broader debate over how AI companies obtain training data. Large language models are typically trained on vast amounts of online material, a practice that has drawn criticism from authors, publishers and other rights holders who argue that the use of copyrighted works without permission is infringement.


Reddit, for one, is involved in several lawsuits with AI companies over the use of its content to train models, although it has reached licensing agreements with others, including Google.


On Thursday, major book publishers Hachette Book Group and Cengage Group filed a motion to join an existing class action lawsuit against Google, accusing the company of carrying out “historic copyright infringement” to build its Gemini AI platform. The lawsuit alleges Google copied books without proper licenses during its AI training processes. The case was originally filed in 2023 by a group of authors.


OpenAI faces a similar case from plaintiffs including "Game of Thrones" writer George R.R. Martin.


Entertainment companies are also pressing the issue. In mid-December, Disney sent Google a cease-and-desist letter accusing it of copyright infringement, even as Disney struck a separate licensing deal with OpenAI covering hundreds of characters for AI-generated video. Disney has issued similar notices to other AI firms and is involved in litigation alongside major studios against image-generation company Midjourney.


The same month, a coalition of writers, actors and technologists launched a new industry group aimed at pushing for enforceable standards governing how AI is trained and used in the entertainment sector. More than 500 prominent figures have backed the initiative, including Natalie Portman, Cate Blanchett, Ben Affleck, Guillermo del Toro and Taika Waititi.


The European Commission has also opened a formal antitrust investigation into whether Google violated EU competition rules by using publisher and YouTube content to power its AI services without fair compensation or consent.


Whether copyright holders will ultimately find recourse isn’t certain. Federal judges in the U.S. have recently delivered partial victories to Meta and Anthropic, ruling that their use of copyrighted books to train AI models constituted fair use, while criticizing the companies for maintaining permanent libraries of pirated works.


