Lawsuit Accuses Meta Of Training AI On Torrented 82TB Dataset With Millions Of Pirated Books

Meta is the defendant in a class action lawsuit alleging copyright infringement, a claim the company disputes. However, newly unsealed court documents containing private conversations between Meta employees appear to support the plaintiffs’ allegations. The documents suggest that employees were aware of the legal risks of using torrented data from shadow libraries like LibGen to train the company’s AI models.

According to the unsealed documents, Meta allegedly used at least 81.7TB of illegally torrented data sourced from shadow libraries to train its AI models. Internal emails further show that Meta employees expressed concerns about this practice. Some voiced strong ethical objections, with one noting that using content from sites like LibGen, known for distributing copyrighted material, would be unethical.

Nikolay Bashlykov, a research engineer at Meta, also noted that “torrenting from a corporate laptop doesn’t feel right,” highlighting his discomfort with the practice. The documents suggest that these concerns, including discussions about using data from LibGen, reached CEO Mark Zuckerberg, who may have ultimately approved the activity. They also show that, despite these misgivings, employees discussed using VPNs to mask Meta’s IP address so that torrented data could be downloaded and shared without being easily traced back to the company’s network.

The plaintiffs argue that these internal emails prove Meta knew its actions were illegal, yet it persisted in torrenting copyrighted material. The alleged activity includes not only unauthorized use of the material to train its AI models, but also active distribution, or “seeding,” of these pirated works across the torrent network.

The plaintiffs further assert that this evidence contradicts previous deposition testimony from Meta employees, including Zuckerberg. This discrepancy, they argue, necessitates further legal investigation through additional depositions and justifies bringing additional allegations against Meta.

Earlier efforts by the plaintiffs to access Meta’s torrenting and seeding data were unsuccessful. This newly uncovered evidence, however, could significantly damage Meta’s defense.

