

In AI copyright case, Zuckerberg turns to YouTube for his defense


Meta CEO Mark Zuckerberg appears to have used YouTube and its fight to remove pirated content to defend his own company’s use of a dataset containing copyrighted e-books to train AI models, newly released excerpts of his testimony reveal.

The testimony, which was part of a court filing submitted by the plaintiffs’ lawyers, relates to the AI copyright case Kadrey v. Meta. It is one of many such cases winding through the US court system, pitting AI companies against authors and other holders of intellectual property. For the most part, the defendants in these cases, the AI companies, argue that training on copyrighted content is “fair use.” Many copyright holders disagree.

“For example, YouTube, I think, may end up hosting some things that people pirate for a while, but YouTube tries to take those things down,” Zuckerberg said during the deposition, according to parts of the transcript made available on Wednesday evening. “I guess the vast majority of stuff on YouTube is somewhat good and they’re licensed for it.”

The excerpts provide some insight into Zuckerberg’s thinking on copyrighted content and fair use. The full transcript of the testimony, however, has not been released. TechCrunch has reached out to Meta for additional context and will update the article if the company responds.

Based on the deposition, Zuckerberg appears to be defending Meta’s use of an e-book training dataset called LibGen to develop its family of AI models known as Llama. Meta’s Llama competes with leading models from AI companies such as OpenAI.

LibGen, which describes itself as a “link aggregator,” provides access to copyrighted works from publishers including Cengage Learning, Macmillan Learning, McGraw Hill and Pearson Education. LibGen has been sued several times, ordered to shut down, and fined tens of millions of dollars for copyright infringement.

According to court filings unsealed this week, Zuckerberg approved the use of LibGen to train at least one of Meta’s Llama models despite concerns from within the company’s AI executive and research teams about the legal implications.

Lawyers for the plaintiffs, who include best-selling authors Sarah Silverman and Ta-Nehisi Coates, quoted Meta employees who referred to LibGen as “a dataset that we know is pirated” and noted that its use “can undermine [Meta’s] negotiating positions with regulators,” according to a legal filing.

During his deposition, Zuckerberg claimed he “hadn’t actually heard” of LibGen.

“I understand you’re trying to get me to give an opinion on LibGen, which I haven’t actually heard of,” Zuckerberg said during the deposition. “I just have no knowledge of that particular thing.”

Under questioning from one of the plaintiffs’ lawyers, David Boies, Zuckerberg explained why he thought it would be unreasonable to ban the use of a dataset like LibGen.

“So would I want to have a policy against people using YouTube because some content might be copyrighted? No,” he said. “[T]here are cases where such a blanket ban may not be the right thing to do.”

Zuckerberg did say that Meta should be “pretty careful” about training on copyrighted material.

“You know, [if there’s] someone who provides a website and intentionally tries to violate people’s rights … obviously that’s something we’d want to be cautious about or careful about how we deal with that or maybe even prevent our teams from engaging in that,” Zuckerberg said during his testimony, according to the transcript.

New accusations

Attorneys for the plaintiffs in Kadrey v. Meta have amended the complaint several times since it was filed in the United States District Court for the Northern District of California, San Francisco Division, in 2023. The most recent amended complaint, filed by plaintiffs’ attorneys late Wednesday, contains new allegations against Meta, including that the company cross-referenced certain pirated books in LibGen with copyrighted books available for license. The lawyers claim that Meta used this tactic to determine whether it made sense to pursue a licensing agreement with a publisher.

Meta allegedly used LibGen to train its latest family of Llama models, Llama 3, according to the amended filing. The plaintiffs also claim that Meta is using the dataset to train its next-generation Llama 4 models.

According to the amended filing, Meta researchers allegedly tried to hide the fact that Llama models were trained on copyrighted material by inserting “supervised patterns” into Llama’s fine-tuning. The amended complaint also alleges that Meta downloaded pirated e-books from another source, Z-Library, to train Llama as recently as April 2024.

Z-Library, or Z-Lib, has been the subject of a series of legal actions by publishers, including domain seizures and takedowns. In 2022, the Russian nationals who allegedly maintained it were charged with copyright infringement, fraud and money laundering.


