Meta Claims Using Pirated Books to Train its LLM is “Fair Use”
In response to a lawsuit filed by authors and being tried as a class action, Meta is making some interesting claims.
The authors claim that Meta used massive quantities of copyrighted material to train its large language model (LLM).
The plaintiffs unveiled evidence that Meta used BitTorrent to download at least 81.7 terabytes of data from multiple shadow (i.e. illegal) libraries.
A group of law professors claims this is not a problem because the use is transformative – transformed from a book into a piece of software that spits out the book (they didn’t say the second part, but that is about the only transformation, in my opinion). They made this argument in an amicus (“friend of the court”) brief.
The authors counter that in order to claim fair use, you have to acquire the work legally in the first place.
A couple of countries have legalized stealing copyrighted works for training an AI, but only a couple. The US is not one of them.
The other side disagrees, of course, saying a transformative use must serve a fundamentally different purpose. The amicus brief also fails to mention that the transformative purpose here is to make a lot of money for Meta.
Unsealed emails show that Meta’s legal beagles were directly involved in discussions to stop trying to (expensively) license training content in favor of using pirated material. The plaintiffs say this shows willful commercial exploitation, not good faith transformative use.
Studies show that LLMs perform measurably better – 23% better – when trained on copyrighted material.
Other Meta emails show employees discussed the risks of getting caught (note to future crooks: don’t memorialize your crimes in email) and suggested using VPNs to hide what they were doing.
While it will be quite a while before this is settled, it does point to two things:
- Companies will go to extreme and sketchy lengths to train their LLMs
- Authors will have to go to significant expense if they don’t want these companies to commercially benefit from the unauthorized use of their work
Stay tuned; this is not over. Credit: Cybernews