Comment by tpmoney
The fair use test (in US copyright law) is a 4 part test under which impact on the market for the original work is one of 4 parts. Notably, just because a use has massively detrimental harms to a work's market does not in and of itself constitute a copyright violation. And it couldn't be any other way. Imagine if you could be sued for copyright infringement for using a work to criticize that work or the author of that work if the author could prove that your criticism hurt their sales. Imagine if you could be sued for copyright infringement because you wrote a better song or book on the same themes as a previous creator after seeing their work and deciding you could do it better.
Perhaps famously, emulators very clearly and objectively impact the market for a game consoles and computers and yet they are also considered fair use under US copyright law.
No one part of the 4 part test is more important than the others. And so far in the US, training and using an LLM has been ruled by the courts to be fair use so long as the materials used in the training were obtained legally.
> And so far in the US, training and using an LLM has been ruled by the courts to be fair use so long as the materials used in the training were obtained legally.
Just like OpenAI is rightfully upset if their LLM output is used to train a competitor’s model and might seek to restrict it contractually, publishers too may soon have EULAs just for reading their books.