Comment by aprilthird2021 17 hours ago
> let's not pretend that an LLM that autocompletes a couple lines from harry potter with 50% accuracy is some massive new avenue to piracy. No one is using this as a substitute for buying the book.
Well, luckily the article points out what people are actually alleging:
> There are actually three distinct theories of how training a model on copyrighted works could infringe copyright:
> Training on a copyrighted work is inherently infringing because the training process involves making a digital copy of the work.
> The training process copies information from the training data into the model, making the model a derivative work under copyright law.
> Infringement occurs when a model generates (portions of) a copyrighted work.
None of those theories claim that these models are a substitute for buying the books; that's not what the plaintiffs are alleging. Copyright infringement is not only a matter of piracy. Piracy is just one of many ways to infringe copyright.
I think the last scenario is the most problematic. Technically, it is the same thing piracy via torrent does: distributing a small piece of copyrighted material without the copyright holder's consent.