Comment by DiabloD3

Comment by DiabloD3 11 hours ago

6 replies

Yes, but two fold.

There is no reason why I can't sue every single developer to ever use an LLM and publish and/or distribute that code for AGPLv3 violations. They cannot prove to the court that their model did not use AGPLv3 code, as they did not make the model. I can also, independently, sue the creator of the model, for any model that was made outside of China.

No wonder the model makers don't want to disclose who they pirated content from.

kabes 11 hours ago

Isn't it up to you to prove the model used AGPLv3 code, target then for them to prove they didn't?

  • DiabloD3 9 hours ago

    Not inherently.

    If their model reproduces enough of an AGPLv3 codebase near verbatim, and it cannot be simply handwaved away as a phonebook situation, then it is a foregone conclusion that they either ingested the codebase directly, or did so through somebody or something that did (which dooms purely synthetic models, like what Phi does).

    I imagine a lot of lawyers are salivating over the chance of bankrupting big tech.

    • reissbaker 9 hours ago

      The onus is on you to prove that the code was reproduced and is used by the entity you're claiming violated copyright. Otherwise literally all tools capable of reproduction — printing presses, tape recorders, microphones, cameras, etc — would pose existential copyright risks for everyone who owns one. The tool having the capacity for reproduction doesn't mean you can blindly sue everyone who uses it: you have to show they actually violated copyright law. If the code it generated wasn't a reproduction of the code you have the IP rights for, you don't have a case.

      TL;DR: you have not discovered an infinite money glitch in the legal system.

      • DiabloD3 6 hours ago

        Yes! All of those things DO pose existential copyright risks if they use them to violate copyright!. We're both on the same page.

        If you have a VHS deck, copy a VHS tape, then start handing out copies of it, I pick up a copy of it from you, and then see, lo and behold, it contains my copyrighted work, I have sufficient proof to sue you and most likely win.

        If you train an LLM on pirated works, then start handing out copies of that LLM, I pick up a copy of it, and ask it to reproduce my work, and it can do so, even partially, I have sufficient proof to sue you and most likely win.

        Technically, even involving "which license" is a bit moot, AGPLv3 or not, its a copyright violation to reproduce the work without license. GPL just makes the problem worse for them: anything involving any flavor of GPLv3 can end up snowballing with major GPL rightsholders enforcing the GPLv3 curing clause, as they will most likely also be able to convince the LLM to reproduce their works as well.

        The real TL;DR is: they have not discovered an infinite money glitch. They must play by the same rules everyone else does, and they are not warning their users of the risk of using these.

        BTW, if I was wrong about this, (IANAL after all), then so are the legal departments at companies across the world. Virtually all of them won't allow AGPLv3 programs in the door just because of the legal risk, and many of them won't allow the use of LLMs with the current state of the legal landscape.