Comment by dathinab 3 hours ago

> But creating a derivative work based on the song?

1. it wouldn't matter, as a derivative work still needs a license for the original

2. except if it's not derivative but just inspired,

and the court case was about it being pretty much _the same work_

OpenAI's defense also wasn't that it's derived or inspired but, to quote:

> Since the output would only be generated as a result of user inputs known as prompts, it was not the defendants, but the respective user who would be liable for it, OpenAI had argued.

and the court order said more or less:

- if it can reproduce the song lyrics, it means it stored a copy of the song lyrics somehow, somewhere (memorization); but storing copies requires a license, and OpenAI has no license

- if it outputs a copy of the song lyrics, it means it's making another copy of them and giving it to the user, which is copyright infringement

and this makes sense: if a human memorizes a song and then writes it down when asked, it still is and always has been copyright infringement (else you could just launder copyright by hiring people to memorize things and then write them down, which would be ridiculous).

and technically speaking, LLMs are at their core a lossy compressed storage of their training content plus statistical models about it. And to be clear, that isn't some absurd around-five-corners reasoning; it's a pretty core aspect of their design. And to be clear, these are things that were well known even before LLMs became a big deal and OpenAI got huge investment. OpenAI pretty much knew this would be a problem from the get-go. But like for any recent big US "startup", following the law doesn't matter.

it technically being an unusual form of lossy compressed storage means that the memorization counts as copyright infringement (under current law)

but I would argue the law should be improved here, so that under some circumstances "memorization" in LLMs is treated like "memorization" in humans (i.e. not an illegal copy until you make it one by writing it down). But you can't make it all circumstances, because as mentioned you can use the same tech basically as lossy file compression, and you don't want people to launder copyright by training an LLM on a single text/song/movie and then distributing that...
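To make the laundering concern concrete, here's a minimal sketch (my own toy illustration, nothing like OpenAI's actual architecture): a trivial "language model" — just a bigram table — "trained" on a single text will reproduce that text verbatim at generation time. The model weights are, in effect, a copy of the training data; the text in the example is made up.

```python
# Toy "training": a bigram table built from a single (hypothetical) text.
# With one training text and greedy decoding, generation is deterministic
# and reproduces the input exactly -- memorization as (here even lossless)
# storage of the training content.
text = "these lyrics are protected by copyright and quite long enough to memorize"
words = text.split()

# "Training": record which word follows which.
model = {}
for prev, nxt in zip(words, words[1:]):
    model.setdefault(prev, []).append(nxt)

# "Inference": start from the first word, greedily pick the recorded successor.
out = [words[0]]
while out[-1] in model:
    out.append(model[out[-1]][0])

print(" ".join(out) == text)  # the "model" emits its training text verbatim
```

A real LLM trained on a huge corpus is lossy rather than lossless, but the failure mode is the same in kind: train narrowly enough (or on text repeated often enough in the corpus), and the weights effectively contain a distributable copy.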

knollimar 2 hours ago

That seems like a really broad interpretation of "technically memorization" that could have unintended side effects (like, say, banning equations that could be used to generate specific lyrics), but I suppose some countries already consider loading into RAM to be making a copy. I guess we're already at absurdity.