Comment by petesergeant 3 hours ago
Yeah? Which commercial provider’s model do you think was trained without using lyrics?
They scan for commercial work already. Isn’t the law about training, not output?
they clearly didn't do that properly, or we wouldn't have the current lawsuit
the lawsuit was also not about whether it is or isn't copyright infringement. It was about who is responsible (OpenAI or the user who tries to bait it into making another illegal copy of song lyrics).
If a model can output a song's lyrics, it has them stored somehow, somewhere. Just because the storage is some lossy, compressed, obscure, hyper-dimensional transformation doesn't mean it didn't store an illegal copy; otherwise it wouldn't have been able to output it (a crude external test for this is sketched below). _Technical details do not protect from legal responsibilities (in general)_
You could (maybe should) add new laws that in some form treat things an LLM has memorized the same as things a human has memorized, but currently LLMs get no special legal treatment when it comes to storing copies of things.
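For what it's worth, the "it stored a copy" claim is testable from the outside. A minimal sketch in Python, assuming you already have the original lyrics and a model response in hand (`original` and `model_output` are placeholders, not any real API):

```python
# Crude memorization test: measure the longest span of the original lyrics
# that appears verbatim in a model's output. Long spans imply something
# recoverable was stored, whatever the internal representation looks like.
from difflib import SequenceMatcher

def longest_verbatim_span(original: str, output: str) -> str:
    matcher = SequenceMatcher(None, original, output)
    match = matcher.find_longest_match(0, len(original), 0, len(output))
    return original[match.a : match.a + match.size]

original = "..."      # the copyrighted lyrics (placeholder)
model_output = "..."  # response to a "give me the lyrics to ..." prompt (placeholder)

span = longest_verbatim_span(original, model_output)
print(f"{len(span)} characters reproduced verbatim")
```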
Perhaps; I didn't read the court ruling.
But I'd be surprised if that was generally the case. It's easy to see why ChatGPT 1:1 reproducing a song's lyrics would be a copyright issue. But creating a derivative work based on the song?
What if I made a website that counts the number of alliterations in certain songs' lyrics (a toy version is sketched at the end of this comment)? Would that be copyright infringement, because my algorithm uses the original lyrics to derive its output?
If this ruling really applied to any algorithm deriving content from copyright-protected works, it would be pretty absurd.
But absurd copyright laws would be nothing new, so I won't discount the possibility.
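For reference, the alliteration counter is a few lines of Python. `count_alliterations` and its naive definition of alliteration (two or more consecutive words sharing an initial letter) are my own assumptions for illustration:

```python
# Count runs of >= 2 consecutive words starting with the same letter,
# per line of lyrics. Real prosody is messier; this is deliberately naive.
import re
from itertools import groupby

def count_alliterations(lyrics: str) -> int:
    total = 0
    for line in lyrics.splitlines():
        words = re.findall(r"[a-z']+", line.lower())
        initials = [w[0] for w in words]
        for _, run in groupby(initials):
            if len(list(run)) >= 2:
                total += 1
    return total

print(count_alliterations("Peter Piper picked a peck\nof pickled peppers"))  # -> 2
```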
> But creating a derivative work based on the song?
1. it wouldn't matter, as a derivative work still needs a license for the original
2. except if it's not derivative but just inspired,
and the court case was about it being pretty much _the same work_
OpenAI's defense also wasn't that the output is derivative or inspired but, to quote:
> Since the output would only be generated as a result of user inputs known as prompts, it was not the defendants, but the respective user who would be liable for it, OpenAI had argued.
and the court order said, more or less:
- if it can reproduce the song lyrics, it has stored a copy of them somehow, somewhere (memorization), but storing copies requires a license and OpenAI has no license
- if it outputs a copy of the song lyrics, it is making another copy of them and giving it to the user, which is copyright infringement
And this makes sense: if a human memorizes a song and then writes it down when asked, it still is, and always has been, copyright infringement (else you could just launder copyright by hiring people to memorize things and then write them down, which would be ridiculous).
And technically speaking, LLMs are at their core a lossy compressed storage of their training content plus statistical models of it. To be clear, that isn't some absurd around-five-corners reasoning; it's a pretty core aspect of their design. These things were well known even before LLMs became a big deal and OpenAI got huge investment, so OpenAI pretty much knew this would be a problem from the get-go. But like any recent big US "startup", following the law doesn't matter.
It technically being an unusual form of lossy compressed storage means that the memorization counts as copyright infringement (under current law).
But I would argue the law should be improved here, so that under some circumstances "memorization" in LLMs is treated like "memorization" in humans (i.e. not an illegal copy until you make one by writing it down). You can't extend that to all circumstances, though, because, as mentioned, the same tech can be used for what is basically lossy file compression, and you don't want people to launder copyright by training an LLM on a single text/song/movie and then distributing that (see the toy sketch below)...
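To make the "lossy compressed storage" and laundering points concrete, here's a toy sketch. An order-12 character Markov model stands in for an LLM (my simplification, not anything from the case): trained on a single lyric, greedy decoding reproduces it verbatim, even though the "weights" are just a table of counts.

```python
# Toy "LLM": an order-12 character Markov model. Trained on a single text,
# greedy decoding reproduces that text verbatim, even though the model
# itself is only a table of context -> next-character counts.
from collections import Counter, defaultdict

ORDER = 12  # long enough that every context in our tiny corpus is unambiguous

def train(text: str) -> dict:
    model = defaultdict(Counter)
    for i in range(len(text) - ORDER):
        model[text[i : i + ORDER]][text[i + ORDER]] += 1
    return model

def generate(model: dict, prompt: str, max_len: int) -> str:
    out = prompt
    while len(out) < max_len:
        context = out[-ORDER:]
        if context not in model:
            break  # ran off the end of the training text
        out += model[context].most_common(1)[0][0]  # greedy: most likely next char
    return out

lyrics = "Never gonna give you up, never gonna let you down"  # stand-in "song"
model = train(lyrics)
# Distributing `model` plus a 12-character prompt distributes the whole text:
print(generate(model, lyrics[:ORDER], 200) == lyrics)  # True
```

That's exactly the laundering worry: ship the count table instead of the text, and you've effectively shipped the text.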
That seems like a really broad interpretation of "technically memorization" that could have unintended side effects (like, say, banning equations that could be used to generate specific lyrics), but I suppose some countries already consider loading into RAM to be making a copy. I guess we're already at absurdity.
The point is that some other vendor will do the work to implement the filtering required by Germany even if OpenAI doesn't.