Comment by ninetyninenine a day ago
Right, but the physical encoding already exists in my brain; otherwise, how could I reproduce it in the first place? We may not know how the encoding works, but we do know that an encoding exists, because a decoding is possible.
My question is… is that in itself a violation of copyright?
If not, then as long as LLMs don't publish it, it shouldn't be a copyright violation, right? Because we don't understand how it's encoded in LLMs either. It is literally the same concept.
To me, the primary difference between the potential "copy" that exists in your brain and the potential "copy" that exists in the LLM is that you can't duplicate your brain and distribute it to billions of people.
If you compressed a copy of HP into a .rar, you couldn't read it as-is, but you could press a button and get HP back out. Distributing that .rar would clearly be a copyright violation.
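The .rar point can be made concrete with any lossless compressor. A minimal sketch in Python using the standard-library `zlib` (the text here is a short stand-in snippet, not the actual book):

```python
import zlib

# Stand-in text for the copyrighted work (a hypothetical snippet, repeated
# to give the compressor something to work with).
original = (
    b"Mr. and Mrs. Dursley, of number four, Privet Drive, "
    b"were proud to say that they were perfectly normal. "
) * 200

# The compressed bytes are an opaque encoding: much smaller,
# and not human-readable as-is.
compressed = zlib.compress(original)
print(len(original), "->", len(compressed))

# But a single "button press" losslessly recovers the work verbatim.
assert zlib.decompress(compressed) == original
```

The point of the analogy: the fact that the stored bytes look nothing like the text doesn't change whether distributing them distributes the work, because a deterministic decode exists.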
Likewise, you can't directly read whatever of HP exists in the LLM's weights, but you can seemingly press a bunch of buttons and get parts of it out. For some models, maybe you can get the entire thing. And I'd guess you could train a model whose sole purpose is to output HP verbatim, and get the book out of it as easily as decompressing a .rar.
So, the question in my mind is: how similar is distributing the LLM model, or giving access to it, to distributing a .rar of HP? There's likely a spectrum of answers depending on the LLM.