Comment by littlestymaar 2 days ago

4 replies

> An mp3 file is also a machine-generated lossy compression of a cd-quality .wav file, but it's clearly copyrightable.

Not the .mp3 itself, but the creative piece of art that it encodes.

You can't record Taylor Swift at a concert and claim copyright on that. Nor can you claim copyright on an mp3 re-encoding of old audio footage that belongs to the public domain.

Whether LLMs are in the first category (infringing on the copyright holders of the training data) or in the second (public domain or fair use) is an open question that case law is slowly resolving, jurisdiction by jurisdiction, but that doesn't address the question of the weights themselves.

mitthrowaway2 2 days ago

Right, the .mp3 is machine-generated, but from a creatively generated input. The analogy I'm making is that an LLM's weights (or, let's say, a diffusion image model's) are also machine-generated (by the training process) from the works in its training set, many of which are creative works, and the neural network encodes those creative works much like an mp3 file does.

In this analogy, distributing the weights would be akin to distributing an mp3, and offering a genAI service, like ChatGPT inference or a Stable Diffusion API, would be akin to broadcasting.

  • CamperBob2 a day ago

    A better analogy, when it comes to weights, is this: You are a skilled musician. You talk to another musician on the phone. She tells you in detail about a new song from yet another musician. It has these chords, it's in this key, it has this time signature and syncopation, it uses these instruments, it uses these effects. She doesn't tell you the lyrics, though. Mostly, she just tells you how it's related to a lot of other songs in the same genre.

    You now have some "weights." You go to your studio and compose something that's so close to the original song that you'd definitely end up on the wrong end of a lawsuit like The Verve vs. The Rolling Stones if you were to release it.

    Now... how in the world does this state of affairs "promote the progress of science and the useful arts" or otherwise make the world a better place?

    The copyright industry cannot be allowed to stop AI, or put a tollgate on it. It simply can't. If it dies, it dies. We got along just fine without it for thousands of years. AI is the next stage of our own evolution as Homo sapiens, while copyright is a dead-end street.

  • littlestymaar 2 days ago

    I'd be fine with this interpretation, but that would definitely rule out fair use for training, and be even worse for LLM makers than having LLM weights be non-copyrightable.