littlestymaar 2 days ago

We don't even have to do that: since weights are entirely machine-generated without human intervention, they are likely not copyrightable in the first place.

In fact, we should collectively refuse to abide by these fantasy licenses before weight copyrightability gets created out of thin air just because they've been commonplace for long enough.

  • mitthrowaway2 2 days ago

    There's an argument by which machine-learned neural network weights are a lossy compression of (as well as a smooth interpolator over) the training set.

    An mp3 file is also a machine-generated lossy compression of a cd-quality .wav file, but it's clearly copyrightable.

    To that extent, the main difference between a neural network and an .mp3 is that mp3 compression cannot be used to interpolate between two copyrighted works to output something in the middle. This, on the other hand, is perhaps the most common use case for genAI, and it's actually tricky (though not impossible) to get it not to output something "in the middle".

    I think the copyright argument could really go either way here.

    • littlestymaar 2 days ago

      > An mp3 file is also a machine-generated lossy compression of a cd-quality .wav file, but it's clearly copyrightable.

      Not the .mp3 itself, but the creative work it encodes.

      You can't record Taylor Swift at a concert and claim copyright on that. Nor can you claim copyright on an mp3 re-encoding of old audio recordings that belong to the public domain.

      Whether LLMs fall into the first category (infringing the copyright of the training data's rights holders) or the second (public domain or fair use) is an open question that case law is slowly resolving, jurisdiction by jurisdiction, but that doesn't address the question of the weights themselves.

      • mitthrowaway2 2 days ago

        Right, the .mp3 is machine-generated but from a creatively generated input. The analogy I'm making is that an LLM's weights (or, say, a diffusion image model's) are also machine-generated (by the training process) from the works in its training set, many of which are creative works, and the neural network encodes those creative works much like an mp3 file does.

        In this analogy, distributing the weights would be akin to distributing an mp3, and offering a genAI service, like ChatGPT inference or a Stable Diffusion API, would be akin to broadcasting.

larodi 2 days ago

Of course we should! And anyone who says otherwise must be delusional or some sort of gaslighter, since this whole "innovation" (or remix (or compression)) is enabled by the creative value of the source material. Given that AI companies never ever respected this copyright, we should give them the same treatment.