Comment by Retric 6 months ago

4 replies

I think you may have something with that line of reasoning.

The threshold for transformative use of fictional works is fairly high, unfortunately. Fan fiction and reasonably distinct works with excessive inspiration are both copyright infringing. See https://en.wikipedia.org/wiki/Tanya_Grotter

> Models themselves are very clearly transformative.

A near word-for-word copy of large sections of a work seems nowhere near that threshold. An MP3 isn’t even close to a 1:1 copy of a piece of music, but the inherent differences are irrelevant. A neural network that contains information and allows its extraction looks a lot like lossy compression.

Models could easily be transformative, but the justification needs to go beyond “well, obviously they are.”

Lerc 6 months ago

Models are not word-for-word copies of large sections of text. They are capable of emitting that text, though.

It would be interesting to look at what legal precedents were set regarding MP3s or other encodings. Is the encoding itself an infringement, or is it the decoding, or is it the distribution of a decodable form of a work?

There is also a distinction from a lossy encoding that encodes a single work. There is clarity when the encoded form serves no purpose other than to be decoded into a given work. When the encoding acts as a bulk archive, does the responsibility shift to those who choose what to extract from the archive?

  • int_19h 6 months ago

    > When the encoding acts as a bulk archive, does the responsibility shift to those who choose what to extract from the archive?

    If you take many gigabytes of, say, public domain music, and stick them on a flash drive with just one audio file that is an unlicensed copy of a copyrighted song, distributing that drive would constitute copyright infringement, quite obviously so. I don't see why it'd matter what else the model can produce, if it can produce that one thing verbatim by itself.

    (If you could only prompt the model to regurgitate the original text with a framing of, say, critical analysis of said text around it, and not in any other context, then I think there would be a stronger fair use argument here.)

  • Retric 6 months ago

    > Is the encoding itself an infringement

    Barring a fair use exception, yes.

    From what I’ve read, MP3s get the same treatment as cassette tapes, which were also lossy. It’s 1:1 digital copies that represented some novelty, but that rarely matters.

    I’m hesitant to comment on the rest of that. The ultimate question isn’t whether some difference exists but why that difference matters.
