Comment by BobbyTables2 16 hours ago

Indeed, but since when is a blatantly derived work that uses 50% of a copyrighted work without permission a paragon of copyright compliance?

Music artists get in trouble for using more than a sample without permission — imagine if they just used 45% of a whole song instead…

I’m amazed AI companies haven’t been sued to oblivion yet.

This utter stupidity only continues because we named a collection of matrices “Artificial Intelligence” and somehow treat it as if it were a sentient pet.

Amassing troves of copyrighted works illegally into a ZIP file wouldn’t be allowed. The fact that the meaning was compressed using “Math” makes everyone stop thinking because they don’t understand “Math”.

yorwba 15 hours ago

Music artists get in trouble for using more than a sample from other music artists without permission because their work is in direct competition with the work they're borrowing from.

A ZIP file of a book is also in direct competition with the book, because you could open the ZIP file and read it instead of the book.

A model that, given 50 tokens, assigns a greater than 50% probability to the next 50 tokens 42% of the time is not in direct competition with the book: starting from the beginning you'll lose the plot fairly quickly unless you already have the full book, and unlike music sampling other music, the model output isn't good enough to read instead of the book.
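The measurement being described (feed the model 50 tokens, check whether it reproduces the next 50) can be sketched as a toy experiment. A minimal sketch, where `extraction_rate` and `fake_generate` are invented for illustration and the latter stands in for a real model call:

```python
# Toy version of the measurement described above: slide over the book,
# feed the "model" a 50-token prompt, and count how often it reproduces
# the next 50 tokens verbatim. `fake_generate` is a hypothetical
# stand-in for a real model call, not any actual API.
def extraction_rate(book_tokens, generate, prompt_len=50, cont_len=50):
    hits = trials = 0
    step = prompt_len + cont_len
    for i in range(0, len(book_tokens) - step, step):
        prompt = book_tokens[i : i + prompt_len]
        truth = book_tokens[i + prompt_len : i + step]
        if generate(prompt, cont_len) == truth:
            hits += 1
        trials += 1
    return hits / trials if trials else 0.0

# A "model" that memorized the first half of the book and nothing else.
book = list(range(1000))
memorized = {tuple(book[i : i + 50]): book[i + 50 : i + 100]
             for i in range(0, 500, 100)}

def fake_generate(prompt, n):
    return memorized.get(tuple(prompt), [-1] * n)

print(extraction_rate(book, fake_generate))  # 5 of 9 windows reproduced
```

The point of the toy: a high per-window reproduction rate still doesn't hand you the book unless you already have the prompts, which all come from the book itself.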

  • em-bee 12 hours ago

    this is the first sensible argument in defense of AI models i've read in this debate. thank you. this does make sense.

    AI can reproduce individual sentences 42% of the time but it can't reproduce a summary.

    the question however is: is that by design of AI tools, or is it a limitation of current models? what if future models get better at this and are able to produce summaries?

  • otabdeveloper4 11 hours ago

    LLMs aren't probabilistic. The randomness is bolted on top by the cloud providers as a trick to give them a more humanistic feel.

    Under the hood they are 100% deterministic, modulo quantization and rounding errors.

    So yes, it is very much possible to use LLMs as a lossy compressed archive for texts.
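The split the two comments below are arguing about can be shown in a few lines: the forward pass yields a deterministic distribution, and any randomness enters only at the sampling step afterwards. A minimal sketch with made-up logits (the numbers and the 4-token vocabulary are invented for illustration):

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits a model might emit for 4 candidate tokens.
logits = [2.0, 1.0, 0.5, -1.0]

# The forward pass itself is deterministic: same input, same logits,
# same distribution every time.
probs = softmax(logits, temperature=0.8)

# Greedy decoding picks the argmax -- no randomness anywhere.
greedy_pick = max(range(len(logits)), key=lambda i: logits[i])

# Sampling injects randomness *after* the model has run.
rng = random.Random(42)
sampled_pick = rng.choices(range(len(logits)), weights=probs)[0]

print(greedy_pick)   # always 0
print(sampled_pick)  # depends on the RNG seed
```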

    • fennecfoxy 9 hours ago

      This has nothing to do with "cloud providers". The randomness comes from the sampler: using a sampler that always picks the top-probability next token results in lower-quality output, and I have definitely seen it get stuck in endless repeating sequences when doing that.

      I.e. you get something like "Complete this poem 'over yonder hills I saw' output: a fair maiden with hair of gold like the sun gold like the sun gold like the sun gold like the sun..." etc.
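The repetition failure mode described above is easy to reproduce with a toy "model". A minimal sketch, where the lookup table is an invented stand-in for a real model's top-1 predictions:

```python
# Toy "language model": a lookup table of each word's single most
# likely successor. Greedy decoding -- always taking the top choice --
# loops forever once the chain of successors contains a cycle.
next_word = {
    "hair": "of",
    "of": "gold",
    "gold": "like",
    "like": "the",
    "the": "sun",
    "sun": "gold",   # cycle: gold -> like -> the -> sun -> gold ...
}

def greedy_generate(start, steps):
    out = [start]
    for _ in range(steps):
        out.append(next_word[out[-1]])
    return out

print(" ".join(greedy_generate("hair", 12)))
# hair of gold like the sun gold like the sun gold like the
```

Sampling (or penalizing repeated tokens) breaks the cycle by occasionally taking a lower-probability branch, which is one practical reason decoders don't just take the argmax.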

      • otabdeveloper4 8 hours ago

        > would result in lower quality output

        No it wouldn't.

        > seen it get stuck in certain endless sequences when doing that

        Yes, and infinite loops are just an inherent property of LLMs, like hallucinations.

Dylan16807 15 hours ago

> a blatantly derived work only using 50% of a copyrighted work without permission

What's the work here? If it's the output of the LLM, you have to feed in the entire book to make it output half a book, so on an ethical level I'd say it's not an issue. If you start with a few sentences, you'll get back less than you put in.

If the work is the LLM itself, something you don't distribute is much less affected by copyright. Go ahead and play entire songs by other artists during your jam sessions.

colechristensen 15 hours ago

>Amassing troves of copyrighted works illegally into a ZIP file wouldn’t be allowed. The fact that the meaning was compressed using “Math” makes everyone stop thinking because they don’t understand “Math”.

LLMs are in reality the artifacts of lossy compression of significant chunks of all of the text ever produced by humanity. The "lossy" quality makes them able to predict new text "accurately" as a result.

>compressed using “Math”

This is every compression algorithm.
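The prediction-compression connection underlying this point can be made concrete: a model that assigns probability p to the next symbol can encode it in about -log2(p) bits (the idea behind arithmetic coding), so better prediction means shorter encodings. A toy sketch with made-up character probabilities:

```python
import math

# An ideal coder spends -log2(p) bits on a symbol the model assigns
# probability p. Summing over a text gives the total compressed size,
# so "predicting text well" and "compressing text well" are two views
# of the same problem.
def bits_to_encode(text, model_probs):
    """Total bits an ideal coder needs, given per-character probabilities."""
    return sum(-math.log2(model_probs[ch]) for ch in text)

text = "aaab"
uniform = {"a": 0.5, "b": 0.5}  # a model that has learned nothing
skewed = {"a": 0.9, "b": 0.1}   # a model that fits this text better

print(bits_to_encode(text, uniform))  # 4.0 bits
print(bits_to_encode(text, skewed))   # ~3.78 bits
```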