yorwba 16 hours ago

Music artists get in trouble for using more than a sample from other music artists without permission because their work is in direct competition with the work they're borrowing from.

A ZIP file of a book is also in direct competition with the book, because you could open the ZIP file and read it instead of the book.

A model that, given 50 tokens, assigns greater than 50% probability to the next 50 tokens 42% of the time is not in direct competition with the book: starting from the beginning you'll lose the plot fairly quickly unless you already have the full book, and unlike music that samples other music, the model's output isn't good enough to read instead of the book.
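
A rough sketch of how a number like that might be measured, assuming a Hugging Face transformers causal LM (the model checkpoint, the tokenized book file, and the 50% threshold below are illustrative placeholders, not anyone's actual methodology):

    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in; any causal LM checkpoint works the same way
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    def continuation_prob(prefix_ids, target_ids):
        # Joint probability the model assigns to target_ids following prefix_ids,
        # i.e. the product of the per-token conditional probabilities.
        ids = torch.cat([prefix_ids, target_ids]).unsqueeze(0)
        with torch.no_grad():
            log_probs = torch.log_softmax(model(ids).logits[0], dim=-1)
        offset = prefix_ids.shape[0]
        total = 0.0
        for i, token_id in enumerate(target_ids.tolist()):
            # logits at position p predict the token at position p + 1
            total += log_probs[offset + i - 1, token_id].item()
        return math.exp(total)

    def extraction_rate(book_ids, window=50, threshold=0.5):
        # Fraction of 50-token prefixes whose next 50 tokens get probability > 50%.
        hits = trials = 0
        for start in range(0, book_ids.shape[0] - 2 * window, window):
            prefix = book_ids[start : start + window]
            target = book_ids[start + window : start + 2 * window]
            hits += continuation_prob(prefix, target) > threshold
            trials += 1
        return hits / max(trials, 1)

    book_ids = tok(open("book.txt").read(), return_tensors="pt").input_ids[0]
    print(extraction_rate(book_ids))  # e.g. 0.42 would match the figure quoted above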

em-bee 13 hours ago

this is the first sensible argument in defense of AI models i read in this debate. thank you. this does make sense.

AI can reproduce individual sentences 42% of the time but it can't reproduce a summary.

the question however is: is that in the design of AI tools, or is that a limitation of current models? what if future models get better at this and are able to produce summaries?

otabdeveloper4 12 hours ago

LLMs aren't probabilistic. The randomness is bolted on top by the cloud providers as a trick to give them a more humanistic feel.

Under the hood they are 100% deterministic, modulo quantization and rounding errors.

So yes, it is very much possible to use LLMs as a lossy compressed archive for texts.
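
A minimal sketch of that point, assuming a Hugging Face transformers model ("gpt2" is just an illustrative checkpoint): the forward pass itself is deterministic (modulo floating-point details), and randomness only appears if a sampler draws from the resulting distribution instead of taking the argmax.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    ids = tok("Call me Ishmael.", return_tensors="pt").input_ids

    with torch.no_grad():
        logits_a = model(ids).logits
        logits_b = model(ids).logits
    print(torch.equal(logits_a, logits_b))  # same input, same distribution: True on the same device

    probs = torch.softmax(logits_a[0, -1], dim=-1)
    deterministic_next = torch.argmax(probs)      # what greedy decoding would emit, every time
    sampled_next = torch.multinomial(probs, 1)    # the "bolted-on" randomness: a draw from the distribution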

  • fennecfoxy 10 hours ago

    This has nothing to do with "cloud providers". The randomness comes from the sampler: a sampler that always picks the top-probability next token produces lower-quality output, and I have definitely seen it get stuck in endless repeating sequences when doing that.

    I.e. you get something like: "Complete this poem: 'over yonder hills I saw'" and the output is "a fair maiden with hair of gold like the sun gold like the sun gold like the sun gold like the sun..." etc.
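
    A small sketch of that failure mode, assuming a Hugging Face transformers model ("gpt2" and the prompt are just illustrative): pure greedy decoding often collapses into a repetition loop, while sampling (or a repetition penalty) usually avoids it.

        from transformers import AutoModelForCausalLM, AutoTokenizer

        tok = AutoTokenizer.from_pretrained("gpt2")
        model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
        ids = tok("Complete this poem: over yonder hills I saw", return_tensors="pt").input_ids

        # Greedy: always take the single most likely next token. Prone to loops.
        greedy = model.generate(ids, max_new_tokens=60, do_sample=False)
        print(tok.decode(greedy[0]))

        # Sampling occasionally picks lower-probability tokens, which breaks the loop.
        sampled = model.generate(ids, max_new_tokens=60, do_sample=True,
                                 temperature=0.9, top_p=0.95)
        print(tok.decode(sampled[0]))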

    • otabdeveloper4 9 hours ago

      > would result in lower quality output

      No it wouldn't.

      > seen it get stuck in certain endless sequences when doing that

      Yes, and infinite loops are just an inherent property of LLMs, like hallucinations.