Comment by vintermann 16 hours ago

All this study really says is that models are very good at compressing the text of Harry Potter. You can't get Harry Potter out of the model without prompting it with the missing bits. Sure, it's impressively few bits, but is that surprising, considering how many references and fair-use excerpts (like discussion of the story in public forums) it has seen?

There's also the question of how many bits of originality there actually are in Harry Potter. If a model were trained strictly on text published before the first book, how well would it compress it?
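The "compressing the text" framing above is measurable: a language model's cross-entropy on a text is exactly the number of bits per character an ideal arithmetic coder paired with that model would need. A minimal sketch using a character bigram model as a toy stand-in for an LLM (the training text and function names here are illustrative, not from the study):

```python
import math
from collections import Counter

def bigram_model(text):
    """Character bigram model with add-one smoothing over the text's alphabet.

    Returns prob(prev, ch): the model's probability of ch given the
    previous character prev.
    """
    pairs = Counter(zip(text, text[1:]))   # counts of each adjacent pair
    ctx = Counter(text[:-1])               # counts of each context character
    vocab = len(set(text))
    def prob(prev, ch):
        return (pairs[(prev, ch)] + 1) / (ctx[prev] + vocab)
    return prob

def bits_per_char(prob, text):
    """Cross-entropy in bits/char: the ideal code length for text under the model."""
    total = -sum(math.log2(prob(p, c)) for p, c in zip(text, text[1:]))
    return total / (len(text) - 1)
```

Text that matches the model's learned patterns scores few bits per character (compresses well); the same characters in an unfamiliar order score many more. The thread's disagreement is over whether a low bit count for Harry Potter reflects memorization of that specific book or just the general predictability of English prose.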

fiddlerwoaroof 15 hours ago

The alternative here is that Harry Potter is written in sentences that match the typical patterns of English, so when you prompt with part of the text, the LLM can complete it with above-random accuracy.

  • vintermann 15 hours ago

    Anything that can tell you what the typical patterns of English are is going to be a language model by definition.

    • fiddlerwoaroof 15 hours ago

      My point is that this might just prove that Harry Potter is the sort of prose “fancy autocomplete” would produce and not all that original.

      EDIT: Actually, on rereading, I see I replied to the wrong comment.

  • fiddlerwoaroof 15 hours ago

    Or else, LLMs show that copyright and IP are ridiculous concepts that should be abolished.