Comment by vintermann 16 hours ago
All this study really shows is that models are very good at compressing the text of Harry Potter. You can't get Harry Potter out of one without prompting it with the missing bits. Sure, it's impressively few bits, but is that surprising, considering how many references and fair-use excerpts (like discussions of the story in public forums) it's seen?
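To make the compression framing concrete, here's a rough sketch of how you'd measure it: the number of bits an arithmetic coder would need to encode a passage, given the model, is the sum of -log2 p(token) over the text. The model (gpt2) and the passage are stand-ins, not what the study actually used:

    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Stand-in model and passage; the study's setup will differ.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    passage = ("Mr. and Mrs. Dursley, of number four, Privet Drive, "
               "were proud to say that they were perfectly normal.")
    ids = tokenizer(passage, return_tensors="pt").input_ids

    with torch.no_grad():
        logits = model(ids).logits

    # Bits to encode each token given everything before it:
    # -log2 p(token), i.e. what an arithmetic coder would pay.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    token_bits = -log_probs[torch.arange(len(targets)), targets] / math.log(2)
    print(f"{token_bits.sum().item():.0f} bits total, "
          f"{token_bits.mean().item():.2f} bits/token")

A passage the model has memorized will come in at far fewer bits per token than ordinary unseen prose, which is exactly the gap the "compression" reading turns on.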
There's also the question of how many bits of originality there actually are in Harry Potter. If a model were trained strictly on text from before the publication of the first book, how well would it compress it?
The alternative explanation here is that Harry Potter is written in sentences that match the typical patterns of English, so when you prompt with a part of the text, the LLM can complete it with above-random accuracy.
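That explanation is easy to test in the same spirit: take ordinary English the model can't have memorized and measure greedy next-token accuracy. Anything far above 1/|vocab| chance supports the "typical patterns of English" reading. Again, gpt2 and the sentence are placeholders:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder model and text; swap in held-out, post-training-cutoff
    # prose to rule out memorization.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    text = ("The letter arrived on a Tuesday, and nobody in the "
            "house knew what to make of it.")
    ids = tokenizer(text, return_tensors="pt").input_ids[0]

    with torch.no_grad():
        logits = model(ids.unsqueeze(0)).logits[0]

    # Greedy prediction of each token from its prefix.
    preds = logits[:-1].argmax(dim=-1)
    hits = (preds == ids[1:]).sum().item()
    print(f"greedy next-token accuracy: {hits}/{len(ids) - 1}; "
          f"chance is ~1/{len(tokenizer)}")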