raincole 18 hours ago

(Disclaimer: haven't read the original paper)

It sounds like a ridiculous way to measure it. Producing 50-token excerpts absolutely doesn't translate to "recall X percent of Harry Potter" for me.

(Edit: I read this article. It's a nothingburger if its interpretation of the original paper is correct.)

tanaros 18 hours ago

Their methodology seems reasonable to me.

To clarify, they measure the probability that a model reproduces a verbatim 50-token excerpt given the preceding 50 tokens. They evaluate this over the whole book, sliding the window forward 10 characters at a time (NB: characters, not tokens). Sequences from Harry Potter have substantially higher probabilities of being reproduced than sequences from less well-known books.
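
If it helps make that concrete, here's roughly how I'd implement the scoring (my own sketch, not the paper's code; the model name, the 1000-character slice, and the exact stride handling are my assumptions):

    # Sketch of the probe described above: slide a 10-character stride over the
    # book, and at each offset score how likely the model is to reproduce the
    # next 50 tokens verbatim given the previous 50 (teacher-forced log-probs).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "meta-llama/Llama-3.1-8B"  # placeholder; any causal LM works here
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)  # load however fits your hardware
    model.eval()

    def excerpt_logprob(prefix_ids, target_ids):
        # Sum of log P(target_i | prefix, target_<i); exp() of this sum is the
        # probability of emitting the excerpt verbatim.
        input_ids = torch.cat([prefix_ids, target_ids]).unsqueeze(0)
        with torch.no_grad():
            logits = model(input_ids).logits[0]
        # logits[i] predicts token i+1, so the target span is predicted
        # starting at the last prefix position
        start = prefix_ids.numel() - 1
        logprobs = torch.log_softmax(logits[start:start + target_ids.numel()], dim=-1)
        return logprobs[torch.arange(target_ids.numel()), target_ids].sum().item()

    def probe(book_text, window=50, stride_chars=10):
        # Slide over the book in 10-character steps; at each offset, score the
        # next 50 tokens given the previous 50.
        for offset in range(0, len(book_text), stride_chars):
            # 1000 characters is just a budget that comfortably covers 100 tokens
            chunk = book_text[offset:offset + 1000]
            ids = tok(chunk, return_tensors="pt", add_special_tokens=False).input_ids[0]
            if ids.numel() < 2 * window:
                break
            prefix, target = ids[:window], ids[window:2 * window]
            yield offset, excerpt_logprob(prefix, target)

exp() of that sum is the per-excerpt reproduction probability, which is what gets compared between Harry Potter and the lesser-known books.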

Whether this is "recall" is, of course, one of those tricky semantic arguments we have yet to settle when it comes to LLMs.

  • raincole 13 hours ago

    > one of those tricky semantic arguments we have yet to settle when it comes to LLMs

    Sure. But imagine this: in a hypothetical world where LLMs never existed, I tell you that I can recall 42 percent of the first Harry Potter book. What would you assume I can do?

    It's definitely not "this guy can reproduce the next 50 tokens verbatim, more often than not, when given the preceding 50."

    Of course, the semantics of "recall" isn't the point of the article. The point is that Harry Potter was in the training set. But I still think it's a nothingburger. It would be very strange to assume Llama was trained only on copyright-free material. And afaik there is no legal precedent saying that training on copyrighted material is illegal.