Comment by jxjnskkzxxhx 21 hours ago

Suppose for simplicity that every sentence in the book is 50 tokens or shorter.

According to the stated methodology, I could give the LLM sentence 1 and have a 42% chance of getting sentence 2 recalled. Then I could give it sentence 2 and have a 42% chance of getting sentence 3, and so on through the whole book. In expectation, 42% of all those transitions succeed, so the LLM contains 42% of the book in some sense.
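For concreteness, here is a toy simulation of that chaining argument (a minimal sketch: the flat 0.42 per-step success rate and the independence of steps are my assumptions, not something the article measured):

    import random

    def simulate_chain(n_sentences: int, p_recall: float = 0.42) -> float:
        """Walk the book sentence by sentence, prompting with sentence i
        and counting how often sentence i+1 comes back verbatim.
        Returns the fraction of transitions recovered."""
        transitions = n_sentences - 1
        recovered = sum(random.random() < p_recall for _ in range(transitions))
        return recovered / transitions

    # Over a book-sized number of sentences, the recovered fraction
    # concentrates near 0.42 -- the "42% of the book" reading above.
    print(simulate_chain(100_000))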

I disagree that this is "not really very much". If a person could do this, you would undoubtedly conclude that the person had read the book.

In fact, the number 42% even understates the severity of the matter. Superficially, it makes it sound as if the LLM contains less than half of the book. In reality, the process I described applies to 100% of the sentences. Additionally, I'd guess that in the 58% of cases where the 50 tokens aren't recalled correctly, the output tokens probably have the same meaning as the correct ones.

TeMPOraL 19 hours ago

Except that's not what happened, per the article. Instead, they walked down the logits, which is more like asking someone for their 10-20 best guesses for the next word and, should one of them match the secret answer, telling them which one it is and asking them to go on to the next word. That seems like a substantially easier task, and most of the information is coming from the researchers making a choice at every step.
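A minimal sketch of the difference between the two protocols, assuming a generic next_token_logits(prompt) function as a hypothetical stand-in for the model (the article's actual harness isn't shown here; k=20 just mirrors the "10-20 best guesses" framing above):

    import numpy as np

    def greedy_recall(next_token_logits, prompt, target):
        """Strict recall: the model's single top guess must match
        every token of the target span."""
        for tok in target:
            scores = next_token_logits(prompt)
            if int(np.argmax(scores)) != tok:
                return False
            prompt = prompt + [tok]
        return True

    def logit_walk(next_token_logits, prompt, target, k=20):
        """'Walking down the logits': accept a token if it merely
        appears among the top-k guesses, then feed the *correct*
        token back in. The researcher, not the model, resolves
        which guess was right at every step."""
        for tok in target:
            scores = next_token_logits(prompt)
            topk = np.argsort(scores)[-k:]
            if tok not in topk:
                return False
            prompt = prompt + [tok]  # correct answer supplied from outside
        return True

Under the second protocol the right answer leaks back in at every step, which is why long verbatim runs are far easier to sustain than under strict recall.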