Comment by fennecbutt 4 hours ago
I honestly do not think that we should be training models to regurgitate training data anyway.
Humans do this only to a limited degree, and what we can recount from memory is far simpler than, say, the full contents of a paper.
There's a reason we invented writing things down. And I do wonder whether future models should optimise for RAG during training: train for reasoning and stringing coherent sentences together, sure, but with a focus on using that ability to connect hard data found in the context.
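To make that concrete, here's a minimal sketch of the retrieve-then-generate pattern I mean. Everything in it is hypothetical: the toy keyword-overlap retriever stands in for a real vector index, and the final prompt is what you'd hand to a model in place of asking it to recall facts from its weights.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by crude keyword overlap with the query (toy stand-in for a real index)."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Put the hard data in the context window; the model only reasons over it."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the sources below.\n\nSources:\n{joined}\n\nQuestion: {query}"

docs = [
    "The Titan arum produces the largest unbranched inflorescence in the world.",
    "Paper wasps build nests from chewed wood fibre mixed with saliva.",
    "The 1976 Viking landers carried three biology experiments to Mars.",
]

prompt = build_prompt(
    "What did the Viking landers carry?",
    retrieve("Viking landers biology", docs),
)
print(prompt)  # this prompt, not the model's memory, carries the facts
```

The point of training for this rather than for recall is that the model's job shrinks to reading and connecting what's already in the prompt, which is a much easier thing to get right than memorising every paper.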
And who says models won't have massive or unbounded contexts in the future? Or that predicting a single token (or even a sub-sequence of tokens) will remain a one-shot, synchronous activity?