MontyCarloHall 6 hours ago

There is no evidence that RAG delivers equivalent performance to retraining on new data. Merely having information in the context window is very different from having it baked into the model weights. Relying solely on RAG to keep model results current would also degrade with time, as more and more information would have to be incorporated into the context window the longer it's been since the knowledge cutoff date.
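The distinction the parent draws can be made concrete with a toy sketch: "RAG" here just means retrieving relevant documents and prepending them to the prompt, so the model sees the facts in its context window rather than in its weights. Everything below (the word-overlap scorer, the corpus, the prompt format) is a hypothetical illustration, not any particular RAG library's API.

```python
# Toy RAG sketch: retrieve the most relevant documents for a query
# and stuff them into the prompt's context window. The relevance
# score (word overlap) and corpus are illustrative placeholders.

def score(query: str, doc: str) -> int:
    """Count query words that also appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def build_prompt(query: str, corpus: list[str], k: int = 2) -> str:
    """Retrieve the top-k documents and prepend them as context."""
    top = sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]
    context = "\n".join(top)
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "The 2025 budget was approved in March.",
    "Photosynthesis converts light into chemical energy.",
    "The budget deficit narrowed in 2025.",
]
prompt = build_prompt("What happened to the budget in 2025?", corpus)
```

The parent's degradation point falls out of this shape: as the gap since the knowledge cutoff grows, the retrieved `context` has to carry more and more of the load, and the prompt keeps growing with it.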

  • fennecbutt an hour ago

    I honestly do not think that we should be training models to regurgitate training data anyway.

    Humans do this to some degree, but what we can recount from memory is far simpler than, say, the contents of an entire paper.

    There's a reason we invented writing stuff down. And I do wonder if future models should be optimised for RAG during training: train for reasoning and stringing coherent sentences together, sure, but with a focus on using that to connect hard data found in the context.

    And who says models won't have massive or unbounded contexts in the future? Or that predicting a single token (or even a sub-sequence of tokens) still remains a one shot/synchronous activity?