kacperlukawski 4 days ago

The problem is scaling that properly. If you have millions of documents, this approach won't hold up: you're not going to prompt the LLM millions of times, are you?

Embedding models usually have far fewer parameters than LLMs, and once the documents are indexed, retrieval over them is also pretty fast. Using an LLM as a judge makes sense, but only at a limited scale, e.g. to rerank a small set of candidates that the embedding index has already retrieved.
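A minimal sketch of that split, assuming a toy deterministic "embedding" function and a hypothetical `llm_judge` stand-in (a real setup would use an actual embedding model, a vector index like FAISS, and a real LLM call — names here are invented for illustration):

```python
import zlib
import numpy as np

def embed(texts, dim=64):
    """Toy embedding: deterministic pseudo-random unit vectors (illustration only)."""
    vecs = []
    for t in texts:
        rng = np.random.default_rng(zlib.crc32(t.encode()))
        v = rng.standard_normal(dim)
        vecs.append(v / np.linalg.norm(v))
    return np.stack(vecs)

def retrieve(query_vec, index, top_k=2):
    """Dot-product search over the pre-built index; cheap even for millions of rows."""
    scores = index @ query_vec
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

def llm_judge(query, candidates):
    """Hypothetical LLM-as-a-judge: only ever sees top_k docs, never millions.
    Stand-in scoring by word overlap; a real judge would prompt an LLM here."""
    return max(candidates, key=lambda d: len(set(query.split()) & set(d.split())))

docs = ["vector databases index embeddings",
        "LLMs are expensive to prompt at scale",
        "cats sleep most of the day"]
index = embed(docs)                       # built once, reused for every query
query = "prompt LLMs at scale"
top_ids, _ = retrieve(embed([query])[0], index, top_k=2)
best = llm_judge(query, [docs[i] for i in top_ids])
```

The point of the structure: the expensive LLM call sits behind a cheap index lookup, so its cost is bounded by `top_k` rather than by the corpus size.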