Ask HN: How would you architect a RAG system for 10M+ documents today?
21 points by Ftrea 4 days ago
I'm tasked with building a private AI assistant for a corpus of 10 million text documents (living in PostgreSQL). The goal is semantic search and chat, with a requirement for regular incremental updates.
I'm trying to decide between:
Bleeding edge: Implementing something like LightRAG or GraphRAG.
Proven stack: Standard Hybrid Search (Weaviate/Elastic + Reranking) orchestrated by tools like Dify.
For those who have built RAG at this scale:
What is your preferred stack for 2025?
Is the complexity of Graph/LightRAG worth it over standard chunking/retrieval for this volume?
How do you handle maintenance and updates efficiently?
Looking for architectural advice and war stories.
Are the documents individually large or fairly small - like a page or two each? If they're small, then since you already have Postgres you can just add the pgvector extension, decide which embeddings you want to use, and try it out without committing too much. Maybe add a hash column first so that you can avoid paying to compute the embeddings again if you decide to try a different approach. The dedicated vector stores are all basically doing the same math to find things, so you aren't going to get magically better results from them. If the docs are larger, then you have to do chunking anyway.
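For the pgvector route, a minimal setup sketch - the table and column names (docs, content) are made up, and the 1536 dimension depends on whichever embedding model you pick:

    CREATE EXTENSION IF NOT EXISTS vector;

    -- Hypothetical docs table; adjust names and dimensions to your schema/model.
    ALTER TABLE docs ADD COLUMN IF NOT EXISTS embedding vector(1536);
    ALTER TABLE docs ADD COLUMN IF NOT EXISTS content_hash text;

    -- Hash the content once so unchanged rows can be skipped on re-embedding.
    UPDATE docs SET content_hash = md5(content) WHERE content_hash IS NULL;

    -- ANN index; pgvector also offers IVFFlat if HNSW build time hurts at 10M rows.
    CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);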
Would the 10M documents be searched with a single vector search, or would they be pre-filtered by other columns in your table first? If some prefiltering is happening, it naturally makes things faster. You will likely want regular text / tsvector-based search as well, and potentially feed its results to the LLM too, since vector search isn't perfect.
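Roughly what the prefilter and lexical pieces look like in SQL, again with hypothetical column names (department, content), the query embedding bound as $1, and the text query as $2:

    -- Vector search restricted by a regular column filter.
    SELECT id, content
    FROM docs
    WHERE department = 'legal'   -- hypothetical prefilter column
    ORDER BY embedding <=> $1    -- cosine distance to the query embedding
    LIMIT 20;

    -- Plain lexical search on the same table via tsvector.
    SELECT id, content
    FROM docs
    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $2)
    LIMIT 20;

How well the ANN index copes with the WHERE filter depends on its selectivity, so it's worth testing on your actual data.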
You would then decide whether to do re-ranking before handing results to the final LLM context window. These days models are pretty good, so they will do their own re-ranking to some extent, but it depends a bit on cost, latency, and the quality of results you are looking for.
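If you want a cheap re-ranking step without standing up a separate reranker model, reciprocal rank fusion (RRF) over the vector and lexical result lists is one common option - a sketch building on the two queries above (60 is the usual RRF constant):

    WITH vec AS (
      SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS r
      FROM docs
      ORDER BY embedding <=> $1
      LIMIT 50
    ),
    lex AS (
      SELECT id, ROW_NUMBER() OVER (
               ORDER BY ts_rank(to_tsvector('english', content),
                                plainto_tsquery('english', $2)) DESC
             ) AS r
      FROM docs
      WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $2)
      LIMIT 50
    )
    -- Rows found by both searches score highest; the rest still get a chance.
    SELECT COALESCE(v.id, l.id) AS id,
           COALESCE(1.0 / (60 + v.r), 0) + COALESCE(1.0 / (60 + l.r), 0) AS score
    FROM vec v
    FULL OUTER JOIN lex l ON v.id = l.id
    ORDER BY score DESC
    LIMIT 10;

Whatever survives that goes into the LLM context; a model-based reranker on top is an extra hop you can add later if result quality demands it.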