Comment by pamelafox 4 days ago
If you're using cosine similarity for retrieval in a RAG application, a good next step is a "semantic re-ranker" or "L2 re-ranking model" that re-ranks the results to better match the user query.
There's an example in the pgvector-python repo that uses a cross-encoder model for re-ranking: https://github.com/pgvector/pgvector-python/blob/master/exam...
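To illustrate the idea, here's a minimal sketch of cross-encoder re-ranking with the sentence-transformers library (not the exact code from that repo; the model name, query, and candidate documents are just for illustration):

    from sentence_transformers import CrossEncoder

    query = "how do I tune postgres for vector search?"
    candidates = [
        "Tips for tuning PostgreSQL indexes for pgvector workloads.",
        "A history of relational databases.",
        "A recipe for sourdough bread.",
    ]

    # Cross-encoders score each (query, document) pair jointly, which is slower
    # than cosine similarity over precomputed embeddings but much better at
    # ordering a small set of retrieved candidates.
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = model.predict([(query, doc) for doc in candidates])

    # Sort the retrieved candidates by relevance score, highest first.
    reranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    for doc, score in reranked:
        print(f"{score:.3f}  {doc}")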
You can even use a language model for re-ranking, though it may not be as good as a model trained specifically for re-ranking purposes.
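A rough sketch of that approach, scoring each candidate with the OpenAI chat API; the prompt wording, model name, and 0-10 scale are arbitrary choices here, not a standard recipe:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    query = "how do I tune postgres for vector search?"
    candidates = [
        "Tips for tuning PostgreSQL indexes for pgvector workloads.",
        "A history of relational databases.",
    ]

    def llm_relevance_score(query: str, doc: str) -> int:
        # Ask the model to rate relevance 0-10 and reply with only the number.
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative choice; any capable model works
            messages=[{
                "role": "user",
                "content": (
                    "Rate the relevance of this document to the query on a 0-10 "
                    "scale. Reply with only the number.\n\n"
                    f"Query: {query}\nDocument: {doc}"
                ),
            }],
        )
        return int(response.choices[0].message.content.strip())

    reranked = sorted(candidates, key=lambda d: llm_relevance_score(query, d), reverse=True)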
In our Azure RAG approaches, we use the AI Search semantic ranker, which uses the same model that Bing uses for re-ranking search results.
Another tip: do NOT store vector embeddings of nothingness: mostly whitespace, a solid image, etc. We've had a few situations with RAG data stores that accidentally ingested mostly-empty content (either text or image), and those dang vectors matched EVERYTHING. As I like to think of it, there's a bit of nothing in everything... so make sure that if you are storing a vector embedding, there is some amount of signal in that embedding.
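A simple pre-ingestion guard along those lines; the helper name and thresholds are arbitrary, so tune them for your own content:

    def has_enough_signal(text: str, min_chars: int = 20, min_unique_words: int = 3) -> bool:
        # Reject empty or mostly-whitespace content before it gets embedded.
        stripped = text.strip()
        if len(stripped) < min_chars:
            return False
        return len(set(stripped.lower().split())) >= min_unique_words

    documents = [
        "   \n\n   ",  # whitespace-only: would embed as near-nothing
        "PostgreSQL supports approximate vector search via the pgvector extension.",
    ]
    # Only the real sentence survives; the whitespace blob never reaches the store.
    to_embed = [doc for doc in documents if has_enough_signal(doc)]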