Comment by pamelafox 4 days ago

If you're using cosine similarity when retrieving for a RAG application, a good approach is to then use a "semantic re-ranker" or "L2 re-ranking model" to re-rank the results to better match the user query.

There's an example in the pgvector-python that uses a cross-encoder model for re-ranking: https://github.com/pgvector/pgvector-python/blob/master/exam...
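
A minimal sketch of that pattern, assuming the sentence-transformers CrossEncoder API (the model name and helper below are illustrative, not the exact code from that example):

  # Retrieve candidates by cosine similarity first, then re-score each
  # (query, passage) pair with a cross-encoder and sort by that score.
  from sentence_transformers import CrossEncoder

  def rerank(query, candidates, model_name="cross-encoder/ms-marco-MiniLM-L-6-v2"):
      model = CrossEncoder(model_name)
      scores = model.predict([(query, passage) for passage in candidates])
      ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
      return [passage for passage, _ in ranked]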

You can even use a language model for re-ranking, though it may not be as good as a model trained specifically for re-ranking purposes.

In our Azure RAG approaches, we use the AI Search semantic ranker, which uses the same model that Bing uses for re-ranking search results.

pamelafox 4 days ago

Another tip: do NOT store vector embeddings of nothingness: mostly whitespace, a solid image, etc. We've had a few situations with RAG data stores that accidentally ingested mostly-empty content (either text or image), and those dang vectors matched EVERYTHING. As I like to think of it, there's a bit of nothing in everything... so make sure that if you are storing a vector embedding, there is some amount of signal in that embedding.
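
A rough guard along those lines before ingestion (the helper name and threshold are made up; tune them for your data):

  import re

  def has_signal(chunk: str, min_alnum: int = 20) -> bool:
      """Skip chunks that are mostly whitespace/punctuation before embedding them."""
      alnum = re.sub(r"[^0-9A-Za-z]", "", chunk)
      return len(alnum) >= min_alnum

  chunks = ["   \n\n   ", "Quarterly revenue grew 12% year over year ..."]
  to_embed = [c for c in chunks if has_signal(c)]   # only the second chunk survives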

  • variaga 4 days ago

    Interesting. On a project I worked on (audio recognition for a voice-command system), we ended up going the other way and explicitly adding an encoding of "nothingness" (actually two: one for "silence" and another for "white noise") and special-casing them ("if either 'silence' or 'noise' is in the top 3 matches, ignore the input entirely").

    This was to avoid the problem where, when we only had vectors for "valid" sounds and an input came in that didn't match anything in the training set (a foreign language, a garbage truck backing up, a dog barking, ...), the model would still return some word as the closest match (there's always a vector with the highest similarity), and frequently do so with high confidence. In other words, even though the input didn't actually match anything in the training set, it would be "enough" more like one known vector than any of the others that it would pass most threshold tests, leading to a lot of false positives.
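
    Something like this, as a sketch (the index interface and labels are hypothetical):

      # Embeddings for "silence" and "white noise" sit in the index alongside
      # real commands; if either shows up in the top-3 matches, ignore the input.
      NULL_LABELS = {"silence", "white_noise"}

      def classify(query_vec, index, k=3):
          top_k = index.search(query_vec, k)   # [(label, score), ...] best match first
          if any(label in NULL_LABELS for label, _ in top_k):
              return None                      # treat as "no valid command"
          return top_k[0][0]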

  • pbhjpbhj 4 days ago

    That sounds like a problem with the embedding: would you need to renormalise so that low-signal inputs could be well represented? A white square and a red square shouldn't have different levels of detail. Depending on the purpose of the vector embedding, there should be a difference between images of mostly white pixels and partial images.

    Disclaimer, I don't know shit.

    • pamelafox 4 days ago

      I should clarify that I experienced these issues with text-embedding-ada-002 and the Azure AI vision model (based on Florence). I have not tested many other embedding models to see if they'd have the same issue.

      • refulgentis 4 days ago

        FWIW I think you're right. We have very different stacks, and I've observed the same thing, with a much clunkier description than your elegant way of putting it.

        I do embeddings on arbitrary websites at runtime, and had a persistent problem with the last chunk of a web page matching more things. In retrospect, it's obvious: the smaller the chunk was, the more it matched everything.

        Full details: MSMARCO MiniLM L6V3 inferenced using ONNX on iOS/web/android/macos/windows/linux

      • mattvr 4 days ago

        You could also work around this by adding a scaling transformation that normalizes and centers the raw embeddings (e.g. sklearn StandardScaler), fit on some example data points from your data set. It might introduce some bias, but I've found this helpful in some cases with off-the-shelf embeddings.
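
        As a sketch of that (the random sample stands in for embeddings from your own data; the renormalization at the end is an assumption to keep cosine search working):

          import numpy as np
          from sklearn.preprocessing import StandardScaler

          rng = np.random.default_rng(0)
          sample = rng.normal(size=(1000, 1536))   # stand-in for real embeddings from your data set
          scaler = StandardScaler().fit(sample)

          def transform(vec):
              centered = scaler.transform(vec.reshape(1, -1))[0]
              return centered / np.linalg.norm(centered)   # renormalize for cosine search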

      • OutOfHere 3 days ago

        Use horrible quality embeddings and get horrible results. No surprise there. ada is obsolete - I would never want to use it.

  • jhy 4 days ago

    We used to have this problem in AWS Rekognition; a poorly detected face -- e.g. a blurry face in the background -- would match every other blurry face with high confidence. We fixed that largely by adding specific tests against this [effectively] null vector. The same will work for text or other image vectors.
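
    Roughly this kind of check, as I understand it (the junk-vector averaging and threshold are illustrative, not the actual Rekognition internals):

      import numpy as np

      def cosine(a, b):
          return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

      # junk_vec: average embedding of known-bad (blurry/background) detections
      def is_junk(face_vec, junk_vec, threshold=0.9):
          return cosine(face_vec, junk_vec) >= threshold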

  • short_sells_poo 4 days ago

    If you imagine a Cartesian coordinate space where your samples are clustered around the origin, then a zero vector will tend to be close to everything because it is the center of the cluster. Which is a different way of saying that there's a bit of nothing in everything, I guess :)
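
    A tiny numpy illustration of that geometric point (the numbers are made up; the "center" vector stands in for a low-signal/empty embedding):

      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(loc=0.3, scale=1.0, size=(1000, 128))   # samples clustered away from the origin
      X /= np.linalg.norm(X, axis=1, keepdims=True)          # unit vectors, as for cosine similarity
      center = X.mean(axis=0)
      center /= np.linalg.norm(center)                        # the "empty-ish" direction at the cluster center
      print((X @ center).mean())   # average similarity of the center to everything (high)
      print((X @ X[0]).mean())     # average similarity of a typical sample to everything (lower)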

  • jsenn 4 days ago

    Same experience embedding random alphanumeric strings or strings of digits with smaller embedding models—very important to filter those out.

pilooch 4 days ago

Statistically, you want the retriever to be trained for cosine similarity. Vision LLM retrievers such as DSE do this correctly. No need for a re-ranker once that's done.

  • OutOfHere 3 days ago

    Precisely. Re-ranking is a "smell" in this regard. They are using ada embeddings, which I consider to be of poor quality.

antirez 4 days ago

I propose a different technique:

- Use a large context LLM.

- Segment documents to about 25% of the context window or thereabouts.

- With RAG, retrieve fragments from all the documents, then do a first-pass semantic re-ranking like this, sending to the LLM:

I have a set of documents I can show you to answer the user question "$QUESTION". Please tell me from the title and best matching fragments what document IDs you want to see to better reply:

[Document ID 0]: "Some title / synopsis. From page 100 to 200"

... best matching fragment of document 0...

... second best fragment ...

[Document ID 1]: "Some title / synopsis. From page 200 to 300"

... fragments ...

LLM output: show me 3, 5, 13.

New query, with the full text of the selected documents attached, filling up to 75% of the context window:

"Based on the attached documents in this chat, reply to $QUESTION".