romanhn 4 days ago

Say I generate embeddings for a bunch of articles. Given the query "articles about San Francisco that don't mention cars" would cosine similarity uprank or downrank the car mentions? Assuming exclusions aren't handled well, what techniques might I use to support them?

stared 4 days ago

It's worth testing, but you'll likely get the "don't think about a pink elephant" effect. My guess is that for most embedding models, the query "articles about San Francisco that don't mention cars" ends up closest to articles about SF that do mention cars.

The fundamental issue here is comparing apples to oranges: questions and answers are different kinds of text.
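
A quick way to check the effect yourself (a minimal sketch, assuming a sentence-transformers model; the article snippets are made-up placeholders):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "articles about San Francisco that don't mention cars"
docs = [
    "Driving across San Francisco: the best car routes over the hills.",
    "San Francisco's food scene, from the Mission to Chinatown.",
]

query_vec = model.encode(query)
doc_vecs = model.encode(docs)

# If the "pink elephant" effect holds, the car-heavy article scores
# higher despite the negation in the query.
for doc, score in zip(docs, util.cos_sim(query_vec, doc_vecs)[0]):
    print(f"{float(score):.3f}  {doc}")
```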

  • romanhn 4 days ago

    So is LLM pre/post-processing the best approach here?

mirekrusin 4 days ago

I think you have to separate it into a positive query and a negative query, run a (negative) ranking for the exclusion part, and combine the results yourself.
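
Something like this (a minimal sketch, assuming a sentence-transformers model; the penalty weight `neg_weight` and the way the query is split are illustrative choices you'd want to tune):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank(docs, positive_query, negative_query, neg_weight=0.5):
    pos_vec = model.encode(positive_query)
    neg_vec = model.encode(negative_query)
    scored = []
    for doc, vec in zip(docs, model.encode(docs)):
        # Reward similarity to what we want, penalize similarity
        # to the concept we want excluded.
        score = cosine(vec, pos_vec) - neg_weight * cosine(vec, neg_vec)
        scored.append((score, doc))
    return sorted(scored, key=lambda s: s[0], reverse=True)

# The exclusion is expressed as its own query instead of a negation
# buried inside the main query.
ranked = rank(
    docs=[
        "Driving across San Francisco: the best car routes over the hills.",
        "San Francisco's food scene, from the Mission to Chinatown.",
    ],
    positive_query="articles about San Francisco",
    negative_query="cars",
)
for score, doc in ranked:
    print(f"{score:.3f}  {doc}")
```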

breadislove 4 days ago

No, this won't work. Embedding models at the moment are pretty bad with negations.