Comment by deepsquirrelnet 6 months ago
> So, what can we use instead?
> The most powerful approach
> The best approach is to directly use an LLM query to compare two entries.
Cross-encoders are a solution I'm quite fond of: high-performing and much faster. I recently put an STS cross-encoder based on ModernBERT up on Hugging Face, and it performs very well.
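As a minimal sketch, scoring sentence pairs with an STS cross-encoder via the `sentence-transformers` library looks like this. The model name below is a stand-in (the commenter's ModernBERT model isn't named here); any STS cross-encoder from the Hub would work the same way.

```python
from sentence_transformers import CrossEncoder

# Stand-in model; substitute any STS cross-encoder from the Hugging Face Hub.
model = CrossEncoder("cross-encoder/stsb-roberta-base")

pairs = [
    ("A man is eating food.", "A man is eating a meal."),
    ("A man is eating food.", "The stock market crashed today."),
]

# predict() runs both sentences through the model jointly and returns one
# similarity score per pair (for STS-B models, roughly in [0, 1]).
scores = model.predict(pairs)
for (a, b), s in zip(pairs, scores):
    print(f"{s:.3f}  {a!r} vs {b!r}")
```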
Technically speaking, cross-encoders are LLMs: they use the last layer to predict a similarity score (a single number) rather than the probability of the next token. They are faster than generative models only when the underlying model is smaller; swapping the output head alone brings no gain, since the cost of the last layer is negligible. In any case, even the simplest cross-encoders are more computationally intensive than approaches that take a dot product of pre-computed vectors.
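A rough sketch of that cost difference, assuming the `sentence-transformers` library: with a bi-encoder, each entry is encoded once and comparing any two entries is then just a dot product, while a cross-encoder needs a full forward pass per pair. The model names are common defaults, not a specific recommendation.

```python
from sentence_transformers import SentenceTransformer, CrossEncoder

entries = ["red running shoes", "crimson sneakers", "cast-iron skillet"]

# Bi-encoder: one forward pass per entry; vectors can be pre-computed and cached.
bi = SentenceTransformer("all-MiniLM-L6-v2")
vecs = bi.encode(entries, normalize_embeddings=True)  # shape: (3, dim)
dot_scores = vecs @ vecs.T  # all pairwise similarities nearly for free

# Cross-encoder: one forward pass per *pair*; O(n^2) passes for n entries.
ce = CrossEncoder("cross-encoder/stsb-roberta-base")
pair_scores = ce.predict([(entries[0], entries[1]),
                          (entries[0], entries[2])])

print(dot_scores[0, 1], dot_scores[0, 2])  # dot-product similarities
print(pair_scores)                         # cross-encoder similarities
```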
That said, for many applications we may be perfectly fine with some fine-tuned BERT-like model rather than the newest AGI-like SoTA, just to decide whether two products are vaguely similar and whether one is worth showing in the other's suggestions.