Comment by dspoka

Comment by dspoka 10 months ago

Looks cool! Any advantages to the mini-lm model - it seems better on most mteb tasks but wondering if maybe inference or something is better.

deepsquirrelnet 10 months ago

Mini-lm is a better embedding model. This model does not perform attention calculations, or use a deep learning framework after training. You won’t get the contextual benefits of transformer models in this one.

It’s not meant to be a state of the art model though. I’ve put in pretty limiting constraints in order to keep dependencies, size and hardware requirements low, and speed high.

Even for a word embedding model it’s quite lightweight, as those have much larger vocabularies are are typically a few gigabytes.

Reply View 4 replies

ryeguy_24 10 months ago

Which do use attention? Any recommendations?

Reply View | 3 replies
- nostrebored 10 months ago
  
  Depends immensely on use case — what are your compute limitations? are you fine with remote code? are you doing symmetric or asymmetric retrieval? do you need support in one language or many languages? do you need to work on just text or (audio, video, image)? are you working in a specific domain?
  A lot of people wind up using models based purely on one or two benchmarks and wind up viewing embedding based projects as a failure.
  If you do answer some of those I’d be happy to give my anecdotal feedback :)
  
  Reply View | 1 reply
  
  ryeguy_24 10 months ago
  
  Sorry, I wasn’t clear. I was speaking about utility models/libraries to compute things like meaning similarity with not just token embeddings but with attention too. I’m really interested in finding a good utility that leverages the transformer to compute “meaning similarity” between two texts.
  
  Reply View | 0 replies
- deepsquirrelnet 10 months ago
  
  Most current models are transformer encoders that use attention. I like most of the options that ollama provides.
  I think this one is currently the top of the MTEB leaderboard, but large dimension vectors and a multi billion parameter model: https://huggingface.co/nvidia/NV-Embed-v1
  
  Reply View | 0 replies

lennxa 10 months ago

looks like it's the size of the model itself, more lightweight and faster. mini-lm is 80mb while the smallest one here is 16mb.

Reply View 2 replies

authorfly 10 months ago

Mini-lm isn't optimized to be as small as possible though, and is kind of dated. It was trained on a tiny amount of similarity pairs compared to what we have available today.
As of the last time I did it in 2022, Mini-lm can be distilled to 40mb with only limited loss in accuracy, so can paraphrase-MiniLM-L3-v1 (down to 21mb), by reducing the dimensions by half or more and projecting a custom matrix optimization(optionally, including domain specific or more recent training pairs). I imagine today you could get it down to 32mb (= project to ~156 dim) without accuracy loss.

Reply View | 1 reply
- byefruit 10 months ago
  
  What are some recent sources for high quality similarity pairs?
  
  Reply View | 0 replies