Comment by refulgentis

It's a 17 MB model that benchmarks obviously worse than MiniLM v2 (which is SBERT). I run V3 on ONNX on every platform you can think of, with a 23 MB model.

I don't intend for that to be read as dismissive; it's just important to understand work like this in context. Here, the context is a cool trick: once you get to an advanced understanding of LLMs, you notice they have embeddings too, and if that is your lens, it's much more straightforward to take a step forward and mess with those than to take a step back and survey the state of embeddings.