Comment by deepsquirrelnet 5 days ago
MiniLM is a better embedding model. This model doesn't perform attention calculations or use a deep learning framework after training, so you won't get the contextual benefits of transformer models with this one.
It’s not meant to be a state of the art model though. I’ve put in pretty limiting constraints in order to keep dependencies, size and hardware requirements low, and speed high.
Even for a word embedding model it's quite lightweight, as those tend to have much larger vocabularies and are typically a few gigabytes.
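To make the "no contextual benefits" point concrete, here's a toy sketch (the vectors and pooling are hypothetical, not the actual model) of how a static word-embedding lookup works: each word maps to one fixed vector, so "bank" contributes the identical vector whether the sentence is about rivers or money. An attention-based model like MiniLM would instead produce different vectors depending on the surrounding words.

```python
# Toy static embedding table (made-up 3-d vectors, purely for illustration).
STATIC_EMBEDDINGS = {
    "bank": [0.2, 0.7, 0.1],
    "river": [0.9, 0.1, 0.3],
    "money": [0.1, 0.8, 0.6],
}

def embed_sentence(tokens):
    """Mean-pool the static vectors of known tokens (a common
    bag-of-words sentence embedding for static models)."""
    vecs = [STATIC_EMBEDDINGS[t] for t in tokens if t in STATIC_EMBEDDINGS]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# "bank" gets the same vector in both contexts -- the lookup has no
# way to disambiguate word senses, unlike an attention-based encoder.
assert STATIC_EMBEDDINGS["bank"] == STATIC_EMBEDDINGS["bank"]
print(embed_sentence(["river", "bank"]))
print(embed_sentence(["money", "bank"]))
```

The two sentence vectors differ only because the *other* words differ; the contribution of "bank" itself is context-independent.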
Which do use attention? Any recommendations?