Comment by magicalhippo 4 days ago

This might be a dumb question, but... if I get the embeddings of words with a common theme like "burning", "warm", "cool", "freezing", could I fit an arc (or line) through them reasonably well? So that if I interpolate along that arc/line, I get vectors close to "hot" and "cold"?
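One way to try this experiment, sketched in Python under loud assumptions. The 3-d vectors in `TOY_VOCAB` are hand-made and hypothetical, not output from any real embedding model. The "line" is the orthogonal least-squares fit (the first principal component, found by power iteration) through the four anchor words, and nearest words are found by cosine similarity. It fits a straight line, not an arc. `fit_line` and `walk_line` are illustrative names, not a library API.

```python
import math

# Hypothetical 3-d toy "embeddings": dim 0 ~ temperature, dim 1 ~ "is a
# temperature word", dim 2 ~ noise. Real model embeddings have hundreds of
# dimensions and will not line up this neatly.
TOY_VOCAB = {
    "freezing": [-1.0, 1.0, 0.1],
    "cold": [-0.6, 1.0, 0.0],
    "cool": [-0.3, 1.0, -0.1],
    "warm": [0.3, 1.0, 0.1],
    "hot": [0.6, 1.0, 0.0],
    "burning": [1.0, 1.0, -0.1],
    "banana": [0.0, -1.0, 1.0],  # unrelated distractor word
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def fit_line(points, iters=100):
    """Orthogonal least-squares line through `points`: (mean, unit direction).

    The direction is the first principal component, found by power iteration
    on the scatter matrix, starting from (last point - first point).
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    scatter = [
        [sum(c[i] * c[j] for c in centered) for j in range(d)] for i in range(d)
    ]
    v = [b - a for a, b in zip(points[0], points[-1])]
    for _ in range(iters):
        v = [dot(row, v) for row in scatter]
        norm = math.sqrt(dot(v, v))
        v = [x / norm for x in v]
    return mean, v


def walk_line(anchors, vocab, steps=20):
    """Sample `steps` points along the fitted line, spanning the anchors'
    projections, and return the nearest vocabulary word (by cosine
    similarity) at each point, with consecutive repeats collapsed."""
    mean, u = fit_line([vocab[w] for w in anchors])
    proj = [dot([x - m for x, m in zip(vocab[w], mean)], u) for w in anchors]
    lo, hi = min(proj), max(proj)
    path = []
    for i in range(steps):
        s = lo + (hi - lo) * i / (steps - 1)
        point = [m + s * x for m, x in zip(mean, u)]
        word = max(vocab, key=lambda w: cosine(vocab[w], point))
        if not path or path[-1] != word:
            path.append(word)
    return path


# Fit only on the four words from the question; "hot" and "cold" are held out.
print(walk_line(["burning", "warm", "cool", "freezing"], TOY_VOCAB))
# ['burning', 'hot', 'warm', 'cool', 'cold', 'freezing']
```

With real embeddings you would replace `TOY_VOCAB` with vectors from an actual model and search a much larger vocabulary. Whether "hot" and "cold" then show up along the line depends heavily on the model.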

authorfly 4 days ago

This was the original argument behind the King-Queen-Man-Woman example in the Word2Vec paper. It turns out the answer is mostly no: it works to a degree, but not much beyond basic categories. All embeddings are trained to do whatever their creator decides they should: to represent semantic (meaningful) similarity, similar word usage, topics or domains, or level of language use, or indeed to work multilingually, or to cluster together the embeddings of one language, etc.
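The King-Queen-Man-Woman example refers to answering analogies with vector arithmetic (king - man + woman ≈ queen). A minimal sketch, assuming hand-picked, hypothetical 3-d vectors in `TOY_VECTORS` chosen so the analogy works exactly. It takes the nearest word by cosine similarity and, following a common convention in analogy evaluation, excludes the three query words from the candidates. `analogy` is an illustrative helper, not a library function.

```python
import math

# Hypothetical toy vectors: dim 0 ~ "royalty", dim 1 ~ "male vs female",
# dim 2 ~ everything else. Picked by hand so the analogy works exactly;
# real Word2Vec vectors only satisfy it approximately.
TOY_VECTORS = {
    "king": [1.0, 1.0, 0.1],
    "queen": [1.0, -1.0, 0.1],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, -1.0, 0.0],
    "prince": [0.8, 1.0, 0.3],
    "princess": [0.8, -1.0, 0.3],
    "apple": [0.0, 0.0, 1.0],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def analogy(a, b, c, vectors):
    """Answer "a is to b as c is to ?" with the offset b - a + c, then pick
    the nearest word by cosine similarity, excluding the three query words."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("man", "king", "woman", TOY_VECTORS))  # queen
```

With real vectors the offset typically lands only near the answer, so the nearest-neighbour lookup and the exclusion of the query words do a lot of the work.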

Different models will give you different results. Many are built for search/retrieval, for which MTEB is a good benchmark, but those generally won't "excel" at what you propose; they'll just get you into roughly the same area.