Comment by curl-up
The tone of your post is really strange and condescending, and I'm not sure why. You made a statement that, in my work, I very often see people make when they first start learning about embeddings (expecting words that we humans see as "opposite" to actually have opposite embeddings), and I corrected it, as it might help other people reading this thread.
> Firstly, you can choose what you embed the word with, such as "Article Topic:" or "Temperature:" to adjust the output of the embedding and results of cosine similarity to be relevant for your use case
As far as LLM-based embeddings go, unless you train the model for this type of format, this is not true at all. In fact, the opposite is true: adding such qualifiers before your text only increases the similarity, because those two texts are, in fact, more similar after such additions. I am aware that instruct-embedding models work, but their performance and flexibility are, in my experience, very limited.
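If you want to check this yourself, here is a minimal sketch (I'm using sentence-transformers and the all-MiniLM-L6-v2 checkpoint purely as a convenient stand-in; neither is specific to this thread, and any embedding model or API would do):

```python
# Sketch: compare cosine similarity of two "opposite" words with and
# without a shared prefix. Model choice is just a convenient stand-in.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def cos(a: str, b: str) -> float:
    emb_a, emb_b = model.encode([a, b])
    return float(util.cos_sim(emb_a, emb_b))

print(cos("burning", "freezing"))                            # no prefix
print(cos("Temperature: burning", "Temperature: freezing"))  # shared prefix
# My claim above: the prefixed pair typically scores *higher*, not lower,
# unless the model was trained to interpret such instructions.
```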
As for the rest of your post, I really don't see why you are trying to convince me that LLM-based embeddings have so much more to them than previous models. I am very well aware of this - my work revolves around such new models. I simply corrected a common misconception that you repeated, and I don't really care whether you "really think that" or whether you know the truth and just wrote it as an off-hand remark.
Saying "Perfectly opposite" does not need to mean the mathematical cosine similarity would be -1. The point you implied by bringing up this irrelevant information is to be dismissive of the relevance of generative model embeddings for different tasks (and 0.41 is less similar than you get in previous embedding modes which don't have the rich context of LLMs or RLFF models). This is why you got the snarky tone back, you took an unnecessary literal interpretation, and revealed in your later paragraphs a dated attitude to embeddings that you tend to get from a surface level understanding i.e. that adjective, noun or other PoS type or presence is more important for similarity (e.g. adjectives are closer to each other in Word2Vec but NOT consistently so in generative embeddings).
Of course embeddings with the same prefix will generally be closer. You misunderstand the use case and are looking at embeddings in an outdated way. The point is this:
When I want to use embeddings to model newspaper articles, I put "Article:" in front of the topic as I embed it, and for that purpose the embeddings will suit my needs better. When I need to use embeddings for temperature or scientific-literature purposes, I might put "Temperature:" in front of them, and "Burning"/"Freezing" will be further apart. That is useful in a way that Word2Vec, GloVe, and even, to a lesser degree, SBERT cannot match.
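A rough sketch of what I mean (the model and prefix strings here are stand-ins for illustration, not any particular system): embed the same query and candidates with and without a task prefix and watch how the ranking shifts.

```python
# Sketch: does a task prefix change which candidate ranks closest?
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

query = "burning"
candidates = ["freezing", "scorching", "a cold winter morning"]

for prefix in ("", "Temperature: "):
    q = model.encode(prefix + query)
    c = model.encode([prefix + cand for cand in candidates])
    scores = util.cos_sim(q, c)[0].tolist()
    ranking = sorted(zip(candidates, scores), key=lambda x: -x[1])
    print(prefix or "(no prefix)", ranking)
```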
The misconception you claim I'm spreading is based on Word2Vec and GloVe and is not true in general. Words can have several senses (polysemy), as can phrases, so it's a difficult point to argue for in the first place. When you say "words that have the opposite meaning will have opposite embeddings. Instead, words with opposite meanings have a lot in common", that is only true of embeddings from Word2Vec, GloVe, and the early BART era, which are quickly falling out of fashion because they are limited. Your understanding is dated, and you see a misconception because you have failed to adequately explore or understand the use cases and representations these embeddings make possible. There is so much more! You can embed across languages. You can embed conversations!
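On the cross-language point, for example, a multilingual embedding model lets you compare sentences across languages directly (sketch only; the model name here is just one commonly available checkpoint, not something from this thread):

```python
# Sketch: cross-lingual similarity with a multilingual embedding model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

en = model.encode("The weather is freezing today.")
de = model.encode("Das Wetter ist heute eiskalt.")   # German paraphrase
fr = model.encode("J'aime le fromage.")              # unrelated French sentence

print(float(util.cos_sim(en, de)))  # should be relatively high
print(float(util.cos_sim(en, fr)))  # should be lower
```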
As for your appeal to authority: I don't need to make such a claim. I'm sorry if you work in a job stuck in the past, trying to apply a pre-2020 understanding of NLP to 2024 models, but that sounds like your choice. To me, it sounds like you're assuming the past still holds and taking points absolutely; is that really wise in a fast-changing field? There have been several hackathons about embeddings. Try exploring the recent ones and look at what is really possible.