Comment by warangal
Embeddings capture a great deal of semantic information, shaped by the training data and objective function, and can be used on their own for many useful tasks.
I used to use embeddings from the text encoder of the CLIP model to augment prompts so they better matched the corresponding images. For example, given the word "building" in a prompt, I would find its nearest neighbors in the embedding matrix, such as "concrete" or "underground", and substitute or append those after the corresponding word. This led to higher recall for most queries in my limited experiments!
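A minimal sketch of that nearest-neighbor augmentation, assuming the HuggingFace transformers library, the openai/clip-vit-base-patch32 checkpoint, and a top-k of 3 (all my assumptions, not necessarily the commenter's setup):

```python
import torch
import torch.nn.functional as F
from transformers import CLIPTokenizer, CLIPTextModel

# Assumed checkpoint; the original comment doesn't name one.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
model = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

# Token embedding matrix of the text encoder: (vocab_size, dim)
emb = model.get_input_embeddings().weight.detach()

def nearest_tokens(word: str, top_k: int = 3) -> list[str]:
    # Multi-token words are mean-pooled here; the original approach
    # may have handled them differently.
    ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    query = emb[ids].mean(dim=0, keepdim=True)
    sims = F.cosine_similarity(query, emb)              # (vocab_size,)
    # Take extra candidates so we can drop the word's own tokens.
    top = sims.topk(top_k + len(ids)).indices.tolist()
    neighbors = [tokenizer.decode([i]).strip() for i in top if i not in ids]
    return neighbors[:top_k]

prompt = "a photo of a building"
prompt += " " + " ".join(nearest_tokens("building"))
print(prompt)  # appends the nearest vocabulary neighbors of "building"
```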
Yup, and you can train these in-domain contextual relationships into the embedding models.
https://www.marqo.ai/blog/generalized-contrastive-learning-f...
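For reference, a minimal sketch of the symmetric CLIP-style contrastive objective that such in-domain fine-tuning builds on (the linked post generalizes this to weighted pairs; the function below is illustrative, not Marqo's implementation):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(text_emb: torch.Tensor,
                          image_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    # Symmetric InfoNCE over a batch of matched (text, image) pairs.
    # Fine-tuning on in-domain pairs pulls related concepts together
    # in the shared embedding space.
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.T / temperature    # (B, B) similarities
    targets = torch.arange(len(logits), device=logits.device)
    loss_t = F.cross_entropy(logits, targets)        # text -> image
    loss_i = F.cross_entropy(logits.T, targets)      # image -> text
    return (loss_t + loss_i) / 2
```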