Comment by warangal 5 days ago

Embeddings capture a lot of semantic information, shaped by the training data and objective function, and can be used on their own for many useful tasks.

I used to use embeddings from the text encoder of the CLIP model to augment the prompt so it better matched the corresponding images. For example, given the word "building" in a prompt, I would find its nearest neighbors in the embedding matrix, like "concrete", "underground", etc., and substitute or append those after the corresponding word. This led to higher recall for most of the queries in my limited experiments!
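A minimal sketch of that nearest-neighbor augmentation, using the Hugging Face CLIP tokenizer and text encoder; the checkpoint name, the cosine-similarity lookup over the token embedding matrix, and the append-after-the-word strategy are my assumptions, not the commenter's exact setup:

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_model = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

# Token embedding matrix of the CLIP text encoder: (vocab_size, hidden_dim)
emb = text_model.get_input_embeddings().weight.detach()
emb_norm = torch.nn.functional.normalize(emb, dim=-1)

def nearest_tokens(word: str, k: int = 3) -> list[str]:
    """Return the k vocab tokens whose embeddings are most similar to `word`."""
    ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    query = torch.nn.functional.normalize(emb[ids].mean(dim=0), dim=-1)
    sims = emb_norm @ query
    top = sims.topk(k + len(ids)).indices.tolist()
    # Skip the word's own token ids so only true neighbors are returned.
    neighbors = [tokenizer.decode([i]).strip() for i in top if i not in ids]
    return neighbors[:k]

def augment_prompt(prompt: str, word: str, k: int = 2) -> str:
    """Append the nearest-neighbor tokens right after `word` in the prompt."""
    return prompt.replace(word, word + " " + " ".join(nearest_tokens(word, k)), 1)

print(augment_prompt("a photo of a building at night", "building"))
```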

deepsquirrelnet 5 days ago

That’s a really cool idea. I’ll think about it some more, because it sounds like a feasible implementation for this. I think if you take the magnitude of any token embedding in wordllama, it might also help identify important tokens to augment. But it might work a lot better if trained on data selected for this task.
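A rough sketch of that magnitude-as-importance heuristic, reusing the tokenizer and embedding matrix from the sketch above as a stand-in for wordllama's token embeddings (that substitution is an assumption; it is not wordllama's actual API):

```python
def important_tokens(prompt: str, top_k: int = 3) -> list[str]:
    """Rank prompt tokens by the L2 norm of their embedding vectors."""
    ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    norms = emb[ids].norm(dim=-1)                      # one norm per token
    order = norms.argsort(descending=True)[:top_k].tolist()
    return [tokenizer.decode([ids[i]]).strip() for i in order]

# Candidate tokens to augment, ranked by embedding magnitude.
print(important_tokens("a photo of a building at night"))
```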