Comment by warangal 5 days ago

Embeddings capture a lot of semantic information, shaped by the training data and objective function, and can be used on their own for many useful tasks.

I used to use embeddings from the text encoder of the CLIP model to augment the prompt so it better matched the corresponding images. For example, given the word "building" in a prompt, I would find its nearest neighbors in the embedding matrix, like "concrete", "underground", etc., and substitute or append those after the corresponding word. This led to higher recall for most of the queries in my limited experiments!
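A minimal sketch of that nearest-neighbor augmentation, using the Hugging Face CLIP tokenizer and text encoder; the checkpoint name, the cosine-similarity lookup over the token embedding matrix, and the append-after-the-word strategy are my assumptions, not the commenter's exact setup:

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_model = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

# Token embedding matrix of the CLIP text encoder: (vocab_size, hidden_dim)
emb = text_model.get_input_embeddings().weight.detach()
emb_norm = torch.nn.functional.normalize(emb, dim=-1)

def nearest_tokens(word: str, k: int = 3) -> list[str]:
    """Return the k vocab tokens whose embeddings are most similar to `word`."""
    ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    query = torch.nn.functional.normalize(emb[ids].mean(dim=0), dim=-1)
    sims = emb_norm @ query
    top = sims.topk(k + len(ids)).indices.tolist()
    # Skip the word's own token ids so only true neighbors are returned.
    neighbors = [tokenizer.decode([i]).strip() for i in top if i not in ids]
    return neighbors[:k]

def augment_prompt(prompt: str, word: str, k: int = 2) -> str:
    """Append the nearest-neighbor tokens right after `word` in the prompt."""
    return prompt.replace(word, word + " " + " ".join(nearest_tokens(word, k)), 1)

print(augment_prompt("a photo of a building at night", "building"))
```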

deepsquirrelnet 5 days ago

That’s a really cool idea. I’ll think about it some more, because it sounds like a feasible implementation for this. I think if you take the magnitude of any token embedding in wordllama, it might also help identify important tokens to augment. But it might work a lot better if trained on data selected for this task.
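A rough sketch of that magnitude-as-importance heuristic, reusing the tokenizer and embedding matrix from the sketch above as a stand-in for wordllama's token embeddings (that substitution is an assumption; it is not wordllama's actual API):

```python
def important_tokens(prompt: str, top_k: int = 3) -> list[str]:
    """Rank prompt tokens by the L2 norm of their embedding vectors."""
    ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    norms = emb[ids].norm(dim=-1)                      # one norm per token
    order = norms.argsort(descending=True)[:top_k].tolist()
    return [tokenizer.decode([ids[i]]).strip() for i in order]

# Candidate tokens to augment, ranked by embedding magnitude.
print(important_tokens("a photo of a building at night"))
```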