Comment by brunosan

Hey hey! We tried Clay v1 with 768 embeddings size using your tutorials. We then split NAIP SF to chips and indexed them. Afterwards, we performed image-to-image similarity search like in your explorer.

We tried to search for bridges, beaches, tennis courts, etc. It worked, but it didn't work well. The top of the ranking was filled with unrelated objects. We found that similarity scores are stacked together too much (similarity values are between 0.91 and 0.92 with 4 digit difference, ~200k tiles), so the encoder made very little difference between objects.

I believe that Clay can be used with additional fine-tuning for classification and segmentation, but standalone embeddings are pretty poor.

Check this: https://github.com/wangzhecheng/SkyScript. It is a dataset of OSM tags and satellite images. CLIP fine-tuned on that gives good embeddings for text-to-image search as well as image-to-image.