Comment by msgodel 2 days ago

Multimodal Qwen is pretty good at OCR, although it's slow without a GPU.

For pure search you're almost certainly better off building an index of CLIP embeddings and then doing cosine similarity with a query embedding to find things. I have gigabytes of reaction images and memes I've been thinking about doing this with.
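
Roughly what I have in mind for the search part, as a minimal sketch rather than anything I've actually run on my collection. The toy 3-dimensional vectors are made-up stand-ins for real CLIP embeddings (which you'd get from whatever CLIP implementation you use), and `build_index` and `search` are hypothetical helper names. The only point is that once everything is unit-length, cosine similarity is just a dot product:

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_index(embeddings):
    """Map each image path to its unit-length embedding."""
    return {path: normalize(vec) for path, vec in embeddings.items()}


def search(index, query_vec, top_k=3):
    """Return the top_k (path, score) pairs ranked by cosine similarity."""
    q = normalize(query_vec)
    scored = [
        (path, sum(a * b for a, b in zip(vec, q)))
        for path, vec in index.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Assumption: toy vectors standing in for image embeddings from a CLIP model.
toy_embeddings = {
    "cat_meme.png": [0.9, 0.1, 0.0],
    "dog_reaction.jpg": [0.1, 0.9, 0.0],
    "chart.png": [0.0, 0.1, 0.9],
}
index = build_index(toy_embeddings)

# Assumption: this vector stands in for the embedding of a text query.
results = search(index, [1.0, 0.2, 0.0], top_k=2)
for path, score in results:
    print(f"{score:.3f}  {path}")
```

For gigabytes of images you'd probably want the embeddings in a NumPy array or a proper vector index instead of a dict of lists, but the ranking logic stays the same.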