Comment by y04nn

Comment by y04nn 10 months ago

How does CLIP compare to YOLO[1]? I haven't looked into image classification/object recognition for a while, but I remember that YOLO was quite good was working on realtime video too.

[1]: https://pjreddie.com/darknet/yolo/

Eisenstein 10 months ago

CLIP and YOLO work completely differently and have different purposes. CLIP uses transformers and embeddings and can compare text with images for classification. YOLO using a CNN and is trained with bounding boxes on images and is used for image recognition.

Give an image to CLIP and you can compare the similarity between the image and a sentence like 'a vase with roses in it'. Whereas with YOLO you give it an image and get the coordinates of bounding boxes around a vase, and around roses.

Reply View 0 replies