Comment by visioninmyblood 11 hours ago
I agree: Claude, ChatGPT, and even Gemini do a poor job of detecting and cropping into a region, which is one of the simplest tasks. Qwen is also great at summarization but not at solving simple vision tasks like cropping, segmentation, and detection. Here is an example where we compared Claude, Gemini, ChatGPT, and other frontier models on simple (and complicated) visual tasks: https://chat.vlm.run/showdown#:~:text=Crop%20into%20the%20cl...
The part that was funny to me was that I would respond "is that right?" and it would tell me exactly how it was wrong, then proceed to do it incorrectly again in a very similar but slightly different way. It was like a Monty Python sketch. I might have also been very tired and easily amused.