Comment by nylonstrung a day ago
My experience with DeepSeek and Kimi is quite the opposite: they feel smarter than their benchmarks would imply.
Whereas the benchmark gains seen in new OpenAI, Grok and Claude models don't feel accompanied by a matching vibe improvement.