Comment by YetAnotherNick 3 days ago
Chatbot Arena also publishes the head-to-head (H2H) win rate for each pair of models, computed over non-tied results[1], which can be used to detect global drift. E.g. the gpt-4o released on 2024/09/03 wins 69% of the time against the gpt-4o released on 2024/05/13 in blind tests.
[1]: https://lmarena.ai/
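
For anyone who wants to see what "H2H win rate over non-tied results" means concretely, here is a minimal sketch. The record layout (`model_a`, `model_b`, `winner`) and the function name are my own assumptions for illustration, not Chatbot Arena's actual schema or code, and the battles in the usage lines are made-up toy data.

```python
from collections import defaultdict

def head_to_head_win_rates(battles):
    """Return {(x, y): fraction of non-tied battles between x and y that x won}.

    `battles` is an iterable of (model_a, model_b, winner) tuples, where
    winner is "model_a", "model_b", or anything else for a tie.
    This layout is a hypothetical one chosen for the example.
    """
    wins = defaultdict(int)   # (winner, loser) -> number of wins
    games = defaultdict(int)  # (x, y) -> number of non-tied battles
    for a, b, winner in battles:
        if winner not in ("model_a", "model_b"):
            continue  # ties are excluded from the denominator
        won, lost = (a, b) if winner == "model_a" else (b, a)
        wins[(won, lost)] += 1
        games[(a, b)] += 1
        games[(b, a)] += 1
    return {pair: wins[pair] / n for pair, n in games.items()}

# Toy data only: "new" and "old" stand in for two releases of one model.
toy_battles = [
    ("new", "old", "model_a"),
    ("old", "new", "model_b"),
    ("new", "old", "model_a"),
    ("new", "old", "model_b"),
    ("new", "old", "tie"),
]
rates = head_to_head_win_rates(toy_battles)
print(rates[("new", "old")])
```

If a newer release consistently beats an older one of the same model by a wide margin, that pairwise number shows the shift directly, without going through the overall rating.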