Comment by jampekka
1491 vs 1418 Elo means the stronger model wins about 60% of the time.
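For anyone who wants to check the 60% figure, here is a minimal sketch. It assumes the standard logistic Elo formula with a 400-point scale; the leaderboard's own fitting procedure may differ, and `elo_expected_score` is just an illustrative name, not anything from LMArena.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo model.

    Assumption: logistic curve in base 10 with a 400-point scale.
    The expected score counts a tie as half a win.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings quoted in the comment above.
p_stronger = elo_expected_score(1491, 1418)
p_weaker = elo_expected_score(1418, 1491)

print(f"stronger model: {p_stronger:.2f}, weaker model: {p_weaker:.2f}")
```

A 73-point gap comes out at roughly 0.60 for the higher-rated model, and the two expected scores always sum to 1.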
I wouldn't trust LMArena results much. They measure user preference, and users are heavily swayed by style, tone, etc.
You can literally "improve" your model on LMArena by just adding a bunch of emojis.
Probably naive questions:
Does that also mean that Gemini-3 (the top-ranked model) loses to Mistral 3 40% of the time?
Does that make Gemini 1.5x better, or Mistral 2/3 as good as Gemini, or can we not quantify the difference like that?