uejfiweun 12 hours ago

Wow. If all those trillions only produce that small of a diff... that's shocking. That's the sort of knowledge that could pop the bubble.

  • JustFinishedBSG 4 hours ago

    I wouldn't trust LMArena results much. They measure user preference, and users are heavily swayed by style, tone, etc.

    You can literally "improve" your model on LMArena just by adding a bunch of emojis.