Comment by vjerancrnjak 21 hours ago
If it overfits on the whole internet then it’s like a search engine that returns really relevant results with some lossy side effect.
A recent benchmark on the unseen 2025 Math Olympiad shows that none of the models can problem-solve. They all, accidentally or on purpose, had prior solutions in the training set.
You probably mean the USAMO 2025 paper. They updated their comparison with Gemini 2.5 Pro, which did get a nontrivial score. That Gemini version was released five days after USAMO, so while it's not entirely impossible for the data to be in its training set, it would seem kind of unlikely.
https://x.com/mbalunovic/status/1907436704790651166