Comment by simianwords 2 days ago
It's worth noting that this model is not general purpose, whereas the ones Google and OpenAI used were.
https://x.com/sama/status/1946569252296929727
>we achieved gold medal level performance on the 2025 IMO competition with a general-purpose reasoning system! to emphasize, this is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence.
asterisks mine
This model can’t be used for, say, questions on biology or history.
Do note that that is a different model. The one we are talking about here, DeepSeekMath-V2, is indeed overcooked with math RL. It's so eager to solve math problems that it even comes up with random ones if you prompt it with "Hello".
That's a different model: https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale
Oh, you may be correct. Are these models general purpose or fine-tuned for mathematics?
Both OpenAI and Google used models made specifically for the task, not their general-purpose products.
OpenAI: https://xcancel.com/alexwei_/status/1946477756738629827#m "we are releasing GPT-5 soon, and we’re excited for you to try it. But just to be clear: the IMO gold LLM is an experimental research model. We don’t plan to release anything with this level of math capability for several months."
DeepMind: https://deepmind.google/blog/advanced-version-of-gemini-with... "we additionally trained this version of Gemini on novel reinforcement learning techniques that can leverage more multi-step reasoning, problem-solving and theorem-proving data. We also provided Gemini with access to a curated corpus of high-quality solutions to mathematics problems, and added some general hints and tips on how to approach IMO problems to its instructions."