Comment by simianwords 2 days ago

It's worth noting that this model is not general-purpose, whereas the ones Google and OpenAI used were.

yorwba 2 days ago

Both OpenAI and Google used models made specifically for the task, not their general-purpose products.

OpenAI: https://xcancel.com/alexwei_/status/1946477756738629827#m "we are releasing GPT-5 soon, and we’re excited for you to try it. But just to be clear: the IMO gold LLM is an experimental research model. We don’t plan to release anything with this level of math capability for several months."

DeepMind: https://deepmind.google/blog/advanced-version-of-gemini-with... "we additionally trained this version of Gemini on novel reinforcement learning techniques that can leverage more multi-step reasoning, problem-solving and theorem-proving data. We also provided Gemini with access to a curated corpus of high-quality solutions to mathematics problems, and added some general hints and tips on how to approach IMO problems to its instructions."

simianwords 2 days ago

https://x.com/sama/status/1946569252296929727
>we achieved gold medal level performance on the 2025 IMO competition with a general-purpose reasoning system! to emphasize, this is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence.
asterisks mine

- yorwba 2 days ago
  
  DeepSeekMath-V2 is also an LLM doing math and not a specific formal math system. What interpretation of "general purpose" were you using where one of them is "general purpose" and the other isn't?
  
  simianwords 2 days ago
  
  This model can’t be used for, say, questions on biology or history.
  
simianwords 2 days ago

Not true

mangolie 2 days ago

https://x.com/deepseek_ai/status/1995452646459858977

Boom

andy12_ 2 days ago

Do note that that is a different model. The one we are talking about here, DeepSeekMath-V2, is indeed overcooked with math RL. It's so eager to solve math problems that it even comes up with random ones if you prompt it with "Hello".
https://x.com/AlpinDale/status/1994324943559852326?s=20

yorwba 2 days ago

That's a different model: https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale

simianwords 2 days ago

Oh, you may be correct. Are these models general-purpose or fine-tuned for mathematics?
