Comment by tarruda
Here's what I understood from the blog post:
- Mistral Large 3 is comparable with the previous Deepseek release.
- Ministral 3 LLMs are comparable with older open LLMs of similar sizes.
> Do you disagree with that?
I think that Qwen3 8B and 4B are SOTA for their size. The GPQA Diamond accuracy chart is weird: both Qwen3 8B and 4B have higher scores, so they used this odd chart where the x-axis shows the number of output tokens. I don't see the point of this.
And implicit in this is that they compare very poorly to SOTA models. Do you disagree with that? Do you think these models are beating SOTA and they simply forgot to include the benchmarks?