Comment by tarruda
Here's what I understood from the blog post:
- Mistral Large 3 is comparable with the previous Deepseek release.
- Ministral 3 LLMs are comparable with older open LLMs of similar sizes.
> Do you disagree with that?
I think that Qwen3 8B and 4B are SOTA for their size. The GPQA Diamond accuracy chart is weird: both Qwen3 8B and 4B have higher scores, so they used this odd chart where the x-axis shows the number of output tokens. I don't see the point of this.
And implicit in this is that they compare very poorly to SOTA models. Do you disagree with that? Do you think these models are beating SOTA and they simply forgot to include the benchmarks?