Comment by khimaros

Comment by khimaros 5 days ago

3 replies

it's great to see this kind of progress in reproducible weights, but color me confused. this claims to be better and smaller than Devstral-Small-2-24B, while clocking in at 32B (larger) and scoring more poorly?

ethan_l_shen 5 days ago

Hey! We are able to outperform Devstral-Small-2-24B when specializing on repositories, and come well within the range of uncertainty with our best SERA-32B model. That being said, our model is a bit larger than Devstral 24B. Could you point out what in the paper gave the impression that we were smaller? If theres something unclear we would love to revise

  • khimaros 5 days ago

    "SERA-32B is the first model in Ai2's Open Coding Agents series. It is a state-of-the-art open-source coding agent that achieves 49.5% on SWE-bench Verified, matching the performance of much larger models like Devstral-Small-2 (24B)" from https://huggingface.co/allenai/SERA-32B

    • ethan_l_shen 5 days ago

      Ah great catch I don't know how we missed that. Thanks! Will fix.