rabf 2 days ago

That depends a lot on the system prompt and the tooling available to the model. Are you trying this in Claude Code or Factory.ai, or are you using a chat interface? The difference in outcome can be large.

gjimmel 2 days ago

The name of the model is not the end of the story. There is a Pareto frontier of performance vs. computational cost, and the companies have various knobs and dials they can tune to trade off performance for cost. This is why OpenAI reports costs of $1k/problem when they test their models on the math/coding benchmarks, yet charges you only $15/month for a subscription to their web interface.