Comment by gjimmel 2 days ago

Ok, but if you wrote some massive corpus of code with no testing, it probably would not compile either.

I think if you want to make this a useful experiment, you should use one of the coding assistants that can test and iterate on its code, not some chatbot which is optimized to impress nontechnical people while being as cheap as possible to run.

belter 2 days ago

>> Chatbot which is optimized to impress nontechnical people

Is that what we call Opus 4.5 now? :-)

  • rabf 2 days ago

    That depends a lot on the system prompt and the tooling available to the model. Are you trying this in Claude Code or Factory.ai, or are you using a chat interface? The difference in outcome can be large.

  • gjimmel 2 days ago

    The name of the model is not the end of the story. There is a Pareto frontier of performance vs. computational cost, and companies have various knobs and dials they can tune to trade performance for cost. This is why OpenAI reports costs of $1k per problem when they test their models on math/coding benchmarks, yet charge you only $15/month for a subscription to their web interface.