Comment by trq_ 2 days ago

Yes, we do, but harnesses are hard to eval: people use them across a huge variety of tasks, and sometimes different behaviors trade off against each other. We have added some evals to catch this one in particular.

amelius 2 days ago

Can't you keep the model the same, until the user chooses to use a different model?

  • rovr138 2 days ago

    He said it was the harness, not the model though.