Comment by trq_ 2 days ago

Yes, we do, but harnesses are hard to eval: people use them across a huge variety of tasks, and sometimes different behaviors trade off against each other. We have added some evals to catch this one in particular.

amelius 2 days ago

Can't you keep the model the same, until the user chooses to use a different model?

rovr138 2 days ago

He said it was the harness, not the model though.

hu3 2 days ago

Thank you. Fair enough.
