Comment by imbusy111
Funny to see tau2-bench on the list of benchmarks, when tau2-bench is flawed and 100% score is impossible, unless you add the tasks to the training set: https://github.com/sierra-research/tau2-bench/issues/89
Funny to see tau2-bench on the list of benchmarks, when tau2-bench is flawed and 100% score is impossible, unless you add the tasks to the training set: https://github.com/sierra-research/tau2-bench/issues/89