Comment by dr_dshiv 9 hours ago

Pretty serious flaws in the original paper.

1. Scoring unsolvable challenges as incorrect

2. Not accounting for token span

3. Not allowing LLMs to write code as part of the solution

I tend to see Apple’s paper as an excuse for not having competitive products.