Comment by dr_dshiv 9 hours ago

Pretty serious flaws in the original paper.

1. Scoring unsolvable challenges as incorrect.

2. Not accounting for token span.

3. Not allowing LLMs to write code as part of the solution.

I tend to see Apple’s paper as an excuse for not having competitive products.

Sounds like confirmation bias in action