Comment by ForHackernews 13 hours ago
"5 Alternative Representations Restore Performance To test whether the failures reflect reasoning limitations or format constraints, we conducted preliminary testing of the same models on Tower of Hanoi N = 15 using a different representation: Prompt: "Solve Tower of Hanoi with 15 disks. Output a Lua function that prints the solution when called."
Results: Very high accuracy across tested models (Claude-3.7-Sonnet, Claude Opus 4, OpenAI o3, Google Gemini 2.5), completing in under 5,000 tokens.
The generated solutions correctly implement the recursive algorithm, demonstrating intact reasoning capabilities when freed from exhaustive enumeration requirement""
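For reference, the "solution" being requested here is essentially the textbook recursive routine. A minimal Lua sketch of the kind of function the prompt asks for (the identifiers are my own illustration, not taken from the paper) looks something like this:

    -- Recursive Tower of Hanoi: move n disks from peg `from` to peg `to`
    -- using `via` as the spare peg, printing one move per line.
    local function hanoi(n, from, to, via)
      if n == 0 then return end
      hanoi(n - 1, from, via, to)
      print(string.format("move disk %d from %s to %s", n, from, to))
      hanoi(n - 1, via, to, from)
    end

    -- Calling it with 15 disks prints all 2^15 - 1 = 32767 moves.
    hanoi(15, "A", "C", "B")

The function itself is only a handful of lines; the 32,767 moves only appear when it is executed, which is why the model's output fits in well under 5,000 tokens.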
Is there something I'm missing here?
This seems like it demonstrates the exact opposite of what the authors are claiming: Yes, your bot is an effective parrot that can output a correct Lua program that exists somewhere in the training data. No, your bot is not "thinking" and cannot effectively reason through the algorithm itself.
It seems to just re-illustrate the point that the model cannot follow algorithmic steps once it is out of distribution.