Comment by PaulHoule
Even though Postgres is a pretty good database, for any given hardware there is some number of rows that will break it. I don't expect anything less out of LLMs.
There's a much deeper issue with CoT and the like: many of the domains we want to reason over (engineering, science, finance, ...) involve at the very least first-order logic plus arithmetic, which runs into the problems Kurt Gödel warned us about. People might say "this is a problem for symbolic AI," but really it is a problem with the problems you're trying to solve, not with the way you go about solving them -- getting a PhD in theoretical physics taught me that a paper with 50 pages of complex calculations written by a human has a mistake in it somewhere.
(People I know who didn't make it in the dog-eat-dog world of hep-th would have been skeptical about that whole magnetic moment of the muon thing, because between "perturbation theory doesn't always work" [1] and plain human error, the theoretical results that failed to match experiment were wrong all along...)
[1] see lunar theory
> there is some number of rows that will break it. I don't expect anything less out of LLMs.
I'd expect better than the 8-disk Towers of Hanoi, which seems to be beyond current LLMs
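For scale, here's a minimal sketch of the classic recursive solution (function and peg names are my own illustration, not from any benchmark): the 8-disk case needs exactly 2^8 - 1 = 255 moves, which a few lines of code handle trivially.

```python
def hanoi(n, src="A", aux="B", dst="C", moves=None):
    """Return the move list that transfers n disks from src to dst."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, src, dst, aux, moves)  # park n-1 disks on the spare peg
    moves.append((src, dst))            # move the largest remaining disk
    hanoi(n - 1, aux, src, dst, moves)  # stack the n-1 disks back on top
    return moves

print(len(hanoi(8)))  # 255 moves for 8 disks
```

The move count doubles with each added disk, so "8 disks" is a small, fully mechanical instance rather than a hard search problem.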