Comment by candiddevmike
Comment by candiddevmike 4 days ago
> We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed.
That seems like a bit of a leap here to make this seem more impressive than it is (IMO). You can say the same thing about humans, provided they are allowed to think across as many years/generations as needed.
Wake me up when a LLM figures out stable fusion or room temperature superconductors.
I think you're misrepresenting the study. It builds upon previous work that examines the computation power of the transformer architecture from a circuit-complexity perspective. Previous work showed that the class of problems that a "naive" Transformer architecture could compute was within TC0 [1, 2] and as a consequence it was fundamentally impossible for transformers to solve certain classes of mathematical problems. This study actually provides a more realistic bound of AC0 (by analyzing the finite-precision case) which rules out even more problems, including such 'simple' ones as modular parity.
We also had previous work that hinted that part of the reason why chain-of-thought works from a theoretical perspective is that it literally allows the model to perform types of computations it could not under the more limited setting (in the same way jumping from FSMs to pushdown automata allows you to solve new types of problems) [3].
[1] https://news.ycombinator.com/item?id=35609652 [2] https://blog.computationalcomplexity.org/2023/02/why-cant-li... [3] https://arxiv.org/abs/2305.15408