Comment by shawntan 4 days ago

Theoretical results exist that try to quantify the number of CoT tokens needed to reach different levels of computational expressibility: https://arxiv.org/pdf/2310.07923

TL;DR: Getting to Turing completeness can require polynomial CoT tokens, wrt the input problem size. For a field that constantly harps on parallelism and compute efficiency, this requirement seems prohibitive.

We really need to get away from constant-depth architectures.

benkuykendall 3 days ago

> Getting to Turing completeness can require polynomial CoT tokens, wrt the input problem size.

So, as stated, this is impossible since it violates the Time Hierarchy Theorem: polynomially many tokens with polynomial work per token gives an overall polynomial-time procedure, and the theorem guarantees there are decidable problems no polynomial-time procedure can solve, so Turing completeness is out of reach.

The actual result of the paper is that any poly-time computable function can be computed with poly-many tokens. Which is... not a particularly impressive bound? Any non-trivial fixed neural network can, for instance, compute the NAND of two inputs. And any polynomial-time computable function can be computed with a polynomial number of NAND gates.
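
A toy illustration of that last step (ordinary Python, nothing from the paper): NAND is universal, so richer Boolean functions are just compositions of it, e.g. XOR from four NANDs.

  def nand(a: int, b: int) -> int:
      # NAND truth table: 1 unless both inputs are 1
      return 1 - (a & b)

  def xor(a: int, b: int) -> int:
      # four NAND gates compose into XOR; polynomially many compose
      # into any polynomial-time computable function
      c = nand(a, b)
      return nand(nand(a, c), nand(b, c))

  assert [xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]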

  • shawntan 3 days ago

    > The actual result of the paper is that any poly-time computable function can be computed with poly-many tokens.

    You're right.

    Re: NAND of two inputs. Isn't this doable even by a single-layer (no hidden layers) neural network?
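
    For concreteness (weights picked by hand here, not taken from either paper), a single linear threshold unit with no hidden layer does compute NAND, since NAND is linearly separable:

      # one threshold unit: step(-a - b + 1.5), no hidden layer
      def single_layer_nand(a: int, b: int) -> int:
          return 1 if -a - b + 1.5 > 0 else 0

      assert [single_layer_nand(a, b)
              for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [1, 1, 1, 0]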

    Re: polynomial-time computable functions. I'm assuming this result makes no constant-depth assumption.

    My entire point was that the result of this paper is not actually impressive AND is already covered by a previous paper. Hopefully that's clearer.