sigmoid10 3 days ago

>Remarkably, constant depth is sufficient.

How would that be remarkable, when it is exactly what the Universal Approximation Theorem already states? Since transformers also use fully connected layers, none of this should really come as a surprise. But from glancing at the paper, they don't even mention it.
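For reference, the classical theorem being invoked here says, roughly (in the Cybenko/Hornik form, with $\sigma$ any non-polynomial continuous activation):

```latex
\text{For compact } K \subset \mathbb{R}^n,\ f \in C(K),\ \varepsilon > 0:\quad
\exists\, N,\ \alpha_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^n
\ \text{ such that }\
\sup_{x \in K} \Bigl|\, f(x) - \sum_{i=1}^{N} \alpha_i\, \sigma\!\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon
```

Note the fine print: depth is constant (one hidden layer), but the width $N$ and the precision of the real-valued weights are unbounded.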

nexustext 3 days ago

It's 'remarkable' because (a) academic careers are as much about hype as science, (b) arxiv doesn't have peer review process to quash this, (c) people take arxiv seriously.

logicchains 3 days ago

>How would that be remarkable, when it is exactly what the Universal Approximation Theorem already states

Only with infinite precision, which is highly unrealistic. Under realistic assumptions, fixed-depth transformers without chain-of-thought are very limited in what they can express: https://arxiv.org/abs/2207.00729 . Chain of thought increases the class of problems that fixed-depth transformers can solve: https://arxiv.org/abs/2310.07923
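Informally, the separation those two papers draw can be sketched as follows (the exact precision and uniformity conditions are in the papers themselves):

```latex
\{\text{languages decided by log-precision, fixed-depth transformers}\} \subseteq \mathsf{TC}^0
```

whereas allowing $t(n)$ intermediate chain-of-thought steps grows the decidable class with $t$, reaching (under the second paper's assumptions) all of $\mathsf{P}$ for polynomially many steps. So constant depth with bounded precision is genuinely weaker than what the Universal Approximation Theorem's idealized setting suggests.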

IshKebab 3 days ago

The universal approximation theorem has no practical relevance.