Comment by sigmoid10
>Remarkably, constant depth is sufficient.
How would that be remarkable, when it is exactly what he Universal Approximation Theorem already states? Since transformers also use fully connected layers, none of this should really come as a surprise. But from glancing at the paper, they don't even mention it.
It's 'remarkable' because (a) academic careers are as much about hype as science, (b) arxiv doesn't have peer review process to quash this, (c) people take arxiv seriously.