Comment by baq
Now just need an autoregressive transformer <==> RNN isomorphism paper and we're golden
The paper says transformers perform better than RNNs, which is not surprising.
However, both are, in theory, Turing complete, so they are equally expressive.
Plain RNNs are theoretically weaker than transformers with chain-of-thought (CoT): https://arxiv.org/abs/2402.18510.
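For what it's worth, here's a toy numpy sketch (my own code, not from any paper) of why the "autoregressive transformer as RNN" reading is tempting: each decoding step of causal self-attention can be read as a state-update rule whose state is the KV cache. The catch is that this "state" grows with the sequence, unlike a plain RNN's fixed-size hidden state, which is exactly where the simple isomorphism breaks down.

```python
import numpy as np

def decode_step(state, x_t, Wq, Wk, Wv):
    # Hypothetical single-head causal attention step; Wq/Wk/Wv are stand-in
    # parameters for illustration only.
    q, k, v = x_t @ Wq, x_t @ Wk, x_t @ Wv
    K = np.vstack([state[0], k[None, :]])  # append to cached keys
    V = np.vstack([state[1], v[None, :]])  # append to cached values
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    # New "recurrent" state (the grown KV cache) and this step's output.
    return (K, V), w @ V

d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
state = (np.empty((0, d)), np.empty((0, d)))  # empty KV cache
for t in range(5):
    state, y_t = decode_step(state, rng.standard_normal(d), Wq, Wk, Wv)
```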