tossandthrow 3 days ago

The paper says transformers perform better than RNNs, which is not surprising.

However, both are Turing complete in theory, so they are equally expressive.