Comment by gwern
So if it's not using attention, and it collapses the entire input into a single embedding that gets processed in one pass, I guess this is neither a Transformer nor an RNN but just an MLP?
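To make concrete what I mean by "just an MLP": a minimal sketch, assuming the model embeds the tokens, pools them into one fixed-size vector, and pushes that through a plain feed-forward stack in a single pass. All names and sizes here are made up by me, not taken from the paper.

```python
# Hypothetical sketch of an attention-free, recurrence-free "whole input in one go" model.
import torch
import torch.nn as nn

class PooledMLP(nn.Module):
    def __init__(self, vocab_size=50_000, d_embed=512, d_hidden=2048, d_out=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_embed)
        self.mlp = nn.Sequential(
            nn.Linear(d_embed, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_out),
        )

    def forward(self, token_ids):          # token_ids: (batch, seq_len)
        x = self.embed(token_ids)          # (batch, seq_len, d_embed)
        x = x.mean(dim=1)                  # pool the whole input into one embedding
        return self.mlp(x)                 # single feed-forward pass: no attention, no recurrence

out = PooledMLP()(torch.randint(0, 50_000, (2, 128)))  # -> shape (2, 512)
```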