Comment by gwern
So if it's not using attention, and it collapses the entire input into a single embedding that gets processed in one pass, I guess this is neither a Transformer nor an RNN but just an MLP?
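To make concrete what I mean by "just an MLP": a minimal sketch, assuming the model embeds the tokens, pools them into one fixed-size vector, and pushes that through a plain feed-forward stack in a single pass. All names and sizes here are made up by me, not taken from the paper.

```python
# Hypothetical sketch of an attention-free, recurrence-free "whole input in one go" model.
import torch
import torch.nn as nn

class PooledMLP(nn.Module):
    def __init__(self, vocab_size=50_000, d_embed=512, d_hidden=2048, d_out=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_embed)
        self.mlp = nn.Sequential(
            nn.Linear(d_embed, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_out),
        )

    def forward(self, token_ids):          # token_ids: (batch, seq_len)
        x = self.embed(token_ids)          # (batch, seq_len, d_embed)
        x = x.mean(dim=1)                  # pool the whole input into one embedding
        return self.mlp(x)                 # single feed-forward pass: no attention, no recurrence

out = PooledMLP()(torch.randint(0, 50_000, (2, 128)))  # -> shape (2, 512)
```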