Comment by marcosdumay
Comment by marcosdumay 3 days ago
What I don't get is... didn't people prove that in the 90s for any multi-layer neural network? Didn't people prove transformers are equivalent in the transformers paper?