Comment by gkapur

> The TLDR/key (from my understanding) is that verifying N tokens can be faster than generating N tokens.

Yes. To generate token n+1 you need token n, which in turn needs token n-1, and so on. Generating from scratch is therefore a sequential (and thus slow) process, with one forward pass per new token. When we verify tokens, we can, for each token, feed all preceding tokens as input and check that the model's output matches the expected token. But since the full sequence we want to verify already exists, we can run all of these checks in parallel, in a single forward pass, instead of one after another.
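A minimal sketch of the difference, assuming greedy decoding and a made-up toy "model" (`forward`, `generate`, `verify` and the arithmetic inside `forward` are all hypothetical, chosen only for illustration). `forward` mimics a causally masked transformer: one call returns the next-token prediction for every prefix of its input. Plain generation needs one call per token, while checking an already-drafted continuation needs a single call.

```python
# Hypothetical toy "model": ONE call returns the greedy next-token prediction
# for every prefix of the input, i.e. preds[i] depends only on tokens[: i + 1].
# A real transformer computes these positions in parallel; this loop only
# simulates the outputs.
CALLS = {"forward": 0}

def forward(tokens: list[int]) -> list[int]:
    CALLS["forward"] += 1
    preds, running = [], 0
    for t in tokens:
        running += t
        preds.append((running * 7 + 3) % 11)  # arbitrary deterministic rule
    return preds

def generate(prompt: list[int], n: int) -> list[int]:
    """Plain autoregressive decoding: n new tokens cost n sequential calls."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(forward(seq)[-1])  # only the last position is used
    return seq[len(prompt):]

def verify(prompt: list[int], draft: list[int]) -> list[int]:
    """Check a drafted continuation with ONE forward call (greedy acceptance).

    Returns the accepted prefix of `draft` plus one token from the model:
    the correction at the first mismatch, or a bonus token if all match.
    Assumes a non-empty prompt.
    """
    assert prompt, "prompt must be non-empty"
    preds = forward(list(prompt) + list(draft))
    accepted = []
    for j, tok in enumerate(draft):
        expected = preds[len(prompt) + j - 1]  # prediction for this position
        if tok != expected:
            accepted.append(expected)  # replace the first wrong token, stop
            return accepted
        accepted.append(tok)
    accepted.append(preds[-1])  # every draft token matched: one free extra token
    return accepted

CALLS["forward"] = 0
draft = generate([1, 2], 4)
print(draft, CALLS["forward"])  # [2, 5, 7, 1] 4

CALLS["forward"] = 0
print(verify([1, 2], draft), CALLS["forward"])  # [2, 5, 7, 1, 8] 1
print(verify([1, 2], [2, 5, 0, 1]))             # [2, 5, 7]
```

Here the draft comes from the same model, so every token is accepted; in practice the draft would come from a smaller, cheaper model and only its longest correct prefix survives. Either way, the output is exactly what plain greedy decoding would have produced.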

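This is also why training transformer models is much faster than training RNNs: during training we do the same thing, except that the sequence we compare against is the ground truth rather than the output of another model. An RNN can't exploit this as well, because its hidden state at step t depends on its hidden state at step t-1, so the positions still have to be processed one after another.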
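A toy sketch of that contrast, assuming scalar "embeddings" and no learned weights (`causal_attention` and `rnn` are hypothetical names, not any library's API). In causal self-attention, output i only looks at inputs 0..i, and no output row depends on another row, so with the whole ground-truth sequence available all rows can be computed at once (on real hardware, one masked matrix multiply). In the RNN, each hidden state needs the previous one.

```python
import math

def causal_attention(xs: list[float]) -> list[float]:
    """Toy single-head causal self-attention over scalar inputs.

    Row i attends only to xs[0..i] (the causal mask). Rows do not depend on
    each other, so they could all be computed in parallel.
    """
    out = []
    for i in range(len(xs)):
        scores = [xs[i] * xs[j] for j in range(i + 1)]  # mask: only j <= i
        m = max(scores)                                  # for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append(sum(wj * xs[j] for j, wj in enumerate(w)) / z)
    return out

def rnn(xs: list[float], a: float = 0.5, b: float = 1.0) -> list[float]:
    """Toy RNN: h_t = tanh(a * h_{t-1} + b * x_t).

    Each step needs the previous hidden state, so the loop is inherently
    sequential, even when the whole input sequence is known in advance.
    """
    h, out = 0.0, []
    for x in xs:
        h = math.tanh(a * h + b * x)
        out.append(h)
    return out

xs = [0.3, -1.2, 0.8, 2.0, -0.5]
full = causal_attention(xs)
# Processing the full sequence once gives the same per-position outputs as
# running the model separately on every prefix: no future tokens leak in.
print(all(math.isclose(full[i], causal_attention(xs[: i + 1])[-1])
          for i in range(len(xs))))  # True
```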