Comment by refulgentis 4 hours ago
That math is for comparing all n-grams for all n <= N simultaneously, which isn't what was being discussed.
For any fixed n-gram size, the complexity is still O(N^2), same as standard attention.
I was talking about all n-gram comparisons.
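To make the fixed-n versus all-n distinction concrete, here is a rough counting sketch. It rests on assumptions that are not from the thread: a sequence of N tokens has N - n + 1 contiguous n-grams of size n, every n-gram is compared against every n-gram of the same size (ordered pairs, self included, like a full attention matrix), and n-grams of different sizes are never compared. The function names are made up for illustration.

```python
def comparisons_fixed_n(seq_len: int, n: int) -> int:
    """Ordered pairs of n-grams of one fixed size n (assumed counting model)."""
    num_ngrams = max(seq_len - n + 1, 0)  # contiguous n-grams in the sequence
    return num_ngrams * num_ngrams        # each n-gram compared with each, incl. itself


def comparisons_all_n(seq_len: int) -> int:
    """Total over every size n = 1..seq_len, same-size comparisons only."""
    return sum(comparisons_fixed_n(seq_len, n) for n in range(1, seq_len + 1))


if __name__ == "__main__":
    for seq_len in (8, 64, 512):
        fixed = comparisons_fixed_n(seq_len, 2)  # bigrams only
        total = comparisons_all_n(seq_len)       # all sizes at once
        print(seq_len, fixed, total)
```

Under this model the fixed-n count is (N - n + 1)^2, so it stays quadratic in N, while the all-n total is 1^2 + 2^2 + ... + N^2 = N(N+1)(2N+1)/6, which grows like N^3/3. A different counting model (for example, comparing across sizes) would give different totals.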