Comment by Analemma_ 10 months ago

Literally everybody doing cutting-edge AI research is trying to replace the transformer, because transformers have a bunch of undesirable properties, like attention cost that scales quadratically with context length. But they're also surprisingly resilient: despite the billions of dollars and man-hours poured into the field and many attempted improvements, cutting-edge models aren't all that different architecturally from the original attention paper, aside from their size and a few incidental details like swapping out the ReLU activation function, because nobody has found anything better yet.
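To make the quadratic point concrete, here's a toy NumPy sketch of a single attention head (the sizes and names are made up for illustration, not any real model's code). The QK^T score matrix has one entry per pair of tokens, so compute and memory both grow as N^2:

    import numpy as np

    def attention(Q, K, V):
        # scores is an (N, N) matrix: every token attends to every other
        # token, which is exactly where the quadratic cost in sequence
        # length N lives
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V  # (N, d) output

    N, d = 1024, 64  # sequence length and head dimension (arbitrary toy values)
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
    out = attention(Q, K, V)  # doubling N quadruples the size of `scores`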

I do expect transformers to be replaced eventually, but so far they seem to have their own "bitter lesson": trying to outperform them usually ends in failure.

PaulHoule 10 months ago

My guess is there's a cost-capability tradeoff such that the O(N^2) really is buying you something you couldn't get for O(N). Beyond that, there really are intelligent-systems problems that boil down to solving SAT and should be NP-complete... LLMs may be able to short-circuit those problems and get lucky guesses quite frequently, but maybe the 'hallucinations' won't go away for anything that runs in O(N^2).
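For a sense of the gap: a minimal brute-force SAT checker (purely illustrative; the signed-integer clause encoding is just a convention I'm assuming here) has to enumerate up to 2^n assignments in the worst case, which no fixed-cost O(N^2) forward pass can do exactly:

    from itertools import product

    def brute_force_sat(clauses, n_vars):
        # Clauses are lists of signed ints: 3 means x3, -3 means NOT x3.
        # Worst case tries all 2**n_vars assignments -- exponential in the
        # number of variables, versus one fixed-cost quadratic pass.
        for bits in product([False, True], repeat=n_vars):
            if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
                   for clause in clauses):
                return bits
        return None

    # (x1 OR NOT x2) AND (x2 OR x3)
    print(brute_force_sat([[1, -2], [2, 3]], 3))  # -> (False, False, True)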