Ask HN: Is anybody building an alternative transformer?
147 points by taiboku256 7 days ago
Curious if anybody out there is trying to build a new model/architecture that would succeed the transformer?
I geek out on this subject in my spare time. Curious if anybody else is doing the same, and whether you're willing to share ideas?
The Mamba [1] model gained some traction as a potential successor. It's basically an RNN without the nonlinearity applied across hidden states: because each state update is linear, the whole sequence can be computed with a parallel scan [2] in logarithmic depth (instead of a linear chain of sequential steps), and generation runs in constant time per token, since the state is fixed-size rather than a growing KV cache.
It promises much faster inference at much lower compute cost, and I think at up to 7B params it performs on par with transformers. I've yet to see a 40B+ model trained.
The researchers behind Mamba went on to start a company called Cartesia [3], which applies Mamba to voice models.
[1] https://jackcook.com/2024/02/23/mamba.html
[2] https://www.csd.uwo.ca/~mmorenom/HPC-Slides/Parallel_prefix_... <- a random example pulled up from Google, but Stanford's CS149 has an entire lecture devoted to parallel scan.
[3] https://cartesia.ai/