Comment by quantadev
Right now, as long as the rocket's heading straight up, everyone's on board with MLPs (Multilayer Perceptrons/Transformers)! Why not stay on the same rocket for now!? We're almost at AGI already!
Sure. But that isn’t a reason to conflate the two?
OP wasn’t suggesting looking for an alternative/successor to MLPs, but for an alternative/successor to transformers (while presumably still using MLPs) in the same way that transformers are an alternative/successor to LSTMs.
I wouldn't conflate MLPs with transformers; an MLP is a small building block of almost any standard neural architecture (excluding spiking/neuromorphic types).
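Concretely, here's a minimal sketch (PyTorch-style; the pre-norm layout, dimensions, and layer choices are just illustrative assumptions, not any particular model's config) of how the MLP sits as one sub-block inside a transformer block:

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """The plain two-layer feed-forward sub-block -- the 'MLP' part."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class TransformerBlock(nn.Module):
    """Attention + MLP: the MLP is just one component of the whole block."""
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = MLP(d_model, 4 * d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)   # token mixing: not an MLP
        x = x + attn_out
        x = x + self.mlp(self.norm2(x))    # per-token MLP sub-block
        return x

# Usage: a batch of 2 sequences, 10 tokens each, embedding dim 256.
x = torch.randn(2, 10, 256)
print(TransformerBlock()(x).shape)  # torch.Size([2, 10, 256])
```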
But to your point, the trend toward rising inference-time compute costs, ushered in by CoT/reasoning models, is one good reason to look for equally capable models that can be optimized for inference efficiency. Traditionally, training was the main compute cost, so it's reasonable to ask whether there's unexplored space there.
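A quick back-of-envelope illustrates the shift (using the common ~6·N·D training and ~2·N per-generated-token inference FLOP approximations; all the concrete numbers below are assumptions I'm making up for illustration):

```python
# Sketch: why longer chain-of-thought outputs push total compute toward
# inference. Approximations used:
#   training FLOPs  ~= 6 * params * training_tokens
#   inference FLOPs ~= 2 * params * generated_tokens
# All specific numbers here are illustrative assumptions.

params          = 70e9      # 70B-parameter model (assumed)
training_tokens = 2e12      # 2T training tokens (assumed)
queries_per_day = 50e6      # daily query volume (assumed)
days_in_service = 365

train_flops = 6 * params * training_tokens

def lifetime_inference_flops(tokens_per_answer: float) -> float:
    total_tokens = queries_per_day * days_in_service * tokens_per_answer
    return 2 * params * total_tokens

for label, tokens in [("short answer", 300), ("long CoT trace", 5000)]:
    ratio = lifetime_inference_flops(tokens) / train_flops
    print(f"{label:>15}: inference/training ratio = {ratio:.1f}x")

# With these assumptions, a ~17x growth in output length moves inference
# from roughly the same order as training cost (~0.9x) to dominating it (~15x).
```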