Comment by bilsbie
Interesting! Makes me wonder if you could replace transformers with some sort of fancy Markov chain. Maybe with a meta chain that acts as attention.