Comment by bilsbie 10 hours ago

Interesting! Makes me wonder if you could replace transformers with some sort of fancy Markov chain. Maybe with a meta chain that acts as attention.