Comment by samagra14
Sounds interesting! I love all these edge experiments. But as long as models ship with architecture-dependent code, I feel these edge experiments can't fully play to their strengths.
You try to run something and voilà, you need Ampere or Hopper or Laplace for flash attention.