0xdeadbeefbabe 4 days ago
Is anyone excited to do ablative testing on it?
manbitesdog 4 days ago
With such high throughput thanks to sparsity, I'm particularly interested in distilling it into other architectures. I'd like to try a recurrent transformer when I have the time.