Comment by measurablefunc
Comment by measurablefunc 5 hours ago
I read the paper. All the prerequisites are already available in existing literature & they basically profiled & optimized around the bottlenecks to avoid pipeline stalls w/ instructions that utilize the available tensor & CUDA cores. Seems like something these super duper AIs that don't get tired should be able to do pretty easily.