Comment by touisteur
If you don't need full IEEE 754 double precision, the Ozaki scheme (emulating double precision with tensor cores) might do the trick. It was recently added to cuBLAS, though only in a limited form.
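For anyone curious what the emulation looks like, here is a rough pure-Python sketch of the slicing idea, using an integer-slice variant of the scheme. It is not how cuBLAS does it: the function names `split_rows` and `emulated_matmul`, the 7-bit slice width, and the slice count of 8 are all my own assumptions for illustration. Each row of A and each column of B is scaled by a power of two and cut into small integer chunks. Products of the chunks are then exact, which is the part a tensor core would do, and they are summed back up in double precision.

```python
import math

def split_rows(rows, num_slices, bits):
    """Split each row into integer slices plus one power-of-two scale per row."""
    scales = []
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        row_max = max(abs(x) for x in row)
        # frexp gives row_max = m * 2**e with 0.5 <= m < 1, so |x| / 2**e < 1.
        e = math.frexp(row_max)[1]
        scale = math.ldexp(1.0, e)
        scales.append(scale)
        residual = [x / scale for x in row]  # exact: division by a power of two
        for s in range(num_slices):
            shifted = [math.ldexp(r, bits) for r in residual]
            chunk = [math.trunc(v) for v in shifted]  # |chunk| < 2**bits
            residual = [v - c for v, c in zip(shifted, chunk)]
            slices[s].append(chunk)
    return scales, slices

def emulated_matmul(A, B, num_slices=8, bits=7):
    """Approximate A @ B from exact integer products of the slices."""
    Bt = [list(col) for col in zip(*B)]  # columns of B as rows
    a_scales, a_slices = split_rows(A, num_slices, bits)
    b_scales, b_slices = split_rows(Bt, num_slices, bits)
    C = [[0.0] * len(Bt) for _ in range(len(A))]
    for i in range(len(A)):
        for j in range(len(Bt)):
            acc = 0.0
            for p in range(num_slices):
                for q in range(num_slices):
                    # Exact integer dot product (stands in for the tensor core).
                    dot = sum(x * y for x, y in zip(a_slices[p][i], b_slices[q][j]))
                    # Slice p carries weight 2**(-bits * (p + 1)), likewise q.
                    acc += math.ldexp(float(dot), -bits * (p + q + 2))
            C[i][j] = acc * a_scales[i] * b_scales[j]
    return C

A = [[1.5, -2.25], [0.1, 3.0]]
B = [[0.3, 4.0], [-1.0, 0.7]]
C = emulated_matmul(A, B)
```

With 8 slices of 7 bits, the largest entry in each row or column keeps about 56 bits, so the result should land close to a plain double-precision product. Smaller entries in the same row keep fewer significant bits, which is one reason this is not full IEEE 754 behaviour. Using fewer slices trades accuracy for speed.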