Comment by bloppe
The best way to drive inference cost down right now is to use TPUs. Either that or invest tons of additional money and manpower into silicon design like Google did, but they already have a 10 year lead there.
> The best way to drive inference cost down right now is to use TPUs
TPUs are cool, but the biggest lever is still reducing your (active) parameter count.
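
A rough back-of-envelope sketch of why active parameter count dominates: forward-pass FLOPs per generated token scale at roughly 2 × active parameters, regardless of total parameters, so a sparse MoE that activates only a fraction of its weights is proportionally cheaper per token. The model sizes below are made-up illustrations, not real specs.

```python
# Back-of-envelope: inference FLOPs per token ~ 2 * active params,
# independent of total parameter count. Model sizes are hypothetical.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2 * active_params

dense = flops_per_token(70e9)   # hypothetical 70B dense model (all params active)
moe   = flops_per_token(13e9)   # hypothetical MoE: 13B active out of, say, 132B total

print(f"dense : {dense:.2e} FLOPs/token")
print(f"moe   : {moe:.2e} FLOPs/token")
print(f"ratio : {dense / moe:.1f}x cheaper per token for the MoE")
```

Under these assumed numbers the MoE comes out ~5x cheaper per token despite having nearly twice the total capacity, which is the whole argument for cutting active params before chasing hardware.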