Comment by HarHarVeryFunny
Comment by HarHarVeryFunny a day ago
You don't need exact gradients, since gradient descent is self-correcting (which can make it hard to find gradient calculation bugs!). One approach using inexact gradients is to use predicted "synthetic gradients" which avoids needing to wait for backward pass for weight updates.