Comment by WithinReason
Comment by WithinReason 2 days ago
Karpathy suggests the following error:
def clipped_error(x):
return tf.select(tf.abs(x) < 1.0,
0.5 * tf.square(x),
tf.abs(x) - 0.5) # condition, true, false
Following the same principles that he outlines in this post, the "- 0.5" part is unnecessary since the gradient of 0.5 is 0, therefore -0.5 doesn't change the backpropagated gradient. In addition, a nicer formula that achieves the same goal as the above is √(x²+1)
If we don't subtract from the second branch, there will be a discontinuity around x = 1, so the derivative will not be well-defined. Also the value of the loss will jump at this value, which will make it hard to inspect the errors, for one thing.