Comment by macleginn 2 days ago

If we don't subtract from the second branch, the loss will be discontinuous at x = 1, so its derivative will not be well-defined there. The value of the loss will also jump at that point, which, for one thing, makes it harder to inspect the errors.
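
For concreteness, a minimal sketch, assuming a smooth-L1-style piecewise loss with the breakpoint at x = 1 (the exact loss under discussion, and the 0.5 offset, are assumptions); subtracting 0.5 in the second branch makes the two branches meet at the breakpoint:

    def loss_continuous(x: float) -> float:
        # Both branches equal 0.5 at x = 1 because 0.5 is subtracted in the second one.
        return 0.5 * x**2 if abs(x) < 1 else abs(x) - 0.5

    def loss_without_subtraction(x: float) -> float:
        # Without the subtraction, the value jumps from 0.5 to 1.0 at x = 1.
        return 0.5 * x**2 if abs(x) < 1 else abs(x)

    for f in (loss_continuous, loss_without_subtraction):
        print(f.__name__, round(f(0.999), 3), round(f(1.001), 3))
    # loss_continuous 0.499 0.501          (no jump at the breakpoint)
    # loss_without_subtraction 0.499 1.001 (jump of about 0.5 at x = 1)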

WithinReason 2 days ago

No, that's not how backprop works. There will be no discontinuity in a backpropagated gradient.

  • macleginn 2 days ago

    I did not say there will be a discontinuity in the gradient; I said that the modified loss function will not have a mathematically well-defined derivative because of the discontinuity in the function.
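
    To illustrate the distinction, a minimal sketch assuming PyTorch-style autograd and the same hypothetical piecewise loss without the subtraction: backprop differentiates whichever branch is actually taken, so it returns a gradient on both sides of the breakpoint even though the function itself jumps there, where its derivative is not mathematically defined.

        import torch

        def discontinuous_loss(x):
            # Second branch without the subtraction: the value jumps at |x| = 1.
            return torch.where(x.abs() < 1, 0.5 * x**2, x.abs())

        for v in (0.999, 1.001):
            x = torch.tensor(v, requires_grad=True)
            discontinuous_loss(x).backward()
            print(v, x.grad.item())
        # 0.999 -> grad ~0.999 (derivative of 0.5 * x**2)
        # 1.001 -> grad 1.0    (derivative of |x|)
        # The gradient is well-behaved on each side, but the loss value still
        # jumps by about 0.5 at x = 1, where no mathematical derivative exists.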