Comment by macleginn
If we don't subtract from the second branch, there will be a discontinuity at x = 1, so the derivative will not be well-defined there. The value of the loss will also jump at that point, which, for one thing, will make it harder to inspect the errors.
No, that's not how backprop works: there will be no discontinuity in the backpropagated gradient. Autodiff differentiates whichever branch is taken, and the derivative of a subtracted constant is zero, so the offset only shifts the loss value; it never changes the gradient.
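A minimal sketch of the point, assuming the loss under discussion is the usual smooth-L1/Huber form (0.5·x² for |x| < 1, and |x| − 0.5 otherwise; the function names below are mine, for illustration only): PyTorch's autograd returns the same gradient on both sides of the boundary whether or not the constant is subtracted.

```python
import torch

def smooth_l1_with_offset(x):
    # Quadratic near zero, linear outside, with 0.5 subtracted so the
    # two branches meet at |x| = 1 (continuous loss value).
    return torch.where(x.abs() < 1, 0.5 * x**2, x.abs() - 0.5)

def smooth_l1_without_offset(x):
    # Same loss without the subtraction: the value jumps by 0.5 at
    # |x| = 1, but each branch's derivative is unchanged.
    return torch.where(x.abs() < 1, 0.5 * x**2, x.abs())

for fn in (smooth_l1_with_offset, smooth_l1_without_offset):
    grads = []
    for v in (0.999, 1.001):  # straddle the branch boundary
        x = torch.tensor(v, requires_grad=True)
        fn(x).backward()
        grads.append(x.grad.item())
    # Both versions print gradients of roughly 1.0 on each side:
    # the constant offset moves the loss value, not the gradient.
    print(fn.__name__, grads)
```

So the jump only matters for reading off loss values (as the comment above notes), not for the gradients that backprop actually computes.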