Comment by empiko
> The way I think of these transformations, but happy to be corrected, is more a matter of adding information rather than modifying
This is very much the case given the residual connections within the model. The final representation can be expressed as the input embedding plus a sum of the outputs of the N layers, where the n-th layer's output is a function of the representation after layer n-1 (which is itself the running sum up to that point).
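
To make that concrete, here is a minimal sketch of the residual update in plain Python. Everything in it is an assumption for illustration: the toy `make_layer` (a fixed random linear map followed by tanh) stands in for a real attention/MLP block, and layer normalization is left out. It only shows that the final vector equals the input plus the sum of the per-layer updates.

```python
import math
import random


def make_layer(dim, seed):
    """Hypothetical toy layer: fixed random linear map followed by tanh."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

    def layer(h):
        return [math.tanh(sum(w * x for w, x in zip(row, h))) for row in weights]

    return layer


def run_residual_stack(h0, layers):
    """Apply h_n = h_{n-1} + f_n(h_{n-1}) and record each layer's update."""
    h = list(h0)
    updates = []
    for f in layers:
        delta = f(h)  # computed from the running sum so far
        updates.append(delta)
        h = [a + b for a, b in zip(h, delta)]  # add to the stream, don't overwrite
    return h, updates


dim = 4
h0 = [1.0, -0.5, 0.25, 0.0]
layers = [make_layer(dim, seed) for seed in range(3)]
final, updates = run_residual_stack(h0, layers)

# Rebuild the final representation as input + sum of per-layer updates.
rebuilt = [h0[i] + sum(u[i] for u in updates) for i in range(dim)]
assert all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(final, rebuilt))
```

Each layer only ever adds its `delta` to the running representation, which is why "adding information" is a fair way to describe it, even though later updates can still cancel out parts of earlier ones.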