Comment by empiko 5 days ago

> The way I think of these transformations, but happy to be corrected, is more a matter of adding information rather than modifying

This is very much the case, given the residual connections within the model. Each layer adds its output to the running representation instead of overwriting it, so the final representation can be expressed as a sum of the representations from N layers, where the N-th representation is a function of the (N-1)-th.
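
A minimal sketch of that additive view, in plain Python. The layers here are hypothetical stand-ins (a scaled `tanh`), not real attention or MLP blocks, and the names `make_layer` and `run_residual` are made up for illustration. The only point is that the final state equals the input plus the sum of what each layer added, with each addition computed from the state before it.

```python
import math

def make_layer(scale):
    # Hypothetical stand-in for an attention/MLP block:
    # any function of the current representation would do.
    def layer(h):
        return [scale * math.tanh(x) for x in h]
    return layer

def run_residual(h0, layers):
    """Apply layers with residual connections; record what each layer adds."""
    h = list(h0)
    contributions = []
    for f in layers:
        delta = f(h)  # depends on the previous representation
        contributions.append(delta)
        h = [a + b for a, b in zip(h, delta)]  # residual add, no overwrite
    return h, contributions

layers = [make_layer(s) for s in (0.5, -0.3, 0.8)]
h0 = [1.0, -2.0, 0.25]
final, contributions = run_residual(h0, layers)

# Rebuild the final state as: input + sum of per-layer contributions.
reconstructed = [
    h0[i] + sum(c[i] for c in contributions) for i in range(len(h0))
]
matches = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for a, b in zip(final, reconstructed)
)
print(matches)
```

In this toy setup nothing is ever removed from the representation except by adding a term that cancels it, which is one way to read "adding information rather than modifying".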