Comment by godelski

Comment by godelski 2 days ago

This is actually a really common practice. It is a single linear layer, so there are no connections between nodes within the layer. The reason to do it is that it is a bit less computationally intensive.

tl;dr: linear layers are associative, since applying them is just matrix multiplication: (x W1) W2 = x (W1 W2)
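A quick sketch of what the associativity claim means in practice, assuming two bias-free linear layers with no nonlinearity between them. The names `matmul`, `mult_count`, `x`, `W1`, `W2` and the shapes are made up for illustration and are not from the thread. It checks that applying the layers one after another gives the same output as merging the weights first, and counts scalar multiplications for each grouping.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


def mult_count(n, k, m):
    """Scalar multiplications for a naive (n x k) @ (k x m) product."""
    return n * k * m


# Hypothetical shapes: x is (batch=2, d_in=3), W1 is (3, 2), W2 is (2, 4).
x = [[1, 2, 3], [4, 5, 6]]
W1 = [[1, 0], [0, 1], [2, -1]]
W2 = [[1, 2, 0, -1], [3, 0, 1, 2]]

two_step = matmul(matmul(x, W1), W2)  # (x W1) W2: apply the layers in sequence
merged = matmul(x, matmul(W1, W2))    # x (W1 W2): merge the weights, apply once

# Same output either way, because matrix multiplication is associative.
assert two_step == merged

# The cost of each grouping depends only on the shapes involved.
batch, d_in, hidden, d_out = 2, 3, 2, 4
cost_two_step = mult_count(batch, d_in, hidden) + mult_count(batch, hidden, d_out)
cost_merged = mult_count(d_in, hidden, d_out) + mult_count(batch, d_in, d_out)
print(cost_two_step, cost_merged)
```

The results match for any shapes, but the multiplication counts do not: which grouping is cheaper depends on the batch size and layer widths, and merged weights only need to be computed once if they are reused across inputs. Integer matrices are used here so the equality is exact; with floats the two groupings agree only up to rounding.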