Comment by janalsncm

Comment by janalsncm 2 days ago


This is a really intuitive explanation, thanks for posting. I think everyone’s first intuition for “how do we turn these logits into probabilities” is to divide the absolute value of each number by the sum of all the absolute values. The seemingly unjustified complexity of softmax annoyed me in college.

The author gives a really clean explanation for why that’s hard for a network to learn, starting from first principles.
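
For anyone who wants to see the difference concretely, here is a minimal sketch in plain Python. It is my own illustration, not the article's argument, and the function names and example logits are made up for the comparison. It shows one obvious problem with absolute-value normalization: a logit of -2.0 ends up with the same probability as a logit of 2.0, while softmax keeps the ordering.

```python
import math

def abs_normalize(logits):
    # Naive approach: divide each absolute value by the sum of absolute values.
    # Note: this divides by zero if every logit is 0.
    total = sum(abs(x) for x in logits)
    return [abs(x) / total for x in logits]

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability;
    # this shift does not change the result.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits chosen to include a large negative value.
logits = [2.0, -2.0, 0.5]

abs_probs = abs_normalize(logits)
soft_probs = softmax(logits)

print("abs-normalized:", abs_probs)   # 2.0 and -2.0 get equal probability
print("softmax:       ", soft_probs)  # ordering of the logits is preserved
```

Softmax is also unchanged if you add the same constant to every logit, which the absolute-value version is not.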