Comment by joe_the_user a day ago
> I wonder if it's just about the important neural nets now being trained by large, secretive corporations that aren't interested in sharing their knowledge.
I'm sure that's part of it, which is why it's nice to see Hugging Face sharing this, but it still reflects the reality that large LLMs are difficult to train for whatever reasons (maybe more than just gradient issues - I haven't read that HF doc yet).
For simpler nets, like ResNet, it may just be that modern initialization and training recipes avoid most gradient issues, even though the underlying issues are potentially still there.
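To illustrate the point about initialization (a minimal sketch, not from the thread): push a signal through a deep stack of ReLU layers and track its variance under a naive small-std init versus He (Kaiming) init, which scales the weight std by sqrt(2 / fan_in) so variance is roughly preserved through ReLUs. The layer sizes and depth here are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 50, 256

def forward_variance(std_fn):
    """Variance of activations after `depth` random ReLU layers."""
    x = rng.standard_normal(width)
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * std_fn(width)
        x = np.maximum(W @ x, 0.0)  # ReLU
    return float(np.var(x))

naive_var = forward_variance(lambda fan_in: 0.01)               # fixed small std
he_var = forward_variance(lambda fan_in: np.sqrt(2.0 / fan_in))  # He init

print(f"naive init variance after {depth} layers: {naive_var:.3e}")
print(f"He init variance after {depth} layers:    {he_var:.3e}")
```

With the naive init the signal (and, symmetrically, gradients) collapses toward zero exponentially with depth, while He init keeps it at order one - which is the sense in which modern recipes sidestep vanishing-gradient problems rather than eliminate them.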