Comment by ACCount37
Good luck trying to use theory from the 1940s to predict modern ML. And if theory has little predictive power, then it's of little use.
There's a reason why so many "laws" of ML are empirical - curves fitted to experimental observation data. If we had a solid mathematical backing for ML, we'd be able to derive those laws from math. If we had solid theoretical backing for ML, we'd be able to calculate whether a training run would fail without actually running it.
People say this tech is mysterious because it is mysterious. It's a field where practical applications are running far ahead of theory. We build systems that work, and we don't know how or why.
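To make the "curves fitted to experimental observation data" point concrete, here is a minimal sketch of how such an empirical law is typically obtained. Everything in it is an assumption for illustration: the data is synthetic, and the power-law form, the constants, and the name predict_loss are hypothetical, not taken from any real training run or published scaling law.

```python
import numpy as np

# Synthetic, made-up "observations" (NOT real measurements):
# loss as a function of a hypothetical compute budget.
rng = np.random.default_rng(0)
compute = np.logspace(0, 6, 20)
true_a, true_b = 5.0, 0.3  # assumed ground truth, used only to generate data
noise = np.exp(rng.normal(0.0, 0.02, compute.size))
loss = true_a * compute ** (-true_b) * noise

# A power law L = a * C^(-b) is a straight line in log-log space,
# so an ordinary least-squares line fit recovers a and b.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
fit_a = np.exp(intercept)
fit_b = -slope

def predict_loss(c):
    """Extrapolate the fitted curve to a new compute budget (hypothetical helper)."""
    return fit_a * c ** (-fit_b)

print(f"fitted a={fit_a:.3f}, b={fit_b:.3f}")
```

Nothing in this fit explains why the exponent takes the value it does. The curve only summarizes what was observed, and extrapolating it with predict_loss is a bet that the trend continues.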
We do have solid mathematical backing for it. But what we are after is not what the math gives us; it is a hope that what the math gives us is sufficiently close to the TRUTH. Hence the pervasive presence of errors and loss functions.
We know it’s not the correct answer, but something close is better than nothing. (Though "close" can be awfully far off, which is worse than nothing.)