Comment by skydhash
We have solid mathematical backing for it. But what we are seeking is not what the math tells us; it's the hope that what it tells us is sufficiently close to the TRUTH. Hence the pervasive presence of error terms and loss functions.
We know it's not the correct answer, but something close is better than nothing. (Though "close" can be awfully far off, which is worse than nothing.)
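A minimal sketch of that proxy gap, on made-up toy data (everything below is my own illustration, not from the comment): we care about 0-1 classification error, but it has no useful gradient, so we minimize a logistic surrogate instead and hope the two track each other.

    # Toy sketch: optimize a differentiable surrogate (logistic loss)
    # as a stand-in for the quantity we actually care about (0-1 error).
    # All data and numbers here are invented for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    # Noisy labels, so the problem is not perfectly separable.
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(float)

    w = np.zeros(2)
    for _ in range(300):
        p = 1 / (1 + np.exp(-X @ w))        # predicted probabilities
        w -= 0.5 * X.T @ (p - y) / len(y)   # gradient step on the surrogate

    zero_one = np.mean((p > 0.5) != y)      # the thing we actually want low
    p = np.clip(p, 1e-12, 1 - 1e-12)        # guard the logs
    logistic = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    print(f"0-1 error: {zero_one:.3f}  logistic loss: {logistic:.3f}")

The surrogate usually drags the true error down with it, but nothing forces the two to stay close; that hope is exactly the bet being described.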
The math covers the low level decently well, but you run out of it quickly. A lot of it fails to scale, and almost all of it fails to capture the high-level behavior of modern AIs.
You can predict how some simple, narrow edge-case neural networks will converge, but that doesn't extend to frontier training runs, or even to the kind of runs you can do at home on a single GPU. And that's one of the better-covered areas.
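For contrast, here is a sketch of the regime where the theory does hold up, assuming a plain least-squares model (toy data, my own illustration): gradient descent with step size 1/L provably contracts the parameter error by at most (1 - mu/L) per step, where L and mu are the largest and smallest eigenvalues of the Hessian, and you can check the bound numerically. None of this carries over to large nonconvex networks.

    # Toy check of a classical convergence guarantee on a convex problem:
    # for least squares, gradient descent with step size 1/L satisfies
    # ||w_{t+1} - w*|| <= (1 - mu/L) * ||w_t - w*||.
    # All data here is invented for illustration.
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 5))
    w_true = rng.normal(size=5)
    y = X @ w_true

    H = X.T @ X / len(y)                    # Hessian of the quadratic loss
    eigs = np.linalg.eigvalsh(H)            # ascending order
    mu, L = eigs[0], eigs[-1]
    rate = 1 - mu / L                       # theoretical contraction factor

    w = np.zeros(5)
    prev = np.linalg.norm(w - w_true)
    for step in range(5):
        w -= (1 / L) * (X.T @ (X @ w - y) / len(y))
        err = np.linalg.norm(w - w_true)
        print(f"step {step}: observed ratio {err / prev:.4f} <= bound {rate:.4f}")
        prev = err

The observed ratio always sits at or below the predicted bound, and you knew that before running anything. That is the level of guarantee the math gives you in the narrow cases; it simply has no analogue for the training runs people actually care about.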