Comment by ACCount37

Comment by ACCount37 3 days ago

Good luck trying to use theory from the 1940s to predict modern ML. And if theory has little predictive power, then it's of little use.

There's a reason why so many "laws" of ML are empirical - curves fitted to experimental observation data. If we had a solid mathematical backing for ML, we'd be able to derive those laws from math. If we had solid theoretical backing for ML, we'd be able to calculate whether a training run would fail without actually running it.
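
To make "curves fitted to experimental observation data" concrete, here is a minimal sketch of how such an empirical law is typically obtained. Everything in it is an assumption for illustration: the helper `fit_power_law`, the power-law form `y = a * x**b`, and the data points, which are synthetic and not taken from any real training run.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b, done as a straight line in log-log space."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    b = sxy / sxx                        # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept in log-log space = log(a)
    return a, b

# Hypothetical measurements: (training compute, final loss).
# Synthetic and noise-free; generated from a known power law, not real data.
compute = [1e15, 1e16, 1e17, 1e18]
loss = [4.0 * c ** -0.05 for c in compute]

a, b = fit_power_law(compute, loss)

# Extrapolate one order of magnitude beyond the observed range.
predicted = a * 1e19 ** b
```

The fit recovers the exponent only because the data was generated from a power law in the first place. Nothing in the procedure explains why the relationship should hold, or guarantees that the extrapolation stays valid outside the measured range, which is the sense in which such a law is empirical rather than derived.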

People say this tech is mysterious because it is mysterious. It's a field where practical applications are running far ahead of theory. We build systems that work, and we don't know how or why.

skydhash 3 days ago

We have solid backing in maths for it. But what we are seeking is not what the math tells us; it is a hope that what it tells us is sufficiently close to the truth. Hence the pervasive presence of errors and loss functions.

We know it’s not the correct answer, but something close is better than nothing. (Though "close" can be awfully far, which is worse than nothing.)

ACCount37 2 days ago

The math covers the low level decently well, but you run out of it quickly. A lot of it fails to scale, and almost all of it fails to capture the high-level behavior of modern AIs.
You can predict how some simple, narrow edge-case neural networks will converge, but this doesn't go all the way to frontier training runs, or even the kind of runs you can do at home on a single GPU. And that's one of the better-covered areas.

- skydhash 2 days ago
  
  You can’t predict the result because the data is unknown before training. Training is computation based on math, the results are the weights, and every further computation is also math-based. The result can be surprising, but there’s no fairy dust here.
  
  ACCount37 2 days ago
  
  There's no fairy dust there, but that doesn't mean we understand how it works. There's no fairy dust in the human brain either.
  Today's mathematical background applied to frontier systems is a bit like trying to understand how a web browser works from knowing how a transistor works. The mismatch is palpable.
  Sure, if you descend to a low enough level, you won't find any magic fairy dust - it's transistors as far as the eye can see. But "knowing how a transistor works" doesn't come close to capturing the sheer complexity. Low-level knowledge does not automatically translate to high-level knowledge.
  