Comment by marcosdumay 10 months ago
What I don't get is... didn't people prove that in the 90s for any multi-layer neural network? Didn't people prove transformers are equivalent in the transformer paper?
I'm putting a 50-buck bet that some very smart kid in the near future will come up with some entropy-meets-graphical-structures theorem that gives an estimate of how the loss of information is affected by the size and type of the underlying structure holding it.
It took a while for people to actually start talking about LZW as a grammar algorithm rather than a "dictionary"-based one, an idea that is then reasoned about in a more general sense by https://en.wikipedia.org/wiki/Sequitur_algorithm.
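To make the "grammar" view concrete, here is a toy sketch of offline digram replacement (my own illustration, closer to Re-Pair than to the incremental Sequitur algorithm): repeatedly replacing the most frequent adjacent pair with a new rule turns a sequence into a small grammar, which is roughly what the LZW dictionary encodes implicitly.

    # Toy grammar-based compression: offline digram replacement.
    # Closer to Re-Pair than to the incremental Sequitur algorithm.
    from collections import Counter

    def digram_compress(seq):
        """Repeatedly replace the most frequent adjacent pair with a new rule."""
        seq = list(seq)
        rules = {}                       # nonterminal -> (left, right)
        next_id = 0
        while True:
            pairs = Counter(zip(seq, seq[1:]))
            if not pairs:
                break
            (a, b), count = pairs.most_common(1)[0]
            if count < 2:
                break                    # no pair repeats; the grammar is final
            nt = f"R{next_id}"
            next_id += 1
            rules[nt] = (a, b)
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                    out.append(nt)       # rewrite the pair as the new nonterminal
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            seq = out
        return seq, rules

    axiom, rules = digram_compress("abcabcabcabc")
    print(axiom)  # ['R2', 'R2']
    print(rules)  # {'R0': ('a', 'b'), 'R1': ('R0', 'c'), 'R2': ('R1', 'R1')}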
This is not to say that LLMs are not cool; we put them to use every day. But the reasoning part is never going to be trustworthy without a 100% discrete system, one that can infer the syllogistic chain with zero doubt and a 100% traceable origin.
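Just to make "traceable origin" concrete, a toy forward-chaining sketch (my own illustration, not anyone's actual system) where every derived fact records the premise that produced it, so the whole syllogistic chain can be audited:

    # Toy forward chaining with provenance: each derived fact keeps a record
    # of the premise it came from, so the chain of reasoning is auditable.
    facts = {"Socrates is a man": "given"}
    rules = [
        # (premise, conclusion) pairs; purely illustrative
        ("Socrates is a man", "Socrates is mortal"),
        ("Socrates is mortal", "Socrates will not live forever"),
    ]

    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in facts and conclusion not in facts:
                facts[conclusion] = f"derived from '{premise}'"
                changed = True

    for fact, origin in facts.items():
        print(f"{fact}  <-  {origin}")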
Yes, they did. A two-layer network with enough units in the hidden layer can approximate any mapping to any desired accuracy.
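(That is the universal approximation result, e.g. Cybenko 1989 and Hornik et al. 1989.) A minimal sketch of the idea, fitting a 1-D target with one hidden layer; the hidden weights are random and only the output layer is solved by least squares, purely to keep the example short:

    # Universal-approximation sketch: one hidden layer, fixed random hidden
    # weights, linear readout fit by least squares. Not a proof, just an
    # illustration that enough hidden units give a close fit.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-np.pi, np.pi, 400).reshape(-1, 1)
    y = np.sin(3 * x)                               # target mapping

    n_hidden = 300                                  # error shrinks as this grows
    W = rng.normal(scale=3.0, size=(1, n_hidden))   # input -> hidden weights
    b = rng.normal(scale=3.0, size=n_hidden)        # hidden biases
    H = np.tanh(x @ W + b)                          # hidden activations

    w_out, *_ = np.linalg.lstsq(H, y, rcond=None)   # fit the output layer
    y_hat = H @ w_out

    print("max abs error:", np.max(np.abs(y_hat - y)))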
And a two-layer network with single-delay feedback from the hidden units to themselves can capture any dynamic behavior (to any desired accuracy).
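And a sketch of the recurrent version: the hidden state feeds back into itself with a one-step delay. Here the recurrent and input weights are fixed (echo-state style, again my own illustration) and only a linear readout is fit, purely to show the delayed feedback carrying information about past inputs:

    # Recurrent sketch: hidden units with single-delay feedback to themselves.
    # Fixed random reservoir weights, linear readout fit by least squares.
    import numpy as np

    rng = np.random.default_rng(1)
    T = 1000
    u = rng.uniform(-0.5, 0.5, size=T)              # input sequence

    # Dynamic target: an exponential moving average of the input, i.e. a
    # quantity that depends on past inputs, not just the current one.
    target = np.empty(T)
    acc = 0.0
    for t in range(T):
        acc = 0.8 * acc + 0.2 * u[t]
        target[t] = acc

    n_hidden = 200
    W_in = rng.normal(scale=0.5, size=n_hidden)
    W_rec = rng.normal(size=(n_hidden, n_hidden))
    W_rec *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_rec)))  # keep dynamics stable

    h = np.zeros(n_hidden)
    states = np.empty((T, n_hidden))
    for t in range(T):
        h = np.tanh(W_in * u[t] + W_rec @ h)        # h[t] depends on h[t-1]
        states[t] = h

    w_out, *_ = np.linalg.lstsq(states, target, rcond=None)
    print("mean abs readout error:", np.mean(np.abs(states @ w_out - target)))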
Adding layers and more structured architectures creates the opportunity for more efficient training and inference, but doesn't enable any new potential behavior. (Except in the sense that reducing resource requirements can allow impractical problems to become practical.)