Comment by cma a day ago

1991

> Each RNN tries to solve the pretext task of predicting its next input, sending only unexpected inputs to the next RNN above. This greatly facilitates downstream supervised deep learning such as sequence classification. By 1993, the approach solved problems of depth 1000 (requiring 1000 subsequent computational stages/layers—the more such stages, the deeper the learning). A variant collapses the hierarchy into a single deep net. It uses a so-called conscious chunker RNN which attends to unexpected events that surprise a lower-level so-called subconscious automatiser RNN. The chunker learns to understand the surprising events by predicting them. The automatiser uses my neural knowledge distillation procedure of 1991 [UN0-UN2] to compress and absorb the formerly conscious insights and behaviours of the chunker, thus making them subconscious. The systems of 1991 allowed for much deeper learning than previous methods.

https://people.idsia.ch/~juergen/very-deep-learning-1991.htm...
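
A minimal sketch of the "send only unexpected inputs upward" idea from the quote, under loud assumptions: the predictor here is a simple frequency table rather than an RNN, and the names `TablePredictor` and `compress` are hypothetical, not taken from the 1991 work. It only illustrates how a level that predicts its next input can pass a much shorter sequence of surprises to the level above.

```python
from collections import Counter, defaultdict


class TablePredictor:
    """Assumed stand-in for an RNN: predicts the next symbol from the previous one."""

    def __init__(self):
        # counts[prev][nxt] = how often `nxt` has followed `prev` so far
        self.counts = defaultdict(Counter)

    def predict(self, prev):
        seen = self.counts[prev]
        # Most frequent successor seen so far, or None if `prev` is new
        return seen.most_common(1)[0][0] if seen else None

    def update(self, prev, nxt):
        self.counts[prev][nxt] += 1


def compress(sequence, predictor):
    """Return only the (time step, symbol) pairs the predictor failed to predict."""
    surprises = []
    prev = None
    for t, sym in enumerate(sequence):
        if predictor.predict(prev) != sym:
            surprises.append((t, sym))  # unexpected input: pass it to the next level
        predictor.update(prev, sym)     # learn online from every input
        prev = sym
    return surprises


level1 = compress("abcabcabcabxabc", TablePredictor())
# A second level would then see only the surprising symbols
level2_input = [sym for _, sym in level1]
```

On this toy input, once the repeating pattern has been seen, only the unexpected `x` and the symbol right after it get passed up, so the higher level works on a shorter sequence than the raw one.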

HarHarVeryFunny 16 hours ago

It's unfortunate that Schmidhuber has made many seminal contributions to the field but also engages in "retroactive flag planting", whereby he claims credit for any current success that is remotely related to anything he has worked on, even if the link is only a hand-wavy similarity in problem approach rather than work that actually builds on his own.

It's obvious that things like memory, on various timescales (incl. working), selective attention, surprise (i.e. prediction failure) as a learning/memorization signal are going to be part of any AGI solution, but the question is how do you combine and realize these functionalities into an actual cognitive architecture?

Schmidhuber (or in this case you, on his behalf!) effectively saying "I worked on that problem, years ago" is irrelevant. He also worked on LSTMs, which learned to memorize and forget, and the reference section of the "Titans" paper leads to many more recent attempts (different proposed architectures) addressing the same problems around, broadly speaking, learning how best to use working memory. Lots of people are suggesting alternatives, but it seems no compelling solution has been published.

If it's one of the commercial frontier model labs that does discover the next piece of the architectural puzzle in moving beyond transformers towards AGI, I very much doubt they'll be in any hurry to publish it!

  • cma 16 hours ago

    "I like the idea of a meta-mechanism that learns to update an associative memory based on how surprising the data is."

    Just pointing out that that idea was in some of Schmidhuber's earlier work.

    > Schmidhuber (or in this case you, on his behalf!) effectively saying "I worked on that problem, years ago" is irrelevant.

    OK. People do read his work and get ideas from it, even if this paper didn't necessarily. He had a lot of good stuff.

    > but the question is how do you combine and realize these functionalities into an actual cognitive architecture?

    I believe Schmidhuber gave one at the time?

    • sdenton4 14 hours ago

      Does it work out-of-the-box today?

      Execution is what matters. We can smoke a blunt and have some nice-sounding ideas, but building something that works on data at scale is what actually counts.

      • cma 13 hours ago

        I think it's widely agreed that a lot of useful stuff came out of Schmidhuber's lab. The example I gave was one of the first things that scaled in lots of ways, especially in depth, and it shares some characteristics with this. I doubt it outperforms this Titans architecture or is equivalent to it. That's not the same as him just putting out random ideas while high.