Comment by wildermuthn 6 months ago

Great research here. Contextual real-time weight modification is definitely one of the breakthroughs required for AGI. Why create a LoRA when you can generate one on the fly suited to the task at hand?
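
To sketch what "generate one on the fly" could look like (all names here are hypothetical, nothing below is from the paper): a small hypernetwork maps a pooled prompt embedding to the low-rank factors, and the resulting delta is added to the frozen base weight.

    import torch
    import torch.nn as nn

    class LoRAGenerator(nn.Module):
        # hypothetical: map a pooled prompt embedding to low-rank factors B, A
        def __init__(self, d_model=1024, rank=8, ctx_dim=1024):
            super().__init__()
            self.rank, self.d = rank, d_model
            self.to_A = nn.Linear(ctx_dim, rank * d_model)
            self.to_B = nn.Linear(ctx_dim, d_model * rank)

        def forward(self, ctx):                          # ctx: (ctx_dim,)
            A = self.to_A(ctx).view(self.rank, self.d)   # (r, d)
            B = self.to_B(ctx).view(self.d, self.rank)   # (d, r)
            return B @ A                                 # rank-r delta, (d, d)

    # usage sketch: W_eff = W_frozen + scale * gen(prompt_embedding)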

verdverm 6 months ago

It does not seem like they are doing inference-time weight changes, to the tune of running backprop. It sounds more like they are applying a pre-trained vector to the model, selecting that vector based on the input, in a two-step process.
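
In other words, roughly this (toy sketch of my reading, not their code):

    import numpy as np

    # toy bank of pre-trained per-task vectors (step 1 selects, step 2 applies)
    Z_BANK = {"math": np.ones(4), "code": np.full(4, 0.5)}

    def classify_task(prompt):        # step 1: pick a vector based on the input
        return "code" if "def " in prompt else "math"

    def apply_z(W, z):                # step 2: modulate frozen weights, no backprop
        return W * z                  # here: simple per-column scaling

    W_base = np.eye(4)
    W_eff = apply_z(W_base, Z_BANK[classify_task("def f(): ...")])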

  • wildermuthn 6 months ago

    That’s my general understanding as well, but it isn’t a large conceptual leap to go from real-time selection of pretrained “z-vectors” to real-time generation of the same. The larger conceptual breakthrough, and the demonstration that it works, is the big success here.

    • verdverm 6 months ago

      While not a large conceptual leap, the real-time generation of "z-vectors" is not cheap in terms of compute or data requirements, the latter of which I see as the main issue. How are you going to generate the vector from a single real-time input?

      I have yet to see anything that dissuades me from agreeing with Yann LeCun when he says Transformers are fundamentally limited. We won't get creativity or reasoning, or even move past hallucinations, without a major breakthrough.

      • mordymoop 6 months ago

        How do the o3 results fit into this perspective?

    • mtts 6 months ago

      The interesting thing here is that the human brain also seems to use pretrained ... things. For vision, use the visual subsystem. For hearing, use the auditory subsystem. For movement ... you get the point. Plus you can combine these pretrained ... things, so, for example, complex movement like balancing on a tightrope uses multiple subsystems at once (try standing on one leg with your eyes closed).

      Z-vectors are of course nothing like the subsystems in your brain, but in general the approach is certainly similar to how the brain works.

  • mtts 6 months ago

    Sort of. According to the text they can use multiple z-vectors (sets of weights that select for parts of the system to be used to answer a specific question) simultaneously, using a "simple optimization algorithm" to determine the relative weight for each of these vectors.
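
    A minimal sketch of what that combination step might look like (my guess at the shape of it; eval_loss stands in for running the model with the mixed vector applied, and would have to be differentiable for this to work):

        import torch

        def mix_z(z_bank, eval_loss, steps=50, lr=0.1):
            # z_bank: (K, d) stack of pre-trained z-vectors
            alpha = torch.zeros(z_bank.shape[0], requires_grad=True)
            opt = torch.optim.Adam([alpha], lr=lr)
            for _ in range(steps):
                z = torch.softmax(alpha, 0) @ z_bank  # convex combination, (d,)
                loss = eval_loss(z)                   # score model with z applied
                opt.zero_grad(); loss.backward(); opt.step()
            return torch.softmax(alpha, 0).detach()   # relative weight per vector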

logicchains 6 months ago

>Contextual real-time weight modification is definitely one of the breakthroughs required for AGI.

It's already been invented: https://arxiv.org/abs/2202.05780. That design is just very inefficient to scale up / use as a transformer backbone.

mnky9800n 6 months ago

Why not save the weights as each new task comes up and they are revalued, keeping them for reference as priors for similar future tasks? As the model is exposed to new data, the average of the set of priors for things the model thinks are similar might move closer to the posterior, making the model quicker and better able to arrive at good outcomes. I suppose storage might be an issue.
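
Concretely, the cache could be as simple as this (hypothetical sketch; assumes each task yields an embedding plus an adapted vector z):

    import numpy as np

    class PriorCache:
        # store adapted vectors keyed by task embedding; warm-start new
        # tasks from the mean of the k most similar stored priors
        def __init__(self):
            self.keys, self.vecs = [], []

        def add(self, task_emb, z):
            self.keys.append(task_emb / np.linalg.norm(task_emb))
            self.vecs.append(z)

        def prior(self, task_emb, k=3):
            if not self.keys:
                return None
            q = task_emb / np.linalg.norm(task_emb)
            sims = np.array([key @ q for key in self.keys])  # cosine similarity
            top = sims.argsort()[-k:]
            return np.mean([self.vecs[i] for i in top], axis=0)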

  • magospietato 6 months ago

    I'm wondering if you could fine-tune the model on an aggregate of a temporal slice of revalued weights? Something analogous to REM sleep's involvement in embedding the day's events into long-term memory.
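
    Something like this, maybe (hand-wavy sketch; assumes the day's adaptations are stored as weight deltas aligned with the base state_dict):

        import torch

        def consolidate(base_state, day_deltas, rate=0.1):
            # "REM sleep" pass: fold the mean of a temporal slice of
            # weight deltas back into the base model (EMA-style merge)
            merged = {}
            for name, w in base_state.items():
                avg = torch.stack([d[name] for d in day_deltas]).mean(dim=0)
                merged[name] = w + rate * avg
            return merged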

    • Jerrrry 6 months ago

      Sieve the temporary backprop interim weights as a function of their loss of varentropy relative to their place in the revalued weights.

      Remove the bottom weights dynamically based on the local gradient in varentropy, so that internal dissonance ("doubt") can be selected against.

      "Preference Optimization" but with more opportunities for meta-optimization.

  • QuadmasterXLII 6 months ago

    That's just mixture of experts.
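
    i.e. the textbook setup (toy sketch): a learned gate mixing a fixed set of expert networks:

        import torch
        import torch.nn as nn

        class TinyMoE(nn.Module):
            def __init__(self, d=64, n_experts=4):
                super().__init__()
                self.gate = nn.Linear(d, n_experts)
                self.experts = nn.ModuleList(
                    nn.Linear(d, d) for _ in range(n_experts))

            def forward(self, x):                            # x: (B, d)
                w = torch.softmax(self.gate(x), dim=-1)      # (B, E)
                outs = torch.stack(
                    [e(x) for e in self.experts], dim=-1)    # (B, d, E)
                return torch.einsum('be,bde->bd', w, outs)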

    • mnky9800n 6 months ago

      I thought mixture of experts didn't update itself with new sets of weights, and was just a collection of already-trained networks/weights? I could be wrong.

      • QuadmasterXLII 6 months ago

        Well, that depends on whether you keep training it.

        • mnky9800n 6 months ago

          Perhaps they should always be training and never static, haha. I allegedly grow wiser with age, so why not neural networks?