Comment by ivape
The most obvious blocker is compute. This just requires a shit ton more compute.
Yeah, but training from scratch is a valid solution, and if we can't find easier solutions we should just try to make it work. Compute is the main advantage we have in silico vs biological computers, so we might as well push it. Ideally, soon we'll have one large AI running on a datacenter-sized computer solving really hard problems, and it could easily be that most of the compute (>95%) goes into the training step, which is really where AI excels tbh, not inference tricks. Even AlphaProof, for example, spends most of its compute training on solving simpler problems, which btw is one instance of continual training / training at test time that has actually been implemented.
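For concreteness, here is a minimal sketch of what "training at test time" can look like in PyTorch: before answering a hard input, a copy of the model takes a few gradient steps on an auxiliary objective built from that input alone. The function name `adapt_and_predict` and the auxiliary loss are purely illustrative assumptions, not how AlphaProof or any real system does it.

```python
# Minimal test-time-training sketch: adapt a copy of the model on one input,
# then predict with the adapted copy. Illustrative only.
import copy
import torch

def adapt_and_predict(model, x, aux_loss_fn, steps=10, lr=1e-3):
    """Clone the model, adapt it on a self-supervised loss for `x`, then predict."""
    adapted = copy.deepcopy(model)          # keep the base weights untouched
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):                  # most of the compute is spent here
        opt.zero_grad()
        aux_loss_fn(adapted, x).backward()  # e.g. reconstruction or rotation prediction
        opt.step()
    with torch.no_grad():
        return adapted(x)
```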
Retraining from scratch does technically solve it.
But it doesn't solve the time aspect.
You need to randomize the data order to train to the best quality, but then the model has no idea that t0 came before t1000. If you don't randomize, you get model collapse or heavy bias.
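A toy illustration of that trade-off, with synthetic data and a small PyTorch MLP (all names and numbers here are made up): the same examples trained i.i.d.-shuffled vs. strictly in time order, no replay. The sequential run typically ends up much worse on the old data.

```python
# Same data, two training orders: shuffled (mixed) vs. sequential (old task
# fully before new task). Synthetic stand-in for data drifting over time.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(center, flip):
    # A blob of points around (center, 0); the label rule is flipped for the
    # second task, standing in for data whose statistics change over time.
    x = torch.randn(2000, 2) + torch.tensor([center, 0.0])
    y = (x[:, 1] > 0).long()
    return x, (1 - y) if flip else y

def mlp():
    return nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))

def train(model, x, y, steps=200):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

def accuracy(model, x, y):
    with torch.no_grad():
        return (model(x).argmax(1) == y).float().mean().item()

xa, ya = make_task(-3.0, flip=False)   # "old" data (t0)
xb, yb = make_task(+3.0, flip=True)    # "new" data (t1000), different rule

shuffled = mlp()                       # (1) everything mixed together
train(shuffled, torch.cat([xa, xb]), torch.cat([ya, yb]))

sequential = mlp()                     # (2) old data first, then new, no replay
train(sequential, xa, ya)
train(sequential, xb, yb)

print("shuffled   - acc on old data:", accuracy(shuffled, xa, ya))
print("sequential - acc on old data:", accuracy(sequential, xa, ya))  # typically much lower
```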
There have been some attempts at it, but nothing crazy effective.
How do you make the mental jump from being able to train a model continuously to an "artificial general intelligence"?
That tracks, but say cost was no object and you had as many H100s as you wanted. Would continuous learning actually work even then?
Maybe part of the inference output could be the updates to apply to the network itself.
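A rough sketch of what that could look like: a fast-weights-style layer whose forward pass also emits a delta that gets applied to its own weights. The class name, shapes, and update rule below are invented for illustration, not a description of any existing system.

```python
# A layer that writes part of its own output back into its weights at inference time.
import torch
import torch.nn as nn

class SelfUpdatingLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Parameter(0.01 * torch.randn(dim, dim))  # slow weights, trained normally
        self.fast = torch.zeros(dim, dim)                     # fast weights, written at inference
        self.update_head = nn.Linear(dim, dim * dim)          # produces the weight delta

    def forward(self, x):
        h = x @ (self.w + self.fast)
        # Part of the layer's "output" is an update to its own fast weights.
        dim = h.shape[-1]
        delta = self.update_head(h).view(-1, dim, dim).mean(0)
        self.fast = (self.fast + 0.1 * delta).detach()        # apply it; no backprop through history
        return h

layer = SelfUpdatingLayer(dim=16)
out = layer(torch.randn(4, 16))   # every forward call nudges the layer's weights
```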
If it were purely a compute problem, we'd have simple examples. We can't do this even on the smallest of AI models.
There are tons of benchmarks around this that you can easily run with 1 GPU.
It's a compute problem only in the sense that the only way to do it is to retrain the model from scratch at every step.
If you solve CL with a CNN, you've just created AGI.