Comment by Davidzheng

Comment by Davidzheng 3 days ago

Yeah, but training from scratch is a valid solution, and if we can't find easier solutions we should just try to make it work. Compute is the main advantage we have in silico vs. biological computers, so we might as well push it. Ideally, soon we will have one large AI running on a datacenter-sized computer solving really hard problems, and it could easily be that most of the compute (>95%) goes to the training step, which is where AI really excels tbh, not inference techniques. Even AlphaProof, for example, spends most of its compute training on solving simpler problems, which btw is one instance of continual training/training at test time that is actually implemented.

johnsmith1840 2 days ago

Retraining from scratch does technically solve it.

But it doesn't solve the time aspect.

You need to shuffle the data in order to train to the best quality. In doing that, the model has no idea t0 came before t1000. If you don't shuffle, you get model collapse or heavy bias.
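To make the ordering point concrete, here is a toy sketch in plain Python. It is not anyone's actual training pipeline: `make_stream` and `fraction_in_order` are hypothetical helpers made up for illustration, and the "samples" are just timestamped strings. It only shows that once the stream is shuffled, the order in which samples are presented says almost nothing about which one came first.

```python
import random


def make_stream(n):
    """Hypothetical timestamped samples: (timestamp, payload)."""
    return [(t, f"sample_{t}") for t in range(n)]


def fraction_in_order(samples):
    """Fraction of adjacent pairs whose timestamps are increasing."""
    pairs = list(zip(samples, samples[1:]))
    return sum(a[0] < b[0] for a, b in pairs) / len(pairs)


stream = make_stream(1001)          # t0 .. t1000, in chronological order

shuffled = stream[:]                # copy, so the original order is kept
random.Random(0).shuffle(shuffled)  # fixed seed, assumed only for repeatability

# Chronological presentation: every adjacent pair is in time order.
sequential_score = fraction_in_order(stream)

# Shuffled presentation: adjacent pairs are in time order only about half
# the time, so presentation order carries essentially no time signal.
shuffled_score = fraction_in_order(shuffled)

print(sequential_score, shuffled_score)
```

The timestamps are still in the tuples, so nothing is lost from the data itself. What goes away is the implicit "this came later" signal the model would get from seeing samples in sequence.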

There have been some attempts at it, but nothing crazy effective.