Comment by marcosdumay a day ago
I guess it's worth pointing out that when humans learn long-horizon tasks through repetitive training, we segment them into tasks with a shorter horizon and compose them hierarchically later.
It does (naively, I'll admit) seem like the problem is more one of approach than of algorithm.
Yes, the model may not be able to tackle long-horizon tasks from scratch, but it could learn some shorter-horizon skills first, then learn a longer-horizon task by leveraging groupings of those smaller skills. Chunking, like we all do.
Nobody learns how to fly a commercial airplane cross-country as a sequence of micro hand and arm movements. We learn to pick up a ball that way when young, but learning to fly or play a sport consists of a hierarchy of learned skills and plans.
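To make the chunking idea concrete, here is a toy sketch in Python. It is not any particular RL algorithm, and every name in it (Skill, grasp, pull_yoke, push_throttle, takeoff, and the primitive action strings) is made up for illustration. It only shows how a plan that is a few steps long at the top level can expand into a longer sequence of low-level movements.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Skill:
    """A named chunk: a sequence of primitive actions (str) or sub-skills."""
    name: str
    steps: List[Union[str, "Skill"]]

    def flatten(self) -> List[str]:
        """Expand the hierarchy into the flat sequence of primitive actions."""
        out: List[str] = []
        for step in self.steps:
            if isinstance(step, Skill):
                out.extend(step.flatten())
            else:
                out.append(step)
        return out


# Hypothetical short-horizon skill, learned first.
grasp = Skill("grasp", ["reach", "close_hand"])

# Hypothetical mid-level skills that reuse the earlier chunk.
pull_yoke = Skill("pull_yoke", [grasp, "pull_back"])
push_throttle = Skill("push_throttle", [grasp, "push_forward"])

# Hypothetical long-horizon task, planned over chunks rather than movements.
takeoff = Skill("takeoff", [push_throttle, pull_yoke])

plan_length = len(takeoff.steps)        # decisions made at the top level
motor_length = len(takeoff.flatten())   # primitive actions actually executed
print(plan_length, motor_length)
```

At the top level the learner only chooses between two chunks, while the flattened sequence is three times as long. The gap grows with each extra layer of hierarchy, which is the point being made above about not learning the task as one flat sequence of micro-movements.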