Comment by BoiledCabbage

It does (naively, I'll admit) seem like the problem is more one of approach than of algorithm.

Yes, the model may not be able to tackle long-horizon tasks from scratch, but it could learn some shorter-horizon skills first, then learn a longer-horizon task by leveraging groupings of those smaller skills. Chunking, like we all do.

Nobody learns how to fly a commercial airplane cross-country as a sequence of micro hand and arm movements. We learn to pick up a ball that way when young, but learning to fly or play a sport consists of a hierarchy of learned skills and plans.