Comment by throw5829646

I had the opportunity to meet Tamay not too long ago, very sharp guy. A lot of people I know are working on approaches to meta-RL or exploration-based RL, where the goal is to build a foundation model of sorts that has a really good world model across diverse tasks and can predict good policies (or good policy updates) from limited rollouts and/or a sparse reward signal. We're not quite there yet, but as Altman recently said, "we don't have AGI until we have something that learns continuously", and there's a huge race in this space to make that happen.