Comment by curious_cat_163 19 hours ago
> Rather than fine-tuning models on a small number of environments, we expect the field will shift toward massive-scale training across thousands of diverse environments.
This is a great hypothesis for you to test one way or the other.
> Doing this effectively will produce RL models with strong few-shot, task-agnostic abilities capable of quickly adapting to entirely new tasks.
I am not sure I buy that, frankly. Even if you developed radically efficient means of creating "effective and comprehensive" test suites to power replication training, it is not at all a given that the resulting abilities would transfer to entirely new tasks. Yes, there is the bitter lesson and all that, but we don't know whether this is _the_ right hill to climb. Again, at best, this is a hypothesis.
> But achieving this will require training environments at a scale and diversity that dwarf anything currently available.
Yes. You should try it. Let us know if it works. All the best!