Comment by curious_cat_163 19 hours ago
> Rather than fine-tuning models on a small number of environments, we expect the field will shift toward massive-scale training across thousands of diverse environments.
This is a great hypothesis for you to test one way or the other.
> Doing this effectively will produce RL models with strong few-shot, task-agnostic abilities capable of quickly adapting to entirely new tasks.
I am not sure I buy that, frankly. Even if you developed radically efficient means of creating "effective and comprehensive" test suites to power replication training, it is not at all a given that the resulting abilities would transfer to entirely new tasks. Yes, there is the bitter lesson and all that, but we don't know whether this is _the_ right hill to climb. Again, at best, this is a hypothesis.
> But achieving this will require training environments at a scale and diversity that dwarf anything currently available.
Yes. You should try it. Let us know if it works. All the best!