Comment by jxmorris12 12 hours ago

I recently wrote a post about scaling RL that has some similar ideas:

> How to Scale RL to 10^26 FLOPs (blog.jxmo.io/p/how-to-scale-rl-to-1026-flops)

The basic premise behind both essays is that for AI to make another big jump in capabilities, we need to find new data to train on.

My proposal was to reuse text from the Internet and do RL on next-token prediction. The post linked here instead proposes 'replication training', which the authors define as "tasking AIs with duplicating existing software products, or specific features within them".
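For concreteness, here's a minimal sketch of what RL on next-token prediction could look like: a REINFORCE-style update where the "action" is a sampled token and the reward is 1 when it matches the actual next token from web text. The model name, learning rate, and single-token setup are placeholders for illustration, not details from either post:

```python
# Sketch: REINFORCE on next-token prediction over web text.
# Assumes a HuggingFace causal LM; "gpt2" is a placeholder model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def reinforce_step(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids   # (1, T)
    context, target = ids[:, :-1], ids[:, 1:]
    logits = model(context).logits                         # (1, T-1, vocab)
    dist = torch.distributions.Categorical(logits=logits)
    sample = dist.sample()                                 # policy "action"
    reward = (sample == target).float()                    # 1 iff it matches the text
    # REINFORCE: increase log-prob of sampled tokens, weighted by reward.
    # No baseline here, so this estimator is high-variance.
    loss = -(dist.log_prob(sample) * reward).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward.mean().item()

print(reinforce_step("The quick brown fox jumps over the lazy dog."))
```

This toy version samples one token per position; presumably a real setup would roll out multi-token continuations and score whole completions, where the RL framing (reward shaping, advantages) starts to differ meaningfully from plain cross-entropy training.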