Comment by gcanyon a day ago
The "GPT-3 moment" framing is a bit hype-y I think? GPT-3 eliminated the need for task-specific fine-tuning, but from the article RL wouldn't replace LLM-style pretraining. So this is more of an incremental advance than the paradigm shift GPT-3 represented. That said, if it unlocks RL generalization that would be huge.
The core claim that massive-scale RL will unlock generalization isn't that surprising, given how the scaling hypothesis has played out across ML. But "replication training" on software is interesting: learning by reproducing existing programs potentially unlocks a huge amount of complex training data with objective evaluation criteria.
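For concreteness, here's a minimal sketch of what "objective evaluation criteria" could mean in this setting: grade a candidate program purely by whether its observable behavior matches a reference implementation's on sampled inputs. This is my own illustration, not the article's actual setup; names like `replication_reward` are made up.

```python
# Sketch of an objective reward for replication training: run a candidate
# program and a reference program on the same inputs and score the fraction
# of inputs where their outputs match. Hypothetical, not from the article.
import subprocess


def run(program_path: str, stdin_text: str, timeout_s: float = 5.0) -> str:
    """Run a Python program as a subprocess and capture its stdout."""
    result = subprocess.run(
        ["python", program_path],
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.stdout


def replication_reward(candidate: str, reference: str, test_inputs: list[str]) -> float:
    """Fraction of test inputs on which the candidate matches the reference."""
    if not test_inputs:
        return 0.0
    matches = 0
    for stdin_text in test_inputs:
        try:
            if run(candidate, stdin_text) == run(reference, stdin_text):
                matches += 1
        except subprocess.TimeoutExpired:
            pass  # a non-terminating candidate earns no credit for this input
    return matches / len(test_inputs)
```

The appeal is that the reward requires no human labels: the reference program itself is the grader.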
To me, the big unanswered question is whether skills learned from replicating software would generalize to other reasoning tasks. That's a significant "if" - great if it works, pointless if it doesn't.
It's a very big "if" because other fields are comparatively underspecified. There's no equivalent to a compiler or interpreter in most cases (with spreadsheets being the lingua franca that comes even close for most industries).
It would "work" but I think it will need even more scrutiny by experts to confirm what's correct and what needs to be re-generated. Please please no vibe accounting.