Comment by tomp

Comment by tomp 3 months ago

Did we read the same article?

They clearly mention this, take it into account, and extrapolate from it: LLMs first scaled via data, and now it's test-time compute, but recent developments (R1) clearly show this is not exhausted yet (i.e. RL on synthetically (in-silico) generated CoT), which implies scaling with compute. The authors then outline further potential (research) developments that could continue this dynamic, literally things that have already been discovered, just not yet incorporated into cutting-edge models.

Real-world data confirms their thesis - there have been a lot of sceptics about AI scaling, somewhat justified ("foom" a.k.a. fast take-off hasn't happened - yet), but the sceptics' fundamental thesis has been wrong - "real-world data has been exhausted, next algorithmic breakthroughs will be hard and unpredictable". The reality is, while data has been exhausted, incremental research efforts have resulted in better and better models (o1, r1, o3, and now Gemini 2.5, which is a huge jump! [1]). This is similar to how Moore's Law works - it's not a given that CPUs get better exponentially, it still requires effort, maybe with diminishing returns, but nevertheless the law works...

If we ever get to models being able to usefully contribute to research, either on the implementation side or on the research-ideas side (which they CANNOT yet, at least Gemini 2.5 Pro (public SOTA), unless my prompting is REALLY bad), it's about to get super-exponential.

Edit: then once you get to actual general intelligence (let alone super-intelligence) the real-world impact will quickly follow.

Jianghong94 3 months ago

Well, based on what I'm reading, the OP's point is that not all validation (hence 'fully'), and perhaps not even most of it, can be done in-silico. I think we all agree on that, and it's the major bottleneck to making agents useful - you have to have a human in the loop to closely guardrail the whole process.

Of course you can get a lot of mileage via synthetically generated CoT, but whether that leads to LLMs speeding up the development of LLMs is a big IF.

tomp 3 months ago

No, the entire point of this article is that when you get to self-improving AI, it will become generally intelligent, then you can use that to solve robotics, medicine etc. (like a generally-intelligent baby can (eventually) solve how to move boxes, assemble cars, do experiments in labs etc. - nothing special about a human baby, it's just generally intelligent).

- Jianghong94 3 months ago
  
  Not only does the article claim that when we get to self-improving AI it becomes generally intelligent, it also assumes that AI is pretty close to that right now:
  > OpenBrain focuses on AIs that can speed up AI research. They want to win the twin arms races against China (whose leading company we’ll call “DeepCent”) and their US competitors. The more of their research and development (R&D) cycle they can automate, the faster they can go. So when OpenBrain finishes training Agent-1, a new model under internal development, it’s good at many things but great at helping with AI research.
  > It’s good at this due to a combination of explicit focus to prioritize these skills, their own extensive codebases they can draw on as particularly relevant and high-quality training data, and coding being an easy domain for procedural feedback.
  > OpenBrain continues to deploy the iteratively improving Agent-1 internally for AI R&D. Overall, they are making algorithmic progress 50% faster than they would without AI assistants—and more importantly, faster than their competitors.
  > what do we mean by 50% faster algorithmic progress? We mean that OpenBrain makes as much AI research progress in 1 week with AI as they would in 1.5 weeks without AI usage.
  To me, claiming today's AI IS capable of such a thing is too hand-wavy. And I think that's the crux of the article.
  
- polynomial 3 months ago
  
  You had me at "nothing special about a human baby"
  
visarga 3 months ago

Yeah, I think the math+code reasoning models, like o1 and r1, are doing what can be done with just pure compute without real-world validation. But the real world is complex; we can't simulate it. Why do we make particle accelerators, fusion reactor prototypes, space telescopes, year-long vaccine trials? It's because we need to validate ideas in the real world when that validation cannot be done theoretically or computationally.
