Comment by torginus a day ago
Much has been made in the article of autonomous agents' ability to do research by browsing the web, but the web is 90% garbage by weight (including articles on certain specialist topics).
And it shows. When I used GPT's deep research on the topic, it generated a shallow and largely incorrect summary, owing mostly to its inability to find quality material; instead it ended up relying on places like Wikipedia and random infomercial listicles found via Google.
I have a trusty electronics textbook written in the 80s; I'm sure generating a similarly accurate, correct, and deep analysis of circuit design using only Google would be 1000x harder than sitting down and working through that book until I understood it.
This story isn’t really about agents browsing the web. It’s a fiction about a company that consumes all of the web and all other written material into a model that doesn’t need to browse the web. The agents in this story supersede the web.
But your point hits on one of the first cracks to show in this story: We already have companies consuming much of the web and training models on all of our books, but the reports they produce are of mixed quality.
The article tries to get around this by imagining that models and training runs a couple of orders of magnitude larger will simply appear in the near future, and that the output of those models will yield breakthroughs that accelerate each subsequent round even faster.
Yet here we are struggling to build as much infrastructure as possible to squeeze incremental improvements out of the next generation of models.
This entire story relies on AI advancement accelerating in a self-reinforcing way over the next couple of years.