Comment by mohsen1 21 hours ago
I’ve been exploring this too, since I rely on LLMs a lot to build software. I’ve noticed that our dev loop (writing, testing) is still mostly human-guided, even though language models frequently outperform us in reasoning. If we plug in more automation, such as MCP tools that control browsers, read documentation, and analyse requirements, we can make the cycle far more automated, with much less human involvement.
This article suggests scaling up RL by exposing models to thousands of environments.
I think we can already achieve something similar by chaining multiple agents (a rough sketch follows this list):
1. A “requirement” agent that uses browser tools to craft detailed specs from docs.
2. A coding agent that sets up environments (Docker, build tools) via browser or CLI.
3. A testing agent that validates code against specs, again through tooling.
4. A feedback loop where the tester guides the coder based on results.
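To make the chaining concrete, here is a minimal sketch of that loop. Everything in it is hypothetical scaffolding: `call_llm` stands in for whatever model and tool calls (MCP, browser, CLI) you actually wire up, and the agent functions are stubs so the control flow runs as-is.

```python
def call_llm(role: str, prompt: str) -> str:
    """Hypothetical stand-in for an LLM call with tool access (browser, CLI)."""
    return f"[{role} output for: {prompt[:40]}...]"

def requirement_agent(docs_url: str) -> str:
    # 1. Turn docs into a detailed spec (would use browser tools in practice).
    return call_llm("requirements", f"Write a spec from the docs at {docs_url}")

def coding_agent(spec: str, feedback: str = "") -> str:
    # 2. Produce code against the spec (would set up Docker/build tooling).
    return call_llm("coder", f"Implement this spec:\n{spec}\nFeedback:\n{feedback}")

def testing_agent(code: str, spec: str) -> tuple[bool, str]:
    # 3. Validate the code against the spec; returns (passed, report).
    report = call_llm("tester", f"Test this code against the spec:\n{spec}\n{code}")
    return "FAIL" not in report, report

def pipeline(docs_url: str, max_rounds: int = 5) -> str | None:
    spec = requirement_agent(docs_url)
    feedback = ""
    for _ in range(max_rounds):      # 4. tester-guided feedback loop
        code = coding_agent(spec, feedback)
        passed, feedback = testing_agent(code, spec)
        if passed:
            return code              # e.g. hand off to `git push` here
    return None                      # give up after max_rounds

if __name__ == "__main__":
    print(pipeline("https://example.com/docs"))
```

The bounded `max_rounds` matters in practice: without it, a coder/tester pair can ping-pong on an unsatisfiable spec and burn tokens indefinitely.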
Put together, this system becomes a fully autonomous development pipeline, especially for small projects. In practice, I’ve left my machine running overnight, and these agents propose new features, implement them, run tests, and push to the repo once everything passes. It works surprisingly well.
The main barrier is cost—spinning up many powerful models is expensive. But on a modest scale, this method is remarkably effective.
> The main barrier is cost
I very much disagree. For the larger, more sophisticated systems that run our world, it is not cost that prohibits wide and deep automation. It's deeply sophisticated and constrained requirements; highly complex existing behaviors that may or may not be able to change; systems of people who don't always hold the information needed; internal docs describing the system, or even how to develop for it, that are usually wildly out of date; and so on.
Agents are nowhere near capable of replacing this, and even if they were, they'd change it in ways that are often undesirable or even illegal. I get that there's this fascination with "imagine if it were good enough to...", but it's not, and the systems AI must exist in are both vast and highly difficult to navigate.