Comment by cadamsdotcom

There’s something interesting here around when to write code, when to reuse code, and when to replace the code that was written.

Browser use could be improved by being done partly with code and partly agentically. Completing tasks on the web is deceptive because it seems easily codifiable (just have the model write some Playwright code!) while actually being gnarly as hell. What if the page has changed completely since the last visit?

It’d be interesting to let the agent build up a library of code that it can reuse when it feels confident a routine will get the job done, while feeding any error back so the agent can debug. That might lead to it writing a new routine to stick in the library, possibly replacing the old one.
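
To make that concrete, here's a rough Python sketch of the reuse-or-rewrite loop. It's only an illustration under assumptions: `RoutineLibrary`, `run`, and the `write_routine` callback (standing in for "ask the model to write or fix a routine") are all names I made up, not any real harness API. It also simplifies "feels confident" to "a stored routine exists, so try it first".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical types for illustration only.
# A Routine is any zero-argument callable that performs a task and returns a result.
Routine = Callable[[], str]
# The agent hook: given a task name and the last error message (or None),
# it returns a freshly written routine. In a real harness this would call the model.
WriteRoutine = Callable[[str, Optional[str]], Routine]


@dataclass
class RoutineLibrary:
    routines: Dict[str, Routine] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def run(self, task: str, write_routine: WriteRoutine, max_attempts: int = 3) -> str:
        """Try the stored routine first; on failure, feed the error back and rewrite."""
        error: Optional[str] = None
        routine = self.routines.get(task)
        for _ in range(max_attempts):
            if routine is None:
                # Nothing usable in the library: ask the agent for a new routine,
                # passing along the last error so it can debug.
                routine = write_routine(task, error)
                self.log.append(f"{task}: wrote new routine")
            try:
                result = routine()
            except Exception as exc:
                error = f"{type(exc).__name__}: {exc}"
                self.log.append(f"{task}: failed ({error})")
                routine = None  # force a rewrite on the next attempt
                continue
            # Success: keep this routine, replacing any older one for the task.
            self.routines[task] = routine
            return result
        raise RuntimeError(
            f"{task}: gave up after {max_attempts} attempts; last error: {error}"
        )
```

In use, a stale routine (say, one whose selector no longer matches the page) would raise, the error text would be handed to `write_routine`, and whatever routine succeeds next takes the old one's slot, so the following visit skips the model call entirely.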

Seems like something today’s models could be made to do with a bit of work in the harness. Anyone tried anything like this?