Comment by GaggiX
Maybe you can divide the task into verifiable environments like an HTML5 parser environment where an agent is going to build the parser and also check the progress against a test suites (the https://github.com/html5lib/html5lib-tests in this case) and then write the API into a .md, the job of the human is going to be at the beginning to create the various environments where the agents are going to build the components from (and also how much it can be divided into standalone components).
Thanks for the ideas, but I'll leave the torch for someone to pickup, the goal was to get as far as possible within 3 days, and with human steering, so I'm stopping here personally :)
I'll keep them in mind for the future, who knows, maybe some interesting iteration could be done on what's been made so far.