Comment by falcor84
So for anyone who doubted SWE-BENCH's relevance's to typical tasks, it seems that its stated 13.86% almost exactly matches this 3 successes out of 20 pilot outcome.
We're not quite there yet, but all of these issues seem to be solvable with current tech by applying additional training and/or "agentic" direction. I would now expect pretty much the textbook disruptive innovation process over the next decade or so, until the typical human dev role is pushed to something more akin to the responsibilities of current day architects and product managers. QA engineering though will likely see a big resurgence.
>> but all of these issues seem to be solvable with current tech by applying additional training and/or "agentic" direction.
Can you explain why you think this. From what I gather from other comments it seems like if we continue on current trajectory at best you'd still need a dev who understands the projects context to work in tandem w/ the agent so the code doesn't devolve into slop.