Comment by jebarker a day ago

> code still needs to be reviewed and tested, at least as much as you'd scrutinize the code of a brand new engineer just out of boot camp

> ..._massive_ boost to productivity. ~20% of the commits to the OpenHands codebase are now authored or co-authored by OpenHands itself.

I'm having trouble reconciling these statements. Where does the productivity boost come from, given that the reviewing burden seems much greater than it would be if you knew the commits were coming from a competent human?

lars512 a day ago

There are often a lot of small fixes that aren't time-efficient to do yourself, but where the solution is not much code and is quick to verify.

If the cost of setting a coding agent (e.g. aider) on a task is small, you can see whether it reaches a quick solution and just abort if it spins out. That lets you knock out a subset of these issues very quickly instead of leaving them in issue tracking to grow stale, and it ups the polish on your work.
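A minimal sketch of that loop, assuming aider is installed. The helper name, timeout, and example task are mine, and flag spellings vary by version, so check `aider --help` before relying on them:

    import subprocess

    # Cheap attempt, quick abort: point the agent at a small task and
    # abandon it if it doesn't finish within the time budget.
    # --yes (auto-confirm) and --message (one-shot mode) are aider flags,
    # but verify them against `aider --help` for your installed version.
    def try_quick_fix(task: str, files: list[str], timeout_s: int = 300) -> bool:
        try:
            result = subprocess.run(
                ["aider", "--yes", "--message", task, *files],
                timeout=timeout_s,
            )
            return result.returncode == 0
        except subprocess.TimeoutExpired:
            return False  # the agent spun out; leave the issue for a human

    if try_quick_fix("fix the linter warnings", ["utils.py"]):
        print("Quick win: review the diff, then commit or discard.")
    else:
        print("Not worth it this time.")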

That's still quite a different story from having it do the core, most important part of your work. That feels a little further away. One of the challenges is the scout rule: the refactoring alongside a change that leaves the codebase nicer than you found it. I feel like today it's easier to get a correct change that slightly degrades codebase quality than one that maintains it.

  • jebarker a day ago

    Thanks - this all makes sense - I still don't feel like this would constitute a massive productivity boost in most cases, since it's not fixing major, time-consuming issues. But I can see how it's nice to have.

    • rbren a day ago

      The bigger win comes not from saving keystrokes, but from saving you from a context switch.

      Merge conflicts are probably the biggest one for me. I put up a PR and move on to a new task. Someone approves, but now there are conflicts. I could switch off my new task, spend 5-10 minutes remembering the intent of this PR, and fix the issues. Or I could just say "@openhands fix the merge conflicts" and move back to my new task.

      • svieira a day ago

        The issue is that you still need to review the fixed PR (or someone else does), which means you've deferred the context switch, not eliminated it. And if the fix lands as a new commit, reviewing just the delta is possible (whereas if it rebases, you have to remember your old SHA to diff against).
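        One mitigation for the rebase case: `git range-diff` compares the version of the branch you already reviewed against the rebased one, so you only re-review the delta. A minimal sketch; range-diff is a standard git subcommand, but the helper name and placeholder SHA are mine:

            import subprocess

            # Show only what changed between the PR version you reviewed
            # and the rebased fix, relative to the same base branch.
            def review_delta(old_head: str, new_head: str = "HEAD",
                             base: str = "origin/main") -> None:
                subprocess.run(
                    ["git", "range-diff",
                     f"{base}..{old_head}", f"{base}..{new_head}"],
                    check=True,
                )

            # e.g. review_delta("abc1234")  # old SHA noted before the force-push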

        Playing the other side, pipelining is real.

lolinder a day ago

I haven't started doing this with agents, but with autocomplete models I know exactly what OP is talking about: you stop trying to use models for things they're bad at. A lot of people complain that Copilot does more harm than good, but after a couple of months of using it I figured out when to bother and when not to, and it's been a huge help since then.

I imagine the same thing applies to agents. You can waste a lot of time by giving them tasks that are beyond them and then having to review complicated work that is more likely to be wrong than right. But once you develop an intuition for what they can and cannot do, you can act appropriately.

drewbug01 a day ago

I suspect that many engineers do not expend significant energy on reviewing code, especially if the change is lengthy.

linsomniac a day ago

> burden seems much greater than...

Because the burden is much lower than if you were authoring the same commit yourself without any automation?

  • jebarker a day ago

    Is that true? I'd like to think my commits are less burdensome to review than a fresh-out-of-boot-camp junior dev's, especially if all that's being done is fixing linter issues. Perhaps there's a small benefit, but it doesn't seem like a major productivity boost.

    • ErikBjare a day ago

      A junior dev is not a good approximation of the strengths and weaknesses of these models.

      • rbren a day ago

        Agreed! The comparison is great for estimating the scope of the tasks they're capable of: they do very well with bite-sized tasks that can be individually verified. But their world knowledge is that of a principal engineer!

        I think this is why people struggle so much with agents: they see the agent perform magic, then assume it can be trusted with a larger task, where it completely falls down.

      • jebarker a day ago

        The post I originally commented on literally made that comparison when describing the models as giving a massive productivity boost.