Comment by davedx 2 days ago

One thing that surprised me a little is that there doesn't seem to be an "ask for help" escape hatch in it - it would work away for literally days on a task where any human would admit they were stuck?

One of the more important features of agents is supposedly that they can stop and ask for human input when necessary. It seems it does do this for "hard stops" - like when it needed a human to set up API keys in their cloud console - but not for "soft stops".

By contrast, a human dev would probably throw in the towel after a couple of hours and ask a senior dev for guidance. The chat interface definitely supports that with this system but apparently the agent will churn away in a sort of "infinite thinking loop". (This matches my limited experience with other agentic systems too.)

coffeebeqn 2 days ago

LLMs can create infinite worlds in the error messages they're receiving. They probably need some outside signal to stop and re-assess. I don't think LLMs have any ability to reason their way out on their own once they're lost in their own world. They'll just keep creating less and less coherent context for themselves.

  • someothherguyy 2 days ago

    If you correct an LLM-based agent coder, you are always right. Often, if you give it advice, it pretends it understands you, then goes on to do something different from what it said it was going to do. Likewise, it will outright lie to you, telling you it did things it didn't do. (In my experience.)

    • rsynnott 2 days ago

      So when people say these things are like junior developers, they really mean that they’re like the worst _stereotype_ of junior developers, then?

      • [removed] 2 days ago
        [deleted]
  • davedx 2 days ago

    For sure - but if I'm paying for a tool like Devin then I'd expect the infrastructure around it to do things like stop it if it looks like that has happened.

    What you often see with agentic systems is that there's an agent whose role is to "orchestrate", and that's the kind of thing the orchestrator would do: every 10 minutes or so, check the output and elapsed time and decide if the "developer" agent needs a reality check.
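    A minimal sketch of that orchestrator watchdog, assuming a hypothetical worker-agent interface (`done()`, `latest_output()`, and `pause()` are made-up names for illustration, not any real Devin API):

```python
import time

def orchestrate(agent, check_interval_s=600, max_elapsed_s=7200):
    """Poll a worker agent every ~10 minutes; if its output has stopped
    changing, or it has run too long, pause it and escalate to a human."""
    start = time.monotonic()
    last_output = None
    while not agent.done():
        time.sleep(check_interval_s)
        elapsed = time.monotonic() - start
        output = agent.latest_output()
        if output == last_output or elapsed > max_elapsed_s:
            agent.pause()
            return "escalate_to_human"  # the "reality check"
        last_output = output
    return "completed"
```

    The key design choice is that the stopping signal comes from outside the worker's own context, so the worker's increasingly incoherent self-narrative can't talk the supervisor out of intervening.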

    • mousetree a day ago

      How would it decide if it needs a reality check? Would the thing checking have the same limitations?

      • svieira a day ago

        Decision trees and random forests (funnily enough, this is not sarcasm).

  • tobyhinloopen 2 days ago

    You can maybe have a supervisor AI agent trigger a retry / new approach
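    One way to sketch that supervisor pattern (`attempt` and `looks_ok` are hypothetical stand-ins for the worker agent and the supervisor's acceptance check):

```python
def supervise(attempt, approaches, looks_ok):
    """Let a worker try each approach in turn; a separate supervisor
    check decides whether to accept the result or trigger a retry with
    a new approach. Returns (None, None) when every approach fails,
    i.e. the point where a human should take over."""
    for approach in approaches:
        result = attempt(approach)
        if looks_ok(result):
            return approach, result
    return None, None
```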

verdverm a day ago

I think training it to do that would be the hard part.

- stopping is probably the easy part

- I assume this happens during the RLHF phase

- Does the model simply stop or does it ask a question?

- You need a good response or interaction, depending on the query? So probably sets or decision trees of responses, or even something agentic? (chicken-and-egg problem?)

- This happens tens of thousands of times; having humans do it, especially with coding, is probably not realistic

- Incumbents like M$ with Copilot may have an advantage in crafting a dataset

csomar 2 days ago

> One thing that surprised me a little is that there doesn't seem to be an "ask for help" escape hatch in it - it would work away for literally days on a task where any human would admit they were stuck?

You are over-estimating the sophistication of their platform and infrastructure. Everyone was talking about Cursor (or maybe it was astroturfing?) but once I checked it out, it was not far from avante on neovim.

  • a1j9o94 a day ago

    Cursor isn't designed to do long-running tasks. As someone mentioned in another comment, it's closer to a function call than a process like Devin.

    It will only do one task at a time that it's asked to do.

    • dimitri-vs 18 hours ago

      ...for now.

      They are pushing in this direction with the Composer Agent mode which can carry out a sequence of multi-file changes without you having to specify the files. It's pretty decent. If you're feeling brave there is also a beta "YOLO" mode that will auto approve these changes and run console commands.

rfoo a day ago

Devin does ask for help when it can't do something. I think I've had it ask me how to use a testing suite it had trouble running.

The problem is it really, really hates asking for help when it has a skill issue; it would rather run in circles than admit it just can't do something.

  • Aeolun a day ago

    So they perfectly nailed the junior engineer. It’s just that that isn’t what people are looking for.

    • rfoo a day ago

      Maybe. It's pretty weird and I'm still thinking about it.

      You can't just throw junior engineers under the bus when they clearly can't handle an issue - or at least it takes some effort. In return you may coach them and hope they eventually improve.

      Devin does look like a junior engineer, but I've learned to just click "Terminate Session" immediately after I spot that it's doing something hopeless. I've managed to get some real work done out of it, without much effort on my side (just checking what it's doing every 10~15 minutes and typing a few lines or restarting the session).

mkagenius a day ago

If they had built that in from the beginning, people would have said "it asks me for help on every other task; how is it a developer if I have to assist it all the time?"

But now that people are okay with that, I think it's the right time to add the feature.

bot403 a day ago

You can set a "max work time" before it pauses, so it won't go for days endlessly spending your credits. By default it's set to 10 credits.

So I'm not sure how the author got it to go for days.

ImHereToVote 2 days ago

There should be an energy coefficient for problems: you only get a set amount of energy to spend per issue. When the energy runs out, a human must help.
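
A toy version of that budget idea (the numbers are arbitrary; a real system would price each model call or tool invocation):

```python
class EnergyBudget:
    """Per-issue work budget; once it's exhausted, a human must help."""

    def __init__(self, energy: float):
        self.energy = energy

    def spend(self, cost: float) -> bool:
        """Deduct the cost of one agent step. False means 'stop and
        escalate to a human' rather than keep churning."""
        if cost > self.energy:
            return False
        self.energy -= cost
        return True
```

With a budget of 10 credits (Devin's default cap, per the comment above), an agent doing 2.5-credit steps gets four attempts before it is forced to ask for help.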