Comment by ianbutler 2 days ago

Disclosure: Working on a company in the space and have recently been compared to Devin in at least one public talk.

Devin has tried to do too much. There is value in producing a solid code artifact that can be handed off to other developers for review, in limited capacities such as P2s and the minor bugs that pile up in business backlogs.

Focusing on specific elements of the development loop, such as fixing bugs, adding small features, running tests, and producing pull requests, is enough.

Businesses like Factory AI or my own are taking that approach and we're seeing real interest in our products.

yoavm 2 days ago

Not to take away from your opinion, but I guess time will tell? As models get better, it's possible that wide tools like Devin will work better and swallow tools that do one thing. I think companies would much rather have an AI solution that works like what they already know (developers) than one that works in the IDE, another that watches GitHub issues, another that reviews PRs, and one that hangs out on Slack and makes small fixes.

> Businesses like Factory AI or my own are taking that approach and we're seeing real interest in our products.

Interest isn't what tools like Devin are lacking, (un)fortunately.

To be clear, I do share a lot of scepticism regarding all the businesses working around AI code generation. However, that isn't because I think they'll never be able to figure it out, but because I think they are all likely to figure it out in the end, at the same time, when better models come out. And none of them will have a real advantage over the others.

  • ianbutler 2 days ago

    I've recently had several enterprise-level conversations with different companies, and what we're being asked for is specifically the simpler approach. I think that is the level of risk they're willing to tolerate, and it will still ameliorate a real issue for them.

    The key here is that my product is no worse positioned to do more things if and when the time comes. But by building a solid foundation and trust, and by not having the quiet part be that the product doesn't work (which I heard as early as several months ago), we'll hopefully still have the customer base to roll that out to.

    I've talked to Devin's CEO once, at Swyx's conference last June. They're very thoughtful and very kind, so this must be very rough, but between the demo they showed then and what I'm hearing now, the product has not evolved to the point where it provides value commensurate with its marketing or hype.

    I'm a fan of Guillermo Rauch's (Vercel CEO) take on these things. You earn the right to take on bigger challenges and no one in this space has earned the right yet including us.

    Devin's investment was fueled by hyperspeculation early on when no one knew what the shape of the game was. In many ways we still don't, but if you burn your reputation before we get there you may not be able to capitalize on it.

    To be completely fair to them, with a long view and the bank account to go with it, they may still be entirely fine.

    • likium 2 days ago

      > You earn the right to take on bigger challenges and no one in this space has earned the right yet including us.

      Not entirely. We're in interesting times where products with better models can suddenly leapfrog and displace even current upstarts. Cursor won over Copilot by leveraging Claude Sonnet 3.5. They didn't "earn the right".

      Improvements in models will help those with existing infrastructure that can benefit from them. I'm not saying Devin will win when that time comes, but a similar product might find its space quickly.

      • kgilpin a day ago

        I just want to note that Copilot is multi-model now and can also run Sonnet.

morgante 2 days ago

You can get a much higher hit rate with more constrained agents, but unfortunately if an agent is too constrained it just doesn't excite people as much.

For example, the Grit agent (my company) is designed to handle larger maintenance tasks. It has a much higher success rate, with <5% rejected tasks and 96% merged PRs (including on some pretty huge repos).

It's also way less exciting. People want the flashy tool that can solve "everything."