Comment by mritchie712

Comment by mritchie712 13 hours ago

11 replies

Some things we've[0] learned on agent design:

1. If your agent needs to write a lot of code, it's really hard to beat Claude Code (cc) / Agent SDK. We've tried many approaches and frameworks over the past 2 years (e.g. PydanticAI), but using cc is the first that has felt magic.

2. Vendor lock-in is a risk, but the bigger risk is having an agent that is less capable then what a user gets out of chatgpt because you're hand rolling every aspect of your agent.

3. cc is incredibly self aware. When you ask cc how to do something in cc, it instantly nails it. If you ask cc how to do something in framework xyz, it will take much more effort.

4. Give your agent a computer to use. We use e2b.dev, but Modal is great too. When the agent has a computer, it makes many complex features feel simple.

0 - For context, Definite (https://www.definite.app/) is a data platform with agents to operate it. It's like Heroku for data with a staff of AI data engineers and analysts.

CuriouslyC 13 hours ago

Be careful about what you hand off to Claude versus another agent. Claude is a vibe project monster, but it will fail at hard things, come up with fake solutions, and then lie to you about them. To the point that it'll add random sleeps and do pointless work to cover up the fact that it's reward hacking. It's also very messy.

For brownfield work, work on hard stuff or work in big complex codebases you'll save yourself a lot of pain if you use Codex instead of CC.

  • wild_egg 10 hours ago

    Claude is amazing at brownfield if you take the time to experiment with your approach.

    Codex is stronger out of the box but properly customized Claude can't be matched at the moment

    • CuriouslyC 9 hours ago

      The issue with Claude are twofold:

      1. Poor long context performance compared to GPT5.1, so Claude gets confused about things when it has to do exploration in the middle of a task.

      2. Claude is very completion driven, and very opinionated, so if your codebase has its own opinions Claude will fight you, and if there are things that are hard to get right, rather than come back and ask for advice, Claude will try to stub/mock it ("let's try a simpler solution...") which would be fine, except that it'll report that it completed the task as written.

    • gnat 9 hours ago

      What have you done to make Claude stronger on brownfields work? This is very interesting to me.

faxmeyourcode 11 hours ago

Point 2 is very often overlooked. Building products that are worse than the baseline chatgpt website is very common.

smcleod 13 hours ago

It's quite worrying that I have several times in the last few months had to really drive home why people should probably not be building bespoke agentic systems just to essentially act as a half baked version of an agentic coding tool when they could just go use Claude code and instead focus their efforts on creating value rather than instant technical debt.

  • CuriouslyC 13 hours ago

    You can pretty much completely reprogram agents just by passing them through a smart proxy. You don't need to rewrite claude/codex, just add context engineering and tool behaviors at the proxy layer.

verdverm 7 hours ago

yes, we should all stop experimenting and outsource our agentic workflows to our new overlords...

this will surely end up better than where big tech has already brought our current society...

For real though, where did the dreamers about ai / agentic free of the worst companies go? Are we in the seasons of capitulation?

My opinion... build, learn, share. The frameworks will improve, the time to custom agent will be shortened, the knowledge won't be locked in another unicorn

anecdotally, I've come quite far in just a week with ADK and VS Code extensions, having never done extensions before, which has been a large part of the time spent