Comment by simonw 3 months ago

"It’s becoming clear that real-world agentic systems work best when multiple agents collaborate, rather than having one agent attempt to do everything."

I'll be honest: I don't buy that premise (yet). It's clearly a popular idea and I see a lot of excitement about it (see Google's A2A thing) but it feels to me like a pattern that, in many cases, will make LLM code even harder to get reliable results from.

I worry it's the AI equivalent of microservices: useful in a small set of hyper-complex systems, while the vast majority of applications that adopt it would have been better off without it.

If there are strong arguments counter to what I've said here I'd love to hear them!

danenania 3 months ago

A few concrete examples of multi-agent collaboration being useful in my project Plandex[1]:

- While it uses Sonnet 3.7 by default for creating the edit snippet when writing code, calls related to applying the snippet and validating the result (and falling back to a whole file write if needed) use o3-mini (soon to be o4-mini) which is 1/3 the cost, much faster, and actually more accurate and reliable than Sonnet for this particular narrow task.

- If Sonnet 3.7's context limit is exceeded in the planning stages, it can switch to a Gemini model for planning, then go back to Sonnet again for the implementation steps (since these only need the files relevant to each step).

- It eagerly summarizes the conversation after each response so that the summary can be used later if the conversation gets too long. This is only practical because much smaller models than the main planning/coding models are sufficient for a good summary. Otherwise it would be way too expensive.

It's definitely more complex, but I think in these cases at least, there's a real payoff for the trouble.

1 - https://github.com/plandex-ai/plandex
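
A rough sketch of the kind of per-task model routing described above. This is not Plandex's actual code: the model names, the token limits, and the `pick_model` helper are all hypothetical, chosen only to show the shape of the idea (a cheaper model for narrow tasks, a long-context fallback for planning).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    context_limit: int  # max prompt tokens; illustrative numbers only


# Hypothetical model roles, not real model identifiers.
CODER = ModelSpec("main-coder", 200_000)
LONG_CONTEXT_PLANNER = ModelSpec("long-context-planner", 1_000_000)
CHEAP_VALIDATOR = ModelSpec("cheap-validator", 200_000)
SUMMARIZER = ModelSpec("small-summarizer", 128_000)


def pick_model(task: str, prompt_tokens: int) -> ModelSpec:
    """Choose a model from the task type and the prompt size."""
    if task == "plan":
        # Fall back to the long-context model only when the main one overflows.
        if prompt_tokens <= CODER.context_limit:
            return CODER
        return LONG_CONTEXT_PLANNER
    if task == "implement":
        return CODER
    if task in ("apply_edit", "validate"):
        return CHEAP_VALIDATOR
    if task == "summarize":
        return SUMMARIZER
    raise ValueError(f"unknown task: {task}")
```

The routing itself is plain code, so the choice of model never depends on a model's own judgment.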

rchaves 3 months ago

Is this multi-agent collaboration though, or is it just a workflow? All the examples you listed seem to have pretty deterministic control flows (write then validate, context exceeded, after each response, etc.).
When I think of multi-agent collaboration, I think of the control flow and handover also being defined by the agents themselves. That is the thing I have yet to see examples of in production, and the premise that I also don't buy yet.

- danenania 3 months ago
  
  You’re right that it’s a fuzzy line. That said, if you can make the contract/handoff between agents deterministic, you’ll always get better results by doing that, compared to letting the agents try to handle it through inference, since there will always be some error rate.
  For this reason, I think that for at least the next couple years, even very advanced agent systems are likely to have a lot of deterministic control flow and glue in their guts. To me, that doesn’t make them “not multi-agent”. Rather, this is how you can build multi-agent systems that actually work in reality. But much of it comes down to semantics, admittedly.
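
To make the "deterministic contract" point concrete, here is a minimal sketch, assuming the first agent is asked to emit JSON with `path` and `snippet` fields. The field names, the `EditHandoff` type, and the step names are hypothetical. Ordinary code checks the handoff and picks the next step, so a malformed output triggers a fixed fallback instead of another round of inference.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EditHandoff:
    path: str
    snippet: str


def parse_handoff(raw: str) -> Optional[EditHandoff]:
    """Return a validated handoff, or None if the output breaks the contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    path = data.get("path")
    snippet = data.get("snippet")
    if not isinstance(path, str) or not path or not isinstance(snippet, str):
        return None
    return EditHandoff(path, snippet)


def next_step(raw: str) -> str:
    """Deterministic routing: apply the edit if valid, else fall back."""
    if parse_handoff(raw) is not None:
        return "apply_edit"
    return "rewrite_whole_file"
```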
  

segmenta 3 months ago

Here are a few practical reasons for multi-agent systems:

1. LLMs handle narrower, simpler instructions better - decomposing into multiple agents improves reliability (related to instruction following accuracy).

2. Similarly, tool-calling accuracy improves when each agent has a smaller set of specific tools assigned to them.

3. Smaller agents mean prompt changes (which aren't very deterministic) can be isolated and tested more easily.

4. Dividing agents by task enables stronger, more precise guardrails for real-world use cases.

Happy to discuss further!
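
Points 2 and 4 above can be sketched as follows. This is an illustration under assumptions, not any particular framework's API: the `AgentSpec` type, the agent names, and the tool names are all made up. Each agent gets a narrow instruction and a small tool allowlist, and a guardrail rejects any tool call outside that list.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AgentSpec:
    name: str
    instructions: str
    allowed_tools: FrozenSet[str]


def is_call_allowed(agent: AgentSpec, tool_name: str) -> bool:
    """Guardrail: an agent may only call tools on its own allowlist."""
    return tool_name in agent.allowed_tools


# Hypothetical agents, each with a narrow job and a small tool set.
BILLING = AgentSpec(
    name="billing",
    instructions="Answer billing questions only.",
    allowed_tools=frozenset({"lookup_invoice", "issue_refund"}),
)
DOCS = AgentSpec(
    name="docs",
    instructions="Answer from the product docs only.",
    allowed_tools=frozenset({"search_docs"}),
)
```

Because each spec is small and self-contained, a prompt change to one agent can be tested without touching the others.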

simonw 3 months ago

That's a really good answer. I suggest turning that into a set of working examples to help promote the idea - part of my hesitance around this is that it sounds good on paper but I've not seen convincing evidence that it works yet.
(Claude Code is an example that I believe does make good use of this pattern, but it's frustratingly closed source.)

- pylotlight 3 months ago
  
  I think this article talks about it somewhat as well, highlighting the difference between more advanced workflows and agentic-style systems vs. 'agents': https://blog.langchain.dev/how-to-think-about-agent-framewor...
  
- segmenta 3 months ago
  
  That’s a great suggestion, and I get the hesitation - we'll work on adding more concrete examples to help make the case!
  

nurettin 3 months ago

The sentence should read:

"It is becoming clear that agentic systems which run a prompt per work node are becoming a curiosity, so we should hype them as the correct solution in order to make a buck, despite all the effort that has been spent trying to one-shot complex problems."

rchaves 3 months ago

Well, I think hype is not bad per se; I'd do it even if I weren't trying to make a buck. It's okay (up to a point) to hype something up so that it eventually finds a problem where it fits well. But yeah, I'm still waiting on this one.

ethan_smith 3 months ago

The microservices analogy is spot-on - multi-agent systems introduce coordination overhead that's only justified when domain complexity naturally decomposes into specialized tasks with clear interfaces.

segmenta 3 months ago

Agree that the microservices analogy is great for the maintainability aspect of multi-agents. However, there is one more dimension which is specific to LLMs - performance. Smaller agents tend to have better instruction-following accuracy.

ActionHank 3 months ago

It has been my experience that having short focused tasks overseen by some controller code that wires things together works more efficiently than multiagent approaches.

The agents “chat” a whole lot back and forth to figure out what could be a direct instruction.
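
A minimal sketch of that "controller code" approach, assuming each focused task is a function from text to text. The `draft` and `review` functions here are hypothetical stubs standing in for real model calls. The controller fixes the order of steps up front, so no step has to negotiate with another.

```python
from typing import Callable, Sequence

Step = Callable[[str], str]


def run_pipeline(task: str, steps: Sequence[Step]) -> str:
    """Run short, focused steps in a fixed order, passing each result along."""
    result = task
    for step in steps:
        result = step(result)
    return result


# Stubs standing in for single-purpose model calls.
def draft(text: str) -> str:
    return f"draft({text})"


def review(text: str) -> str:
    return f"review({text})"


if __name__ == "__main__":
    print(run_pipeline("add a login form", [draft, review]))
```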

segmenta 3 months ago

Curious - what was the use case you were trying out?

rchaves 3 months ago

Same here, but I would even avoid "strong arguments", because that's what we have all been doing so far.

What I want is real use cases: show me real-world production examples from established companies where multi-agent collaboration served them better than a simple agent + tools and deterministic workflows.
