Comment by segmenta

Thanks for the pointer. We do agree that not all agentic systems should be multi-agent.

That said, in our experience with complex workflows (e.g. enterprise customer support), both quality and maintainability stand to gain when the system is decomposed into smaller, narrowly scoped agents. There is a parallel in how humans work: when we call customer support, we get routed to different human agents based on our query.

OpenAI says something similar in their recent guide on building agents [0]: "For many complex workflows, splitting up prompts and tools across multiple agents allows for improved performance and scalability. When your agents fail to follow complicated instructions or consistently select incorrect tools, you may need to further divide your system and introduce more distinct agents."

A relevant benchmark here might be the Instruction Following benchmark: https://scale.com/leaderboard/multichallenge. Even the best reasoning models top out at ~60% accuracy on this.

The options for improving accuracy, then, are to (a) fine-tune a model on a task-specific dataset, or (b) decompose the problem into smaller sub-problems (divide and conquer). In our view, (b) is the more practical and maintainable of the two.
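
To make option (b) concrete, here is a rough sketch of the routing idea in Python. Everything in it is hypothetical and only for illustration: the `Agent` class, the agent names, the tool names, and the keyword table are assumptions, and the keyword match stands in for what would normally be a model-based classifier. It is not how any particular framework does it.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Agent:
    """A narrowly scoped agent: a short instruction set and a few tools (all hypothetical)."""
    name: str
    instructions: str
    tools: Tuple[str, ...]

    def handle(self, query: str) -> str:
        # Placeholder for a real model call made with this agent's
        # instructions and tools; here we only echo the routing decision.
        return f"[{self.name}] handling: {query}"


# Each agent sees only the instructions and tools relevant to its scope.
AGENTS: Dict[str, Agent] = {
    "billing": Agent("billing", "Resolve invoice and refund questions.",
                     ("lookup_invoice", "issue_refund")),
    "technical": Agent("technical", "Troubleshoot product errors.",
                       ("search_docs", "open_ticket")),
    "general": Agent("general", "Answer anything else or escalate.", ()),
}

# Assumption: a keyword table stands in for an LLM-based intent classifier.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "billing": ("refund", "invoice", "charge"),
    "technical": ("error", "crash", "login"),
}


def route(query: str) -> Agent:
    """Pick the first scoped agent whose keywords appear in the query."""
    text = query.lower()
    for name, words in KEYWORDS.items():
        if any(word in text for word in words):
            return AGENTS[name]
    return AGENTS["general"]


if __name__ == "__main__":
    print(route("I need a refund for my last invoice").handle("refund request"))
```

The point is only that each agent carries a short prompt and a small tool list, so a failure can be traced to one scope instead of one very large prompt.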

[0] https://cdn.openai.com/business-guides-and-resources/a-pract...