How Many Agents Are Too Many? The Hidden Cost of Multi-Agent Systems — Anannya Roy Chowdhury at AI Engineer Melbourne 2026

The multi-agent architecture has become the aspirational design pattern for ambitious AI systems. The logic is appealing: distribute reasoning across specialized agents, orchestrate their collaboration, and watch them tackle problems that single models struggle with. But somewhere between the whiteboard architecture and production deployment, a sobering reality often emerges: more agents don't always mean better performance, and they frequently mean higher costs and slower responses.

This tension reveals something important about how we're designing AI systems. We've borrowed concepts from distributed systems and microservices architecture and applied them to LLMs, assuming the same principles apply. Yet language models have different failure modes, different economics, and different scaling characteristics than traditional software components.

The costs compound in ways that catch teams off guard. Each agent invocation adds token processing, API calls, and latency to the critical path. If your orchestration logic spins up three agents to solve a problem that a well-engineered single model could handle, you're not just paying three times the cost: you're also paying for the coordination overhead, the context switching, and the larger failure surface. A single weak link in the agent chain can collapse the entire response.
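To make the compounding concrete, here is a minimal back-of-the-envelope sketch. It assumes a strictly sequential chain in which every step must succeed and failures are independent. All names (`AgentStep`, `chain_cost`, and so on), token counts, prices, and success rates are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class AgentStep:
    """One agent invocation on the critical path (all figures hypothetical)."""
    name: str
    input_tokens: int
    output_tokens: int
    latency_s: float
    success_rate: float  # probability the step returns a usable result


def chain_cost(steps, usd_per_1k_input, usd_per_1k_output):
    """Total token cost of running every step once."""
    return sum(
        s.input_tokens / 1000 * usd_per_1k_input
        + s.output_tokens / 1000 * usd_per_1k_output
        for s in steps
    )


def chain_latency(steps):
    """Sequential steps add their latencies on the critical path."""
    return sum(s.latency_s for s in steps)


def chain_success(steps):
    """If every step must succeed and failures are independent,
    the per-step success rates multiply."""
    return prod(s.success_rate for s in steps)


# Hypothetical three-agent pipeline versus a hypothetical single call.
pipeline = [
    AgentStep("planner", 2000, 500, 1.2, 0.95),
    AgentStep("worker", 3000, 800, 2.5, 0.95),
    AgentStep("reviewer", 4000, 300, 0.8, 0.95),
]
single = [AgentStep("single_model", 3000, 900, 2.0, 0.95)]

for label, steps in (("pipeline", pipeline), ("single", single)):
    print(
        label,
        f"cost=${chain_cost(steps, 0.001, 0.002):.4f}",
        f"latency={chain_latency(steps):.1f}s",
        f"success={chain_success(steps):.3f}",
    )
```

Under these assumptions, three steps that each succeed 95% of the time yield an end-to-end success rate of roughly 86%, before counting any orchestration overhead. Retries, parallel branches, or correlated failures would change the arithmetic.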

Then there's the question of diminishing returns. At some point, adding more specialized agents to handle edge cases or improve accuracy stops being worth the engineering complexity. The performance gains flatten while the operational burden continues to grow. Teams end up maintaining sprawling systems with dozens of agents, most of which handle rare scenarios or provide marginal improvements.

This matters right now because we're in the phase where multi-agent systems are being adopted widely, but we haven't yet developed mature operational practices around them. Teams are making architectural decisions based on conference talks and research papers rather than the grinding realities of production systems. By the time they discover that their orchestration logic is the bottleneck, or that a simpler approach would have served better, they've already built substantial infrastructure around the multi-agent pattern.

The engineering discipline that's emerging now is about honest tradeoff analysis. When does coordination logic between agents actually improve your system? When would a single larger model, or a model with retrieval augmentation, be cheaper and faster? How do you measure whether that additional agent is earning its cost, or whether it's just adding complexity?
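One way to frame the "is this agent earning its cost" question is as an ablation: evaluate the system with and without the agent on the same test set, then compare the deltas against explicit budgets. The sketch below assumes you already have such evaluation numbers. The `EvalResult` type, the threshold parameters, and every figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Aggregate results from one evaluation run (hypothetical fields)."""
    accuracy: float
    cost_usd_per_request: float
    p95_latency_s: float


def marginal_value(baseline: EvalResult, candidate: EvalResult) -> dict:
    """What the extra agent buys, and what it costs, relative to the baseline."""
    return {
        "accuracy_gain": candidate.accuracy - baseline.accuracy,
        "extra_cost_usd": candidate.cost_usd_per_request
        - baseline.cost_usd_per_request,
        "extra_latency_s": candidate.p95_latency_s - baseline.p95_latency_s,
    }


def earns_its_cost(
    baseline: EvalResult,
    candidate: EvalResult,
    min_accuracy_gain: float,
    max_extra_cost_usd: float,
    max_extra_latency_s: float,
) -> bool:
    """Keep the agent only if the gain clears the bar within both budgets."""
    delta = marginal_value(baseline, candidate)
    return (
        delta["accuracy_gain"] >= min_accuracy_gain
        and delta["extra_cost_usd"] <= max_extra_cost_usd
        and delta["extra_latency_s"] <= max_extra_latency_s
    )


# Hypothetical numbers: the same system without and with a reviewer agent.
without_reviewer = EvalResult(0.86, 0.004, 2.0)
with_reviewer = EvalResult(0.87, 0.011, 3.5)

print(marginal_value(without_reviewer, with_reviewer))
print(earns_its_cost(without_reviewer, with_reviewer, 0.02, 0.005, 2.0))
```

The thresholds are product decisions rather than technical constants. The point of writing them down is that the tradeoff becomes explicit before the agent is built, not after.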

These questions require metrics beyond accuracy. Production systems need to track latency at each stage, cost per invocation, per-agent failure rates and how often the system fails when an individual agent fails, and the correlation between system complexity and operational burden. The most sophisticated teams are discovering that their best-performing systems often have fewer agents than their initial designs predicted. This is not because the teams were wrong about needing multi-agent reasoning, but because they learned to be ruthless about agent minimalism.
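As a rough illustration of per-stage tracking, the sketch below records call counts, failures, latency, and cost for each named stage using only the Python standard library. The `StageMetrics` class and the stage names are hypothetical. A production system would more likely emit these figures to an existing metrics or tracing backend than hold them in memory.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageMetrics:
    """In-memory per-stage counters (illustrative only)."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.failures = defaultdict(int)
        self.latency_s = defaultdict(float)
        self.cost_usd = defaultdict(float)

    @contextmanager
    def track(self, stage):
        """Time one invocation of a stage and count it as failed if it raises."""
        start = time.perf_counter()
        self.calls[stage] += 1
        try:
            yield
        except Exception:
            self.failures[stage] += 1
            raise  # let the caller decide how to handle the failure
        finally:
            self.latency_s[stage] += time.perf_counter() - start

    def add_cost(self, stage, usd):
        """Attribute a cost (for example, derived from token usage) to a stage."""
        self.cost_usd[stage] += usd

    def summary(self):
        """Per-stage averages over all recorded calls."""
        return {
            stage: {
                "calls": n,
                "failure_rate": self.failures[stage] / n,
                "mean_latency_s": self.latency_s[stage] / n,
                "cost_per_call_usd": self.cost_usd[stage] / n,
            }
            for stage, n in self.calls.items()
        }


metrics = StageMetrics()

with metrics.track("retriever"):
    metrics.add_cost("retriever", 0.002)  # hypothetical cost of this call

try:
    with metrics.track("critic"):
        raise TimeoutError("simulated agent failure")
except TimeoutError:
    pass

report = metrics.summary()
print(report)
```

With a report like this per stage, it becomes possible to see which agents dominate latency or cost and which ones fail most often, which is the evidence needed before removing or merging an agent.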

The real insight is that multi-agent architecture is valuable for problems that genuinely require fundamentally different reasoning modes. But many systems that supposedly need a multi-agent approach would benefit more from careful engineering of simpler alternatives: better prompts, smarter retrieval augmentation, or selective use of specialized smaller models. The skill now is knowing which camp your problem falls into before you've committed to building and supporting a complex distributed system.

Anannya Roy Chowdhury is presenting on this critical architectural decision-making at AI Engineer Melbourne 2026, June 3-4 in Melbourne, Australia.
