Agent Observability: Monitoring and Understanding Agents at Internet Scale — Daniel Nadasi at AI Engineer Melbourne 2026
When you build software that runs on the scale Google operates at, the usual rules stop applying. You're not debugging a single request. You're managing millions of simultaneous decision-making processes, each one making autonomous choices, each one capable of cascading failures you won't see until they've already affected users.
Traditional monitoring breaks down almost immediately at that scale. You can track metrics: latency, error rates, throughput. But those aggregate metrics tell you almost nothing about what an agent actually did, why it made the decisions it made, or where the failure points are. If your error rate spikes by 0.1%, that might represent thousands of unrelated individual failures, or it might represent one agent misbehaving in a specific scenario that affects a large share of requests matching a particular pattern.
Understanding that difference requires fundamentally different observability patterns.
Daniel Nadasi's work at Google on agent infrastructure reveals what those patterns need to be. The key insight is that agents are decision-making systems, so observability needs to focus on decisions: what inputs led to this decision, what reasoning pathway did the agent follow, what alternatives were considered, and why was this one selected?
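The decision-centric fields described above can be captured in a structured record. The following is a minimal sketch, not a schema from the talk; the class and field names (`DecisionRecord`, `chosen_action`, `rationale`, and so on) are illustrative assumptions.

```python
from dataclasses import dataclass, field
import time

@dataclass
class DecisionRecord:
    """One agent decision, captured for later analysis.
    Field names are illustrative, not taken from the talk."""
    agent_id: str
    inputs: dict                  # what the agent saw when deciding
    chosen_action: str            # the decision it actually made
    confidence: float             # the agent's own confidence, 0.0-1.0
    alternatives: list            # actions considered but rejected
    rationale: str                # why this action was selected
    timestamp: float = field(default_factory=time.time)
```

Capturing alternatives and rationale alongside the chosen action is what makes "why was this one selected?" answerable after the fact, rather than something you reconstruct from raw request logs.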
This is different from traditional systems observability. You can't just sample requests or trace execution. You need to understand the agent's decision process in sufficient detail that you can identify where it's going wrong without examining every single decision. You need to be able to say "agents in scenario X are making bad choices" and have enough trace data to understand why.
At scale, this creates immediate practical problems. Storing complete decision traces for millions of agents costs enormous amounts of storage. Searching through that data to find patterns is computationally expensive. You need to be selective about what you capture, when you capture it, and how you make it queryable. But you also can't be so selective that you miss the failures that matter.
The solution requires new primitives. You need structured decision logs that capture the essential information about each decision: inputs, the decision made, confidence levels, alternatives considered. You need sampling strategies that preserve rare events (the failures) while discarding routine decisions. You need analysis tooling that lets you ask questions like "show me all decisions where the agent had low confidence but proceeded anyway" or "show me cases where the agent's prediction diverged significantly from actual outcomes."
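One way to make those primitives concrete is a capture policy that always keeps anomalous decisions and samples routine ones, plus a query over the captured records. This is a sketch under assumptions: the thresholds, the `confidence`/`chosen_action` attributes, and the `"abstain"` sentinel are all illustrative, not values from the talk.

```python
import random

def should_capture(record, base_rate=0.01, low_conf_threshold=0.5):
    """Decide whether to persist a decision trace.

    Keeps every low-confidence decision (the rare events most likely
    to explain failures) and samples high-confidence routine ones at
    a low rate to control storage cost. Thresholds are illustrative.
    """
    if record.confidence < low_conf_threshold:
        return True
    return random.random() < base_rate

def low_confidence_but_acted(decision_log, threshold=0.5):
    """Example analysis query: decisions where the agent had low
    confidence but proceeded anyway instead of abstaining."""
    return [r for r in decision_log
            if r.confidence < threshold and r.chosen_action != "abstain"]
```

The asymmetry is the point: discarding 99% of routine decisions is usually safe, but discarding even a few of the rare low-confidence ones can hide exactly the failure pattern you need to find.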
You also need to think differently about alerting. Traditional alerting is threshold-based: if latency exceeds 100ms, alert. But agent failures aren't always obvious in latency. You might need to alert on decision patterns: if an agent's confidence consistently drops below a threshold, or if predicted outcomes diverge from actual outcomes by a certain margin.
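A pattern-based trigger like "confidence consistently drops below a threshold" can be sketched as a rolling-window check. The class name, window size, and threshold here are illustrative assumptions, not a production alerting design.

```python
from collections import deque

class ConfidenceAlert:
    """Fire when an agent's rolling mean confidence stays below a
    threshold -- a decision-pattern trigger, not a latency one.
    Window size and threshold are illustrative defaults."""

    def __init__(self, window=100, threshold=0.6):
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, confidence):
        """Record one decision's confidence; return True if the
        rolling mean over a full window is below the threshold."""
        self.window.append(confidence)
        if len(self.window) == self.window.maxlen:
            mean = sum(self.window) / len(self.window)
            return mean < self.threshold
        return False
```

The same shape works for the other pattern mentioned above: replace confidence with the gap between predicted and actual outcomes, and alert when the rolling divergence exceeds a margin.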
This matters now because production agent systems are becoming common. Teams are moving from research prototypes to systems that handle real work, real decisions, real consequences. Without good observability, they're flying blind — they won't see problems until users hit them. Building the observability infrastructure to understand agent behaviour is as important as building the agents themselves.
It's also becoming clear that agent behaviour is harder to predict and control at scale than traditional software. A system that works perfectly in testing can behave unexpectedly at scale when it hits edge cases it wasn't trained on or novel combinations of conditions. Good observability is how you catch those cases and understand what's actually happening.
The framework for thinking about this is shifting. Instead of "does this agent work?" the question becomes "can I understand what this agent is doing well enough to catch problems before they harm users?" That's an observability problem first, an engineering problem second.
Daniel Nadasi, Principal Engineer at Google, is presenting this talk at AI Engineer Melbourne 2026 on June 3-4.
