Fail Fast, Fix Faster: Why Faster AI Models Beat Smarter Ones Intuition says the smartest model should win. The model that reasons deeper, thinks longer, produces better results—that's the one you'd pick for an agentic coding loop. But intuition is wrong about this. In practice, a model that's 10x faster but only marginally competent often […]
Our AI Hallucinated in Production: How We Fixed It With Evals There's something particularly jarring about discovering that an AI system you deployed to production is making things up. Not failing gracefully, not returning errors, but confidently generating false information and presenting it as fact. This is what happened at REA Group, one of Australia's […]
Your AI Can't Engineer (Yet): Where AI Fails in Professional Contexts The demos are remarkable. An AI system accepts a brief specification and generates a detailed engineering design. It analyses complex problems and proposes solutions. The output looks professional and complete. But when actual engineers try to use these systems on real work, something critical […]
Orbital Lasers vs For Loops: Economically Matching Models to Tasks There's a persistent mythology in AI adoption: bigger is better. If GPT-4 is powerful, use GPT-4. If a model can handle complex reasoning, use it for everything. The assumption is that you're being conservative by using the most capable model available. You're minimising risk, guaranteeing […]
What Killed My Chat-as-a-Service? The Economics of AI Product Death A promising AI product launches to excitement and early adoption. The demo is impressive. Users sign up. Press coverage arrives. And then, quietly, the product fails—not due to technical limitations or bad marketing, but from economics that were never addressed in the initial business model. […]
Beyond Forgetful Bots: Architectural Patterns for Persistent, Proactive AI Agents Most AI agents in production are fundamentally stateless and reactive. They receive a request, process it, generate a response, and forget everything about the interaction. This architectural simplicity makes them easy to deploy and scale, but it also means they can never develop genuine understanding […]
Why Most AI De-Identification Fails in Production, And How We Built One Lawyers Actually Trust De-identifying text sounds simple when you're sitting in a demo. You've got a paragraph with personal information in it. You replace names with "[NAME]", phone numbers with "[PHONE]", dates with "[DATE]". The text is de-identified. Success. Show it to a […]
What We Learned Taking a Culture-First Approach to AI Adoption at Scale When a new technology arrives at an engineering organization, the typical response is predictable: roll out the tool, measure adoption, monitor code output. But what if the real story isn't about the tool at all — it's about how people actually work? Culture […]
AI Agents Are Distributed Systems: Applying Distributed Systems Thinking to Agent Engineering There's a curious blind spot in how many people approach AI agents: they think of them as monolithic systems. You give an agent a task, the agent processes it, the agent returns an answer. Simple cause and effect. Real AI agents are nothing […]
Before we begin with this week’s reading, some news about upcoming events and more from Web Directions. Or jump straight to this week’s reading! Project Noops Mark Pesce and I team up to parse the signals out of the AI transformation as it happens at Noops. Read more and sign up. AI Engineer Nights (Sydney […]
Edge AI with Direct Device Control: Moving Intelligence Off the Cloud We're in what might be called the timeshare mainframe moment of AI. Even the devices in our pockets — phones with powerful processors, cameras, microphones — still route most of their AI inference through the cloud. Your voice assistant sends audio to a data […]
Treating Infrastructure as Data: Building an AI-Native Control Plane The way we manage cloud infrastructure has been fundamentally static. You write Infrastructure-as-Code declarations, engineers review and approve them, and automated systems deploy them. The human remains in the decision-making loop. But what if AI agents could directly query, understand, and modify infrastructure the way they […]
AGENTS.md Is the Wrong Conversation The AI industry is in specification-mode. How should agents communicate? What should the protocol look like? How do you define a standard so that agents built by different organizations can interoperate? There are frameworks, working groups, proposals for standardization. The energy is palpable. Finally, we're going to solve the agent […]
Agent Observability: Monitoring and Understanding Agents at Internet Scale When you build software that runs on the scale Google operates at, the usual rules stop applying. You're not debugging a single request. You're managing millions of simultaneous decision-making processes, each one making autonomous choices, each one capable of cascading failures you won't see until they've […]
Legacy Software + Agentic Discovery Legacy codebases are nightmares in the truest sense. Thousands of lines of code written by people who've left. Business logic embedded in places it has no right to be. Dependencies no one fully understands. Documentation that's either nonexistent or spectacularly out of date. When you need to change something, you're […]