Stop vibing your agents to production: applying ML discipline to agent development
When I joined my current team, I found a familiar pattern: 6-8 experiments over a year, each taking 10-12 weeks, 60-70% of the time burned on infrastructure, one thing in production held together with duct tape, and our entire agent lifecycle dependent on what our cloud provider made available in our region. The fix wasn’t a new framework. It was an old playbook: ML engineering. Version artifacts like model checkpoints, define evaluators like loss functions, search hyperparameters systematically, and decouple your tooling from your cloud provider. The first experiment under this approach finished in 4 weeks, and other teams across the organisation started running their own experiments without us. In this talk, I’ll walk through the methodology and its key trade-offs, and demo HoloDeck, the open-source distillation of everything I learned.
Justin Barias
Justin Barias is a Lead Software & AI Engineer currently building experimentation and AIOps platforms for an Australian government department. Before that, he spent 6 years at Microsoft as a Senior Software Engineer, working across distributed systems, data engineering, and early custom Copilot development using Azure OpenAI, LangChain, and Semantic Kernel. His background in platform engineering has heavily shaped how he approaches AI engineering: the same principles that make platforms work (consistency, self-service, fast feedback loops) are what he’s been applying to agent development since GPT-3.5 Turbo. He recently built his first open-source tool, HoloDeck, a CLI toolkit for config-driven agent development, testing, and deployment. He writes about the realities of shipping agents at justinbarias.io.