Three Lanes Below One Millisecond: A Rust SDK for Gemini Live
Full-duplex voice agents punish the choices text-only agents forgive. A 50ms GC pause is a perceptible glitch. A blocked event loop is a barged-in user talking over your model. Google’s Python ADK is a beautiful kit for building agents — until audio frames arrive every 20 milliseconds and refuse to wait.
This talk walks through gemini-rs, an open-source Rust SDK that rebuilds Google’s Agent Development Kit and the Gemini Multimodal Live wire protocol from scratch, in three layered crates: a zero-copy WebSocket transport, an agent runtime with phase machines and typed prefix-scoped state, and a fluent builder where a production voice agent fits in twenty lines of declarative Rust.
The architectural keystone is a three-lane processor: a sync fast lane for audio and VAD under one millisecond, an async control lane for tool dispatch and phase transitions, and a debounced telemetry lane built on atomics. We’ll cover what broke in the naive translation, why Vertex AI sends binary frames where Google AI sends text, how to compose tool-using agents without rebuilding LangChain, and where the Python ADK still belongs in the stack.
Vamsi Ramakrishnan
Vamsi Ramakrishnan is the AI Lead Engineer for APAC at Google Cloud, where he leads the technical and customer engineering practice for AI across the region, helping organisations move from prototype to production with modern agent and LLM infrastructure. His work spans the full stack of applied AI — from agent execution frameworks like ADK, LangGraph and CrewAI, through to context engineering, evaluation, and the operational realities of running agents at scale.
Vamsi writes and speaks regularly about where the agent ecosystem is heading, drawing on lessons from compiler design, distributed systems, and even legal philosophy to think about how we architect AI systems that actually hold up in production. He brings a builder’s perspective to a field that often gets stuck at the demo stage.