Multi-Armed Bandits: The Scientific Shotgun for Evals
A/B testing is too rigid a tool for AI systems. For the length of the experiment you’re stuck serving worse results and getting billed for slower models, all while three providers ship SOTA updates in a single week.
Steal a trick from data science instead and use multi-armed bandits to organically surface the ideal models, prompting choices and harnesses. Your evals should be more than scores: make them an exploration in minimising regret.
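As a rough illustration of what "minimising regret" means, here is a minimal sketch in Python that assumes each eval case produces a simple pass/fail result. It uses Thompson sampling, which is only one bandit strategy among several (epsilon-greedy and UCB are common alternatives), so treat it as an assumption rather than the method the talk uses. The `BetaArm` and `thompson_sampling` helpers, the arm names and the pass rates are all hypothetical, and the "true" pass rates exist only to simulate outcomes.

```python
import random
from dataclasses import dataclass


@dataclass
class BetaArm:
    """One hypothetical candidate (a model, prompt or harness) with a Beta posterior."""
    name: str
    successes: int = 0  # eval cases this arm passed
    failures: int = 0   # eval cases this arm failed

    def sample(self, rng: random.Random) -> float:
        # Draw a plausible pass rate from Beta(1 + successes, 1 + failures):
        # a uniform prior updated with the results observed so far.
        return rng.betavariate(1 + self.successes, 1 + self.failures)

    @property
    def pulls(self) -> int:
        return self.successes + self.failures


def thompson_sampling(true_pass_rates, rounds, seed=0):
    """Simulate routing `rounds` eval cases across arms with Thompson sampling.

    `true_pass_rates` is a made-up ground truth used only to simulate pass/fail
    outcomes; a live system would record real eval results instead.
    Returns the arms and the pseudo-regret versus always picking the best arm.
    """
    rng = random.Random(seed)
    arms = [BetaArm(name) for name in true_pass_rates]
    best_rate = max(true_pass_rates.values())
    regret = 0.0
    for _ in range(rounds):
        # Each round, play the arm whose sampled pass rate is highest, so
        # uncertain arms still get explored while strong arms get exploited.
        arm = max(arms, key=lambda a: a.sample(rng))
        rate = true_pass_rates[arm.name]
        # Regret grows by how much worse the chosen arm is than the best one.
        regret += best_rate - rate
        if rng.random() < rate:
            arm.successes += 1
        else:
            arm.failures += 1
    return arms, regret


if __name__ == "__main__":
    arms, regret = thompson_sampling(
        {"model-a": 0.5, "model-b": 0.6, "model-c": 0.8}, rounds=2000
    )
    for arm in arms:
        print(arm.name, arm.pulls)
    print(f"pseudo-regret: {regret:.1f}")
```

Unlike a fixed 50/50 split, the sampler shifts traffic towards the strongest arm as evidence accumulates, so fewer eval cases are spent on configurations that are already clearly worse.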
Ron Au
Ron Au is a creative engineer at Canva who moonlights as a data scientist, and he has always waxed lyrical about the delightful parts of tech with an equal measure of good humour. Tinkering since the HTML 4.0 days, he has kept in lockstep with production AI through roles at Leonardo.Ai, Hugging Face and Relevance AI.