The Application Layer Is the New Research Lab
In the pre-genAI era, vertical product teams handed insights to a separate R&D group, who shipped a new model two quarters later. That handoff is now a bug. Agentic systems are built from dozens of model calls, judges, tools, and harness decisions, and every one of those is a hyperparameter. The product surface and the training surface are the same surface. This talk argues that every vertical AI company is now its own applied research lab. I walk through what that function actually ships (custom judges, scenario benchmarks, data flywheels, harness tuning), where the thesis breaks (most domains are not Cursor), and how to staff for it without losing engineering velocity.
Abdul Karim
Abdul Karim is an ML researcher and engineer with around a decade in applied AI.
His work spans LLM evaluation, agentic systems, RL training, and the data-flywheel side of vertical AI products.