Avni Bhatt

Principal Architect

When a Small Language Model Beat Our LLM in Production

Large language models are often the default choice for production AI systems, even when the task does not require broad reasoning or generative depth. In this talk, I will share a real production case where an LLM-based solution underperformed on latency, cost, and reliability, and was ultimately replaced, in part, by a small language model (SLM).

The system in question supported a high-volume enterprise workflow involving structured extraction, classification, and validation. While the initial LLM implementation performed well in early prototypes, production usage exposed several issues: inconsistent outputs, escalating inference costs, and difficulty enforcing deterministic behaviour. These problems became more pronounced at scale.

I will walk through the decision process that led us to introduce an SLM, the architectural changes required, and the criteria we used to evaluate success. The talk will cover where the SLM outperformed the LLM, where it clearly did not, and how we designed a hybrid pattern that escalates to an LLM only when necessary.
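To make the idea of a hybrid that "escalates to an LLM only when necessary" concrete, here is a minimal sketch of one common way such routing can work. It is an illustration under stated assumptions, not the system described in the talk: it assumes each model is wrapped as a function returning a label and a confidence score, and that escalation is triggered either by low SLM confidence or by an output that fails a deterministic validity check. The names `route`, `ALLOWED_LABELS`, the 0.85 threshold, and the stub models are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical model interface: takes request text, returns (label, confidence).
ModelFn = Callable[[str], Tuple[str, float]]

# Hypothetical label set used as a deterministic validation rule.
ALLOWED_LABELS = {"invoice", "receipt", "contract"}


@dataclass
class Result:
    label: str
    confidence: float
    source: str  # which tier produced the answer: "slm" or "llm"


def is_valid(label: str) -> bool:
    # Deterministic check applied to both tiers, independent of model output style.
    return label in ALLOWED_LABELS


def route(text: str, slm: ModelFn, llm: ModelFn, threshold: float = 0.85) -> Result:
    # Try the small model first; accept its answer only if valid and confident.
    label, conf = slm(text)
    if is_valid(label) and conf >= threshold:
        return Result(label, conf, "slm")
    # Otherwise escalate to the large model and validate its answer too.
    label, conf = llm(text)
    if not is_valid(label):
        raise ValueError(f"LLM returned invalid label: {label!r}")
    return Result(label, conf, "llm")


# Hypothetical stub models, for illustration only.
def fake_slm(text: str) -> Tuple[str, float]:
    return ("invoice", 0.95) if "invoice" in text else ("unknown", 0.4)


def fake_llm(text: str) -> Tuple[str, float]:
    return ("contract", 0.9)


if __name__ == "__main__":
    print(route("invoice #123", fake_slm, fake_llm))
    print(route("master services agreement", fake_slm, fake_llm))
```

Keeping the validation rule outside both models means the same deterministic gate applies regardless of which tier answers; in a setup like this, `threshold` is the main knob trading escalation rate (and therefore cost and latency) against accuracy.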

The session includes a live demo showing the before-and-after behaviour of the system, along with production metrics such as latency, cost per request, and error rates. I will also discuss failure modes we encountered, trade-offs we accepted, and the signals that helped us decide early whether an SLM was a viable replacement.
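As a rough way to reason about cost per request and latency in an SLM-first hybrid, consider a simple model that is an assumption rather than the talk's own analysis: every request goes to the SLM first, a fraction $p$ is escalated sequentially to the LLM, and $c_S, c_L$ and $t_S, t_L$ are the per-request costs and latencies of the SLM and LLM respectively.

```latex
\begin{aligned}
\mathbb{E}[C_{\text{hybrid}}] &= c_S + p\,c_L, &
\mathbb{E}[C_{\text{hybrid}}] < c_L &\iff p < 1 - \frac{c_S}{c_L},\\
\mathbb{E}[T_{\text{hybrid}}] &= t_S + p\,t_L, &
T_{\text{escalated}} &= t_S + t_L > t_L .
\end{aligned}
```

For example, with hypothetical figures where the SLM costs a tenth of the LLM ($c_S = 0.1\,c_L$), the hybrid is cheaper on average whenever fewer than 90% of requests escalate. The last term shows the trade-off under this model: escalated requests pay both latencies, so tail latency can be worse than an LLM-only path even when mean latency improves.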

My aim is not to advocate for SLMs over LLMs in general, but to share the signals, metrics, and decision criteria that helped us choose the right tool for the job. I believe this perspective is timely as more teams move beyond experimentation into sustained production usage.

Avni Bhatt

My role spans the full spectrum, including enterprise architecture, data architecture, solution and integration design, technical delivery management, and building high-performing teams from the ground up. I am energized by opportunities where innovation meets scale and where the right architecture is treated as an investment. https://thefinallyblock.com/