Deploying AI at the Edge: Model Compression and Hardware-Aware Optimization
Large AI models often struggle to meet the latency, memory, and power constraints of real-world edge deployments. This talk explores practical techniques for making modern AI models efficient enough to run on-device, using model distillation, quantization, and hardware-aware optimization. Attendees will learn how to reduce model size and inference cost while preserving accuracy, covering approaches such as post-training quantization and runtime optimization across popular AI frameworks and hardware accelerators. The session will also highlight the real-world tradeoffs between performance, memory footprint, and power efficiency when deploying AI applications on edge devices.
Shivay Lamba
Shivay Lamba is a software engineer and open source contributor passionate about AI and edge computing. With experience across startups and enterprise tech, he focuses on simplifying complex technologies for developers. Shivay speaks regularly at global conferences, organizes community events, and contributes to projects in the cloud-native, WebAssembly, and machine learning ecosystems. He has also built and mentored educational programs that help bridge gaps in emerging-tech adoption. When not coding, he enjoys writing, traveling, and mentoring young technologists.