Your AI Can’t Engineer (Yet)
Large language models excel at code, but engineering isn’t just code. When you ask an AI to calculate short-circuit currents per IEC 60909 or size a pavement per Austroads 2022, you’re asking it to operate outside its training distribution. The result: confident answers that botch unit conversions, ignore standard-specific constraints, and fall for the “gotchas” that trip up junior engineers.
At Aurecon, a multinational engineering consultancy, we found that two-thirds of project rework stems from controllable errors: dimensional mistakes, specification mismatches, and standards-compliance failures. These are exactly the errors AI should catch. But how do you know whether your AI assistant is actually reliable on engineering tasks?
This talk introduces aecbench, an open benchmark suite born from Aurecon’s quality engineering practice. Spanning 12 engineering disciplines, including electrical, civil, structural, and geotechnical, it maps the capabilities AI must demonstrate: deterministic calculations that comply with standards, mixed problems that require judgment, and verification workflows that catch errors before they become rework.
But benchmarks aren’t just for measurement. Each task is also a learning environment: a structured space where agents learn what “correct” means in engineering. Deterministic tasks provide dense reward signals. Complexity tiers enable curriculum learning. “Gotchas” become adversarial scenarios that force understanding over pattern matching.
I’ll present results comparing frontier models, custom agentic harnesses, and early RL fine-tuning experiments on real engineering tasks, and explain how the community can contribute challenges to the open benchmark and run agents on the private leaderboard.
Theodoros Galanos
Theodoros Galanos is the Generative AI Leader at Aurecon, where he leads AI engineering and the development of AI solutions to complex engineering problems. He contributes to initiatives such as Aurecon’s strategic partnership with Nomic to accelerate complex problem‑solving across the asset lifecycle. He is also the Chief Science Officer and Co‑Founder of infrared city, where he leads the development of AI‑powered environmental simulation tools that make advanced environmental analysis accessible to designers and engineers. His background spans research roles at the Austrian Institute of Technology, the creation of Archi|text, the first LLM-driven approach to architectural design, and published work in computational design and machine learning.