AI and Software Engineering, Part I: The impact of generative AI on software development
Software development is one of the areas where large language models have received the most focus and, arguably, seen the best outcomes.
Numerous startups and established players in the IDE market now offer generative AI tools. OpenAI recently launched its ‘canvas’ UI, which is similar to Claude’s UI for working with code. Code-focused models like Meta’s Code Llama and Mistral’s Codestral show that specialized models can outperform general-purpose ones. A recent survey, evaluating dozens of models from the past year, highlights just how active this field is.
There are also tools like v0 from Vercel, which focuses on producing React-based code, and CodeWP, a specialized development environment for WordPress. It’s likely we’ll see many more such tools, focused on specific languages and environments.
Is It Worth All the Effort?
These tools are still in their infancy, and evidence of their effectiveness is largely anecdotal. However, some large-scale reports have highlighted their benefits.
Amazon has reported saving thousands of developer-years and tens of millions of dollars annually in one very specific use case: upgrading the version of Java used at AWS.
Developer surveys (to be taken with a grain of salt for a number of reasons, including, as in this case, who conducted them) suggest developers at least feel more productive. For example, GitHub (OK, not the most unbiased source) reported productivity and quality improvements with GitHub Copilot, such as faster code completion and faster merge times. GitHub writes:
Our research found that the quality of the code authored and reviewed was better across the board with GitHub Copilot Chat enabled, even though none of the developers had used the feature before.
- 85% of developers felt more confident in their code quality when authoring code with GitHub Copilot and GitHub Copilot Chat.
- Code reviews were more actionable and completed 15% faster with GitHub Copilot Chat.
- 88% of developers reported maintaining flow state with GitHub Copilot Chat because they felt more focused, less frustrated, and enjoyed coding more, too.
There have been more skeptical results as well. One report describes “Downward Pressure on Code Quality”, and, as was widely covered recently, Uplevel Data Labs found a 41% increase in bug rate and little gain in productivity as measured by PR throughput.
The vibe check
At such an early stage in the adoption of these technologies, mixed results are to be expected. The models, tools, and modes of use are still very new. Perhaps there’s no real long-term advantage, and we’re simply experiencing a short-term surge of easily generated code; the volume of code produced has long been known not to necessarily correlate with its quality.
The first-hand experiences of respected software engineers like Simon Willison give me a sense that there’s something important here to pursue. Simon, a co-creator of the Django web framework and a Board Member of the Python Software Foundation, is a prominent voice in AI Engineering. In a recent interview on TWIML, Simon discussed his use of LLMs for code, describing two distinct modes: exploratory, for quick prototyping, and production, for high-quality code.
The first is exploratory mode, mainly for quick prototyping—sometimes in programming languages I am unfamiliar with.
The other side is when I’m writing production code, code that I intend to ship, then it’s much more like I’m treating it basically as an intern who’s faster at typing than I am.
That’s when I’ll say things like, “Write me a function that takes this and this and returns exactly that.”
I strongly recommend either listening to the whole interview or at least reading Simon’s write-up.
Ultimately, I recommend exploration. Engage with as many tools as you can to develop intuitions about what these technologies do well and where they falter. How can they help you improve? It’s not just about generating code. For instance, if you’re unsure how some code works, ask the model to explain it. If you’re curious about strategies for solving a particular problem, ask for suggestions and test your current ideas against them. These tools are also being used to write unit tests and conduct code reviews.
Looking for inspiration?
At our Summit last year, Lachlan Hardy explored how he uses AI as a software engineer. You can watch his talk for free on Conffab, with no signup required.
And at our Dev Summit in late November, we have several sessions that will be valuable for all software engineers:
- Phil Nash explores how to work with generative AI as a JavaScript developer.
- Shivay Lamba covers semantic and vector search for front-end and product engineers.
- Mat Colman talks about building an AI team when no one knows anything about AI.
- Jason Mayes gets into details on AI in the browser and web apps of the future with Web AI.
These technologies are already transforming how we work as software engineers, and this transformation is only beginning.
In Part II of this series, we’ll look at how they are transforming what we build and the nature of web applications.