Now problems vs. forever problems
When I hear criticism of or scepticism about AI code generation, in my conversations with often quite experienced software engineers, it usually goes something like this: “I tried it and it didn’t work” or “the code quality was terrible.” Fair enough. But then I ask: when was that?
Six months ago. A year ago. Sometimes longer.
There’s an important distinction we need to make when evaluating any rapidly evolving technology: the difference between now problems and forever problems. Problems that will just never go away.
A now problem is a limitation that exists today but is likely to improve—often dramatically—with time. A forever problem is foundational to the thing itself. It won’t go away no matter how much progress we make.
Code quality from large language models? That’s a now problem. The models from six months ago are not the models of today. The models of today won’t be the models of six months hence. Anyone who’s been working with these tools for a year or more has an instinctive feel for this. They’ve watched Claude or GPT go from producing code that needed heavy editing to code that more often than not runs correctly on the first try. They’ve seen context windows expand from thousands of tokens to hundreds of thousands. They’ve experienced the shift from “interesting toy” to “genuine productivity multiplier.”
Andrej Karpathy recently observed that he now produces perhaps 20% of his code himself, with the other 80% produced by LLMs.
But if you tried these tools once, hit a wall, and walked away? You’re evaluating something at a particular moment in time and generalising to how it will always be. Meanwhile, weeks or perhaps a couple of months later, the limitation you ran into no longer exists.
This doesn’t mean every criticism is invalid. Some problems are fundamental, or at least far more persistent than the quality-of-output concerns. The imprecision of human language when specifying what we want? That’s arguably closer to fundamental. The challenge of maintaining large codebases when the AI doesn’t have full context? Harder to solve (though context windows continue to grow, and newer techniques mean that the quadratic scaling cost of context, which looked like a hard limit a year or so ago, no longer seems to be one). The question of trust and verification when you can’t review every line (because there are too many lines for a human to review in a reasonable time frame)? That’s a genuine tension that won’t simply disappear with better models.
But conflating now problems with forever problems leads to bad decisions. It leads people to dismiss technologies that could genuinely help them. It leads to the peculiar situation where the people most sceptical of AI coding tools are often those with the least recent experience using them. It’s a self-reinforcing loop: scepticism keeps them from trying the tools again, and not trying them keeps the scepticism intact.
The conversation I keep having goes like this: someone expresses deep reservations about AI-generated code, I ask about their experience, they describe something from months ago, and when I show them what the current models can do, they’re genuinely surprised.
The technology moved. Their mental model didn’t.
I’m not suggesting we should be uncritical enthusiasts. Scepticism is healthy. But scepticism based on outdated experience isn’t scepticism; it’s nostalgia for a problem that may already have been solved.
If you tried AI code generation and found it wanting, maybe it’s time to try again.
Actually, scratch that: it is definitely time. You’re not just trying it again; you’re recognising that this is what software engineering actually looks like in 2026.