The AI code quality gap, in numbers
New research quantifies the AI code quality gap, revealing more defects, higher severity issues, and security risks compared with human-written code.
“The risk that keeps me up at night is trust in machine decision-making.”
That’s what Nikk Gilbert (CISO at RWE) told us when we asked him what concerns him the most about the future of cybersecurity. And it’s a single sentence that becomes more and more relevant as time goes on – because we’re handing authority to AI systems faster than we can test their limits.
As organisations continue to leverage AI to write new code, trust in that code could be dangerously misplaced.
A new report from Veracode, on the security of GenAI code, offers a clear snapshot of how today’s LLMs behave when you ask them to write production-like code. One stat jumps off the page: across all languages and tasks, models chose secure implementations only 55% of the time – meaning 45% of AI-generated code introduced a known security flaw.
But even that average hides sharp contrasts. Java performs the worst by a wide margin, managing only 28.5% secure code, far behind Python (61.7%), JavaScript (57.3%) and C# (55.3%).
And while models handle SQL injection (CWE-89) and weak cryptography (CWE-327) relatively well (80.4% and 85.6% success respectively), their performance collapses on cross-site scripting and log injection, where pass rates fall to 13.5% and 12%. These are not, by any measure, obscure vulnerabilities; they’re among the oldest and most common ways attackers compromise systems.
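To make one of those weak spots concrete, here is a minimal Java sketch of the log injection pattern (CWE-117). The class, method names and input below are our own illustration, not code from the Veracode report; they simply show how an unsanitised value reaching a log sink lets an attacker forge entries, and how little it takes to neutralise it.

```java
import java.util.logging.Logger;

public class LoginAudit {
    private static final Logger LOG = Logger.getLogger(LoginAudit.class.getName());

    // Vulnerable pattern (CWE-117): attacker-controlled input is written to the
    // log verbatim, so a username containing "\n" can forge an extra log entry.
    static void logLoginUnsafe(String username) {
        LOG.info("Login attempt for user: " + username);
    }

    // Mitigated pattern: strip line breaks before the value reaches the log sink.
    static void logLoginSafe(String username) {
        String sanitized = username.replaceAll("[\\r\\n]", "_");
        LOG.info("Login attempt for user: " + sanitized);
    }

    public static void main(String[] args) {
        String attackerInput = "alice\nLogin attempt for user: admin";
        logLoginUnsafe(attackerInput); // produces a forged-looking second line
        logLoginSafe(attackerInput);   // logs a single, neutralised line
    }
}
```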
The explanation lies deep in how these models are trained. As Veracode notes, LLMs aren’t equipped to infer which variables require sanitisation, because that depends on the surrounding application context – and no model today has enough architectural understanding to replicate the data flow reasoning of a static analysis engine.
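A small, hypothetical example of that context problem (the class below is ours, not the report's): whether the first method is safe depends entirely on what its callers do, which a model generating the method in isolation cannot see, but which a data-flow-aware static analyser can check.

```java
public class CommentRenderer {

    // Generated in isolation, this is only safe if every caller has already
    // HTML-escaped 'comment'. That caller behaviour is exactly the surrounding
    // context an LLM does not have when it writes this method.
    public static String renderComment(String comment) {
        return "<p>" + comment + "</p>";
    }

    // Escaping at the sink removes the dependence on caller behaviour.
    public static String renderCommentEscaped(String comment) {
        String escaped = comment
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
        return "<p>" + escaped + "</p>";
    }

    public static void main(String[] args) {
        String raw = "<script>alert('xss')</script>";
        System.out.println(renderComment(raw));        // reflected XSS if this reaches a browser
        System.out.println(renderCommentEscaped(raw)); // inert markup
    }
}
```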
One of the findings that stood out to us is what hasn’t changed. Because while LLMs have become dramatically better at producing syntactically correct, executable code, security performance has remained pretty much flat over the past two years.
Even model size barely moves the needle. Larger models don’t produce meaningfully safer code; in fact, the report finds that size has “only a very small effect” on security, and even that advantage disappears in newer generations.
So newer models may produce tidier outputs – but they’re not any better at avoiding dangerous patterns. Gilbert warned that “there’s no safety net when decisions outpace human reaction time,” and Veracode’s findings suggest that organisations are already operating without one.
Code quality is only one piece of the complex code safety puzzle. Research from Gartner, published in November 2025, focuses on another (more human) problem taking root: shadow AI. We’ve written about it recently on the blog, and this new survey of 302 cybersecurity leaders shows how widespread it already is: 69% said they suspect, or have evidence, that employees are using unauthorised GenAI tools.
The consequences of this are looming. Gartner predicts that by 2030, more than 40% of enterprises will suffer a security or compliance incident linked to shadow AI.
Many of those predicted incidents will likely be linked to uncontrolled code, dependencies, decisions and artefacts created outside organisational oversight. Just as shadow IT reshaped governance a decade ago, shadow AI is reshaping it again – but faster.
Gartner highlights a second blind spot with long-term consequences: AI technical debt. As organisations increasingly rely on AI to generate code, content and design assets, few are tracking these outputs as maintainable objects. Yet by 2030, Gartner expects half of enterprises to face delayed upgrades or rising maintenance burdens because of unmanaged GenAI artefacts.
Add to that four more blind spots – sovereignty, skills erosion, vendor lock-in and interoperability gaps – and the pattern doesn’t look positive. As they adopt AI, organisations are absorbing hidden liabilities they may not have the capability (or the budget) to unwind.
The thing that ties Veracode and Gartner’s findings together is a failure of pace. Because AI accelerates everything: delivery, iteration, experimentation. What it doesn’t accelerate is security review, governance, or the careful architectural thinking required to decide whether code should exist at all.
Developers reach for AI because it saves time, and teams integrate it because it boosts output. Boards approve it because competitors are doing the same. But in many organisations, no one is asking the slower questions: is this code secure, who is accountable for maintaining it, and should it exist at all?
If you avoid these questions, you’re setting yourself up for the exact scenario Gilbert was worried about: “By the time we realise something has gone wrong, the damage will already be done.”
There’s no mystery about what resilience looks like. Gartner urges CIOs to publish clear AI use policies, audit regularly for shadow AI and track GenAI artefacts as first-class assets on IT dashboards. Veracode points to treating AI-generated code like any untrusted third-party contribution: analyse it with SAST, review it, run dependency scans and never assume safety by default.
Equally important is killing the habit of ‘vibe coding’: handing an LLM a loose description and hoping it chooses a secure pattern. Security requirements have to be embedded into prompts, workflows and CI/CD stages – because AI doesn’t default to safe behaviour unless you tell it to.
And despite the automation, human expertise remains critical. LLMs are powerful pattern machines, but they don't understand architecture, context or intent. Developers and security engineers do.