The AI code quality gap, in numbers
New research quantifies the AI code quality gap, revealing more defects, higher severity issues, and security risks compared with human-written code.
“The risk that keeps me up at night is trust in machine decision-making.”
That’s what Nikk Gilbert (CISO at RWE) told us when we asked him what concerns him the most about the future of cybersecurity. And it’s a single sentence that becomes more and more relevant as time goes on – because we’re handing authority to AI systems faster than we can test their limits.
As organisations continue to leverage AI to write new code, trust in that code could be dangerously misplaced.
A new report from Veracode, on the security of GenAI code, offers a clear snapshot of how today’s LLMs behave when you ask them to write production-like code. One stat jumps off the page: across all languages and tasks, models chose secure implementations only 55% of the time – meaning 45% of AI-generated code introduced a known security flaw.
But even that average hides sharp contrasts. Java performs the worst by a wide margin, managing only 28.5% secure code, far behind Python (61.7%), JavaScript (57.3%) and C# (55.3%).
And while models handle SQL injection (CWE-89) and weak cryptography (CWE-327) relatively well (80.4% and 85.6% success respectively), their performance collapses on cross-site scripting and log injection, where pass rates fall to 13.5% and 12%. These are not, by any measure, obscure vulnerabilities; they’re among the oldest and most common ways attackers compromise systems.
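To make one of those weak spots concrete, here is a minimal Java sketch of the log injection pattern (CWE-117). The class, method names and input below are our own illustration, not code from the Veracode report; they simply show how an unsanitised value reaching a log sink lets an attacker forge entries, and how little it takes to neutralise it.

```java
import java.util.logging.Logger;

public class LoginAudit {
    private static final Logger LOG = Logger.getLogger(LoginAudit.class.getName());

    // Vulnerable pattern (CWE-117): attacker-controlled input is written to the
    // log verbatim, so a username containing "\n" can forge an extra log entry.
    static void logLoginUnsafe(String username) {
        LOG.info("Login attempt for user: " + username);
    }

    // Mitigated pattern: strip line breaks before the value reaches the log sink.
    static void logLoginSafe(String username) {
        String sanitized = username.replaceAll("[\\r\\n]", "_");
        LOG.info("Login attempt for user: " + sanitized);
    }

    public static void main(String[] args) {
        String attackerInput = "alice\nLogin attempt for user: admin";
        logLoginUnsafe(attackerInput); // produces a forged-looking second line
        logLoginSafe(attackerInput);   // logs a single, neutralised line
    }
}
```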
The explanation lies deep in how these models are trained. As Veracode notes, LLMs aren’t equipped to infer which variables require sanitisation, because that depends on the surrounding application context – and no model today has enough architectural understanding to replicate the data flow reasoning of a static analysis engine.
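A small, hypothetical example of that context problem (the class below is ours, not the report's): whether the first method is safe depends entirely on what its callers do, which a model generating the method in isolation cannot see, but which a data-flow-aware static analyser can check.

```java
public class CommentRenderer {

    // Generated in isolation, this is only safe if every caller has already
    // HTML-escaped 'comment'. That caller behaviour is exactly the surrounding
    // context an LLM does not have when it writes this method.
    public static String renderComment(String comment) {
        return "<p>" + comment + "</p>";
    }

    // Escaping at the sink removes the dependence on caller behaviour.
    public static String renderCommentEscaped(String comment) {
        String escaped = comment
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
        return "<p>" + escaped + "</p>";
    }

    public static void main(String[] args) {
        String raw = "<script>alert('xss')</script>";
        System.out.println(renderComment(raw));        // reflected XSS if this reaches a browser
        System.out.println(renderCommentEscaped(raw)); // inert markup
    }
}
```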
One of the findings that stood out to us is what hasn’t changed. Because while LLMs have become dramatically better at producing syntactically correct, executable code, security performance has remained pretty much flat over the past two years.
Even model size barely moves the needle. Larger models don’t produce meaningfully safer code; in fact, the report finds that size has “only a very small effect” on security, and even that advantage disappears in newer generations.
So newer models may produce tidier outputs – but they’re not any better at avoiding dangerous patterns. Gilbert warned that “there’s no safety net when decisions outpace human reaction time,” and Veracode’s findings suggest that organisations are already operating without one.
Code quality is only one piece of the complex code safety puzzle. Research from Gartner, published in November 2025, focuses on another (more human) problem taking root: shadow AI. We’ve written about it recently on the blog, and this new survey of 302 cybersecurity leaders shows how widespread it already is: 69% said they suspect, or have evidence, that employees are using unauthorised GenAI tools.
The consequences of this are looming. Gartner predicts that by 2030, more than 40% of enterprises will suffer a security or compliance incident linked to shadow AI.
Many of those predicted incidents will likely be linked to uncontrolled code, dependencies, decisions and artefacts created outside organisational oversight. Just as shadow IT reshaped governance a decade ago, shadow AI is reshaping it again – but faster.
Gartner highlights a second blind spot with long-term consequences: AI technical debt. As organisations increasingly rely on AI to generate code, content and design assets, few are tracking these outputs as maintainable objects. Yet by 2030, Gartner expects half of enterprises to face delayed upgrades or rising maintenance burdens because of unmanaged GenAI artefacts.
Add to that four more blind spots – sovereignty, skills erosion, vendor lock-in and interoperability gaps – and the pattern doesn’t look positive. As they adopt AI, organisations are absorbing hidden liabilities they may not have the capability (or the budget) to unwind.
The thing that ties Veracode and Gartner’s findings together is a failure of pace. Because AI accelerates everything: delivery, iteration, experimentation. What it doesn’t accelerate is security review, governance, or the careful architectural thinking required to decide whether code should exist at all.
Developers reach for AI because it saves time, and teams integrate it because it boosts output. Boards approve it because competitors are doing the same. But in many organisations, no one is asking the slower questions: is this code secure, who is accountable for maintaining it, and should it exist at all?
If you avoid these questions, you’re setting yourself up for the exact scenario Gilbert was worried about: “By the time we realise something has gone wrong, the damage will already be done.”
There’s no mystery about what resilience looks like. Gartner urges CIOs to publish clear AI use policies, audit regularly for shadow AI and track GenAI artefacts as first-class assets on IT dashboards. Veracode points to treating AI-generated code like any untrusted third-party contribution: analyse it with SAST, review it, run dependency scans and never assume safety by default.
Equally important is killing the habit of ‘vibe coding’: handing an LLM a loose description and hoping it chooses a secure pattern. Security requirements have to be embedded into prompts, workflows and CI/CD stages – because AI doesn’t default to safe behaviour unless you tell it to.
And despite the automation, human expertise remains critical. LLMs are powerful pattern machines, but they don't understand architecture, context or intent. Developers and security engineers do.