Last month, we wrote about the rise of AI-generated code – and why it’s critical that human developers remain responsible for outputs, even when they leverage AI tools. Now, a new report from CodeRabbit (which, worth noting, makes an AI-powered code review tool – useful context for reading its research) lays the risks out in numbers.
The researchers analysed 470 real GitHub open-source pull requests (PRs), comparing 320 labelled as AI co-authored with 150 treated as human-only, and normalised the findings as issue rates per 100 PRs. The average AI PR produced 10.83 findings vs 6.45 for human PRs – about 1.7× as many issues.
This shows that any organisation using AI to write code needs to make sure it has the capacity to review the output.
Stepping away from the averages for a moment, the report shows AI PRs have a much heavier tail. At the 90th percentile*, AI PRs hit 26 issues vs 12.3 for humans; by the 95th percentile, it’s 39.2 vs 22.65.
*(Here, the ‘90th percentile’ is the issue count that 90% of pull requests fall below – in other words, the threshold at which a PR lands in the worst 10%.)
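To make that concrete, here’s a minimal sketch (with made-up issue counts, not the report’s data) showing how a 90th-percentile figure is read off a batch of PRs:

```python
import statistics

# Hypothetical issue counts per PR (illustrative only, not the report's data).
issues_per_pr = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 9, 12, 18, 26, 31]

# statistics.quantiles with n=10 returns nine cut points; the last one is the
# 90th percentile: roughly 90% of PRs sit below it, the worst 10% sit above it.
p90 = statistics.quantiles(issues_per_pr, n=10)[-1]
print(f"90th percentile issue count: {p90}")
```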
In practice, that means more PRs that stall pipelines, burn reviewer attention, and increase the chance something serious slips through simply because everyone’s skimming.
More findings would be manageable if they were mostly cosmetic. But unfortunately, they aren’t.
When normalised per 100 PRs, critical issues rise from 240 (human) to 341 (AI) – about 1.4× the human rate. Major issues jump from 257 to 447 – roughly 1.7×. This suggests AI isn’t just adding noise; it’s increasing the count of defects with real production blast radius.
Top-level category comparisons put logic and correctness at the centre of the gap: 570 findings per 100 AI PRs vs 326 for humans (1.75×).
If we dig deeper, the pattern becomes more actionable. Algorithm and business-logic mistakes appear 194.28 times per 100 AI PRs vs 86 for humans (2.25× the human rate). Error and exception-handling gaps nearly double (70.37 vs 36; 1.97×). Concurrency control and null-pointer risks also rise sharply.
These are exactly the defects that tend to evade superficial review: the code ‘looks right’, compiles, and even passes happy-path tests – until it hits the edge case that was never modelled.
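As a purely illustrative sketch (our own example, not code from the report), here is the shape of defect we mean. It compiles, reads plausibly, and passes the obvious test:

```python
def average_order_value(order_totals: list[float]) -> float:
    """Looks right and passes any test that uses a non-empty list."""
    return sum(order_totals) / len(order_totals)  # ZeroDivisionError on an empty list

def average_order_value_safe(order_totals: list[float]) -> float:
    """The same logic with the never-modelled edge case handled explicitly."""
    if not order_totals:
        return 0.0
    return sum(order_totals) / len(order_totals)
```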
Security findings are also higher: 94 vs 60 per 100 PRs (1.57×). The standout is improper password handling (hardcoded credentials, unsafe hashing, ad-hoc auth logic): 65.99 vs 35 (1.88×).
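To show what that category covers, here’s a hedged, standard-library-only sketch of our own (not code from the analysed PRs): the risky default first, then a safer one.

```python
import hashlib
import hmac
import os

# The risky pattern: a hardcoded credential checked with a fast, unsalted hash
# (illustrative only).
ADMIN_PASSWORD = "hunter2"  # hardcoded credential

def check_password_unsafe(candidate: str) -> bool:
    return hashlib.md5(candidate.encode()).hexdigest() == hashlib.md5(ADMIN_PASSWORD.encode()).hexdigest()

# A safer default: a random salt plus a slow key-derivation function, storing
# the salt and digest rather than the password itself.
def hash_password(password: str, salt: bytes | None = None) -> tuple[bytes, bytes]:
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest

def check_password(candidate: str, salt: bytes, stored_digest: bytes) -> bool:
    candidate_digest = hashlib.pbkdf2_hmac("sha256", candidate.encode(), salt, 600_000)
    return hmac.compare_digest(candidate_digest, stored_digest)
```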
And the way the report frames this is worth taking seriously: these aren’t exotic AI-only flaws. They’re foundational mistakes that are appearing more often.
Not everything gets worse. Humans showed more spelling errors and slightly more testability issues in this dataset, which we think is a helpful reminder that ‘human-only’ isn’t a quality guarantee – it’s just a different risk profile.
If you’re adopting AI coding tools, the report suggests your review discipline needs to evolve: focus hard on domain logic, error paths, dependency ordering, concurrency, and credential-handling defaults – and assume you’ll see more spiky PRs that require deeper scrutiny.