The bottleneck just moved. AI adoption is a reading problem, not a writing problem.
PR review time is up 91% in high-adoption engineering teams. We solved the writing problem and created a reading problem. Here's what to do about it.
For the last two years, every conversation about AI in engineering has been about output. How fast it can write code. How many tickets it can close. How much of a pull request a coding agent can draft before a human ever sees it.
But the teams furthest along in adoption are quietly running into a different problem, and the data is finally catching up to what reviewers have been feeling in their bones.
The numbers tell a different story
Faros AI's 2025 telemetry analysis of more than 10,000 developers across 1,255 teams put a real number on the productivity paradox engineering leaders have been wrestling with: developers using AI complete 21% more tasks and merge 98% more pull requests, but PR review time has climbed 91% in the same window. The writing got faster. The reading didn't. (Source: Faros AI)
And it's getting worse, not better. In Faros's follow-up 2026 dataset across 22,000 developers, median time in PR review has now climbed 441%, with 31% more production incidents tied to those changes. The original 91% figure wasn't an outlier. It was a leading indicator.
The 2025 DORA Report from Google Cloud, drawn from a global survey of nearly 5,000 technology professionals, tells a complementary story. AI adoption is now near-universal, with 90% of respondents using AI at work and over 80% reporting productivity gains. But DORA's researchers also found that higher AI adoption correlates with increased software delivery instability. The systems shipping more code are also breaking more often. (Google Cloud)
AI is a mirror and a multiplier. It magnifies whatever review and quality systems you already had. If those systems were strong, AI makes you faster. If they were already strained, AI doesn't break them. It just makes the cracks impossible to ignore.
Why this shouldn't surprise anyone
Reading and evaluating code has always been the hard part. Writing a reasonable-looking diff is a mechanical exercise. Understanding whether that diff is correct, safe, aligned with the direction of the system, and worth the maintenance burden it introduces is judgment work, and judgment doesn't accelerate on the same curve that generation does.
The data backs this up in detail. CodeRabbit's December 2025 study of 470 GitHub pull requests found that AI co-authored PRs produced about 75% more logic and correctness findings overall, with algorithm and business logic errors appearing more than twice as often in AI-generated changes. Error and exception handling gaps nearly doubled. Readability issues showed up more than three times as often. Security findings rose roughly 1.5x across the board. (Help Net Security)
A 2025 study cited by LogRocket found senior engineers now spend an average of 4.3 minutes reviewing AI-generated suggestions, compared to 1.2 minutes for human-written code. That's roughly 3.5x the review time per change. (LogRocket)
None of this means AI-generated code is "bad." It means AI-generated code is different. It tends to be longer, more elaborate, more confidently written, and structured in patterns that a human reviewer didn't choose. That combination of polish and unfamiliarity is exactly what slows down review. The signal-to-noise ratio shifts. You can't skim it the way you'd skim a colleague's diff because you don't have a mental model of how the author thinks.
There's also a trust dimension worth naming. The DORA 2025 data found that even with adoption near universal, 30% of developers still report little or no trust in AI-generated code. A separate Qodo survey of senior engineers found that while 68% see quality improvements from AI, only 26% would ship AI-generated code without a careful review. The most experienced people on your team are the most likely to read every line. That's not paranoia. That's pattern recognition from thousands of incidents that started with code that looked fine.
What this means for leaders
A few things, and most of them are uncomfortable.
Throughput metrics are lying to you. If you're tracking commits, PRs opened, or lines of code, you're measuring the easy half of the loop. The DORA researchers call this the "verification tax": time saved writing gets re-spent auditing. A team can look 2x faster on the dashboard while net cycle time from commit to production gets worse. If your operating reviews don't include review time, queue depth, and rework rate, you don't actually know whether AI is helping you ship.
Your senior engineers are becoming the bottleneck. Not because they're slow, but because review is now the scarce resource. Multiple 2025 industry reports estimate senior engineers are spending 40-60% of their time on manual review work. That's the most expensive talent in your building, doing the work AI made cheaper to create but more expensive to validate. The economics are upside down.
Your junior engineers are losing the reps that used to build judgment. The work they used to learn on (the boring CRUD endpoint, the validation function, the error handler) is now being done by tools they haven't yet learned to evaluate. Five years from now, the people you need to grow into senior reviewers won't have the same scar tissue the current seniors have. This is a slow-burning organizational risk that won't show up in any quarterly metric.
Quality issues are showing up downstream, not upstream. DORA, Faros, and CodeRabbit all converge on the same finding from different angles: instability is rising even as throughput rises. The bug isn't catching the engineer's eye in the diff. It's catching the customer's attention in production.
The right question isn't how much more AI can write. It's how much more a human can meaningfully review, and what the review system needs to look like when half the code wasn't written by a human.
Where to go from here
The teams handling this well aren't the ones slowing down AI use. They're the ones redesigning the review loop. A few patterns are emerging across the 2025-2026 research and the practitioners actually shipping at velocity.
Smaller diffs, by design. The single highest-leverage intervention is making PRs smaller. The Graphite team's 2025 analysis is blunt about why: smaller PRs give both human reviewers and AI reviewers dramatically better results. A 200-line PR with a clear scope is reviewable. A 2,000-line PR with twelve concerns interleaved isn't, no matter how good your tooling is. Stacked PRs, feature flags, and atomic commits aren't trendy practices anymore. They're load-bearing infrastructure for AI-era development.
Pair coding agents with reviewing agents. Microsoft's 2025 internal data on their AI code review assistant, now running on over 90% of the company's PRs and processing more than 600,000 pull requests per month, showed 10-20% median PR completion time improvements on the 5,000 repositories that adopted it. That's not a magic bullet. It's an acknowledgment that if AI is generating the code, AI also needs to do a first pass on the review so humans can focus on judgment calls instead of style nits. (Microsoft Engineering Blog)
Make reviewability a first-class requirement. This is a culture change, not a tool. Code that's hard to review is now a liability, not a quirk. PR descriptions, commit hygiene, test coverage, and clear scope have to graduate from "nice to have" to "non-negotiable." The teams treating reviewability as an engineering discipline are the ones whose senior engineers aren't drowning.
Measure what actually matters. End-to-end cycle time from commit to production. Review queue depth. Rework rate after merge. Defect escape rate. Senior engineer time allocation. These are the metrics that tell you whether AI is helping your organization or just helping the dashboard.
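These metrics are straightforward to compute once you pull raw PR timestamps from your version control system. A minimal sketch, assuming a hypothetical record schema (the `PullRequest` fields below are illustrative, not any particular platform's API):

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional

# Hypothetical PR record; field names are assumptions for illustration,
# not any specific tool's export format.
@dataclass
class PullRequest:
    opened_at: datetime
    first_review_at: Optional[datetime]   # None if not yet reviewed
    merged_at: Optional[datetime]         # None if not yet merged
    deployed_at: Optional[datetime]       # None if not yet in production
    needed_followup_fix: bool             # a corrective PR landed later

def _hours(delta) -> float:
    return delta.total_seconds() / 3600

def review_queue_depth(prs, now: datetime) -> int:
    """PRs opened but not yet reviewed as of a point in time."""
    return sum(
        1 for p in prs
        if p.opened_at <= now
        and (p.first_review_at is None or p.first_review_at > now)
    )

def median_cycle_time_hours(prs) -> Optional[float]:
    """End-to-end time from PR opened to deployed, for shipped PRs."""
    durations = [_hours(p.deployed_at - p.opened_at) for p in prs if p.deployed_at]
    return median(durations) if durations else None

def rework_rate(prs) -> float:
    """Share of merged PRs that later needed a corrective change."""
    merged = [p for p in prs if p.merged_at]
    return sum(p.needed_followup_fix for p in merged) / len(merged) if merged else 0.0
```

The point of the sketch is the framing: each metric is a function of the whole loop, not of output volume, so a surge in merged PRs can't move these numbers on its own.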
Treat AI adoption as organizational change, not tool rollout. This is DORA's core 2025 finding, and it applies far beyond engineering. The seven capabilities they identified (clear AI stance, healthy data ecosystems, AI-accessible internal data, strong version control practices, working in small batches, user-centric focus, and quality internal platforms) are systems-level investments. Tools alone don't unlock the value. The system around the tool does.
The honest version
Here's the part most leadership conversations dance around: a lot of the productivity gains being claimed in 2025 and 2026 are real at the individual level and largely invisible at the organizational level. Developers feel faster. Dashboards show more PRs. Production gets more unstable. Senior engineers get more tired. And the people responsible for connecting all of those signals (engineering leaders, CTOs, and the executives they report to) often only see the first two.
The teams that win the next eighteen months won't be the ones writing the most AI-generated code. They'll be the ones who figured out, earlier than their competitors, that the bottleneck moved. They invested in review infrastructure, redesigned their workflows around small batches, and held themselves accountable to end-to-end metrics that capture what actually ships safely to customers.
That's a cultural shift as much as a tooling one. And it starts with being honest about where the bottleneck actually is.