This new Claude Code Review tool uses AI agents to check your pull requests for bugs - here's how
Publish Time: 09 Mar, 2026
Claude Code adds automated AI reviewers to analyze developer code



Key takeaways

  • Anthropic launches AI agents to review developer pull requests.
  • In internal testing, substantive code review feedback more than tripled.
  • Automated reviews may catch critical bugs humans miss.

Anthropic today announced a new Code Review beta feature built into Claude Code for Teams and Enterprise plan users. It's a new software tool that uses agents working in teams to analyze completed blocks of new code for bugs and other potentially problematic issues.

What's a pull request?

To understand this new Anthropic offering, you need to understand the concept of a pull request. And that leads me to a story about a man named Linus.

Long ago, Linux creator Linus Torvalds had a problem. He was managing lots of contributions to the open source Linux operating system. All the changes were getting out of control. Source code control systems (a method for managing source code changes) had been around for quite a while before then, but they had a major problem. Those old SCCSs were not meant to manage distributed development by coders all across the world.

Also: I used Claude Code to vibe code a Mac app in 8 hours, but it was more work than magic

So, Linus invented Git. If you're a coder, you know Git. It's the underlying coordinating mechanism for code changes. And if you thought Linus was a coding god just for Linux, the creation of Git (and the ecosystem it spawned, most notably GitHub) should put him up there at the top of Mount Olympus. Dude created not just one, but two world-changing technologies.

Today, almost every large project uses GitHub or one of its competitors. GitHub (as differentiated from Git) is the centralized cloud service that holds code repositories managed by Git. A few years back, GitHub was purchased by Microsoft, fostering all sorts of doom-and-gloom conspiracy theories. But Microsoft has proven to be a good steward of this precious resource, and GitHub keeps chugging along, managing the world's code.

All that brings us back to pull requests, known as PRs in coder-speak. A pull request is initiated when a programmer wants to check in some new or changed code to a code repository. Rather than just merging it into the main branch, a PR tells repo maintainers that there's something new, ready to be reviewed.

Also: I tried to save $1,200 by vibe coding for free - and quickly regretted it

Quick note: to coders, PR is an acronym for pull request. For marketers, PR means public relations. When you read about tech, you'll see both acronyms, so pay attention to the context to distinguish between the two.

Sometimes, the code is very carefully checked over before being merged into the main codebase. But other times, it just gets rubber-stamped and merged. Code reviews, while necessary, are also tedious and time-consuming.

Of course, the cost of rubber-stamping a PR can be catastrophic. You might ship code that is buggy, loses data, or damages user systems. At best, buggy code is just annoying. At worst, it can cause catastrophic damage.

That's where Anthropic's new Claude Code Review comes in.

Code review at Anthropic

In my article, 7 AI coding techniques I use to ship real, reliable products - fast, my bonus technique was using AI for code review. As a lone developer, I don't use a formalized code review process like the one Anthropic is introducing.

I just tell a new session of the AI to look at my code and let me know what's not right. Sometimes I use the same AI (i.e., Claude Code reviewing Claude's code), and other times I use a different AI (such as having OpenAI's Codex review code that Claude Code generated). It's far from a comprehensive review, but almost every time I ask for one, one AI or the other finds something that needs fixing.

The new Claude Code Review capability is modeled on the process used at Anthropic. The company has essentially productized its own internal methodology. According to Anthropic, customers "tell us developers are stretched thin, and many PRs get skims rather than deep reads."

Also: How to switch from ChatGPT to Claude: Transferring your memories and settings is easy

This new agentic Code Review AI provides deeper automated review coverage before human reviewers need to make decisions.

Anthropic says that code output per Anthropic engineer has increased 200% in the past year, intensifying pressure on human reviewers. You think? The company has been using its own AI to write code, which speeds up code production, so the changes and new code blocks are coming faster than ever before.

Anthropic reports that the new Code Review system is run on nearly every pull request internally. When a PR is reviewed, human reviewers often make comments about the issues they see, which the coder needs to go back and fix.

Before running Code Review, Anthropic coders got back "substantive" review comments about 16% of the time. With Code Review, coders are getting back substantive comments 54% of the time. While that might seem to mean more work for coders, what it really means is that more than three times as many coding oopsies have been caught before they cause damage.
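To put those percentages in perspective, here's the quick arithmetic behind the "more than tripled" claim, using the two rates Anthropic reported:

```python
# Rates of substantive review comments reported by Anthropic.
before = 0.16  # without Code Review
after = 0.54   # with Code Review

# The improvement ratio: how many times more often
# substantive feedback now comes back on a PR.
ratio = after / before
print(round(ratio, 2))  # prints 3.38
```

So the jump from 16% to 54% works out to roughly a 3.4x increase in substantive feedback per pull request.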

Also: I used Claude Code to vibe code an Apple Watch app in just 12 hours - instead of 2 months

According to Anthropic, the size of the internal PR impacts the level of review findings. Large pull requests with more than 1,000 changed lines show findings 84% of the time. Small pull requests of under 50 lines produce findings 31% of the time. Anthropic engineers "largely agree with what it surfaces: less than 1% of findings are marked incorrect."

Heck, when I code, even if I add just one line of code, there's a chance I'll introduce a bug. Testing and code reviews are essential if you don't want thousands of users coming at you brandishing virtual pitchforks and torches. Don't ask me how I know.

Examples of issues surfaced during testing

I'm always fascinated by what others experience while doing their jobs. Anthropic provided some examples of problems Code Review identified during its early testing.

In one case, a single-line change appeared routine and would normally have been quickly approved. But Code Review flagged it as critical: that tiny change would have broken authentication for the service. Because Code Review caught it, the bug was fixed before the merge. The original coder said they wouldn't have caught the error on their own.

Also: I tried a Claude Code rival that's local, open source, and completely free - how it went

Another example occurred when filesystem encryption code was being reorganized in an open source product. According to the report, "Code Review surfaced a pre-existing bug in adjacent code: a type mismatch that was silently wiping the encryption key cache on every sync."

This is what we call a silent killer in coding. It could have resulted in data loss, performance degradation, and security risks. Anthropic described it as "a latent issue in code the PR happened to touch, the kind of thing a human reviewer scanning the changeset wouldn't immediately go looking for."

If that hadn't been caught and fixed, it would have made for a very bad day for someone (or a whole bunch of someones).

How the multi-agent review system works

Code Review runs fairly quickly, turning around fairly complex reviews in about 20 minutes. When a pull request is opened, Code Review kicks off a bunch of agents that analyze code in parallel.

Various agents detect potential bugs, verify findings to filter false positives, and rank issues by severity. The results are consolidated so that all the results from all the agents appear as a single summary comment on the pull request, alongside inline comments for specific problems.
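The fan-out-then-consolidate pattern described above can be sketched in a few lines. To be clear, this is not Anthropic's implementation; the agent functions below are hypothetical stand-ins for model-backed reviewers, and the point is just the shape of the pipeline: run specialist agents in parallel, merge their findings, and rank by severity.

```python
# A minimal, illustrative sketch of a parallel multi-agent review pipeline.
# The "agents" here are toy string checks standing in for model calls.
from concurrent.futures import ThreadPoolExecutor

def bug_detector(diff):
    # Stand-in agent: a real one would ask a model to scan the diff for bugs.
    return [{"issue": "possible off-by-one", "severity": 2}] if "range(" in diff else []

def security_checker(diff):
    # Stand-in agent: flags an obvious hardcoded credential.
    return [{"issue": "hardcoded secret", "severity": 3}] if "API_KEY=" in diff else []

def review_pull_request(diff):
    agents = [bug_detector, security_checker]
    # Run all agents on the same diff in parallel.
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda agent: agent(diff), agents)
    # Consolidate every agent's findings into one list, highest severity first,
    # which would become the single summary comment on the PR.
    findings = [f for result in results for f in result]
    return sorted(findings, key=lambda f: -f["severity"])

findings = review_pull_request('API_KEY="x"\nfor i in range(n):')
print(findings[0]["issue"])  # prints hardcoded secret
```

A production system would add the verification pass Anthropic describes (a second round of agents filtering false positives) between the fan-out and the consolidation step.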

Also: How to install and configure Claude Code, step by step

In a demo, Anthropic showed that the summary comment can also include a fix directive. So if Code Review finds a bug, it can be fed to Claude Code to fix. The company says that reviews scale with complexity: larger pull requests receive deeper analysis and more agents.

Anthropic really seems to like spawning multiple agents. In the past, I've had some fairly serious difficulty wrangling them after they're launched. In fact, the first technique I shared in my 7 coding techniques article was to specifically tell Claude Code to avoid launching agents in parallel.

There are some internal task management features in Claude (the /tasks command, for example), but I'd prefer to see a more comprehensive task management dashboard before I rely on the results of dozens of spawned agents.

Cost model and administrative controls

Reviews are billed based on token usage. Pricing scales with the size and complexity of the pull request being analyzed, but the company says that a code review typically costs between $15 and $25. In some ways, this could get very expensive very quickly.

One of the most popular engineering-related Substacks is The Pragmatic Engineer. In an article, Gergely Orosz says that Anthropic engineers each typically produce about five PRs per day. In practice, typical developers not using AI coding support produce at most one or two a week.

Also: Want local vibe coding? This AI stack might replace Claude Code and Codex - for free

As a quick calculation, let's say a company has a hundred developers, each producing one PR a day, five days a week. (In our fantasy example, software engineers get weekends off.) That works out to 500 PRs a week, or about 2,000 per month. At an average of $20 per review, Code Review could cost this sample company about $40,000 a month, or $480,000 per year.
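Here's that back-of-envelope estimate as runnable arithmetic. The figures are the assumptions from the paragraph above (100 developers, one PR per weekday, $20 as the midpoint of Anthropic's $15-$25 range), not anything Anthropic has published about typical customer spend:

```python
# Back-of-envelope Code Review cost estimate, using assumed figures.
developers = 100
prs_per_dev_per_week = 5   # one PR per weekday
weeks_per_month = 4
avg_cost_per_review = 20   # midpoint of the quoted $15-$25 range

prs_per_month = developers * prs_per_dev_per_week * weeks_per_month
monthly_cost = prs_per_month * avg_cost_per_review
annual_cost = monthly_cost * 12

print(prs_per_month, monthly_cost, annual_cost)  # prints 2000 40000 480000
```

Swap in your own team size and PR cadence to see where your organization would land; note that if your engineers hit Anthropic's reported five PRs a day, the monthly figure multiplies accordingly.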

That might seem like a lot. But factor in the cost of a catastrophic bug leaking out to customers, in both real dollars and brand reputation, and it starts to seem affordable.

It's clear Anthropic has found a new profit center. Even at that expense level, it's probably worth it for companies to actively employ Code Review.

The company does say that there are ways to control spending and usage, including:

  • Monthly organization caps: Define total monthly spend across all reviews.
  • Repository-level control: Enable reviews only on the repositories you choose.
  • Analytics dashboard: Track PRs reviewed, acceptance rate, and total review costs.

Automatic checking

Administrators with Team and Enterprise plans can enable Code Review through Claude Code settings and a GitHub app install. Once activated, reviews automatically run on new pull requests without additional developer configuration. That's part of why usage caps and repository-level control become pretty important for cost management.

What about you?

Are you using AI tools to review your code or pull requests yet? Would you trust an automated multi-agent system to flag bugs and security problems before humans see the code? Do you think paying $15 to $25 per pull request for automated review makes sense, or would the costs add up too quickly?

Also: Claude Code made an astonishing $1B in 6 months - and my own AI-coded iPhone app shows why

If you're a developer, have AI code reviewers already caught issues you might have missed? Like I said, I'm just using basic prompting to generate code reviews, but that has certainly helped me produce better code.

What about you? Let us know in the comments below.


You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
