The AI/Human Code Review Process

Many developers dislike code reviews. They are often treated as a chore, or even as an obstacle that slows down shipping. In my opinion, with the current state of LLMs, code review has changed dramatically and has become one of the most critical engineering skills you can have.
Since AI tools can generate thousands of lines of code in a matter of seconds, writing code is no longer the bottleneck. The hardest part is making sure this massive amount of code doesn’t turn your codebase into an unmaintainable mess.
But since we use AI to write code, it also makes sense to use AI to review it, at least to some extent - otherwise, human reviewers simply become the new bottleneck. However, relying on AI for 100% of the lifecycle is a trap. AI has no deep understanding of your specific business goals, and leaving it entirely unchecked will result in a messy, chaotic architecture. Understanding the code you ship is also how you build deep knowledge of your product; if you skip this, you will eventually lose the ability to maintain and scale it.
The solution I like to use is the two-phase code review: first, AI handles the mechanical pass (static analysis, performance, coherence), while you reserve your energy for the parts that LLMs won't catch - the business logic, UX, and architectural integrity. Your focus remains strictly on what AI cannot understand.
Here is a short guide to implementing this workflow:
Step 0: Use Atomic Commits and Traditional PRs (As Usual)
LLMs are strictly bound by their context windows. If you feed an AI agent too much information at once, it gets overwhelmed - it starts missing obvious errors and suggesting unnecessary, confusing changes. To prevent this, work on one feature at a time so the AI does not disrupt the project's overall architecture. When AI tries to handle too much, it mixes logic - for instance, importing components or functions from Module A directly into Module B:
// ❌ WRONG: Cross-Module "Convenience" Import
// Location: /src/modules/notifications/services/SMSProvider.ts

/**
 * The AI needed a way to strip special characters from a string.
 * Instead of suggesting a 'regex-utils.ts' or creating a local helper,
 * it saw that the Auth module had a private function for cleaning
 * legacy database keys and just used it.
 */
import { _sanitizeSessionKey } from '../../auth/internals/key-formatters';

export const sendSMS = (phoneNumber: string, message: string) => {
  // AI uses a session-key formatter to "clean" a phone number.
  // So if the Auth module changes its key format,
  // your SMS notifications will suddenly break.
  const cleanNumber = _sanitizeSessionKey(phoneNumber);
  console.log(`Sending SMS to ${cleanNumber}: ${message}`);
  // ... carrier logic
};
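For contrast, a self-contained fix keeps the helper local to the module. Here is a minimal sketch (the helper name and regex are illustrative, not from the original code):
// ✅ RIGHT: Module-Local Helper
// Location: /src/modules/notifications/services/SMSProvider.ts

// A small local helper keeps the notifications module independent
// of the Auth module's internal key format.
const sanitizePhoneNumber = (raw: string): string =>
  raw.replace(/[^\d+]/g, ''); // keep digits and a leading '+'

export const sendSMS = (phoneNumber: string, message: string) => {
  const cleanNumber = sanitizePhoneNumber(phoneNumber);
  console.log(`Sending SMS to ${cleanNumber}: ${message}`);
  // ... carrier logic
};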
To keep your codebase manageable, stick to traditional atomic commits and self-contained Pull Requests.
AI loves to build everything at once: if you let it, it will generate the entire database schema first, then the entire API layer, and finally the UI. That is impossible for a human to review, because you can't test the actual behavior until the very end.
Instead, review one tiny, fully functional slice of the application at a time. This way, code review actually makes sense: you can run the code and test its real-world behavior. It may feel slower at first, since you can't let the agent run for 20 minutes and generate most of the app, but it will be much faster in the later stages of the project.
Step 1: Don’t Burn Tokens on Obvious Findings
Before any AI even looks at your code, you must filter out predictable errors.
AI models have a tendency to waste tokens on trivial things if you allow them to. If your AI agent reviews a pull request full of unused imports and files (LLMs often leave unused code behind instead of removing it), linting errors, and poorly formatted files, it gets distracted. It uses up its context window instead of finding actual logical flaws. Given that AI API calls are expensive (and will continue to be), this burns both money and time.
The fix is to rely on traditional tools first. Set up strict TypeScript rules, ESLint, and pre-commit formatting with Prettier. Make this a strict rule: if the code doesn’t pass a local build, it has no business being analyzed by AI.
// package.json
// Force a local cleanup before the AI review even triggers.
// (husky v4-style config shown; husky v5+ uses .husky/ hook files instead.)
// lint-staged appends staged file paths to each command, so prettier and
// eslint are called directly; tsc ignores tsconfig.json when given single
// files, so type-checking runs project-wide in the hook.
{
  "scripts": {
    "lint": "eslint . --max-warnings 0",
    "type-check": "tsc --noEmit",
    "format": "prettier --write ."
  },
  "husky": {
    "hooks": {
      "pre-commit": "lint-staged && npm run type-check"
    }
  },
  "lint-staged": {
    "*.{ts,tsx}": [
      "prettier --write",
      "eslint --max-warnings 0"
    ]
  }
}
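For the "strict TypeScript rules" part, a minimal tsconfig sketch could look like this (the flag selection is a suggestion, not a requirement; tune it to your project):
// tsconfig.json (sketch)
{
  "compilerOptions": {
    "strict": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true,
    "noUncheckedIndexedAccess": true,
    "noFallthroughCasesInSwitch": true
  }
}
The unused-code flags are particularly useful here: they catch the dead imports and variables that LLMs tend to leave behind, before any AI reviewer ever sees them.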
Step 2: The AI Review Phase (Coherence, Security, and Performance)
Now that your code passes local builds and formatting checks, it’s time for the AI part of the code review. This is the part of the workflow that should be automated as much as code generation.
To get real value, I like to split the code review process into a few parallel agents:
- The Security Agent: made for finding vulnerabilities. Ask it to check whether any .env variables leak to the frontend, to spot missing authorization checks in your server actions, and to flag data injection risks in your endpoints.
- The Performance Agent: asks the AI to look for bottlenecks. Have it search for N+1 database queries, missing pagination on large data sets, and independent asynchronous fetches that run sequentially instead of in parallel (see the sketch after this list).
- The Coherence Agent: checks whether new code follows your architectural rules. Its job is to ensure that all generated code stays coherent. For example: all APIs must follow your API specs, endpoints must be structured consistently, and UIs must use your reusable, atomic components instead of raw HTML elements.
- The NPM Agent: whenever the package.json file changes, this agent reviews the new content and asks these questions: Do we really need a new dependency for this, or can we write it natively? Is the package version up to date and well-maintained? (I still recommend doing a manual check on new packages yourself, especially to verify that the license actually allows you to use it legally in your project. Never leave legal compliance to a bot.)
- The Test Agent: evaluates the actual quality of your tests. It flags useless tests that over-mock internal logic just to hit coverage targets, pushing you instead toward tests that verify real user behavior.
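To make the Performance Agent's job concrete, here is a small sketch of the sequential-fetch pattern it should flag (fetchUser and fetchOrders are hypothetical helpers):
// ❌ Independent requests awaited one after another - latency adds up.
const user = await fetchUser(userId);
const orders = await fetchOrders(userId);

// ✅ The same requests in parallel - total latency is the slowest call.
const [user, orders] = await Promise.all([
  fetchUser(userId),
  fetchOrders(userId),
]);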
I like to create a single automated skill (for example, code-review-agents.md) that triggers all of these checks at once. It also instructs the AI to rank the priority of each finding and separate the “must-fix” issues from the “nice-to-have” ones.
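As a rough idea, such a skill file might look like this (the exact format depends on your agent tooling; treat it as a sketch, not a spec):
# code-review-agents.md

When asked to review a pull request:
1. Run the five reviews in parallel: Security, Performance, Coherence, NPM, Test.
2. For every finding, report the file, the issue, and a suggested fix.
3. Rank each finding as MUST-FIX or NICE-TO-HAVE.
4. Ignore formatting and lint issues - local tooling already owns those.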
Step 3: The Human Part (Business Logic & Architecture)
Current LLMs rarely make simple syntax errors or introduce obvious injection vulnerabilities anymore, although a lot of people still believe they do. Instead, they make subtle mistakes. Because it is impossible to specify every single detail in your specs, even with a PRD or an SDD, the AI will inevitably fill in the blanks by making “silent decisions” on your behalf.
That means that even if the code compiles perfectly, these invented business rules can quietly ruin your product. This is where you do your part, and your focus must shift to what the AI cannot understand:
- Checking the architecture: AI prioritizes "making it work" over "making it right." You need to ensure the logic actually lives where it belongs. Check that the code follows the system design, rather than just solving the immediate ticket in the quickest way possible.
- Searching for silent AI decisions: Did it invent a specific rule or flow that doesn’t fit your actual business goals? (See the sketch after this list.)
- Clicking through the UI: Making UX decisions is still a human job. AI has no physical eyes, even with access to the browser. You are still the last bastion of product quality.
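To make “silent decisions” concrete, here is an invented sketch of code that compiles cleanly yet smuggles in business rules nobody approved (the type and helper are hypothetical):
type User = {
  plan: 'trial' | 'pro';
  createdAt: Date;
  hasReadOnlyFallback: boolean;
};

const daysSince = (d: Date): number =>
  (Date.now() - d.getTime()) / 86_400_000;

// Nobody decided that trials last 14 days, or that expired trials keep
// a read-only fallback - the AI invented both to fill a gap in the spec.
const TRIAL_PERIOD_DAYS = 14;

export const canAccessFeature = (user: User): boolean => {
  if (user.plan === 'trial') {
    return daysSince(user.createdAt) <= TRIAL_PERIOD_DAYS
      || user.hasReadOnlyFallback;
  }
  return user.plan === 'pro';
};
No linter, type-checker, or security scan will flag this; only a human who knows the actual pricing model will.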
Conclusion
So, although AI can write code incredibly fast, software engineering skills and thorough reviews still matter. If you let AI work without proper guidance and good tests, you will quickly end up with a messy, broken project. That’s why you must act as the lead designer who sets the rules, leaving the heavy typing to the AI.