AI agents are execution engines, not architects

AI coding agents are powerful for scoped, mechanical work. They should not be making architectural or behavioral decisions.

I started using AI coding agents for everything.

Scaffolding, refactors, tests, random fixes—if it looked like code, I pushed it through an agent. It felt like cheating at first. Work that used to take an hour was done in minutes.

Then it started breaking things in ways that were hard to see and expensive to fix.

Not obvious bugs. Subtle ones. Wrong assumptions. Quiet coupling. Behavior that technically worked but didn’t belong in the system.

That’s when the model clicked:

AI agents execute. They do not decide.

For context, most of my agent work runs through a single internal coding agent. It’s reliable, fast, and extremely useful. It just isn’t an architect.

What AI agents are actually good at

They perform best when the work is tight, repeatable, and verifiable.

Think in terms of transformations, not decisions.

  • Boilerplate and migrations
    Generating large volumes of similar code without copy-paste errors.

  • Refactor sweeps
    Renaming, restructuring, import cleanup—purely mechanical work.

  • Test scaffolding
    Creating baseline coverage around known function boundaries.

If you can clearly define the input and the expected output, an agent will usually do it faster and cleaner than you.
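As a concrete sketch of a "transformation, not decision" task, here is a mechanical rename sweep with a verifiable end state. Every name in it (the `demo/src` tree, `fetchUser`, `loadUser`) is a placeholder for illustration, not a real codebase.

```shell
# Hypothetical rename sweep: fetchUser -> loadUser across a small tree.
# All paths and identifiers are made-up placeholders.
set -euo pipefail

mkdir -p demo/src
printf 'export function fetchUser() {}\nfetchUser();\n' > demo/src/api.js

# The transformation: scoped, repeatable, purely mechanical.
# (GNU sed syntax; on macOS use `sed -i ''`.)
grep -rl 'fetchUser' demo/src | xargs sed -i 's/fetchUser/loadUser/g'

# The verification: the old name must be gone everywhere in scope.
if grep -rq 'fetchUser' demo/src; then
  echo "rename incomplete" >&2
  exit 1
fi
echo "rename complete"
```

The point is the shape: defined input (files matching a pattern), defined output (zero remaining matches), and a check that fails loudly.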

The operating rule

This is the part most people skip.

  1. Define exact scope (paths, files, language).
  2. Define acceptance criteria (build, lint, at least one behavior check).
  3. Limit the task to mechanical changes.
  4. Review for behavior drift before merging.

If any of these are missing, you’re guessing—and the agent is guessing with you.
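The four steps above can be sketched as a small pre-merge gate. The `src/` scope, the npm scripts, and the `check_scope` helper are assumptions for illustration, not a prescribed setup.

```shell
# Sketch of the operating rule as a gate script. The "src/" scope and
# npm commands are placeholder assumptions; adapt them to your project.
set -euo pipefail

SCOPE="src/"

# Step 1: every changed file must fall inside the declared scope.
check_scope() {
  local outside
  outside=$(grep -v "^${SCOPE}" || true)
  if [ -n "$outside" ]; then
    echo "out of scope: $outside" >&2
    return 1
  fi
  echo "scope ok"
}

# In a real repo, steps 1-2 would run like this (not executed here):
#   git diff --name-only main...HEAD | check_scope
#   npm run lint && npm run build && npm run test
# Steps 3-4 stay human: keep the task mechanical, then read the diff.

# Demo input standing in for a changed-file list:
printf 'src/a.js\nsrc/b.js\n' | check_scope
```

The gate is deliberately dumb: it can't judge the change, only whether the change stayed inside the box you drew.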

Where things go wrong

The mistakes are consistent:

  • Letting the agent make architectural decisions
  • Prompting vague tasks like “fix this module”
  • Shipping generated code without validation

This is how you end up with code that passes checks but slowly degrades the system.

The agent doesn’t know what matters unless you tell it. And if you don’t know what matters, it will happily invent something that looks reasonable.

The correct split

Safe agent work

  • Schema updates with deterministic input/output
  • Migration scripts
  • Documentation for already-understood logic

Human-owned work

  • Defining system boundaries
  • Changing failure behavior
  • Anything with unclear blast radius

If you can’t clearly define “correct,” the agent shouldn’t own it.

Guardrails that actually work

Minimum loop:

npm run lint
npm run build
npm run test

Then read the diff.

Specifically look for:

  • behavior that changed without intent
  • assumptions that got rewritten
  • new dependencies or coupling

Most issues show up here, not in the tests.
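One cheap way to make that diff pass systematic: scan the diff for added import lines, since new coupling rarely trips the tests. The diff content, filename, and regex below are illustrative assumptions; adjust the pattern to your language.

```shell
# Sketch: flag added import lines in a saved diff, i.e. new coupling
# the tests won't surface. The sample diff content is made up.
set -euo pipefail

cat > agent.diff <<'EOF'
+import { db } from './internal/db';
+const retries = 3;
EOF

added_imports=$(grep -E '^\+[[:space:]]*import ' agent.diff || true)
if [ -n "$added_imports" ]; then
  echo "review new coupling:"
  echo "$added_imports"
else
  echo "no new imports"
fi
```

This doesn't replace reading the diff; it just guarantees the dependency question gets asked every time.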

The shift

The mistake is thinking AI agents replace engineers. They don’t. They compress execution. Once you treat them like a high-speed implementation layer—not a thinking layer—the failure modes become obvious and manageable.