AI agents are execution engines, not architects
AI coding agents are powerful for scoped, mechanical work. They should not be making architectural or behavioral decisions.
I started using AI coding agents for everything.
Scaffolding, refactors, tests, random fixes—if it looked like code, I pushed it through an agent. It felt like cheating at first. Work that used to take an hour was done in minutes.
Then it started breaking things in ways that were hard to see and expensive to fix.
Not obvious bugs. Subtle ones. Wrong assumptions. Quiet coupling. Behavior that technically worked but didn’t belong in the system.
That’s when the model clicked:
AI agents execute. They do not decide.
For context, I use an internal coding agent. It’s reliable, fast, and extremely useful. It just isn’t an architect.
What AI agents are actually good at
They perform best when the work is tight, repeatable, and verifiable.
Think in terms of transformations, not decisions.
- Boilerplate and migrations: generating large volumes of similar code without copy-paste errors.
- Refactor sweeps: renaming, restructuring, import cleanup. Purely mechanical work.
- Test scaffolding: creating baseline coverage around known function boundaries.
If you can clearly define the input and the expected output, an agent will usually do it faster and cleaner than you.
The operating rule
This is the part most people skip.
- Define exact scope (paths, files, language).
- Define acceptance criteria (build, lint, at least one behavior check).
- Limit the task to mechanical changes.
- Review for behavior drift before merging.
If any of these are missing, you’re guessing—and the agent is guessing with you.
Where things go wrong
The mistakes are consistent:
- Letting the agent make architectural decisions
- Prompting vague tasks like “fix this module”
- Shipping generated code without validation
This is how you end up with code that passes checks but slowly degrades the system.
The agent doesn’t know what matters unless you tell it. And if you don’t know what matters, it will happily invent something that looks reasonable.
The correct split
Safe agent work
- Schema updates with deterministic input/output
- Migration scripts
- Documentation for already-understood logic
Human-owned work
- Defining system boundaries
- Changing failure behavior
- Anything with unclear blast radius
If you can’t clearly define “correct,” the agent shouldn’t own it.
Guardrails that actually work
Minimum loop:
```shell
npm run lint
npm run build
npm run test || true   # tests may fail; read the diff either way
```
Then read the diff.
Specifically look for:
- behavior that changed without intent
- assumptions that got rewritten
- new dependencies or coupling
Most issues show up here, not in the tests.
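A rough first pass over the diff can be automated. This sketch (the patterns are assumptions; tune them for your codebase) flags added lines that introduce dependencies or touch control flow, which is where drift usually hides:

```javascript
// Flag added diff lines that introduce dependencies or change control flow.
// The patterns are deliberately crude: the goal is to direct human attention,
// not to replace reading the diff.
function flagDrift(diffText) {
  const risky = /\b(require\(|import |if |throw |catch )/;
  return diffText
    .split("\n")
    .filter((line) => line.startsWith("+") && risky.test(line));
}

const diff = [
  '+const lodash = require("lodash");',
  "+if (user.active) {",
  "+const label = 'ok';",
  "-const old = 1;",
].join("\n");
console.log(flagDrift(diff)); // the require and if lines, nothing else
```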
The shift
The mistake is thinking AI agents replace engineers. They don’t. They compress execution. Once you treat them like a high-speed implementation layer—not a thinking layer—the failure modes become obvious and manageable.