Codex in practice: what it is good at, what it is not
Codex is easy to overestimate.
If you treat it like a pair programmer with infinite patience, you get good velocity on repeatable tasks. If you treat it like a principal engineer with full context, you get nonsense and flaky defaults.
I use Codex as a force multiplier for three kinds of tasks:
- Boilerplate creation
- Refactor sweeps across many files
- Test scaffolding and edge-case hunting
That sounds small, but this is where it shines.
The best uses
1) Structural repetition
Codex is excellent when the task is repetitive but large.
Examples:
- adding endpoints across services,
- converting old CLI scripts into typed wrappers,
- and generating test cases from existing schema definitions.
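As a rough illustration of the last item, here is a minimal Python sketch of deriving boundary test cases from a schema. The schema format, the field names, and the helpers `is_valid` and `boundary_cases` are all invented for this example; a real project would plug in its own schema definitions and validator.

```python
from typing import Any

# Hypothetical schema format, invented for this sketch:
# field name -> type plus inclusive bounds.
USER_SCHEMA: dict[str, dict[str, Any]] = {
    "age": {"type": "int", "min": 0, "max": 150},
    "nickname": {"type": "str", "max_len": 12},
}


def is_valid(schema: dict[str, dict[str, Any]], field: str, value: Any) -> bool:
    """Check one value against the constraints declared for its field."""
    rule = schema[field]
    if rule["type"] == "int":
        return isinstance(value, int) and rule["min"] <= value <= rule["max"]
    if rule["type"] == "str":
        return isinstance(value, str) and len(value) <= rule["max_len"]
    raise ValueError(f"unsupported type: {rule['type']}")


def boundary_cases(schema: dict[str, dict[str, Any]]) -> list[tuple[str, Any, bool]]:
    """Return (field, value, expected_valid) triples on both sides of each bound."""
    cases: list[tuple[str, Any, bool]] = []
    for field, rule in schema.items():
        if rule["type"] == "int":
            lo, hi = rule["min"], rule["max"]
            cases += [(field, lo - 1, False), (field, lo, True),
                      (field, hi, True), (field, hi + 1, False)]
        elif rule["type"] == "str":
            n = rule["max_len"]
            cases += [(field, "x" * n, True), (field, "x" * (n + 1), False)]
    return cases


# Each generated case can be asserted mechanically against the validator.
for field, value, expected in boundary_cases(USER_SCHEMA):
    assert is_valid(USER_SCHEMA, field, value) == expected
```

This is the kind of output worth asking Codex for: the cases follow mechanically from the schema, so a wrong one is easy to spot in review.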
2) Exploration in a safe sandbox
Let it prototype quickly, then narrow down before merge.
A useful loop:
- ask for a minimal diff,
- ask for a risk list,
- apply only the safest parts,
- run full checks.
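The loop above can be sketched in Python as a filter plus a gate. Everything here is an assumption made for illustration: the `Hunk` record, the three risk labels, and the `run_checks` callable that stands in for whatever full check suite a project actually runs.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed risk labels, ordered from safest to riskiest.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Hunk:
    """One piece of a proposed diff, tagged from the requested risk list."""
    path: str
    risk: str
    summary: str


def select_safest(hunks: list[Hunk], max_risk: str = "low") -> list[Hunk]:
    """Keep only the hunks at or below the given risk level."""
    limit = RISK_ORDER[max_risk]
    return [h for h in hunks if RISK_ORDER[h.risk] <= limit]


def review_loop(
    hunks: list[Hunk],
    run_checks: Callable[[list[Hunk]], bool],
    max_risk: str = "low",
) -> list[Hunk]:
    """Return the selected hunks only if the full checks pass on them."""
    chosen = select_safest(hunks, max_risk)
    return chosen if run_checks(chosen) else []
```

The point of the gate is that nothing is kept on the strength of the risk labels alone; the labels only decide what gets tested.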
3) Translation layer
Codex does well translating intent across languages, for example going from "here’s this Python function" to "give me an equivalent TypeScript utility with types".
You still review, because translation is where small behavior changes hide.
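One concrete example of a hidden behavior change, sketched in Python: the `%` operator. In Python the result takes the sign of the divisor, while in JavaScript and TypeScript it takes the sign of the dividend, so a line-by-line translation of `a % b` changes results for negative inputs. The function names below are made up for this sketch; `math.fmod` is used only to reproduce the truncating behavior on the Python side.

```python
import math


def python_mod(a: int, b: int) -> int:
    """Python's %: the result has the same sign as the divisor."""
    return a % b


def js_style_remainder(a: int, b: int) -> int:
    """Truncating remainder, as JS/TS % behaves: sign follows the dividend."""
    return int(math.fmod(a, b))


# Same expression, different answers once a negative number shows up.
assert python_mod(7, 3) == js_style_remainder(7, 3) == 1
assert python_mod(-7, 3) == 2
assert js_style_remainder(-7, 3) == -1
```

A translated utility that wraps array indices or computes time offsets with `%` will pass every positive-input test and still be wrong, which is why the review step needs negative and boundary inputs.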
Where it fails most often
- Ambiguous success criteria: if you didn’t define acceptance, it optimizes for plausible output, not correctness.
- Stateful domains: migrations, rollout strategies, or security-sensitive flows often need human sequencing.
- Long-running assumptions: it can write very clean code for a system and still miss the production-grade judgment calls that system depends on.
How I keep it useful
I pair each Codex run with hard constraints:
- exact directory scope,
- explicit file ownership,
- and a strict validation checklist.
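A minimal Python sketch of how two of these constraints could be enforced mechanically. The function names, the example paths, and the checklist keys are assumptions for illustration, and file ownership is not modeled here. It needs Python 3.9 or later for `PurePath.is_relative_to`.

```python
from pathlib import PurePosixPath


def out_of_scope(changed: list[str], allowed_dirs: list[str]) -> list[str]:
    """Return the changed paths that fall outside the allowed directories."""
    violations = []
    for raw in changed:
        path = PurePosixPath(raw)
        # Absolute paths and ".." segments could escape the scope, so reject them.
        escapes = path.is_absolute() or ".." in path.parts
        inside = any(path.is_relative_to(d) for d in allowed_dirs)
        if escapes or not inside:
            violations.append(raw)
    return violations


def unmet_checks(checklist: dict[str, bool]) -> list[str]:
    """Return the names of validation steps that have not passed."""
    return [name for name, passed in checklist.items() if not passed]
```

For example, `out_of_scope(["services/auth/tokens.py"], ["services/billing"])` flags the file, and a run is accepted only when both functions return an empty list.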
With those constraints in place, it’s no longer “AI writing code blindly.”
One practical rule
If the change can be measured without manual inference, Codex is great. If the change affects system behavior you cannot easily assert, I treat its output as a source of suggestions and implement the change by hand.
That keeps the speed and keeps quality from drifting.
When it works, it does what people underestimate: it removes drudgery so the engineer can focus on decisions.