Codex in practice: what it is good at, what it is not

Codex is easy to overestimate.

If you treat it like a pair programmer with infinite patience, you get good velocity on repeatable tasks. If you treat it like a principal engineer with full context of your system, you get nonsense and flaky defaults.

I use Codex as a force multiplier for three kinds of tasks:

Boilerplate creation
Refactor sweeps across many files
Test scaffolding and edge-case hunting

That list sounds small, but it is exactly where Codex shines.

The best uses

1) Structural repetition

Codex is excellent when the task is large but repetitive.

Examples:

adding endpoints across services,
converting old CLI scripts into typed wrappers,
and generating test cases from existing schema definitions.
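To make the last example concrete, here is a minimal Python sketch of deriving boundary test cases from a schema. The schema format, the `Case` type, and `boundary_cases` are all hypothetical; a real project would read its JSON Schema, Pydantic models, or whatever definitions it already has. The point is that the mapping is mechanical, which is exactly the kind of work Codex repeats well.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Case:
    field: str
    value: Any
    should_pass: bool


# Hypothetical schema format: field name -> constraints. A real schema
# (JSON Schema, Pydantic, protobuf) would need its own reader.
SCHEMA = {
    "age": {"type": "int", "min": 0, "max": 150},
    "username": {"type": "str", "max_len": 16},
}


def boundary_cases(schema: dict[str, dict[str, Any]]) -> list[Case]:
    """Derive one passing and one failing case at every declared bound."""
    cases: list[Case] = []
    for field, spec in schema.items():
        if spec["type"] == "int":
            lo, hi = spec["min"], spec["max"]
            cases += [
                Case(field, lo, True),
                Case(field, hi, True),
                Case(field, lo - 1, False),  # just below the minimum
                Case(field, hi + 1, False),  # just above the maximum
            ]
        elif spec["type"] == "str":
            n = spec["max_len"]
            cases += [
                Case(field, "", True),
                Case(field, "x" * n, True),
                Case(field, "x" * (n + 1), False),  # one char too long
            ]
    return cases


for case in boundary_cases(SCHEMA):
    print(case)
```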

2) Exploration in a safe sandbox

Let it prototype quickly, then narrow the change down before you merge.

A useful loop:

ask for a minimal diff,
ask for a risk list,
apply only the safest parts,
run full checks.
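The last step of that loop is the easiest one to automate. A minimal sketch, assuming your checks are plain commands: the hypothetical `run_checks` helper runs them in order and stops at the first failure. The demo commands are stand-ins so the sketch runs anywhere; in a real repo you would list your own test, lint, and type-check invocations.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    returncode: int
    output: str


def run_checks(checks: list[tuple[str, list[str]]]) -> list[CheckResult]:
    """Run each (name, argv) check in order; stop at the first failure."""
    results: list[CheckResult] = []
    for name, argv in checks:
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append(CheckResult(name, proc.returncode, proc.stdout + proc.stderr))
        if proc.returncode != 0:
            break  # later checks are noise once the tree is known to be broken
    return results


if __name__ == "__main__":
    # Stand-in commands; swap in your real test/lint/type-check commands.
    demo = [
        ("passes", [sys.executable, "-c", "print('ok')"]),
        ("fails", [sys.executable, "-c", "raise SystemExit(3)"]),
        ("skipped", [sys.executable, "-c", "print('never runs')"]),
    ]
    for r in run_checks(demo):
        print(r.name, r.returncode)
```

Stopping early is a design choice: it keeps the loop fast, at the cost of not seeing every failure in one run.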

3) Translation layer

Codex does well at translating intent across languages: from "here’s this Python function" to "give me an equivalent TS utility with types".

You still have to review the result, because translation is where small behavior changes hide.
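Integer arithmetic is a classic hiding spot. Python's `//` rounds toward negative infinity and its `%` takes the sign of the divisor, while JavaScript's `%` takes the sign of the dividend, and a port written as `Math.trunc(n / size)` rounds toward zero. The sketch below stays in Python so it runs on its own; the `*_as_translated` functions are hypothetical stand-ins that emulate what such a TS port would compute.

```python
import math


def bucket(n: int, size: int) -> int:
    # Original Python: floor division rounds toward negative infinity.
    return n // size


def bucket_as_translated(n: int, size: int) -> int:
    # Emulates a TS port written as Math.trunc(n / size): rounds toward zero.
    return math.trunc(n / size)


def slot(n: int, size: int) -> int:
    # Original Python: the remainder takes the sign of the divisor.
    return n % size


def slot_as_translated(n: int, size: int) -> int:
    # Emulates JavaScript's n % size: the remainder takes the sign of the
    # dividend, like C's fmod.
    return int(math.fmod(n, size))


for n in (7, -7):
    print(n, bucket(n, 2), bucket_as_translated(n, 2), slot(n, 3), slot_as_translated(n, 3))
# 7 3 3 1 1
# -7 -4 -3 2 -1
```

Both versions agree on positive inputs, which is why a quick spot check with "normal" numbers passes and the bug ships anyway.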

Where it fails most often

Ambiguous success criteria: if you didn’t define acceptance criteria, it optimizes for output that looks plausible, not for correctness.
Stateful domains: migrations, rollout strategies, or security-sensitive flows often need human sequencing.
Long-running assumptions: it can write very clean code for systems that require production-grade judgment calls, but clean code is not the same as sound judgment.

How I keep it useful

I pair each Codex run with hard constraints:

exact directory scope,
explicit file ownership,
and a strict validation checklist.

Then it’s no longer “AI writing code blindly.”
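The directory-scope constraint is easy to enforce mechanically. A minimal sketch, assuming you can get the list of changed paths (for example from `git diff --name-only`) and that scope is expressed as a list of allowed directories; `out_of_scope` is a hypothetical helper, not part of any tool. It compares path components rather than string prefixes, so `services/billing-v2` does not slip in under `services/billing`, and it collapses `..` segments first. It needs Python 3.9+ for `PurePath.is_relative_to`.

```python
import posixpath
from pathlib import PurePosixPath


def out_of_scope(changed: list[str], allowed_dirs: list[str]) -> list[str]:
    """Return the changed paths that fall outside every allowed directory."""
    roots = [PurePosixPath(posixpath.normpath(d)) for d in allowed_dirs]
    violations = []
    for raw in changed:
        # normpath turns "a/b/../c" into "a/c", so ".." cannot escape the scope.
        path = PurePosixPath(posixpath.normpath(raw))
        # is_relative_to compares whole components, not string prefixes.
        if not any(path.is_relative_to(root) for root in roots):
            violations.append(raw)
    return violations


changed = [
    "services/billing/api.py",
    "services/billing-v2/api.py",
    "services/billing/../auth/keys.py",
    "README.md",
]
print(out_of_scope(changed, ["services/billing"]))
# ['services/billing-v2/api.py', 'services/billing/../auth/keys.py', 'README.md']
```

A non-empty result is a reason to reject the run outright rather than review it file by file.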

One practical rule

If the change can be verified automatically, without manual inference, Codex is great. If it affects system behavior you cannot easily assert in a test, I treat Codex as a source of suggestions and implement the change by hand.

That keeps the speed without letting quality drift.
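One way to make a refactor measurable without manual inference is a differential check: run the old and new implementations on the same inputs and report the first disagreement. A minimal Python sketch; the `slugify` variants are hypothetical examples, and seeded random strings plus a few hand-picked edge cases stand in for whatever input space your code actually has.

```python
import random
import string
from typing import Callable, Iterable, Optional


def legacy_slugify(title: str) -> str:
    # Hypothetical original implementation.
    return "-".join(title.lower().split())


def refactored_slugify(title: str) -> str:
    # Hypothetical rewrite with identical behavior.
    return "-".join(word.lower() for word in title.split())


def buggy_slugify(title: str) -> str:
    # Hypothetical rewrite with a subtle bug: split(" ") keeps empty strings
    # for repeated spaces and does not split on tabs.
    return "-".join(title.lower().split(" "))


def find_mismatch(
    old: Callable[[str], str],
    new: Callable[[str], str],
    inputs: Iterable[str],
) -> Optional[str]:
    """Return the first input where old and new disagree, else None."""
    for x in inputs:
        if old(x) != new(x):
            return x
    return None


rng = random.Random(0)  # fixed seed so any failure is reproducible
alphabet = string.ascii_letters + string.digits + " \t"
samples = ["", "  Hello   World ", "tab\tseparated"] + [
    "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 20)))
    for _ in range(1000)
]

print(find_mismatch(legacy_slugify, refactored_slugify, samples))  # None
print(repr(find_mismatch(legacy_slugify, buggy_slugify, samples)))  # '  Hello   World '
```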

When it works, it does what people underestimate: it removes drudgery so the engineer can focus on decisions.