
Codex in practice: what it is good at, what it is not

Jigar Patel

Codex is easy to overestimate.

If you treat it like a pair programmer with infinite patience, you get good velocity on repeatable tasks. If you treat it like a principal engineer who has the full context, you get nonsense and flaky defaults.

I use Codex as a force multiplier for three kinds of tasks:

  • Boilerplate creation
  • Refactor sweeps across many files
  • Test scaffolding and edge-case hunting

That sounds small, but this is where it shines.

The best uses

1) Structural repetition

Codex is excellent when the task is repetitive but large.

Examples:

  • adding endpoints across services,
  • converting old CLI scripts into typed wrappers,
  • and generating test cases from existing schema definitions.
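
As a rough illustration of the last item, here is a minimal sketch of deriving boundary-value test cases from a schema. The schema format, the `USER_SCHEMA` contents, and the `boundary_cases` helper are all hypothetical, invented for this example; a real project would read its own schema definitions instead.

```python
from typing import Any

# Hypothetical schema format, invented for this sketch: field name -> constraints.
USER_SCHEMA: dict[str, dict[str, Any]] = {
    "age": {"type": "int", "min": 0, "max": 130},
    "name": {"type": "str", "max_len": 5},
}


def boundary_cases(schema: dict[str, dict[str, Any]]) -> list[tuple[str, Any, bool]]:
    """Return (field, value, should_be_valid) triples at each constraint boundary."""
    cases: list[tuple[str, Any, bool]] = []
    for field, rules in schema.items():
        if rules["type"] == "int":
            lo, hi = rules["min"], rules["max"]
            cases += [
                (field, lo, True),       # lowest allowed value
                (field, hi, True),       # highest allowed value
                (field, lo - 1, False),  # just below the range
                (field, hi + 1, False),  # just above the range
            ]
        elif rules["type"] == "str":
            n = rules["max_len"]
            cases += [
                (field, "x" * n, True),         # exactly at the length limit
                (field, "x" * (n + 1), False),  # one character too long
            ]
    return cases


if __name__ == "__main__":
    for field, value, ok in boundary_cases(USER_SCHEMA):
        print(f"{field}={value!r} expected_valid={ok}")
```

This is the kind of output that is cheap to review: each generated case maps back to one line of the schema, so a wrong case is easy to spot.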

2) Exploration in a safe sandbox

Let it prototype quickly, then narrow down before merge.

A useful loop:

  1. ask for a minimal diff,
  2. ask for a risk list,
  3. apply only the safest parts,
  4. run full checks.

3) Translation layer

Codex does well translating intent across languages: from "here’s this Python function" to "give me an equivalent TS utility with types".

You still review, because translation is where small behavior changes hide.
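
One concrete case of such a hidden change, sketched in Python: the `%` operator floors in Python but truncates toward zero in JavaScript/TypeScript, so a literal port of `i % n` behaves differently for negative inputs. The function names below are hypothetical, and `math.fmod` is used only to emulate the truncating rule for small integers.

```python
import math


def wrap_index(i: int, n: int) -> int:
    """Python original: wrap an index into range(n) using floored modulo."""
    return i % n


def wrap_index_literal_port(i: int, n: int) -> int:
    """Emulates a literal `i % n` port to a language whose % truncates
    toward zero. math.fmod follows that C-style sign rule."""
    return int(math.fmod(i, n))


def wrap_index_safe_port(i: int, n: int) -> int:
    """Emulates the `((i % n) + n) % n` idiom, which matches the Python
    result under the truncating rule (assuming n > 0)."""
    return int(math.fmod(math.fmod(i, n) + n, n))


if __name__ == "__main__":
    print(wrap_index(-1, 5))               # 4
    print(wrap_index_literal_port(-1, 5))  # -1
    print(wrap_index_safe_port(-1, 5))     # 4
```

Both versions agree on every non-negative input, which is why a translation like this can pass a casual review and a happy-path test.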

Where it fails most often

  • Ambiguous success criteria: if you didn’t define acceptance, it optimizes for plausible output, not correctness.
  • Stateful domains: migrations, rollout strategies, or security-sensitive flows often need human sequencing.
  • Long-running assumptions: it can write very clean code that still misses the production-grade judgment calls the system requires.

How I keep it useful

I pair each Codex run with hard constraints:

  • exact directory scope,
  • explicit file ownership,
  • and a strict validation checklist.

Then it’s no longer “AI writing code blindly.”
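
The directory-scope constraint is the easiest of these to enforce mechanically. A minimal sketch, assuming the list of changed paths comes from something like `git diff --name-only`; the `ALLOWED_DIRS` values and the `out_of_scope` helper are hypothetical names for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical scope for one run: the only directories the tool may touch.
ALLOWED_DIRS = [PurePosixPath("services/billing"), PurePosixPath("tests/billing")]


def out_of_scope(changed_files: list[str], allowed=ALLOWED_DIRS) -> list[str]:
    """Return the changed paths that fall outside every allowed directory."""
    violations = []
    for name in changed_files:
        path = PurePosixPath(name)
        # Reject ".." outright: PurePosixPath does not resolve it, so a path
        # could otherwise look in scope while pointing elsewhere.
        escapes = ".." in path.parts
        in_scope = any(path.is_relative_to(d) for d in allowed)  # Python 3.9+
        if escapes or not in_scope:
            violations.append(name)
    return violations


if __name__ == "__main__":
    changed = [
        "services/billing/api.py",
        "services/billing_v2/api.py",
        "tests/billing/../../deploy.yaml",
    ]
    print(out_of_scope(changed))
```

Comparing path components rather than string prefixes matters here: `services/billing_v2` starts with the string `services/billing` but is not inside that directory.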

One practical rule

If the change can be measured without manual inference, Codex is great. If the change affects system behavior you cannot easily assert, I treat Codex as a source of suggestions and implement the change by hand.

That keeps the speed and keeps quality from drifting.
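
One way to read "measured without manual inference" is that an automated check can compare behavior before and after the change. A minimal sketch under that assumption, with a hypothetical `mismatches` helper and a made-up pair of `slugify` functions standing in for a refactor:

```python
from typing import Any, Callable, Iterable


def mismatches(old: Callable[[Any], Any], new: Callable[[Any], Any],
               inputs: Iterable[Any]) -> list[Any]:
    """Return every input on which `new` disagrees with `old`."""
    return [value for value in inputs if old(value) != new(value)]


# Hypothetical before/after pair for a refactor.
def slugify_old(title: str) -> str:
    # Splits on any run of whitespace, so repeated spaces collapse.
    return "-".join(title.lower().split())


def slugify_new(title: str) -> str:
    # Looks equivalent, but turns every single space into a dash.
    return title.lower().replace(" ", "-")


if __name__ == "__main__":
    samples = ["Hello World", "Hello  World", "single"]
    print(mismatches(slugify_old, slugify_new, samples))
```

If a check like this can be written, the change is assertable and the tool can be given more room. If it cannot, that is a signal to fall back to treating the output as a suggestion.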

When it works, it does what people underestimate: it removes drudgery so the engineer can focus on decisions.