Engineering with AI agents · field guide

Agentic Loop
quality loops for shipping with AI agents

The goal, plain and simple: ship quality products by running quality loops and delegating as much as possible to agentic loops.

What is an agentic quality loop?

A quality loop is a cycle where one model builds, a model from a different family reviews it adversarially, a runtime gate verifies it against a real environment, and the human only steps in at the irreversible gates. The core idea: stop prompting every step by hand and design the loop that prompts your agents. Three things lift quality above an ordinary review: a reviewer of a different lineage (it doesn't share your blind spots), real runtime verification (diff-correct ≠ works), and structured findings (data, not prose).

1 · The why — the bottleneck is the human orchestrating

A manual adversarial loop already produces solid code, but orchestration costs human time in the seams: build, write the review prompt, confront, decide GO, and re-invoke between steps. You don't lose quality; you lose your time.

The painful evidence (anonymized): a change survived ~20 rounds of adversarial review and still failed in production, because no round ever executed the flow against the real API/environment (a provider permission wasn't approved). The review looks at the diff, not the runtime. Not a model failure — a process gap.

2 · The method

build → verify locally → adversarial review (different model) → confront/fix → re-review → runtime smoke vs real env → GO/NO-GO → integrate → deploy → next phase
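
The chain above can be sketched as a small driver. This is a minimal illustration, not the skill's actual implementation: `run_phase`, the `Finding` fields, and the `max_rounds` cap are all hypothetical names, and each step is injected as a callable so the builder, the reviewer, and the smoke test can be whatever tools you use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    severity: str  # assumed scale, e.g. "high" / "medium" / "low"
    location: str  # "file:line"
    status: str    # "open" or "resolved"


def run_phase(
    build: Callable[[], None],
    verify_locally: Callable[[], bool],
    review: Callable[[], List[Finding]],
    fix: Callable[[List[Finding]], None],
    runtime_smoke: Callable[[], bool],
    max_rounds: int = 5,  # hypothetical cap on review rounds
) -> str:
    """Run one phase of the loop and return "GO" or "NO-GO"."""
    build()
    if not verify_locally():
        return "NO-GO"
    for _ in range(max_rounds):
        open_findings = [f for f in review() if f.status == "open"]
        if not open_findings:
            break  # the reviewer came back clean
        fix(open_findings)
        if not verify_locally():
            return "NO-GO"
    else:
        return "NO-GO"  # rounds exhausted without a clean review
    # A clean diff is not enough: the runtime gate has the last word.
    return "GO" if runtime_smoke() else "NO-GO"
```

Note that the smoke test runs after the review converges, so a perfect diff can still end in NO-GO. Integrate and deploy would hang off a "GO" result.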

3 · State of the art

How to work

Design the loop, not the prompt

  • Delegate the "after". Run the server, verify, commit, push, PR, pull comments and fix, re-review, merge, next — all delegable. That's where the value is.
  • Don't look at the code too early. Let another agent review it before you do; come in last.
  • Let loops create loops, and let the shape follow the work. Don't hard-code a persona zoo; let the problem dictate the structure.
  • Isolate (worktrees) so concurrent loops don't collide.
  • Linear goal vs dynamic workflow — pick per task.
  • Confront, don't obey — verify every finding against real code.
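
As one way to picture the isolation bullet, here is a minimal sketch that gives each loop its own git worktree and branch. The `loop/<id>` branch naming, the sibling-directory layout, and the `main` default base are assumptions made for illustration; only `git worktree add -b` itself is standard git.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_command(repo: Path, loop_id: str, base: str = "main") -> List[str]:
    """Build the `git worktree add` command for one loop (naming is hypothetical)."""
    path = repo.parent / f"{repo.name}-{loop_id}"  # sibling directory per loop
    branch = f"loop/{loop_id}"                      # one branch per loop
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktree(repo: Path, loop_id: str, base: str = "main") -> None:
    """Create the worktree; raises CalledProcessError if git refuses."""
    subprocess.run(worktree_command(repo, loop_id, base), check=True)
```

Each concurrent loop then builds, reviews, and commits inside its own directory, so two agents never edit the same checkout.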

What to aim at

Think bigger

  • Experimentation is cheap now. Just as the cloud made scaling cheap, AI makes writing code cheap, so you can place bigger bets.
  • You can go horizontal: cover the whole range with "functional but simple" features, plus extensibility so users can go deep.
  • Stop building glue. Fixing seams one after another is living in the margins; reinvent the whole piece when it makes sense.
  • Don't just automate the old work — enable new work. Push until you hit the wall; it's farther than you think.

Synthesis: quality loops are the production system; ambition is where you aim them. One without the other falls short.

4 · Capabilities people under-use

OpenAI Codex (CLI)

The right engine

  • codex exec is the automation primitive; headless codex review is not first-class yet.
  • --output-schema: findings as structured JSON (severity/file:line/status/fix). The key upgrade.
  • -s read-only · -a never · codex exec - (prompt via stdin).
  • "Skills define the method, automations define the schedule."
  • Subagents: agents.max_depth=1 by default (raising it = costly fan-out).
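
To make "findings as structured JSON" concrete, here is a sketch of a findings schema and the reviewer invocation. The field names mirror the severity/file:line/status/fix list above, but the exact schema, the enum values, and the helper names are assumptions. The flag spellings are copied from the bullets and should be checked against your installed Codex version.

```python
import json
from pathlib import Path
from typing import List

# Hypothetical schema: one object holding a list of findings.
FINDINGS_SCHEMA = {
    "type": "object",
    "properties": {
        "findings": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "severity": {"type": "string", "enum": ["high", "medium", "low"]},
                    "location": {"type": "string"},  # "file:line"
                    "status": {"type": "string", "enum": ["open", "resolved"]},
                    "fix": {"type": "string"},
                },
                "required": ["severity", "location", "status", "fix"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["findings"],
    "additionalProperties": False,
}


def review_command(schema_path: Path) -> List[str]:
    """Reviewer invocation: read-only sandbox, prompt read from stdin ("-")."""
    return ["codex", "exec", "--output-schema", str(schema_path), "-s", "read-only", "-"]


def open_findings(raw: str) -> List[dict]:
    """Parse the reviewer's JSON reply and keep only unresolved findings."""
    data = json.loads(raw)
    return [f for f in data["findings"] if f["status"] == "open"]
```

In use, you would write `FINDINGS_SCHEMA` to a file, run `review_command(...)` with the adversarial prompt on stdin, and pass the reply to `open_findings`. Because the result is data, the next round can be told exactly which findings are still open.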

Claude Code / Agent SDK

Commonly under-used

  • Headless claude -p + --json-schema, --bare (reproducible CI), --resume (stateful loops), --permission-mode dontAsk.
  • Native orchestration: multi-agent workflows (pipeline/parallel, schema, loop-until-dry, adversarial verify), subagents, background monitors, scheduled wake-ups.
  • Hooks (pre/post-tool) as deterministic gates; cloud multi-agent review.

Insight: much of what people coordinate "by hand" with threads already exists natively in these tools. Don't invent infrastructure; use what's there.

5 · The autonomy ladder

Guiding principle: autonomy scales with reversibility, not with model quality. Self-drive where a mistake degrades safely; human-gate where it's irreversible.

L0 · The agent builds + reviews-with-another-model + decides GO per phase. The human: mission + irreversible gate + supervision.
L1 · The skill runs phase→phase without human re-invocation (notifies / interruptible) + runtime gate. The agent decides GO.
L2 · Dynamic multi-lens review + auto-merge to staging for the safe class (additive / behind a flag / reversible).
L3 · Self-served missions (pull from a queue/backlog on a schedule). The human = exception handler.
NEVER · Production / money / destructive data / migrations: human-gated by policy, not by incapacity.
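
The ladder's gating rule can be written down as a small policy function. This is a sketch under assumptions: the `Change` fields, the tag names, and the return labels are hypothetical, and the safe class is read here as "any of additive, behind a flag, or reversible".

```python
from dataclasses import dataclass

# Categories that stay human-gated by policy (the NEVER row).
IRREVERSIBLE = frozenset({"money", "destructive-data", "migration"})


@dataclass(frozen=True)
class Change:
    target: str                       # e.g. "staging" or "production"
    additive: bool
    behind_flag: bool
    reversible: bool
    touches: frozenset = frozenset()  # hypothetical tags, e.g. {"migration"}


def gate(change: Change) -> str:
    """Return who holds the gate for this change."""
    if change.target == "production" or change.touches & IRREVERSIBLE:
        return "human"       # never delegated, whatever the model quality
    safe = change.additive or change.behind_flag or change.reversible
    if change.target == "staging" and safe:
        return "auto-merge"  # L2: safe class goes to staging unattended
    return "agent-go"        # L0/L1: the agent decides GO per phase
```

The function never looks at which model produced the change. Only reversibility and blast radius decide the gate.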

6 · The skill

Installable as a Claude Code skill — /agentic-loop. It discovers the project's verify command, diff base, and runtime target, then runs the loop. See QUALITY_LOOPS.md and SKILL.md.

Piece         What it does
Engine        Reviewer of a different family via codex exec (or claude -p + --json-schema), adversarial prompt via stdin, read-only, background.
Findings      Structured JSON via --output-schema → re-review with memory.
Synthesis     Triaged: real / false positives and why / deferred with backstop. Not the raw dump.
Runtime gate  Smoke against a real environment. NO-GO even if the diff is perfect.
Shape         Dynamic per problem; no fixed persona panel.
Decision      The agent decides GO; the human holds the irreversible gate. Once predictable → schedule it.
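
The Synthesis, Runtime gate, and Decision rows fit together roughly as follows. This is an illustrative sketch, not the skill's code: the `verdicts` mapping (the result of confronting each finding against the real code), the dictionary keys, and the function names are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Triaged:
    real: List[dict] = field(default_factory=list)
    false_positives: List[dict] = field(default_factory=list)
    deferred: List[dict] = field(default_factory=list)


def triage(findings: List[dict], verdicts: Dict[str, dict]) -> Triaged:
    """Sort raw findings into real / false positive (with why) / deferred (with backstop)."""
    out = Triaged()
    for f in findings:
        # A finding nobody has confronted yet is treated as real.
        v = verdicts.get(f["location"], {"verdict": "real"})
        if v["verdict"] == "false_positive":
            out.false_positives.append({**f, "why": v["why"]})
        elif v["verdict"] == "deferred":
            out.deferred.append({**f, "backstop": v["backstop"]})
        else:
            out.real.append(f)
    return out


def decide(triaged: Triaged, smoke_passed: bool) -> str:
    """GO only when no real finding remains and the runtime smoke passed."""
    if not smoke_passed:
        return "NO-GO"  # even if the diff is perfect
    return "NO-GO" if triaged.real else "GO"
```

Keeping the reason for each false positive and the backstop for each deferral is what gives the next review round memory instead of a raw dump.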

7 · 12 principles for quality loops

  1. Delegate the "after". The value is automating what you do after prompting, not the prompt.
  2. Don't look at the code too early. Let another agent review it before you do.
  3. Two model families > one. The reviewer must be a different lineage than the builder.
  4. Diff-correct ≠ works. Always a real runtime gate.
  5. Findings as data, not prose. --output-schema / --json-schema.
  6. Dynamic shape, not a persona zoo. Let the problem dictate structure.
  7. Isolate so loops don't collide.
  8. Confront, don't obey. Verify every finding against real code.
  9. Autonomy = reversibility. Human gate only on the irreversible.
  10. Treat the limit as a challenge. On a subscription, loop freely; on metered API usage, measure cost first.
  11. Skill = method; automation = schedule. In that order.
  12. Aim at something that seems impossible. The wall is farther than you think.

8 · Your next loop — quick guide

When an agent finishes, don't open the editor to read the code. Ask it "can you do the next step yourself?" and watch how far it gets. This is the habit that moves the needle most, and it'll surprise you.