compound

Overview

compound is the Claude Code plugin EveryInc/compound-engineering-plugin, maintained by Every Inc. (the media and software company behind the Cora email product). The plugin was developed in public alongside Kieran Klaassen’s “compound engineering” writing, and it packages that methodology as a large bundle of slash commands, skills, and specialist agents. The benchmark installs it via `claude plugin marketplace add EveryInc/compound-engineering-plugin` and pins release 2.65.0 (MIT license). The surface area is large, roughly fifty agents and forty skills, but the user-facing primitives cluster into four phases (/ce:plan, /ce:work, /ce:review, and /ce:compound) plus a handful of research, design, and git utilities.

The name comes from the methodology’s thesis that each engineering task should leave the system easier to work on next time, because plans, reviews, and learnings are codified back into the repository. In practice that shows up as markdown plan artifacts in docs/plans/, reviewer-persona subagents, and a /ce:compound step that writes learnings to a wiki-style store.

Entry point: `/lfg`

The benchmark drives the plugin through its beta autonomous entry point, /compound-engineering:lfg (“let’s f***ing go”). The skill file is explicit about ordering: it is a fixed six-step pipeline with stop-gates between phases.

1. Optionally delegate to a ralph-loop skill if one is present (not installed in the benchmark).
2. /ce:plan $ARGUMENTS, gated on producing a plan file under docs/plans/.
3. /ce:work, gated on observing code changes beyond the plan.
4. /ce:review mode:autofix plan:<plan-path>, which passes the plan path so the review can check requirement coverage.
5. /compound-engineering:todo-resolve.
6. /compound-engineering:test-browser, then emit <promise>DONE</promise>.

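So lfg is a linear plan-work-review pipeline with a todo-resolution and browser-test tail, not a freeform agent. None of the six steps invokes /ce:compound, so the learning-capture phase is not part of an lfg run. Phases are sequenced with “GATE: STOP” language rather than parallel fan-out.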
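The gate-and-stop ordering can be pictured as a strictly sequential runner in which each phase must pass its gate before the next one starts. This is a minimal TypeScript sketch, not the plugin's code: lfg is a markdown skill file interpreted by the model, and the `WorkspaceState` and `Phase` types, the `runPipeline` function, the gate predicates, and the example plan path are all hypothetical stand-ins for the “GATE: STOP” checks on the first two steps.

```typescript
// Hypothetical model of a gated, strictly sequential pipeline.
// Not the plugin's implementation: lfg is a prompt-level skill file.

interface WorkspaceState {
  planFiles: string[];    // files written under docs/plans/
  changedFiles: string[]; // files changed in the working tree
}

interface Phase {
  name: string;
  run: (state: WorkspaceState) => WorkspaceState;
  // Checked after the phase runs; false means "GATE: STOP".
  gate: (state: WorkspaceState) => boolean;
}

interface PipelineResult {
  completed: string[];
  stoppedAt?: string;
}

function runPipeline(phases: Phase[], initial: WorkspaceState): PipelineResult {
  const completed: string[] = [];
  let state = initial;
  for (const phase of phases) {
    state = phase.run(state);
    if (!phase.gate(state)) {
      // Stop here: later phases never start, and nothing runs in parallel.
      return { completed, stoppedAt: phase.name };
    }
    completed.push(phase.name);
  }
  return { completed };
}

// Gate 1: a plan file must exist under docs/plans/.
const plan: Phase = {
  name: "ce:plan",
  run: (s: WorkspaceState) => ({ ...s, planFiles: [...s.planFiles, "docs/plans/example-plan.md"] }),
  gate: (s: WorkspaceState) => s.planFiles.some((f) => f.startsWith("docs/plans/")),
};

// Gate 2: at least one changed file that is not a plan file.
const work: Phase = {
  name: "ce:work",
  run: (s: WorkspaceState) => s, // this example work phase changes nothing
  gate: (s: WorkspaceState) => s.changedFiles.some((f) => s.planFiles.indexOf(f) === -1),
};

const result = runPipeline([plan, work], { planFiles: [], changedFiles: [] });
console.log(result); // stops at the ce:work gate with only ce:plan completed
```

A work phase that records a code change outside docs/plans/ would pass the second gate and let the pipeline continue; the strictly sequential loop is the contrast with parallel fan-out.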

What the transcripts actually show

The feature-task transcript set (trial t1) contains five top-level JSONL sessions and thirteen subagent logs: mostly Explore agents for codebase research, plus correctness, kieran-typescript, and testing reviewer personas dispatched during /ce:review. Tool usage skews toward analysis over editing: 57 Read, 51 Bash, 35 Grep, and 13 Agent (Task) calls, against only 12 write-side calls (Edit plus Write). One session begins with the raw lfg command as advertised; another begins with the harness re-prompting the model to run ce:plan, ce:work, and ce:review explicitly, which suggests the orchestrator did not always drive the full pipeline cleanly and needed a manual nudge.

Bugfix and refactor runs are much leaner: a single top-level session with a single Explore subagent. The prompt is the same /compound-engineering:lfg $TASK used for the feature task; the difference is that neither short brief warrants large plan/review fan-out. The refactor run has the expected edit-heavy profile (24 Edit, 8 Write); the bugfix run stays Bash-and-Read dominant (35 Bash, 16 Read, 7 Edit).
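One way to compare the three tool profiles using only counts that appear in this report is the ratio of write-side calls (Edit plus Write) to Bash calls. This is an illustrative sketch: the `ToolProfile` shape and `writeToBashRatio` function are hypothetical, the refactor Bash count (16) and the feature Edit/Write split (7/5) come from the trial cards, and the bugfix Write count (1) assumes a card's "New files" figure equals its Write calls, which matches the feature and refactor numbers.

```typescript
// Tool counts per run, taken from the text and the t1 trial cards.
interface ToolProfile {
  task: string;
  edit: number;
  write: number;
  bash: number;
}

const profiles: ToolProfile[] = [
  { task: "feature", edit: 7, write: 5, bash: 51 },
  { task: "bugfix", edit: 7, write: 1, bash: 35 }, // write assumed from "New files"
  { task: "refactor", edit: 24, write: 8, bash: 16 },
];

// Write-side calls per Bash call: above 1 means the run edited more than it shelled out.
function writeToBashRatio(p: ToolProfile): number {
  return (p.edit + p.write) / p.bash;
}

for (const p of profiles) {
  console.log(`${p.task}: ${writeToBashRatio(p).toFixed(2)}`);
}
// feature: 0.24
// bugfix: 0.23
// refactor: 2.00
```

On this crude measure the feature and bugfix runs look alike (roughly 0.2 write-side calls per Bash call), while the refactor run is eight to nine times more edit-oriented.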

Benchmark outcome

Across the three tasks compound lands at rank 7 of 9 with a mean z-score of -0.167. It is consistently near the cohort average rather than catastrophically weak:

Feature (z = -0.082, rank 6/9): produces a 5-file, 892-insertion diff with 92 of 92 tests passing and zero TypeScript errors. ESLint reports 10 errors and 6 warnings, so the plugin did not close the loop on style gates despite running the review phase. Lands in the tier-2 group with omc and claudekit.
Bugfix (z = -0.487, rank 7/9): the smallest diff of the cohort at 2 files and 33 insertions. The change compiles and tests pass, but judges rated it a minimal patch relative to peers that wrote more substantial fixes.
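Refactor (z = +0.070, rank 4/9): the only task where compound is above average, with a 12-file, 157-insertion / 83-deletion diff. This is the kind of multi-file edit the plugin’s reviewer agents are shaped to support, although neither refactor trial actually dispatched a reviewer (each ran a single Explore subagent).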
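The headline figure is the mean of the three per-task z-scores. Averaging the rounded values listed above gives about -0.1663; the reported -0.167 presumably comes from averaging unrounded scores, since rounding each input to three decimals can shift the mean by up to 0.0005. A minimal check, where `taskZScores` and `mean` are illustrative names:

```typescript
// Per-task z-scores as reported above (already rounded to three decimals).
const taskZScores: number[] = [-0.082, -0.487, 0.07]; // feature, bugfix, refactor

function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

const overallZ = mean(taskZScores);
console.log(overallZ.toFixed(4)); // -0.1663
```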

Self-preference effects on refactor are strongly positive (+12.1 judge points), meaning the opus judge liked compound’s refactor output more than the cross-judge consensus did; this pattern is common across the cohort and does not distinguish compound.

Reading of the result

compound is not a lightweight prompt — it ships a full plan-then-work-then-review pipeline with reviewer personas, gates, and a learning-capture step. In the benchmark’s single-shot /lfg invocation the pipeline does run, but its advantages (reviewer fan-out, plan artifacts) are most visible on the feature task, where they translate into a large, test-passing diff but not higher judge scores than leaner tools like pure or bmad. On the short tasks the overhead is not repaid, and the tool drifts to the middle of the pack. Residual ESLint errors on the feature task suggest the review phase verified logic but not style, which is consistent with an autofix review mode scoped to correctness personas rather than lint gates.
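The lfg steps listed earlier include no lint stop-gate. A hypothetical extra gate that fails on any ESLint error would have stopped the feature run. The sketch assumes per-file results carrying `errorCount` and `warningCount`, the field names ESLint uses on its `LintResult` objects, but it does not call ESLint itself; `lintGate`, its `maxWarnings` parameter, and the single aggregated entry are illustrative.

```typescript
// Subset of the fields ESLint reports per linted file.
interface LintResultLike {
  filePath: string;
  errorCount: number;
  warningCount: number;
}

interface GateVerdict {
  pass: boolean;
  errors: number;
  warnings: number;
}

// Hypothetical style gate: fail on any error, or on warnings above a budget.
function lintGate(results: LintResultLike[], maxWarnings = Infinity): GateVerdict {
  const errors = results.reduce((n, r) => n + r.errorCount, 0);
  const warnings = results.reduce((n, r) => n + r.warningCount, 0);
  return { pass: errors === 0 && warnings <= maxWarnings, errors, warnings };
}

// Totals reported for the feature run, collapsed into one illustrative entry.
const featureRun: LintResultLike[] = [
  { filePath: "(feature diff, aggregated)", errorCount: 10, warningCount: 6 },
];

console.log(lintGate(featureRun)); // pass is false because of the 10 errors
```

Passing a warning budget, for example `lintGate(results, 5)`, would additionally fail a clean-error run that still carried the feature run's 6 warnings.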

The plugin is more interesting as a methodology carrier — plans in docs/plans/, learnings in a wiki — than as a single-shot speed runner, and the benchmark, which rewards diff quality on one attempt with no memory carryover, is a harsh venue for that design.

Observed in trial timelines

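Subagent dispatch is consistent with the orchestration-amortisation hypothesis, i.e. that the pipeline’s overhead only pays off on large briefs: the feature mean of 5.2 (range 1–13, with t1 dispatching 13 Explore and reviewer agents) collapses to a mean of 1.0 on both bugfix and refactor. The multi-agent reviewer pipeline fires only when the brief is large enough to warrant it, and not reliably even then (feature trials t2 and t4 dispatched a single subagent each); on short tasks compound shrinks to a single Explore.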
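The dispatch figures can be recomputed from the Agents counts on the trial cards (feature: 13, 1, 6, 1; bugfix and refactor: 1 per trial). The feature mean is 21 / 4 = 5.25, reported above as 5.2. A minimal sketch; `TaskDispatch` and `summarize` are illustrative names:

```typescript
// Agents (subagent dispatch) counts per trial, copied from the trial cards.
interface TaskDispatch {
  task: string;
  counts: number[];
}

const dispatches: TaskDispatch[] = [
  { task: "feature", counts: [13, 1, 6, 1] },
  { task: "bugfix", counts: [1, 1] },
  { task: "refactor", counts: [1, 1] },
];

interface DispatchSummary {
  mean: number;
  min: number;
  max: number;
}

function summarize(counts: number[]): DispatchSummary {
  const total = counts.reduce((sum, c) => sum + c, 0);
  return { mean: total / counts.length, min: Math.min(...counts), max: Math.max(...counts) };
}

for (const d of dispatches) {
  const s = summarize(d.counts);
  console.log(`${d.task}: mean ${s.mean}, range ${s.min}-${s.max}`);
}
// feature: mean 5.25, range 1-13
// bugfix: mean 1, range 1-1
// refactor: mean 1, range 1-1
```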

Detail: see the per-trial timelines below.

Trial timelines

Per-trial event timelines, auto-extracted from each trial’s session-logs/*.jsonl. Each card shows the subagents dispatched, skill activations, Bash command mix, code mutations, and the final diff, covering the feature, bugfix, and refactor trials.
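The extractor that builds these timelines is not shown here. As a sketch of the idea, the function below tallies tool calls by name from a JSONL transcript, assuming each assistant event carries a `message.content` array whose `tool_use` entries have a `name` field. Treat that schema, the `countToolUses` and `parseLine` helpers, and the sample lines (synthetic, not taken from the benchmark logs) as assumptions to verify against real session logs.

```typescript
// Assumed shape of the relevant part of one session-log line.
interface ContentBlock {
  type: string;
  name?: string; // present on tool_use blocks
}

interface LogEvent {
  type?: string;
  message?: { content?: ContentBlock[] | string };
}

function parseLine(line: string): LogEvent | undefined {
  try {
    return JSON.parse(line) as LogEvent;
  } catch {
    return undefined; // skip truncated or malformed lines
  }
}

// Count tool_use blocks by tool name across a JSONL transcript.
function countToolUses(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const event = parseLine(line);
    if (!event) continue;
    const content = event.message?.content;
    if (event.type !== "assistant" || !Array.isArray(content)) continue;
    for (const block of content) {
      if (block.type === "tool_use" && block.name) {
        counts.set(block.name, (counts.get(block.name) ?? 0) + 1);
      }
    }
  }
  return counts;
}

// Three synthetic lines: one user turn, two assistant turns with tool calls.
const sample = [
  '{"type":"user","message":{"content":"Read the PRD"}}',
  '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Read"},{"type":"tool_use","name":"Bash"}]}}',
  '{"type":"assistant","message":{"content":[{"type":"text"},{"type":"tool_use","name":"Read"}]}}',
].join("\n");

console.log(countToolUses(sample)); // Read → 2, Bash → 1
```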

Feature (4 trials)

t1 05:07 → 05:34 UTC · 27 min

2 commits · 5 files · +892

“<command-message>compound-engineering:lfg</command-message> <command-name>/compound-engineering:lfg</command-name> <command-args>Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD]…”

Agents 13 · New files 5 · Edits 7 · Bash 51 · Skills 4 · Sessions 5

Bash command mix · 51 calls

tests 30
git ops 11
other 5
typecheck 3
inspection 2
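Each card buckets Bash calls into categories such as tests, git ops, typecheck, and inspection. The extractor’s actual rules are not shown; one plausible approach is first-match pattern bucketing. The patterns below, the `classify` and `tallyBash` functions, and the sample commands are hypothetical guesses for a TypeScript monorepo, not the benchmark’s rules.

```typescript
type BashCategory =
  | "git ops" | "tests" | "typecheck" | "lint/format"
  | "install/build" | "inspection" | "other";

// Ordered rules: the first matching pattern wins, so git commands are
// bucketed before keyword searches can misfire on commit messages.
const rules: Array<[BashCategory, RegExp]> = [
  ["git ops", /^git\b/],
  ["tests", /\b(jest|vitest)\b|\b(npm|pnpm|yarn|npx|nx)\b.*\btest\b/],
  ["typecheck", /\btsc\b/],
  ["lint/format", /\b(eslint|prettier)\b|\blint\b/],
  ["install/build", /\b(npm|pnpm|yarn)\s+(install|i|add|build)\b|\bnx\s+build\b/],
  ["inspection", /^(ls|cat|head|tail|wc|find|tree)\b/],
];

function classify(command: string): BashCategory {
  const cmd = command.trim();
  for (const [category, pattern] of rules) {
    if (pattern.test(cmd)) return category;
  }
  return "other";
}

function tallyBash(commands: string[]): Map<BashCategory, number> {
  const counts = new Map<BashCategory, number>();
  for (const c of commands) {
    const category = classify(c);
    counts.set(category, (counts.get(category) ?? 0) + 1);
  }
  return counts;
}

console.log(tallyBash(["git status", "npx jest libs/core", "npx tsc --noEmit", "echo done"]));
```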

Skill activations (4)

ce-plan — Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD] User stories - Mode 2 CD Batc… at 05:07
compound-engineering:ce-plan — Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD] User stories - Mode 2 CD Batc… at 05:07
compound-engineering:ce-work at 05:16
compound-engineering:ce-review — mode:autofix plan:docs/plans/2026-04-15-001-feat-td-cd-mode2-batch-plan.md at 05:26

Subagents dispatched (13)

Explore · Explore Mode 1 CD implementation at 05:07
Explore · Explore CD entities and constants at 05:08
Explore · Read strategy interface and Mode 1 at 05:11
Explore · Explore withdrawal and settlement at 05:11
compound-engineering:review:correctness-reviewer · Correctness review Mode 2 at 05:27
compound-engineering:review:kieran-typescript-reviewer · TypeScript review Mode 2 at 05:27
compound-engineering:review:testing-reviewer · Testing review Mode 2 at 05:27
Explore · Explore Mode 1 implementation at 04:15
Explore · Find Mode strategy DI wiring at 04:19
Explore · Find CD batch entities and holdings at 04:19
…and 3 more

Subagent transcripts (13)

agent-a00b9c2474c5… — Find and report the complete contents of these files in the repo at /Users/randytran/Codes/ai-tool-b… [Read×17, Bash×16]
agent-a0237a5471fc… — Thoroughly explore the Mode 1 CD implementation in this NestJS monorepo. I need to understand: 1. Th… [Read×22, Bash×20, Glob×6, Grep×2]
agent-a0307e435ab8… — Read the following files fully and report their complete contents, method signatures, and patterns.… [Bash×30, Read×30]
agent-a151644ba93f… — Review the new TDCDMode2Strategy TypeScript implementation for type safety, clarity, and maintainabi… [Read×6, Grep×5]
agent-a3f3eea6c27e… — Explore the TD-CD related entities, constants, enums, and DTOs in this NestJS/TypeORM monorepo. I ne… [Read×34, Bash×16]
agent-a595dd7bacbf… — Thoroughly explore the Mode 1 CD implementation in this NestJS monorepo. I need to understand: 1. Th… [Bash×32, Read×24, Grep×1]
agent-a5a37a780a33… — Find how Mode 1 strategy is wired in NestJS DI. Search for: 1. Where TD_CD_MODE_STRATEGY symbol is p… [Bash×20, Read×18, Grep×13, Glob×2]
agent-a6daea87fa1d… — Thoroughly explore the existing Mode 1 TD-CD implementation in this NestJS monorepo. I need to under… [no tools]

New files created (3)

docs/plans/2026-04-15-001-feat-td-cd-mode2-batch-plan.md
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts

t2 14:40 → 15:19 UTC · 39 min

2 commits · 9 files · +722

Agents 1 · New files 7 · Edits 8 · Bash 46 · Skills 5

Bash command mix · 46 calls

other 24
tests 11
inspection 7
typecheck 3
git ops 1

Skill activations (5)

compound-engineering:ce-plan — Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD] User stories - Mode 2 CD Batc… at 14:40
compound-engineering:ce-work — Execute the plan at docs/plans/2026-04-15-001-feat-td-cd-mode-2-cd-batch-plan.md Key constraints: - Implement end-to-en… at 15:06
compound-engineering:ce-review — mode:autofix plan:docs/plans/2026-04-15-001-feat-td-cd-mode-2-cd-batch-plan.md at 15:15
compound-engineering:todo-resolve at 15:19
compound-engineering:test-browser at 15:19

Subagents dispatched (1)

compound-engineering:review:correctness-reviewer · Correctness review of Mode 2 at 15:16

Subagent transcripts (1)

agent-acaa2407f200… — Review the TD-CD Mode 2 implementation committed in HEAD on this repo. Scope: `git diff HEAD~1` — on… [Read×4, Bash×3]

New files created (7)

docs/plans/2026-04-15-001-feat-td-cd-mode-2-cd-batch-plan.md
libs/core/src/domain/savings-cd/cd-aging-days.util.spec.ts
libs/core/src/domain/savings-cd/cd-aging-days.util.ts
libs/core/src/domain/savings-cd/td-cd-mode2-buy-price.util.spec.ts
libs/core/src/domain/savings-cd/td-cd-mode2-buy-price.util.ts
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts

t3 04:41 → 05:07 UTC · 26 min

2 commits · 7 files · +720

“<local-command-caveat>Caveat: The messages below were generated by the user while running local commands. DO NOT respond to these messages or otherwise consider them in your response unless the user explicitly asks you t…”

Agents 6 · New files 5 · Edits 12 · Bash 35 · Skills 5 · Sessions 2

Bash command mix · 35 calls

tests 27
other 4
git ops 4

Skill activations (5)

compound-engineering:ce-plan — Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD] User stories - Mode 2 CD Batc… at 04:41
compound-engineering:ce-work at 04:49
compound-engineering:ce-review — mode:autofix plan:docs/plans/2026-04-16-001-feat-td-cd-mode2-batch-plan.md at 04:58
compound-engineering:todo-resolve at 05:04
compound-engineering:test-browser at 05:04

Subagents dispatched (6)

compound-engineering:research:repo-research-analyst · Research Mode 1 implementation patterns at 04:42
compound-engineering:research:learnings-researcher · Search for institutional learnings at 04:42
compound-engineering:review:correctness-reviewer · Correctness review of Mode 2 at 05:00
compound-engineering:review:testing-reviewer · Testing review of Mode 2 at 05:00
Explore · Explore Mode 1 implementation at 04:36
Explore · Find test files and CD batch models at 04:39

Subagent transcripts (6)

agent-a140e92b682d… — Thoroughly explore the Mode 1 CD implementation in this NestJS monorepo. I need to understand: 1. Th… [Read×24, Bash×21, Glob×2, Grep×2]
agent-a499c46c3fb3… — Review the following diff for correctness issues — logic errors, edge cases, state management bugs,… [Grep×12, Read×10, Bash×3, Glob×2]
agent-a582bd29ac2c… — I need to find several things in this NestJS monorepo at /Users/randytran/Codes/ai-tool-benchmark/ru… [Read×24, Bash×14, Glob×2]
agent-ab31bd190fbd… — Search docs/solutions/ for any relevant past solutions related to: - TD-CD (term deposit / certifica… [Read×10, Grep×6, Bash×4, Glob×1]
agent-ac444e767799… — Review the test files in this diff for coverage gaps, weak assertions, brittle tests, and missing ed… [Read×5, Glob×2, Grep×2]
agent-aed15d199435… — Research the existing Mode 1 TD-CD implementation in this NestJS/TypeORM monorepo. I need to underst… [Read×34, Grep×16, Glob×4, Bash×1]

New files created (5)

docs/plans/2026-04-16-001-feat-td-cd-mode2-batch-plan.md
libs/core/src/domain/savings-cd/cd-aging-days.util.spec.ts
libs/core/src/domain/savings-cd/cd-aging-days.util.ts
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts

t4 09:07 → 09:23 UTC · 16 min

2 commits · 8 files · +671

Agents 1 · New files 5 · Edits 5 · Bash 27 · Skills 6

Bash command mix · 27 calls

tests 14
git ops 7
other 5
inspection 1

Skill activations (6)

ce-plan — Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD] User stories - Mode 2 CD Batc… at 09:07
compound-engineering:ce-plan — Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD] User stories - Mode 2 CD Batc… at 09:07
compound-engineering:ce-work at 09:15
compound-engineering:ce-review — mode:autofix plan:docs/plans/2026-04-16-001-feat-td-cd-mode2-batch-plan.md at 09:21
compound-engineering:todo-resolve at 09:23
compound-engineering:test-browser at 09:23

Subagents dispatched (1)

Explore · Explore Mode 1 implementation at 09:08

Subagent transcripts (1)

agent-a597f0058c73… — Thoroughly explore the Mode 1 TD-CD implementation in this NestJS monorepo. I need to understand the… [Bash×29, Read×26]

New files created (5)

docs/plans/2026-04-16-001-feat-td-cd-mode2-batch-plan.md
libs/core/src/domain/savings-cd/cd-aging-days.util.spec.ts
libs/core/src/domain/savings-cd/cd-aging-days.util.ts
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts

Bugfix (2 trials)

t1 16:20 → 16:32 UTC · 12 min

2 commits · 2 files · +33

“<command-message>compound-engineering:lfg</command-message> <command-name>/compound-engineering:lfg</command-name> <command-args>A QA bug report was filed. Read docs/benchmark/TASK.md for the full report: reproduction st…”

Agents 1 · New files 1 · Edits 7 · Bash 35

Bash command mix · 35 calls

tests 22
other 8
inspection 2
git ops 2
lint/format 1

Subagents dispatched (1)

Explore · Explore savings CD codebase at 16:21

Subagent transcripts (1)

agent-a5b7b9d311c6… — I'm investigating bug SHP-2376 in a NestJS/TypeORM monorepo. The bug: when depositing into a Savings… [Read×14, Grep×10, Glob×3, Bash×1]

New files created (1)

compound-t1/docs/plans/shp-2376-fix-plan.md

t2 16:42 → 16:49 UTC · 7 min

2 commits · 3 files · +137

Agents 1 · New files 2 · Edits 3 · Bash 14

Bash command mix · 14 calls

other 7
tests 3
git ops 2
lint/format 1
inspection 1

Subagents dispatched (1)

Explore · Explore savings-cd batch eligibility at 16:42

Subagent transcripts (1)

agent-af90fcf3eb35… — I need to find the root cause of bug SHP-2376 in this NestJS monorepo. The bug is: deposits into Sav… [Read×13, Bash×10, Grep×6, Glob×1]

New files created (2)

compound-t2/docs/plans/shp-2376-fix-plan.md
compound-t2/libs/savings-cd/src/domain/savings-cd-batch-data-source.spec.ts

Refactor (2 trials)

t1 03:56 → 04:11 UTC · 14 min

2 commits · 12 files · +157

“<command-message>compound-engineering:lfg</command-message> <command-name>/compound-engineering:lfg</command-name> <command-args>A refactor has been requested by the tech lead. Read docs/benchmark/TASK.md for the full br…”

Agents 1 · New files 8 · Edits 24 · Bash 16

Bash command mix · 16 calls

tests 4
git ops 4
inspection 3
install/build 2
lint/format 2
other 1

Subagents dispatched (1)

Explore · Explore core savings-cd structure at 03:57

Subagent transcripts (1)

agent-a008839e16c9… — I need to understand the current code structure for a refactor. The task involves: 1. Moving `cdBatc… [Read×17, Bash×7, Glob×3, Grep×3]

New files created (5)

compound-t1/docs/plans/shp2317-refactor-plan.md
compound-t1/libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
compound-t1/libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts
compound-t1/libs/core/src/model/cd-batch-info.model.ts
compound-t1/libs/core/src/port/service/td-cd-mode-strategy.port.ts

t2 06:33 → 06:45 UTC · 11 min

2 commits · 12 files · +102

Agents 1 · New files 2 · Edits 20 · Bash 12

Bash command mix · 12 calls

other 4
git ops 4
tests 2
inspection 2

Subagents dispatched (1)

Explore · Investigate codebase for SHP-2317 refactor at 06:34

Subagent transcripts (1)

agent-a80009616939… — I need to understand the current codebase structure for a refactor (SHP-2317) in this NX monorepo. T… [Read×19, Bash×6, Glob×5, Grep×5]

New files created (2)

compound-t2/docs/plans/2026-04-21-shp2317-refactor.md
compound-t2/libs/core/src/model/cd-batch-info.model.ts