compound
Overview
compound is the Claude Code plugin EveryInc/compound-engineering-plugin, maintained by Every Inc. (the media and software company behind the Cora email product). The plugin was developed in public alongside Kieran Klaassen’s “compound engineering” writing, and it packages that methodology as a large bundle of slash commands, skills, and specialist agents. The benchmark installs it via claude plugin marketplace add EveryInc/compound-engineering-plugin and pins release 2.65.0 (MIT license). The surface area is large — roughly fifty agents and forty skills — but the user-facing primitives cluster into four phases: /ce:plan, /ce:work, /ce:review, and /ce:compound, plus a handful of research, design, and git utilities.
The methodology behind the name is the thesis that each engineering task should leave the system easier to work on next time, by codifying plans, reviews, and learnings back into the repository. In practice that shows up as markdown artifacts in docs/plans/, reviewer persona subagents, and a /ce:compound step that writes learnings to a wiki-style store.
Entry point: /lfg
The benchmark drives the plugin through its beta autonomous entry point, /compound-engineering:lfg (“let’s f***ing go”). The skill file is explicit about ordering: it is a fixed six-step pipeline with stop-gates between phases.
- Optionally delegate to a ralph-loop skill if present (not installed in the benchmark).
- /ce:plan $ARGUMENTS, gated on producing a plan file under docs/plans/.
- /ce:work, gated on observing code changes beyond the plan.
- /ce:review mode:autofix plan:<plan-path>, which passes the plan path so review can check requirement coverage.
- /compound-engineering:todo-resolve.
- /compound-engineering:test-browser, then emit <promise>DONE</promise>.
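So lfg is a plan-execute-review pipeline with a todo-resolution and browser-test tail, not a freeform agent; the steps listed in the skill file do not include /ce:compound, so the knowledge-capture phase is not part of this entry point. Phases are sequenced with “GATE: STOP” language rather than parallel fan-out.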
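The gated sequencing can be pictured as a loop that runs one phase at a time and stops as soon as a gate condition fails. The sketch below is only an illustration of that control flow: the plugin expresses its gates as prose instructions in a skill file, not as code, and the names Phase, run_pipeline, and the state keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class Phase:
    """One pipeline step plus the condition that must hold before moving on."""
    name: str
    run: Callable[[State], None]
    gate: Callable[[State], bool]


def run_pipeline(phases: List[Phase], state: State) -> List[str]:
    """Run phases strictly in order; stop at the first gate that fails."""
    completed: List[str] = []
    for phase in phases:
        phase.run(state)
        if not phase.gate(state):
            break  # analogous to the skill file's "GATE: STOP"
        completed.append(phase.name)
    return completed


# Hypothetical stand-ins for the real phases, which are slash commands.
def plan(state: State) -> None:
    state["plan_path"] = "docs/plans/example-plan.md"


def work(state: State) -> None:
    state["changed_files"] = ["src/example.ts"]


def review(state: State) -> None:
    state["reviewed"] = True


PHASES = [
    Phase("plan", plan, lambda s: bool(s.get("plan_path"))),
    Phase("work", work, lambda s: bool(s.get("changed_files"))),
    Phase("review", review, lambda s: s.get("reviewed") is True),
]

completed = run_pipeline(PHASES, {})
print(completed)
```

If the work phase produced no code changes, the loop would return only the plan phase as completed, which mirrors the "gated on observing code changes beyond the plan" rule.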
What the transcripts actually show
The feature-task transcript set contains five top-level JSONL sessions and thirteen subagent logs, consistent with the plugin’s reviewer-persona pattern firing during /ce:review. Tool usage skews toward analysis over editing: 57 Read, 51 Bash, 35 Grep, 13 Agent (Task) calls, and only 12 write-side calls (Edit+Write). One session begins with the raw lfg command as advertised; another begins with the harness re-prompting the model to run ce:plan/ce:work/ce:review explicitly, suggesting the orchestrator did not always drive the full pipeline cleanly and required a manual nudge.
Bugfix and refactor runs are much leaner — single top-level sessions, one subagent each, and a single Agent dispatch — because the per-task prompt is just /compound-engineering:lfg $TASK and neither task benefits from large plan/review fan-out. The refactor run has the expected edit-heavy profile (24 Edit, 8 Write); the bugfix run stays Bash-and-Read dominant (35 Bash, 16 Read, 7 Edit).
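Tool tallies like the ones above can be reproduced by counting tool_use blocks in the session logs. The following is a minimal sketch, assuming each JSONL line is an object whose message.content is a list of blocks and that tool calls appear as blocks with type "tool_use" and a name field. That record shape is an assumption about the log format, and count_tool_calls is a hypothetical helper, not part of the benchmark harness.

```python
import json
from collections import Counter
from typing import Iterable


def count_tool_calls(lines: Iterable[str]) -> Counter:
    """Count tool_use blocks by tool name across JSONL session lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort the tally
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        if not isinstance(message, dict):
            continue
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content carries no tool calls
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                counts[block.get("name", "unknown")] += 1
    return counts


# Synthetic sample lines in the assumed shape (not taken from the real logs).
sample = [
    json.dumps({"type": "assistant", "message": {"content": [
        {"type": "tool_use", "name": "Read", "input": {}},
        {"type": "text", "text": "reading files"},
    ]}}),
    json.dumps({"type": "assistant", "message": {"content": [
        {"type": "tool_use", "name": "Read", "input": {}},
        {"type": "tool_use", "name": "Bash", "input": {}},
    ]}}),
    json.dumps({"type": "user", "message": {"content": "plain prompt"}}),
    "not json",
]

counts = count_tool_calls(sample)
write_side = counts["Edit"] + counts["Write"]
print(counts, write_side)
```

Summing the Edit and Write entries gives the "write-side" figure used in the text; a Counter returns zero for tools that never appear, so the sum is safe even for read-only sessions.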
Benchmark outcome
Across the three tasks compound lands at rank 7 of 9 with a mean z-score of -0.167. It is consistently near the cohort average rather than catastrophically weak:
- Feature (z = -0.082, rank 6/9): produces a 5-file, 892-insertion diff with 92 of 92 tests passing and zero TypeScript errors. ESLint reports 10 errors and 6 warnings; the plugin did not close the loop on style gates despite running the review phase. It falls in the tier-2 group with omc and claudekit.
- Bugfix (z = -0.487, rank 7/9): the smallest diff of the cohort at 2 files and 33 insertions. The change compiles and tests pass, but judges rated it a minimal patch relative to peers that wrote more substantial fixes.
- Refactor (z = +0.070, rank 4/9): the only task where compound is above average, with a 12-file, 157-insertion / 83-deletion diff, the kind of multi-file edit the plugin’s reviewer agents are shaped to support.
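As a quick arithmetic check, the overall mean can be re-derived from the three per-task z-scores listed above. The inputs here are the rounded values printed in this report, so the result is only expected to match the reported -0.167 to within rounding.

```python
# Per-task z-scores as printed in the report (already rounded to 3 decimals).
z_scores = {"feature": -0.082, "bugfix": -0.487, "refactor": 0.070}

mean_z = sum(z_scores.values()) / len(z_scores)
above_average = [task for task, z in z_scores.items() if z > 0]

print(f"mean z = {mean_z:.4f}")
print("above cohort average:", above_average)
```

The recomputed mean comes out at about -0.166, and refactor is the only task with a positive z-score, which agrees with the per-task reading above.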
Self-preference effects on refactor are strongly positive (+12.1 judge points), meaning the opus judge liked compound’s refactor output more than the cross-judge consensus did; this pattern is common across the cohort and does not distinguish compound.
Reading of the result
compound is not a lightweight prompt — it ships a full plan-then-work-then-review pipeline with reviewer personas, gates, and a learning-capture step. In the benchmark’s single-shot /lfg invocation the pipeline does run, but its advantages (reviewer fan-out, plan artifacts) are most visible on the feature task, where they translate into a large, test-passing diff but not higher judge scores than leaner tools like pure or bmad. On the short tasks the overhead is not repaid, and the tool drifts to the middle of the pack. Residual ESLint errors on the feature task suggest the review phase verified logic but not style, which is consistent with an autofix review mode scoped to correctness personas rather than lint gates.
The plugin is more interesting as a methodology carrier — plans in docs/plans/, learnings in a wiki — than as a single-shot speed runner, and the benchmark, which rewards diff quality on one attempt with no memory carryover, is a harsh venue for that design.
Observed in trial timelines
Subagent dispatch counts are consistent with the orchestration-amortisation reading, in which pipeline overhead only pays off on larger briefs: the feature trials average 5.2 dispatches (range 1–13, with t1 dispatching 13 reviewer/researcher agents), against a mean of 1.0 on both bugfix and refactor. The full reviewer-persona pipeline only fires when the brief is large enough to warrant it; on short tasks compound shrinks to a single Explore.
Detail: see the per-trial timeline files linked below.
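The dispatch means can be recomputed from the Agents counts on the trial cards in this report. The sketch below uses only those card values, not the raw logs, so it covers just the trials shown here.

```python
from statistics import mean

# Agents counts copied from the trial cards in this report.
dispatches = {
    "feature": [13, 1, 6, 1],
    "bugfix": [1, 1],
    "refactor": [1, 1],
}

summary = {
    task: {"mean": mean(counts), "min": min(counts), "max": max(counts)}
    for task, counts in dispatches.items()
}

for task, stats in summary.items():
    print(f"{task}: mean {stats['mean']:.2f}, range {stats['min']}-{stats['max']}")
```

On these card values the feature mean is 5.25, which the text reports as 5.2, and both short tasks sit at exactly one dispatch per trial.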
Trial timelines
Per-trial event timelines are auto-extracted from session-logs/*.jsonl. Each card shows the skill activations, plugin/skill file reads, subagents dispatched, code mutations, Bash command mix, and the final diff. The cards cover the feature, bugfix, and refactor trials in turn.
“<command-message>compound-engineering:lfg</command-message> <command-name>/compound-engineering:lfg</command-name> <command-args>Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD]…”
- Agents: 13
- New files: 5
- Edits: 7
- Bash: 51
- Skills: 4
- Sessions: 5
Skill activations (4)
- ce-plan · Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD] User stories - Mode 2 CD Batc… at 05:07
- compound-engineering:ce-plan · Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD] User stories - Mode 2 CD Batc… at 05:07
- compound-engineering:ce-work at 05:16
- compound-engineering:ce-review · mode:autofix plan:docs/plans/2026-04-15-001-feat-td-cd-mode2-batch-plan.md at 05:26
Subagents dispatched (13)
- Explore · Explore Mode 1 CD implementation at 05:07
- Explore · Explore CD entities and constants at 05:08
- Explore · Read strategy interface and Mode 1 at 05:11
- Explore · Explore withdrawal and settlement at 05:11
- compound-engineering:review:correctness-reviewer · Correctness review Mode 2 at 05:27
- compound-engineering:review:kieran-typescript-reviewer · TypeScript review Mode 2 at 05:27
- compound-engineering:review:testing-reviewer · Testing review Mode 2 at 05:27
- Explore · Explore Mode 1 implementation at 04:15
- Explore · Find Mode strategy DI wiring at 04:19
- Explore · Find CD batch entities and holdings at 04:19
- …and 3 more
Subagent transcripts (13)
- agent-a00b9c2474c5… · Find and report the complete contents of these files in the repo at /Users/randytran/Codes/ai-tool-b… [Read×17, Bash×16]
- agent-a0237a5471fc… · Thoroughly explore the Mode 1 CD implementation in this NestJS monorepo. I need to understand: 1. Th… [Read×22, Bash×20, Glob×6, Grep×2]
- agent-a0307e435ab8… · Read the following files fully and report their complete contents, method signatures, and patterns.… [Bash×30, Read×30]
- agent-a151644ba93f… · Review the new TDCDMode2Strategy TypeScript implementation for type safety, clarity, and maintainabi… [Read×6, Grep×5]
- agent-a3f3eea6c27e… · Explore the TD-CD related entities, constants, enums, and DTOs in this NestJS/TypeORM monorepo. I ne… [Read×34, Bash×16]
- agent-a595dd7bacbf… · Thoroughly explore the Mode 1 CD implementation in this NestJS monorepo. I need to understand: 1. Th… [Bash×32, Read×24, Grep×1]
- agent-a5a37a780a33… · Find how Mode 1 strategy is wired in NestJS DI. Search for: 1. Where TD_CD_MODE_STRATEGY symbol is p… [Bash×20, Read×18, Grep×13, Glob×2]
- agent-a6daea87fa1d… · Thoroughly explore the existing Mode 1 TD-CD implementation in this NestJS monorepo. I need to under… [no tools]
New files created (3)
- docs/plans/2026-04-15-001-feat-td-cd-mode2-batch-plan.md
- libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
- libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts
“<command-message>compound-engineering:lfg</command-message> <command-name>/compound-engineering:lfg</command-name> <command-args>Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD]…”
- Agents: 1
- New files: 7
- Edits: 8
- Bash: 46
- Skills: 5
Skill activations (5)
- compound-engineering:ce-plan · Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD] User stories - Mode 2 CD Batc… at 14:40
- compound-engineering:ce-work · Execute the plan at docs/plans/2026-04-15-001-feat-td-cd-mode-2-cd-batch-plan.md Key constraints: - Implement end-to-en… at 15:06
- compound-engineering:ce-review · mode:autofix plan:docs/plans/2026-04-15-001-feat-td-cd-mode-2-cd-batch-plan.md at 15:15
- compound-engineering:todo-resolve at 15:19
- compound-engineering:test-browser at 15:19
Subagents dispatched (1)
- compound-engineering:review:correctness-reviewer · Correctness review of Mode 2 at 15:16
Subagent transcripts (1)
- agent-acaa2407f200… · Review the TD-CD Mode 2 implementation committed in HEAD on this repo. Scope: `git diff HEAD~1` - on… [Read×4, Bash×3]
New files created (7)
- docs/plans/2026-04-15-001-feat-td-cd-mode-2-cd-batch-plan.md
- libs/core/src/domain/savings-cd/cd-aging-days.util.spec.ts
- libs/core/src/domain/savings-cd/cd-aging-days.util.ts
- libs/core/src/domain/savings-cd/td-cd-mode2-buy-price.util.spec.ts
- libs/core/src/domain/savings-cd/td-cd-mode2-buy-price.util.ts
- libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
- libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts
“<local-command-caveat>Caveat: The messages below were generated by the user while running local commands. DO NOT respond to these messages or otherwise consider them in your response unless the user explicitly asks you t…”
- Agents: 6
- New files: 5
- Edits: 12
- Bash: 35
- Skills: 5
- Sessions: 2
Skill activations (5)
- compound-engineering:ce-plan · Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD] User stories - Mode 2 CD Batc… at 04:41
- compound-engineering:ce-work at 04:49
- compound-engineering:ce-review · mode:autofix plan:docs/plans/2026-04-16-001-feat-td-cd-mode2-batch-plan.md at 04:58
- compound-engineering:todo-resolve at 05:04
- compound-engineering:test-browser at 05:04
Subagents dispatched (6)
- compound-engineering:research:repo-research-analyst · Research Mode 1 implementation patterns at 04:42
- compound-engineering:research:learnings-researcher · Search for institutional learnings at 04:42
- compound-engineering:review:correctness-reviewer · Correctness review of Mode 2 at 05:00
- compound-engineering:review:testing-reviewer · Testing review of Mode 2 at 05:00
- Explore · Explore Mode 1 implementation at 04:36
- Explore · Find test files and CD batch models at 04:39
Subagent transcripts (6)
- agent-a140e92b682d… · Thoroughly explore the Mode 1 CD implementation in this NestJS monorepo. I need to understand: 1. Th… [Read×24, Bash×21, Glob×2, Grep×2]
- agent-a499c46c3fb3… · Review the following diff for correctness issues - logic errors, edge cases, state management bugs,… [Grep×12, Read×10, Bash×3, Glob×2]
- agent-a582bd29ac2c… · I need to find several things in this NestJS monorepo at /Users/randytran/Codes/ai-tool-benchmark/ru… [Read×24, Bash×14, Glob×2]
- agent-ab31bd190fbd… · Search docs/solutions/ for any relevant past solutions related to: - TD-CD (term deposit / certifica… [Read×10, Grep×6, Bash×4, Glob×1]
- agent-ac444e767799… · Review the test files in this diff for coverage gaps, weak assertions, brittle tests, and missing ed… [Read×5, Glob×2, Grep×2]
- agent-aed15d199435… · Research the existing Mode 1 TD-CD implementation in this NestJS/TypeORM monorepo. I need to underst… [Read×34, Grep×16, Glob×4, Bash×1]
New files created (5)
- docs/plans/2026-04-16-001-feat-td-cd-mode2-batch-plan.md
- libs/core/src/domain/savings-cd/cd-aging-days.util.spec.ts
- libs/core/src/domain/savings-cd/cd-aging-days.util.ts
- libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
- libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts
“<command-message>compound-engineering:lfg</command-message> <command-name>/compound-engineering:lfg</command-name> <command-args>Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD]…”
- Agents: 1
- New files: 5
- Edits: 5
- Bash: 27
- Skills: 6
Skill activations (6)
- ce-plan · Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD] User stories - Mode 2 CD Batc… at 09:07
- compound-engineering:ce-plan · Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD] User stories - Mode 2 CD Batc… at 09:07
- compound-engineering:ce-work at 09:15
- compound-engineering:ce-review · mode:autofix plan:docs/plans/2026-04-16-001-feat-td-cd-mode2-batch-plan.md at 09:21
- compound-engineering:todo-resolve at 09:23
- compound-engineering:test-browser at 09:23
Subagents dispatched (1)
- Explore · Explore Mode 1 implementation at 09:08
Subagent transcripts (1)
- agent-a597f0058c73… · Thoroughly explore the Mode 1 TD-CD implementation in this NestJS monorepo. I need to understand the… [Bash×29, Read×26]
New files created (5)
- docs/plans/2026-04-16-001-feat-td-cd-mode2-batch-plan.md
- libs/core/src/domain/savings-cd/cd-aging-days.util.spec.ts
- libs/core/src/domain/savings-cd/cd-aging-days.util.ts
- libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
- libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts
“<command-message>compound-engineering:lfg</command-message> <command-name>/compound-engineering:lfg</command-name> <command-args>A QA bug report was filed. Read docs/benchmark/TASK.md for the full report: reproduction st…”
- Agents: 1
- New files: 1
- Edits: 7
- Bash: 35
Subagents dispatched (1)
- Explore · Explore savings CD codebase at 16:21
Subagent transcripts (1)
- agent-a5b7b9d311c6… · I'm investigating bug SHP-2376 in a NestJS/TypeORM monorepo. The bug: when depositing into a Savings… [Read×14, Grep×10, Glob×3, Bash×1]
New files created (1)
- compound-t1/docs/plans/shp-2376-fix-plan.md
“<command-message>compound-engineering:lfg</command-message> <command-name>/compound-engineering:lfg</command-name> <command-args>A QA bug report was filed. Read docs/benchmark/TASK.md for the full report: reproduction st…”
- Agents: 1
- New files: 2
- Edits: 3
- Bash: 14
Subagents dispatched (1)
- Explore · Explore savings-cd batch eligibility at 16:42
Subagent transcripts (1)
- agent-af90fcf3eb35… · I need to find the root cause of bug SHP-2376 in this NestJS monorepo. The bug is: deposits into Sav… [Read×13, Bash×10, Grep×6, Glob×1]
New files created (2)
- compound-t2/docs/plans/shp-2376-fix-plan.md
- compound-t2/libs/savings-cd/src/domain/savings-cd-batch-data-source.spec.ts
“<command-message>compound-engineering:lfg</command-message> <command-name>/compound-engineering:lfg</command-name> <command-args>A refactor has been requested by the tech lead. Read docs/benchmark/TASK.md for the full br…”
- Agents: 1
- New files: 8
- Edits: 24
- Bash: 16
Subagents dispatched (1)
- Explore · Explore core savings-cd structure at 03:57
Subagent transcripts (1)
- agent-a008839e16c9… · I need to understand the current code structure for a refactor. The task involves: 1. Moving `cdBatc… [Read×17, Bash×7, Glob×3, Grep×3]
New files created (5)
- compound-t1/docs/plans/shp2317-refactor-plan.md
- compound-t1/libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
- compound-t1/libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts
- compound-t1/libs/core/src/model/cd-batch-info.model.ts
- compound-t1/libs/core/src/port/service/td-cd-mode-strategy.port.ts
“<command-message>compound-engineering:lfg</command-message> <command-name>/compound-engineering:lfg</command-name> <command-args>A refactor has been requested by the tech lead. Read docs/benchmark/TASK.md for the full br…”
- Agents: 1
- New files: 2
- Edits: 20
- Bash: 12
Subagents dispatched (1)
- Explore · Investigate codebase for SHP-2317 refactor at 06:34
Subagent transcripts (1)
- agent-a80009616939… · I need to understand the current codebase structure for a refactor (SHP-2317) in this NX monorepo. T… [Read×19, Bash×6, Glob×5, Grep×5]
New files created (2)
- compound-t2/docs/plans/2026-04-21-shp2317-refactor.md
- compound-t2/libs/core/src/model/cd-batch-info.model.ts