bmad — BMad-Method
A structured Agile-style workflow harness that runs a Plan → Code → Review loop through a single `/bmad-quick-dev` slash command.
Upstream
- Repo: https://github.com/bmad-code-org/BMAD-METHOD
- Version used: `6.3.0` (pinned via `npx bmad-method@6.3.0 install` in `scripts/setup-tool-config.sh`; also recorded in `plugin_versions` of the bugfix and refactor `auto-metrics.json`).
- Author / maintainer: `bmad-code-org` (GitHub organisation). Individual maintainer unknown; not verified in this pass.
- License: unknown; not verified in this pass.
- Primary doc: https://github.com/bmad-code-org/BMAD-METHOD (README). The installer also writes `_bmad/` and `.claude/skills/bmad-*` into the repo with its own docs; any upstream doc URL beyond the README is unknown and not verified in this pass.
Performance in this benchmark
| Task | Mean score | 95% CI | z |
|---|---|---|---|
| feature | 125.13 | [122.73, 127.37] | +0.215 |
| bugfix | 178.83 | [175.83, 181.33] | +0.640 |
| refactor | 159.42 | [155.33, 163.17] | +0.020 |
Rank 1 / 9 overall (combined z̄ = +0.292). bmad is in the top statistical tier on feature (tier: bmad, gstack, superpower, ecc, pure) and on bugfix (tier: ecc, bmad, pure, mindful, gstack), and in the top tier but near the bottom of it on refactor. The ordering inside the top-4 overall (bmad, ecc, pure, gstack) is not statistically distinguishable.
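The combined z̄ above equals the unweighted mean of the three per-task z-scores in the table. The sketch below reproduces it and adds an `overlaps` helper for the usual interval-overlap test behind "not statistically distinguishable" statements. That tiers are read off overlapping 95% CIs is an assumption (the tiering rule is not stated here), and the rival interval is hypothetical, not benchmark data.

```typescript
// Per-task results for bmad, copied from the performance table.
interface TaskResult {
  task: "feature" | "bugfix" | "refactor";
  mean: number;
  ci: [number, number]; // 95% confidence interval [low, high]
  z: number;
}

const bmadResults: TaskResult[] = [
  { task: "feature", mean: 125.13, ci: [122.73, 127.37], z: 0.215 },
  { task: "bugfix", mean: 178.83, ci: [175.83, 181.33], z: 0.64 },
  { task: "refactor", mean: 159.42, ci: [155.33, 163.17], z: 0.02 },
];

// Combined z-bar: unweighted mean of the per-task z-scores.
function combinedZ(results: TaskResult[]): number {
  const total = results.reduce((acc, r) => acc + r.z, 0);
  return total / results.length;
}

// Two closed intervals overlap when each one starts before the other ends.
// Assumption: tools whose CIs overlap are treated as one statistical tier.
function overlaps(a: [number, number], b: [number, number]): boolean {
  return a[0] <= b[1] && b[0] <= a[1];
}

console.log(combinedZ(bmadResults).toFixed(3)); // prints 0.292

// Hypothetical rival CI on feature, for illustration only.
const rivalFeatureCI: [number, number] = [120.0, 124.0];
console.log(overlaps(bmadResults[0].ci, rivalFeatureCI)); // prints true
```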
Mechanism — what actually runs
- Install surface (from `scripts/setup-tool-config.sh` lines 145–157): `npx bmad-method@6.3.0 install --directory . --modules bmm --tools claude-code --yes` is run inside the cloned repo. This writes `_bmad/` (config + agent prompts, gitignored by the bench safety rules), `.claude/skills/bmad-*` (skill files exposed to Claude Code), and `_bmad-output/` (tracked, used for phase artifacts). No plugin is added to the CLI's plugin registry: the `config/bmad-t1/plugins/` directory contains only the default `claude-plugins-official` marketplace. The tool's `settings.json` is just `{ "skipDangerousModePermissionPrompt": true }`; all bmad behaviour comes from the in-repo skill files, not from CLI-level config.
- Entry point (from `scripts/manual-bench.sh` line 146): the harness prepends an intro that tells bmad which path to take, then sends `PROMPT="/bmad-quick-dev $BMAD_INTRO\n\n$SHARED_TASK"`. The intro always says "Pick the Plan-Code-Review path" and adds task-shape hints ("non-trivial feature", "scoped bugfix — reproduce first", "scoped refactor — no behavior change"). Observable behaviour: the command parses the intro, then proceeds through plan → code → review phases inline. In the transcripts the loaded skill adds workflow instructions to the session (substring `bmad-quick-dev` appears 11/24/22 times across feature/bugfix/refactor; `Plan-Code-Review` 4/10/11 times; `workflow` 18/16/17 times; `checkpoint` 6/9/9 times; `elicitation` 1/4/4 times, all counted within the transcript).
- Skills / sub-agents / hooks activated: one Claude Code `Skill` invocation at session start (feature log). During execution bmad dispatches `Agent` tool calls to sub-agents (4 in bugfix, 1 in refactor, 0 in feature). Roles observed in sub-agent prompts: an `Explore` investigator, a blind adversarial code reviewer, an edge-case hunter, and an acceptance auditor. No external hooks and no MCP servers; only the installed skill files and Claude Code's built-in `Task`/`Agent` tool.
- Core mental model: BMad stages a cut-down Agile cycle (scope → plan → implement → multi-angle review) inside a single slash command, with a fact-forcing preamble and explicit phase checkpoints before file edits.
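The substring counts quoted in the entry-point bullet (for example `bmad-quick-dev` appearing 11/24/22 times) are occurrence counts over transcript text. A minimal sketch of such a count, assuming non-overlapping, case-sensitive matches over the raw `.jsonl` contents; the exact counting rule the bench used is not documented here, and `markerCounts` is a hypothetical helper.

```typescript
// Count non-overlapping, case-sensitive occurrences of `needle` in `haystack`.
function countOccurrences(haystack: string, needle: string): number {
  if (needle.length === 0) return 0;
  let count = 0;
  let from = 0;
  for (;;) {
    const idx = haystack.indexOf(needle, from);
    if (idx === -1) break;
    count += 1;
    from = idx + needle.length; // resume after this match, so matches never overlap
  }
  return count;
}

// The markers tallied in this write-up.
const MARKERS = ["bmad-quick-dev", "Plan-Code-Review", "workflow", "checkpoint", "elicitation"];

// Hypothetical helper: tally every marker over one transcript's raw text.
function markerCounts(transcript: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const marker of MARKERS) counts[marker] = countOccurrences(transcript, marker);
  return counts;
}

// Two synthetic JSONL lines, not taken from a real session log.
const sample = '{"text":"run /bmad-quick-dev then checkpoint"}\n{"text":"checkpoint 2"}';
console.log(markerCounts(sample)); // bmad-quick-dev: 1, checkpoint: 2, the rest 0
```

Because plain substring matching is used, `workflow` also matches file names such as `workflow.md`, which is worth keeping in mind when reading the counts.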
How this benchmark invoked it
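Exact PROMPT (from `manual-bench.sh`, with per-task intro; angle brackets mark per-task slots, and "brownfield" appears only in the bugfix and refactor intros):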
/bmad-quick-dev Pick the Plan-Code-Review path — this is a <non-trivial feature | scoped bugfix | scoped refactor> in an existing <brownfield> codebase. <task-shape hint>
<SHARED_TASK>
Base model: claude-opus-4-6 (same for all nine tools).
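The prompt is the slash command, a per-task intro, and the shared task brief joined together. The sketch below mirrors `PROMPT="/bmad-quick-dev $BMAD_INTRO\n\n$SHARED_TASK"` in TypeScript. `INTRO_PARTS`, `buildIntro`, `buildPrompt` and the hint strings are hypothetical paraphrases of the template above, not the exact text in `manual-bench.sh`. Note that in a double-quoted Bash string `\n` stays a literal backslash-n unless something later expands it (for example `printf '%b'` or `echo -e`); the sketch assumes a real blank line reaches the CLI.

```typescript
type TaskShape = "feature" | "bugfix" | "refactor";

// Hypothetical paraphrases of the per-task intro pieces.
const INTRO_PARTS: Record<TaskShape, { kind: string; codebase: string; hint: string }> = {
  feature: { kind: "non-trivial feature", codebase: "existing codebase", hint: "" },
  bugfix: { kind: "scoped bugfix", codebase: "existing brownfield codebase", hint: "Reproduce first." },
  refactor: { kind: "scoped refactor", codebase: "existing brownfield codebase", hint: "No behavior change." },
};

// Fill the intro template; the opening phrase matches the quoted prompt.
function buildIntro(shape: TaskShape): string {
  const { kind, codebase, hint } = INTRO_PARTS[shape];
  const base = `Pick the Plan-Code-Review path — this is a ${kind} in an ${codebase}.`;
  return hint === "" ? base : `${base} ${hint}`;
}

// Slash command, intro, blank line, then the task brief shared by all tools.
function buildPrompt(shape: TaskShape, sharedTask: string): string {
  return `/bmad-quick-dev ${buildIntro(shape)}\n\n${sharedTask}`;
}

console.log(buildPrompt("bugfix", "<SHARED_TASK>"));
```

Only the intro varies per task, which is what lets the same slash command serve all three task shapes.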
What actually happened in the transcripts
- feature (Mode 2 CD Batch, z = +0.215): 270-turn session, no sub-agent dispatch. The agent skipped elicitation ("I have sufficient context. Mode 1 is library-only — planning implementation now.") and went straight into extending `ITDCDModeStrategy`, wiring `validateInventory` for per-batch scheme context, adding a new `td-cd-mode2.strategy.ts` plus utilities, and writing specs. Result: 12 files changed (+508 / −6), 82/82 tests passed, 0 tsc errors, 13 eslint errors / 6 warnings (`results/bmad/t1/auto-metrics.json`).
- bugfix (z = +0.640, the strongest trial): 273-turn session, 4 `Agent` dispatches. First an `Explore` sub-agent mapped savings-cd batch eligibility; then three independent reviewers (blind-adversarial diff review, edge-case hunter, acceptance auditor) ran against the proposed patch before commit. 3 files changed (+222 / −4), 90 tests passed / 15 failed (core: 27 pass / 15 fail; savings-cd: 63/63 green), 0 tsc errors, 10 eslint errors / 15 warnings.
- refactor (z = +0.020): 301-turn session, 1 `Agent` dispatch (a scoped `Explore` mapping the two design seams being cleaned up). The agent carried out 11 edits (+129 / −75) across `libs/core`, kept a fact-forcing preamble before each edit, and landed all 61 tests green, but the resulting diff scored at the cohort mean. 8 eslint errors / 7 warnings.
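The per-trial figures above can be cross-checked with simple arithmetic. This sketch encodes them in a hypothetical `TrialMetrics` shape (the real `auto-metrics.json` schema is not reproduced here) and derives net lines changed and test pass rate.

```typescript
// Hypothetical shape; field names are illustrative, not the auto-metrics.json schema.
interface TrialMetrics {
  task: string;
  linesAdded: number;
  linesRemoved: number;
  testsPassed: number;
  testsFailed: number;
  eslintErrors: number;
  eslintWarnings: number;
}

// Figures as reported in the trial summaries above.
const trials: TrialMetrics[] = [
  { task: "feature", linesAdded: 508, linesRemoved: 6, testsPassed: 82, testsFailed: 0, eslintErrors: 13, eslintWarnings: 6 },
  { task: "bugfix", linesAdded: 222, linesRemoved: 4, testsPassed: 90, testsFailed: 15, eslintErrors: 10, eslintWarnings: 15 },
  { task: "refactor", linesAdded: 129, linesRemoved: 75, testsPassed: 61, testsFailed: 0, eslintErrors: 8, eslintWarnings: 7 },
];

const netLines = (t: TrialMetrics): number => t.linesAdded - t.linesRemoved;
const passRate = (t: TrialMetrics): number => t.testsPassed / (t.testsPassed + t.testsFailed);

for (const t of trials) {
  console.log(`${t.task}: net ${netLines(t)} lines, pass rate ${(passRate(t) * 100).toFixed(1)}%`);
}
// bugfix: 27 core + 63 savings-cd = 90 passing out of 105, about 85.7%.
```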
Why it ranked 1 (noting the top-4 tie)
- The bugfix trial is the single biggest contributor (z = +0.640, the largest per-task lift among bmad’s three tasks). Judges rewarded the triple-reviewer gate before commit and a reproduced-test-first patch; every judge — opus (184.0), codex (160.5), qwen (192.0) — placed bmad at or near the top for this task.
- On feature, bmad's top-tier placement rested on the volume of work rather than on its elegance: the opus judge gave it the single highest feature score (124.85), while codex placed it only mid-field (95.85), a meaningful opus/codex divergence.
- The refactor trial was flat (z ≈ 0; bmad ranks 6th on refactor), so its overall rank-1 finish is driven entirely by the other two tasks, and its lead within the top 4 is not statistically distinguishable in CI terms.
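The bugfix mean in the performance table is consistent with an unweighted average of the three judges' scores quoted above. A sketch assuming that is how the per-task mean is formed (the aggregation rule is not spelled out in this section); `spread` is a hypothetical, crude measure of judge disagreement.

```typescript
// Per-judge bugfix scores for bmad, as quoted above.
const bugfixJudgeScores: Record<string, number> = { opus: 184.0, codex: 160.5, qwen: 192.0 };

function mean(values: number[]): number {
  return values.reduce((a, b) => a + b, 0) / values.length;
}

// Largest minus smallest judge score.
function spread(values: number[]): number {
  return Math.max(...values) - Math.min(...values);
}

const judgeScores = Object.values(bugfixJudgeScores);
console.log(mean(judgeScores).toFixed(2)); // prints 178.83, matching the table
console.log(spread(judgeScores)); // prints 31.5
```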
Strengths & failure modes
Strengths (transcript-grounded):
- Multi-reviewer gate: bugfix’s three post-diff reviewers (adversarial, edge-case, acceptance) show up in the sub-agent prompts, not in the shared task brief — they come from bmad’s skill files.
- Fact-forcing preamble before edits and explicit “checkpoint” markers keep the agent from drifting on long sessions (9 checkpoint references each in bugfix and refactor).
- Matches task shape via the intro: the harness-provided hint flips the workflow between feature / bugfix / refactor without changing the slash command.
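The multi-reviewer gate can be pictured as several independent checks over the same diff, with the commit blocked if any of them raises a blocking finding. A minimal sketch of that pattern only: the `Finding`/`Reviewer` types and the three toy reviewers are hypothetical, and this is not bmad's code (in the transcripts the reviewers are prompt-driven Claude Code sub-agents, and the acceptance auditor also sees the spec, which is omitted here).

```typescript
type Severity = "blocking" | "advisory";

interface Finding {
  reviewer: string;
  severity: Severity;
  message: string;
}

// A reviewer inspects a diff and returns zero or more findings.
type Reviewer = (diff: string) => Finding[];

// Run every reviewer on the same diff; pass only if nothing is blocking.
// Sequential for simplicity; the reviews are independent and could run in parallel.
function runReviewGate(diff: string, reviewers: Reviewer[]): { pass: boolean; findings: Finding[] } {
  const findings: Finding[] = [];
  for (const review of reviewers) {
    findings.push(...review(diff));
  }
  return { pass: findings.every((f) => f.severity !== "blocking"), findings };
}

// Toy stand-ins for the adversarial, edge-case and acceptance roles.
const adversarial: Reviewer = (diff) =>
  diff.includes("TODO") ? [{ reviewer: "adversarial", severity: "blocking", message: "unfinished TODO in patch" }] : [];
const edgeCase: Reviewer = (diff) =>
  diff.includes("<=") ? [{ reviewer: "edge-case", severity: "advisory", message: "check the boundary value" }] : [];
const acceptance: Reviewer = (diff) =>
  diff.includes("spec.ts") ? [] : [{ reviewer: "acceptance", severity: "blocking", message: "no regression test in patch" }];

const gate = runReviewGate("+ if (daysToMaturity <= 7) skip();\n+ batch-data-source.spec.ts", [adversarial, edgeCase, acceptance]);
console.log(gate.pass, gate.findings.length); // prints true 1
```

Keeping each reviewer blind to the others' output is what makes the checks independent; an advisory finding is reported without blocking the commit.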
Failure modes (transcript-grounded):
- Lint is not gated: 13 / 10 / 8 eslint errors at commit across feature / bugfix / refactor; the workflow does not round-trip `eslint` before declaring done.
- Elicitation is skipped whenever the agent judges its context "sufficient" (feature turn 0: "I have sufficient context … planning implementation now"), so the plan step can collapse into a monologue.
- The refactor run used only one sub-agent and produced a mean-level result: the multi-reviewer pattern that lifted bugfix did not fire here, suggesting the Plan-Code-Review loop is strongest on tasks with a named defect to review against.
- Overhead shows in length: at 270–301 turns per trial, bmad's sessions are among the longer ones in the benchmark.
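The missing lint round-trip amounts to a "definition of done" check that the workflow did not enforce. A hypothetical sketch of such a gate applied to the test and eslint figures reported for the three trials; the thresholds (zero failing tests, zero eslint errors) are an illustrative assumption, not a bmad setting.

```typescript
interface FinalState {
  task: string;
  testsFailed: number;
  eslintErrors: number;
}

// Return the reasons a trial would not count as "done"; an empty list means done.
function doneBlockers(s: FinalState): string[] {
  const reasons: string[] = [];
  if (s.testsFailed > 0) reasons.push(`${s.testsFailed} failing tests`);
  if (s.eslintErrors > 0) reasons.push(`${s.eslintErrors} eslint errors`);
  return reasons;
}

// Final states as reported above.
const observed: FinalState[] = [
  { task: "feature", testsFailed: 0, eslintErrors: 13 },
  { task: "bugfix", testsFailed: 15, eslintErrors: 10 },
  { task: "refactor", testsFailed: 0, eslintErrors: 8 },
];

for (const s of observed) {
  console.log(s.task, doneBlockers(s));
}
// Every trial has at least one blocker, because each one ends with eslint errors.
```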
References
- Install surface: `scripts/setup-tool-config.sh` (case `bmad`, lines 145–157)
- Prompt construction: `scripts/manual-bench.sh` (case `bmad`, lines 131–149)
- Config snapshot: `config/bmad-t1/settings.json`, `config/bmad-t1/plugins/`
- Transcripts: `results/bmad/t1/session-logs/03edbe91-3a0d-4351-a8f8-2d6956ab36e8.jsonl`, `results/bugfix/bmad/t1/session-logs/16b2a47a-78f9-4a3e-86b0-c8aef7432577.jsonl`, `results/refactor/bmad/t1/session-logs/809fef16-9f6f-45ad-8622-517fba071d30.jsonl`
- Metrics: `results/bmad/t1/auto-metrics.json`, `results/bugfix/bmad/t1/auto-metrics.json`, `results/refactor/bmad/t1/auto-metrics.json`, `results/cross-task-stats.json`
- Upstream repo: https://github.com/bmad-code-org/BMAD-METHOD
Observed in trial timelines
bmad is the only tool whose skill content shows up as explicit Read events in the session (mean 3.5 unique skill files on feature, 6.5 on bugfix, 4.5 on refactor; range 2–8). Every other tool injects skill content via slash command into the system prompt, so no Read events fire — bmad’s step-by-step step-01…step-05 files are loaded at runtime as the workflow advances. This is why bmad’s transcripts read like a script: the skill content is materialised in-band.
Detail: see the per-trial timelines below.
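The "skill files read" counts can be derived by scanning a session log for `Read` tool calls whose path falls under the installed skill tree. A minimal sketch, assuming each `.jsonl` line is a JSON object whose `message.content` array may hold `{ type: "tool_use", name: "Read", input: { file_path } }` entries; that layout is an assumption to verify against the actual logs, as is the path filter on `.claude/skills/` and `_bmad/` (chosen to match the file lists in the timelines).

```typescript
interface ToolUse {
  type?: string;
  name?: string;
  input?: { file_path?: string };
}

// Collect unique skill/config file paths opened via the Read tool.
function skillFilesRead(jsonl: string): string[] {
  const seen = new Set<string>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let entry: unknown;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip malformed lines
    }
    if (typeof entry !== "object" || entry === null) continue;
    const content = (entry as { message?: { content?: unknown } }).message?.content;
    if (!Array.isArray(content)) continue;
    for (const raw of content) {
      if (typeof raw !== "object" || raw === null) continue;
      const item = raw as ToolUse;
      const path = item.input?.file_path;
      const isSkillPath = path !== undefined && (path.includes(".claude/skills/") || path.includes("_bmad/"));
      if (item.type === "tool_use" && item.name === "Read" && isSkillPath) {
        seen.add(path);
      }
    }
  }
  return Array.from(seen);
}

// Averaging the per-trial unique counts from the timelines gives the quoted means.
const meanOf = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;
console.log(meanOf([2, 2, 7, 3]), meanOf([8, 5]), meanOf([6, 3])); // prints 3.5 6.5 4.5
```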
Trial timelines
Per-trial event timelines auto-extracted from session-logs/*.jsonl — skill activations, plugin/skill file reads, subagents dispatched, code mutations, Bash usage:
“Use bmad-quick-dev to handle this task. Pick the Plan-Code-Review path — this is a non-trivial feature in an existing codebase. Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD] U…”
- New files: 5
- Edits: 18
- Bash: 52
- Skills: 1
- Skill files: 2
Skill activations (1)
- `bmad-quick-dev`: Plan-Code-Review path. Implement Mode 2 CD Batch for TD-CD end-to-end. PRD: docs/infina-product-docs/docs/core-products… at 15:10
Plugin/skill files read (2 unique)
- `.claude/skills/bmad-quick-dev/workflow.md`
- `.claude/skills/bmad-quick-dev/step-01-clarify-and-route.md`
New files created (5)
- `libs/core/src/domain/savings-cd/td-cd-mode2-price-calculator.spec.ts`
- `libs/core/src/domain/savings-cd/td-cd-mode2-price-calculator.ts`
- `libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts`
- `libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts`
- `libs/core/src/port/service/td-cd-mode2-batch-resolver.port.ts`
“Use bmad-quick-dev to handle this task. Pick the Plan-Code-Review path — this is a non-trivial feature in an existing codebase. Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD] U…”
- Agents: 1
- New files: 2
- Edits: 2
- Bash: 30
- Skills: 1
- Skill files: 2
- Todos: 7
Skill activations (1)
- `bmad-quick-dev`: Plan-Code-Review path. Implement Mode 2 CD Batch for TD-CD product end-to-end. PRD: docs/infina-product-docs/docs/core-… at 14:32
Plugin/skill files read (2 unique)
- `.claude/skills/bmad-quick-dev/workflow.md`
- `.claude/skills/bmad-quick-dev/step-01-clarify-and-route.md`
Subagents dispatched (1)
- Explore · Map Mode 1 TD-CD layer at 14:33
Subagent transcripts (1)
- `agent-a775876df985…`: I'm about to implement Mode 2 CD Batch for the TD-CD product in this NestJS/NX monorepo at /Users/ra… [Bash×31, Read×24]
New files created (2)
- `libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts`
- `libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts`
“<local-command-caveat>Caveat: The messages below were generated by the user while running local commands. DO NOT respond to these messages or otherwise consider them in your response unless the user explicitly asks you t…”
- New files: 3
- Edits: 8
- Bash: 23
- Skills: 1
- Skill files: 7
Skill activations (1)
- `bmad-quick-dev`: Pick the Plan-Code-Review path — this is a non-trivial feature in an existing codebase. Read the PRD at docs/infina-pro… at 04:38
Plugin/skill files read (7 unique)
- `.claude/skills/bmad-quick-dev/workflow.md`
- `_bmad/bmm/config.yaml`
- `.claude/skills/bmad-quick-dev/step-01-clarify-and-route.md`
- `.claude/skills/bmad-quick-dev/step-02-plan.md`
- `.claude/skills/bmad-quick-dev/spec-template.md`
- `.claude/skills/bmad-quick-dev/step-03-implement.md`
- `.claude/skills/bmad-quick-dev/step-04-review.md`
New files created (3)
- `_bmad-output/implementation-artifacts/spec-td-cd-mode2-batch.md`
- `libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts`
- `libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts`
“<command-message>bmad-quick-dev</command-message> <command-name>/bmad-quick-dev</command-name> <command-args>Pick the Plan-Code-Review path — this is a non-trivial feature in an existing codebase. Read the PRD at docs/in…”
- Agents: 3
- New files: 2
- Edits: 11
- Bash: 29
- Skill files: 3
- Todos: 6
Plugin/skill files read (3 unique)
- `.claude/skills/bmad-quick-dev/workflow.md`
- `_bmad/bmm/config.yaml`
- `.claude/skills/bmad-quick-dev/step-01-clarify-and-route.md`
Subagents dispatched (3)
- Explore · Explore Mode 1 TD-CD implementation at 08:55
- Explore · Explore Mode 1 PRD for reference at 08:55
- Explore · Find Mode 1 DI wiring at 09:02
Subagent transcripts (3)
- `agent-a4944196081c…`: Find and read the Mode 1 TD-CD PRD document. It should be at a path like `docs/infina-product-docs/d… [Glob×2, Bash×2, Read×1]
- `agent-a58442e4e930…`: Search the codebase for where TDCDMode1Strategy is instantiated or provided as a dependency. Look fo… [Read×19, Bash×16, Grep×7, Glob×5]
- `agent-ab21289cb0a8…`: Thoroughly explore the Mode 1 TD-CD (savings CD) implementation in this codebase. I need to understa… [Read×30, Bash×23]
New files created (2)
- `libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts`
- `libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts`
“<command-message>bmad-quick-dev</command-message> <command-name>/bmad-quick-dev</command-name> <command-args>Pick the Plan-Code-Review path — this is a scoped bugfix in an existing brownfield codebase. Investigate before…”
- Agents: 4
- New files: 2
- Edits: 11
- Bash: 19
- Skill files: 8
Plugin/skill files read (8 unique)
- `bmad-t1/.claude/skills/bmad-quick-dev/workflow.md`
- `bmad-t1/_bmad/bmm/config.yaml`
- `bmad-t1/.claude/skills/bmad-quick-dev/step-01-clarify-and-route.md`
- `bmad-t1/.claude/skills/bmad-quick-dev/step-02-plan.md`
- `bmad-t1/.claude/skills/bmad-quick-dev/spec-template.md`
- `bmad-t1/.claude/skills/bmad-quick-dev/step-03-implement.md`
- `bmad-t1/.claude/skills/bmad-quick-dev/step-04-review.md`
- `bmad-t1/.claude/skills/bmad-quick-dev/step-05-present.md`
Subagents dispatched (4)
- Explore · Investigate savings-cd batch eligibility at 16:41
- Blind adversarial review · Blind adversarial review at 16:57
- Edge case review · Edge case review at 16:57
- Acceptance auditor review · Acceptance auditor review at 16:57
Subagent transcripts (4)
- `agent-a1585943ab39…`: You are an acceptance auditor. Verify the implementation matches the spec and bug report requirement… [Bash×34, Read×7, Grep×3, Glob×1]
- `agent-a4f2418311a0…`: You are an edge case hunter reviewing a code change. Walk every branching path and boundary conditio… [Read×3, Glob×1]
- `agent-a92b111f975e…`: You are a blind adversarial code reviewer. You have NO context about the project, no spec, no requir… [no tools]
- `agent-abeabb30b8f3…`: Thoroughly explore the savings-cd codebase in /Users/randytran/Codes/ai-tool-benchmark/runs/shp2376/… [Read×15, Bash×7, Grep×4]
New files created (2)
- `bmad-t1/_bmad-output/implementation-artifacts/spec-shp-2376-deposit-maturity-fix.md`
- `bmad-t1/libs/savings-cd/src/domain/savings-cd-batch-data-source.spec.ts`
“<command-message>bmad-quick-dev</command-message> <command-name>/bmad-quick-dev</command-name> <command-args>Pick the Plan-Code-Review path — this is a scoped bugfix in an existing brownfield codebase. Investigate before…”
- Agents: 1
- New files: 1
- Bash: 1
- Skill files: 5
Plugin/skill files read (5 unique)
- `bmad-t2/.claude/skills/bmad-quick-dev/workflow.md`
- `bmad-t2/_bmad/bmm/config.yaml`
- `bmad-t2/.claude/skills/bmad-quick-dev/step-01-clarify-and-route.md`
- `bmad-t2/.claude/skills/bmad-quick-dev/step-02-plan.md`
- `bmad-t2/.claude/skills/bmad-quick-dev/spec-template.md`
Subagents dispatched (1)
- Explore · Investigate SHP-2376 bug at 16:44
Subagent transcripts (1)
- `agent-a4da47e0c3e1…`: Thoroughness: very thorough I'm investigating bug SHP-2376 in an NX monorepo at /Users/randytran/Cod… [Read×11, Grep×6, Bash×3, Glob×3]
New files created (1)
- `bmad-t2/_bmad-output/implementation-artifacts/spec-shp-2376-near-maturity-deposit-stuck.md`
“<command-message>bmad-quick-dev</command-message> <command-name>/bmad-quick-dev</command-name> <command-args>Pick the Plan-Code-Review path — this is a scoped refactor in an existing brownfield codebase. No behavior chan…”
- Agents: 1
- New files: 5
- Edits: 22
- Bash: 12
- Skill files: 6
Plugin/skill files read (6 unique)
- `bmad-t1/.claude/skills/bmad-quick-dev/workflow.md`
- `bmad-t1/_bmad/bmm/config.yaml`
- `bmad-t1/.claude/skills/bmad-quick-dev/step-01-clarify-and-route.md`
- `bmad-t1/.claude/skills/bmad-quick-dev/step-02-plan.md`
- `bmad-t1/.claude/skills/bmad-quick-dev/spec-template.md`
- `bmad-t1/.claude/skills/bmad-quick-dev/step-03-implement.md`
Subagents dispatched (1)
- Explore · Investigate SHP-2317 refactor areas at 03:53
Subagent transcripts (1)
- `agent-a5d8ce588542…`: I need a thorough investigation of a refactor in this NX monorepo. The task (SHP-2317) involves two… [Bash×23, Read×15, Grep×7, Glob×3]
New files created (5)
- `bmad-t1/_bmad-output/implementation-artifacts/spec-shp-2317-decouple-batch-binding.md`
- `bmad-t1/libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts`
- `bmad-t1/libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts`
- `bmad-t1/libs/core/src/model/cd-batch-info.model.ts`
- `bmad-t1/libs/core/src/port/service/td-cd-mode-strategy.port.ts`
“<command-message>bmad-quick-dev</command-message> <command-name>/bmad-quick-dev</command-name> <command-args>Pick the Plan-Code-Review path — this is a scoped refactor in an existing brownfield codebase. No behavior chan…”
- Agents: 2
- Edits: 22
- Bash: 11
- Skill files: 3
Plugin/skill files read (3 unique)
- `bmad-t2/.claude/skills/bmad-quick-dev/workflow.md`
- `bmad-t2/_bmad/bmm/config.yaml`
- `bmad-t2/.claude/skills/bmad-quick-dev/step-01-clarify-and-route.md`
Subagents dispatched (2)
- Explore · Investigate Scheme model/entity at 06:32
- Explore · Investigate TSSchemeSetting and strategy at 06:33
Subagent transcripts (2)
- `agent-a8fa3205ddb1…`: In /Users/randytran/Codes/ai-tool-benchmark/runs/shp2317/bmad-t2, find everything related to: 1. `TS… [Read×11, Grep×6, Glob×1]
- `agent-af07011904c2…`: In /Users/randytran/Codes/ai-tool-benchmark/runs/shp2317/bmad-t2, find everything related to the `Sc… [Bash×16, Read×13, Grep×5, Glob×4]
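The counters on the cards above (Agents, New files, Edits, Bash, Todos) read like tallies of tool calls by name. A sketch of such a tally, assuming the mapping `Agent`/`Task` → Agents, `Write` → New files, `Edit`/`MultiEdit` → Edits, `Bash` → Bash, `TodoWrite` → Todos, `Skill` → Skills. The mapping is an assumption about how the extractor classifies events (for instance, "New files" may really count unique paths rather than `Write` calls), and the input is a pre-extracted list of tool names rather than raw log lines.

```typescript
type CardStat = "Agents" | "New files" | "Edits" | "Bash" | "Todos" | "Skills";

// Assumed mapping from Claude Code tool names to card counters.
const TOOL_TO_STAT: { [tool: string]: CardStat | undefined } = {
  Agent: "Agents",
  Task: "Agents",
  Write: "New files",
  Edit: "Edits",
  MultiEdit: "Edits",
  Bash: "Bash",
  TodoWrite: "Todos",
  Skill: "Skills",
};

// Count tool calls per card counter; unmapped tools (Read, Grep, Glob) are ignored.
function tallyCard(toolNames: string[]): Partial<Record<CardStat, number>> {
  const card: Partial<Record<CardStat, number>> = {};
  for (const name of toolNames) {
    const stat = TOOL_TO_STAT[name];
    if (stat === undefined) continue;
    card[stat] = (card[stat] ?? 0) + 1;
  }
  return card;
}

console.log(tallyCard(["Bash", "Bash", "Edit", "Read", "Agent", "Write", "TodoWrite"]));
```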