bmad — BMad-Method

A structured Agile-style workflow harness that runs a Plan → Code → Review loop through a single /bmad-quick-dev slash command.

Upstream

Repo: https://github.com/bmad-code-org/BMAD-METHOD
Version used: 6.3.0 (pinned via npx bmad-method@6.3.0 install in scripts/setup-tool-config.sh; also recorded in plugin_versions of the bugfix and refactor auto-metrics.json).
Author / maintainer: bmad-code-org (GitHub organisation). Individual maintainer unknown — not verified in this pass.
License: unknown — not verified in this pass.
Primary doc: https://github.com/bmad-code-org/BMAD-METHOD (README). Installer also writes _bmad/ and .claude/skills/bmad-* into the repo with its own docs; upstream doc URL beyond the README unknown — not verified in this pass.

Performance in this benchmark

Task	Mean score	95% CI	z
feature	125.13	[122.73, 127.37]	+0.215
bugfix	178.83	[175.83, 181.33]	+0.640
refactor	159.42	[155.33, 163.17]	+0.020

Rank 1 / 9 overall (combined z̄ = +0.292). bmad is in the top statistical tier on feature (tier: bmad, gstack, superpower, ecc, pure) and on bugfix (tier: ecc, bmad, pure, mindful, gstack), and in the top tier but near the bottom of it on refactor. The ordering inside the top-4 overall (bmad, ecc, pure, gstack) is not statistically distinguishable.
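The combined figure can be checked from the per-task values in the table. The sketch below assumes that z̄ is the plain, unweighted mean of the three per-task z-scores; the report does not state the formula, but that assumption reproduces the quoted +0.292.

```python
from statistics import mean

# Per-task z-scores for bmad, copied from the results table.
z_scores = {"feature": 0.215, "bugfix": 0.640, "refactor": 0.020}

# Assumption: the combined score is the unweighted mean across tasks.
combined_z = mean(z_scores.values())
print(f"combined z = {combined_z:+.3f}")
```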

Mechanism — what actually runs

Install surface (from scripts/setup-tool-config.sh lines 145–157): npx bmad-method@6.3.0 install --directory . --modules bmm --tools claude-code --yes is run inside the cloned repo. This writes _bmad/ (config + agent prompts, gitignored by the bench safety rules), .claude/skills/bmad-* (skill files exposed to Claude Code), and _bmad-output/ (tracked, used for phase artifacts). No plugin is added to the CLI’s plugin registry — the config/bmad-t1/plugins/ directory contains only the default claude-plugins-official marketplace. The tool’s settings.json is just { "skipDangerousModePermissionPrompt": true }; all bmad behaviour comes from the in-repo skill files, not from CLI-level config.
Entry point (from scripts/manual-bench.sh line 146): the harness prepends an intro that tells bmad which path to take, then sends PROMPT="/bmad-quick-dev $BMAD_INTRO\n\n$SHARED_TASK". The intro always says “Pick the Plan-Code-Review path” and adds task-shape hints (“non-trivial feature”, “scoped bugfix — reproduce first”, “scoped refactor — no behavior change”). Observable behaviour: the command parses the intro, then proceeds through plan → code → review phases inline. In the transcripts the loaded skill adds workflow instructions to the session (substring bmad-quick-dev appears 11/24/22 times across feature/bugfix/refactor; Plan-Code-Review appears 4/10/11 times; workflow 18/16/17 times; checkpoint 6/9/9 times; elicitation 1/4/4 times — all from within the transcript).
Skills / sub-agents / hooks activated: one Claude Code Skill invocation at session start (feature log). During execution bmad dispatches Agent tool calls to sub-agents (4 in bugfix, 1 in refactor, 0 in feature). Roles observed in sub-agent prompts: an Explore investigator, a blind adversarial code reviewer, an edge-case hunter, and an acceptance auditor. No external hooks, no MCP servers — only the installed skill files and Claude Code’s built-in Task/Agent tool.
Core mental model: BMad stages a cut-down Agile cycle — scope → plan → implement → multi-angle review — into a single slash command, with a fact-forcing preamble and explicit phase checkpoints before file edits.

How this benchmark invoked it

Exact PROMPT (from manual-bench.sh, with per-task intro):

/bmad-quick-dev Pick the Plan-Code-Review path — this is a <non-trivial feature | scoped bugfix | scoped refactor> in an existing <brownfield> codebase. <task-shape hint>

<SHARED_TASK>

Base model: claude-opus-4-6 (same for all nine tools).
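As a rough illustration of how the prompt above is assembled, here is a minimal Python sketch. The real harness does this in shell inside manual-bench.sh; the function name build_prompt and the placeholder arguments are hypothetical, and the task-shape hint is passed in by the caller rather than reproduced here.

```python
# Task descriptors as they appear in the prompts quoted in this report.
TASK_DESCRIPTORS = {
    "feature": "non-trivial feature in an existing codebase",
    "bugfix": "scoped bugfix in an existing brownfield codebase",
    "refactor": "scoped refactor in an existing brownfield codebase",
}


def build_prompt(task: str, hint: str, shared_task: str) -> str:
    """Join the slash command, the per-task intro, and the shared task brief."""
    intro = (
        "Pick the Plan-Code-Review path — this is a "
        f"{TASK_DESCRIPTORS[task]}. {hint}"
    )
    # A blank line separates the intro from the shared task, as in the harness.
    return f"/bmad-quick-dev {intro}\n\n{shared_task}"


# Placeholders stand in for the real hint and task brief.
prompt = build_prompt("bugfix", "<task-shape hint>", "<SHARED_TASK>")
print(prompt)
```

Only the descriptor and the hint vary between tasks; the slash command and the shared brief stay fixed.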

What actually happened in the transcripts

feature (Mode 2 CD Batch, z = +0.215): 270-turn session, no sub-agent dispatch. The agent skipped elicitation (“I have sufficient context. Mode 1 is library-only — planning implementation now.”) and went straight into extending ITDCDModeStrategy, wiring validateInventory for per-batch scheme context, adding a new td-cd-mode2.strategy.ts + utilities, and writing specs. Result: 12 files changed (+508 / −6), 82/82 tests passed, 0 tsc errors, 13 eslint errors / 6 warnings (results/bmad/t1/auto-metrics.json).
bugfix (z = +0.640, the strongest trial): 273-turn session, 4 Agent dispatches. First an Explore sub-agent mapped savings-cd batch eligibility; then three independent reviewers (blind-adversarial diff review, edge-case hunter, acceptance auditor) ran against the proposed patch before commit. 3 files changed (+222 / −4), 90 tests passed / 15 failed (core: 27 pass / 15 fail; savings-cd: 63/63 green), 0 tsc errors, 10 eslint errors / 15 warnings.
refactor (z = +0.020): 301-turn session, 1 Agent dispatch (scoped Explore mapping the two design seams being cleaned). The agent changed 11 files (+129 / −75) across libs/core, kept a fact-forcing preamble before each edit, and landed all 61 tests green, but the resulting diff scored at cohort mean. 8 eslint errors / 7 warnings.

Why it ranked 1 (noting the top-4 tie)

The bugfix trial is the single biggest contributor (z = +0.640, the largest per-task lift among bmad’s three tasks). Judges rewarded the triple-reviewer gate before commit and a reproduced-test-first patch; every judge — opus (184.0), codex (160.5), qwen (192.0) — placed bmad at or near the top for this task.
On feature, bmad sat in the top tier, but it earned that place through volume of work rather than elegance: the opus judge gave it the single highest feature score (124.85), while codex put it only mid-field (95.85). That is a meaningful opus/codex divergence.
The refactor trial was flat (z ≈ 0) and bmad ranks only 6th on refactor, so its overall rank-1 finish is driven entirely by the other two tasks. Even then, the top-4 tools are statistically indistinguishable in CI terms.
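The judge figures quoted above can be reconciled with the results table. This sketch assumes the task mean is the unweighted mean of the three judges' scores (the report does not spell this out); under that assumption the bugfix scores reproduce the 178.83 in the table. It also computes the opus/codex gap on feature.

```python
from statistics import mean

# Per-judge bugfix scores for bmad, as quoted above.
bugfix_judges = {"opus": 184.0, "codex": 160.5, "qwen": 192.0}

# Assumption: the task mean is the unweighted mean over judges.
bugfix_mean = mean(bugfix_judges.values())
print(f"bugfix mean = {bugfix_mean:.2f}")

# Gap between the two feature judges quoted above (opus minus codex).
feature_gap = round(124.85 - 95.85, 2)
print(f"opus/codex feature gap = {feature_gap:.2f}")
```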

Strengths & failure modes

Strengths (transcript-grounded):

Multi-reviewer gate: bugfix’s three post-diff reviewers (adversarial, edge-case, acceptance) show up in the sub-agent prompts, not in the shared task brief — they come from bmad’s skill files.
Fact-forcing preamble before edits and explicit “checkpoint” markers keep the agent from drifting on long sessions (9 checkpoint references each in bugfix and refactor).
Matches task shape via the intro: the harness-provided hint flips the workflow between feature / bugfix / refactor without changing the slash command.

Failure modes (transcript-grounded):

Lint is not gated: 13 / 10 / 8 eslint errors at commit across feature / bugfix / refactor; the workflow does not round-trip eslint before declaring done.
Up-front elicitation is skipped when the agent judges context “sufficient” (feature turn 0: “I have sufficient context … planning implementation now”), so the plan step can collapse into a monologue.
Refactor run used only one sub-agent and produced a mean-level result — the multi-reviewer pattern that lifted bugfix did not fire here, suggesting the “Plan-Code-Review” loop is strongest on tasks with a named defect to review against.
Overhead shows in length: at 270–301 turns per trial, bmad's sessions are among the longer ones in the benchmark.
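To make the "lint is not gated" failure mode concrete, the sketch below shows one way a pre-commit gate could be expressed. It is hypothetical and not part of bmad or the harness. It assumes the report comes from ESLint's JSON formatter, which emits one result object per file with an errorCount field; the function names and the sample report are invented for illustration.

```python
import json


def total_eslint_errors(report_json: str) -> int:
    """Sum the errorCount field over every per-file result in the report."""
    results = json.loads(report_json)
    return sum(result.get("errorCount", 0) for result in results)


def lint_gate_passes(report_json: str, max_errors: int = 0) -> bool:
    """Return True only when the report stays within the error budget."""
    return total_eslint_errors(report_json) <= max_errors


# Made-up two-file report; the 8 + 5 split is illustrative, not from a trial.
sample_report = json.dumps([
    {"filePath": "a.ts", "errorCount": 8, "warningCount": 4},
    {"filePath": "b.ts", "errorCount": 5, "warningCount": 2},
])

print(total_eslint_errors(sample_report))  # 13
print(lint_gate_passes(sample_report))     # False
```

With a zero-error budget, all three observed trials (13 / 10 / 8 errors) would have been held back at this step.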

References

Install surface: scripts/setup-tool-config.sh (case bmad, lines 145–157)
Prompt construction: scripts/manual-bench.sh (case bmad, lines 131–149)
Config snapshot: config/bmad-t1/settings.json, config/bmad-t1/plugins/
Transcripts: results/bmad/t1/session-logs/03edbe91-3a0d-4351-a8f8-2d6956ab36e8.jsonl, results/bugfix/bmad/t1/session-logs/16b2a47a-78f9-4a3e-86b0-c8aef7432577.jsonl, results/refactor/bmad/t1/session-logs/809fef16-9f6f-45ad-8622-517fba071d30.jsonl
Metrics: results/bmad/t1/auto-metrics.json, results/bugfix/bmad/t1/auto-metrics.json, results/refactor/bmad/t1/auto-metrics.json, results/cross-task-stats.json
Upstream repo: https://github.com/bmad-code-org/BMAD-METHOD

Observed in trial timelines

bmad is the only tool whose skill content shows up as explicit Read events in the session (mean 3.5 unique skill files on feature, 6.5 on bugfix, 4.5 on refactor; range 2–8). Every other tool injects skill content via slash command into the system prompt, so no Read events fire — bmad’s step-by-step step-01…step-05 files are loaded at runtime as the workflow advances. This is why bmad’s transcripts read like a script: the skill content is materialised in-band.
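The per-task means and the range quoted above follow from the "Skill files" counts on the individual trial cards in this report. A small sketch of that arithmetic, with the counts transcribed by hand from those cards:

```python
from statistics import mean

# Unique skill files read per trial, transcribed from the trial cards
# (feature t1..t4, bugfix t1..t2, refactor t1..t2).
skill_files_read = {
    "feature": [2, 2, 7, 3],
    "bugfix": [8, 5],
    "refactor": [6, 3],
}

per_task_mean = {task: mean(counts) for task, counts in skill_files_read.items()}
all_counts = [c for counts in skill_files_read.values() for c in counts]

print(per_task_mean)
print(f"range {min(all_counts)}-{max(all_counts)}")
```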

Detail: see the per-trial timelines below.

Trial timelines

Per-trial event timelines auto-extracted from session-logs/*.jsonl — skill activations, plugin/skill file reads, subagents dispatched, code mutations, Bash usage:

Feature: 4 trials · Bugfix: 2 trials · Refactor: 2 trials (cards below are listed in that order)

Feature t1 · 15:10 → 15:26 UTC · 15 min

2 commits · 12 files · +508

“Use bmad-quick-dev to handle this task. Pick the Plan-Code-Review path — this is a non-trivial feature in an existing codebase. Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD] U…”

New files: 5
Edits: 18
Bash: 52
Skills: 1
Skill files: 2

Bash command mix · 52 calls

tests 20
other 17
inspection 9
git ops 3
typecheck 3

Skill activations (1)

bmad-quick-dev — Plan-Code-Review path. Implement Mode 2 CD Batch for TD-CD end-to-end. PRD: docs/infina-product-docs/docs/core-products… at 15:10

Plugin/skill files read (2 unique)

.claude/skills/bmad-quick-dev/workflow.md
.claude/skills/bmad-quick-dev/step-01-clarify-and-route.md

New files created (5)

libs/core/src/domain/savings-cd/td-cd-mode2-price-calculator.spec.ts
libs/core/src/domain/savings-cd/td-cd-mode2-price-calculator.ts
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts
libs/core/src/port/service/td-cd-mode2-batch-resolver.port.ts

Feature t2 · 14:31 → 15:06 UTC · 34 min

2 commits · 5 files · +587

Agents: 1
New files: 2
Edits: 2
Bash: 30
Skills: 1
Skill files: 2
Todos: 7

Bash command mix · 30 calls

tests 12
other 11
typecheck 4
inspection 1
lint/format 1
git ops 1

Skill activations (1)

bmad-quick-dev — Plan-Code-Review path. Implement Mode 2 CD Batch for TD-CD product end-to-end. PRD: docs/infina-product-docs/docs/core-… at 14:32

Plugin/skill files read (2 unique)

.claude/skills/bmad-quick-dev/workflow.md
.claude/skills/bmad-quick-dev/step-01-clarify-and-route.md

Subagents dispatched (1)

Explore · Map Mode 1 TD-CD layer at 14:33

Subagent transcripts (1)

agent-a775876df985… — I'm about to implement Mode 2 CD Batch for the TD-CD product in this NestJS/NX monorepo at /Users/ra… [Bash×31, Read×24]

New files created (2)

libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts

Feature t3 · 04:38 → 04:48 UTC · 10 min

2 commits · 5 files · +553

“<local-command-caveat>Caveat: The messages below were generated by the user while running local commands. DO NOT respond to these messages or otherwise consider them in your response unless the user explicitly asks you t…”

New files: 3
Edits: 8
Bash: 23
Skills: 1
Skill files: 7

Bash command mix · 23 calls

other 9
tests 9
git ops 3
typecheck 1
inspection 1

Skill activations (1)

bmad-quick-dev — Pick the Plan-Code-Review path — this is a non-trivial feature in an existing codebase. Read the PRD at docs/infina-pro… at 04:38

Plugin/skill files read (7 unique)

.claude/skills/bmad-quick-dev/workflow.md
_bmad/bmm/config.yaml
.claude/skills/bmad-quick-dev/step-01-clarify-and-route.md
.claude/skills/bmad-quick-dev/step-02-plan.md
.claude/skills/bmad-quick-dev/spec-template.md
.claude/skills/bmad-quick-dev/step-03-implement.md
.claude/skills/bmad-quick-dev/step-04-review.md

New files created (3)

_bmad-output/implementation-artifacts/spec-td-cd-mode2-batch.md
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts

Feature t4 · 08:54 → 09:19 UTC · 25 min

2 commits · 7 files · +491

“<command-message>bmad-quick-dev</command-message> <command-name>/bmad-quick-dev</command-name> <command-args>Pick the Plan-Code-Review path — this is a non-trivial feature in an existing codebase. Read the PRD at docs/in…”

Agents: 3
New files: 2
Edits: 11
Bash: 29
Skill files: 3
Todos: 6

Bash command mix · 29 calls

tests 18
other 4
git ops 4
inspection 2
typecheck 1

Plugin/skill files read (3 unique)

.claude/skills/bmad-quick-dev/workflow.md
_bmad/bmm/config.yaml
.claude/skills/bmad-quick-dev/step-01-clarify-and-route.md

Subagents dispatched (3)

Explore · Explore Mode 1 TD-CD implementation at 08:55
Explore · Explore Mode 1 PRD for reference at 08:55
Explore · Find Mode 1 DI wiring at 09:02

Subagent transcripts (3)

agent-a4944196081c… — Find and read the Mode 1 TD-CD PRD document. It should be at a path like `docs/infina-product-docs/d… [Glob×2, Bash×2, Read×1]
agent-a58442e4e930… — Search the codebase for where TDCDMode1Strategy is instantiated or provided as a dependency. Look fo… [Read×19, Bash×16, Grep×7, Glob×5]
agent-ab21289cb0a8… — Thoroughly explore the Mode 1 TD-CD (savings CD) implementation in this codebase. I need to understa… [Read×30, Bash×23]

New files created (2)

libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts

Bugfix t1 · 16:40 → 17:09 UTC · 29 min

2 commits · 3 files · +222

“<command-message>bmad-quick-dev</command-message> <command-name>/bmad-quick-dev</command-name> <command-args>Pick the Plan-Code-Review path — this is a scoped bugfix in an existing brownfield codebase. Investigate before…”

Agents: 4
New files: 2
Edits: 11
Bash: 19
Skill files: 8

Bash command mix · 19 calls

tests 6
other 5
git ops 5
inspection 2
lint/format 1

Plugin/skill files read (8 unique)

bmad-t1/.claude/skills/bmad-quick-dev/workflow.md
bmad-t1/_bmad/bmm/config.yaml
bmad-t1/.claude/skills/bmad-quick-dev/step-01-clarify-and-route.md
bmad-t1/.claude/skills/bmad-quick-dev/step-02-plan.md
bmad-t1/.claude/skills/bmad-quick-dev/spec-template.md
bmad-t1/.claude/skills/bmad-quick-dev/step-03-implement.md
bmad-t1/.claude/skills/bmad-quick-dev/step-04-review.md
bmad-t1/.claude/skills/bmad-quick-dev/step-05-present.md

Subagents dispatched (4)

Explore · Investigate savings-cd batch eligibility at 16:41
Blind adversarial review · Blind adversarial review at 16:57
Edge case review · Edge case review at 16:57
Acceptance auditor review · Acceptance auditor review at 16:57

Subagent transcripts (4)

agent-a1585943ab39… — You are an acceptance auditor. Verify the implementation matches the spec and bug report requirement… [Bash×34, Read×7, Grep×3, Glob×1]
agent-a4f2418311a0… — You are an edge case hunter reviewing a code change. Walk every branching path and boundary conditio… [Read×3, Glob×1]
agent-a92b111f975e… — You are a blind adversarial code reviewer. You have NO context about the project, no spec, no requir… [no tools]
agent-abeabb30b8f3… — Thoroughly explore the savings-cd codebase in /Users/randytran/Codes/ai-tool-benchmark/runs/shp2376/… [Read×15, Bash×7, Grep×4]

New files created (2)

bmad-t1/_bmad-output/implementation-artifacts/spec-shp-2376-deposit-maturity-fix.md
bmad-t1/libs/savings-cd/src/domain/savings-cd-batch-data-source.spec.ts

Bugfix t2 · 16:43 → 16:49 UTC · 5 min

2 commits · 2 files · +148

Agents: 1
New files: 1
Bash: 1
Skill files: 5

Bash command mix · 1 call

other 1

Plugin/skill files read (5 unique)

bmad-t2/.claude/skills/bmad-quick-dev/workflow.md
bmad-t2/_bmad/bmm/config.yaml
bmad-t2/.claude/skills/bmad-quick-dev/step-01-clarify-and-route.md
bmad-t2/.claude/skills/bmad-quick-dev/step-02-plan.md
bmad-t2/.claude/skills/bmad-quick-dev/spec-template.md

Subagents dispatched (1)

Explore · Investigate SHP-2376 bug at 16:44

Subagent transcripts (1)

agent-a4da47e0c3e1… — Thoroughness: very thorough I'm investigating bug SHP-2376 in an NX monorepo at /Users/randytran/Cod… [Read×11, Grep×6, Bash×3, Glob×3]

New files created (1)

bmad-t2/_bmad-output/implementation-artifacts/spec-shp-2376-near-maturity-deposit-stuck.md

Refactor t1 · 03:52 → 04:12 UTC · 19 min

2 commits · 11 files · +129

“<command-message>bmad-quick-dev</command-message> <command-name>/bmad-quick-dev</command-name> <command-args>Pick the Plan-Code-Review path — this is a scoped refactor in an existing brownfield codebase. No behavior chan…”

Agents: 1
New files: 5
Edits: 22
Bash: 12
Skill files: 6

Bash command mix · 12 calls

git ops 5
other 4
tests 2
inspection 1

Plugin/skill files read (6 unique)

bmad-t1/.claude/skills/bmad-quick-dev/workflow.md
bmad-t1/_bmad/bmm/config.yaml
bmad-t1/.claude/skills/bmad-quick-dev/step-01-clarify-and-route.md
bmad-t1/.claude/skills/bmad-quick-dev/step-02-plan.md
bmad-t1/.claude/skills/bmad-quick-dev/spec-template.md
bmad-t1/.claude/skills/bmad-quick-dev/step-03-implement.md

Subagents dispatched (1)

Explore · Investigate SHP-2317 refactor areas at 03:53

Subagent transcripts (1)

agent-a5d8ce588542… — I need a thorough investigation of a refactor in this NX monorepo. The task (SHP-2317) involves two… [Bash×23, Read×15, Grep×7, Glob×3]

New files created (5)

bmad-t1/_bmad-output/implementation-artifacts/spec-shp-2317-decouple-batch-binding.md
bmad-t1/libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
bmad-t1/libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts
bmad-t1/libs/core/src/model/cd-batch-info.model.ts
bmad-t1/libs/core/src/port/service/td-cd-mode-strategy.port.ts

Refactor t2 · 06:32 → 06:45 UTC · 12 min

2 commits · 10 files · +120

Agents: 2
Edits: 22
Bash: 11
Skill files: 3

Bash command mix · 11 calls

other 5
git ops 3
tests 2
inspection 1

Plugin/skill files read (3 unique)

bmad-t2/.claude/skills/bmad-quick-dev/workflow.md
bmad-t2/_bmad/bmm/config.yaml
bmad-t2/.claude/skills/bmad-quick-dev/step-01-clarify-and-route.md

Subagents dispatched (2)

Explore · Investigate Scheme model/entity at 06:32
Explore · Investigate TSSchemeSetting and strategy at 06:33

Subagent transcripts (2)

agent-a8fa3205ddb1… — In /Users/randytran/Codes/ai-tool-benchmark/runs/shp2317/bmad-t2, find everything related to: 1. `TS… [Read×11, Grep×6, Glob×1]
agent-af07011904c2… — In /Users/randytran/Codes/ai-tool-benchmark/runs/shp2317/bmad-t2, find everything related to the `Sc… [Bash×16, Read×13, Grep×5, Glob×4]