omc (oh-my-claudecode)
Disclosure. omc is maintained by the benchmark author (Yeachan-Heo/oh-my-claudecode); see README caveat 12. The ranking below should be read in that light — the tool was evaluated on a benchmark its own author designed, and it still lands near the bottom. That is the expected direction for an honest self-report, but it does not cancel the conflict of interest. Every rubric item, task brief, and judge prompt was chosen by the same author; an external replication is the only thing that would remove the concern.
Identity
- Repo: github.com/Yeachan-Heo/oh-my-claudecode
- Plugin version at run: 4.13.0 (bugfix trial), 4.13.1 (refactor trial); the feature trial predates the version-stamp hook
- Package: oh-my-claude-sisyphus on npm
- License: MIT, Copyright 2025 Yeachan Heo
- Install mechanism: Claude Code plugin marketplace (claude plugin install oh-my-claudecode@omc)
- Invocation in this benchmark: two-message flow: /oh-my-claudecode:omc-setup, then /oh-my-claudecode:autopilot <task>
Mechanism
omc is an orchestration layer. It ships a large surface area — agents, hooks, skills, a status line, a .omc/ state directory, and an experimental agent-teams flag (CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1) — and routes work through specialized subagents via a top-level “autopilot” skill. The stated model is delegate-first: a planner decomposes the task, executor subagents write code, and reviewer/verifier subagents close the loop before a final commit. Session logs for the bench runs confirm heavy Task-tool usage: the feature and refactor trials each spawned three persisted subagent logs (omc-feature-t1: executor + Explore + verifier; omc-refactor-t1: three subagents); the bugfix trial spawned one (a single Explore dispatch). See docs/analysis/trial-timelines/{feature,bugfix,refactor}/omc.md for the per-trial breakdown. A parent transcript of 1.3 MB on the feature run and 1.0 MB on refactor is not unusual for this tool — it spends tokens coordinating.
Results
| Task | Mean (/200) | 95% CI | z | Per-task rank |
|---|---|---|---|---|
| Feature | 116.7 | [114.2, 119.4] | −0.108 | 7 / 9 |
| Bugfix | 158.4 | [152.5, 164.3] | −0.599 | 8 / 9 |
| Refactor | 162.1 | [159.8, 164.6] | +0.125 | 2 / 9 |
| Combined z̄ | | | −0.194 | 8 / 9 (overall) |
Rank-sum: 17 (tied with compound).
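The combined row can be checked by hand. The table's numbers are consistent with the combined z̄ being the unweighted mean of the three per-task z-scores and the rank-sum being the plain sum of the per-task ranks. The Bash sketch below assumes exactly that aggregation (the benchmark's own scoring script may compute it differently) and recomputes both values from the table.

```shell
#!/usr/bin/env bash
# Recompute the "Combined z̄" row and the rank-sum from the per-task table.
# Assumption: combined z̄ = unweighted mean of per-task z; rank-sum = sum of ranks.
set -euo pipefail

# Per-task values for omc, copied from the results table (feature, bugfix, refactor).
z_scores=(-0.108 -0.599 0.125)
ranks=(7 8 2)

# Mean of the z-scores, rounded to three decimals like the table.
combined_z=$(printf '%s\n' "${z_scores[@]}" | awk '{ s += $1 } END { printf "%.3f", s / NR }')

# Integer sum of the per-task ranks.
rank_sum=0
for r in "${ranks[@]}"; do
  rank_sum=$(( rank_sum + r ))
done

echo "combined z-bar: ${combined_z}"
echo "rank-sum: ${rank_sum}"
```

Both recomputed values match the table (−0.194 and 17), which is why the mean-of-z reading is a reasonable working assumption rather than a confirmed definition.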
The refactor result (rank 2 / 9) is not a defense. Inter-judge Spearman ρ CIs straddle zero on the refactor task (PAPER §4.3, README caveat 3), so rank positions there are noise-dominated. Refactor self-preference for omc is +11.875, the second-highest in the cohort, but PAPER §4.5 is explicit that this metric is not identified in a cohort where every executor runs on the same Anthropic base model. The honest read of the four rows is: omc ranks 8 / 9 overall, driven by a below-cohort bugfix trial (z = −0.599) and a below-cohort feature trial (z = −0.108). Because the refactor ranking is noise-dominated, that cell does not contradict this picture.
The automated artifact checks line up with the judge scores. Feature: 8 files changed, +547 / −2 lines on a brief that did not require that much surface, and 10 ESLint errors survived. Bugfix: only 2 files and +44 lines touched, but 15 core-test failures on commit; the orchestration did not prevent a shallow fix that broke the existing suite. Refactor: 11 files, +109 / −44 lines, 8 ESLint errors, tests green.
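File and line counts like these are what git diff --numstat reports. A minimal sketch, assuming the trial's commit sits directly on top of the clone's baseline commit (so HEAD~1..HEAD is the whole change); the helper name summarize_numstat and the clone path in the usage comment are hypothetical, not part of the benchmark's scripts.

```shell
#!/usr/bin/env bash
# Summarize a diff as "<files> files, +<added> / -<deleted> (net <n>)".
# Reads `git diff --numstat` output on stdin: "<added>\t<deleted>\t<path>" per file.
set -euo pipefail

summarize_numstat() {
  awk -F '\t' '
    { files++ }                                  # every numstat line is one changed file
    $1 != "-" { added += $1; deleted += $2 }     # binary files report "-": skip their line counts
    END { printf "%d files, +%d / -%d (net %d)\n", files, added, deleted, added - deleted }
  '
}

# Hypothetical usage against a trial clone (path is illustrative only):
#   git -C runs/omc-t1 diff --numstat HEAD~1 HEAD | summarize_numstat
```

For the feature trial's +547 / −2, the net change works out to 545 lines, which is the figure to set against the 100–300 net lines mentioned for the top-of-cohort tools.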
What the mechanism did here
Two things stand out when the artifacts are read next to the session logs.
The ceremony entry point costs a full message. /oh-my-claudecode:omc-setup is a setup step — it bootstraps CLAUDE.md injection, hooks, and the .omc/ state directory — and no other tool in the cohort requires it. The bench runner has a dedicated OMC_TWO_MSG=1 branch precisely to accommodate this. In a single-session evaluation the setup message contributes no code; it is pure prelude. That alone does not explain an eight-point gap, but it sets the tone for the rest of the trace.
Orchestration did not substitute for scope discipline. The bugfix result is the most telling artifact in the set: the tool's planner/executor/verifier loop produced a 2-file, 44-line patch that left 15 tests red. A simpler tool could have failed the same way, but the premise of an autopilot with multiple reviewer passes is that it catches exactly this class of mistake, and on this trial it did not. On feature, the opposite failure mode shows: +547 lines for a PRD that the top-of-cohort tools closed in 100–300 lines of net change. Heavier orchestration does not correlate with higher rubric scores in this cohort: pure, the unadorned baseline, sits in the top-4 tie.
Honest read
omc is a plausible design. Its per-round σ is low (0.6 on bugfix, 0.6 on refactor, 1.8 on feature), meaning its scores are stable across judging rounds: re-running the judges would not move its mean much. The problem is not noise. It is that the orchestration surface, on these three tasks, did not produce code that judges preferred over a plain Claude session. The bench author's own tool underperforms a zero-ceremony baseline, and the fix is not to reweight the judges; it is to take the artifacts at face value and investigate why a multi-agent loop produced a shallow bugfix and an oversized feature.
Reproducing
```shell
TASK=feature ./scripts/create-clones.sh 1
TASK=feature ./scripts/manual-bench.sh omc 1
# Paste message 1 (/oh-my-claudecode:omc-setup), wait, then paste message 2
# (/oh-my-claudecode:autopilot <task>). Exit when the tool has committed.
```
Artifacts live under results/omc/t1/, results/bugfix/omc/t1/, results/refactor/omc/t1/. Session logs and subagent transcripts are retained verbatim; the .omc/ state directory is gitignored by the benchmark safety rules and does not leave the clone.
Observed in trial timelines
omc is the only tool that consistently produces multi-file session logs (feature t1: 6 separate *.jsonl files; refactor t1–t2: 2 each), reflecting the autopilot pipeline’s process boundaries. Feature t1 spans roughly 13 hours of wall-clock time — by far the longest-running trial in the cohort.
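The multi-file pattern can be checked against the retained artifacts. A minimal Bash sketch, assuming each trial keeps its logs in a session-logs/ directory inside the result folders named in the Reproducing section; that layout is an assumption, and count_session_logs is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Count the *.jsonl session logs retained for one trial directory.
# Assumption: logs live in "<trial-dir>/session-logs/"; adjust if the layout differs.
set -euo pipefail

count_session_logs() {
  local dir="$1/session-logs"
  if [ ! -d "$dir" ]; then
    echo 0
    return
  fi
  # Counts top-level *.jsonl files only; tr strips the padding some wc builds add.
  find "$dir" -maxdepth 1 -type f -name '*.jsonl' | wc -l | tr -d ' '
}

# Hypothetical usage over the three omc result folders:
for trial in results/omc/t1 results/bugfix/omc/t1 results/refactor/omc/t1; do
  echo "${trial}: $(count_session_logs "$trial") session log(s)"
done
```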
Detail: see the per-trial timeline files linked below.
Trial timelines
Per-trial event timelines auto-extracted from session-logs/*.jsonl — skill activations, plugin/skill file reads, subagents dispatched, code mutations, Bash usage:
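The per-tool tallies in the transcript lines of each card (Bash×31, Read×10, and so on) are plain counts of tool calls. A minimal sketch with jq, assuming the Claude Code session-log layout in which each assistant line carries a message.content array and tool calls appear as blocks with "type": "tool_use" and a "name". This is not the benchmark's extractor, and tool_counts is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Tally tool calls by name in one session log, most frequent first.
# Assumed log shape (one JSON object per line):
#   {"type":"assistant","message":{"content":[{"type":"tool_use","name":"Bash",...}]}}
set -euo pipefail

tool_counts() {
  # .content[]? silently skips lines whose content is a string or missing.
  jq -r 'select(.type == "assistant")
         | .message.content[]?
         | select(type == "object" and .type == "tool_use")
         | .name' "$1" \
    | sort | uniq -c | sort -rn \
    | awk '{ printf "%s×%s\n", $2, $1 }'
}

# Hypothetical usage on a retained transcript:
#   tool_counts results/omc/t1/session-logs/<session>.jsonl
```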
“Use /oh-my-claudecode:autopilot to handle this task end-to-end. Let it run its full pipeline (spec → ralplan → execute → verify → commit) in this session. Read the PRD at docs/infina-product-docs/docs/core-products/td-cd…”
- Agents: 3
- New files: 2
- Edits: 11
- Bash: 66
- Skills: 6
- Sessions: 6
- Todos: 5
Skill activations (6)
- hud at 02:28 (setup)
- oh-my-claudecode:cancel at 02:37
- oh-my-claudecode:autopilot at 02:38 (Implement Mode 2 CD Batch for TD-CD product end-to-end. Read PRD at docs/infina-product-docs/docs/core-products/td-cd/us…)
Subagents dispatched (3)
- oh-my-claudecode:executor · Implement TD-CD Mode 2 at 02:39
- Explore · Explore Mode 1 implementation at 15:08
- oh-my-claudecode:verifier · Verify Mode 2 implementation at 15:24
Subagent transcripts (3)
- agent-a05accac6ff2…: Implement TD-CD Mode 2 (CD Batch) end-to-end in /Users/randytran/Codes/ai-tool-benchmark/runs/omc-t1… [Bash×31, Read×10, Grep×10, Edit×5]
- agent-a63e3c279ebf…: Verify the Mode 2 CD Batch implementation in this NestJS monorepo. The commit just landed at HEAD. K… [Bash×25, Read×10, ToolSearch×1, Monitor×1]
- agent-a8659f479f53…: Thoroughly explore the Mode 1 CD implementation in this NestJS monorepo. I need to understand: 1. Th… [Bash×32, Read×28]
New files created (2)
- libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
- libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts
“<local-command-caveat>Caveat: The messages below were generated by the user while running local commands. DO NOT respond to these messages or otherwise consider them in your response unless the user explicitly asks you t…”
- Agents: 2
- New files: 8
- Edits: 11
- Bash: 29
- Skills: 1
- Sessions: 2
Skill activations (1)
- oh-my-claudecode:cancel at 10:41
Subagents dispatched (2)
- Explore · Read PRD document at 10:27
- Explore · Explore Mode 1 implementation at 10:27
Subagent transcripts (2)
- agent-a3c3f2709101…: Explore the Mode 1 TD-CD implementation thoroughly in this NestJS monorepo. I need to understand the… [Read×19, Bash×17]
- agent-a579c004fbb2…: Read the file at this exact path and return its full contents: /Users/randytran/Codes/ai-tool-benchm… [Read×1]
New files created (7)
- libs/core/src/domain/savings-cd/savings-cd-payment-schedule.service.spec.ts
- libs/core/src/domain/savings-cd/savings-cd-payment-schedule.service.ts
- libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
- libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts
- libs/core/src/model/savings-cd-payment-schedule.model.ts
- libs/core/src/port/service/savings-cd-payment-schedule-service.port.ts
- libs/core/src/storage/entity/savings-cd-payment-schedule.entity.ts
“<local-command-caveat>Caveat: The messages below were generated by the user while running local commands. DO NOT respond to these messages or otherwise consider them in your response unless the user explicitly asks you t…”
- Agents: 7
- New files: 2
- Edits: 5
- Bash: 19
- Skills: 1
- Sessions: 3
Skill activations (1)
- oh-my-claudecode:cancel at 05:02
Subagents dispatched (7)
- Explore · Explore Mode 1 implementation at 04:35
- Explore · Study Mode 1 implementation at 04:41
- Explore · Study savings-cd entities and ports at 04:42
- Explore · Explore strategy port and tests at 04:49
- Explore · Explore libs/savings-cd patterns at 04:49
- oh-my-claudecode:architect · Architecture completeness review at 05:01
- oh-my-claudecode:code-reviewer · Code quality review at 05:02
Subagent transcripts (7)
- agent-a1a99f3955a2…: Explore the Mode 1 CD implementation thoroughly in the NestJS monorepo at /Users/randytran/Codes/ai-… [Bash×51, Read×20, Grep×3]
- agent-a237bc2b52cf…: Review the Mode 2 CD Batch strategy implementation for functional completeness against the PRD requi… [Read×5]
- agent-a56bb0d94809…: In the repo at /Users/randytran/Codes/ai-tool-benchmark/runs/omc-t3, I need to find and read several… [Read×19, Bash×18, Glob×15, Grep×8]
- agent-a5ff031cce3c…: Review the Mode 2 CD Batch strategy implementation for code quality. Files to review: - /Users/randy… [Read×4, mcp__plugin_oh-my-claudecode_t__lsp_diagnostics×2, ToolSearch×1]
- agent-a66c927275dc…: In the repo at /Users/randytran/Codes/ai-tool-benchmark/runs/omc-t3, explore the libs/savings-cd/ di… [Read×9]
- agent-ab8afdbed00e…: Explore the TD-CD related entities, repositories, DTOs, and constants in this NestJS monorepo. I nee… [Read×34, Glob×12, Grep×9, Bash×7]
- agent-af9355976725…: Thoroughly explore the Mode 1 TD-CD implementation in this NestJS monorepo. I need to understand: 1.… [Bash×42, Read×29]
New files created (2)
- libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
- libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts
“<command-message>oh-my-claudecode:omc-setup</command-message> <command-name>/oh-my-claudecode:omc-setup</command-name>”
- Agents: 4
- New files: 2
- Edits: 13
- Bash: 48
- Skills: 2
- Sessions: 2
- Todos: 6
Skill activations (2)
- oh-my-claudecode:hud at 09:09 (setup)
- oh-my-claudecode:cancel at 09:29
Subagents dispatched (4)
- Explore · Study Mode 1 implementation patterns at 09:15
- oh-my-claudecode:architect · Architecture validation at 09:26
- oh-my-claudecode:security-reviewer · Security review at 09:26
- oh-my-claudecode:code-reviewer · Code quality review at 09:27
Subagent transcripts (4)
- agent-a1f2e65b660f…: I need to understand the existing Mode 1 TD-CD implementation patterns in this NestJS monorepo. Plea… [Bash×24, Read×16, Glob×5]
- agent-a2bbcd0e51a2…: Security review the new Mode 2 TD-CD strategy implementation. Files to review: - libs/core/src/domai… [Read×2]
- agent-a66771480dd5…: Review the Mode 2 TD-CD strategy implementation for functional completeness against the PRD requirem… [Read×3]
- agent-a71a4d6f9058…: Code quality review of the new Mode 2 TD-CD strategy implementation. Files to review: - libs/core/sr… [Read×5, Bash×2, mcp__plugin_oh-my-claudecode_t__lsp_diagnostics×2, Grep×1]
New files created (2)
- libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
- libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts
“<command-message>oh-my-claudecode:omc-setup</command-message> <command-name>/oh-my-claudecode:omc-setup</command-name>”
- Agents: 1
- Edits: 3
- Bash: 37
- Skills: 1
- Sessions: 2
Skill activations (1)
- oh-my-claudecode:hud at 16:09 (setup)
Subagents dispatched (1)
- Explore · Explore savings-cd codebase at 16:12
Subagent transcripts (1)
- agent-ab807dd96f8a…: In the repo at /Users/randytran/Codes/ai-tool-benchmark/runs/shp2376/omc-t1, I need to understand th… [Read×20, Grep×10, Glob×5]
“<local-command-caveat>Caveat: The messages below were generated by the user while running local commands. DO NOT respond to these messages or otherwise consider them in your response unless the user explicitly asks you t…”
- Agents: 1
- Edits: 11
- Bash: 43
- Skills: 1
- Sessions: 2
Skill activations (1)
- oh-my-claudecode:hud at 16:14 (setup)
Subagents dispatched (1)
- Explore · Explore savings-cd codebase at 16:22
Subagent transcripts (1)
- agent-a168697353b5…: In the repo at /Users/randytran/Codes/ai-tool-benchmark/runs/shp2376/omc-t2, I need to find code rel… [Read×17, Bash×12, Grep×12]
“<command-message>oh-my-claudecode:omc-setup</command-message> <command-name>/oh-my-claudecode:omc-setup</command-name>”
- Agents: 3
- New files: 1
- Edits: 23
- Bash: 37
- Skills: 2
- Sessions: 2
Skill activations (2)
- oh-my-claudecode:hud at 03:52 (setup)
- oh-my-claudecode:cancel at 04:12
Subagents dispatched (3)
- statusline-setup · Configure HUD statusline at 03:52
- Explore · Explore Scheme and TSSchemeSetting at 04:03
- Explore · Explore ITDCDModeStrategy and CDBatch at 04:03
Subagent transcripts (3)
- agent-aedb78dad12a…: Configure the statusLine in /Users/randytran/Codes/ai-tool-benchmark/config/shp2317/omc-t1/settings.… [Read×1, Edit×1]
- agent-aee2d682a002…: I need to understand the current structure of Scheme and TSSchemeSetting in this NestJS/TypeORM mono… [Read×10, Grep×6, Glob×4, Bash×2]
- agent-af5bca4639f1…: I need to understand the current structure of ITDCDModeStrategy and CDBatch in this NestJS/TypeORM m… [Read×11, Glob×5, Grep×4, Bash×3]
New files created (1)
- omc-t1/libs/core/src/model/cd-batch-info.model.ts
“<command-message>oh-my-claudecode:omc-setup</command-message> <command-name>/oh-my-claudecode:omc-setup</command-name>”
- Agents: 3
- New files: 3
- Edits: 18
- Bash: 35
- Skills: 1
- Sessions: 2
Skill activations (1)
- oh-my-claudecode:cancel at 04:30
Subagents dispatched (3)
- statusline-setup · Setup HUD statusline at 03:53
- Explore · Explore Scheme and TSSchemeSetting at 04:14
- Explore · Explore strategy port and CDBatch at 04:14
Subagent transcripts (3)
- agent-a4a321d0ce12…: Thoroughness: very thorough In /Users/randytran/Codes/ai-tool-benchmark/runs/shp2317/omc-t2, I need… [Bash×16, Glob×9, Read×8, Grep×2]
- agent-ab5ae7b3104d…: Thoroughness: very thorough In /Users/randytran/Codes/ai-tool-benchmark/runs/shp2317/omc-t2, I need… [Read×13, Bash×5, Grep×4, Glob×4]
- agent-acd4695cc5ee…: Set up the OMC HUD statusline for Claude Code. Install the HUD wrapper script and configure the stat… [Read×8, Edit×1]
New files created (3)
- omc-t2/libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
- omc-t2/libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts
- omc-t2/libs/core/src/model/cd-batch-info.model.ts