claudekit

Overview

Claudekit is a skill pack and hook toolkit for Claude Code authored by Carl Rannaberg (published as claudekit on npm, MIT licensed, v0.8.x at benchmark time, ~650 stars on GitHub). It ships as a .claude/ directory — custom slash commands, subagent skills, and a set of Claude Code hooks (file-guard, typecheck-changed, eslint, codebase-map, create-checkpoint, validate-todo-completion, no-any) that enforce guardrails at PreToolUse, PostToolUse, and Stop boundaries.

The benchmark exercises claudekit through a single slash command, /ck:cook, invoked with the --auto flag. Cook is claudekit’s feature-implementation orchestrator, documented by the author as “your implementation conductor” — it claims to classify intent, choose a workflow (fast, parallel, research-backed, or plan-execution), and chain planner, coder, tester, and reviewer skills across review gates. The --auto flag removes the human-in-the-loop approval gates and runs all phases continuously.

Setup

scripts/setup-tool-config.sh clones the claudekit source repo into /tmp/internal-claudekit, then installs it into the benchmark project: the project’s .claude/ directory is replaced with claudekit’s, the original CLAUDE.md is preserved as CLAUDE.md.original, claudekit’s CLAUDE.md is copied over it, and a short project-context block (stack, NX layout, testing command) is appended. A plans/ directory is created for the ck:plan skill. Hook node paths are rewritten from bare node to the resolved $(which node) because the bench runs under env -i, which strips the NVM-managed PATH. settings.json is minimal: it sets only skipDangerousModePermissionPrompt: true. No MCP servers or external agents are configured.
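The node-path rewrite described above can be sketched in Python. This is an illustration only: the real setup script is a shell script whose exact mechanism is not shown here, and the function name pin_node and the example hook path are hypothetical. It assumes each hook command is a plain string whose first token is a bare node.

```python
import re
import shutil


def pin_node(command: str, node_path: str) -> str:
    """Replace a leading bare `node` token with an absolute interpreter path.

    Only the first token is rewritten, so arguments that merely contain
    the word "node" (for example ./node_modules/...) are left untouched.
    """
    # A callable replacement avoids backslash-escape handling in node_path.
    return re.sub(r"^node(?=\s|$)", lambda _: node_path, command, count=1)


# shutil.which mirrors `which node`; it returns None when node is not on PATH,
# in which case this sketch leaves the command unchanged.
resolved = shutil.which("node") or "node"

# Hypothetical hook command, used purely for illustration.
print(pin_node("node .claude/hooks/example-hook.cjs", resolved))
```

Pinning the absolute path once at setup time means the hooks no longer depend on whatever PATH the stripped environment provides.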

Benchmark prompt

/ck:cook --auto $SHARED_TASK

This is a single-shot automation entry point. The user never re-prompts; claudekit chooses its own workflow.

What the transcripts show

Across the three tasks, /ck:cook --auto immediately launches an internal skill routing step. In the feature transcript the assistant opens with the human-facing prompt “Use /ck:plan to scope this task, then execute it with /ck:cook”, then dispatches ck-plan as a subagent-style skill that enumerates existing plans in plans/, detects cross-plan dependencies, writes a frontmatter-annotated plan.md, and only then hands control back to ck-cook for implementation.

Tool-call distribution reflects this plan-first posture. The feature session (203 log lines) front-loads Bash and Read for discovery, issues a single Write for the plan file, then a long tail of targeted Read/Edit for implementation — 17 Greps, 12 Reads, 10 Bash, 5 Edits, 1 Write, 1 Agent. The refactor session (330 lines) shifts mass toward editing: 19 Reads, 18 Edits, 11 Bash, 6 Greps, 5 Writes, 2 Agents — consistent with cook’s “execute existing plan” mode once scope is understood. The bugfix session (197 lines) sits between the two.
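Per-tool tallies like the ones above could be produced with a short script along these lines. This is a sketch under an assumed log layout: one JSON object per line, with tool invocations as blocks of type "tool_use" carrying a "name" field inside message.content. The source does not confirm that schema, and tally_tool_calls is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Iterable


def tally_tool_calls(lines: Iterable[str]) -> Counter:
    """Count tool_use blocks by tool name across JSONL log lines."""
    counts: Counter = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue  # plain-string content carries no tool calls
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                counts[block.get("name", "unknown")] += 1
    return counts


# Synthetic sample lines, not taken from the benchmark logs.
sample = [
    '{"message": {"content": [{"type": "tool_use", "name": "Grep"}]}}',
    '{"message": {"content": [{"type": "text", "text": "planning"}]}}',
    '{"message": {"content": [{"type": "tool_use", "name": "Grep"}, {"type": "tool_use", "name": "Read"}]}}',
]
print(tally_tool_calls(sample).most_common())
```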

The orchestration is visible but lightweight. There are Launching skill: ck-plan markers and occasional Agent tool uses, but most work happens in the main thread rather than in fanned-out subagents. --auto does suppress the documented review gates — transcripts contain no user-approval checkpoints.

Benchmark performance

Rank 6 of 9 on the combined score (combined z̄ = −0.134, rank-sum 19; middle cluster). Per-task ranks: feature 8, bugfix 6, refactor 5.
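The reported rank-sum is consistent with simply adding the three per-task ranks, as the short check below shows. The helper name rank_sum is hypothetical. The combined z̄ cannot be recomputed here because the per-task z-scores are not listed.

```python
def rank_sum(per_task_rank: dict[str, int]) -> int:
    """Sum a tool's per-task ranks (lower is better)."""
    return sum(per_task_rank.values())


# Per-task ranks as reported for claudekit.
claudekit_ranks = {"feature": 8, "bugfix": 6, "refactor": 5}
print(rank_sum(claudekit_ranks))  # 19
```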

Feature (T1): 8 files, +780 / −270, 79/79 tests passing, 0 TSC errors, 12 ESLint errors + 6 warnings. Large and passing, but tier-3 on judged quality.
Bugfix (T1): 3 files, +135 / −4, 0 TSC errors, 10 ESLint errors + 14 warnings. 88 tests passing but 15 core failures — the fix regressed adjacent tests. Hard-gate metrics failed to record (no .ts files in diff — the bugfix likely edited config or support files outside the scoped .ts glob).
Refactor (T1): 12 files, +140 / −79, 61/61 savings-cd tests passing, 0 TSC errors, 7 ESLint errors + 7 warnings. Claudekit’s strongest task; tier-1 on the refactor podium alongside pure, mindful, omc, compound, and bmad.

Claudekit’s ESLint counts are consistently non-zero across all three tasks despite its own eslint hook being configured — suggesting hook output is advisory in --auto mode and not a hard completion gate for the cook workflow.
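The difference between an advisory hook and a hard completion gate can be made concrete with a small sketch. It is not claudekit’s implementation: LintResult and may_complete are hypothetical names, and only the ESLint counts come from the results above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LintResult:
    task: str
    errors: int
    warnings: int


def may_complete(result: LintResult, blocking: bool) -> bool:
    """Return True if the run is allowed to finish.

    An advisory gate always lets the run finish; a blocking gate
    refuses completion while any ESLint error remains.
    """
    return (not blocking) or result.errors == 0


# ESLint counts reported for the three T1 runs.
RESULTS = [
    LintResult("feature", errors=12, warnings=6),
    LintResult("bugfix", errors=10, warnings=14),
    LintResult("refactor", errors=7, warnings=7),
]

for r in RESULTS:
    print(r.task,
          "advisory:", may_complete(r, blocking=False),
          "blocking:", may_complete(r, blocking=True))
```

Under a blocking gate none of the three runs would have been allowed to finish with the reported error counts, which fits the reading that the hook output was advisory.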

Notable observations

The /ck:cook --auto entry point is claudekit’s most opinionated surface: it collapses a multi-skill pipeline (ck-plan → ck-cook → embedded reviewers) into a single prompt and bypasses human approval. The result is a coherent plan-then-execute trace that reads well, but judged quality lands mid-pack. The bugfix regression (15 failing core tests) is the clearest failure mode — cook’s review gates are disabled under --auto, and the internal test gate either did not run or did not block completion. On refactor, the plan-first structure aligns well with scope-bounded work and produces the tool’s best result.

Observed in trial timelines

Subagent activity is consistently low (mean per trial: 1.0 on feature, 1.0 on bugfix, 1.5 on refactor), and mean Bash usage per trial is the lowest in the cohort on bugfix (11) and feature (20). The --auto cook orchestrator runs almost entirely in the main thread; Launching skill: ck-plan markers appear, but the work rarely fans out to Agent/Task calls.
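These figures match per-trial averages over the trial cards later in this document, as the following check shows. One assumption is needed: the feature t1 card lists no Agents count, so it is taken as 0 here. The helper name per_task_mean is hypothetical.

```python
# Per-trial counts read off the trial cards (feature t1..t4, bugfix t1..t2,
# refactor t1..t2). Feature t1 shows no Agents entry; 0 is an assumption.
AGENTS = {"feature": [0, 1, 1, 2], "bugfix": [1, 1], "refactor": [2, 1]}
BASH = {"feature": [30, 33, 9, 8], "bugfix": [10, 12], "refactor": [11, 12]}


def per_task_mean(counts: dict[str, list[int]]) -> dict[str, float]:
    """Average each task's per-trial counts."""
    return {task: sum(values) / len(values) for task, values in counts.items()}


print(per_task_mean(AGENTS))  # {'feature': 1.0, 'bugfix': 1.0, 'refactor': 1.5}
print(per_task_mean(BASH))    # {'feature': 20.0, 'bugfix': 11.0, 'refactor': 11.5}
```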

Detail: see the per-trial timeline files linked below.

Trial timelines

Per-trial event timelines auto-extracted from session-logs/*.jsonl — skill activations, plugin/skill file reads, subagents dispatched, code mutations, Bash usage:

Per-trial session execution is extracted from each trial’s session-logs/*.jsonl. Each card shows the subagents dispatched, skill activations, Bash command mix, and the final diff for one trial, grouped by task (feature, bugfix, refactor).
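The Bash command mix on each card implies that commands are bucketed into categories such as tests, typecheck, lint/format, git ops, inspection, and other. The extractor’s actual rules are not documented, so the keyword sets below are guesses chosen for illustration, and classify and command_mix are hypothetical helpers.

```python
import shlex
from collections import Counter

# Category labels follow the cards; the keyword sets are illustrative guesses.
RULES = [
    ("tests", {"jest", "vitest", "test"}),
    ("typecheck", {"tsc"}),
    ("lint/format", {"eslint", "prettier"}),
    ("git ops", {"git"}),
    ("inspection", {"ls", "cat", "find", "head", "wc"}),
]


def classify(command: str) -> str:
    """Assign a shell command to the first category whose keyword it contains."""
    try:
        tokens = set(shlex.split(command))
    except ValueError:
        tokens = set(command.split())  # fall back on unbalanced quotes
    for label, keywords in RULES:
        if tokens & keywords:
            return label
    return "other"


def command_mix(commands: list[str]) -> Counter:
    """Tally commands per category, like the 'Bash command mix' lists."""
    return Counter(classify(c) for c in commands)


# Synthetic commands, not taken from the benchmark logs.
print(command_mix([
    "npx nx test core",
    "git status",
    "npx tsc --noEmit",
    "ls -la plans/",
    "mkdir -p plans",
]))
```

Rule order matters in this sketch: a command matching several keyword sets is counted once, under the first matching category.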

Feature: 4 trials · Bugfix: 2 trials · Refactor: 2 trials

Feature t1 · 02:23 → 02:34 UTC · 10 min

2 commits · 8 files · +780

“Use /ck:plan to scope this task, then execute it with /ck:cook. You can chain them — when the plan is ready, kick off cook directly without leaving the session. Read the PRD at docs/infina-product-docs/docs/core-products…”

New files: 3
Edits: 5
Bash: 30
Skills: 2
Sessions: 2

Bash command mix · 30 calls

other 15
inspection 7
tests 6
typecheck 1
git ops 1

Skill activations (2)

ck-plan — Implement Mode 2 CD Batch for TD-CD end-to-end. PRD: docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] … at 02:23
ck-plan — Implement Mode 2 CD Batch for TD-CD product end-to-end. PRD: docs/infina-product-docs/docs/core-products/td-cd/user-logi… at 02:24

New files created (3)

libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts
plans/260415-0924-td-cd-mode2-cd-batch/plan.md

Feature t2 · 06:38 → 06:53 UTC · 14 min

2 commits · 3 files · +498

“<command-message>cook</command-message> <command-name>/cook</command-name> <command-args>--auto Read the PRD at docs/infina-product-docs/docs/core-products/td-cd/user-logic/[PRD] [TD-CD] User stories - Mode 2 CD Batch.md…”

Agents: 1
New files: 2
Edits: 3
Bash: 33

Bash command mix · 33 calls

tests 25
git ops 3
other 2
typecheck 2
inspection 1

Subagents dispatched (1)

Explore · Research PRD and Mode 1 codebase at 06:39

Subagent transcripts (1)

agent-a378b7e67b66… — I need a very thorough exploration of two areas in the codebase at /Users/randytran/Codes/ai-tool-be… [Read×22, Bash×18, Glob×3]

New files created (2)

libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts

Feature t3 · 04:50 → 04:59 UTC · 8 min

2 commits · 7 files · +638

“<local-command-caveat>Caveat: The messages below were generated by the user while running local commands. DO NOT respond to these messages or otherwise consider them in your response unless the user explicitly asks you t…”

Agents: 1
New files: 6
Edits: 2
Bash: 9

Bash command mix · 9 calls

other 5
tests 3
git ops 1

Subagents dispatched (1)

Explore · Research PRD and Mode 1 patterns at 04:51

Subagent transcripts (1)

agent-a1a564cbe2e2… — Thoroughly explore the Mode 2 CD Batch PRD and existing Mode 1 implementation in this NestJS/TypeORM… [Bash×30, Read×16, Glob×5]

New files created (6)

libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts
plans/260416-mode2-cd-batch/phase-01-implement-strategy.md
plans/260416-mode2-cd-batch/phase-02-unit-tests.md
plans/260416-mode2-cd-batch/phase-03-export-wiring.md
plans/260416-mode2-cd-batch/plan.md

Feature t4 · 08:37 → 08:51 UTC · 13 min

2 commits · 3 files · +562

Agents: 2
New files: 2
Edits: 4
Bash: 8
Sessions: 2
Todos: 6

Bash command mix · 8 calls

tests 5
git ops 2
other 1

Subagents dispatched (2)

Explore · Explore Mode 1 CD implementation at 08:37
Explore · Explore savings-cd entities and types at 08:37

Subagent transcripts (2)

agent-a7f70015a633… — Explore the entity/model/type definitions for the TD-CD savings product. I need to understand: 1. Al… [Read×29, Bash×19, Glob×8, Grep×1]
agent-ae9909465074… — Explore the existing Mode 1 TD-CD implementation thoroughly. I need to understand: 1. The Strategy p… [Bash×21, Read×16, Grep×2]

New files created (2)

libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts

Bugfix t1 · 16:39 → 16:48 UTC · 8 min

2 commits · 3 files · +135

“<command-message>cook</command-message> <command-name>/cook</command-name> <command-args>--auto A QA bug report was filed. Read docs/benchmark/TASK.md for the full report: reproduction steps, observed vs expected behavio…”

Agents: 1
New files: 1
Edits: 5
Bash: 10

Bash command mix · 10 calls

other 4
tests 3
git ops 2
inspection 1

Subagents dispatched (1)

Explore · Explore savings CD codebase at 16:40

Subagent transcripts (1)

agent-a4cbeae3fc50… — Explore the savings CD codebase at /Users/randytran/Codes/ai-tool-benchmark/runs/shp2376/claudekit-t… [Read×14, Grep×7, Bash×2, Glob×2]

New files created (1)

claudekit-t1/libs/savings-cd/src/domain/savings-cd-batch-data-source.spec.ts

Bugfix t2 · 16:44 → 16:53 UTC · 9 min

2 commits · 2 files · +189

Agents: 1
New files: 1
Edits: 3
Bash: 12

Bash command mix · 12 calls

other 6
tests 3
git ops 2
inspection 1

Subagents dispatched (1)

Explore · Explore savings-cd batch eligibility at 16:44

Subagent transcripts (1)

agent-a0841af26002… — Explore the codebase at /Users/randytran/Codes/ai-tool-benchmark/runs/shp2376/claudekit-t2 to unders… [Read×20, Grep×10, Bash×8, Glob×1]

New files created (1)

claudekit-t2/libs/savings-cd/src/domain/savings-cd-batch-data-source.spec.ts

Refactor t1 · 03:50 → 04:05 UTC · 14 min

2 commits · 12 files · +140

“<command-message>cook</command-message> <command-name>/cook</command-name> <command-args>--auto A refactor has been requested by the tech lead. Read docs/benchmark/TASK.md for the full brief: the two design seams being c…”

Agents: 2
New files: 5
Edits: 18
Bash: 11

Bash command mix · 11 calls

other 5
tests 2
git ops 2
lint/format 1
inspection 1

Subagents dispatched (2)

Explore · Explore scheme and batch binding at 03:50
Explore · Find CDBatch, tests, barrel exports at 03:53

Subagent transcripts (2)

agent-aa6c37aee731… — I need to understand the current state of `cdBatchId` on the Scheme model and the TSSchemeSetting mo… [Read×13, Bash×10, Grep×5, Glob×1]
agent-afc77347efc5… — Working directory: /Users/randytran/Codes/ai-tool-benchmark/runs/shp2317/claudekit-t1 Thoroughness:… [Read×16, Glob×6, Grep×6, Bash×3]

New files created (4)

claudekit-t1/libs/core/src/domain/savings-cd/td-cd-mode2.strategy.spec.ts
claudekit-t1/libs/core/src/domain/savings-cd/td-cd-mode2.strategy.ts
claudekit-t1/libs/core/src/model/cd-batch-info.model.ts
claudekit-t1/libs/core/src/port/service/td-cd-mode-strategy.port.ts

Refactor t2 · 06:32 → 06:45 UTC · 12 min

2 commits · 11 files · +113

Agents: 1
New files: 1
Edits: 22
Bash: 12

Bash command mix · 12 calls

other 4
git ops 4
tests 2
inspection 2

Subagents dispatched (1)

Explore · Explore Mode 2 batch binding at 06:32

Subagent transcripts (1)

agent-adc7f101647c… — I need to understand the current codebase structure for a refactor (SHP-2317). The refactor involves… [Read×14, Bash×11]

New files created (1)

claudekit-t2/libs/core/src/domain/savings-cd/cd-batch-info.ts