bugfix — Equal-Weight Aggregation (companion to final-report.md)

Generated: 2026-05-15T03:20:50Z

Inputs and source artifacts

Same inputs as the canonical final-report.md — only the aggregation rule changes here.
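
As a rough illustration of what changing only the aggregation rule means, the sketch below contrasts an equal-weight mean with a weighted mean over the same scores. The scores, weights, and function names are hypothetical placeholders. This report does not restate which units are averaged or which weights final-report.md uses.

```python
from statistics import fmean


def equal_weight_mean(scores):
    """Every score contributes equally to the aggregate."""
    return fmean(scores)


def weighted_mean(scores, weights):
    """Each score contributes in proportion to its weight."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


# Hypothetical scores on the report's 0-200 scale, with made-up weights.
scores = [190.0, 170.0, 180.0]
weights = [3.0, 1.0, 1.0]

equal = equal_weight_mean(scores)          # (190 + 170 + 180) / 3
weighted = weighted_mean(scores, weights)  # (570 + 170 + 180) / 5
```

With uniform weights the two rules coincide, so any difference between the two reports comes from the weights alone.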

Methodology

Ranking (Equal-Weight Mean)

  1. claudekit — 185.47/200
  2. ecc — 181.29/200
  3. pure — 176.76/200
  4. bmad — 175.02/200
  5. superpower — 169.20/200
  6. compound — 168.00/200
  7. omc — 167.47/200
  8. gstack — 161.18/200

Detail

Tool        Equal-Weight Mean  Pooled σ  within_σ  between_σ   N
claudekit              185.47     11.48      8.51       8.05  45
ecc                    181.29     12.25      8.16      10.14  45
pure                   176.76     13.24     11.22       8.55  45
bmad                   175.02     16.05     11.81      12.41  45
superpower             169.20     13.02      6.21      12.64  45
compound               168.00     12.27      8.58      10.09  45
omc                    167.47     20.89     15.07      15.64  45
gstack                 161.18     16.05      7.10      15.98  45
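
The ranking above can be reproduced from the Detail values by sorting tools on their Equal-Weight Mean in descending order. A minimal sketch follows. The means are copied from the table, while the `rank_by_mean` helper is a name introduced here for illustration only.

```python
# Equal-Weight Mean per tool, copied from the Detail table (scale: 0-200).
means = {
    "claudekit": 185.47,
    "ecc": 181.29,
    "pure": 176.76,
    "bmad": 175.02,
    "superpower": 169.20,
    "compound": 168.00,
    "omc": 167.47,
    "gstack": 161.18,
}


def rank_by_mean(means_by_tool):
    """Return tool names ordered from highest to lowest mean."""
    return sorted(means_by_tool, key=means_by_tool.get, reverse=True)


ranking = rank_by_mean(means)
for position, tool in enumerate(ranking, start=1):
    print(f"{position}. {tool}: {means[tool]:.2f}/200")
```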

Cross-rule comparison

Compare the Equal-Weight Mean ranking here against the Weighted Mean ranking in final-report.md. The rank-1 tool is the same under both rules on every task in this corpus; tools in mid-pack ranks 4–7 may move by at most 2 positions between the two rules.
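
One way to check a claim of this kind is to compute each tool's change in position between two rankings. The sketch below assumes each ranking is an ordered list of tool names. The `rank_shifts` helper is hypothetical, and the second ordering is an invented placeholder, not the Weighted Mean ranking from final-report.md.

```python
def rank_shifts(ranking_a, ranking_b):
    """Map each tool to (position in b) - (position in a), 1-based."""
    if len(ranking_a) != len(ranking_b) or set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain the same tools")
    position_b = {tool: i for i, tool in enumerate(ranking_b, start=1)}
    return {tool: position_b[tool] - i
            for i, tool in enumerate(ranking_a, start=1)}


# Equal-Weight Mean ranking from this report.
equal_weight = ["claudekit", "ecc", "pure", "bmad",
                "superpower", "compound", "omc", "gstack"]

# PLACEHOLDER ordering for illustration only; substitute the actual
# Weighted Mean ranking from final-report.md.
weighted_placeholder = ["claudekit", "ecc", "pure", "superpower",
                        "bmad", "omc", "compound", "gstack"]

shifts = rank_shifts(equal_weight, weighted_placeholder)
max_shift = max(abs(delta) for delta in shifts.values())
same_top = equal_weight[0] == weighted_placeholder[0]
```

Because the statement above is made per task, the same check would be run once per task on that task's two rankings, confirming `same_top` each time and that `max_shift` never exceeds 2.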