refactor — Equal-Weight Aggregation (companion to final-report.md)

Generated: 2026-05-15T03:20:51Z

Inputs and source artifacts

Same inputs as the canonical final-report.md — only the aggregation rule changes here.
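
For orientation, the difference between the two aggregation rules can be sketched as follows. This is a minimal illustration, not the report's actual pipeline: the function names, the example scores, and the weights are all hypothetical, and the weighting scheme used in final-report.md is not restated here.

```python
from statistics import fmean


def equal_weight_mean(scores):
    """Plain arithmetic mean: every score contributes equally."""
    return fmean(scores)


def weighted_mean(scores, weights):
    """Weighted mean: each score contributes in proportion to its weight."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


# Hypothetical scores on the 200-point scale (not taken from the report).
example_scores = [180.0, 190.0, 170.0]
# Hypothetical weights; the real weighting rule is an assumption here.
example_weights = [1.0, 2.0, 1.0]

print(equal_weight_mean(example_scores))               # 180.0
print(weighted_mean(example_scores, example_weights))  # 182.5
```

With identical weights the two functions return the same value, which is why only the aggregation rule, and none of the inputs, needs to change between the two reports.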

Methodology

Ranking (Equal-Weight Mean)

  1. pure — 182.87/200
  2. superpower — 181.67/200
  3. bmad — 181.02/200
  4. claudekit — 180.80/200
  5. ecc — 180.29/200
  6. compound — 177.07/200
  7. gstack — 173.56/200
  8. omc — 173.07/200

Detail

Tool        Equal-Weight Mean  Pooled σ  within_σ  between_σ   N
pure                   182.87     12.89      4.54      13.37  45
superpower             181.67     13.92      4.07      14.72  45
bmad                   181.02     14.96      5.71      15.20  45
claudekit              180.80     17.80      4.38      19.02  45
ecc                    180.29     15.86      4.70      16.72  45
compound               177.07     16.49      5.28      17.26  45
gstack                 173.56     21.49     12.70      19.14  45
omc                    173.07     18.46      7.90      18.41  45

Cross-rule comparison

Compare the Equal-Weight Mean ranking here against the Weighted Mean ranking in final-report.md. The rank-1 tool is identical under both rules on every task in this corpus; tools in mid-pack ranks 4–7 may move by at most 2 positions between the two rules.
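
One way to check a rank-stability statement like this is to compute each tool's position change between the two orderings. The sketch below assumes the helper name rank_shifts and, because the Weighted Mean ordering is not reproduced in this file, uses a made-up placeholder ordering for it. Only the Equal-Weight list is copied from the ranking above.

```python
def rank_shifts(ranking_a, ranking_b):
    """Return {tool: position in ranking_b minus position in ranking_a}.

    Both rankings are best-first lists containing the same tools exactly once.
    """
    if len(set(ranking_a)) != len(ranking_a) or set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain the same tools, each exactly once")
    position_b = {tool: i for i, tool in enumerate(ranking_b, start=1)}
    return {tool: position_b[tool] - i for i, tool in enumerate(ranking_a, start=1)}


# Equal-Weight Mean ordering, as listed in this report.
equal_weight = ["pure", "superpower", "bmad", "claudekit",
                "ecc", "compound", "gstack", "omc"]

# HYPOTHETICAL placeholder, not the real Weighted Mean ordering:
# substitute the ranking from final-report.md before drawing conclusions.
weighted_placeholder = ["pure", "superpower", "bmad", "ecc",
                        "claudekit", "compound", "gstack", "omc"]

shifts = rank_shifts(equal_weight, weighted_placeholder)
max_shift = max(abs(delta) for delta in shifts.values())

print(shifts["claudekit"], shifts["ecc"])  # 1 -1
print(max_shift)                           # 1
```

A max_shift of 2 or less, together with an unchanged first entry, would be consistent with the statement above once the real Weighted Mean ordering is supplied.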