Benchmarks

Code quality metrics for Acolyte and other open-source AI agents, derived from static source analysis — no subjective scoring.

For feature and architecture comparisons, see Comparison.

All metrics extracted with scripts/benchmark.ts.

Methodology

  • Source lines = total lines of source code (including blanks and comments)
  • Test files, generated code, and files over 10k lines are excluded
  • Metrics normalized per 1k source lines where applicable
  • Dependencies shown as runtime + development dependencies
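
The counting rules above could be sketched as follows. This is an illustration in TypeScript, not the contents of scripts/benchmark.ts: the SourceFile shape, the test-file and generated-code patterns, and the function names are all assumptions.

```typescript
// Hypothetical in-memory file representation; the real script's types are not shown here.
interface SourceFile {
  path: string;
  content: string;
}

const MAX_LINES = 10_000;

// Assumed heuristics; the actual exclusion patterns may differ.
const isTestFile = (path: string): boolean => /\.(test|spec)\.[jt]sx?$/.test(path);
const isGenerated = (path: string): boolean => /(^|\/)(generated|dist)\//.test(path);

// Counts every line, including blanks and comments.
function lineCount(content: string): number {
  if (content === "") return 0;
  const trimmed = content.endsWith("\n") ? content.slice(0, -1) : content;
  return trimmed.split("\n").length;
}

// Sums lines over files that survive the exclusion rules.
function sourceLines(files: SourceFile[]): number {
  return files
    .filter((f) => !isTestFile(f.path) && !isGenerated(f.path))
    .map((f) => lineCount(f.content))
    .filter((n) => n <= MAX_LINES)
    .reduce((sum, n) => sum + n, 0);
}

// Normalizes a raw count to "per 1k source lines", rounded to one decimal.
function perThousand(count: number, lines: number): number {
  return lines === 0 ? 0 : Math.round((count / lines) * 1000 * 10) / 10;
}
```

Under these assumptions, 66 occurrences of a pattern in 10,000 source lines would be reported as 6.6 per 1k.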

Closed systems

Several widely used coding agents are closed-source and cannot be analyzed with the same methodology.

Acolyte | Claude Code | Cursor | Copilot
Open-source
Self-hostable
Observable execution

Claude Code, Cursor, and Copilot are included for context but excluded from code analysis benchmarks.

Projects compared

Project | Language | Description | Source lines | Files | Dependencies
Acolyte | TypeScript | Terminal coding agent with lifecycle, effects, and AST code tools | 27,068 | 227 | 12 + 6
OpenCode | TypeScript | Open-source AI coding agent (TUI/web/desktop) | 237,468 | 1,143 | 191 + 84
Codex | Rust | Terminal AI coding agent from OpenAI | 462,656 | 1,139 | 245 + 58
Crush | Go | Terminal AI coding agent from Charm with Bubble Tea TUI | 60,863 | 268 | 72 + 0
Aider | Python | AI pair programming in your terminal | 25,943 | 105 | 35 + 17
Goose | Rust | Extensible AI agent from Block with MCP integration | 133,379 | 343 | 150 + 19
Qwen Code | TypeScript | Terminal AI coding agent from Alibaba | 233,638 | 1,076 | 91 + 85
Plandex | Go | AI coding agent for large multi-file tasks in the terminal | 74,573 | 333 | 54 + 0
Mistral Vibe | Python | Terminal AI coding agent from Mistral | 34,244 | 240 | 34 + 13

Dependency surface area

Measures how much of a codebase depends on external packages.

Metric | Acolyte | OpenCode | Qwen Code
External imports / 1k LOC | 6.6 | 16.9 | 7.9
Runtime dependencies | 12 | 191 | 91

TypeScript projects only.

Acolyte has the lowest external import density and fewest runtime dependencies among TypeScript projects.
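
One way to count external imports is sketched below. It assumes that an import is external when its specifier is a bare package name, meaning it is not relative, not absolute, and not a node: built-in. That rule, the regular expression, and the function names are illustrative assumptions rather than the benchmark's actual definition.

```typescript
// Captures the module specifier of static `import ... from "x"` and side-effect `import "x"` statements.
const IMPORT_RE = /^\s*import\s+(?:[^"']*?\s+from\s+)?["']([^"']+)["']/gm;

// Assumed rule: bare specifiers are external; relative, absolute, and node: specifiers are not.
function isExternal(specifier: string): boolean {
  return (
    !specifier.startsWith(".") &&
    !specifier.startsWith("/") &&
    !specifier.startsWith("node:")
  );
}

function countExternalImports(source: string): number {
  let count = 0;
  for (const match of source.matchAll(IMPORT_RE)) {
    const specifier = match[1];
    if (specifier !== undefined && isExternal(specifier)) count++;
  }
  return count;
}
```

Dividing the total by the source line count and multiplying by 1,000 would then give a density comparable to the first row of the table. Dynamic import() and require() calls are ignored in this sketch.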

Input validation coverage

Measures how frequently data entering the system is validated.

Metric | Acolyte | OpenCode | Qwen Code
Schema validations / 1k LOC | 2.5 | 0.8 | 0.6
.safeParse() calls / 1k | 0.9 | 0.1 | 0.0

TypeScript projects only.

Acolyte validates at a higher rate than the other TypeScript projects in the benchmark.
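
To make the metric concrete, a boundary validation might look roughly like this. The sketch is dependency-free and only imitates the result shape of Zod's .safeParse() (a success flag with either data or an error). The ToolCall type and parseToolCall function are hypothetical and are not taken from Acolyte's source.

```typescript
// Hypothetical payload shape, used only for illustration.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Discriminated union in the style of a safeParse result: callers must check `success` first.
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Validates untrusted input field by field instead of throwing.
function parseToolCall(input: unknown): ParseResult<ToolCall> {
  if (!isRecord(input)) return { success: false, error: "expected an object" };
  const { name, args } = input;
  if (typeof name !== "string") return { success: false, error: "name must be a string" };
  if (!isRecord(args)) return { success: false, error: "args must be an object" };
  return { success: true, data: { name, args } };
}
```

Because the failure case is a value rather than an exception, the caller decides how to handle bad input at the point where it enters the system.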

TypeScript type safety

Per 1k source lines.

Metric | Acolyte | OpenCode | Qwen Code
as any | 0.1 | 1.7 | 0.1
: any annotations | 0.0 | 0.9 | 0.3
@ts-ignore / @ts-expect-error | 0.0 | 0.2 | 0.0
Lint ignores | 0.2 | 0.0 | 0.3
: unknown usage | 3.0 | 1.8 | 2.3

Acolyte and Qwen Code have near-zero any usage. Acolyte uses unknown with explicit narrowing — every tool output, model response, and RPC payload is validated through Zod schemas before entering the type system.
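
The difference between any and unknown can be shown with a small, generic sketch. The extractText function and its payload shapes are hypothetical and only illustrate explicit narrowing. They are not Acolyte code and do not use Zod.

```typescript
// With `any`, `(payload as any).text.trim()` would compile and then fail at runtime on bad input.
// With `unknown`, the compiler rejects any use of the value until it has been narrowed.
function extractText(payload: unknown): string | null {
  if (typeof payload === "string") return payload.trim();
  if (typeof payload === "object" && payload !== null && "text" in payload) {
    // The cast only exposes a field that is itself typed `unknown`, so it still has to be checked.
    const text = (payload as { text: unknown }).text;
    return typeof text === "string" ? text.trim() : null;
  }
  return null;
}
```

Every branch either proves the type it needs or returns null, so no unchecked value reaches the caller.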

Cross-language type safety

Per 1k source lines.

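Metric | Aider | Mistral Vibe | Goose | Codex | Crush | Plandex
type: ignore (Python) | 0.0 | 0.1 | - | - | - | -
Any usage (Python) | 0.1 | 9.3 | - | - | - | -
cast() calls (Python) | 0.0 | 1.0 | - | - | - | -
unsafe (Rust) | - | - | 0.1 | 1.0 | - | -
.unwrap() (Rust) | - | - | 11.5 | 3.2 | - | -
.expect() (Rust) | - | - | 1.4 | 11.2 | - | -
any / interface{} (Go) | - | - | - | - | 3.8 | 4.4
panic() (Go) | - | - | - | - | 0.2 | 0.3
nolint (Go) | - | - | - | - | 0.2 | 0.0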

Aider shows minimal type escape hatches. Mistral Vibe has high Any density (9.3 per 1k lines). Codex uses .unwrap() less often than Goose (3.2 vs. 11.5) but .expect() far more often (11.2 vs. 1.4): failures carry a message, but they still rely on panicking assertions.

Test quality

Metric | Acolyte | OpenCode | Codex | Crush | Aider | Goose | Qwen Code | Plandex | Mistral Vibe
Test files1772662706842225326203
Test lines | 21,970 | 61,963 | 128,336 | 14,612 | 12,427 | 7,970 | 228,906 | 2,517 | 42,370
Ratio (test lines / source lines) | 0.81 | 0.26 | 0.28 | 0.24 | 0.48 | 0.06 | 0.98 | 0.03 | 1.24

Acolyte maintains a high test ratio because lifecycle phases and tools are independent modules with clean interfaces.

Test types include:

  • unit (*.test.ts)
  • integration (*.int.test.ts)
  • TUI visual regression (*.tui.test.ts)
  • performance (*.perf.test.ts)
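
The naming convention above lends itself to a simple classifier. The following is a sketch with hypothetical names (TestKind, classifyTest) and is not taken from Acolyte's tooling. It relies only on the four filename suffixes listed.

```typescript
type TestKind = "unit" | "integration" | "tui" | "performance";

// More specific suffixes come first, because every pattern also ends in ".test.ts".
const SUFFIXES: Array<[string, TestKind]> = [
  [".int.test.ts", "integration"],
  [".tui.test.ts", "tui"],
  [".perf.test.ts", "performance"],
  [".test.ts", "unit"],
];

function classifyTest(path: string): TestKind | null {
  for (const [suffix, kind] of SUFFIXES) {
    if (path.endsWith(suffix)) return kind;
  }
  return null; // not a test file under this convention
}
```

Checking the plain .test.ts suffix last is the important detail. With a different order, every integration, TUI, and performance test would be classified as a unit test.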

Module cohesion

Metric | Acolyte | OpenCode | Codex | Crush | Aider | Goose | Qwen Code | Plandex | Mistral Vibe
Avg lines / file | 119 | 208 | 406 | 227 | 247 | 389 | 217 | 224 | 143
Files > 500 lines | 2 (1%) | 117 (10%) | 242 (21%) | 26 (10%) | 14 (13%) | 88 (26%) | 114 (11%) | 36 (11%) | 8 (3%)
Largest file | 692 | 5,215 | 9,842 | 3,611 | 2,486 | 2,741 | 2,369 | 2,455 | 2,413
Barrel / index files15450254553042

Acolyte maintains the smallest average module size and fewest large files.
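
The first three rows of the table can be derived from a list of per-file line counts. The sketch below assumes whole-number rounding, which is consistent with the published figures. The CohesionStats type and cohesionStats function are hypothetical names.

```typescript
interface CohesionStats {
  avgLinesPerFile: number;
  largeFiles: number;
  largeFilePercent: number;
  largestFile: number;
}

const LARGE_FILE_THRESHOLD = 500;

function cohesionStats(lineCounts: number[]): CohesionStats {
  if (lineCounts.length === 0) {
    return { avgLinesPerFile: 0, largeFiles: 0, largeFilePercent: 0, largestFile: 0 };
  }
  const total = lineCounts.reduce((sum, n) => sum + n, 0);
  // "Large" means strictly more than 500 lines, matching the "Files > 500 lines" row.
  const largeFiles = lineCounts.filter((n) => n > LARGE_FILE_THRESHOLD).length;
  return {
    avgLinesPerFile: Math.round(total / lineCounts.length),
    largeFiles,
    largeFilePercent: Math.round((largeFiles / lineCounts.length) * 100),
    largestFile: Math.max(...lineCounts),
  };
}
```

As a check against the published numbers, 27,068 source lines across 227 files rounds to 119 lines per file, and 2 large files out of 227 rounds to 1%.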

Error handling

Per 1k source lines.

Metric | Acolyte | OpenCode | Qwen Code
.safeParse() calls | 0.9 | 0.1 | 0.0
try { ... } blocks | 6.0 | 1.3 | 5.0
.catch() calls | 0.5 | 2.3 | 0.4

TypeScript projects only.

Acolyte validates boundaries with Zod .safeParse() at a higher rate than the other TypeScript projects. RPC payloads, model responses, and configuration files are validated before entering the system.
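
Pattern densities like those in the table can be approximated with plain text matching. The following sketch assumes regex-based counting over concatenated source text. The patterns and the densityPer1k function are illustrative guesses, and the real script may count these constructs differently (for example, from an AST).

```typescript
// Assumed patterns for the three rows of the table.
const PATTERNS: Record<string, RegExp> = {
  safeParse: /\.safeParse\(/g,
  tryBlocks: /\btry\s*\{/g,
  catchCalls: /\.catch\(/g,
};

// Returns occurrences per 1k source lines, rounded to one decimal.
function densityPer1k(source: string, totalLines: number): Record<string, number> {
  const result: Record<string, number> = {};
  for (const [name, pattern] of Object.entries(PATTERNS)) {
    const hits = source.match(pattern)?.length ?? 0;
    result[name] = totalLines === 0 ? 0 : Math.round((hits / totalLines) * 1000 * 10) / 10;
  }
  return result;
}
```

Text matching of this kind also counts occurrences inside comments and string literals, so the results are best read as approximations.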

Key takeaways

Across the benchmarked projects, Acolyte demonstrates:

  • Extremely low any usage and strong TypeScript safety
  • The smallest modules and lowest large-file density
  • The lightest dependency footprint
  • A high test-to-source ratio (0.81 test lines per source line)
  • Clear lifecycle boundaries across independently testable modules

These characteristics reflect a deliberately small, strongly typed architecture — built so that lifecycle phases and tools behave predictably and can be independently verified.

Summary

Dimension | Acolyte | OpenCode | Codex | Crush | Aider | Goose | Qwen Code | Plandex | Mistral Vibe
Type safety | High | Medium | Medium | Medium | High | Panic-heavy | High | Medium | Any-heavy
Test density | High (0.81) | Low (0.26) | Low (0.28) | Low (0.24) | Medium (0.48) | Low (0.06) | High (0.98) | Lowest (0.03) | Highest (1.24)
Module size | Smallest (119) | Medium (208) | Largest (406) | Medium (227) | Medium (247) | Large (389) | Medium (217) | Medium (224) | Small (143)
Dependencies | Lightest (18) | Heavy (275) | Heavy (303) | Light (72) | Light (52) | Heavy (169) | Heavy (176) | Light (54) | Light (47)
First commit | Feb 2026 | Apr 2025 | Apr 2025 | May 2025 | May 2023 | Aug 2024 | Jun 2025 | Oct 2023 | Dec 2025

Acolyte leads on type safety, module size, and dependency count while remaining one of the smallest codebases in the benchmark (27,068 source lines; only Aider, at 25,943, is smaller).

Updated 9 April 2026.