Crown Citadel Coding Quality Lab

Model coding ability and speed

Representative coding benchmark pass@1 results for local coding models, paired with wall-clock generation speed and live llama.cpp metrics where the run captured them.

Coding suites • pass@1 • NixOS • Linux 7.0.1

Top representative pass@1: n/a

Coding Ability vs Speed

The vertical axis is aggregate representative pass@1. The horizontal axis can use suite wall-clock generation speed for timed EvalPlus runs, or live decode throughput for runs that recorded server metrics.

Suite Pass Rates

Each completed suite is shown separately because a model can be strong on one task shape and weaker on another.
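
As a minimal sketch of why suites are reported separately, the snippet below computes one pass@1 value per suite. It assumes a single generated sample per task, so pass@1 reduces to the fraction of tasks whose sample passed. The record layout, the function name suite_pass_rates, and the suite names are hypothetical and are not taken from the lab's own tooling.

```python
from collections import defaultdict


def suite_pass_rates(results):
    """Return {suite: pass@1} from (suite, task_id, passed) records.

    Assumption: one generated sample per task, so pass@1 is the
    fraction of tasks whose single sample passed.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for suite, _task_id, ok in results:
        total[suite] += 1
        if ok:
            passed[suite] += 1
    return {suite: passed[suite] / total[suite] for suite in total}


# Hypothetical records: the same model is perfect on one suite
# and passes only half the tasks on another.
results = [
    ("suite_a", "task_1", True),
    ("suite_a", "task_2", True),
    ("suite_b", "task_1", True),
    ("suite_b", "task_2", False),
]
rates = suite_pass_rates(results)
```

Keeping the per-suite values side by side makes a gap like this visible, where a single averaged number would hide it.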

Speed Evidence

Wall-clock samples per minute come from available codegen logs. Live decode throughput appears only for runs that sampled llama.cpp metrics during active requests.
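
The two speed figures can be sketched as simple ratios. This is an illustration under stated assumptions, not the lab's actual log parser: the function names are hypothetical, timestamps are assumed to be in seconds, and the token count and decode time are assumed to have been read from server metrics elsewhere.

```python
def samples_per_minute(sample_count, start_seconds, end_seconds):
    """Wall-clock generation speed for a suite run.

    Returns None when the elapsed time is not positive, so that a
    missing or broken log is reported as "no evidence" and not as 0.
    """
    elapsed = end_seconds - start_seconds
    if elapsed <= 0:
        return None
    return sample_count * 60.0 / elapsed


def decode_tokens_per_second(generated_tokens, decode_seconds):
    """Live decode throughput from sampled server counters.

    Returns None when no decode time was recorded for the run.
    """
    if decode_seconds is None or decode_seconds <= 0:
        return None
    return generated_tokens / decode_seconds
```

Returning None for missing evidence keeps a run without telemetry distinct from a run that was genuinely slow.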

Current Ranking

Sorted by aggregate representative pass@1, then base pass@1, then available generation speed. Coverage is kept visible so partial speed evidence is not mistaken for full token telemetry.

Model	Strict Pass	Base	Suite Scores	Wall Clock	Live Decode	Run
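
The three-level sort described above can be sketched as follows. The ModelResult class, its field names, and the example values are hypothetical. Placing models with no speed evidence last among otherwise tied entries is also an assumption, since the page does not say how missing speed is ordered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelResult:
    name: str
    aggregate_pass: float  # aggregate representative pass@1
    base_pass: float       # base pass@1
    speed: Optional[float] = None  # available generation speed, if any


def rank(results):
    """Sort by aggregate pass@1, then base pass@1, then speed (all descending)."""
    def key(r):
        # Assumption: a model without speed evidence sorts after tied
        # models that have it.
        speed = r.speed if r.speed is not None else float("-inf")
        return (-r.aggregate_pass, -r.base_pass, -speed)
    return sorted(results, key=key)


# Hypothetical entries that exercise each tie-break level.
models = [
    ModelResult("A", 0.80, 0.90, None),
    ModelResult("B", 0.80, 0.90, 12.0),
    ModelResult("C", 0.90, 0.85, 5.0),
    ModelResult("D", 0.80, 0.95, 1.0),
]
ranking = [m.name for m in rank(models)]
```

Speed only breaks ties here, so a fast model never outranks a more accurate one.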

Run Coverage

Completed runs are included when they have generated representative summary data. Small smoke tests are excluded from scoring, while the run table keeps scored coverage visible.

Run	Date	Profiles	Rows	Tasks	Metrics
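
The inclusion rule can be sketched as a filter over run records. The dictionary keys, the scored_runs name, and the min_tasks threshold used to identify a smoke test are all assumptions made for illustration; the page does not state how small a run must be to count as a smoke test.

```python
def scored_runs(runs, min_tasks=20):
    """Keep completed runs that produced summary data and are not smoke tests.

    Assumption: a run with fewer than `min_tasks` tasks is treated as a
    smoke test. The real cutoff is not given in the source.
    """
    return [
        run for run in runs
        if run.get("completed")
        and run.get("has_summary")
        and run.get("tasks", 0) >= min_tasks
    ]


# Hypothetical run records.
runs = [
    {"id": "run-1", "completed": True, "has_summary": True, "tasks": 164},
    {"id": "run-2", "completed": True, "has_summary": True, "tasks": 5},
    {"id": "run-3", "completed": True, "has_summary": False, "tasks": 164},
    {"id": "run-4", "completed": False, "has_summary": True, "tasks": 164},
]
kept = [run["id"] for run in scored_runs(runs)]
```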

Method Notes

How to read the lab without mixing quality, speed, and run completeness into one opaque score.