Coding Ability vs Speed
The vertical axis is aggregate representative pass@1. The horizontal axis can use suite wall-clock generation speed for timed EvalPlus runs, or live decode throughput for runs that recorded server metrics.
Crown Citadel Coding Quality Lab
Representative coding benchmark pass@1 results for local coding models, paired with wall-clock generation speed and live llama.cpp metrics where the run captured them.
Top representative pass@1: n/a
Each completed suite is shown separately because a model can be strong on one task shape and weaker on another.
Wall-clock samples per minute come from available codegen logs. Live decode throughput appears only for runs that sampled llama.cpp metrics during active requests.
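As a rough illustration of the two speed measures, the sketch below derives samples per minute from a sample count and elapsed wall-clock time, and treats live decode throughput as missing when no llama.cpp metrics were sampled. All names here (`SpeedEvidence`, `samples_per_minute`, `decode_tokens_per_s`, `axis_value`) are hypothetical and are not the lab's actual schema; the tokens-per-second unit for live decode is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeedEvidence:
    # Hypothetical record; field names and units are assumptions.
    samples_per_minute: Optional[float]   # derived from codegen logs, if available
    decode_tokens_per_s: Optional[float]  # sampled server metrics, if the run captured them


def samples_per_minute(num_samples: int, elapsed_seconds: float) -> Optional[float]:
    """Wall-clock generation speed: completed samples per minute."""
    if num_samples <= 0 or elapsed_seconds <= 0:
        return None  # no usable timing evidence in the log
    return num_samples * 60.0 / elapsed_seconds


def axis_value(ev: SpeedEvidence, metric: str) -> Optional[float]:
    """Return the horizontal-axis value for the chosen speed metric.

    Returns None when the run has no evidence for that metric, so a
    missing measurement is never silently replaced by the other one.
    """
    if metric == "wall_clock":
        return ev.samples_per_minute
    if metric == "live_decode":
        return ev.decode_tokens_per_s
    raise ValueError(f"unknown speed metric: {metric!r}")
```

Keeping the two measures in separate fields, rather than merging them into one number, mirrors the page's point that partial speed evidence should stay distinguishable from full token telemetry.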
Sorted by aggregate representative pass@1, then base pass@1, then available generation speed. Coverage is kept visible so partial speed evidence is not mistaken for full token telemetry.
| Model | Strict Pass | Base | Suite Scores | Wall Clock | Live Decode | Run |
|---|---|---|---|---|---|---|
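
The sort order described for this table can be sketched as a three-part key. This is a minimal sketch under stated assumptions: the field names (`aggregate_pass1`, `base_pass1`, `wall_clock_spm`) are hypothetical, higher values are assumed to rank first for all three keys, and rows with no speed evidence are assumed to sort after rows that have it.

```python
from typing import Any, Dict, List, Tuple


def sort_key(row: Dict[str, Any]) -> Tuple[float, float, float]:
    """Key for descending sort: aggregate pass@1, then base pass@1, then speed."""
    speed = row.get("wall_clock_spm")
    # Assumption: missing speed evidence ranks below any measured speed.
    speed_for_sort = speed if speed is not None else float("-inf")
    return (row["aggregate_pass1"], row["base_pass1"], speed_for_sort)


def sort_leaderboard(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a new list ordered best-first; the input list is not modified."""
    return sorted(rows, key=sort_key, reverse=True)
```

Because Python compares tuples element by element, base pass@1 only breaks ties in aggregate pass@1, and speed only breaks ties in both.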
Completed runs are included when they have generated representative summary data. Small smoke tests are excluded from scoring, while the run table keeps scored coverage visible.
| Run | Date | Profiles | Rows | Tasks | Metrics |
|---|---|---|---|---|---|
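
The inclusion rule for runs can be read as a simple filter. The sketch below assumes each run carries three boolean flags; the `Run` class and its fields (`completed`, `has_summary`, `is_smoke_test`) are hypothetical, and how the lab actually identifies a smoke test is not stated here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    # Hypothetical run record; all field names are assumptions.
    name: str
    completed: bool
    has_summary: bool    # representative summary data was generated
    is_smoke_test: bool  # small sanity-check run, not meant for scoring


def scored_runs(runs: List[Run]) -> List[Run]:
    """Keep completed runs with summary data, dropping smoke tests."""
    return [
        r for r in runs
        if r.completed and r.has_summary and not r.is_smoke_test
    ]
```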
How to read the lab without mixing quality, speed, and run completeness into one opaque score.