Crown Citadel Coding Quality Lab

Model coding ability and speed

Representative coding benchmark pass@1 results for local coding models, paired with wall-clock generation speed and live llama.cpp metrics where the run captured them.

Coding suites • pass@1 • NixOS • Linux 7.0.1 • AMD Ryzen AI Max+

Coding Ability vs Speed

The vertical axis is aggregate representative pass@1. The horizontal axis can use suite wall-clock generation speed for timed EvalPlus runs, or live decode throughput for runs that recorded server metrics.

Suite Pass Rates

Each completed suite is shown separately because a model can be strong on one task shape and weaker on another.
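As a minimal sketch of why suites are reported separately, the snippet below computes a per-suite pass@1 from per-task outcomes. It assumes one generated sample per task, so pass@1 reduces to the fraction of tasks that passed. The record shape and the suite names are hypothetical, not the lab's actual data format.

```python
from collections import defaultdict


def suite_pass_rates(records):
    """Return {suite_name: pass@1} from (suite_name, passed) pairs.

    Assumption: one record per task and one sample per task, so pass@1
    is simply passed_tasks / total_tasks within each suite.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for suite, ok in records:
        total[suite] += 1
        if ok:
            passed[suite] += 1
    return {suite: passed[suite] / total[suite] for suite in total}


# Hypothetical outcomes: the same model does well on one suite and worse on another.
example = [
    ("suite_a", True),
    ("suite_a", True),
    ("suite_b", True),
    ("suite_b", False),
]
rates = suite_pass_rates(example)
```

Keeping the result keyed by suite, rather than averaging immediately, preserves the difference between task shapes that a single aggregate would hide.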

Speed Evidence

Wall-clock samples per minute come from available codegen logs. Live decode throughput appears only for runs that sampled llama.cpp metrics during active requests.
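The two speed measures can be sketched as follows. The function names and inputs are illustrative assumptions: the first derives samples per minute from a sample count and an elapsed wall-clock duration, and the second averages decode throughput readings, returning `None` when a run recorded no readings so that missing telemetry is never reported as zero.

```python
from typing import Optional, Sequence


def samples_per_minute(num_samples: int, elapsed_seconds: float) -> Optional[float]:
    """Wall-clock generation speed for a suite run (hypothetical helper)."""
    if elapsed_seconds <= 0:
        return None  # no usable timing in the log
    return num_samples * 60.0 / elapsed_seconds


def mean_decode_tps(tps_samples: Sequence[float]) -> Optional[float]:
    """Average decode tokens/s over readings taken during active requests.

    Assumption: the caller has already discarded readings taken while
    the server was idle.
    """
    if not tps_samples:
        return None  # run did not sample server metrics
    return sum(tps_samples) / len(tps_samples)
```

Returning `None` instead of `0.0` keeps "not measured" distinct from "measured and slow".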

Current Ranking

Sorted by aggregate representative pass@1, then base pass@1, then available generation speed. Coverage is kept visible so partial speed evidence is not mistaken for full token telemetry.
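The sort order described above can be sketched as a three-part key. This assumes higher values rank first for every criterion and that models without speed evidence sort after models that have it when the pass rates tie. The `ModelResult` fields are hypothetical names, not the lab's schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelResult:
    model: str
    aggregate_pass1: float            # aggregate representative pass@1
    base_pass1: float                 # base pass@1
    samples_per_min: Optional[float] = None  # None when no speed evidence exists


def ranking_key(result: ModelResult):
    # Negate values so that an ascending sort yields descending order.
    # Missing speed maps to -inf, which becomes +inf after negation and sorts last.
    speed = result.samples_per_min if result.samples_per_min is not None else float("-inf")
    return (-result.aggregate_pass1, -result.base_pass1, -speed)


def rank(results: List[ModelResult]) -> List[ModelResult]:
    return sorted(results, key=ranking_key)
```

Because speed is only the last tie-breaker, partial speed coverage cannot move a model above one with a higher pass rate.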

Columns: Model, Strict Pass, Base, Suite Scores, Wall Clock, Live Decode, Run

Run Coverage

Completed runs are included when they have generated representative summary data. Small smoke tests are excluded from scoring, while the run table keeps scored coverage visible.
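A minimal sketch of the inclusion rule, under stated assumptions: the source does not say how small a smoke test is, so the `min_tasks` threshold below is a hypothetical parameter with an arbitrary default, and the `Run` fields are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    run_id: str
    tasks: int          # number of tasks the run covered
    has_summary: bool   # True when representative summary data was generated


def scored_runs(runs: List[Run], min_tasks: int = 20) -> List[Run]:
    """Keep runs that produced summary data and are not small smoke tests.

    Assumption: a run counts as a smoke test when it covers fewer than
    `min_tasks` tasks; the default of 20 is arbitrary.
    """
    return [r for r in runs if r.has_summary and r.tasks >= min_tasks]
```

Runs filtered out here can still be listed in a coverage table; the filter only decides what contributes to scores.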

Columns: Run, Date, Profiles, Rows, Tasks, Metrics

Method Notes

How to read the lab without mixing quality, speed, and run completeness into one opaque score.