Crown Citadel Group Ciru Inference Lab llm.ciru.ai / tool-eval
Tool-Eval Bench 2.0.7 / Preserved Reports

Tool Eval Index

Pick any two published Tool-Eval runs to compare their full-suite scores, pass profiles, speed, weak sections, and per-section deltas side by side.

4 webpages / 69 scenarios each / 138-point scale
Score delta 0
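
For readers who want to see how a full-suite score on this scale might be built up, here is a minimal Python sketch. The page states only that each run has 69 scenarios on a 138-point scale, which works out to 2 points per scenario. The mapping of pass = 2, partial = 1, fail = 0 is an assumption, as are the function name `score_run` and the sample outcome lists, which are made up for illustration and are not results from any published run.

```python
from collections import Counter

# Assumed point values: the page only gives 69 scenarios and a 138-point scale.
POINTS = {"pass": 2, "partial": 1, "fail": 0}
SCENARIOS = 69
MAX_SCORE = SCENARIOS * POINTS["pass"]  # 138


def score_run(outcomes):
    """Return (score, outcome_mix) for a list of 'pass'/'partial'/'fail' labels."""
    if len(outcomes) != SCENARIOS:
        raise ValueError(f"expected {SCENARIOS} outcomes, got {len(outcomes)}")
    mix = Counter(outcomes)
    score = sum(POINTS[label] * count for label, count in mix.items())
    return score, mix


# Hypothetical runs, invented for this example only.
left = ["pass"] * 60 + ["partial"] * 5 + ["fail"] * 4
right = ["pass"] * 58 + ["partial"] * 9 + ["fail"] * 2

left_score, left_mix = score_run(left)
right_score, right_mix = score_run(right)
score_delta = right_score - left_score

print(f"left {left_score}/{MAX_SCORE}, right {right_score}/{MAX_SCORE}, delta {score_delta}")
```

Under these assumptions the two sample runs reach the same score with different outcome mixes, which is why a score delta of 0 does not by itself mean the runs behaved identically.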

Score And Runtime (chart: Left vs. Right)

Outcome Mix (chart: Pass / Partial / Fail)

Section Scores (chart: Left vs. Right)

Head To Head (table columns: Metric | Left | Right | Delta)
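
The head-to-head layout (metric, left value, right value, delta) can be sketched as follows. This is an illustration only: the helper name `head_to_head`, the section names, and the numbers are hypothetical, and the sign convention (delta = right minus left) is an assumption, since the page does not say which direction it uses.

```python
def head_to_head(left, right):
    """Build (metric, left, right, delta) rows for metrics present in both runs.

    Delta is computed as right - left (an assumed convention).
    """
    rows = []
    for metric in sorted(left.keys() & right.keys()):
        rows.append((metric, left[metric], right[metric], right[metric] - left[metric]))
    return rows


# Hypothetical per-section scores, not taken from any published run.
left_sections = {"tool selection": 18, "error recovery": 9}
right_sections = {"tool selection": 16, "error recovery": 12}

rows = head_to_head(left_sections, right_sections)
for metric, left_value, right_value, delta in rows:
    print(f"{metric:<16} {left_value:>5} {right_value:>5} {delta:>+6}")
```

Only metrics present in both runs are compared, so a section missing from one run is skipped rather than reported with a misleading delta.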

All Tool Eval Webpages

Select any two above