Crown Citadel Group Ciru Inference Lab llm.ciru.ai / tool-eval

StepFun Step 3.7 ROCmFP4 MTP Tool Eval

tool-eval-bench 2.0.7 / 69 scenarios
Full suite score: 88 (Good), 121 / 138 points
Full outcome: 56 / 9 / 4 (pass / partial / fail)
Median turn: 5.50 s (full suite)
Decode speed: 33.51 generated tok/s
Prompt speed: 147.36 prompt tok/s
Weakest section: 67% (Creative Composition)
Structured output: 100% (12 / 12 points)
Safety: 88% (1 critical warning)
Deployability: 70 (reported separately)
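The headline score can be reproduced from the outcome counts. The sketch below assumes a rule of 2 points per pass, 1 per partial, and 0 per fail; that rule is inferred from the reported totals (it matches 121 / 138) and is not documented in this report. The function name `suite_score` is hypothetical.

```python
# Assumed scoring rule, inferred from the reported totals rather than
# taken from tool-eval-bench documentation.
POINTS = {"pass": 2, "partial": 1, "fail": 0}


def suite_score(passed, partial, failed):
    """Return (earned points, possible points, rounded percent)."""
    scenarios = passed + partial + failed
    earned = POINTS["pass"] * passed + POINTS["partial"] * partial
    possible = POINTS["pass"] * scenarios
    return earned, possible, round(100 * earned / possible)


# Full-suite outcome from this run: 56 pass, 9 partial, 4 fail.
earned, possible, percent = suite_score(56, 9, 4)
print(earned, possible, percent)  # 121 138 88
```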

Section Scores, Weakest First

Section                  Focus                           Score   Points  Pass  Partial  Fail
N Creative Composition   cross-tool composition            67%    4 / 6     1        2     0
I Context & State        long state carry                  70%  14 / 20     5        4     1
C Multi-Step Chains      tool-chain completion             75%    6 / 8     3        0     1
L Toolset Scale          large tool inventories            75%    6 / 8     3        0     1
J Code Patterns          code-oriented tools               83%    5 / 6     2        1     0
M Autonomous Planning    goal decomposition                83%    5 / 6     2        1     0
K Safety & Boundaries    injection and constraints         88%  23 / 26    11        1     1
A Tool Selection         specialist matching              100%    6 / 6     3        0     0
B Parameter Precision    argument construction            100%    6 / 6     3        0     0
D Restraint & Refusal    unnecessary-call restraint       100%    6 / 6     3        0     0
E Error Recovery         tool/input recovery              100%    6 / 6     3        0     0
F Localization           locale constraints               100%    6 / 6     3        0     0
G Structured Reasoning   reasoning and synthesis          100%    6 / 6     3        0     0
H Instruction Following  format and tool-choice control   100%  10 / 10     5        0     0
O Structured Output      JSON schema compliance           100%  12 / 12     6        0     0
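As a cross-check on the section table, the sketch below recomputes each section's points from its pass / partial / fail tallies and sorts the sections weakest first. It assumes the same inferred rule of 2 points per pass and 1 per partial, and it breaks ties by section letter, which happens to reproduce the order shown here. The names `SECTIONS` and `rank_weakest_first` are illustrative, not part of the harness.

```python
# (letter, name, pass, partial, fail) as read from the section table.
SECTIONS = [
    ("A", "Tool Selection", 3, 0, 0),
    ("B", "Parameter Precision", 3, 0, 0),
    ("C", "Multi-Step Chains", 3, 0, 1),
    ("D", "Restraint & Refusal", 3, 0, 0),
    ("E", "Error Recovery", 3, 0, 0),
    ("F", "Localization", 3, 0, 0),
    ("G", "Structured Reasoning", 3, 0, 0),
    ("H", "Instruction Following", 5, 0, 0),
    ("I", "Context & State", 5, 4, 1),
    ("J", "Code Patterns", 2, 1, 0),
    ("K", "Safety & Boundaries", 11, 1, 1),
    ("L", "Toolset Scale", 3, 0, 1),
    ("M", "Autonomous Planning", 2, 1, 0),
    ("N", "Creative Composition", 1, 2, 0),
    ("O", "Structured Output", 6, 0, 0),
]


def rank_weakest_first(sections):
    """Return (ratio, letter, name, earned, possible) rows, lowest ratio first."""
    rows = []
    for letter, name, passed, partial, failed in sections:
        earned = 2 * passed + partial  # assumed: pass = 2, partial = 1
        possible = 2 * (passed + partial + failed)
        rows.append((earned / possible, letter, name, earned, possible))
    return sorted(rows)  # tuple sort: ratio first, then section letter


ranked = rank_weakest_first(SECTIONS)
for ratio, letter, name, earned, possible in ranked:
    print(f"{letter} {name:<22} {ratio:>4.0%}  {earned} / {possible}")

# The section totals should add up to the full-suite figures.
print(sum(r[3] for r in ranked), "/", sum(r[4] for r in ranked))
```

The per-section points sum to the suite total, which suggests the headline score is a plain points ratio rather than an average of section percentages.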

Run Details

Model: StepFun Step 3.7 Flash ROCmFP4 STRIX Lean Q4_0
Draft model: StepFun Step 3.7 Flash MTP Draft Q8_0
API model id: step37-rocmfp4-mtp-vulkan-64k-tool-eval-full-templatefix-toolobs
Backend: llama.cpp / Vulkan target + Vulkan draft-MTP
Runtime: 64K context, parallel=1, spec-draft-n-max=2, target/draft KV q8_0
Sampling: temperature=0, seed=42, thinking enabled
Run slug: 20260704T-step37-rocmfp4-mtp-vulkan-tool-eval-full-thinking-required-once
Token metrics: 36,013 prompt tokens, 32,009 generated tokens, 229,859 harness-reported total tokens
Artifacts: tool-eval-bench-full.json, token-metrics-summary.json, raw metrics and slot snapshots
Reference date: 2026-03-20 harness default
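For a rough sense of scale, the token counts can be combined with the reported prompt and decode speeds to estimate the compute time for the whole suite. This is only a back-of-the-envelope sketch: it assumes the two speeds are suite-wide averages, which the report does not state, and it ignores harness overhead and tool latency. The function name `estimate_seconds` is hypothetical.

```python
def estimate_seconds(prompt_tokens, generated_tokens, prompt_tps, decode_tps):
    """Estimate prefill and decode time, assuming constant average speeds."""
    prefill = prompt_tokens / prompt_tps
    decode = generated_tokens / decode_tps
    return prefill, decode


# Figures reported for this run.
PROMPT_TOKENS = 36_013
GENERATED_TOKENS = 32_009
PROMPT_TPS = 147.36  # prompt tok/s
DECODE_TPS = 33.51   # generated tok/s

prefill_s, decode_s = estimate_seconds(
    PROMPT_TOKENS, GENERATED_TOKENS, PROMPT_TPS, DECODE_TPS
)
total_min = (prefill_s + decode_s) / 60
print(f"prefill ~{prefill_s:.0f} s, decode ~{decode_s:.0f} s, total ~{total_min:.0f} min")
```

Under these assumptions, decoding dominates the estimate. The harness-reported total of 229,859 tokens is larger than the sum of prompt and generated tokens; the report does not say what else it counts, so it is left out of the estimate.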