Crown Citadel Research Report

How a read-only tool call became a self-reinforcing loop

A plain-English and evidence-backed summary of the Hermes skill_view looping incident, the replay sandbox, the low-level token probes, cross-model smell tests, and the model-improvement attempts that followed.

Executive Summary

Hermes got stuck in a loop where the assistant said it was about to update a skill, but the structured tool call repeatedly selected the read-only skill_view tool instead of a mutation tool such as skill_manage or patch. The loop was reproduced from persisted chat and tool state in a local replay sandbox.

The strongest finding is that the loop was a context-induced function-name attractor. Repeated historical <function=skill_view> traces and repeated large successful tool results biased the model toward continuing the current prompt with the same function name. Natural-language intent was still present, but at the tool-name token boundary it lost to the repeated action trace.

42 persisted messages were exported for the loop window, message ids 12080 through 12121.
20 skill_view tool results appeared in that window; 16 were full 10,774-character skill bodies.
0.930 approximate late-loop probability of the bad skill_view path in the quantized Crown v7 probe.
5 traces were enough to make _view dominant again in the historical marker ablation.

Main conclusion: this was not primarily a WhatsApp transport issue, database-write issue, prompt-cache issue, MTP issue, or simple turn-limit issue. Those factors can change frequency, but the root behavior is schema-grounded action selection collapsing under repeated in-context action traces.

What Happened

The user asked Hermes to update a skill so that all changes would be committed. Hermes correctly inspected the relevant skill and related files. After it had enough information, it repeatedly wrote assistant text like "Now I'll add the git commit requirement to the skill", but then emitted another skill_view call instead of the patch/update call.

Plain-language failure signature

  1. The user asks for a change.
  2. The assistant reads the skill with skill_view.
  3. The assistant says it will now update or patch the skill.
  4. The structured tool call is still another read-only skill_view.
  5. The model receives the same large skill text again, making the pattern easier to repeat.

Concrete loop excerpt

user: Please update the skill to reflect that all changes need to be committed now
assistant: Now I'll add the git commit requirement to the skill.
tool_call: skill_view(...)
tool_result: full retatrutide-tracking skill content

assistant: Now I'll add the git commit requirement to the skill.
tool_call: skill_view(...)
tool_result: full retatrutide-tracking skill content

user: You got stuck trying to add to the skill
assistant: You're right, I got stuck in a loop calling skill_view repeatedly.
           Let me just patch the skill directly.
tool_call: skill_view(...)

That last turn was especially diagnostic: even after the assistant explicitly named the error and said it would patch directly, the structured action again chose skill_view.
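This failure signature is simple to check at runtime: the prose announces a write, but the structured call picks a read. The sketch below is a minimal, hypothetical version. The READ_ONLY_TOOLS set and the verb list are assumptions, and keyword matching is only a crude stand-in for real intent detection.

```python
import re

# Hypothetical read-only set; a real check would derive it from the runtime tool schema.
READ_ONLY_TOOLS = {"skill_view"}

# Write-intent verbs: "patch", "update", "write", plus "add" and "edit" (assumed extras).
WRITE_INTENT = re.compile(r"\b(patch|update|write|add|edit)\b", re.IGNORECASE)


def plan_action_mismatch(assistant_text: str, tool_name: str) -> bool:
    """True when the prose announces a write but the structured call is read-only."""
    announces_write = WRITE_INTENT.search(assistant_text) is not None
    return announces_write and tool_name in READ_ONLY_TOOLS
```

With skill_view as the tool, this check flags both "Now I'll add the git commit requirement to the skill." and "Let me just patch the skill directly." from the excerpt. The report's recommendation also requires that there be no new information need, and this sketch does not model that condition.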

Evidence Base

The replay sandbox is local and isolated. It was built from read-only SQLite exports of Hermes state. The harness can replay the model request and replace real tool execution with fake local stubs, so experiments do not modify production skills, the production database, or production services.
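A stub layer like this can be sketched as a dispatcher that records every call and sends it to a local fake. This is an illustrative sketch, not the sandbox's actual harness. StubToolExecutor, the handler signatures, and the returned payloads are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class StubToolExecutor:
    """Routes tool calls to local fakes instead of production services (hypothetical)."""

    handlers: Dict[str, Callable[..., Any]]
    calls: List[Tuple[str, Dict[str, Any]]] = field(default_factory=list)

    def execute(self, name: str, arguments: Dict[str, Any]) -> Any:
        # Record every call so experiments can count read-only vs mutation actions.
        self.calls.append((name, dict(arguments)))
        handler = self.handlers.get(name)
        if handler is None:
            return {"error": f"no stub registered for {name}"}
        return handler(**arguments)


# Example stubs: reads return canned fixture text; writes are recorded, never applied.
executor = StubToolExecutor(
    handlers={
        "skill_view": lambda name: {"content": f"<fixture body for {name}>"},
        "skill_manage": lambda name, **changes: {"ok": True, "applied": False},
    }
)
```

Because every call is logged before dispatch, a replay can classify the chosen action (read-only or mutation) without any side effects.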

Primary fixture

Session 20260610_170626_2b75cab0, source whatsapp, loop window 12080..12121.

/home/crown/hermes-loop-replay-sandbox/fixtures/messages_12080_12121.json

Replay lab

The sandbox tests whether the same persisted chat state leads to another read-only call or to a mutation path.

/home/crown/hermes-loop-replay-sandbox

Tool schema

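The full WhatsApp runtime-filtered schema exposed 32 tools, not just skill_view and skill_manage. This mattered: replaying with the full schema made the failure easier to trigger than the two-tool replay did, and it introduced ambiguity between mutation routes such as patch and skill_manage.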

fixtures/whatsapp_tool_schemas.json

Low-level probes

The token probes rendered the prompt through llama.cpp and measured one-token probabilities at the exact function-name decision points.

tools/tool_name_logprob_probe.py

What the fixture showed

Quantity | Observed value | Why it matters
Persisted messages in loop window | 42 | Enough history to preserve the exact repeated action pattern.
Total skill_view tool results | 20 | The repeated read-only action was not an isolated mistake.
Full skill-body returns | 16 | Large repeated successful results made the prompt dominated by the same action/result pair.
Total skill_view result characters | 187,599 | The context was heavily polluted by repeated read results.
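These counts can be recomputed directly from the exported fixture. The sketch assumes a hypothetical message shape (role, tool_name, and content fields) that may not match the real export. It counts a result as a full skill body when its length equals the 10,774-character body size reported above.

```python
import json
from typing import Iterable, Mapping


def skill_view_stats(messages: Iterable[Mapping], full_body_len: int) -> dict:
    """Summarise skill_view tool results in an exported loop window.

    Assumes tool results have role == "tool", a "tool_name" field, and the
    result text in "content" (hypothetical field names).
    """
    results = [
        m for m in messages
        if m.get("role") == "tool" and m.get("tool_name") == "skill_view"
    ]
    lengths = [len(m.get("content") or "") for m in results]
    return {
        "skill_view_results": len(results),
        "full_body_returns": sum(1 for n in lengths if n == full_body_len),
        "total_chars": sum(lengths),
    }


def load_fixture(path: str) -> list:
    # The real export may wrap messages differently; adjust to its actual JSON layout.
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Running skill_view_stats(load_fixture("fixtures/messages_12080_12121.json"), full_body_len=10_774) should reproduce the table above only if the assumed field names match the real export.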

Mechanism: Why The Loop Happened

The key interface problem is that two opposite actions had the same first token in the tool name.

patch        -> [patch]
terminal     -> [terminal]
skill_view   -> [skill, _view]
skill_manage -> [skill, _manage]

This means the model first chooses the broad skill family. Only after that does it choose between read-only _view and write-capable _manage. Once the model has generated skill, the history of repeated skill_view calls exerts strong pressure on the next token.
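This collision can be caught with an interface lint. Group the tools by their first token, then flag any group that mixes read-only and write-capable tools. The splits for the original names are copied from the probe output above. The action-first splits are assumptions and should be confirmed with the serving model's tokenizer.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Splits reported by the probe for the original names.
ORIGINAL = {
    "patch": ["patch"],
    "terminal": ["terminal"],
    "skill_view": ["skill", "_view"],
    "skill_manage": ["skill", "_manage"],
}
# Hypothetical splits for action-first names; verify with the real tokenizer.
ACTION_FIRST = {
    "patch": ["patch"],
    "terminal": ["terminal"],
    "view_skill": ["view", "_skill"],
    "patch_skill": ["patch", "_skill"],
}


def read_write_collisions(
    splits: Dict[str, List[str]], read_only: Set[str]
) -> Dict[str, List[str]]:
    """Return first tokens shared by at least one read-only and one write-capable tool."""
    by_first: Dict[str, List[str]] = defaultdict(list)
    for tool, tokens in splits.items():
        by_first[tokens[0]].append(tool)
    return {
        first: sorted(tools)
        for first, tools in by_first.items()
        if any(t in read_only for t in tools) and any(t not in read_only for t in tools)
    }
```

With the original names, the lint flags the shared skill token. With the assumed action-first splits it flags nothing, which fits the action-first naming experiment.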

The model was not simply choosing from the current schema according to current intent. It was doing next-token continuation over the whole transcript. When historical skill_view markers were rewritten to the fake non-schema name inspect_skill, the model emitted inspect with probability 0.9399.

Context pollution pathway

  1. skill_view is safe and easy to call.
  2. The tool returns useful content, so the transcript marks it as successful.
  3. The assistant says it will now mutate, but the next tool call repeats the previous function name.
  4. The repeated call returns another large copy of the same skill content.
  5. The prompt becomes a stronger induction target for yet another skill_view.
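Step 4 is where the pollution compounds. One way to interrupt it, following the compaction recommendation later in this report, is to keep only the latest copy of each identical call/result pair and replace the earlier copies with a single state fact. This sketch works on a simplified, hypothetical ToolEvent list. A real implementation must also preserve whatever call/result pairing the chat template requires.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class ToolEvent:
    """A call/result pair; a simplified, hypothetical view of the transcript."""

    name: str
    args: str
    result: str


def compact_identical_events(events: List[ToolEvent]) -> List[Union[str, ToolEvent]]:
    """Keep only the latest copy of each identical event, preceded by a summary fact."""
    counts = {}
    for e in events:
        counts[e] = counts.get(e, 0) + 1

    remaining = dict(counts)
    out: List[Union[str, ToolEvent]] = []
    for e in events:
        remaining[e] -= 1
        if remaining[e] > 0:
            continue  # a later identical copy exists, so drop this one
        if counts[e] > 1:
            out.append(f"{e.name}({e.args}) was called {counts[e]} times; latest result unchanged.")
        out.append(e)
    return out
```

Because the dataclass is frozen, identical events hash equal and can serve directly as dictionary keys.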

What We Tried

The work progressed from whole-turn replay to token-level diagnosis and then to model-side mitigation attempts. The table below separates what was tested from what was observed.

Experiment | Purpose | Observed result | Interpretation
Two-tool replay | Expose only skill_view and skill_manage. | Mostly chose skill_manage; occasional mismatch. | Under-fit the real environment, because production exposed many more tools.
Full WhatsApp schema replay | Use the 32-tool runtime-filtered, production-like schema. | At cut 12090, 2/8 mismatches; first tools included skill_view, patch, and skill_manage. | The real schema made the failure easier to trigger and introduced mutation-route ambiguity.
Isolated MTP on/off server runs | Check whether speculative MTP was the root cause. | Expanded full-tool sample: MTP on, 5/20 mismatches; MTP off, 2/20 mismatches. | MTP can amplify marginal decisions, but it is not required.
Prompt-cache tests | Check whether cache reuse caused the loop. | Mismatches occurred both with the cache disabled and with the cache reused. | Prompt caching is not necessary for the failure, so it cannot be the root cause, though it may affect frequency.
Low-level logprob probes | Measure probabilities at <function= and <function=skill. | At late cut 12107, the combined bad-path probability was about 0.930. | The bad path became the near-argmax choice at the tool-name level.
Schema ablation: remove skill_view | Test whether hiding the tool fixes the raw prompt tendency. | With history unchanged, _view still had probability 0.908 after skill. | Schema visibility alone is not enough without constrained decoding.
Historical marker rewrite | Rewrite old skill_view markers to inspect_skill. | The model copied inspect with probability 0.9399. | Repeated historical function names were causal copy targets.
Action-first naming | Rename skill_view/skill_manage to view_skill/patch_skill. | The first tool token became patch with probability 1.0000 at both early and late cuts. | Tool-name geometry is a strong interface lever.

Probability Findings

Early versus late loop

At the start, the model was still recoverable. After repeated read-only calls, the bad branch became dominant.

Cut | Context state | P(skill_manage) | P(skill_view) | P(patch)
12090 | First repeated decision point | 0.609 | 0.202 | 0.189
12107 | After repeated read-only calls and one user correction | 0.055 | 0.930 | 0.014

Bad-path (skill_view) probability: 0.202 early, 0.930 late.

How hard it became to break

The cleanest causal curve came from rewriting historical skill_view markers to inspect_skill, while keeping only the latest K exact skill_view markers. The probe then forced <function=skill and measured whether the next token became _view or _manage.

Exact historical skill_view markers kept | P(_view) | P(_manage) | Meaning
0 | 0.0000 | 1.0000 | Bad branch removed; model chooses mutation continuation.
1 | 0.4089 | 0.4821 | One exact trace already makes the bad branch competitive.
5 | 0.9842 | 0.0158 | Five recent traces make the bad branch dominant.
13 (baseline) | 0.9441 | 0.0559 | Full loop history remains strongly captured by read-only continuation.
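The ablation itself is simple string surgery. This sketch assumes the transcript is available as a list of rendered message strings that contain literal <function=skill_view> markers. keep_latest_markers is a hypothetical helper, not the probe's actual code.

```python
from typing import List

MARKER = "<function=skill_view>"
REPLACEMENT = "<function=inspect_skill>"


def keep_latest_markers(messages: List[str], k: int) -> List[str]:
    """Rewrite all but the latest k skill_view markers to inspect_skill."""
    total = sum(m.count(MARKER) for m in messages)
    to_rewrite = max(total - k, 0)  # the oldest markers are rewritten first
    out = []
    for m in messages:
        parts = m.split(MARKER)
        rebuilt = parts[0]
        for part in parts[1:]:
            if to_rewrite > 0:
                rebuilt += REPLACEMENT + part
                to_rewrite -= 1
            else:
                rebuilt += MARKER + part
        out.append(rebuilt)
    return out
```

Sweeping k over 0, 1, 5, and the full count before each forced <function=skill probe produces the kind of curve shown in the table.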

Tokens required to break the loop

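In the original naming scheme, the model had two practical exits:

  1. Choose a non-skill first token, such as patch, right after <function=.
  2. Emit skill and then continue with _manage rather than _view.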

The problem was that by cut 12107, skill was near-certain and _view was the dominant continuation. Action-first names changed the first decision to view versus patch, which moved the late-loop first-token probability to patch=1.0000 in the prompt-level simulation.
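The bad path and the two exits split the first-tool probability mass among themselves, so the late-loop numbers follow from two conditional probabilities. The following is a minimal sketch. It assumes that every non-skill first token counts as an exit, and that every non-_view continuation after skill (chiefly _manage) counts as the second exit.

```python
def path_probabilities(p_skill: float, p_view_given_skill: float) -> dict:
    """Split first-tool probability mass into the bad path and the two exits."""
    return {
        # [skill, _view]: the read-only loop continuation
        "bad_skill_view": p_skill * p_view_given_skill,
        # exit 1: any non-skill first token, e.g. patch
        "exit_non_skill_first_token": 1.0 - p_skill,
        # exit 2: skill followed by anything other than _view, chiefly _manage
        "exit_after_skill": p_skill * (1.0 - p_view_given_skill),
    }
```

The cross-model table below gives approximate Crown v7 values of P(skill) ≈ 0.9856 and P(_view | skill) ≈ 0.9441. With these inputs the sketch gives roughly 0.930 for the bad path, 0.014 for a non-skill first token, and 0.055 for skill followed by another continuation, in line with the cut-12107 row above.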

Cross-Model Observations

The behavior was not limited to one quantized model. Different models and serving profiles showed different margins, but the same risk from repeated historical markers appeared across them.

Model or run | P(skill) | P(_view given skill) | P_bad (product of the two) | Replay / interpretation
BF16 source target-only | 0.990897 | 0.781368 | 0.774 | Clean repeat on sampled seeds; quantization not required.
Quantized Crown v7 prior | ~0.9856 | ~0.9441 | ~0.930 | Clean repeat in prior replay.
Chadrock Saber ROCmFP4/Vulkan d2 | 0.991836 | 0.892114 | 0.885 | 1/4 poisoned seeds repeated.
Qwopus 27B Chadrock STRIX_LEAN | 0.984066 | 0.913365 | 0.899 | 3/4 poisoned seeds repeated across three steps.
StepFun exact profile | 0.655835 | 0.795741 | 0.521874 | Lower first-token risk, but still a high bad-path score.
LFM2.5 8B-A1B Q8 | 0.347585 (skill-like) | 1.000000 | 0.347585 | Often chose write/no-tool in open replay, but forced skill completed as _view in 8/8 seeds.

A low open-replay failure rate is not proof of safety. LFM sometimes avoided tools entirely, but once the tool-name path entered the historical skill attractor, it still collapsed to read-only skill_view.

Model-Improvement Attempts

After the sandbox identified the action-token failure, follow-up work tried small LoRA-style interventions on the Crown v8 path. These were useful diagnostically, but no adapter was promoted as a clean fix.

What worked partially

The best diagnostic adapter, crown-v8-tail-l39-sixprompt-plus-suffix-s128-44step-lr2e-5, passed the high-signal loop gate. It moved high-risk mutation cases toward skill_manage and drove the measured high-signal loop metrics to zero.

Candidate | Positive result | Why it was not promotable
sixprompt-plus-suffix | All four mutation controls passed as skill_manage; high-signal loop gate passed. | Read/direct controls regressed; read prompts were also pulled toward _manage.
Layer 38:40 suffix-context continuation | Validated that wider late-layer coverage was mechanically possible. | Preserved the long-context loop fix but did not repair read-only behavior and damaged ordinary mutation retention.
Branch-preference adapter | Corrected the immediate first branch token in the forced probe. | Free continuation degenerated into repeated skill tokens; first-token success gave a false sense of progress.

Lessons from rejected adapters

Recommendations

Interface and runtime mitigations

  1. Rename antagonistic tools so read/write intent competes on the first token. Prefer view_skill and patch_skill over skill_view and skill_manage.
  2. Add strict tool-name constrained decoding or a hard grammar/logit mask over currently valid tool names. Schema changes should be enforced during generation, not only parsed afterward.
  3. Compact repeated identical tool-call traces. Replace many full skill_view calls with a semantic state fact such as "skill_view(retatrutide-tracking) was called 13 times; latest result unchanged."
  4. Detect plan/action mismatch. If assistant prose says "patch", "update", or "write" and the tool is read-only with no new information need, block or ask for a different action.
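Recommendation 2 can be sketched as a prefix mask over the token splits of the tools that are valid right now. After each emitted token, the mask keeps selectable only those tokens that leave the partial name a prefix of some valid tool. The splits and logits here are hypothetical. In a llama.cpp deployment, the same effect would more likely come from a grammar over the valid names than from hand-written Python.

```python
from typing import Dict, Sequence, Set


def allowed_next_tokens(
    valid_tool_splits: Dict[str, Sequence[str]], emitted: Sequence[str]
) -> Set[str]:
    """Tokens that keep the partial tool name a prefix of some currently valid tool.

    Ignores how a name is terminated once complete; that needs its own rule.
    """
    n = len(emitted)
    allowed: Set[str] = set()
    for tokens in valid_tool_splits.values():
        if len(tokens) > n and list(tokens[:n]) == list(emitted):
            allowed.add(tokens[n])
    return allowed


def apply_mask(logits: Dict[str, float], allowed: Set[str]) -> Dict[str, float]:
    # Disallowed tokens get -inf so neither sampling nor argmax can pick them.
    return {tok: (v if tok in allowed else float("-inf")) for tok, v in logits.items()}
```

Consider the schema-ablation scenario, where skill_view is removed and the history is unchanged. The mask leaves _manage as the only legal continuation after skill, whatever probability the model assigns to _view.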

Model training and evaluation

  1. Build an adversarial tool-action eval around repeated read-only histories, marker counts, current user intent, assistant plan text, and expected tool class.
  2. Include SFT and preference examples where the same assistant prose is paired with a chosen mutation tool and a rejected repeated read-only tool.
  3. Score P(_manage | <function=skill) versus P(_view | <function=skill) across marker counts 0, 1, 2, 5, and 10+.
  4. Treat first-token probes as diagnostic gates, not final acceptance tests. Full continuation quality must pass before a model or adapter is promoted.
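Items 3 and 4 can be combined into a small harness. It sweeps the marker count and records the margin P(_manage) - P(_view) after a forced <function=skill. It then refuses promotion unless full-continuation checks and read/direct controls also pass. The probe callable and both boolean inputs are hypothetical placeholders for whatever the evaluation actually runs.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical probe: marker count -> (P(_view | <function=skill), P(_manage | <function=skill)).
Probe = Callable[[int], Tuple[float, float]]

MARKER_COUNTS = (0, 1, 2, 5, 10)  # "10+" is represented by 10 in this sketch


def marker_sweep(probe: Probe, counts: Iterable[int] = MARKER_COUNTS) -> Dict[int, float]:
    """Margin P(_manage) - P(_view) per marker count; positive means mutation wins."""
    margins = {}
    for k in counts:
        p_view, p_manage = probe(k)
        margins[k] = p_manage - p_view
    return margins


def promotable(margins: Dict[int, float], continuation_pass: bool, read_controls_pass: bool) -> bool:
    """First-token margins are only a diagnostic gate; promotion needs all three checks."""
    first_token_gate = all(m > 0 for m in margins.values())
    return first_token_gate and continuation_pass and read_controls_pass
```

With the ablation values for 0, 1, and 5 markers, the margins are 1.0, about 0.073, and about -0.968, so the first-token gate fails. An adapter that clears the first-token gate, as the branch-preference adapter did in the forced probe, would still be blocked if free continuation fails.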

What not to rely on

Limitations

Artifact Map

This report is derived from local artifacts and notes. Key files:

Artifact | Path
Replay sandbox README | /home/crown/hermes-loop-replay-sandbox/README.md
Loop message fixture | /home/crown/hermes-loop-replay-sandbox/fixtures/messages_12080_12121.json
Initial findings | /home/crown/hermes-loop-replay-sandbox/research/INITIAL_FINDINGS_2026-06-10.md
Low-level probing report | /home/crown/hermes-loop-replay-sandbox/research/LOW_LEVEL_PROBING_2026-06-11.md
Model-improvement guidance | /home/crown/hermes-loop-replay-sandbox/research/MODEL_IMPROVEMENT_GUIDANCE_2026-06-11.md
Experiment index | /home/crown/hermes-loop-replay-sandbox/research/EXPERIMENT_INDEX.md
BF16 source repro notes | /home/crown/hermes-loop-replay-sandbox/experiments/bf16_source_live/20260611T085112Z/NOTES.md
Qwopus 27B smell test | /home/crown/hermes-loop-replay-sandbox/experiments/model_smell_tests/qwopus3.6-27b-v2-chadrock-strix-lean-mtp/20260611T114120Z/NOTES.md
LFM smell test | /home/crown/hermes-loop-replay-sandbox/experiments/model_smell_tests/lfm25-8b-a1b-q8/20260611T165725Z_no_mmap_tool_test/NOTES.md
Crown v8 adapter status | /srv/desktop-data/cirudata/research/hermes-tool-loop/crown_v8_tool_loop_eval/STATUS.md
Crown model-improvement path report | /srv/desktop-data/cirudata/research/hermes-tool-loop/crown_v8_tool_loop_eval/reports/crown_model_improvement_path_2026-06-11.md