Executive Summary
Hermes got stuck in a loop where the assistant said it was about to update
a skill, but the structured tool call repeatedly selected the read-only
skill_view tool instead of a mutation tool such as
skill_manage or patch. The loop was reproduced
from persisted chat and tool state in a local replay sandbox.
The strongest finding is that the loop was a context-induced function-name
attractor. Repeated historical <function=skill_view>
traces and repeated large successful tool results taught the current prompt
to continue with the same function name. Natural-language intent was still
present, but at the tool-name token boundary it lost to the repeated action
trace.
skill_view tool results appeared in that window; 16 were full 10,774-character skill bodies.
skill_view path in the quantized Crown v7 probe.
_view dominant again in the historical marker ablation.
Main conclusion: this was not primarily a WhatsApp transport issue, database-write issue, prompt-cache issue, MTP issue, or simple turn-limit issue. Those factors can change frequency, but the root behavior is schema-grounded action selection collapsing under repeated in-context action traces.
What Happened
The user asked Hermes to update a skill so that all changes would be
committed. Hermes correctly inspected the relevant skill and related files.
After it had enough information, it repeatedly wrote assistant text like
"Now I'll add the git commit requirement to the skill", but then
emitted another skill_view call instead of the patch/update call.
Plain-language failure signature
- The user asks for a change.
- The assistant reads the skill with
skill_view. - The assistant says it will now update or patch the skill.
- The structured tool call is still another read-only
skill_view. - The model receives the same large skill text again, making the pattern easier to repeat.
Concrete loop excerpt
user: Please update the skill to reflect that all changes need to be committed now
assistant: Now I'll add the git commit requirement to the skill.
tool_call: skill_view(...)
tool_result: full retatrutide-tracking skill content
assistant: Now I'll add the git commit requirement to the skill.
tool_call: skill_view(...)
tool_result: full retatrutide-tracking skill content
user: You got stuck trying to add to the skill
assistant: You're right, I got stuck in a loop calling skill_view repeatedly.
Let me just patch the skill directly.
tool_call: skill_view(...)
That last turn was especially diagnostic: even after the assistant explicitly
named the error and said it would patch directly, the structured action again
chose skill_view.
Evidence Base
The replay sandbox is local and isolated. It was built from read-only SQLite exports against Hermes state. The harness can replay the model request and replace real tool execution with fake local stubs, so experiments do not modify production skills, the production database, or production services.
Primary fixture
Session 20260610_170626_2b75cab0, source
whatsapp, loop window 12080..12121.
/home/crown/hermes-loop-replay-sandbox/fixtures/messages_12080_12121.json
Replay lab
The sandbox asks whether the same persisted chat state chooses another read-only call or a mutation path.
/home/crown/hermes-loop-replay-sandbox
Tool schema
The full WhatsApp runtime-filtered schema exposed 32 tools, not just
skill_view and skill_manage. This mattered.
fixtures/whatsapp_tool_schemas.json
Low-level probes
The token probes rendered the prompt through llama.cpp and measured one-token probabilities at the exact function-name decision points.
tools/tool_name_logprob_probe.py
What the fixture showed
| Quantity | Observed value | Why it matters |
|---|---|---|
| Persisted messages in loop window | 42 | Enough history to preserve the exact repeated action pattern. |
Total skill_view tool results |
20 | The repeated read-only action was not an isolated mistake. |
| Full skill-body returns | 16 | Large repeated successful results made the prompt dominated by the same action/result pair. |
Total skill_view result characters |
187,599 | The context was heavily polluted by repeated read results. |
Mechanism: Why The Loop Happened
The key interface problem is that two opposite actions had the same first token in the tool name.
patch -> [patch] terminal -> [terminal] skill_view -> [skill, _view] skill_manage -> [skill, _manage]
This means the model first chooses the broad skill family.
Only after that does it choose between read-only _view and
write-capable _manage. Once the model has generated
skill, the history of repeated skill_view calls
exerts strong pressure on the next token.
The model was not simply choosing from the current schema according to
current intent. It was doing next-token continuation over the whole
transcript. When historical skill_view markers were rewritten
to the fake non-schema name inspect_skill, the model emitted
inspect with probability 0.9399.
Context pollution pathway
skill_viewis safe and easy to call.- The tool returns useful content, so the transcript marks it as successful.
- The assistant says it will now mutate, but the next tool call repeats the previous function name.
- The repeated call returns another large copy of the same skill content.
- The prompt becomes a stronger induction target for yet another
skill_view.
What We Tried
The work progressed from whole-turn replay to token-level diagnosis and then to model-side mitigation attempts. The table below separates what was tested from what was observed.
| Experiment | Purpose | Observed result | Interpretation |
|---|---|---|---|
| Two-tool replay | Expose only skill_view and skill_manage. |
Mostly chose skill_manage; occasional mismatch. |
Under-fit the real environment because production had many visible tools. |
| Full WhatsApp schema replay | Use the 32-tool runtime-filtered production-like schema. | At cut 12090, 2/8 mismatches; first tools included skill_view, patch, and skill_manage. |
The real schema made the failure easier to trigger and introduced mutation-route ambiguity. |
| Isolated MTP on/off server runs | Check whether speculative MTP was the root cause. | Expanded full-tool sample: MTP on 5/20 mismatch; MTP off 2/20 mismatch. | MTP can amplify marginal decisions, but it is not required. |
| Prompt-cache tests | Check whether cache reuse caused the loop. | Mismatches occurred with cache disabled and with cache reused. | Prompt cache is not sufficient as a root cause, though it may affect frequency. |
| Low-level logprob probes | Measure probabilities at <function= and <function=skill. |
At late cut 12107, combined bad path was about 0.930. | The loop became near-argmax at the tool-name level. |
Schema ablation: remove skill_view |
Test whether hiding the tool fixes the raw prompt tendency. | With history unchanged, _view still had 0.908 probability after skill. |
Schema visibility alone is not enough without constrained decoding. |
| Historical marker rewrite | Rewrite old skill_view markers to inspect_skill. |
The model copied inspect with probability 0.9399. |
Repeated historical function names were causal copy targets. |
| Action-first naming | Rename skill_view/skill_manage to view_skill/patch_skill. |
First tool token became patch with probability 1.0000 at both early and late cuts. |
Tool-name geometry is a strong interface lever. |
Probability Findings
Early versus late loop
At the start, the model was still recoverable. After repeated read-only calls, the bad branch became dominant.
| Cut | Context state | skill_manage |
skill_view |
patch |
|---|---|---|---|---|
12090 |
First repeated decision point | 0.609 | 0.202 | 0.189 |
12107 |
After repeated read-only calls and one user correction | 0.055 | 0.930 | 0.014 |
How hard it became to break
The cleanest causal curve came from rewriting historical
skill_view markers to inspect_skill, while keeping
only the latest K exact skill_view markers. The probe then
forced <function=skill and measured whether the next token
became _view or _manage.
Exact historical skill_view markers kept |
P(_view) |
P(_manage) |
Meaning |
|---|---|---|---|
| 0 | 0.0000 | 1.0000 | Bad branch removed; model chooses mutation continuation. |
| 1 | 0.4089 | 0.4821 | One exact trace already makes the bad branch competitive. |
| 5 | 0.9842 | 0.0158 | Five recent traces make the bad branch dominant. |
| 13 / baseline | 0.9441 | 0.0559 | Full loop history remains strongly captured by read-only continuation. |
Tokens required to break the loop
In the original naming scheme, the model had two practical exits:
- Choose
patchimmediately after<function=. - If it already chose
skill, choose_manageinstead of_view.
The problem was that by cut 12107, skill was
near-certain and _view was the dominant continuation. Action-first
names changed the first decision to view versus patch,
which moved the late-loop first-token probability to patch=1.0000
in the prompt-level simulation.
Cross-Model Observations
The behavior was not limited to one quantized model. Different models and serving profiles showed different margins, but the same basic repeated historical marker risk appeared repeatedly.
| Model or run | P(skill) | P(_view | skill) | P_bad | Replay / interpretation |
|---|---|---|---|---|
| BF16 source target-only | 0.990897 | 0.781368 | 0.774 | Clean repeat on sampled seeds; quantization not required. |
| Quantized Crown v7 prior | ~0.9856 | ~0.9441 | ~0.930 | Clean repeat in prior replay. |
| Chadrock Saber ROCmFP4/Vulkan d2 | 0.991836 | 0.892114 | 0.885 | 1/4 poisoned seeds repeated. |
| Qwopus 27B Chadrock STRIX_LEAN | 0.984066 | 0.913365 | 0.899 | 3/4 poisoned seeds repeated across three steps. |
| StepFun exact profile | 0.655835 | 0.795741 | 0.521874 | Lower first-token risk, but still high bad-path score. |
| LFM2.5 8B-A1B Q8 | 0.347585 skill-like | 1.000000 | 0.347585 | Often chose write/no-tool in open replay, but forced skill completed as _view in 8/8 seeds. |
A low open-replay failure rate is not proof of safety. LFM sometimes
avoided tools entirely, but once the tool-name path entered the historical
skill attractor, it still collapsed to read-only
skill_view.
Model-Improvement Attempts
After the sandbox identified the action-token failure, follow-up work tried small LoRA-style interventions on the Crown v8 path. These were useful diagnostically, but no adapter was promoted as a clean fix.
What worked partially
The best diagnostic adapter,
crown-v8-tail-l39-sixprompt-plus-suffix-s128-44step-lr2e-5,
passed the high-signal loop gate. It moved high-risk mutation cases toward
skill_manage and drove the measured high-signal loop metrics to
zero.
| Candidate | Positive result | Why it was not promotable |
|---|---|---|
sixprompt-plus-suffix |
All four mutation controls passed as skill_manage; high-signal loop gate passed. |
Read/direct controls regressed; read prompts also got pulled toward _manage. |
| Layer 38:40 suffix-context continuation | Validated that wider late-layer coverage was mechanically possible. | Preserved long-context loop fix but did not repair read-only behavior and damaged ordinary mutation retention. |
| Branch-preference adapter | Corrected the immediate first branch token in the forced probe. | Free continuation degenerated into repeated skill tokens; first-token success was a false sense of progress. |
Lessons from rejected adapters
- First-token branch probes and full free generation must both pass.
- Average training loss or forced margin success can hide continuation collapse.
- Positive examples that emphasize generic
skillcan make the shared-prefix boundary worse. - A real model fix needs context-sensitive behavior: read prompts should allow
_view, mutation prompts should prefer_manage. - The training objective must score complete continuations, clean tool-call completion, and anti-repetition, not only the first branch token.
Recommendations
Interface and runtime mitigations
-
Rename antagonistic tools so read/write intent competes on the first
token. Prefer
view_skillandpatch_skilloverskill_viewandskill_manage. - Add strict tool-name constrained decoding or a hard grammar/logit mask over currently valid tool names. Schema changes should be enforced during generation, not only parsed afterward.
-
Compact repeated identical tool-call traces. Replace many full
skill_viewcalls with a semantic state fact such asskill_view(retatrutide-tracking) was called 13 times; latest result unchanged. - Detect plan/action mismatch. If assistant prose says "patch", "update", or "write" and the tool is read-only with no new information need, block or ask for a different action.
Model training and evaluation
- Build an adversarial tool-action eval around repeated read-only histories, marker counts, current user intent, assistant plan text, and expected tool class.
- Include SFT and preference examples where the same assistant prose is paired with a chosen mutation tool and a rejected repeated read-only tool.
-
Score
P(_manage | <function=skill)versusP(_view | <function=skill)across marker counts 0, 1, 2, 5, and 10+. - Treat first-token probes as diagnostic gates, not final acceptance tests. Full continuation quality must pass before a model or adapter is promoted.
What not to rely on
- Do not rely on temperature or top-p changes once the bad branch is near argmax.
- Do not treat MTP, KV type, or GGUF quantization as the primary root cause.
- Do not hide a stale tool from the schema unless decoding is constrained to valid current tools.
- Do not accept an adapter just because it passes a first-token branch probe.
Limitations
- The original raw Jun 10 request dumps or stream traces were not found, so the replay uses durable persisted state: system prompt, model config, assistant tool calls, tool ids, and tool results from SQLite.
- Several marker-count ablations are one sample per K. The threshold shape is strong, but exact non-monotonic differences such as 5 versus 13 traces should not be over-interpreted.
- Replay is seed-sensitive. Low-level logprob probes are therefore more useful for causal diagnosis, while full replay remains necessary for acceptance.
- Some later invalid eval directories contained connection errors after a prompt-cache crash and should not be treated as model-quality results.
Artifact Map
This report is derived from local artifacts and notes. Key files:
| Artifact | Path |
|---|---|
| Replay sandbox README | /home/crown/hermes-loop-replay-sandbox/README.md |
| Loop message fixture | /home/crown/hermes-loop-replay-sandbox/fixtures/messages_12080_12121.json |
| Initial findings | /home/crown/hermes-loop-replay-sandbox/research/INITIAL_FINDINGS_2026-06-10.md |
| Low-level probing report | /home/crown/hermes-loop-replay-sandbox/research/LOW_LEVEL_PROBING_2026-06-11.md |
| Model-improvement guidance | /home/crown/hermes-loop-replay-sandbox/research/MODEL_IMPROVEMENT_GUIDANCE_2026-06-11.md |
| Experiment index | /home/crown/hermes-loop-replay-sandbox/research/EXPERIMENT_INDEX.md |
| BF16 source repro notes | /home/crown/hermes-loop-replay-sandbox/experiments/bf16_source_live/20260611T085112Z/NOTES.md |
| Qwopus 27B smell test | /home/crown/hermes-loop-replay-sandbox/experiments/model_smell_tests/qwopus3.6-27b-v2-chadrock-strix-lean-mtp/20260611T114120Z/NOTES.md |
| LFM smell test | /home/crown/hermes-loop-replay-sandbox/experiments/model_smell_tests/lfm25-8b-a1b-q8/20260611T165725Z_no_mmap_tool_test/NOTES.md |
| Crown v8 adapter status | /srv/desktop-data/cirudata/research/hermes-tool-loop/crown_v8_tool_loop_eval/STATUS.md |
| Crown model-improvement path report | /srv/desktop-data/cirudata/research/hermes-tool-loop/crown_v8_tool_loop_eval/reports/crown_model_improvement_path_2026-06-11.md |
