
Teknium 38d3c49aaf refactor(skills): clean up bundled skill set + add environments: relevance gate (#39028)

* refactor(skills): clean up bundled skill set + add environments: relevance gate

Bundled skills cleanup pass plus a new offer-time relevance gate.

Removals (redundant / dead):
- spotify (covered by the spotify plugin's 7 native tools)
- linear (covered by `hermes mcp install linear`)
- kanban-codex-lane, debugging-hermes-tui-commands
- empty category markers: diagramming, gifs, inference-sh,
  mlops/training, mlops/vector-databases
- domain (stale orphan dup of optional/research/domain-intel)

Bundled -> optional:
- baoyu-article-illustrator, baoyu-comic, creative-ideation, pixel-art
- dspy, subagent-driven-development
- minecraft-modpack-server, pokemon-player
- hermes-s6-container-supervision (-> optional/devops)

Consolidation:
- webhook-subscriptions + native-mcp folded into the hermes-agent skill
  as references/webhooks.md + references/native-mcp.md with SKILL.md pointers
- writing-plans merged into plan (v2.0.0); related_skills + prose refs updated

New: environments: frontmatter gate (agent/skill_utils.skill_matches_environment)
- Offer-time relevance filter (kanban / docker / s6), parallel to platforms:.
- Wired into the 3 OFFER surfaces only (prompt_builder skills index,
  skills_tool.list_skills, skill_commands slash discovery).
- Explicit loads (skill_view, --skills preload) intentionally BYPASS it, so
  load-bearing force-loads like the kanban dispatcher's `--skills kanban-worker`
  always resolve. Verified via E2E.
- kanban-orchestrator/kanban-worker tagged environments: [kanban];
  hermes-s6-container-supervision tagged environments: [s6] + platforms: [linux].

Validation: 8/8 E2E gating assertions (incl force-load invariant);
442 targeted tests green (agent, skills_tool, skill_commands, kanban worker).

* docs: regenerate skill catalogs + pages for the bundled cleanup

Regenerated per-skill doc pages, catalogs, and sidebar to match the skill
moves/removals in the parent commit. Moved skills' pages relocate
bundled -> optional (history preserved); removed skills' pages deleted;
edited skills' pages refreshed (hermes-agent now embeds the webhook +
native-mcp reference pointers). zh-Hans i18n mirror: stale bundled pages
and catalog rows for moved/removed skills pruned (new optional translations
land via the translation pipeline).

* test: drop regression test for removed kanban-codex-lane skill

The kanban-codex-lane skill was removed in the bundled-skills cleanup;
its dedicated regression test read the now-deleted SKILL.md and failed
with FileNotFoundError on CI shard 6.

2026-06-04 06:11:22 -07:00

4.1 KiB


Context Budget Discipline

Practical rules for keeping orchestrator context lean when spawning subagents or reading large artifacts. Use these whenever you're running a multi-step agent loop that will consume significant context — plan execution, subagent orchestration, review pipelines, multi-file refactors.

Universal rules

Every workflow that spawns agents or reads significant content must follow these:

- Never read agent definition files. delegate_task auto-loads them, so reading them yourself just doubles the cost.
- Never inline large files into subagent prompts. Tell the agent to read the file from disk with read_file instead. The subagent gets the full content; your context stays lean.
- Read depth scales with context window. See the table below.
- Delegate heavy work to subagents. The orchestrator routes; it doesn't execute.
- Proactively warn the user when you've consumed significant context ("Context is getting heavy; consider checkpointing progress before we continue").
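The "never inline large files" rule can be sketched as a small prompt builder. This is a minimal illustration, not part of any real API: `build_subagent_prompt` and its wording are hypothetical, and it only assembles a string that names file paths for the subagent to open with read_file.

```python
def build_subagent_prompt(task: str, paths: list[str]) -> str:
    """Build a delegation prompt that references files by path.

    Hypothetical helper: the orchestrator never loads the file bodies,
    so its own context carries only the task and the path list.
    """
    lines = [task, "", "Before starting, read these files from disk with read_file:"]
    lines += [f"- {p}" for p in paths]
    # Assumption: asking for a short summary keeps the reply lean as well.
    lines += ["", "Reply with a short summary of the result, not the file contents."]
    return "\n".join(lines)


prompt = build_subagent_prompt(
    "Refactor the parser module.",
    ["src/parser.py", "docs/plan.md"],
)
print(prompt)
```

The prompt grows with the number of paths, not with the size of the files behind them.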

Read depth by context window

Check the model's actual context window (not "it's Claude so 200K"). Some Sonnet deployments are 1M, some are 200K. If you don't know, assume the smaller one — err toward leanness.

Context window	Subagent output reading	Summary files	Verification files	Plans for other phases
< 500k (e.g. 200k)	Frontmatter only	Frontmatter only	Frontmatter only	Current phase only
>= 500k (1M models)	Full body permitted	Full body permitted	Full body permitted	Current phase only

"Frontmatter only" means: read enough to see the final status/verdict/conclusion. If the subagent wrote a 3000-line debug log, read the summary section it produced, not the log.
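The table above can be encoded as a lookup. The sketch below is illustrative only: `read_policy`, its dictionary keys, and the 200k fallback constant are assumed names. The 500k threshold and the "assume the smaller window when unknown" rule come from the text.

```python
from typing import Optional

THRESHOLD_TOKENS = 500_000       # boundary taken from the table
ASSUMED_WINDOW_TOKENS = 200_000  # fallback when the real window is unknown


def read_policy(context_window_tokens: Optional[int] = None) -> dict:
    """Map a context window size to the read depth for each artifact type."""
    window = (
        context_window_tokens
        if context_window_tokens is not None
        else ASSUMED_WINDOW_TOKENS
    )
    depth = "full body" if window >= THRESHOLD_TOKENS else "frontmatter only"
    return {
        "subagent_output": depth,
        "summary_files": depth,
        "verification_files": depth,
        # Plans for other phases are never read, whatever the window size.
        "plans": "current phase only",
    }


print(read_policy(200_000)["summary_files"])
print(read_policy(1_000_000)["summary_files"])
```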

Four-tier degradation model

Monitor your context usage and shift behavior as you climb the tiers. The point is to notice before you hit the wall, not when responses start truncating.

Tier	Usage	Behavior
PEAK	0 – 30%	Full operations. Read bodies, spawn multiple agents in parallel, inline results freely.
GOOD	30 – 50%	Normal operations. Prefer frontmatter reads. Delegate aggressively.
DEGRADING	50 – 70%	Economize. Frontmatter-only reads, minimal inlining, warn the user about budget.
POOR	70%+	Emergency mode. Checkpoint progress immediately. No new reads unless critical. Finish the current task and stop cleanly.
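A minimal sketch of the tier lookup, assuming you can estimate the tokens used and know the window size. `tier_for_usage` is a hypothetical name. The table does not say which tier owns an exact boundary, so this sketch assigns 30%, 50%, and 70% to the stricter tier.

```python
def tier_for_usage(used_tokens: int, window_tokens: int) -> str:
    """Return the degradation tier for the current context usage."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    # Integer comparisons avoid floating-point surprises at the boundaries.
    if used_tokens * 100 < 30 * window_tokens:
        return "PEAK"
    if used_tokens * 100 < 50 * window_tokens:
        return "GOOD"
    if used_tokens * 100 < 70 * window_tokens:
        return "DEGRADING"
    return "POOR"


print(tier_for_usage(50_000, 200_000))   # 25% of the window
print(tier_for_usage(150_000, 200_000))  # 75% of the window
```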

Early warning signs (before panic thresholds fire)

Quality degrades gradually before hard limits hit. Watch for these:

- Silent partial completion. The subagent claims it is done, but the implementation is incomplete. Self-checks catch file existence, not semantic completeness. Always verify subagent output against the plan's must-haves, not just "did a file appear?"
- Increasing vagueness. The agent starts using phrases like "appropriate handling" or "standard patterns" instead of specific code. This is context pressure showing up before budget warnings fire.
- Skipped protocol steps. The agent omits steps it would normally follow. If the success criteria list 8 items and the report covers 5, suspect context pressure, not "the agent decided 5 was enough."

When these signs appear, checkpoint the work and either reset context or hand off to a fresh subagent.
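Two of these signs lend themselves to a crude automated check. The sketch below is a heuristic under stated assumptions: the function names are hypothetical, the phrase list holds only the two examples from the text, and criteria are matched by literal substring, so a human or a fresh reviewer still has to judge the result.

```python
# Only the two example phrases from the text; extend as needed.
VAGUE_PHRASES = ("appropriate handling", "standard patterns")


def vague_phrases_in(report: str) -> list[str]:
    """Return the known vague phrases that occur in a subagent report."""
    lowered = report.lower()
    return [phrase for phrase in VAGUE_PHRASES if phrase in lowered]


def uncovered_criteria(criteria: list[str], report: str) -> list[str]:
    """Return success criteria the report never mentions (literal match)."""
    lowered = report.lower()
    return [c for c in criteria if c.lower() not in lowered]


report = "Added retry logic with appropriate handling. Covered: timeout, retry."
print(vague_phrases_in(report))
print(uncovered_criteria(["timeout", "retry", "logging"], report))
```

A non-empty result from either function is a prompt to look closer, not proof of a problem.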

Fundamental limitation

When you orchestrate, you cannot verify semantic correctness of subagent output — only structural completeness ("did the file appear?", "does the test pass?"). Semantic verification requires either running the code yourself or delegating a review pass to another fresh subagent.

Mitigation: in every task you delegate, include explicit "must-have" truths the subagent must confirm in its response (e.g., "confirm your test actually tests X, not just that X was imported"). The subagent re-asserting concrete facts is evidence; vague summaries are not.
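One way to make the must-have mitigation mechanical is to give each truth an identifier and require a matching confirmation line with evidence. The `CONFIRMED <id>: <evidence>` format and both function names below are assumptions made for illustration, not an established protocol.

```python
def must_have_section(must_haves: dict[str, str]) -> str:
    """Render the must-have block to append to a delegation prompt."""
    lines = [
        "For each item below, reply with a line "
        "'CONFIRMED <id>: <concrete evidence>'."
    ]
    lines += [f"{key}: {text}" for key, text in must_haves.items()]
    return "\n".join(lines)


def unconfirmed(must_haves: dict[str, str], response: str) -> list[str]:
    """Return ids that lack a confirmation line carrying non-empty evidence."""
    confirmed = set()
    for raw in response.splitlines():
        line = raw.strip()
        if not line.startswith("CONFIRMED "):
            continue
        key, sep, evidence = line[len("CONFIRMED "):].partition(":")
        # A bare "CONFIRMED M2:" with no evidence does not count.
        if sep and evidence.strip():
            confirmed.add(key.strip())
    return [key for key in must_haves if key not in confirmed]


must_haves = {
    "M1": "The test exercises X, not just imports it.",
    "M2": "The full test suite was run after the change.",
}
response = "CONFIRMED M1: test_x calls X(3) and asserts 9\nCONFIRMED M2:\nDone."
print(unconfirmed(must_haves, response))
```

This only checks that evidence was supplied. Whether the evidence is true still needs a run of the code or a review pass by a fresh subagent.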
