Commit Graph

9933 Commits

SHA1 Message Date
78d7fa1b5c refactor(skills/antigravity-cli): move to autonomous-ai-agents (it's an AI agent CLI) 2026-05-29 05:21:48 -07:00
904c0b479b refactor(state): return FTS index count from vacuum()
Have vacuum() return optimize_fts()'s count so the CLI 'sessions optimize'
summary uses the real merged-index count instead of probing the private
_FTS_TABLES / _fts_table_exists() members.
2026-05-29 05:09:56 -07:00
38695254f8 perf(state): merge FTS5 segments on VACUUM + add 'hermes sessions optimize'
The FTS5 indexes (messages_fts, messages_fts_trigram) grow as a series of
incremental b-tree segments — one per trigger-driven insert batch. SQLite's
automerge caps at ~16 segments, so a long-lived store keeps scanning many
segments per MATCH and never collapses them unless the special 'optimize'
command runs. Nothing in the codebase ever ran it: vacuum() only fired after
a prune that deleted rows, and even then never merged FTS segments.

Changes:
- SessionDB.optimize_fts(): merges each FTS5 index to a single segment,
  probing for the (optional/lazy) trigram table first so it is safe to call
  unconditionally. Layout-only — search results and snippet() are unchanged.
- vacuum() now calls optimize_fts() before VACUUM so freed index pages are
  returned to the OS in the same pass.
- 'hermes sessions optimize' CLI subcommand for on-demand reclamation +
  segment compaction (previously there was no way to compact the store
  without a prune deleting rows), with before/after size reporting.

Benchmark (8000 msgs, fragmented to 8 segments/index):
- segments 8 -> 1 on both indexes
- porter MATCH 5.5x faster (0.449 -> 0.081 ms/q)
- trigram MATCH 3.0x faster (0.632 -> 0.207 ms/q)
- 8000 matches before == 8000 after, identical row ids (no functional change)

Orthogonal to the structural FTS-size PRs (#20239 external-content,
#27770 optional trigram) — segment merge helps regardless of those.

Tests: TestOptimizeFts covers index count, search+snippet preservation,
missing-trigram path, and idempotency. Full test_hermes_state.py green (227).
2026-05-29 05:09:56 -07:00
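The optimize/vacuum pairing in the two commits above can be sketched with the standard-library `sqlite3` module and SQLite's FTS5 `'optimize'` special command. This is a minimal illustration, not `SessionDB`'s actual code: the free functions, the `FTS_TABLES` tuple and the one-column schema are assumptions, and the linked SQLite must be built with FTS5.

```python
import sqlite3

# Hypothetical index names mirroring messages_fts / messages_fts_trigram.
FTS_TABLES = ("messages_fts", "messages_fts_trigram")


def optimize_fts(conn: sqlite3.Connection) -> int:
    """Merge each existing FTS5 index into one segment; return how many were merged.

    Missing tables (e.g. an optional/lazy trigram index) are skipped, so this
    is safe to call unconditionally. The merge changes layout only, not results.
    """
    merged = 0
    for table in FTS_TABLES:
        exists = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        if exists is None:
            continue
        # FTS5 special command: INSERT INTO t(t) VALUES('optimize')
        conn.execute(f"INSERT INTO {table}({table}) VALUES ('optimize')")
        merged += 1
    conn.commit()
    return merged


def vacuum(conn: sqlite3.Connection) -> int:
    """Merge FTS segments first so VACUUM returns the freed pages in the same pass."""
    count = optimize_fts(conn)
    conn.execute("VACUUM")  # VACUUM cannot run inside a transaction; optimize_fts committed
    return count
```

Returning the merged count from `vacuum()` lets a CLI summary report it directly instead of probing private table metadata.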
2159d2a729 docs(credential-pools): document immediate rotation on usage-limit 429 (#34580)
The rotation flowchart only described the generic 'retry once, rotate on
second 429' path. ChatGPT/Codex plan-limit 429s carry a usage_limit_reached
reason and rotate to the next pool key immediately (no retry, since the cap
won't clear on retry). Document that case so the docs match the code.
2026-05-29 04:50:14 -07:00
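The two 429 paths the docs now describe can be sketched as one small decision function. This is a hypothetical illustration: the function name, the `attempt` counter and the `"retry"`/`"rotate"` return values are assumptions; only the `usage_limit_reached` reason comes from the commit.

```python
from typing import Optional


def on_rate_limit(reason: Optional[str], attempt: int) -> str:
    """Decide what to do after a 429 on the current pool key.

    attempt is 1 for the first 429 seen on this key, 2 for the second, and so on.
    Returns "rotate" (move to the next pool key) or "retry" (same key again).
    """
    if reason == "usage_limit_reached":
        # Plan-limit caps will not clear on retry, so rotate immediately.
        return "rotate"
    # Generic 429: retry once, rotate on the second 429.
    return "retry" if attempt < 2 else "rotate"
```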
0dba60f73b docs(skills): regen catalog + sidebar for optional antigravity-cli skill 2026-05-29 04:49:42 -07:00
632a7088a3 chore(skills/antigravity-cli): make optional, frame through Hermes tools, tighten frontmatter 2026-05-29 04:49:42 -07:00
1bba5f27ab feat(skills): add antigravity-cli operator skill 2026-05-29 04:49:42 -07:00
d6f2bdabda docs(skills): regen catalog + sidebar for optional grok skill 2026-05-29 04:49:38 -07:00
99ddba94ed chore(skills/grok): make optional + tighten SKILL.md to modern format 2026-05-29 04:49:38 -07:00
10cd4138cc feat(skills): add grok skill for xAI Grok Build CLI
Adds a `grok` skill under `skills/autonomous-ai-agents/`, a third coding-agent orchestration guide alongside `codex` and `claude-code`. It teaches Hermes to delegate coding tasks to Grok Build (xAI's `grok` CLI).

- Headless `-p` one-shots (preferred)
- Interactive TUI via pty + tmux
- Session resume, background tasks, structured JSON output
- PR review and parallel worktree patterns
- Auth via SuperGrok / X Premium+ (`grok login`)
- Full pitfalls and config notes
2026-05-29 04:49:38 -07:00
5e7c2ffa9f chore(models): gemini-3.5-flash replaces gemini-3-flash-preview in OpenRouter + Nous lists (#34581)
* chore(models): swap gemini-3-flash-preview for gemini-3.5-flash in OpenRouter + Nous lists

* chore(models): regenerate model-catalog.json for gemini-3.5-flash swap
2026-05-29 04:27:58 -07:00
1c53d39eaa test: deflake process-registry kill + PTY resize tests
Two CI flakes surfaced on PR #34572 (both in files this PR doesn't touch;
pre-existing host-dependent flakes):

1. test_process_registry::TestPopenLeakOnSetupFailure — the failure-cleanup
   tests use a fake proc.pid (8888/9999) and assert proc.kill() runs. But
   spawn_local's primary cleanup is os.killpg(os.getpgid(pid), SIGKILL),
   falling back to proc.kill() only on ProcessLookupError/PermissionError/
   OSError. When the fake PID happens to exist on a busy host, os.getpgid
   succeeds, os.killpg fires against an UNRELATED real process group, and
   proc.kill() is never reached -> flaky AssertionError (and a real risk of
   SIGKILLing an innocent process group from a unit test). Patch os.getpgid
   to raise ProcessLookupError so the fallback path runs deterministically
   and no real killpg is ever issued.

2. test_web_server::test_resize_escape_is_forwarded — the receive loop calls
   the blocking conn.receive_bytes() with no exception guard. Once the child
   prints its winsize and exits, the PTY closes; on a missed-marker run the
   next recv blocks until the 30s pytest-timeout instead of failing fast.
   Add a try/except break (matching the working sibling tests) and bump the
   child's pre-read sleep 0.15s -> 0.5s so the resize reliably lands first.

Verified: 4/4 pass across 3 consecutive runs; root cause for #1 reproduced
(os.getpgid(1) succeeds -> old code skips proc.kill).
2026-05-29 04:22:41 -07:00
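Fix #1 can be illustrated with `unittest.mock`. The `cleanup()` function below is a hypothetical stand-in that follows the described order (`os.killpg(os.getpgid(pid), SIGKILL)` with a `proc.kill()` fallback); it is not `spawn_local`'s actual code, and the sketch is POSIX-only. Patching `os.getpgid` to raise `ProcessLookupError` makes the fallback deterministic, and patching `os.killpg` as well guarantees no real process group is signalled.

```python
import os
import signal
from unittest import mock


def cleanup(proc) -> None:
    """Kill the child's process group, falling back to proc.kill()."""
    try:
        os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
    except (ProcessLookupError, PermissionError, OSError):
        proc.kill()


def test_fallback_kill_is_deterministic() -> None:
    proc = mock.Mock(pid=8888)  # fake PID that may exist on a busy host
    with mock.patch("os.getpgid", side_effect=ProcessLookupError):
        with mock.patch("os.killpg") as killpg:
            cleanup(proc)
    killpg.assert_not_called()          # no real killpg is ever issued
    proc.kill.assert_called_once_with()  # fallback path always runs
```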
6a2e3c2d26 fix(gateway): guard adapter-trust check against bare GatewayRunner in tests
_adapter_enforces_own_access_policy accessed self.adapters directly, but
several auth tests build a bare GatewayRunner via object.__new__ without
setting .adapters (pitfalls.md #17). Read it defensively with getattr so a
missing/empty adapter map means "no adapter owns the policy" instead of
raising AttributeError.

Fixes 4 tests: test_feishu_bot_auth_bypass, test_discord_bot_auth_bypass (x2),
test_signal::test_signal_in_allowlist_maps.
2026-05-29 04:22:41 -07:00
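The defensive read can be sketched as follows. `GatewayRunner` here is a minimal hypothetical stand-in, and the `platform` argument is an assumed signature; the point is that `getattr` with a default tolerates a bare instance created via `object.__new__`, which skips `__init__`.

```python
class GatewayRunner:
    """Minimal stand-in; the real runner carries far more state."""

    def __init__(self, adapters=None):
        self.adapters = adapters or {}

    def _adapter_enforces_own_access_policy(self, platform: str) -> bool:
        # object.__new__(GatewayRunner) skips __init__, so .adapters may be unset.
        adapters = getattr(self, "adapters", None) or {}
        adapter = adapters.get(platform)
        # A missing adapter means "no adapter owns the policy".
        return bool(getattr(adapter, "enforces_own_access_policy", False))
```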
fd09b2c55e fix(gateway): trust adapter-owned access policy over env default-deny (#34515)
Config-driven platform policies (dm_policy / group_policy / allow_from /
group_allow_from) for WeCom, Weixin, Yuanbao, and QQBot now work without
also setting a PLATFORM_ALLOWED_USERS env var.

These adapters enforce their access policy at intake — a message is dropped
inside the adapter and never dispatched unless it already passed the policy.
The gateway's env-based check (_is_user_authorized) ran afterward and, with
no env allowlist set, fell through to an env-only default-deny — silently
rejecting `dm_policy: open` and config-only allowlists the adapter had
already authorized.

Rather than re-implement each adapter's policy a second time in run.py
(which would drift), adapters that own their gate now declare it via a new
BasePlatformAdapter.enforces_own_access_policy property (default False). The
gateway trusts that flag and skips the env-only default-deny for those
platforms. Env allowlists still take precedence when set.

Also resolves unauthorized DM behavior from config dm_policy so allowlist /
disabled policies drop unauthorized DMs silently instead of leaking pairing
codes, while an explicit pairing policy opts back in.

Co-authored-by: Frowtek <frowte3k@gmail.com>
2026-05-29 04:22:41 -07:00
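A minimal sketch of the precedence described above, assuming a simplified authorization function. Only the `enforces_own_access_policy` property name comes from the commit; `WeComAdapter`, `is_user_authorized` and the set-based env allowlist are hypothetical.

```python
from typing import Optional, Set


class BasePlatformAdapter:
    @property
    def enforces_own_access_policy(self) -> bool:
        # Default: the gateway's env-based check stays authoritative.
        return False


class WeComAdapter(BasePlatformAdapter):
    @property
    def enforces_own_access_policy(self) -> bool:
        # Messages only reach the gateway after passing dm/group policy at intake.
        return True


def is_user_authorized(user_id: str, adapter: BasePlatformAdapter,
                       env_allowlist: Optional[Set[str]]) -> bool:
    if env_allowlist:
        # An explicit env allowlist still takes precedence when set.
        return user_id in env_allowlist
    if adapter.enforces_own_access_policy:
        # Trust the adapter's own gate instead of the env-only default-deny.
        return True
    return False  # env-only default-deny for adapters without their own gate
```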
ddaf2f6712 style: restore PEP8 blank-line separation after dead-code removal
The deletions in the salvaged commit left some top-level defs/classes
separated by a single blank line. Restore the 2-blank-line separation.
2026-05-29 04:22:27 -07:00
dc235e93cb chore: remove dead code — 28 unused functions/classes across 16 files
Vulture + per-symbol verification (whole-repo grep incl. tests, string
literals, getattr, decorator/registry/argparse dispatch) confirmed each of
these has zero callers anywhere — not reachable via any dynamic-dispatch path,
not referenced by tests, not re-exported.

Removed:
- acp_adapter/tools.py: _build_patch_mode_content
- agent/anthropic_adapter.py: read_claude_managed_key (diagnostics-only, never called)
- agent/bedrock_adapter.py: get_bedrock_model_ids
- agent/browser_registry.py: get_active_browser_provider
- agent/chat_completion_helpers.py: _take_request_client (x2 nested closures, never invoked)
- gateway/platforms/weixin.py: _rewrite_headers_for_weixin, _rewrite_table_block_for_weixin
- hermes_cli/banner.py: _skin_branding
- hermes_cli/debug.py: _delete_hint
- hermes_cli/gateway.py: _setup_email, _setup_sms, _setup_yuanbao
  (platform keys absent from the _builtin_setup_fn dispatch dict; handled by
  the _setup_standard_platform fallback)
- hermes_cli/kanban_db.py: set_max_runtime, active_run
- hermes_cli/kanban_diagnostics.py: severity_of_highest, _latest_clean_event_ts
- hermes_cli/main.py: _build_provider_choices, cmd_portal
  (portal subcommand is wired via portal_cli.add_parser, not this wrapper)
- hermes_cli/model_switch.py: CustomAutoResult (orphaned by the switch_model() extraction)
- hermes_cli/models.py: format_model_pricing_table, fetch_nous_account_tier
- hermes_cli/portal_cli.py: _nous_portal_base_url
- hermes_cli/proxy/server.py: handle_models_fallback (defined but never registered on the router)
- tools/computer_use/cua_backend.py: _parse_element, _is_arm_mac
- tools/file_operations.py: _get_safe_write_root (prod uses the imported
  agent.file_safety.get_safe_write_root directly)
- tools/skills_tool.py: _load_category_description

Also dropped two imports left unused by the removals:
- tools/file_operations.py: get_safe_write_root alias
- tools/computer_use/cua_backend.py: import platform

Pure deletion: -551 LOC. No behavior change. Test files covering the edited
modules pass (640/640); the broader suite's pre-existing/env-dependent
failures reproduce unchanged on origin/main.
2026-05-29 04:22:27 -07:00
0aa9f6acfa docs(nav): wire multi-profile-gateways guide into sidebar
Follow-up for #30240 — the new page was not referenced in sidebars.ts,
leaving it orphaned (unreachable via nav and flagged as a broken relative
link to ./profiles.md). Added under Using Hermes after profile-distributions.
2026-05-29 04:11:10 -07:00
0c0a905011 docs(gateway): add multi-profile gateways operations guide
Covers running multiple Hermes profiles as managed services on one host:

- A shell-loop wrapper pattern for start/stop/restart/status across every
  profile (the per-profile CLI commands stay unchanged).
- Per-platform service file locations (LaunchAgent on macOS, systemd user
  unit on Linux), plus the rules around clashes.
- Log paths per profile and how to tail every gateway at once.
- Config file layout per profile and the restart-after-edit workflow.
- Keeping the host awake: caffeinate flags on macOS,
  systemd-inhibit + loginctl enable-linger on Linux.
- Token-conflict auditing across .env files.
- Troubleshooting for the common "Could not find service in domain for
  user gui: 501" message and stale PIDs after a crash.

Tested locally with five profiles on macOS launchd.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 04:11:10 -07:00
e4b9532c18 feat: embedder environment-hint hook for the system prompt (#34574)
* fix(security): block AWS SDK creds from subprocess env

* fix(security): narrow Bedrock subprocess strip to inference bearer token only

Scopes the AWS_SDK subprocess strip down from the full AWS credential chain
to just AWS_BEARER_TOKEN_BEDROCK — the only Hermes-managed *inference* secret
(analogous to OPENAI_API_KEY). The general AWS credential chain
(AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_PROFILE
/ config + role pointers) is intentionally left inheritable.

Why: per SECURITY.md §3.2 the local terminal is the user's trusted operator
shell. Hard-blocklisting the general chain would (a) regress *every* user who
runs aws/terraform/cdk/boto3 in the agent terminal — not just Bedrock users,
since PROVIDER_REGISTRY is iterated unconditionally at import — and (b) be
unrecoverable, because env_passthrough.py refuses to re-allow anything in
_HERMES_PROVIDER_ENV_BLOCKLIST (GHSA-rhgp-j443-p4rf). The narrow strip closes
the reported leak (opencode enumerating the Bedrock catalog off the leaked
bearer token) with no capability loss.

Keeps zapabob's self-healing auth_type=="aws_sdk" mechanism so any future
SDK-cred provider is covered automatically.

Tests: bearer token stripped + general chain preserved (no-regression guard),
on both the runtime strip path and the blocklist-membership path.

Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>

* feat: embedder environment-hint hook for the system prompt

Adds HERMES_ENVIRONMENT_HINT env var (and config.yaml agent.environment_hint)
so a host wrapping Hermes (sandbox runner, managed platform) can describe the
runtime environment — proxy, credential handling, mount layout — in the system
prompt's environment-hints block, without editing the identity slot (SOUL.md).

Read once at prompt-build time, so it lands in the stable, cache-safe portion
of the system prompt. Env var overrides the config key (build-time/container
mechanism). Empty by default — no behavior change for existing installs.

---------

Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>
2026-05-29 04:10:05 -07:00
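The hint lookup can be sketched as a small resolver. The env var name and the `agent.environment_hint` key come from the commit; the function name, the dict-shaped config, and treating an empty or whitespace-only env value as unset are assumptions.

```python
import os
from typing import Any, Mapping, Optional


def resolve_environment_hint(config: Mapping[str, Any],
                             environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the embedder's environment hint, or "" when none is configured."""
    environ = os.environ if environ is None else environ
    env_value = environ.get("HERMES_ENVIRONMENT_HINT", "").strip()
    if env_value:
        # The env var (build-time/container mechanism) overrides config.yaml.
        return env_value
    agent_cfg = config.get("agent") or {}
    return str(agent_cfg.get("environment_hint") or "").strip()
```

Because the value is read once when the prompt is built, it stays constant for the session and does not disturb the cacheable prefix.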
c0b17b3c0c docs(weixin): clarify allowed users setup 2026-05-29 04:01:06 -07:00
2520c9ad68 docs(skills): clarify Reminders alarm timing 2026-05-29 04:01:01 -07:00
62e81b2d9b docs(windows): add WSL desktop shortcut guide 2026-05-29 04:00:57 -07:00
fe7e0a8c1d docs(feishu): add permission scopes, event subscription, and publish steps
The setup guide was missing the specific Feishu permission scopes to
configure and the event subscription (im.message.receive_v1) needed
for the bot to receive messages. Users had to reference external
OpenClaw documentation to complete the setup.

Adds:
- Required permissions table (im:message, im:message:send_as_bot,
  im:resource, im:chat, im:chat:readonly)
- Recommended permissions (reactions, app info, contact)
- Event subscription step (im.message.receive_v1)
- App version publish reminder (permissions require published version)
2026-05-29 04:00:52 -07:00
6e179c44b1 fix(web): ensure plugin discovery before web_*_tool registry lookups
Web search/extract dispatch read agent.web_search_registry before plugin
discovery had run, so in any process that hadn't imported model_tools.py
(subprocess agent runs, delegate children, standalone scripts) the registry
was empty: get_provider('firecrawl') returned None and the dispatcher emitted
the misleading 'No web extract provider configured' error even with
web.extract_backend set and FIRECRAWL_API_KEY exported.

Adds an idempotent _ensure_web_plugins_loaded() helper (mirrors
tools.browser_tool._ensure_browser_plugins_loaded) and calls it at the top of
both the web_search_tool and web_extract_tool dispatch sites before the
registry lookup.

Fixes #27580.

Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>
2026-05-29 04:00:00 -07:00
58e1b04665 chore(release): map tillfalko to GitHub login for PR #29987 salvage 2026-05-29 03:58:56 -07:00
c77a697fa4 refactor(vision): consolidate native fast-path gate into one shared helper
The fast-path decision (native routing + provider allowlist OR
supports_vision override) lived inline in vision_analyze and was copied
into browser_vision. Extract it to _should_use_native_vision_fast_path()
so both tools share one source of truth.

- vision_tools: gate logic now one helper; vision_analyze calls it in 3 lines
- browser_tool: thin envelope decoration over the shared helper, not a copy
- browser_vision typed Union[str, Dict] to match its real return shape
- tests slimmed to target the override path + text-mode-wins invariant
2026-05-29 03:58:56 -07:00
c3f28c651d docs(browser): update browser_vision tool description for native vision routing 2026-05-29 03:58:56 -07:00
2402ec5e7b test: extend test coverage to native image routing 2026-05-29 03:58:56 -07:00
f8b8dffccf fix(browser): add native image support to browser_vision and respect supports_vision 2026-05-29 03:58:56 -07:00
f05353397d fix(vision): respect supports_vision in vision_analyze 2026-05-29 03:58:56 -07:00
784d8dd2c2 fix(matrix): fail-closed approval reaction auth when MATRIX_ALLOWED_USERS is empty
The _on_reaction approval handler used:

    if self._allowed_user_ids and sender not in self._allowed_user_ids:

When MATRIX_ALLOWED_USERS is not configured, _allowed_user_ids is an
empty set. The short-circuit on the empty set caused the deny block to
never execute, allowing any Matrix room member to approve or deny tool
calls via reactions, even users that run.py's _is_user_authorized
would reject for regular messages.

Fix mirrors the Telegram _is_callback_user_authorized fix (commit
89d32052e, PR #28494): deny by default when no allowlist is configured,
unless GATEWAY_ALLOW_ALL_USERS=true is explicitly set.
2026-05-29 03:58:45 -07:00
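The difference between the two checks is easiest to see side by side. Both functions are hypothetical extractions of the condition quoted above; `allow_all` stands in for `GATEWAY_ALLOW_ALL_USERS=true`.

```python
from typing import Set


def reaction_sender_authorized_old(sender: str, allowed: Set[str]) -> bool:
    # Old shape: an empty allowlist short-circuits, so the deny branch never runs.
    if allowed and sender not in allowed:
        return False
    return True


def reaction_sender_authorized(sender: str, allowed: Set[str], allow_all: bool) -> bool:
    # Fixed shape: fail closed when no allowlist is configured,
    # unless allow-all has been explicitly enabled.
    if not allowed:
        return allow_all
    return sender in allowed
```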
3171845479 fix(code-exec): make dropped HERMES_* env vars diagnosable in sandbox scrub
Follow-up mitigation for the #27303 env-scrub tightening. Dropping the
broad HERMES_ prefix in favor of a 4-var operational allowlist is correct
hardening, but a sandbox script that imports a repo module reading a
non-allowlisted HERMES_* var at import time would otherwise see it
silently unset. _scrub_child_env now emits a one-shot debug log naming the
dropped non-secret HERMES_* vars and pointing at the env_passthrough
opt-in escape hatch. Secret-shaped vars are never named in the log.

Tests: dropped vars are logged + env_passthrough named; no log when
nothing is dropped; secret vars excluded from the diagnostic.
2026-05-29 03:44:49 -07:00
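A minimal sketch of the one-shot diagnostic. The allowlist contents, the secret-shape regex and the function name are hypothetical (the commit names neither the four allowlisted vars nor the exact secret rules beyond DSN/WEBHOOK substrings), and non-`HERMES_*` scrubbing is left out.

```python
import logging
import re
from typing import Dict, List

logger = logging.getLogger("sandbox_env_sketch")

# Hypothetical values; the real 4-var allowlist is not named in the commit.
HERMES_OPERATIONAL_ALLOWLIST = {"HERMES_HOME", "HERMES_PROFILE"}
SECRET_SHAPED = re.compile(r"KEY|TOKEN|SECRET|PASSWORD|DSN|WEBHOOK", re.IGNORECASE)
_dropped_logged = False  # one-shot guard for the diagnostic


def scrub_child_env(env: Dict[str, str]) -> Dict[str, str]:
    """Keep allowlisted HERMES_* vars, drop the rest, log non-secret drops once."""
    global _dropped_logged
    kept: Dict[str, str] = {}
    nameable_drops: List[str] = []
    for name, value in env.items():
        if not name.startswith("HERMES_"):
            kept[name] = value  # non-HERMES_ scrubbing is out of scope here
        elif name in HERMES_OPERATIONAL_ALLOWLIST:
            kept[name] = value
        elif not SECRET_SHAPED.search(name):
            nameable_drops.append(name)
        # Secret-shaped HERMES_* vars fall through: dropped and never named.
    if nameable_drops and not _dropped_logged:
        _dropped_logged = True
        logger.debug(
            "sandbox env scrub dropped %s; list them in env_passthrough to opt in",
            ", ".join(sorted(nameable_drops)),
        )
    return kept
```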
4bdae34771 test(code-exec): regression suite for the approval-bypass cluster
Cover context+callback propagation and teardown-clears, a source guard that both RPC threads stay wrapped, the check_execute_code_guard decision matrix (isolated backend, headless-local, cron-deny, gateway approve/deny/timeout/missing-notify, smart mode, session-yolo), the env-scrub allowlist/secret rules, and a behavioral test that execute_code() blocks before spawning on denial.

Refs #4146, #27303, #30882, #33057
2026-05-29 03:44:49 -07:00
655090b3d3 feat(gateway): warn at startup on manual approvals with no risk assessor
When approvals.mode=manual with security.tirith_enabled off and no auxiliary.approval model, dangerous commands and execute_code scripts can only be gated by live in-chat approval; with routing fixed they now fail closed (block) rather than silently auto-run. Surface that at startup so operators knowingly enable tirith or auxiliary.approval for unattended gateways.

Refs #30882
2026-05-29 03:44:49 -07:00
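The startup condition can be sketched as a config check. The nested-dict shape and especially `auxiliary.approval.model` are assumptions about how "an auxiliary.approval model" is expressed; the function name and warning text are hypothetical.

```python
import logging
from typing import Any, Mapping

logger = logging.getLogger("gateway_startup_sketch")


def warn_if_unassessed_manual_approvals(config: Mapping[str, Any]) -> bool:
    """Return True (and log a warning) when manual approvals have no risk assessor."""
    manual = (config.get("approvals") or {}).get("mode") == "manual"
    tirith = bool((config.get("security") or {}).get("tirith_enabled"))
    # Assumed shape: auxiliary.approval.model names the assessor model.
    aux_model = ((config.get("auxiliary") or {}).get("approval") or {}).get("model")
    if manual and not tirith and not aux_model:
        logger.warning(
            "approvals.mode=manual with no risk assessor: dangerous commands and "
            "execute_code scripts are blocked unless approved live in chat. "
            "Enable security.tirith_enabled or configure auxiliary.approval."
        )
        return True
    return False
```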
1083977261 fix(code-exec): restore approval context in execute_code RPC threads + guard entry
Wrap both execute_code RPC threads (local UDS + remote file-RPC) with propagate_context_to_thread so gateway sessions no longer fall into check_dangerous_command's non-interactive auto-approve branch and the CLI approval prompt stays reachable. Add check_execute_code_guard: one-shot fail-closed approval of the whole script in gateway/ask/cron-deny before the child spawns (skips isolated backends; command-string built only past the early returns). Drop the broad HERMES_ env passthrough for an explicit operational allowlist plus DSN/WEBHOOK secret substrings, and update the POSIX-equivalence oracle.

Refs #4146, #27303, #30882, #33057
2026-05-29 03:44:49 -07:00
21aeefe5fd fix(code-exec): propagate agent-turn context into tool worker threads
Worker threads that dispatch Hermes tools started with an empty contextvars.Context and no thread-local approval/sudo callbacks. Add tools/thread_context.propagate_context_to_thread factoring that capture/install/clear lifecycle (mirrors the GHSA-qg5c-hvr5-hjgr pattern), and refactor agent/tool_executor onto it so the security-critical logic lives in one audited place. Update the contextvar-propagation source guard for the new call shape.

Refs #33057
2026-05-29 03:44:49 -07:00
a22c250001 refactor(auth): remove vestigial Nous min_key_ttl/inference_auth_mode params
After the legacy session-key path was removed, two parameters became dead
surface on the Nous runtime-resolution chain:

- min_key_ttl_seconds: del'd inside refresh_nous_oauth_pure and pass-through /
  telemetry-only in refresh_nous_oauth_from_state, _try_import_shared_nous_state,
  _nous_device_code_login, and resolve_nous_runtime_credentials. It controlled the
  now-deleted agent-key mint TTL and drives no behavior.
- inference_auth_mode: with the legacy mode gone, AUTO and FRESH are behaviorally
  identical; the value only fed _normalize_nous_inference_auth_mode validation and
  oauth trace output, never a branch.

Removing inference_auth_mode orphaned its whole supporting cluster
(NOUS_INFERENCE_AUTH_MODE_AUTO/FRESH, NOUS_INFERENCE_AUTH_MODES,
_normalize_nous_inference_auth_mode), and dropping min_key_ttl_seconds orphaned
DEFAULT_AGENT_KEY_MIN_TTL_SECONDS — all deleted here.

Updated every caller (run_agent, auxiliary_client, credential_pool, proxy adapter,
runtime_provider, web_server, main, auth_commands, setup) and pruned the matching
test kwargs. Deleted two tests that exercised the removed surface
(test_legacy_auth_mode_is_rejected, test_try_refresh_..._accepts_explicit_auth_mode).

No behavior change: net -134 LOC of dead code.
2026-05-29 02:24:48 -07:00
95cf8f9842 refactor(auth): drop weak JWT-shape fallback in auxiliary _nous_api_key
The import-failure fallback returned any 3-segment token without scope/
expiry validation, a divergent reimplementation of the canonical
_nous_invoke_jwt_is_usable check. The import is from the same module that
provides resolve_nous_runtime_credentials, so a failure means the whole
auxiliary Nous path is unavailable anyway; return "" instead so the caller
falls through to the clear 'run: hermes auth add nous' guidance rather than
handing back an unvalidated token.
2026-05-29 02:24:48 -07:00
4e4984a11a test(auth): update nous jwt-only expectations 2026-05-29 02:24:48 -07:00
7e958dafc2 fix(auth): address Nous JWT fallback review 2026-05-29 02:24:48 -07:00
41ff6e5937 refactor(auth): Disable Nous legacy session key fallback 2026-05-29 02:24:48 -07:00
a87f0a82a5 test(tool-search): redact secrets from harness transcripts + console
The live harness runs against a real OpenRouter key; record['error'] is a
full traceback that, on an auth failure, could echo a request header or URL
containing the key. _redact_secrets() now masks the live OPENROUTER_API_KEY,
any sk-/sk-or- bearer token, and Authorization/Bearer headers before
final_response and error enter the transcript or the console print. Addresses
the CodeQL clear-text-storage/logging findings at the source.
2026-05-29 02:04:12 -07:00
18c9e89106 test: update _invoke_tool dispatch assertion for new toolset-scope kwargs
The scoping fix added enabled_toolsets/disabled_toolsets to the
agent_runtime_helpers sequential dispatch into handle_function_call, so
test_invoke_tool_dispatches_to_handle_function_call's assert_called_once_with
(exact match) needs the two new kwargs. Both are None for the default agent
fixture.
2026-05-29 02:04:12 -07:00
1709776120 test(tool-search): add live A/B harness, drop checked-in transcripts
Brings in the tool_search live-test harness from the original PR but leaves
out the 11 checked-in scripts/out/*.json transcript files — those are
non-deterministic model output that goes stale the moment the model changes
and were the bulk of the diff. scripts/out/ is now gitignored so a harness
run never re-commits them.

Fixes on top:
- API-key loading goes through hermes_cli.env_loader.load_hermes_dotenv
  instead of hand-parsing ~/.hermes/.env and assigning the value to a local.
  The canonical loader never materializes the secret in a local variable in
  this module, which clears the four CodeQL high alerts
  (py/clear-text-storage / py/clear-text-logging-sensitive-data at the
  transcript write/print sites — they were tracing the key from the
  hand-rolled parser into the records) and removes a hand-rolled parser.
- encoding='utf-8' on every write_text/read_text in both harness scripts
  (Windows-footgun hygiene).

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-29 02:04:12 -07:00
7427b9d581 fix(tool-search): scope bridge catalog + dispatch to the session's toolsets
Tool Search read its catalog from the global registry (get_tool_definitions
with no toolset scope = 'start with everything'), so a restricted-toolset
session — subagent, kanban worker, curated gateway session — could:

  1. tool_search the entire process registry, not just its granted tools, and
  2. tool_call any registered plugin/MCP tool it was never given, because
     registry.dispatch() has no enabled_tools gate for non-execute_code tools.

A scoped session (enabled_toolsets=['mcp-github']) reported total_available=26
and successfully invoked an out-of-scope plugin tool via tool_call.

Fix:
- handle_function_call gains enabled_toolsets/disabled_toolsets; the bridge
  dispatch scopes get_tool_definitions to them (also stops polluting the
  process-global _last_resolved_tool_names with out-of-scope tools, which
  leaked into execute_code's sandbox-tool fallback).
- A defense-in-depth gate rejects any tool_call'd name not in the scoped
  deferrable catalog.
- tool_executor's unwrap (both concurrent + sequential paths) enforces the
  same scope before dispatch, since it unwraps tool_call -> underlying name
  and bypasses the bridge branch. New _tool_search_scoped_names() helper,
  cached per-agent on registry generation + toolset scope.
- New scoped_deferrable_names() helper in tool_search.py shared by both sites.

Tests: 4 new regression tests in TestRegression_ToolsetScoping (scoped
catalog, out-of-scope tool_call rejection, no global pollution, helper).
2026-05-29 02:04:12 -07:00
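The scoped catalog and the defense-in-depth gate can be sketched over a toy registry. The tool names and toolset mapping are hypothetical, and `check_tool_call` is an illustrative name; the `enabled_toolsets=None` meaning "every toolset" mirrors the commit's description of the unscoped default.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical deferrable catalog: tool name -> owning toolset.
DEFERRABLE_TOOLS: Dict[str, str] = {
    "github_search": "mcp-github",
    "github_open_pr": "mcp-github",
    "jira_create_issue": "mcp-jira",
    "plugin_shell": "plugin-extras",
}


def scoped_deferrable_names(
    enabled_toolsets: Optional[Iterable[str]] = None,
    disabled_toolsets: Optional[Iterable[str]] = None,
) -> Set[str]:
    """Names visible to this session; enabled_toolsets=None means every toolset."""
    enabled = None if enabled_toolsets is None else set(enabled_toolsets)
    disabled = set(disabled_toolsets or ())
    return {
        name
        for name, toolset in DEFERRABLE_TOOLS.items()
        if toolset not in disabled and (enabled is None or toolset in enabled)
    }


def check_tool_call(name: str, enabled_toolsets=None, disabled_toolsets=None) -> None:
    """Reject tool_call targets outside the session's scoped catalog."""
    if name not in scoped_deferrable_names(enabled_toolsets, disabled_toolsets):
        raise PermissionError(f"tool_call target {name!r} is not in this session's toolsets")
```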
369075dc95 feat(tools): progressive tool disclosure for MCP and plugin tools
Adds Tool Search, a structured-tools progressive-disclosure layer that
replaces MCP and non-core plugin tools in the model-visible tools array
with three bridge tools (tool_search / tool_describe / tool_call) when
the deferrable surface would consume more than a configurable percentage
of the active model's context window. Core Hermes tools are never deferred.

Default mode is 'auto' with a 10% context threshold, so small toolsets
pay no overhead. Set tools.tool_search.enabled to 'on' to force or 'off'
to disable.

Design carefully reflects the OpenClaw production failure modes
documented in the openclaw-tool-search-report:

  - Core tools never defer (toolsets._HERMES_CORE_TOOLS). Addresses the
    'tools silently missing from isolated cron turns' regression class
    (openclaw#84141) by construction: there is no code path that can
    drop a core tool.
  - Catalog is stateless across turns — rebuilt from the live tool-defs
    list on every assembly. No session-keyed Map that can drift out of
    sync with the registry.
  - tool_call unwraps the bridge call before any hook fires, so plugin
    pre/post hooks, guardrails, approval flows, and the activity feed
    all see the underlying tool name, not the bridge (addresses
    openclaw#85588 and the verbose-mode complaint on openclaw#79823).
  - The unwrap happens in both the parallel and sequential paths of
    agent/tool_executor.py and also in handle_function_call, so direct
    callers (sandboxed code, eval harnesses) are covered too.
  - Bridge tools cannot invoke each other (recursion guard) and cannot
    invoke core tools (those must be called directly).
  - Tools mode only — no JS-sandbox code-mode. Keeps the surface small.
  - Token estimation via cheap char/4 heuristic; precision isn't needed
    for the threshold decision.

Files:
  - tools/tool_search.py — new module (BM25 retrieval, classification,
    threshold gate, bridge dispatch, unwrap helper).
  - tests/tools/test_tool_search.py — 35 tests including the OpenClaw
    #84141 regression guard.
  - model_tools.py — wires assembly into _compute_tool_definitions as the
    final step, adds skip_tool_search_assembly kwarg so the bridge can
    see the real catalog, dispatches the three bridge tools.
  - agent/tool_executor.py — unwraps tool_call in both parallel and
    sequential parsing loops so checkpointing, guardrails, plugin hooks,
    and tool-progress callbacks all observe the underlying tool name.
  - hermes_cli/config.py — DEFAULT_CONFIG['tools']['tool_search'] block.
  - website/docs/user-guide/features/tool-search.md — user docs.

Validation:
  - 35/35 new tests pass.
  - Existing tool/registry/model_tools/config/coercion/executor tests
    (82 + 74 + small adjacents) green.
  - Live E2E: 20 fake MCP tools registered, get_tool_definitions returns
    3 bridges, tool_search returns top 3 hits, tool_describe returns
    full schema, tool_call dispatches to the real underlying handler
    and the underlying result is what the model sees.
  - Reserved-name recursion guard verified live.
  - Core-tool refusal via tool_call verified live.
2026-05-29 02:04:12 -07:00
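The threshold gate and the "core tools never defer" rule can be sketched in a few lines. This covers assembly only (BM25 retrieval and bridge dispatch are omitted); the function names and the minimal bridge schemas are assumptions, while the three bridge names, the `auto`/`on`/`off` modes, the 10% default and the char/4 heuristic come from the commit.

```python
import json
from typing import Any, Dict, List, Set

BRIDGE_NAMES = ("tool_search", "tool_describe", "tool_call")


def estimate_tokens(schemas: List[Dict[str, Any]]) -> int:
    """Cheap char/4 heuristic: precision is not needed for a threshold decision."""
    return len(json.dumps(schemas)) // 4


def assemble_tools(
    tool_defs: List[Dict[str, Any]],
    core_names: Set[str],
    context_window: int,
    mode: str = "auto",
    threshold: float = 0.10,
) -> List[Dict[str, Any]]:
    """Swap deferrable (non-core) tools for three bridges when they are too large."""
    core = [t for t in tool_defs if t["name"] in core_names]
    deferrable = [t for t in tool_defs if t["name"] not in core_names]
    if mode == "off" or not deferrable:
        return list(tool_defs)
    if mode == "auto" and estimate_tokens(deferrable) <= threshold * context_window:
        return list(tool_defs)  # small surface pays no overhead
    # mode == "on", or auto over threshold; core tools are always kept.
    bridges = [{"name": n, "parameters": {"type": "object"}} for n in BRIDGE_NAMES]
    return core + bridges
```

Because the function rebuilds its output from the live `tool_defs` list on every call, there is no per-session state that can drift from the registry.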
73d73f1f0d fix(codex): relax no-byte TTFB watchdog default from 12s to 120s
The chatgpt.com/backend-api/codex endpoint can spend tens of seconds in
backend admission / prompt prefill before emitting its first SSE event. The
12s no-byte TTFB cutoff aborted those still-valid streams, surfacing as
'Codex stream produced no bytes within 12s' through all retries (Discord
reports). The OpenAI SDK's own streaming read timeout is 600s, so 12s was
~50x more aggressive than the transport layer would have tolerated.

Default the no-byte cutoff to 120s and raise the openai-codex MAX cap default
to 120s so it no longer clamps the new default back to 20s. Disabling stays
available via HERMES_CODEX_TTFB_TIMEOUT_SECONDS=0; the 25k-token auto-disable,
_STRICT override, and post-first-event idle watchdog are unchanged.

Co-authored-by: Gille <4317663+helix4u@users.noreply.github.com>
2026-05-29 02:02:25 -07:00
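A hedged sketch of how the new default, the `0`-disables rule and the raised cap interact. `HERMES_CODEX_TTFB_TIMEOUT_SECONDS` comes from the commit; the function, the `max_cap` parameter and the assumption that the cap also clamps user-set values are hypothetical. The 25k-token auto-disable and `_STRICT` override are not modelled.

```python
import os
from typing import Mapping, Optional

DEFAULT_TTFB_SECONDS = 120.0
DEFAULT_MAX_CAP_SECONDS = 120.0  # was 20s, which clamped any default down to 20s


def resolve_ttfb_timeout(environ: Optional[Mapping[str, str]] = None,
                         max_cap: float = DEFAULT_MAX_CAP_SECONDS) -> Optional[float]:
    """Return the no-byte TTFB cutoff in seconds, or None when the watchdog is disabled."""
    environ = os.environ if environ is None else environ
    raw = environ.get("HERMES_CODEX_TTFB_TIMEOUT_SECONDS")
    value = DEFAULT_TTFB_SECONDS if raw in (None, "") else float(raw)
    if value <= 0:
        return None  # 0 disables the no-byte watchdog
    return min(value, max_cap)
```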
6bebab4761 fix(security): narrow Bedrock subprocess strip to inference bearer token only
Scopes the AWS_SDK subprocess strip down from the full AWS credential chain
to just AWS_BEARER_TOKEN_BEDROCK — the only Hermes-managed *inference* secret
(analogous to OPENAI_API_KEY). The general AWS credential chain
(AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_PROFILE
/ config + role pointers) is intentionally left inheritable.

Why: per SECURITY.md §3.2 the local terminal is the user's trusted operator
shell. Hard-blocklisting the general chain would (a) regress *every* user who
runs aws/terraform/cdk/boto3 in the agent terminal — not just Bedrock users,
since PROVIDER_REGISTRY is iterated unconditionally at import — and (b) be
unrecoverable, because env_passthrough.py refuses to re-allow anything in
_HERMES_PROVIDER_ENV_BLOCKLIST (GHSA-rhgp-j443-p4rf). The narrow strip closes
the reported leak (opencode enumerating the Bedrock catalog off the leaked
bearer token) with no capability loss.

Keeps zapabob's self-healing auth_type=="aws_sdk" mechanism so any future
SDK-cred provider is covered automatically.

Tests: bearer token stripped + general chain preserved (no-regression guard),
on both the runtime strip path and the blocklist-membership path.

Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>
2026-05-29 01:48:08 -07:00
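The narrowed strip amounts to removing a single variable from the child environment. The function and constant names are hypothetical, and the self-healing `auth_type=="aws_sdk"` discovery is not modelled.

```python
from typing import Dict, Mapping

# Only the Hermes-managed inference secret is stripped; the general AWS chain stays.
STRIPPED_AWS_VARS = frozenset({"AWS_BEARER_TOKEN_BEDROCK"})


def subprocess_env(parent: Mapping[str, str]) -> Dict[str, str]:
    """Copy the parent env for a child process, minus the Bedrock bearer token."""
    return {k: v for k, v in parent.items() if k not in STRIPPED_AWS_VARS}
```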
95b5b72404 fix(security): block AWS SDK creds from subprocess env 2026-05-29 01:48:08 -07:00
db2ce9e7d2 fix(compression): fail open when lock subsystem is missing (version skew) (#34475)
A process running mismatched module versions — conversation_compression.py
re-imported with the post-#34351 lock code while a long-lived
hermes_state.SessionDB stays bound to the pre-#34351 class in memory — has
the try_acquire_compression_lock call site but not the method. The
AttributeError it raises is NOT a sqlite3.Error, so the method's own
fail-open guard never runs; the exception escapes to the outer agent loop,
which prints the error and retries. Compression never succeeds, the token
count never drops, and the loop re-triggers compaction forever (the
'API call #47/#48/#49 ... has no attribute try_acquire_compression_lock'
spin a user hit after an update).

Wrap the lock acquire so any unexpected exception fails OPEN: skip locking
and proceed with compression. Skipping the lock risks a rare
concurrent-compression session fork; an infinite no-progress loop that never
compresses at all is strictly worse. The remediation hint in the log points
at the real fix (restart / hermes update to resync the stale module).

Also guards get_compression_lock_holder against the same skew.

Adds a regression test simulating the version skew (real SessionDB wrapped
so only the lock methods raise AttributeError) — asserts _compress_context
proceeds and rotates instead of raising.
2026-05-29 01:32:32 -07:00
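The fail-open wrapper can be sketched as a broad `except` around the lock call. `StaleSessionDB`, `compression_may_proceed` and the rule that a lock held elsewhere (a `False` return) means "do not compress now" are assumptions; only `try_acquire_compression_lock` and the remediation hint come from the commit.

```python
import logging

logger = logging.getLogger("compression_sketch")


class StaleSessionDB:
    """Stand-in for a long-lived SessionDB loaded before the lock methods existed."""


def compression_may_proceed(db, session_id: str) -> bool:
    """Try the compression lock; fail OPEN on anything unexpected."""
    try:
        return bool(db.try_acquire_compression_lock(session_id))
    except Exception as exc:  # AttributeError from version skew is not sqlite3.Error
        logger.warning(
            "compression lock unavailable (%s); compressing without it. "
            "Restart Hermes or run `hermes update` to resync stale modules.",
            exc,
        )
        return True
```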