Commit Graph

9914 Commits

Author SHA1 Message Date
c0b17b3c0c docs(weixin): clarify allowed users setup 2026-05-29 04:01:06 -07:00
2520c9ad68 docs(skills): clarify Reminders alarm timing 2026-05-29 04:01:01 -07:00
62e81b2d9b docs(windows): add WSL desktop shortcut guide 2026-05-29 04:00:57 -07:00
fe7e0a8c1d docs(feishu): add permission scopes, event subscription, and publish steps
The setup guide was missing the specific Feishu permission scopes to
configure and the event subscription (im.message.receive_v1) needed
for the bot to receive messages. Users had to reference external
OpenClaw documentation to complete the setup.

Adds:
- Required permissions table (im:message, im:message:send_as_bot,
  im:resource, im:chat, im:chat:readonly)
- Recommended permissions (reactions, app info, contact)
- Event subscription step (im.message.receive_v1)
- App version publish reminder (permissions require published version)
2026-05-29 04:00:52 -07:00
6e179c44b1 fix(web): ensure plugin discovery before web_*_tool registry lookups
Web search/extract dispatch read agent.web_search_registry before plugin
discovery had run, so in any process that hadn't imported model_tools.py
(subprocess agent runs, delegate children, standalone scripts) the registry
was empty: get_provider('firecrawl') returned None and the dispatcher emitted
the misleading 'No web extract provider configured' error even with
web.extract_backend set and FIRECRAWL_API_KEY exported.

Adds an idempotent _ensure_web_plugins_loaded() helper (mirrors
tools.browser_tool._ensure_browser_plugins_loaded) and calls it at the top of
both the web_search_tool and web_extract_tool dispatch sites before the
registry lookup.

Fixes #27580.

Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>
2026-05-29 04:00:00 -07:00
58e1b04665 chore(release): map tillfalko to GitHub login for PR #29987 salvage 2026-05-29 03:58:56 -07:00
c77a697fa4 refactor(vision): consolidate native fast-path gate into one shared helper
The fast-path decision (native routing + provider allowlist OR
supports_vision override) lived inline in vision_analyze and was copied
into browser_vision. Extract it to _should_use_native_vision_fast_path()
so both tools share one source of truth.

- vision_tools: gate logic now one helper; vision_analyze calls it in 3 lines
- browser_tool: thin envelope decoration over the shared helper, not a copy
- browser_vision typed Union[str, Dict] to match its real return shape
- tests slimmed to target the override path + text-mode-wins invariant
2026-05-29 03:58:56 -07:00
c3f28c651d docs(browser): update browser_vision tool description for native vision routing 2026-05-29 03:58:56 -07:00
2402ec5e7b test: extend test coverage to native image routing 2026-05-29 03:58:56 -07:00
f8b8dffccf fix(browser): add native image support to browser_vision and respect supports_vision 2026-05-29 03:58:56 -07:00
f05353397d fix(vision): respect supports_vision in vision_analyze 2026-05-29 03:58:56 -07:00
784d8dd2c2 fix(matrix): fail-closed approval reaction auth when MATRIX_ALLOWED_USERS is empty
The _on_reaction approval handler used:

    if self._allowed_user_ids and sender not in self._allowed_user_ids:

When MATRIX_ALLOWED_USERS is not configured, _allowed_user_ids is an
empty set. The short-circuit on the empty set caused the deny block to
never execute, allowing any Matrix room member to approve or deny tool
calls via / reactions — even users that run.py's _is_user_authorized
would reject for regular messages.

Fix mirrors the Telegram _is_callback_user_authorized fix (commit
89d32052e, PR #28494): deny by default when no allowlist is configured,
unless GATEWAY_ALLOW_ALL_USERS=true is explicitly set.
2026-05-29 03:58:45 -07:00
3171845479 fix(code-exec): make dropped HERMES_* env vars diagnosable in sandbox scrub
Follow-up mitigation for the #27303 env-scrub tightening. Dropping the
broad HERMES_ prefix in favor of a 4-var operational allowlist is correct
hardening, but a sandbox script that imports a repo module reading a
non-allowlisted HERMES_* var at import time would otherwise see it
silently unset. _scrub_child_env now emits a one-shot debug log naming the
dropped non-secret HERMES_* vars and pointing at the env_passthrough
opt-in escape hatch. Secret-shaped vars are never named in the log.

Tests: dropped vars are logged + env_passthrough named; no log when
nothing is dropped; secret vars excluded from the diagnostic.
2026-05-29 03:44:49 -07:00
4bdae34771 test(code-exec): regression suite for the approval-bypass cluster
Cover context+callback propagation and teardown-clears, a source guard that both RPC threads stay wrapped, the check_execute_code_guard decision matrix (isolated backend, headless-local, cron-deny, gateway approve/deny/timeout/missing-notify, smart mode, session-yolo), the env-scrub allowlist/secret rules, and a behavioral test that execute_code() blocks before spawning on denial.

Refs #4146, #27303, #30882, #33057
2026-05-29 03:44:49 -07:00
655090b3d3 feat(gateway): warn at startup on manual approvals with no risk assessor
When approvals.mode=manual with security.tirith_enabled off and no auxiliary.approval model, dangerous commands and execute_code scripts can only be gated by live in-chat approval; with routing fixed they now fail closed (block) rather than silently auto-run. Surface that at startup so operators knowingly enable tirith or auxiliary.approval for unattended gateways.

Refs #30882
2026-05-29 03:44:49 -07:00
1083977261 fix(code-exec): restore approval context in execute_code RPC threads + guard entry
Wrap both execute_code RPC threads (local UDS + remote file-RPC) with propagate_context_to_thread so gateway sessions no longer fall into check_dangerous_command's non-interactive auto-approve branch and the CLI approval prompt stays reachable. Add check_execute_code_guard: one-shot fail-closed approval of the whole script in gateway/ask/cron-deny before the child spawns (skips isolated backends; command-string built only past the early returns). Drop the broad HERMES_ env passthrough for an explicit operational allowlist plus DSN/WEBHOOK secret substrings, and update the POSIX-equivalence oracle.

Refs #4146, #27303, #30882, #33057
2026-05-29 03:44:49 -07:00
21aeefe5fd fix(code-exec): propagate agent-turn context into tool worker threads
Worker threads that dispatch Hermes tools started with an empty contextvars.Context and no thread-local approval/sudo callbacks. Add tools/thread_context.propagate_context_to_thread factoring that capture/install/clear lifecycle (mirrors the GHSA-qg5c-hvr5-hjgr pattern), and refactor agent/tool_executor onto it so the security-critical logic lives in one audited place. Update the contextvar-propagation source guard for the new call shape.

Refs #33057
2026-05-29 03:44:49 -07:00
a22c250001 refactor(auth): remove vestigial Nous min_key_ttl/inference_auth_mode params
After the legacy session-key path was removed, two parameters became dead
surface on the Nous runtime-resolution chain:

- min_key_ttl_seconds: del'd inside refresh_nous_oauth_pure and pass-through /
  telemetry-only in refresh_nous_oauth_from_state, _try_import_shared_nous_state,
  _nous_device_code_login, and resolve_nous_runtime_credentials. It controlled the
  now-deleted agent-key mint TTL and drives no behavior.
- inference_auth_mode: with the legacy mode gone, AUTO and FRESH are behaviorally
  identical; the value only fed _normalize_nous_inference_auth_mode validation and
  oauth trace output, never a branch.

Removing inference_auth_mode orphaned its whole supporting cluster
(NOUS_INFERENCE_AUTH_MODE_AUTO/FRESH, NOUS_INFERENCE_AUTH_MODES,
_normalize_nous_inference_auth_mode), and dropping min_key_ttl_seconds orphaned
DEFAULT_AGENT_KEY_MIN_TTL_SECONDS — all deleted here.

Updated every caller (run_agent, auxiliary_client, credential_pool, proxy adapter,
runtime_provider, web_server, main, auth_commands, setup) and pruned the matching
test kwargs. Deleted two tests that exercised the removed surface
(test_legacy_auth_mode_is_rejected, test_try_refresh_..._accepts_explicit_auth_mode).

No behavior change: net -134 LOC of dead code.
2026-05-29 02:24:48 -07:00
95cf8f9842 refactor(auth): drop weak JWT-shape fallback in auxiliary _nous_api_key
The import-failure fallback returned any 3-segment token without scope/
expiry validation, a divergent reimplementation of the canonical
_nous_invoke_jwt_is_usable check. The import is from the same module that
provides resolve_nous_runtime_credentials, so a failure means the whole
auxiliary Nous path is unavailable anyway; return "" instead so the caller
falls through to the clear 'run: hermes auth add nous' guidance rather than
handing back an unvalidated token.
2026-05-29 02:24:48 -07:00
4e4984a11a test(auth): update nous jwt-only expectations 2026-05-29 02:24:48 -07:00
7e958dafc2 fix(auth): address Nous JWT fallback review 2026-05-29 02:24:48 -07:00
41ff6e5937 refactor(auth): Disable Nous legacy session key fallback 2026-05-29 02:24:48 -07:00
a87f0a82a5 test(tool-search): redact secrets from harness transcripts + console
The live harness runs against a real OpenRouter key; record['error'] is a
full traceback that, on an auth failure, could echo a request header or URL
containing the key. _redact_secrets() now masks the live OPENROUTER_API_KEY,
any sk-/sk-or- bearer token, and Authorization/Bearer headers before
final_response and error enter the transcript or the console print. Addresses
the CodeQL clear-text-storage/logging findings at the source.
2026-05-29 02:04:12 -07:00
18c9e89106 test: update _invoke_tool dispatch assertion for new toolset-scope kwargs
The scoping fix added enabled_toolsets/disabled_toolsets to the
agent_runtime_helpers sequential dispatch into handle_function_call, so
test_invoke_tool_dispatches_to_handle_function_call's assert_called_once_with
(exact match) needs the two new kwargs. Both are None for the default agent
fixture.
2026-05-29 02:04:12 -07:00
1709776120 test(tool-search): add live A/B harness, drop checked-in transcripts
Brings in the tool_search live-test harness from the original PR but leaves
out the 11 checked-in scripts/out/*.json transcript files — those are
non-deterministic model output that goes stale the moment the model changes
and were the bulk of the diff. scripts/out/ is now gitignored so a harness
run never re-commits them.

Fixes on top:
- API-key loading goes through hermes_cli.env_loader.load_hermes_dotenv
  instead of hand-parsing ~/.hermes/.env and assigning the value to a local.
  The canonical loader never materializes the secret in a local variable in
  this module, which clears the four CodeQL high alerts
  (py/clear-text-storage / py/clear-text-logging-sensitive-data at the
  transcript write/print sites — they were tracing the key from the
  hand-rolled parser into the records) and removes a hand-rolled parser.
- encoding='utf-8' on every write_text/read_text in both harness scripts
  (Windows-footgun hygiene).

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-29 02:04:12 -07:00
7427b9d581 fix(tool-search): scope bridge catalog + dispatch to the session's toolsets
Tool Search read its catalog from the global registry (get_tool_definitions
with no toolset scope = 'start with everything'), so a restricted-toolset
session — subagent, kanban worker, curated gateway session — could:

  1. tool_search the entire process registry, not just its granted tools, and
  2. tool_call any registered plugin/MCP tool it was never given, because
     registry.dispatch() has no enabled_tools gate for non-execute_code tools.

A scoped session (enabled_toolsets=['mcp-github']) reported total_available=26
and successfully invoked an out-of-scope plugin tool via tool_call.

Fix:
- handle_function_call gains enabled_toolsets/disabled_toolsets; the bridge
  dispatch scopes get_tool_definitions to them (also stops polluting the
  process-global _last_resolved_tool_names with out-of-scope tools, which
  leaked into execute_code's sandbox-tool fallback).
- A defense-in-depth gate rejects any tool_call'd name not in the scoped
  deferrable catalog.
- tool_executor's unwrap (both concurrent + sequential paths) enforces the
  same scope before dispatch, since it unwraps tool_call -> underlying name
  and bypasses the bridge branch. New _tool_search_scoped_names() helper,
  cached per-agent on registry generation + toolset scope.
- New scoped_deferrable_names() helper in tool_search.py shared by both sites.

Tests: 4 new regression tests in TestRegression_ToolsetScoping (scoped
catalog, out-of-scope tool_call rejection, no global pollution, helper).
2026-05-29 02:04:12 -07:00
369075dc95 feat(tools): progressive tool disclosure for MCP and plugin tools
Adds Tool Search, a structured-tools progressive-disclosure layer that
replaces MCP and non-core plugin tools in the model-visible tools array
with three bridge tools (tool_search / tool_describe / tool_call) when
the deferrable surface would consume more than a configurable percentage
of the active model's context window. Core Hermes tools are never deferred.

Default mode is 'auto' with a 10% context threshold, so small toolsets
pay no overhead. Set tools.tool_search.enabled to 'on' to force or 'off'
to disable.

Design carefully reflects the OpenClaw production failure modes
documented in the openclaw-tool-search-report:

  - Core tools never defer (toolsets._HERMES_CORE_TOOLS). Addresses the
    'tools silently missing from isolated cron turns' regression class
    (openclaw#84141) by construction: there is no code path that can
    drop a core tool.
  - Catalog is stateless across turns — rebuilt from the live tool-defs
    list on every assembly. No session-keyed Map that can drift out of
    sync with the registry.
  - tool_call unwraps the bridge call before any hook fires, so plugin
    pre/post hooks, guardrails, approval flows, and the activity feed
    all see the underlying tool name, not the bridge (addresses
    openclaw#85588 and the verbose-mode complaint on openclaw#79823).
  - The unwrap happens in both the parallel and sequential paths of
    agent/tool_executor.py and also in handle_function_call, so direct
    callers (sandboxed code, eval harnesses) are covered too.
  - Bridge tools cannot invoke each other (recursion guard) and cannot
    invoke core tools (those must be called directly).
  - Tools mode only — no JS-sandbox code-mode. Keeps the surface small.
  - Token estimation via cheap char/4 heuristic; precision isn't needed
    for the threshold decision.

Files:
  - tools/tool_search.py — new module (BM25 retrieval, classification,
    threshold gate, bridge dispatch, unwrap helper).
  - tests/tools/test_tool_search.py — 35 tests including the OpenClaw
    #84141 regression guard.
  - model_tools.py — wires assembly into _compute_tool_definitions as the
    final step, adds skip_tool_search_assembly kwarg so the bridge can
    see the real catalog, dispatches the three bridge tools.
  - agent/tool_executor.py — unwraps tool_call in both parallel and
    sequential parsing loops so checkpointing, guardrails, plugin hooks,
    and tool-progress callbacks all observe the underlying tool name.
  - hermes_cli/config.py — DEFAULT_CONFIG['tools']['tool_search'] block.
  - website/docs/user-guide/features/tool-search.md — user docs.

Validation:
  - 35/35 new tests pass.
  - Existing tool/registry/model_tools/config/coercion/executor tests
    (82 + 74 + small adjacents) green.
  - Live E2E: 20 fake MCP tools registered, get_tool_definitions returns
    3 bridges, tool_search returns top 3 hits, tool_describe returns
    full schema, tool_call dispatches to the real underlying handler
    and the underlying result is what the model sees.
  - Reserved-name recursion guard verified live.
  - Core-tool refusal via tool_call verified live.
2026-05-29 02:04:12 -07:00
73d73f1f0d fix(codex): relax no-byte TTFB watchdog default from 12s to 120s
The chatgpt.com/backend-api/codex endpoint can spend tens of seconds in
backend admission / prompt prefill before emitting its first SSE event. The
12s no-byte TTFB cutoff aborted those still-valid streams, surfacing as
'Codex stream produced no bytes within 12s' through all retries (Discord
reports). The OpenAI SDK's own streaming read timeout is 600s, so 12s was
~50x more aggressive than the transport layer would have tolerated.

Default the no-byte cutoff to 120s and raise the openai-codex MAX cap default
to 120s so it no longer clamps the new default back to 20s. Disabling stays
available via HERMES_CODEX_TTFB_TIMEOUT_SECONDS=0; the 25k-token auto-disable,
_STRICT override, and post-first-event idle watchdog are unchanged.

Co-authored-by: Gille <4317663+helix4u@users.noreply.github.com>
2026-05-29 02:02:25 -07:00
6bebab4761 fix(security): narrow Bedrock subprocess strip to inference bearer token only
Scopes the AWS_SDK subprocess strip down from the full AWS credential chain
to just AWS_BEARER_TOKEN_BEDROCK — the only Hermes-managed *inference* secret
(analogous to OPENAI_API_KEY). The general AWS credential chain
(AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_PROFILE
/ config + role pointers) is intentionally left inheritable.

Why: per SECURITY.md §3.2 the local terminal is the user's trusted operator
shell. Hard-blocklisting the general chain would (a) regress *every* user who
runs aws/terraform/cdk/boto3 in the agent terminal — not just Bedrock users,
since PROVIDER_REGISTRY is iterated unconditionally at import — and (b) be
unrecoverable, because env_passthrough.py refuses to re-allow anything in
_HERMES_PROVIDER_ENV_BLOCKLIST (GHSA-rhgp-j443-p4rf). The narrow strip closes
the reported leak (opencode enumerating the Bedrock catalog off the leaked
bearer token) with no capability loss.

Keeps zapabob's self-healing auth_type=="aws_sdk" mechanism so any future
SDK-cred provider is covered automatically.

Tests: bearer token stripped + general chain preserved (no-regression guard),
on both the runtime strip path and the blocklist-membership path.

Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>
2026-05-29 01:48:08 -07:00
95b5b72404 fix(security): block AWS SDK creds from subprocess env 2026-05-29 01:48:08 -07:00
db2ce9e7d2 fix(compression): fail open when lock subsystem is missing (version skew) (#34475)
A process running mismatched module versions — conversation_compression.py
re-imported with the post-#34351 lock code while a long-lived
hermes_state.SessionDB stays bound to the pre-#34351 class in memory — has
the try_acquire_compression_lock call site but not the method. The
AttributeError it raises is NOT a sqlite3.Error, so the method's own
fail-open guard never runs; the exception escapes to the outer agent loop,
which prints the error and retries. Compression never succeeds, the token
count never drops, and the loop re-triggers compaction forever (the
'API call #47/#48/#49 ... has no attribute try_acquire_compression_lock'
spin a user hit after an update).

Wrap the lock acquire so any unexpected exception fails OPEN: skip locking
and proceed with compression. Skipping the lock risks a rare
concurrent-compression session fork; an infinite no-progress loop that never
compresses at all is strictly worse. The remediation hint in the log points
at the real fix (restart / hermes update to resync the stale module).

Also guards get_compression_lock_holder against the same skew.

Adds a regression test simulating the version skew (real SessionDB wrapped
so only the lock methods raise AttributeError) — asserts _compress_context
proceeds and rotates instead of raising.
2026-05-29 01:32:32 -07:00
e28a668b40 fix(gateway): diagnosable MEDIA rejections + canonical cache roots + null-path guard
Operators can now see which MEDIA path was dropped and why, generated
artifacts under the canonical ~/.hermes/cache/{images,...} layout deliver,
and a crafted ~\x00 path no longer aborts the whole attachment batch.

- MEDIA_DELIVERY_SAFE_ROOTS: add canonical cache/{images,audio,videos,
  documents,screenshots} alongside the legacy *_cache dirs (#31733).
- filter_media/local_delivery_paths: log the rejected path (was a blind
  "outside allowed roots") via _log_safe_path, which strips control chars
  and Unicode line separators so a model-emitted path can't forge a log line.
- validate_media_delivery_path + extract_media: guard os.path.expanduser
  so a ~\x00 path returns None / is skipped instead of raising and dropping
  every other attachment in the response.

Salvaged and slimmed from #33251 (780 LOC -> 35): the reason-tag taxonomy,
the parts-eliding redactor, and the extension-partition hoist are dropped in
favor of logging the path directly. All three findings were verified and
reproduced by the contributor.

Co-authored-by: wysie <wysie@users.noreply.github.com>
2026-05-29 01:23:35 -07:00
2765b02021 fix(packaging): ship bundled plugin.yaml manifests in wheel and sdist
The v0.15.0 PyPI wheel shipped every plugin's Python code but none of its
plugin.yaml manifests, so plugin discovery (hermes_cli/plugins.py) found zero
plugins and ALL gateway platforms failed with "No adapter available for
<platform>" (discord, slack, mattermost, ...). Same gap also dropped the
web-search provider manifests (#28149).

Declare manifest coverage in both packaging channels:
- wheel: [tool.setuptools.package-data] plugins += **/plugin.yaml, **/plugin.yml
- sdist: MANIFEST.in recursive-include plugins plugin.yaml plugin.yml
  (Homebrew and other downstream packagers build from the sdist)

Verified by building the wheel before/after: plugin.yaml count went 0 -> 69,
discord's manifest now ships. Adds a regression test asserting both channels
cover manifests.

Fixes #34034

Co-authored-by: outsourc-e <201563152+outsourc-e@users.noreply.github.com>
Co-authored-by: Dhruvil Parikh <41384593+dparikh79@users.noreply.github.com>
Co-authored-by: ousiaresearch <261687298+ousiaresearch@users.noreply.github.com>
Co-authored-by: libre-7 <6366424+libre-7@users.noreply.github.com>
2026-05-29 01:23:28 -07:00
c01a2df0a3 fix(auth): don't launch a text-mode browser inside the terminal for OAuth (#34479)
OAuth auto-open only checked _is_remote_session() (SSH + cloud-shell env
vars). On a headless/CLI-only Linux box with no GUI browser, none of those
trip, so webbrowser.open() resolved to a console browser (w3m/lynx/links)
and launched it INSIDE the terminal — hijacking the user's TTY with the
xAI 'Account Management' login page instead of letting them copy the URL.

Add _can_open_graphical_browser(): returns False when webbrowser would
resolve to a known console browser, when $BROWSER names one, when there's
no display server on Linux, or when no browser resolves at all. Gate all 5
OAuth auto-open callsites (xAI loopback, Spotify loopback, MiniMax device
code, Anthropic, Google) on it in addition to the existing remote check.
Headless boxes now print the URL / fall through to manual-paste instead.
2026-05-29 01:23:06 -07:00
f247686c42 feat(yuanbao): cache resolved media resources by resourceId
Add an in-memory resourceId->local-path cache (24h TTL, 256-entry LRU) to
MediaResolveMiddleware so the same Yuanbao resource isn't re-downloaded when
it's referenced more than once in a session (own attachment, then quoted, then
group-observed backfill). Each reference otherwise triggers a fresh token
exchange + COS download.

The cache verifies the file still exists on disk before returning a hit (cache
dir may be swept) and is threaded through all three resolve paths:
_resolve_media_urls (rid parsed from placeholder URL), _collect_observed_media,
and the DispatchMiddleware quote path.

Salvaged from PR #30418 by @loongfay; the broader middleware refactor in that
PR converged with work already merged on main, so only the net-new download
cache is carried over.
2026-05-29 01:05:00 -07:00
f32b66c758 fix: improve plugins list usability 2026-05-29 00:59:42 -07:00
c692000a57 docs(xai-oauth): mirror bare-code paste note to the primary guide (#33917)
The original PR diff updated two guides (oauth-over-ssh.md and
xai-grok-oauth.md) but only the oauth-over-ssh.md edit landed in the
PR's actual commit.  Mirror the note to the primary xai-grok-oauth.md
guide too so users reading the main entry point don't miss the
bare-code form that already shipped in #33880.
2026-05-29 00:57:13 -07:00
Evo
2410e11395 docs(xai-oauth): note bare-code manual-paste from #33880 2026-05-29 00:57:13 -07:00
0384398c65 chore(release): map blackpilledsoftware-prog email to GitHub login
Required by CI author validation after salvaging PR #16780.
2026-05-29 00:31:44 -07:00
26b83a5f5f fix(cli): ignore terminal focus reports (salvage of #16780)
Ghostty/macOS window or tab navigation (Cmd+Shift+[ / ], Alt+Tab,
etc.) can deliver terminal focus reports (CSI I / CSI O) to the
running TUI. prompt_toolkit does not map those sequences by default,
so its parser falls back to literal key presses (ESC, [, I/O) and
inserts `[I` / `[O` into the prompt buffer after the ESC byte is
handled.

Fix: register the two sequences as Keys.Ignore in ANSI_SEQUENCES at
parser level, plus a no-op kb.add(Keys.Ignore) handler so the
default self-insert path never inserts focus-report bytes.

Salvage notes: original PR put the helper in cli.py. Salvaged into
hermes_cli/pt_input_extras.py alongside install_shift_enter_alias /
install_ctrl_enter_alias to match the established pattern for
ANSI_SEQUENCES augmentation. setdefault → in-check so any prior user
registration wins.

Closes #16780
2026-05-29 00:31:44 -07:00
c1485d52e3 chore(release): add moikapy AUTHOR_MAP for PR #31527 salvage 2026-05-29 00:28:02 -07:00
f6a2ba6261 fix(auxiliary): detect xAI OAuth 403 bad-credentials as auth error
xAI returns HTTP 403 (not 401) with unauthenticated:bad-credentials
when an OAuth2 access token has expired or is invalid. The existing
_is_auth_error() only checked for 401 status codes, so these tokens
were never refreshed and the 403 propagated as a generic permission
denied error.

Three fixes:

1. _is_auth_error: Recognize xAI's 403+bad-credentials pattern as
   an auth failure, triggering token refresh instead of silent failure.

2. _refresh_provider_credentials: Add xai-oauth branch with
   pool-level refresh (try_refresh_current with select to ensure
   current entry) then fallback to singleton resolver with
   force_refresh=True.

3. _recoverable_pool_provider: Map api.x.ai host to xai-oauth
   pool for auto-resolved providers, matching existing pattern for
   openai-codex/openrouter/nous/anthropic.

Includes 14 tests covering the new detection logic, host mapping,
and graceful fallback behavior.

Signed-off-by: moikapy <moikapy@devmoi.com>
2026-05-29 00:28:02 -07:00
bc736ff543 test(model-catalog): use exact URL equality in fallback tests
CodeQL flagged 'hermes-agent.nousresearch.com' in url and similar substring
checks as py/incomplete-url-substring-sanitization. The rule is about URL
allowlist checks in production code, not test routing — there's no
security boundary here. Switch to url == self.PRIMARY / self.FALLBACK,
which is the same semantic and silences the rule.
2026-05-29 00:25:36 -07:00
f2d88c820c fix(model-catalog): fall through to raw.github when Vercel 403s; swap step-3.5-flash for step-3.7-flash on OpenRouter+Nous
The docs site (Vercel) serves /docs/api/model-catalog.json behind a bot
mitigation rule that returns HTTP 403 + x-vercel-mitigated: challenge for
non-browser User-Agents — including urllib (what the CLI uses) and curl.
When that happens, get_catalog() falls back to the stale disk cache and
new model releases (Opus 4.8, etc.) never reach the /model picker even
though they're already in OPENROUTER_MODELS and the live OpenRouter API.

Adds a fallback URL chain: when the primary catalog URL fails, walk
DEFAULT_CATALOG_FALLBACK_URLS — currently the raw.githubusercontent.com
copy of the same file. GitHub raw doesn't bot-gate, so the manifest stays
reachable through Vercel firewall hiccups. Per-provider override URLs
keep their direct-fetch semantics (operators configure those specifically,
no implicit fallback).

Also swaps stepfun/step-3.5-flash for stepfun/step-3.7-flash in the
OpenRouter + Nous Portal curated picker lists. Native stepfun provider
configuration (api.stepfun.ai) is left alone — that depends on what
stepfun.ai itself serves, not what OpenRouter routes.

Test plan: 5 new TestFallbackChain tests cover primary-success,
primary-failure-fallback-success, all-fail, primary==fallback-dedup, and
end-to-end get_catalog routing through the new helper. Existing 23 tests
in test_model_catalog.py still pass (28 total). Wider tests/hermes_cli/
sweep: 5701/5701 pass.
2026-05-29 00:25:36 -07:00
8d57281650 chore: add AUTHOR_MAP entry for Interstellar-code 2026-05-29 00:21:54 -07:00
9d4fda9952 feat(kanban): add POST /runs/{run_id}/terminate endpoint
Closes the termination-control gap left by PR #28432, which shipped the
read-only sibling endpoints (/workers/active, /runs/{run_id},
/runs/{run_id}/inspect) but no way to stop a misbehaving worker from
the dashboard without dropping to the CLI.

The new endpoint resolves run_id -> task_id and delegates to the
existing kanban_db.reclaim_task() flow, so the SIGTERM->SIGKILL
escalation, run-outcome bookkeeping, and event-log append all match
POST /tasks/{task_id}/reclaim exactly. No new termination semantics
introduced.

Responses:
  200 {ok, run_id, task_id} on success
  404 unknown run_id
  409 run already ended OR task no longer reclaimable

Refs: #23762
2026-05-29 00:21:54 -07:00
7d10105918 test(kanban): update iteration-exhaustion tests for #29747 gap 2
The two tests in TestRunConversation now verify the new behavior:
  - test_kanban_block_called_on_iteration_exhaustion → verifies
    _record_task_failure(outcome='timed_out') is called instead of
    kanban_block
  - test_no_kanban_block_when_not_in_kanban_mode → verifies the bridge
    is a no-op when HERMES_KANBAN_TASK is unset

The function names are kept for diff stability; both assert against
_record_task_failure now, which is the correct contract per the gap-2
fix in this PR.
2026-05-29 00:13:29 -07:00
592a4ffb6b fix(kanban): close three blocked/iteration-exhausted handling gaps (#29747)
Reporter diagnosed three independent gaps that together allowed infinite
'unblock → re-stuck' loops with no surfacing or escalation:

GAP 1: `_rule_stuck_in_blocked` resets timer on any `commented`/`unblocked`
event, so a task that cycles every few minutes is invisible to it
regardless of how many times it cycles.

Fix: new `_rule_block_unblock_cycling` rule (`hermes_cli/kanban_diagnostics.py`)
that counts block→unblock cycles in a sliding window. Default threshold
3 cycles within 24h, configurable via `block_cycle_threshold` /
`block_cycle_window_seconds`. Walks events in arrival order (event id)
since multiple events can share the same `created_at` second. Fires as a
warning with a CLI hint to inspect the block reasons.

GAP 2: Iteration-budget-exhausted runs in kanban workers map to
`kanban_block` (status=blocked, but a clean exit from the kernel's
perspective). `_rule_repeated_failures` reads `consecutive_failures`,
which `_record_task_failure` increments only for crashed/timed_out/
spawn_failed — `blocked` outcome bypasses the failure counter, so the
`kanban.failure_limit` circuit breaker never trips on budget-exhaustion
loops.

Fix: `agent/conversation_loop.py` budget-exhaustion path now calls
`_record_task_failure(outcome="timed_out")` instead of `kanban_block`.
Budget exhaustion is genuinely a timeout-shaped failure (the task ran out
of allowed iterations), so this is more honest semantics; it also routes
through the unified failure counter, so repeated budget exhaustions trip
the circuit breaker and the task auto-blocks with `gave_up` after
`failure_limit` retries.

GAP 3: `release_stale_claims` uses `_pid_alive(worker_pid)` only and
ignores `last_heartbeat_at`. Reporter observed a 91-min run that held
its claim with frozen heartbeat because the worker entered a logic loop
with no tool calls — `_pid_alive` kept returning True so the claim was
extended every 15 minutes indefinitely.

Fix: heartbeat-stale backstop. If `last_heartbeat_at` is set AND older
than `DEFAULT_CLAIM_HEARTBEAT_MAX_STALE_SECONDS` (default 1h), reclaim
even if the PID is alive. NULL `last_heartbeat_at` preserves backward
compatibility (no heartbeat yet = extend, as before). The reclaim event
payload now includes a `heartbeat_stale` boolean so operators see why a
live-PID worker was reclaimed.

This works cleanly in concert with PR #34418 (#31752 runtime → heartbeat
bridge): once `_touch_activity` keeps `last_heartbeat_at` fresh as a
side effect of normal API traffic, the backstop only fires for genuinely
wedged workers (no chunks, no tool results, no progress at all).

Co-authored-by: baofuen <45189813+baofuen@users.noreply.github.com>
2026-05-29 00:13:29 -07:00
bc31ee5cf8 fix(kanban): bridge worker runtime activity to board heartbeat (#31752)
The dispatcher watchdog (release_stale_claims) reads tasks.last_heartbeat_at
to decide whether to reclaim a running task. The agent maintains its own
in-process `_last_activity_ts` for every chunk/tool result, but those
liveness ticks never reach the board unless the model explicitly calls
the `kanban_heartbeat` tool — so a worker actively executing a long run
without tool-level heartbeats can be reclaimed mid-flight as 'stale',
returning the task to ready and orphaning the in-flight worker's progress.

Fix: in `_touch_activity` (the canonical 'we just did work' hook in
run_agent.py), call a new `heartbeat_current_worker_from_env` helper
in `tools/kanban_tools.py` that:

- No-ops outside dispatcher-spawned worker context (no HERMES_KANBAN_TASK).
- Rate-limited to one DB write per 60s (runtime activity ticks too often
  to faithfully mirror; we just need the watchdog to see liveness).
- Best-effort: never raises. heartbeat_claim + heartbeat_worker calls are
  individually try/except'd; any DB error logs at debug and returns.
- Uses worker env identity: HERMES_KANBAN_TASK + HERMES_KANBAN_RUN_ID +
  HERMES_KANBAN_CLAIM_LOCK (all pinned by the dispatcher at spawn time).
- No durable note on auto-heartbeats — that's reserved for the explicit
  `kanban_heartbeat` tool which carries a model-supplied note.

The explicit `kanban_heartbeat` tool stays available unchanged for
workers that want to attach a note or pre-emptively extend a claim
across a known-long single tool call.

Co-authored-by: faisfamilytravel <223516181+faisfamilytravel@users.noreply.github.com>
2026-05-29 00:05:58 -07:00
40217aa194 fix(kanban): tell workers not to use clarify; route to kanban_block instead (#32167)
Kanban workers run headless — no live user is on the other side of `clarify`,
so the call times out (~120s default) and the task sits silently in `running`
with no signal to the operator that input is needed. Reporter observed a real
incident where a worker asked 'promote to production, or check staging first?'
via clarify, the call timed out, the agent hallucinated a fallback, and the
task sat 'running' for hours.

Fix: explicit 'do not call clarify' bullet in two surfaces every kanban worker
sees —

- `agent/prompt_builder.py` KANBAN_GUIDANCE `## Do NOT` section (auto-injected
  into every dispatcher-spawned worker run).
- `skills/devops/kanban-worker/SKILL.md` `## Do NOT` section (the bundled
  worker skill).

Both point at the right pattern: `kanban_comment` (context) + `kanban_block`
(decision needed) — the task surfaces on the board as blocked, the operator
sees it, unblocks with their answer in a comment, and the worker respawns
with the thread.

Co-authored-by: kweiner <17778+kweiner@users.noreply.github.com>
2026-05-28 23:57:20 -07:00