hermes-agent

Author	SHA1	Message	Date
tillfalko	2402ec5e7b	test: extend test coverage to native image routing	2026-05-29 03:58:56 -07:00
tillfalko	f8b8dffccf	fix(browser): add native image support to browser_vision and respect supports_vision	2026-05-29 03:58:56 -07:00
tillfalko	f05353397d	fix(vision): respect supports_vision in vision_analyze	2026-05-29 03:58:56 -07:00
EloquentBrush0x	784d8dd2c2	fix(matrix): fail-closed approval reaction auth when MATRIX_ALLOWED_USERS is empty The _on_reaction approval handler used: if self._allowed_user_ids and sender not in self._allowed_user_ids: When MATRIX_ALLOWED_USERS is not configured, _allowed_user_ids is an empty set. The short-circuit on the empty set caused the deny block to never execute, allowing any Matrix room member to approve or deny tool calls via ✅/❎ reactions — even users that run.py's _is_user_authorized would reject for regular messages. Fix mirrors the Telegram _is_callback_user_authorized fix (commit `89d32052e`, PR #28494): deny by default when no allowlist is configured, unless GATEWAY_ALLOW_ALL_USERS=true is explicitly set.	2026-05-29 03:58:45 -07:00
teknium1	3171845479	fix(code-exec): make dropped HERMES_* env vars diagnosable in sandbox scrub Follow-up mitigation for the #27303 env-scrub tightening. Dropping the broad HERMES_ prefix in favor of a 4-var operational allowlist is correct hardening, but a sandbox script that imports a repo module reading a non-allowlisted HERMES_* var at import time would otherwise see it silently unset. _scrub_child_env now emits a one-shot debug log naming the dropped non-secret HERMES_* vars and pointing at the env_passthrough opt-in escape hatch. Secret-shaped vars are never named in the log. Tests: dropped vars are logged + env_passthrough named; no log when nothing is dropped; secret vars excluded from the diagnostic.	2026-05-29 03:44:49 -07:00
firefly	4bdae34771	test(code-exec): regression suite for the approval-bypass cluster Cover context+callback propagation and teardown-clears, a source guard that both RPC threads stay wrapped, the check_execute_code_guard decision matrix (isolated backend, headless-local, cron-deny, gateway approve/deny/timeout/missing-notify, smart mode, session-yolo), the env-scrub allowlist/secret rules, and a behavioral test that execute_code() blocks before spawning on denial. Refs #4146, #27303, #30882, #33057	2026-05-29 03:44:49 -07:00
firefly	655090b3d3	feat(gateway): warn at startup on manual approvals with no risk assessor When approvals.mode=manual with security.tirith_enabled off and no auxiliary.approval model, dangerous commands and execute_code scripts can only be gated by live in-chat approval; with routing fixed they now fail closed (block) rather than silently auto-run. Surface that at startup so operators knowingly enable tirith or auxiliary.approval for unattended gateways. Refs #30882	2026-05-29 03:44:49 -07:00
firefly	1083977261	fix(code-exec): restore approval context in execute_code RPC threads + guard entry Wrap both execute_code RPC threads (local UDS + remote file-RPC) with propagate_context_to_thread so gateway sessions no longer fall into check_dangerous_command's non-interactive auto-approve branch and the CLI approval prompt stays reachable. Add check_execute_code_guard: one-shot fail-closed approval of the whole script in gateway/ask/cron-deny before the child spawns (skips isolated backends; command-string built only past the early returns). Drop the broad HERMES_ env passthrough for an explicit operational allowlist plus DSN/WEBHOOK secret substrings, and update the POSIX-equivalence oracle. Refs #4146, #27303, #30882, #33057	2026-05-29 03:44:49 -07:00
firefly	21aeefe5fd	fix(code-exec): propagate agent-turn context into tool worker threads Worker threads that dispatch Hermes tools started with an empty contextvars.Context and no thread-local approval/sudo callbacks. Add tools/thread_context.propagate_context_to_thread factoring that capture/install/clear lifecycle (mirrors the GHSA-qg5c-hvr5-hjgr pattern), and refactor agent/tool_executor onto it so the security-critical logic lives in one audited place. Update the contextvar-propagation source guard for the new call shape. Refs #33057	2026-05-29 03:44:49 -07:00
kshitijk4poor	a22c250001	refactor(auth): remove vestigial Nous min_key_ttl/inference_auth_mode params After the legacy session-key path was removed, two parameters became dead surface on the Nous runtime-resolution chain: - min_key_ttl_seconds: del'd inside refresh_nous_oauth_pure and pass-through / telemetry-only in refresh_nous_oauth_from_state, _try_import_shared_nous_state, _nous_device_code_login, and resolve_nous_runtime_credentials. It controlled the now-deleted agent-key mint TTL and drives no behavior. - inference_auth_mode: with the legacy mode gone, AUTO and FRESH are behaviorally identical; the value only fed _normalize_nous_inference_auth_mode validation and oauth trace output, never a branch. Removing inference_auth_mode orphaned its whole supporting cluster (NOUS_INFERENCE_AUTH_MODE_AUTO/FRESH, NOUS_INFERENCE_AUTH_MODES, _normalize_nous_inference_auth_mode), and dropping min_key_ttl_seconds orphaned DEFAULT_AGENT_KEY_MIN_TTL_SECONDS — all deleted here. Updated every caller (run_agent, auxiliary_client, credential_pool, proxy adapter, runtime_provider, web_server, main, auth_commands, setup) and pruned the matching test kwargs. Deleted two tests that exercised the removed surface (test_legacy_auth_mode_is_rejected, test_try_refresh_..._accepts_explicit_auth_mode). No behavior change: net -134 LOC of dead code.	2026-05-29 02:24:48 -07:00
kshitijk4poor	95cf8f9842	refactor(auth): drop weak JWT-shape fallback in auxiliary _nous_api_key The import-failure fallback returned any 3-segment token without scope/expiry validation, a divergent reimplementation of the canonical _nous_invoke_jwt_is_usable check. The import is from the same module that provides resolve_nous_runtime_credentials, so a failure means the whole auxiliary Nous path is unavailable anyway; return "" instead so the caller falls through to the clear 'run: hermes auth add nous' guidance rather than handing back an unvalidated token.	2026-05-29 02:24:48 -07:00
Robin Fernandes	4e4984a11a	test(auth): update nous jwt-only expectations	2026-05-29 02:24:48 -07:00
Robin Fernandes	7e958dafc2	fix(auth): address Nous JWT fallback review	2026-05-29 02:24:48 -07:00
Robin Fernandes	41ff6e5937	refactor(auth): Disable Nous legacy session key fallback	2026-05-29 02:24:48 -07:00
teknium1	a87f0a82a5	test(tool-search): redact secrets from harness transcripts + console The live harness runs against a real OpenRouter key; record['error'] is a full traceback that, on an auth failure, could echo a request header or URL containing the key. _redact_secrets() now masks the live OPENROUTER_API_KEY, any sk-/sk-or- bearer token, and Authorization/Bearer headers before final_response and error enter the transcript or the console print. Addresses the CodeQL clear-text-storage/logging findings at the source.	2026-05-29 02:04:12 -07:00
teknium1	18c9e89106	test: update _invoke_tool dispatch assertion for new toolset-scope kwargs The scoping fix added enabled_toolsets/disabled_toolsets to the agent_runtime_helpers sequential dispatch into handle_function_call, so test_invoke_tool_dispatches_to_handle_function_call's assert_called_once_with (exact match) needs the two new kwargs. Both are None for the default agent fixture.	2026-05-29 02:04:12 -07:00
teknium1	1709776120	test(tool-search): add live A/B harness, drop checked-in transcripts Brings in the tool_search live-test harness from the original PR but leaves out the 11 checked-in scripts/out/*.json transcript files — those are non-deterministic model output that goes stale the moment the model changes and were the bulk of the diff. scripts/out/ is now gitignored so a harness run never re-commits them. Fixes on top: - API-key loading goes through hermes_cli.env_loader.load_hermes_dotenv instead of hand-parsing ~/.hermes/.env and assigning the value to a local. The canonical loader never materializes the secret in a local variable in this module, which clears the four CodeQL high alerts (py/clear-text-storage / py/clear-text-logging-sensitive-data at the transcript write/print sites — they were tracing the key from the hand-rolled parser into the records) and removes a hand-rolled parser. - encoding='utf-8' on every write_text/read_text in both harness scripts (Windows-footgun hygiene). Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>	2026-05-29 02:04:12 -07:00
teknium1	7427b9d581	fix(tool-search): scope bridge catalog + dispatch to the session's toolsets Tool Search read its catalog from the global registry (get_tool_definitions with no toolset scope = 'start with everything'), so a restricted-toolset session — subagent, kanban worker, curated gateway session — could: 1. tool_search the entire process registry, not just its granted tools, and 2. tool_call any registered plugin/MCP tool it was never given, because registry.dispatch() has no enabled_tools gate for non-execute_code tools. A scoped session (enabled_toolsets=['mcp-github']) reported total_available=26 and successfully invoked an out-of-scope plugin tool via tool_call. Fix: - handle_function_call gains enabled_toolsets/disabled_toolsets; the bridge dispatch scopes get_tool_definitions to them (also stops polluting the process-global _last_resolved_tool_names with out-of-scope tools, which leaked into execute_code's sandbox-tool fallback). - A defense-in-depth gate rejects any tool_call'd name not in the scoped deferrable catalog. - tool_executor's unwrap (both concurrent + sequential paths) enforces the same scope before dispatch, since it unwraps tool_call -> underlying name and bypasses the bridge branch. New _tool_search_scoped_names() helper, cached per-agent on registry generation + toolset scope. - New scoped_deferrable_names() helper in tool_search.py shared by both sites. Tests: 4 new regression tests in TestRegression_ToolsetScoping (scoped catalog, out-of-scope tool_call rejection, no global pollution, helper).	2026-05-29 02:04:12 -07:00
teknium1	369075dc95	feat(tools): progressive tool disclosure for MCP and plugin tools Adds Tool Search, a structured-tools progressive-disclosure layer that replaces MCP and non-core plugin tools in the model-visible tools array with three bridge tools (tool_search / tool_describe / tool_call) when the deferrable surface would consume more than a configurable percentage of the active model's context window. Core Hermes tools are never deferred. Default mode is 'auto' with a 10% context threshold, so small toolsets pay no overhead. Set tools.tool_search.enabled to 'on' to force or 'off' to disable. Design carefully reflects the OpenClaw production failure modes documented in the openclaw-tool-search-report: - Core tools never defer (toolsets._HERMES_CORE_TOOLS). Addresses the 'tools silently missing from isolated cron turns' regression class (openclaw#84141) by construction: there is no code path that can drop a core tool. - Catalog is stateless across turns — rebuilt from the live tool-defs list on every assembly. No session-keyed Map that can drift out of sync with the registry. - tool_call unwraps the bridge call before any hook fires, so plugin pre/post hooks, guardrails, approval flows, and the activity feed all see the underlying tool name, not the bridge (addresses openclaw#85588 and the verbose-mode complaint on openclaw#79823). - The unwrap happens in both the parallel and sequential paths of agent/tool_executor.py and also in handle_function_call, so direct callers (sandboxed code, eval harnesses) are covered too. - Bridge tools cannot invoke each other (recursion guard) and cannot invoke core tools (those must be called directly). - Tools mode only — no JS-sandbox code-mode. Keeps the surface small. - Token estimation via cheap char/4 heuristic; precision isn't needed for the threshold decision. Files: - tools/tool_search.py — new module (BM25 retrieval, classification, threshold gate, bridge dispatch, unwrap helper). - tests/tools/test_tool_search.py — 35 tests including the OpenClaw #84141 regression guard. - model_tools.py — wires assembly into _compute_tool_definitions as the final step, adds skip_tool_search_assembly kwarg so the bridge can see the real catalog, dispatches the three bridge tools. - agent/tool_executor.py — unwraps tool_call in both parallel and sequential parsing loops so checkpointing, guardrails, plugin hooks, and tool-progress callbacks all observe the underlying tool name. - hermes_cli/config.py — DEFAULT_CONFIG['tools']['tool_search'] block. - website/docs/user-guide/features/tool-search.md — user docs. Validation: - 35/35 new tests pass. - Existing tool/registry/model_tools/config/coercion/executor tests (82 + 74 + small adjacents) green. - Live E2E: 20 fake MCP tools registered, get_tool_definitions returns 3 bridges, tool_search returns top 3 hits, tool_describe returns full schema, tool_call dispatches to the real underlying handler and the underlying result is what the model sees. - Reserved-name recursion guard verified live. - Core-tool refusal via tool_call verified live.	2026-05-29 02:04:12 -07:00
teknium1	73d73f1f0d	fix(codex): relax no-byte TTFB watchdog default from 12s to 120s The chatgpt.com/backend-api/codex endpoint can spend tens of seconds in backend admission / prompt prefill before emitting its first SSE event. The 12s no-byte TTFB cutoff aborted those still-valid streams, surfacing as 'Codex stream produced no bytes within 12s' through all retries (Discord reports). The OpenAI SDK's own streaming read timeout is 600s, so 12s was ~50x more aggressive than the transport layer would have tolerated. Default the no-byte cutoff to 120s and raise the openai-codex MAX cap default to 120s so it no longer clamps the new default back to 20s. Disabling stays available via HERMES_CODEX_TTFB_TIMEOUT_SECONDS=0; the 25k-token auto-disable, _STRICT override, and post-first-event idle watchdog are unchanged. Co-authored-by: Gille <4317663+helix4u@users.noreply.github.com>	2026-05-29 02:02:25 -07:00
teknium1	6bebab4761	fix(security): narrow Bedrock subprocess strip to inference bearer token only Scopes the AWS_SDK subprocess strip down from the full AWS credential chain to just AWS_BEARER_TOKEN_BEDROCK — the only Hermes-managed inference secret (analogous to OPENAI_API_KEY). The general AWS credential chain (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_PROFILE / config + role pointers) is intentionally left inheritable. Why: per SECURITY.md §3.2 the local terminal is the user's trusted operator shell. Hard-blocklisting the general chain would (a) regress every user who runs aws/terraform/cdk/boto3 in the agent terminal — not just Bedrock users, since PROVIDER_REGISTRY is iterated unconditionally at import — and (b) be unrecoverable, because env_passthrough.py refuses to re-allow anything in _HERMES_PROVIDER_ENV_BLOCKLIST (GHSA-rhgp-j443-p4rf). The narrow strip closes the reported leak (opencode enumerating the Bedrock catalog off the leaked bearer token) with no capability loss. Keeps zapabob's self-healing auth_type=="aws_sdk" mechanism so any future SDK-cred provider is covered automatically. Tests: bearer token stripped + general chain preserved (no-regression guard), on both the runtime strip path and the blocklist-membership path. Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>	2026-05-29 01:48:08 -07:00
zapabob	95b5b72404	fix(security): block AWS SDK creds from subprocess env	2026-05-29 01:48:08 -07:00
Teknium	db2ce9e7d2	fix(compression): fail open when lock subsystem is missing (version skew) (#34475) A process running mismatched module versions — conversation_compression.py re-imported with the post-#34351 lock code while a long-lived hermes_state.SessionDB stays bound to the pre-#34351 class in memory — has the try_acquire_compression_lock call site but not the method. The AttributeError it raises is NOT a sqlite3.Error, so the method's own fail-open guard never runs; the exception escapes to the outer agent loop, which prints the error and retries. Compression never succeeds, the token count never drops, and the loop re-triggers compaction forever (the 'API call #47/#48/#49 ... has no attribute try_acquire_compression_lock' spin a user hit after an update). Wrap the lock acquire so any unexpected exception fails OPEN: skip locking and proceed with compression. Skipping the lock risks a rare concurrent-compression session fork; an infinite no-progress loop that never compresses at all is strictly worse. The remediation hint in the log points at the real fix (restart / hermes update to resync the stale module). Also guards get_compression_lock_holder against the same skew. Adds a regression test simulating the version skew (real SessionDB wrapped so only the lock methods raise AttributeError) — asserts _compress_context proceeds and rotates instead of raising.	2026-05-29 01:32:32 -07:00
Teknium	e28a668b40	fix(gateway): diagnosable MEDIA rejections + canonical cache roots + null-path guard Operators can now see which MEDIA path was dropped and why, generated artifacts under the canonical ~/.hermes/cache/{images,...} layout deliver, and a crafted ~\x00 path no longer aborts the whole attachment batch. - MEDIA_DELIVERY_SAFE_ROOTS: add canonical cache/{images,audio,videos,documents,screenshots} alongside the legacy *_cache dirs (#31733). - filter_media/local_delivery_paths: log the rejected path (was a blind "outside allowed roots") via _log_safe_path, which strips control chars and Unicode line separators so a model-emitted path can't forge a log line. - validate_media_delivery_path + extract_media: guard os.path.expanduser so a ~\x00 path returns None / is skipped instead of raising and dropping every other attachment in the response. Salvaged and slimmed from #33251 (780 LOC -> 35): the reason-tag taxonomy, the parts-eliding redactor, and the extension-partition hoist are dropped in favor of logging the path directly. All three findings were verified and reproduced by the contributor. Co-authored-by: wysie <wysie@users.noreply.github.com>	2026-05-29 01:23:35 -07:00
teknium1	2765b02021	fix(packaging): ship bundled plugin.yaml manifests in wheel and sdist The v0.15.0 PyPI wheel shipped every plugin's Python code but none of its plugin.yaml manifests, so plugin discovery (hermes_cli/plugins.py) found zero plugins and ALL gateway platforms failed with "No adapter available for <platform>" (discord, slack, mattermost, ...). Same gap also dropped the web-search provider manifests (#28149). Declare manifest coverage in both packaging channels: - wheel: [tool.setuptools.package-data] plugins += /plugin.yaml, /plugin.yml - sdist: MANIFEST.in recursive-include plugins plugin.yaml plugin.yml (Homebrew and other downstream packagers build from the sdist) Verified by building the wheel before/after: plugin.yaml count went 0 -> 69, discord's manifest now ships. Adds a regression test asserting both channels cover manifests. Fixes #34034 Co-authored-by: outsourc-e <201563152+outsourc-e@users.noreply.github.com> Co-authored-by: Dhruvil Parikh <41384593+dparikh79@users.noreply.github.com> Co-authored-by: ousiaresearch <261687298+ousiaresearch@users.noreply.github.com> Co-authored-by: libre-7 <6366424+libre-7@users.noreply.github.com>	2026-05-29 01:23:28 -07:00
Teknium	c01a2df0a3	fix(auth): don't launch a text-mode browser inside the terminal for OAuth (#34479) OAuth auto-open only checked _is_remote_session() (SSH + cloud-shell env vars). On a headless/CLI-only Linux box with no GUI browser, none of those trip, so webbrowser.open() resolved to a console browser (w3m/lynx/links) and launched it INSIDE the terminal — hijacking the user's TTY with the xAI 'Account Management' login page instead of letting them copy the URL. Add _can_open_graphical_browser(): returns False when webbrowser would resolve to a known console browser, when $BROWSER names one, when there's no display server on Linux, or when no browser resolves at all. Gate all 5 OAuth auto-open callsites (xAI loopback, Spotify loopback, MiniMax device code, Anthropic, Google) on it in addition to the existing remote check. Headless boxes now print the URL / fall through to manual-paste instead.	2026-05-29 01:23:06 -07:00
loongzhao	f247686c42	feat(yuanbao): cache resolved media resources by resourceId Add an in-memory resourceId->local-path cache (24h TTL, 256-entry LRU) to MediaResolveMiddleware so the same Yuanbao resource isn't re-downloaded when it's referenced more than once in a session (own attachment, then quoted, then group-observed backfill). Each reference otherwise triggers a fresh token exchange + COS download. The cache verifies the file still exists on disk before returning a hit (cache dir may be swept) and is threaded through all three resolve paths: _resolve_media_urls (rid parsed from placeholder URL), _collect_observed_media, and the DispatchMiddleware quote path. Salvaged from PR #30418 by @loongfay; the broader middleware refactor in that PR converged with work already merged on main, so only the net-new download cache is carried over.	2026-05-29 01:05:00 -07:00
wysie	f32b66c758	fix: improve plugins list usability	2026-05-29 00:59:42 -07:00
Teknium	c692000a57	docs(xai-oauth): mirror bare-code paste note to the primary guide (#33917) The original PR diff updated two guides (oauth-over-ssh.md and xai-grok-oauth.md) but only the oauth-over-ssh.md edit landed in the PR's actual commit. Mirror the note to the primary xai-grok-oauth.md guide too so users reading the main entry point don't miss the bare-code form that already shipped in #33880.	2026-05-29 00:57:13 -07:00
Evo	2410e11395	docs(xai-oauth): note bare-code manual-paste from #33880	2026-05-29 00:57:13 -07:00
Teknium	0384398c65	chore(release): map blackpilledsoftware-prog email to GitHub login Required by CI author validation after salvaging PR #16780.	2026-05-29 00:31:44 -07:00
Blake	26b83a5f5f	fix(cli): ignore terminal focus reports (salvage of #16780) Ghostty/macOS window or tab navigation (Cmd+Shift+[ / ], Alt+Tab, etc.) can deliver terminal focus reports (CSI I / CSI O) to the running TUI. prompt_toolkit does not map those sequences by default, so its parser falls back to literal key presses (ESC, [, I/O) and inserts `[I` / `[O` into the prompt buffer after the ESC byte is handled. Fix: register the two sequences as Keys.Ignore in ANSI_SEQUENCES at parser level, plus a no-op kb.add(Keys.Ignore) handler so the default self-insert path never inserts focus-report bytes. Salvage notes: original PR put the helper in cli.py. Salvaged into hermes_cli/pt_input_extras.py alongside install_shift_enter_alias / install_ctrl_enter_alias to match the established pattern for ANSI_SEQUENCES augmentation. setdefault → in-check so any prior user registration wins. Closes #16780	2026-05-29 00:31:44 -07:00
Teknium	c1485d52e3	chore(release): add moikapy AUTHOR_MAP for PR #31527 salvage	2026-05-29 00:28:02 -07:00
moikapy	f6a2ba6261	fix(auxiliary): detect xAI OAuth 403 bad-credentials as auth error xAI returns HTTP 403 (not 401) with unauthenticated:bad-credentials when an OAuth2 access token has expired or is invalid. The existing _is_auth_error() only checked for 401 status codes, so these tokens were never refreshed and the 403 propagated as a generic permission denied error. Three fixes: 1. _is_auth_error: Recognize xAI's 403+bad-credentials pattern as an auth failure, triggering token refresh instead of silent failure. 2. _refresh_provider_credentials: Add xai-oauth branch with pool-level refresh (try_refresh_current with select to ensure current entry) then fallback to singleton resolver with force_refresh=True. 3. _recoverable_pool_provider: Map api.x.ai host to xai-oauth pool for auto-resolved providers, matching existing pattern for openai-codex/openrouter/nous/anthropic. Includes 14 tests covering the new detection logic, host mapping, and graceful fallback behavior. Signed-off-by: moikapy <moikapy@devmoi.com>	2026-05-29 00:28:02 -07:00
teknium1	bc736ff543	test(model-catalog): use exact URL equality in fallback tests CodeQL flagged 'hermes-agent.nousresearch.com' in url and similar substring checks as py/incomplete-url-substring-sanitization. The rule is about URL allowlist checks in production code, not test routing — there's no security boundary here. Switch to url == self.PRIMARY / self.FALLBACK, which is the same semantic and silences the rule.	2026-05-29 00:25:36 -07:00
teknium1	f2d88c820c	fix(model-catalog): fall through to raw.github when Vercel 403s; swap step-3.5-flash for step-3.7-flash on OpenRouter+Nous The docs site (Vercel) serves /docs/api/model-catalog.json behind a bot mitigation rule that returns HTTP 403 + x-vercel-mitigated: challenge for non-browser User-Agents — including urllib (what the CLI uses) and curl. When that happens, get_catalog() falls back to the stale disk cache and new model releases (Opus 4.8, etc.) never reach the /model picker even though they're already in OPENROUTER_MODELS and the live OpenRouter API. Adds a fallback URL chain: when the primary catalog URL fails, walk DEFAULT_CATALOG_FALLBACK_URLS — currently the raw.githubusercontent.com copy of the same file. GitHub raw doesn't bot-gate, so the manifest stays reachable through Vercel firewall hiccups. Per-provider override URLs keep their direct-fetch semantics (operators configure those specifically, no implicit fallback). Also swaps stepfun/step-3.5-flash for stepfun/step-3.7-flash in the OpenRouter + Nous Portal curated picker lists. Native stepfun provider configuration (api.stepfun.ai) is left alone — that depends on what stepfun.ai itself serves, not what OpenRouter routes. Test plan: 5 new TestFallbackChain tests cover primary-success, primary-failure-fallback-success, all-fail, primary==fallback-dedup, and end-to-end get_catalog routing through the new helper. Existing 23 tests in test_model_catalog.py still pass (28 total). Wider tests/hermes_cli/ sweep: 5701/5701 pass.	2026-05-29 00:25:36 -07:00
teknium1	8d57281650	chore: add AUTHOR_MAP entry for Interstellar-code	2026-05-29 00:21:54 -07:00
Rohit Sharma	9d4fda9952	feat(kanban): add POST /runs/{run_id}/terminate endpoint Closes the termination-control gap left by PR #28432, which shipped the read-only sibling endpoints (/workers/active, /runs/{run_id}, /runs/{run_id}/inspect) but no way to stop a misbehaving worker from the dashboard without dropping to the CLI. The new endpoint resolves run_id -> task_id and delegates to the existing kanban_db.reclaim_task() flow, so the SIGTERM->SIGKILL escalation, run-outcome bookkeeping, and event-log append all match POST /tasks/{task_id}/reclaim exactly. No new termination semantics introduced. Responses: 200 {ok, run_id, task_id} on success 404 unknown run_id 409 run already ended OR task no longer reclaimable Refs: #23762	2026-05-29 00:21:54 -07:00
teknium1	7d10105918	test(kanban): update iteration-exhaustion tests for #29747 gap 2 The two tests in TestRunConversation now verify the new behavior: - test_kanban_block_called_on_iteration_exhaustion → verifies _record_task_failure(outcome='timed_out') is called instead of kanban_block - test_no_kanban_block_when_not_in_kanban_mode → verifies the bridge is a no-op when HERMES_KANBAN_TASK is unset The function names are kept for diff stability; both assert against _record_task_failure now, which is the correct contract per the gap-2 fix in this PR.	2026-05-29 00:13:29 -07:00
teknium1	592a4ffb6b	fix(kanban): close three blocked/iteration-exhausted handling gaps (#29747) Reporter diagnosed three independent gaps that together allowed infinite 'unblock → re-stuck' loops with no surfacing or escalation: GAP 1: `_rule_stuck_in_blocked` resets timer on any `commented`/`unblocked` event, so a task that cycles every few minutes is invisible to it regardless of how many times it cycles. Fix: new `_rule_block_unblock_cycling` rule (`hermes_cli/kanban_diagnostics.py`) that counts block→unblock cycles in a sliding window. Default threshold 3 cycles within 24h, configurable via `block_cycle_threshold` / `block_cycle_window_seconds`. Walks events in arrival order (event id) since multiple events can share the same `created_at` second. Fires as a warning with a CLI hint to inspect the block reasons. GAP 2: Iteration-budget-exhausted runs in kanban workers map to `kanban_block` (status=blocked, but a clean exit from the kernel's perspective). `_rule_repeated_failures` reads `consecutive_failures`, which `_record_task_failure` increments only for crashed/timed_out/spawn_failed — `blocked` outcome bypasses the failure counter, so the `kanban.failure_limit` circuit breaker never trips on budget-exhaustion loops. Fix: `agent/conversation_loop.py` budget-exhaustion path now calls `_record_task_failure(outcome="timed_out")` instead of `kanban_block`. Budget exhaustion is genuinely a timeout-shaped failure (the task ran out of allowed iterations), so this is more honest semantics; it also routes through the unified failure counter, so repeated budget exhaustions trip the circuit breaker and the task auto-blocks with `gave_up` after `failure_limit` retries. GAP 3: `release_stale_claims` uses `_pid_alive(worker_pid)` only and ignores `last_heartbeat_at`. Reporter observed a 91-min run that held its claim with frozen heartbeat because the worker entered a logic loop with no tool calls — `_pid_alive` kept returning True so the claim was extended every 15 minutes indefinitely. Fix: heartbeat-stale backstop. If `last_heartbeat_at` is set AND older than `DEFAULT_CLAIM_HEARTBEAT_MAX_STALE_SECONDS` (default 1h), reclaim even if the PID is alive. NULL `last_heartbeat_at` preserves backward compatibility (no heartbeat yet = extend, as before). The reclaim event payload now includes a `heartbeat_stale` boolean so operators see why a live-PID worker was reclaimed. This works cleanly in concert with PR #34418 (#31752 runtime → heartbeat bridge): once `_touch_activity` keeps `last_heartbeat_at` fresh as a side effect of normal API traffic, the backstop only fires for genuinely wedged workers (no chunks, no tool results, no progress at all). Co-authored-by: baofuen <45189813+baofuen@users.noreply.github.com>	2026-05-29 00:13:29 -07:00
teknium1	bc31ee5cf8	fix(kanban): bridge worker runtime activity to board heartbeat (#31752) The dispatcher watchdog (release_stale_claims) reads tasks.last_heartbeat_at to decide whether to reclaim a running task. The agent maintains its own in-process `_last_activity_ts` for every chunk/tool result, but those liveness ticks never reach the board unless the model explicitly calls the `kanban_heartbeat` tool — so a worker actively executing a long run without tool-level heartbeats can be reclaimed mid-flight as 'stale', returning the task to ready and orphaning the in-flight worker's progress. Fix: in `_touch_activity` (the canonical 'we just did work' hook in run_agent.py), call a new `heartbeat_current_worker_from_env` helper in `tools/kanban_tools.py` that: - No-ops outside dispatcher-spawned worker context (no HERMES_KANBAN_TASK). - Rate-limited to one DB write per 60s (runtime activity ticks too often to faithfully mirror; we just need the watchdog to see liveness). - Best-effort: never raises. heartbeat_claim + heartbeat_worker calls are individually try/except'd; any DB error logs at debug and returns. - Uses worker env identity: HERMES_KANBAN_TASK + HERMES_KANBAN_RUN_ID + HERMES_KANBAN_CLAIM_LOCK (all pinned by the dispatcher at spawn time). - No durable note on auto-heartbeats — that's reserved for the explicit `kanban_heartbeat` tool which carries a model-supplied note. The explicit `kanban_heartbeat` tool stays available unchanged for workers that want to attach a note or pre-emptively extend a claim across a known-long single tool call. Co-authored-by: faisfamilytravel <223516181+faisfamilytravel@users.noreply.github.com>	2026-05-29 00:05:58 -07:00
teknium1	40217aa194	fix(kanban): tell workers not to use clarify; route to kanban_block instead (#32167) Kanban workers run headless — no live user is on the other side of `clarify`, so the call times out (~120s default) and the task sits silently in `running` with no signal to the operator that input is needed. Reporter observed a real incident where a worker asked 'promote to production, or check staging first?' via clarify, the call timed out, the agent hallucinated a fallback, and the task sat 'running' for hours. Fix: explicit 'do not call clarify' bullet in two surfaces every kanban worker sees — - `agent/prompt_builder.py` KANBAN_GUIDANCE `## Do NOT` section (auto-injected into every dispatcher-spawned worker run). - `skills/devops/kanban-worker/SKILL.md` `## Do NOT` section (the bundled worker skill). Both point at the right pattern: `kanban_comment` (context) + `kanban_block` (decision needed) — the task surfaces on the board as blocked, the operator sees it, unblocks with their answer in a comment, and the worker respawns with the thread. Co-authored-by: kweiner <17778+kweiner@users.noreply.github.com>	2026-05-28 23:57:20 -07:00
Teknium	86a389fee2	fix(credential-pool): STATUS_DEAD for terminal OAuth failures (#32849 ) (#34412 ) When OpenAI Codex returns 401 token_invalidated or token_revoked, the credential is broken upstream — retrying after a TTL cooldown cannot fix it. The existing code treated every 401/429 the same way: STATUS_EXHAUSTED with a TTL cooldown (5 min for 401, 1 hour for 429). After the TTL elapsed, the broken credential re-entered rotation and immediately failed again with the same 401, surfacing as 'Failed to generate context summary' on every context-compression cycle. Reporter observed 7 separate 401 token_invalidated failures from the same revoked credential in a single day; the only workaround was removing it manually via 'hermes auth'. Add a STATUS_DEAD terminal state. Only 401 responses whose error.code/reason matches a known terminal OAuth state (token_invalidated, token_revoked, invalid_token, invalid_grant, unauthorized_client, refresh_token_reused) transition to DEAD. Everything else keeps the existing TTL semantics — 429 rate limits are transient and should recover. DEAD entries are excluded from rotation unconditionally. They only clear when an explicit write-side re-auth sync rewrites the tokens (the existing _sync_codex_pool_entries / _sync__entry_from_auth_store paths already clear last_status to None). The read-side auth.json-sync paths also now fire on DEAD so an in-flight pool entry can adopt fresh tokens written by another process without needing explicit re-auth. After 24 hours, DEAD manual entries (source='manual:') are pruned from the pool automatically so dead state doesn't accumulate forever. Singleton-seeded DEAD entries (source='device_code' etc.) are kept because _seed_from_singletons would recreate them on the next load with the same stale tokens — pruning would be pointless. The audit trail stays visible (label, last_error_reason, timestamps). Closes #32849.	2026-05-28 23:45:42 -07:00
teknium1	ae6817f7f7	fix(kanban): add --reason flag to unblock for symmetry with block (#30897) `hermes kanban unblock <id> review-required: ...` parsed every trailing word as another task_id (since `task_ids` is `nargs='+'`), then quietly failed on each non-existent id with "cannot unblock review-required: (not blocked/scheduled?)". Reporter saw this as asymmetric with `block <id> <reason...>` which accepts positional reason words. Fix: add a `--reason "..."` flag that, when provided, is appended as a `UNBLOCK: <reason>` comment before the unblock transition. Bulk syntax (`unblock t_a t_b t_c`) is preserved unchanged. Co-authored-by: julio-cloudvisor <211828103+julio-cloudvisor@users.noreply.github.com>	2026-05-28 23:41:44 -07:00
AhmetArif0	4126da65ae	fix(security): add bws_cache.json to file_safety read guard The Bitwarden Secrets Manager disk cache introduced in #31968 stores plaintext secret values at <hermes_home>/cache/bws_cache.json to avoid re-fetching across back-to-back CLI invocations. The file was not added to get_read_block_error()'s credential_file_names list, leaving the agent able to read it directly via the read_file tool. Add os.path.join("cache", "bws_cache.json") to credential_file_names so both HERMES_HOME and the global root are covered, matching the existing pattern used for auth.json, .anthropic_oauth.json, etc. Other files under cache/ (images, documents, audio) are unaffected — the check is an exact-file match, not a prefix match. Verified: 11/11 exploit/regression scenarios pass; 38/38 existing file_safety tests pass.	2026-05-28 23:31:20 -07:00
Teknium	71ae98b792	chore(release): map seppe@fushia.be to GitHub login Required by CI author validation after salvaging PR #33193.	2026-05-28 23:30:39 -07:00
Seppe Gadeyne	cf8862cfa3	fix: preserve Ctrl+J newlines in Ghostty	2026-05-28 23:30:39 -07:00
Gabor Barany	1386a7e478	fix(xai-sanitize): deepcopy tools_for_api before in-place mutation (#27907) The xAI tool-schema sanitizers (strip_slash_enum, strip_pattern_and_format) mutate their input in place — that's their documented contract. The two call sites (chat_completion_helpers.build_api_kwargs and the auxiliary client) were passing agent.tools straight through, so the first xAI request would permanently strip slash-containing enum constraints and pattern/format keywords from the per-agent tool registry. Effect: any subsequent non-xAI call from the same agent (auxiliary task routed to Anthropic, OpenRouter fallback, mid-session model switch) saw the already-stripped schema with no way for the user to notice from their config. Fix: deepcopy tools_for_api before sanitizing at both call sites. The slash-enum bug itself (xAI 400ing on enums with '/') was fixed earlier by #32443 (Nami4D) — that PR landed the strip but used the sanitizers directly without copying. This salvages #27907's correctness contribution (the deepcopy) while skipping its redundant parallel sanitizer (strip_xai_incompatible_enum_values is functionally equivalent to the existing strip_slash_enum) and its preflight-neutrality argument (we chose model-gated preflight in #32443). 3 new tests in tests/run_agent/test_run_agent_codex_responses.py: - strips_slash_enum_from_outgoing_request — outgoing kwargs has no slash-containing enum values (functional contract preserved). - does_not_mutate_agent_tools — headline #27907 regression. Snapshot agent.tools before build_api_kwargs, assert it survives intact after. Pre-fix this assertion would have caught the mutation. - is_idempotent_across_repeated_calls — three xAI requests in a row each strip cleanly AND don't progressively erode the source schema. 344/344 across tests/agent/test_auxiliary_client.py, tests/agent/transports/test_codex_transport.py, tests/run_agent/test_run_agent_codex_responses.py, and tests/tools/test_schema_sanitizer.py.
Co-authored-by: Gabor Barany <barany.gabor@gmail.com>	2026-05-28 23:29:59 -07:00
Teknium	db96fc60d0	fix(gateway): keep Telegram topic bindings aligned with compression children (#34409) Telegram DM topic bindings persist (chat_id, thread_id) -> session_id in SQLite so reopening a topic resumes the right Hermes session. When compression rotated session_entry.session_id mid-turn, the binding row stayed pointed at the pre-compression parent. On the next inbound message in that topic the gateway reloaded the oversized parent transcript, retriggering preflight compression — sometimes in a loop. Two-pronged fix: 1. `_sync_telegram_topic_binding(source, entry, *, reason)` helper called immediately after each of the three session_id rotation sites in _handle_message_with_agent (hygiene compression, agent-result compression rotation, /compress command). Keeps future bindings fresh. 2. Read-path self-heal: when resolving an existing topic binding, walk SessionDB.get_compression_tip() forward and switch_session to the descendant instead of the stored parent. Rewrites the binding row to the tip so subsequent messages skip the walk. Heals existing stale state on the next user message without requiring a gateway restart. Skipped from competing PRs as not load-bearing for the bug: - advance_session_after_compression SessionStore primitive (#26204/#28870/#33416) — preserves end_reason='compression' analytics nicety but doesn't affect routing correctness. - Cached-agent eviction on session_id mismatch — _compress_context() already mutates tmp_agent.session_id on the cached object so the in-memory agent self-corrects. - Startup repair pass (#33416) — redundant once the read path heals on the next message; one-line CLI follow-up can address bindings for topics users never reopen. Closes #20470, #29712, #33414. Acknowledges work in #23195 (@litvinovvo), #26204 (@bizyumov), #28870 (@donrhmexe), #29713 (@hehehe0803), #29945 (@eugeneb1ack), #33416 (@bizyumov).	2026-05-28 23:25:52 -07:00
Ben	ec7736f8a7	fix(docker): auto-join Docker socket group for docker-in-docker backend When users bind-mount /var/run/docker.sock to use TERMINAL_ENV=docker from inside the container, the supervised hermes user (UID 10000) lacks permission to talk to the socket — every `docker` invocation EACCES'es and check_terminal_requirements() returns False. In messaging mode this also silently strips the file/terminal toolset from the registered tool list, so the agent rationalizes the missing tools as a platform restriction. The naive workaround (docker run --group-add <socket-gid>) does NOT work with our s6-setuidgid privilege drop: s6-setuidgid calls initgroups() for the target user, which rebuilds supp groups from /etc/group. Without a matching /etc/group entry the kernel-granted supp group is wiped between PID 1 and the dropped hermes process. Verified empirically: --group-add 998 alone: PID 1 Groups: 0 998 → after drop: Groups: 10000 This fix's /etc/group add: id hermes shows 998 → after drop: Groups: 998 10000 Detect the socket's GID at boot in stage2-hook (runs as root before the privilege drop), reuse an existing group name if one matches the GID, otherwise create 'hostdocker'. Idempotent across container restarts. Silent no-op when no socket is mounted. End-to-end verified by building the image and running the supervised hermes user against the real host Docker daemon: `docker version` succeeds and check_terminal_requirements() returns True. Fixes #16703	2026-05-29 16:15:44 +10:00
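
The socket-group detection described in ec7736f8a7 can be sketched roughly as follows. This is a Python illustration only, not the stage2-hook itself, which the commit message does not show. The function name `docker_socket_group` and its return shape are hypothetical. The `hostdocker` fallback name, the reuse of an existing group with a matching GID, and the silent no-op when no socket is mounted are taken from the commit message.

```python
import grp
import os
import stat
from typing import Optional, Tuple


def docker_socket_group(
    sock_path: str = "/var/run/docker.sock",
    fallback_name: str = "hostdocker",
) -> Optional[Tuple[int, str, bool]]:
    """Decide which group the hermes user would need to join.

    Returns (gid, group_name, needs_create), or None when no socket is
    mounted at sock_path. Hypothetical helper, for illustration only.
    """
    try:
        st = os.stat(sock_path)
    except FileNotFoundError:
        return None  # silent no-op: nothing is bind-mounted

    if not stat.S_ISSOCK(st.st_mode):
        return None  # path exists but is not a Unix socket

    gid = st.st_gid
    try:
        # Reuse an existing group name if one already maps to this GID.
        return gid, grp.getgrgid(gid).gr_name, False
    except KeyError:
        # No /etc/group entry for this GID yet: one must be created so that
        # initgroups() keeps the supplementary group after the privilege drop.
        return gid, fallback_name, True
```

A caller running as root before the privilege drop would act on `needs_create` by adding the group entry and then adding the user to it. Because the lookup is by GID, rerunning it after the entry exists returns `needs_create=False`, which is consistent with the idempotency the commit describes.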

9906 Commits