hermes-agent

Author	SHA1	Message	Date
Acean	b0d234f068	fix(cron): don't crash on `cron list` when a job's repeat is null `cron_list` read `job.get("repeat", {})`, but the dict-default only applies to a MISSING key. A one-shot job persisted with `"repeat": null` returns None, and the next `.get("times")` raised AttributeError, taking down the whole `cron list` output. Coalesce with `or {}` so a present-but-null repeat renders as ∞ like the other cron readers already do. Adds a regression test. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-06-05 00:19:45 -07:00
helix4u	c8e80cd0bf	fix(update): require managed marker before destructive clean	2026-06-05 00:05:30 -07:00
Baris Sencan	ad69d3edc7	fix(terminal): guard os.getcwd() against a deleted CWD `os.getcwd()` raises FileNotFoundError when the process's working directory was removed out from under it (e.g. a scratch workspace cleaned up mid-session), crashing terminal env setup. Extract a `_safe_getcwd()` helper that falls back to TERMINAL_CWD, then the user's home, on FileNotFoundError, and route all three `os.getcwd()` call sites in terminal_tool.py through it (local default_cwd, the Docker cwd-passthrough source, and the debug-config print) so the same crash can't resurface at a sibling site. Adds unit tests for the real-cwd path and both fallback branches. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-06-04 23:39:34 -07:00
Ben Barclay	b1e399de95	fix(update-check): stop reporting phantom "N commits behind" inside Docker (#39559 ) Inside the published Docker image, both the `--tui` banner and the dashboard-embedded TUI report `1 commit behind — run docker pull nousresearch/hermes-agent:latest to update` even though the container has no git repo and no way to compute a commit delta. Root cause: two independent update-detection paths, only one of which knows it's running in Docker. - `recommended_update_command()` → `detect_install_method()` reads the `.install_method` stamp that `docker/stage2-hook.sh` writes at boot → returns "docker", so the command string correctly says `docker pull`. - `banner.check_for_updates()` (the source of the "N commits behind" count) has no notion of the docker install method. It only detects a build via `HERMES_REVISION` (nix-only, unset in the image) or a `.git` dir (excluded from the image by .dockerignore). Neither matches, so it silently falls through to `check_via_pypi()`, whose PyPI-version mismatch flag (1) is then rendered verbatim by the CLI banner (build_welcome_banner), the Ink TUI badge (branding.tsx), and `hermes version` as "1 commit behind" — a phantom count, no commit math involved. `hermes update` already refuses to run in-place in the container. The dashboard's REST `/api/hermes/update/check` endpoint already short-circuits docker (returns behind=None + the docker guidance). This mirrors that guard inside `check_for_updates()` so the banner/TUI/version surfaces agree: when `detect_install_method() == "docker"`, return None before any git/pypi probe (and before writing a cache entry). None makes the render guards (`typeof === 'number' && > 0`, `behind and behind > 0`) stay false, so the badge/line disappears entirely — matching the System page. Fix is in one place (check_for_updates) because all three consumers route through it via get_update_result()/_update_result. 
Tests: test_check_for_updates_docker_returns_none asserts None + no git/pypi probe + no cache write; test_check_for_updates_non_docker_still_checks guards against over-broadening (pip still version-checks). Mutation-tested: removing the guard fails the docker test. Verified against a real `docker build` of the image — see PR description.	2026-06-05 15:37:19 +10:00
Brian Doherty	899ee8c23d	fix(gateway): tolerate non-UTF-8 status/pid files in gateway status reads `_read_json_file` caught OSError but not UnicodeDecodeError, so a status file holding binary/non-UTF-8 bytes (truncated or clobbered write) would crash the gateway status path instead of being treated as unreadable. UnicodeDecodeError is a ValueError subclass, not an OSError, so it escaped the existing guard. Widen the catch to (OSError, UnicodeDecodeError) at both read sites in gateway/status.py — `_read_json_file` and the sibling `_read_pid_record`, which had the identical gap. Adds tests covering binary input (returns None) and valid input (still parses) for both. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-06-04 22:05:23 -07:00
Teknium	7309f3bef7	fix(line): map inbound message types to the correct MessageType The LINE adapter classified every non-text inbound message as `MessageType.IMAGE`, which doesn't exist on the enum — so any image, video, audio, file, sticker, or location message raised AttributeError the moment it was constructed. Beyond fixing the crash, every non-text message was being collapsed onto a single type. The gateway routes on MessageType (voice → STT, files → document handling, etc.), so misclassification silently mishandled media. Replace the inline ternary with a `_LINE_MESSAGE_TYPES` lookup that maps each LINE webhook type to its proper enum member (audio → VOICE to match how Telegram/WhatsApp treat voice notes), falling back to TEXT for unknown types. Adds regression tests covering the mapping and the old AttributeError. Co-authored-by: Sahibzada Allahyar <94376830+sahibzada-allahyar@users.noreply.github.com>	2026-06-04 21:55:20 -07:00
Ben Barclay	7c00ffd92c	fix(google-workspace): fall back to uv when venv has no pip (#39516 ) The Hermes Docker image's venv is built with `uv sync`, which does not bootstrap pip into the venv. When the google-workspace setup script needs to install its deps and the running interpreter has no pip, `sys.executable -m pip install` dead-ends with "No module named pip" (reported via Discord support). install_deps() now falls back to `uv pip install --python <interpreter>` when the pip path fails and uv is on PATH. uv installs into the exact interpreter the script is running under without needing pip present, so the pip-less venv self-heals (e.g. a dep evicted on image update, or a build without the [google]/[all] extra). On environments with neither pip nor uv, the [google] extra hint is printed as before. Verified E2E against nousresearch/hermes-agent:latest: under the venv python with a missing dep, --install-deps now prints "Dependencies installed." and exits 0 instead of failing. Adds TestInstallDeps regression coverage: pip path, uv fallback, uv-not-consulted-when-pip-works control, and both no-installer-available and uv-also-fails failure cases.	2026-06-05 13:30:02 +10:00
ethernet	fb853a1783	fix(install): scrap rebuild venv	2026-06-04 23:20:29 -04:00
Ben	96cd37e212	fix(dashboard): reap orphaned embedded-chat sessions to stop slash_worker leak Since #38591 made the dashboard's embedded chat unconditional, every browser refresh of /chat spins up a fresh session.create (new sid + a fresh _SlashWorker via _deferred_build) over /api/ws, but the old tab's WS disconnect only DETACHES the transport (ws.py) — it never closes the old session or its slash_worker. The dashboard's in-process gateway is long-lived, so the detached _SlashWorker subprocess's stdin pipe stays open forever and the worker never reaches EOF: one leaked python process per refresh. Fix at the session-lifecycle layer (not PTY signal timing — verified that a process whose owning gateway dies is always reaped via stdin-EOF; the leak is specifically the long-lived dashboard process keeping detached sessions parked). On WS disconnect, schedule a grace-delayed reap of any session left orphaned (transport detached to stdio, not mid-turn). A quick reconnect / session.resume / prompt.submit rebinds a live transport and cancels the reap, preserving the intentional detach-for-reconnect window. - server.py: extract _teardown_session() (shared with session.close), add _ws_session_is_orphaned() + _schedule_ws_orphan_reap(), gated by HERMES_TUI_WS_ORPHAN_REAP_GRACE_S (default 20s, 0 disables). - ws.py: schedule the reap for each detached session on disconnect. - tests: reap-closes-worker, spares-reattached/mid-turn/finalized, disabled-when-grace-zero.	2026-06-04 19:50:33 -07:00
kyssta-exe	25742372eb	fix(approval): check is_approved in execute_code guard (#39275 ) check_execute_code_guard() never called is_approved() before entering the approval flow, and never persisted session/permanent approvals from the gateway response. This meant 'Approve session' and 'Always' buttons had no effect — every execute_code call re-prompted the user. - Add is_approved() check after get_current_session_key(), matching check_all_command_guards() - Persist session ('approve_session') and permanent ('approve_permanent') approvals based on the gateway choice, same as terminal command guard - Add 3 regression tests for session persistence, permanent persistence, and short-circuit on pre-existing approval	2026-06-04 19:40:30 -07:00
Brooklyn Nicholson	89baf02919	Merge origin/main into bb/desktop-profile-support Resolve conflicts in desktop settings/cron/messaging/sidebar: adopt main's ListRow + actions-menu refactors for credential rows; keep our profileColor import on the sidebar. Drop the now-orphaned Tip-based helpers.	2026-06-04 20:17:07 -05:00
kewe63	46abf04012	fix(ssh): handle WinError 1314 symlink failure with shutil.copy2 fallback On Windows, os.symlink() raises OSError (WinError 1314) unless the process has Administrator rights or Developer Mode is enabled. The SSH bulk-upload staging logic used symlinks to mirror the remote layout before piping through tar; this caused all ssh_bulk_upload tests to fail on Windows. - ssh.py: wrap os.symlink() in try/except OSError and fall back to shutil.copy2() so staging works on every platform. shutil was already imported, no new dependency introduced. - file_sync.py: replace str(Path(remote).parent) with posixpath.dirname(remote) in unique_parent_dirs(). pathlib.Path uses the host separator (\ on Windows), but these paths are sent to a remote Linux host over SSH and must always use forward slashes. - test_ssh_bulk_upload.py: make test_staging_symlinks_mirror_remote_layout platform-agnostic — assert file existence and content instead of os.path.islink() + os.readlink(), since the staged entry may be a copy on Windows.	2026-06-04 18:06:21 -07:00
teknium1	93b5df3189	fix(test): patch async_is_safe_url in web-provider SSRF mocks web_tools.is_safe_url was replaced by async_is_safe_url, but three web-provider test files still monkeypatched the old sync name, raising AttributeError. Patch the async variant with an async lambda.	2026-06-04 18:04:47 -07:00
kewe63	c60952ba94	fix(web): run URL SSRF checks off the event loop in async paths Add async_is_safe_url() wrapping is_safe_url via asyncio.to_thread, and route all async SSRF call sites through it: web_extract_tool, the vision/video preflight checks, and both download redirect guards. socket.getaddrinfo blocks; calling it inline from async tool paths froze the event loop for the duration of DNS resolution. vision_tools: split _validate_image_url into _image_url_shape_ok (no DNS) + sync _validate_image_url (for sync callers/tests) + async _validate_image_url_async. Widened beyond the original PR #3691 to sibling async sites that also blocked the loop (second redirect guard, video preflight). Salvage of #3691 by @Kewe63 — surgically re-applied onto current main because the original branch was too stale to cherry-pick cleanly (would have reverted the web_crawl_tool refactor). Co-authored-by: Kewe63 <kewe.3217@gmail.com>	2026-06-04 18:04:47 -07:00
kewe63	19db9cd076	fix(acp): replace direct db._lock/_conn access with public update_session_meta() session.py _persist() bypassed SessionDB's thread-safe write path by accessing private internals db._lock and db._conn directly: with db._lock: db._conn.execute("UPDATE sessions SET model_config = ? ...") db._conn.commit() This was fragile for three reasons: 1. It bypassed _execute_write()'s BEGIN IMMEDIATE + jitter-retry logic, so concurrent writes could hit SQLite BUSY without retrying. 2. It called db._conn.commit() manually, breaking the transactional contract that _execute_write() enforces. 3. Any internal rename of _lock or _conn would silently break this call site with an AttributeError at runtime. Fix: - Add SessionDB.update_session_meta(session_id, model_config_json, model) to hermes_state.py. Routes through _execute_write() for the standard BEGIN IMMEDIATE + lock + jitter-retry guarantee. Uses COALESCE so passing model=None leaves the stored model column unchanged. - Replace the db._lock / db._conn block in session.py _persist() with a single db.update_session_meta() call. Tests (tests/acp/test_session_db_private_access.py, 11 tests): - Unit tests for update_session_meta: updates model_config, updates model, preserves existing model on None, routes through _execute_write, no-op on non-existent session. - AST checks: db._lock and db._conn not referenced in session.py; _persist() calls update_session_meta(). - Integration round-trips: cwd and model persisted correctly; COALESCE prevents overwriting an existing model with NULL.	2026-06-04 17:54:59 -07:00
Kewe63	4a4b9bd2dc	fix(test): add platform guard for grp import Tests in test_gateway_service.py imported grp inline without a platform guard, causing ImportError on systems where grp is unavailable (e.g. macOS, WSL without grp module). Added pytest.importorskip('grp') at module level alongside the existing pwd guard, and removed three redundant inline import grp statements. Fixes #24531	2026-06-04 17:52:50 -07:00
dirtyren	74e845c000	fix(slack): pass thread_ts in standalone send_message tool path The standalone `_send_slack()` function used by the send_message tool and cron delivery fallback was not passing `thread_ts` to the Slack API, causing messages to post to the top-level channel instead of inside threads. - Add `thread_ts` parameter to `_send_slack()` - Include `thread_ts` in the chat.postMessage payload when present - Pass `thread_id` from `_send_to_platform()` to `_send_slack()` Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-06-04 17:42:10 -07:00
Brooklyn Nicholson	9dbd3c57d7	feat(desktop): drag sessions into chat as @session links + spawn loader Drag a sidebar session into the composer to drop an @session:<profile>/<id> chip the agent resolves via session_search. New READ shape dumps a whole session by id (head+tail when large); a `profile` param reads another profile's DB read-only, and a cross-profile locate scan resolves bare ids when the model drops the owning profile from the link. Also: ASCII "waking up <profile>" overlay during lazy gateway swaps, global haptic rate-limit to kill the reconnect-storm "clickity" buzz, and reauth toasts surfaced once per disconnect instead of every backoff tick.	2026-06-04 19:41:51 -05:00
Kewe63	c14c37d46b	fix(openviking): add missing /agent/{agent}/ segment to memory URI — fixes #36969 _build_memory_uri produced URIs of the form: viking://user/{user}/memories/{subdir}/mem_{slug}.md The /agent/{agent}/ segment was missing, causing every agent under the same user to write into the same flat namespace. In multi-agent deployments agents silently overwrite each other's memories and vector retrieval cross-pollinates results. self._agent was already populated correctly (from OPENVIKING_AGENT env var, default 'hermes') and sent via X-OpenViking-Agent header — it was simply not interpolated into the URI. Fix: add the missing segment so URIs follow the documented shape: viking://user/{user}/agent/{agent}/memories/{subdir}/mem_{slug}.md Tests: 4 new regression tests in TestOpenVikingMemoryUriBuilder, 13/13 passed (9 existing + 4 new).	2026-06-04 17:40:33 -07:00
Ben Barclay	8a888441d7	fix(docker): recover from out-of-band container removal in persistent mode (salvage #36631 ) (#39415 ) Salvage of #36631 (@annguyenNous), rebased onto current main with regression tests added. Fixes #36266. When a persistent Docker sandbox container is removed out-of-band (idle reaper, `docker prune`, OOM kill, daemon restart), the gateway kept issuing `docker exec` against the dead container ID, returning "No such container" on every subsequent tool call — the agent was permanently blocked until the gateway process restarted. DockerEnvironment.execute() now detects the "No such container" / "is not running" error after a non-zero exit (gated on persist_across_processes) and calls _recreate_container(): it tries label-based reuse first, falls back to a fresh container replaying the same image + full all_run_args set, re-runs init_session(), and retries the command once. A genuine non-zero exit is NOT misclassified as container-gone. Differs from #36631 as submitted: adds the tests the original lacked. tests/tools/test_docker_environment.py covers _is_container_gone pattern matching (incl. the negative/control case), the recover-and-retry path, the persist_across_processes=False opt-out (no recovery), and the ordinary-failure passthrough (no spurious recreation). _make_dummy_env now forwards persist_across_processes. Verified: - Unit: 67/67 in test_docker_environment.py (4 new + existing). - Live E2E against the real docker daemon: started a persistent container, `docker rm -f`'d it out-of-band, and the next execute() transparently recreated a fresh container and succeeded; a follow-up command worked in the recovered container; a real `exit N` passed through without triggering recovery. Co-authored-by: annguyenNous <annguyenNous@users.noreply.github.com>	2026-06-05 10:33:44 +10:00
rob-maron	54cae7d1cb	switch model order	2026-06-04 17:29:31 -07:00
Ben Barclay	82c157b267	fix(docker): clean up orphaned container when docker run fails (salvage #7440 ) (#39412 ) When `docker run -d` fails after Docker has already created the container object (e.g. exit 125 when the daemon isn't ready, or a timeout mid image pull), the code raised before `self._container_id` was set — so the container leaked permanently in "Created" state. Reported in #7439: 110+ orphaned containers accumulated over 3 days from hourly cron- scheduled gateway sessions hitting a Docker Desktop startup race. The orphan reaper added in #33645 (reap_orphan_containers) does NOT cover this case: it filters `status=exited`, but a failed-create container is in `Created` state, so it slips through and is never reaped. Wrap the `docker run -d` call in try/except and `docker rm -f` the container by its known name before re-raising. Salvages #7440 by @Tranquil-Flow. Their branch predated the cross-process reuse + labels rework on `main`, so a cherry-pick conflicted; reconstructed the same intent (plus their two regression tests, adapted to mock the new reuse `docker ps` probe) against current `main`. Verified adversarially: reverted just the product change to origin/main's `docker.py`, ran the two new tests -> both FAIL with `assert 0 == 1 ("docker rm should be called once")`. With the fix applied, both pass; full test_docker_environment.py is 65/65 green. Closes #7440. Fixes #7439. Co-authored-by: Evi Nova <66773372+Tranquil-Flow@users.noreply.github.com>	2026-06-05 10:19:08 +10:00
Evi Nova	4690bbc363	fix(local): recognize unqualified hostnames as local endpoints (#9248 ) Docker Compose service names (e.g. ollama, litellm, hermes-litellm) are unqualified hostnames with no dots. These are always local — they resolve via Docker DNS, /etc/hosts, or mDNS. Without this fix, the stale stream timeout fires on local LLM proxies, causing infinite reconnect loops. Closes #7905	2026-06-05 10:18:10 +10:00
flooryyyy	e7a7872a87	fix(tui_gateway): dedup re-queued process notifications flooding TUI _ notification_poller_loop_ re-emits status.update every cycle when a background process completes while the session is busy. The same completion event gets re-queued and re-emitted to the TUI every few ms, flooding the transcript with duplicate lines. Add _notification_event_dedup_key(evt) that returns a tuple identity for each notification event. Only emit status.update on first sight per identity: - completions: (sid, type) — one-shot per process session - watch_match: (sid, type, command, pattern, output, ...) - watch_overflow/disabled: (sid, type, command, message, ...) The dedup key design was refined from an initial sid:type approach after @lordbuffcloud identified that distinct watch_match events (READY vs DONE) for the same process would be incorrectly collapsed. Tests from @tymrtn cover distinct watch matches, exact replay dedup, and completion one-shot behavior. Co-authored-by: tymrtn <ty@tmrtn.com>	2026-06-04 16:56:34 -07:00
Shannon Sands	2f0c8e90e6	Add Telegram QR onboarding to dashboard	2026-06-04 16:55:27 -07:00
Teknium	5300727a08	revert: keep Google Chat OAuth secret + active_provider profile-scoped (#39398 ) * Revert "fix(gateway): anchor Google Chat OAuth client secret to default Hermes root" This reverts commit `fff0561441`. * Revert "fix(cli): honor global-root active_provider fallback for named profiles" This reverts commit `3858cf4307`. * docs(google_chat): describe OAuth client secret as profile-scoped, not host-wide The setup docs, oauth docstring, and the adapter's 'no credentials' error message all described the Google Chat OAuth client secret as host-wide shared infrastructure. That contradicts profile isolation: profiles are separate auth boundaries, so two profiles can point at different Google OAuth apps / accounts. Reword all three to say the secret is profile-scoped and each profile registers its own.	2026-06-04 16:54:40 -07:00
Dusk	495c3733d8	fix(config): bridge docker_volumes and docker_forward_env in config set (#38611) Co-authored-by: Ben Barclay <ben@nousresearch.com>	2026-06-05 09:31:01 +10:00
Brooklyn Nicholson	cf9dc366dd	refactor(desktop): drop per-session icons, read-only cross-profile reads The per-session icon picker added more noise than value — rip it out end to end (sessions.icon column, set_session_icon, the PATCH field, the picker UI, and the SessionInfo.icon type). The cross-profile session aggregator now opens each profile's state.db read-only (mode=ro, no schema init), so listing other profiles on every sidebar refresh never DDLs or takes a write lock on their live DBs. The single-profile hot path stays on par with /api/sessions.	2026-06-04 18:24:35 -05:00
Brooklyn Nicholson	b94b3622b5	feat(desktop): per-session profile switching + cross-profile sessions Add first-class profile support to the desktop app without app reloads. - Swap the single live gateway onto a session's profile lazily (spawned on demand by the Electron backend pool), so one backend serves the active profile and others stay cold — no OOM with many profiles. - Aggregate sessions across profiles by reading each profile's state.db read-only; unified "All profiles" view groups sessions per profile with per-profile pagination, while the default view stays scoped to one profile. - Add an Arc-style profile rail at the sidebar foot: a default<->all toggle pinned left, colored named-profile squares scrolling between, Manage pinned right. Profile identity is a deterministic per-name color. - Route profile-scoped REST (config/env/skills/tools/model) to the active gateway profile and invalidate React Query caches on swap. Single-profile users never trigger a swap, so their path is unchanged. Backend: - web_server: profile-aware active/list endpoints + per-profile session totals; hermes_state: session_count(exclude_children); main.py: honor --profile over HERMES_HOME env for pooled backends. UI primitives: - Add a position-aware Tip tooltip (instant, themed) as a drop-in for native title=, and strip redundant tooltips from self-descriptive chrome.	2026-06-04 16:35:34 -05:00
Austin Pickett	acce1a2452	feat(desktop): polish credentials settings and messaging env routing (#39217) * feat(desktop): polish credentials settings and messaging env routing Align Provider API Keys and Tools & Keys with Advanced ListRow inputs, add Tools & Keys sidebar subnav, move platform env vars to Messaging via channel_managed discovery, strip toolset emojis, and condense cron actions. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(desktop): align Messaging credential inputs with settings ListRow style Remove monospace inputs and use CREDENTIAL_CONTROL_CLASS + ListRow layout to match Provider API Keys and Tools & Keys. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-04 14:01:15 -04:00
liuhao1024	a3fb48b2ce	fix(state): keep /branch sessions visible after parent reopen /branch (aka /fork) sessions vanished from /resume and /sessions. Both surfaces funnel through list_sessions_rich(include_children=False), which hid any session with a parent_session_id unless identified as a branch via a heuristic — parent.end_reason == 'branched' AND child.started_at >= parent.ended_at. Two ways that heuristic failed: 1. CLI/gateway branches: once the parent was reopened (e.g. resumed) and re-ended with a different end_reason (tui_shutdown overwriting 'branched'), the heuristic stopped matching and the branch was hidden permanently. 2. TUI branches (tui_gateway session.branch): the TUI never ends the parent as 'branched' — it creates the child while the parent is still live — so the heuristic NEVER matched and TUI branches were hidden from the moment they were created (this is the macOS desktop app's primary symptom). Fix: persist a stable '_branched_from' marker in the branch session's model_config at creation time across ALL THREE branch paths (CLI cli.py, gateway gateway/run.py, and TUI tui_gateway/server.py), and OR a json_extract(model_config, '$._branched_from') IS NOT NULL check into the list_sessions_rich filter. The marker is immutable across the parent's lifecycle, so the branch stays visible regardless of how/whether the parent is ended. The legacy end_reason heuristic is kept (OR'd) so pre-existing branches remain visible. Subagent/compression children (no marker, parent not 'branched') stay correctly hidden. Fixes #20856. Approach by liuhao1024 (PR #20864); reimplemented on current main, extended to the TUI branch path (which the original missed), with regression tests for the reopen+re-end scenario and the TUI marker persistence.	2026-06-04 10:07:20 -07:00
Jeff	1f347ee543	fix(uv): move venv aside instead of gutting it in place on Windows rebuild hermes update can brick a Windows install. When 'hermes update --force' runs past the concurrent-process guard, rebuild_venv runs while the venv is still in use: shutil.rmtree(ignore_errors=True) deletes site-packages + certifi's cert bundle but can't remove the locked python.exe, leaving a half-gutted venv that uv venv then refuses to overwrite. Every later HTTPS call dies with FileNotFoundError for the missing cacert and there is no recovery. --clear alone (the `c136eb4de` retry path) does not fix the real lock case: when the locked interpreter is inside the venv being rebuilt, neither rmtree nor uv venv --clear can delete it. os.replace of the parent directory is allowed on Windows (a running .exe is tracked by handle, not path), so we move the old venv aside atomically to <venv>.old, rebuild with --clear in its place, and the still-running gateway/desktop keep using the moved-aside copy until they restart. If the venv genuinely can't be moved, we abort cleanly and leave it fully intact; if the rebuild fails, we restore the moved-aside copy. Folds in the call-site guards from #38511 (@f3rs3n): - rebuild_venv() returns False (and restores the backup) if uv exits 0 without producing an interpreter. - both hermes update venv-rebuild call sites abort with RuntimeError instead of continuing into dependency install when rebuild_venv() returns False. Also gitignore /venv.old/ so the update autostash (git stash --include-untracked) doesn't sweep the moved-aside venv on every run. Root-cause fix for #37881. Supersedes the --clear-only retry from `c136eb4de`. Co-authored-by: f3rs3n <32328813+f3rs3n@users.noreply.github.com>	2026-06-04 12:18:38 -04:00
rexdotsh	ee7948ea6e	fix(deps): exclude dev tooling from all extra	2026-06-04 08:54:38 -07:00
kshitijk4poor	8077e7d2fb	fix(tui): narrow resume lock to avoid blocking session.close The salvaged fix held _session_resume_lock across _make_agent (MCP discovery + AIAgent construction, seconds), serializing it against session.close. Since session.close runs on the main RPC dispatch thread (not a _LONG_HANDLER), a close racing a mid-build resume would stall all fast-path RPCs (approval.respond, session.interrupt). Restructure to double-checked locking: build the agent outside the lock, then re-check _find_live_session_by_key under the lock before _init_session. A losing concurrent resume discards its just-built agent (no worker/poller wired yet) and reuses the winner. Updated the concurrent-resume regression test to assert the real invariant (one surviving live session + loser agent closed) rather than the implementation detail of a single _make_agent call.	2026-06-04 08:18:26 -07:00
rexdotsh	bd6d098762	fix(tui): keep resumed live history current	2026-06-04 08:18:26 -07:00
rexdotsh	98903d0313	fix(tui): reuse live session on resume	2026-06-04 08:18:26 -07:00
kyssta-exe	30412a9771	fix(cron): re-validate stale cron-output entries before deletion (#37721 ) quick() and dry_run() previously trusted the stored category from tracked.json without re-validating at delete time. Stale entries from before #34840 could carry category="cron-output" for cron control-plane paths (e.g. cron/jobs.json), causing quick() to delete the live scheduler registry. Fix: - Fix guess_category() to only classify cron/output/** as cron-output (was classifying ALL cron/* paths, missing the #34840 fix). - Re-validate cron-output entries via guess_category() at delete time in quick() and dry_run(); stale entries that are no longer classified as cron-output are skipped and removed from tracked.json. - Add _is_protected_cron_path() as a hard defense-in-depth guard that blocks deletion of cron/cronjobs directories and known control-plane files (jobs.json, .tick.lock) regardless of stored category. - Update test_cron_subtree_categorised to match fixed guess_category (only cron/output/* is cron-output, not all of cron/). Tests: add 5 regression tests in TestStaleCronEntryMigration.	2026-06-04 07:52:04 -07:00
CryptoByz	693f4c7e9c	fix(gateway): clear zombie agent slot when session_reset races in-flight run A session_reset (/new, /cc) that bumps the run generation while an agent turn is in flight left the dead agent in the _running_agents slot: the in-flight run's own release is generation-guarded and correctly returns False, and the outer finally's sentinel-only check also missed the leftover real agent. The session then silently dropped every subsequent message as 'agent busy' until a full gateway restart. (#28686) - _process_message_or_command outer finally now calls the unconditional, idempotent _release_running_agent_state(key) on all exit paths instead of the sentinel-vs-else branch that could strand a dead agent. - _handle_reset_command evicts the slot right after bumping the generation, so the zombie is cleared at reset time regardless of how the in-flight run unwinds. Co-authored-by: CryptoByz <cryptobyz.airdrop@gmail.com>	2026-06-04 07:50:45 -07:00
teknium1	2982122be7	fix(gateway): deliver $HOME deliverables on root-run gateways Root-run gateways have $HOME=/root, which is on the MEDIA system-path denylist, so the gateway silently dropped agent-generated deliverables under /root (e.g. /root/work/proposal.docx) — the user got a 'here is your file' reply with nothing attached. _path_under_denied_prefix now treats the running user's own home as deliverable: the home tree itself is no longer denied, while the more-specific denied paths inside it (~/.ssh, ~/.aws, ~/.hermes/.env, auth.json, config.yaml) stay blocked because they are separate denylist entries. The exception only matches when the denied prefix IS $HOME, so a non-root gateway still can't deliver another user's home. Diagnosis, reproduction, and the failing-case analysis are from @GodsBoy (#38108 / #38106). Implemented here as the minimal denylist fix rather than a staging/copy subsystem. Co-authored-by: GodsBoy <dhuysamen@gmail.com>	2026-06-04 07:50:22 -07:00
Teknium	580d924097	perf(desktop): make session-id search SQL-bounded, not O(n) search_sessions_by_id previously fetched up to 10k sessions via list_sessions_rich and filtered them in Python — O(n) per keystroke. Push the id match into SQL instead. - list_sessions_rich gains an optional id_query param: a case-insensitive LIKE pushed into the outer WHERE, matched against each surfaced row's id AND every id in its forward compression chain (via the existing chain CTE). Searching a compression root id or a tip id both resolve to the same projected conversation. LIKE wildcards in the needle are escaped. - search_sessions_by_id now fetches only matching rows (limit*4) and ranks exact > prefix > substring in Python over that small set. - web_server /api/sessions/search: route ID matches and content matches through one lineage-keyed dedup helper so an id-hit and a content-hit on the same conversation collapse to a single result (the contributor's version keyed ID hits by raw sid and content hits by root, which could double-list a compression tip). - command-center haystack also matches _lineage_root_id for parity. E2E verified against a real DB: exact match over 3000+ sessions materializes 1 row in Python (was ~3000), 5ms; root-id resolves to tip; LIKE-wildcard escaping holds. Follow-up to @0xharryriddle's feat(desktop): search sessions by id.	2026-06-04 07:49:34 -07:00
Harry Riddle	9ecc331be8	feat(desktop): search sessions by id	2026-06-04 07:49:34 -07:00
worlldz	081694c111	fix(kanban): isolate board override per concurrent call	2026-06-04 07:39:53 -07:00
Teknium	fef04a197e	fix(desktop): purge electron cache unconditionally, not via stdlib zipfile gate The salvaged detector validated each cached electron-*.zip with zipfile.testzip() and only purged ones it judged corrupt. But stdlib zipfile reads from the end-of-central-directory backward, so it silently tolerates prepended/concatenated junk — which is exactly the corruption the bug report names ('86257938 extra bytes at beginning or within zipfile', a partial download resumed into the same file). testzip() returns clean on those zips, so the self-heal never fired for the reported failure mode. Drop the self-rolled validator: on any packaged-build failure, purge the version's cached zips AND the half-written unpacked dir, then retry once. @electron/get re-downloads with its own SHASUM verification — the real source of truth, which catches prepend/concat/truncate alike. An unrelated failure just costs one clean re-download and fails the same way. Verified empirically: zipfile.testzip() returns None (clean) on a prepended-junk zip; the unconditional purge removes it correctly.	2026-06-04 07:17:33 -07:00
Harry Riddle	f583c6ebd5	fix(desktop): recover from corrupt cached Electron download on build hermes desktop failed on Linux with an ENOENT renaming release/linux-unpacked/electron -> Hermes. Root cause is a corrupt cached Electron zip (~/.cache/electron/electron-.zip): app-builder unpack-electron extracts a partial tree from the bad zip that is missing the electron binary, so electron-builder dies on the final rename. Re-running repeats the broken extraction, leaving the desktop app permanently unlaunchable until the cache is manually purged. - Add _electron_download_cache_dirs() + _purge_corrupt_electron_cache() to hermes_cli/main.py: validate every electron-.zip via zipfile.testzip() and delete corrupt ones; honor electron_config_cache / ELECTRON_CACHE overrides with per-OS defaults. - Wire purge + single retry into cmd_gui packaged-build failure path so a poisoned download self-heals (electron re-downloads clean). - Add beforePack hook (apps/desktop/scripts/before-pack.cjs) to wipe the target unpacked dir before staging, making packaging idempotent across interrupted runs. Cross-platform, best-effort. - Tests: corrupt-zip detector, cmd_gui purge/retry/launch path, no-retry-when-clean path, and node --test for the cleanup helper.	2026-06-04 07:17:33 -07:00
Frowtek	3858cf4307	fix(cli): honor global-root active_provider fallback for named profiles	2026-06-04 07:08:30 -07:00
Frowtek	b7169f9bbb	fix(gateway): keep pending /update completion notifications until the target platform reconnects	2026-06-04 06:56:28 -07:00
Frowtek	fff0561441	fix(gateway): anchor Google Chat OAuth client secret to default Hermes root	2026-06-04 06:45:32 -07:00
Frowtek	07f5382675	fix(gateway): don't treat dm_policy: pairing as open access on own-policy adapters	2026-06-04 06:31:28 -07:00
teknium1	dd4ba4c2c4	fix(vision): cap pixel dimensions proactively at embed time + declare Pillow Follow-up to the salvaged #37727. That PR fixed the reactive recovery path (classifier + post-failure shrinker) but left the PROACTIVE embed-time guard in vision_tools byte-only — a tall small-byte screenshot (e.g. 1200x12000 at 0.06 MB) still baked into immutable history un-resized, relying on a failed round-trip to trigger reactive shrink. - vision_tools: add _image_exceeds_dimension() + _EMBED_MAX_DIMENSION (7900px); the embed-time cap now fires on bytes OR pixels and passes max_dimension to the resizer, so tall small-byte images are shrunk before they're embedded. - vision_tools: best-effort lazy-install of Pillow (tool.vision) in the resize ImportError fallback so the soft dep self-heals (respects allow_lazy_installs). - error_classifier: add two more Anthropic dimension-cap wording variants. - pyproject + lazy_deps: declare Pillow as the [vision] extra / tool.vision lazy dep (it was undeclared everywhere; without it ALL resize recovery no-ops). - tests: cover _image_exceeds_dimension (tall/small/edge/no-Pillow/corrupt). Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>	2026-06-04 06:16:45 -07:00
kyssta-exe	6bdbe30763	fix(vision): guard image pixel dimensions, not just bytes (#37677) Anthropic enforces two independent ceilings per image: 1. 5 MB encoded byte size 2. 8000 px longest side Hermes only guarded #1. A tall screenshot (e.g. 1200x12000 at 0.06 MB) passes every byte check but fails the pixel check, returning a non-retryable HTTP 400 that permanently bricks the conversation thread. Fixes: - error_classifier: add 'image dimensions exceed' pattern to _IMAGE_TOO_LARGE_PATTERNS so the 400 is classified as image_too_large and triggers the shrink/retry path instead of falling through to non-retryable error. - conversation_compression: check pixel dimensions (via Pillow) even when byte size is under the 4 MB target. If max(dims) > 8000, force shrink. - vision_tools._resize_image_for_vision: add optional max_dimension param. When set, images exceeding the pixel cap are downscaled even if they're under the byte budget. The resize loop now checks both byte AND pixel limits before accepting a candidate. Closes #37677	2026-06-04 06:16:45 -07:00
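
The two-ceiling check described in 6bdbe30763 and dd4ba4c2c4 can be illustrated with a minimal sketch. This is not the code from vision_tools: the names `needs_shrink` and `fit_within` are hypothetical, the limits are the 5 MB / 8000 px figures quoted in the commit message, and the width and height are assumed to be already known (the real code reads them via Pillow).

```python
# Hypothetical sketch of a bytes-OR-pixels guard; not the actual Hermes implementation.
MAX_BYTES = 5 * 1024 * 1024   # byte ceiling quoted in the commit message
MAX_DIMENSION = 8000          # longest-side pixel ceiling quoted in the commit message


def needs_shrink(byte_size: int, width: int, height: int,
                 max_bytes: int = MAX_BYTES,
                 max_dimension: int = MAX_DIMENSION) -> bool:
    """Return True if the image breaks either the byte or the pixel ceiling."""
    return byte_size > max_bytes or max(width, height) > max_dimension


def fit_within(width: int, height: int,
               max_dimension: int = MAX_DIMENSION) -> tuple[int, int]:
    """Scale (width, height) so the longest side is at most max_dimension."""
    longest = max(width, height)
    if longest <= max_dimension:
        return width, height
    # Integer arithmetic avoids float rounding surprises; clamp to at least 1 px.
    return (max(1, width * max_dimension // longest),
            max(1, height * max_dimension // longest))


# The tall screenshot from the commit message: tiny in bytes, too tall in pixels.
tall_bytes = int(0.06 * 1024 * 1024)
print(needs_shrink(tall_bytes, 1200, 12000))  # byte check alone would pass this image
print(fit_within(1200, 12000))
```

A byte-only guard accepts the 1200x12000 example, which is why the commits add the pixel branch to both the proactive and the reactive path.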

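The observation behind fef04a197e, that stdlib `zipfile` accepts an archive with junk prepended, can be reproduced with a small standalone sketch. The helper names below are hypothetical and the archive is built in memory rather than taken from a real Electron cache; the checksum comparison only stands in for the SHASUM verification the commit attributes to @electron/get.

```python
import hashlib
import io
import zipfile


def make_zip_bytes() -> bytes:
    """Build a small valid zip in memory (stand-in for a cached download)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("electron", b"binary payload")
    return buf.getvalue()


def stdlib_says_clean(data: bytes) -> bool:
    """True if zipfile opens the archive and testzip() finds no bad member."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.testzip() is None


clean = make_zip_bytes()
# Simulate a partial download resumed into the same file: junk before the real zip.
corrupt = b"\x00" * 1024 + clean

print(stdlib_says_clean(clean))
print(stdlib_says_clean(corrupt))  # zipfile locates the central directory from the end
print(hashlib.sha256(clean).hexdigest() == hashlib.sha256(corrupt).hexdigest())
```

Because `testzip()` reports both archives as clean while the digests differ, a checksum comparison is the check that actually distinguishes them, which matches the commit's choice to purge unconditionally and let the re-download be verified.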

4928 Commits