hermes-agent

Author	SHA1	Message	Date
kewe63	19db9cd076	fix(acp): replace direct db._lock/_conn access with public update_session_meta() session.py _persist() bypassed SessionDB's thread-safe write path by accessing private internals db._lock and db._conn directly: with db._lock: db._conn.execute("UPDATE sessions SET model_config = ? ...") db._conn.commit() This was fragile for three reasons: 1. It bypassed _execute_write()'s BEGIN IMMEDIATE + jitter-retry logic, so concurrent writes could hit SQLite BUSY without retrying. 2. It called db._conn.commit() manually, breaking the transactional contract that _execute_write() enforces. 3. Any internal rename of _lock or _conn would silently break this call site with an AttributeError at runtime. Fix: - Add SessionDB.update_session_meta(session_id, model_config_json, model) to hermes_state.py. Routes through _execute_write() for the standard BEGIN IMMEDIATE + lock + jitter-retry guarantee. Uses COALESCE so passing model=None leaves the stored model column unchanged. - Replace the db._lock / db._conn block in session.py _persist() with a single db.update_session_meta() call. Tests (tests/acp/test_session_db_private_access.py, 11 tests): - Unit tests for update_session_meta: updates model_config, updates model, preserves existing model on None, routes through _execute_write, no-op on non-existent session. - AST checks: db._lock and db._conn not referenced in session.py; _persist() calls update_session_meta(). - Integration round-trips: cwd and model persisted correctly; COALESCE prevents overwriting an existing model with NULL.	2026-06-04 17:54:59 -07:00
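A minimal sketch of the write path this commit describes. It assumes a bare `sqlite3` connection and a simplified `_execute_write()` that leaves out the jitter-retry loop on SQLITE_BUSY. `SessionDBSketch` and the table layout are hypothetical. Only the method name `update_session_meta` and the COALESCE behaviour come from the message above.

```python
import sqlite3
import threading
from typing import Optional


class SessionDBSketch:
    """Hypothetical stand-in for SessionDB, reduced to the pieces the pattern needs."""

    def __init__(self, path: str = ":memory:") -> None:
        # isolation_level=None: autocommit mode, so BEGIN/COMMIT are issued explicitly below.
        self._conn = sqlite3.connect(path, isolation_level=None, check_same_thread=False)
        self._lock = threading.Lock()
        self._conn.execute(
            "CREATE TABLE sessions (id TEXT PRIMARY KEY, model TEXT, model_config TEXT)"
        )

    def _execute_write(self, sql: str, params: tuple) -> int:
        # Single serialized write path: lock + BEGIN IMMEDIATE + commit/rollback.
        # (The real method also retries with jitter on SQLITE_BUSY; omitted here.)
        with self._lock:
            self._conn.execute("BEGIN IMMEDIATE")
            try:
                cur = self._conn.execute(sql, params)
                self._conn.execute("COMMIT")
                return cur.rowcount
            except Exception:
                self._conn.execute("ROLLBACK")
                raise

    def update_session_meta(
        self, session_id: str, model_config_json: str, model: Optional[str] = None
    ) -> int:
        # COALESCE(?, model) keeps the stored model column when model is None.
        return self._execute_write(
            "UPDATE sessions SET model_config = ?, model = COALESCE(?, model) WHERE id = ?",
            (model_config_json, model, session_id),
        )
```

Passing `model=None` binds SQL NULL, so `COALESCE(?, model)` falls back to the stored value. An UPDATE against a missing id touches zero rows, which gives the no-op behaviour the tests describe.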
teknium1	d33d23c852	fix(vision): drop models.dev catalog fallback, keep explicit profile flag The models.dev supports_vision field reflects model IMAGE-INPUT capability, which is not the same contract as 'provider API accepts images inside tool-result messages' — the looser heuristic could re-introduce the exact HTTP 400 'text is not set' it aims to fix. Keep only the explicit, opt-in ProviderProfile.supports_vision flag (set on xiaomi); add catalog-based detection later if a concrete provider needs it.	2026-06-04 17:53:49 -07:00
Kewe63	f736d2be86	fix(vision): detect vision-capable custom providers via ProviderProfile flag _supports_media_in_tool_results() had a hardcoded provider allowlist that missed custom providers and newer vision-capable providers like xiaomi. Added ProviderProfile.supports_vision flag and made the function check: 1. Registered provider profile (supports_vision flag) 2. Model capabilities from models.dev catalog (supports_vision) 3. Existing hardcoded allowlist (unchanged) This fixes HTTP 400 "text is not set" errors when vision-capable custom providers receive text-only tool results instead of multipart image content. Related: #25594	2026-06-04 17:53:49 -07:00
Kewe63	4a4b9bd2dc	fix(test): add platform guard for grp import Tests in test_gateway_service.py imported grp inline without a platform guard, causing ImportError on systems where grp is unavailable (e.g. macOS, WSL without grp module). Added pytest.importorskip('grp') at module level alongside the existing pwd guard, and removed three redundant inline import grp statements. Fixes #24531	2026-06-04 17:52:50 -07:00
bedirhancode	99cee124dc	docs(install): warn that VPS browser consoles mangle special chars (#36279) (#38811) Some VPS providers (Hetzner Cloud and others) offer a browser-based console for managing hosts. These consoles transmit special characters incorrectly — ':' may arrive as ';', '@' may be mis-rendered, and non-English keyboard layouts fare worse — which silently corrupts 'docker run' arguments like '-v ~/.hermes:/opt/data', '-e KEY=value', and pasted API keys / tokens. Adds a :::caution admonition above the Quick start 'docker run' block in website/docs/user-guide/docker.md recommending SSH for copy-paste-safe command entry, with manual-typing guidance as a fallback. Pure docs change, no code touched. Closes #36279 Co-authored-by: Bedirhan Celayir <bedirhancode@users.noreply.github.com>	2026-06-05 10:49:55 +10:00
ethernet	36f1cd7dea	feat(installer): do shallow clones no need to get the whole repo history :)	2026-06-04 17:49:16 -07:00
teknium1	0538c5ed19	chore: add dirtyren to AUTHOR_MAP for PR #38177 salvage	2026-06-04 17:42:10 -07:00
dirtyren	74e845c000	fix(slack): pass thread_ts in standalone send_message tool path The standalone `_send_slack()` function used by the send_message tool and cron delivery fallback was not passing `thread_ts` to the Slack API, causing messages to post to the top-level channel instead of inside threads. - Add `thread_ts` parameter to `_send_slack()` - Include `thread_ts` in the chat.postMessage payload when present - Pass `thread_id` from `_send_to_platform()` to `_send_slack()` Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-06-04 17:42:10 -07:00
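A sketch of the payload change. It assumes the message is sent with Slack's `chat.postMessage`, whose `channel`, `text` and `thread_ts` arguments are documented by Slack. `build_slack_payload` is a hypothetical helper and not the function in the codebase.

```python
from typing import Any, Optional


def build_slack_payload(channel: str, text: str, thread_ts: Optional[str] = None) -> dict[str, Any]:
    """Hypothetical helper: build a chat.postMessage body, threading it when possible."""
    payload: dict[str, Any] = {"channel": channel, "text": text}
    if thread_ts:
        # Without thread_ts, chat.postMessage posts to the channel's top level.
        payload["thread_ts"] = thread_ts
    return payload
```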
Teknium	fe4e327bb5	chore: add Kewe63 to release AUTHOR_MAP	2026-06-04 17:40:33 -07:00
Kewe63	c14c37d46b	fix(openviking): add missing /agent/{agent}/ segment to memory URI — fixes #36969 _build_memory_uri produced URIs of the form: viking://user/{user}/memories/{subdir}/mem_{slug}.md The /agent/{agent}/ segment was missing, causing every agent under the same user to write into the same flat namespace. In multi-agent deployments agents silently overwrite each other's memories and vector retrieval cross-pollinates results. self._agent was already populated correctly (from OPENVIKING_AGENT env var, default 'hermes') and sent via X-OpenViking-Agent header — it was simply not interpolated into the URI. Fix: add the missing segment so URIs follow the documented shape: viking://user/{user}/agent/{agent}/memories/{subdir}/mem_{slug}.md Tests: 4 new regression tests in TestOpenVikingMemoryUriBuilder, 13/13 passed (9 existing + 4 new).	2026-06-04 17:40:33 -07:00
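The corrected URI shape, written as a small function. The signature is hypothetical. The `OPENVIKING_AGENT` variable and its `hermes` default are taken from the commit message.

```python
import os
from typing import Optional


def build_memory_uri(user: str, subdir: str, slug: str, agent: Optional[str] = None) -> str:
    """Hypothetical builder for the documented viking:// memory URI shape."""
    agent = agent or os.environ.get("OPENVIKING_AGENT", "hermes")
    # The /agent/{agent}/ segment gives each agent its own namespace under the user.
    return f"viking://user/{user}/agent/{agent}/memories/{subdir}/mem_{slug}.md"
```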
Teknium	b20fcffa54	docs: make dashboard/gateway prerequisites explicit for remote-backend connection (#39128) Both the desktop and web-dashboard remote-backend sections now state up front that the 'remote backend' is a running 'hermes dashboard' process the desktop app attaches to (it does not start it for you), and that the gateway is a separate process needed only for messaging channels.	2026-06-04 17:38:49 -07:00
Ben Barclay	8a888441d7	fix(docker): recover from out-of-band container removal in persistent mode (salvage #36631) (#39415) Salvage of #36631 (@annguyenNous), rebased onto current main with regression tests added. Fixes #36266. When a persistent Docker sandbox container is removed out-of-band (idle reaper, `docker prune`, OOM kill, daemon restart), the gateway kept issuing `docker exec` against the dead container ID, returning "No such container" on every subsequent tool call — the agent was permanently blocked until the gateway process restarted. DockerEnvironment.execute() now detects the "No such container" / "is not running" error after a non-zero exit (gated on persist_across_processes) and calls _recreate_container(): it tries label-based reuse first, falls back to a fresh container replaying the same image + full all_run_args set, re-runs init_session(), and retries the command once. A genuine non-zero exit is NOT misclassified as container-gone. Differs from #36631 as submitted: adds the tests the original lacked. tests/tools/test_docker_environment.py covers _is_container_gone pattern matching (incl. the negative/control case), the recover-and-retry path, the persist_across_processes=False opt-out (no recovery), and the ordinary-failure passthrough (no spurious recreation). _make_dummy_env now forwards persist_across_processes. Verified: - Unit: 67/67 in test_docker_environment.py (4 new + existing). - Live E2E against the real docker daemon: started a persistent container, `docker rm -f`'d it out-of-band, and the next execute() transparently recreated a fresh container and succeeded; a follow-up command worked in the recovered container; a real `exit N` passed through without triggering recovery. Co-authored-by: annguyenNous <annguyenNous@users.noreply.github.com>	2026-06-05 10:33:44 +10:00
Ben Barclay	c54b935873	fix(desktop): rename session via session.title RPC so /title works (#39410) The desktop `/title <name>` command 404s with "Session not found" on every platform (reported on Windows in #38508). Root cause: `session.create` returns two distinct ids — a runtime session id (held in `activeSessionIdRef`) and a `stored_session_id` (the DB `sessions.id`) — and deliberately does NOT persist a DB row until the first turn. Routing `/title` through the REST `PATCH /api/sessions/{id}` endpoint (as #38576 proposed) resolves the id against the `sessions` table, so the runtime id — or any brand-new, not-yet-persisted session — never resolves and returns 404. This is an id-type mismatch, not a Windows file-locking quirk, so it fails on macOS and Linux too. Fix: route `/title <name>` through the gateway's `session.title` RPC — the exact path the TUI already uses (`ui-tui/.../slash/commands/core.ts`). The RPC maps the runtime id to the in-memory session, writes through the gateway's own DB connection, and queues the title (`pending: true`) when the row isn't persisted yet, so it works for a fresh chat. The sidebar is then refreshed via the existing `refreshSessions()` plumbing. Keeps the sidebar-refresh wiring and `refreshSessions` threading from #38576; replaces only the broken REST/slash-worker write path. A bare `/title` (no arg) still falls through to the worker to show the current title. Tests rewritten to assert `session.title` routing with the runtime-vs-stored id distinction (which the original mock collapsed), plus the queued/`pending` fresh-chat case and the error path. Supersedes #38576. Fixes #38508. Co-authored-by: xxxigm <54813621+xxxigm@users.noreply.github.com>	2026-06-04 19:32:24 -05:00
Teknium	fd87c61078	feat(models): add qwen/qwen3.7-plus to nous + openrouter catalogs (#39409) Adds qwen/qwen3.7-plus directly under qwen/qwen3.7-max in both the OpenRouter curated catalog (OPENROUTER_MODELS) and the Nous portal catalog (_PROVIDER_MODELS['nous']), then regenerates the docs-hosted model-catalog.json manifest from those source lists.	2026-06-04 17:29:45 -07:00
rob-maron	54cae7d1cb	switch model order	2026-06-04 17:29:31 -07:00
Teknium	2c98dc0a96	fix(desktop): offer remote sign-in on a gated-gateway boot failure (#39402) When a remote gateway with username/password (or OAuth) auth restarts, its session cookie lapses and Desktop boots into the recovery overlay with a session-expired error. That overlay only exposed local-recovery actions — Retry (resets the local bootstrap latch) and Repair (re-runs the installer) — neither of which can re-establish a remote session, so the user is stuck in a no-op Retry loop with no way to sign in again. The overlay now detects a remote-reauth boot failure from the saved connection config (remote + gated + not currently connected + has a URL) and surfaces a primary 'Sign in to remote gateway' button that opens the gateway login window (the username/password form for a basic gateway, the OAuth redirect otherwise) and reloads on success. Button copy is driven by a best-effort provider probe, matching the gateway-settings page. Detection and copy logic live in a pure helper module with unit coverage.	2026-06-04 17:28:29 -07:00
Ben Barclay	82c157b267	fix(docker): clean up orphaned container when docker run fails (salvage #7440) (#39412) When `docker run -d` fails after Docker has already created the container object (e.g. exit 125 when the daemon isn't ready, or a timeout mid image pull), the code raised before `self._container_id` was set — so the container leaked permanently in "Created" state. Reported in #7439: 110+ orphaned containers accumulated over 3 days from hourly cron-scheduled gateway sessions hitting a Docker Desktop startup race. The orphan reaper added in #33645 (reap_orphan_containers) does NOT cover this case: it filters `status=exited`, but a failed-create container is in `Created` state, so it slips through and is never reaped. Wrap the `docker run -d` call in try/except and `docker rm -f` the container by its known name before re-raising. Salvages #7440 by @Tranquil-Flow. Their branch predated the cross-process reuse + labels rework on `main`, so a cherry-pick conflicted; reconstructed the same intent (plus their two regression tests, adapted to mock the new reuse `docker ps` probe) against current `main`. Verified adversarially: reverted just the product change to origin/main's `docker.py`, ran the two new tests -> both FAIL with `assert 0 == 1 ("docker rm should be called once")`. With the fix applied, both pass; full test_docker_environment.py is 65/65 green. Closes #7440. Fixes #7439. Co-authored-by: Evi Nova <66773372+Tranquil-Flow@users.noreply.github.com>	2026-06-05 10:19:08 +10:00
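A sketch of the cleanup-on-failure pattern. It assumes the container is started under a known `--name` and that commands go through an injectable runner, a hypothetical seam that makes the failure path testable without a Docker daemon. `start_container` and the image arguments are placeholders.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical seam: anything that runs a command and returns a CompletedProcess.
Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def start_container(name: str, image: str, run: Runner) -> str:
    cmd = ["docker", "run", "-d", "--name", name, image, "sleep", "infinity"]
    try:
        result = run(cmd)
        result.check_returncode()  # raises CalledProcessError on a non-zero exit
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        # Docker may already have created the container object (left in "Created"
        # state) before failing, so remove it by its known name, then re-raise.
        run(["docker", "rm", "-f", name])
        raise
    return result.stdout.strip()  # container id printed by `docker run -d`
```

With a real daemon the runner could be `lambda cmd: subprocess.run(cmd, capture_output=True, text=True, timeout=120)`. Removing by name rather than by id matters here because a failed `docker run -d` may not have returned a container id yet.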
Evi Nova	4690bbc363	fix(local): recognize unqualified hostnames as local endpoints (#9248) Docker Compose service names (e.g. ollama, litellm, hermes-litellm) are unqualified hostnames with no dots. These are always local — they resolve via Docker DNS, /etc/hosts, or mDNS. Without this fix, the stale stream timeout fires on local LLM proxies, causing infinite reconnect loops. Closes #7905	2026-06-05 10:18:10 +10:00
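A sketch of the classification rule, assuming the endpoint arrives as a base URL. The loopback/private-IP and `.local` checks are illustrative assumptions. The no-dot rule for Docker Compose service names is the part this commit adds. `is_local_endpoint` is a hypothetical name.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(base_url: str) -> bool:
    host = (urlparse(base_url).hostname or "").lower()
    if not host:
        return False
    if host == "localhost" or host.endswith(".local"):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: an unqualified name with no dots (e.g. a Docker
        # Compose service such as "ollama") resolves via Docker DNS or /etc/hosts.
        return "." not in host
    return ip.is_loopback or ip.is_private or ip.is_link_local
```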
annguyenNous	751b91446e	fix(mcp): ensure server.shutdown() on probe iteration failure Wrap the _tools iteration in _probe_single_server() in try/finally so that server.shutdown() is called even if iterating tool metadata raises. Without this, the MCP server connection leaks until the event loop is torn down by _stop_mcp_loop().	2026-06-04 17:11:17 -07:00
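The shape of the fix, written as an async sketch. It uses a hypothetical server object that exposes a `tools` iterable and an awaitable `shutdown()`. The real MCP client objects may differ.

```python
from typing import Any


async def probe_single_server(server: Any) -> list[str]:
    try:
        # Iterating tool metadata may raise on a malformed server response.
        return [tool.name for tool in server.tools]
    finally:
        # Runs on success and on failure, so the connection never leaks.
        await server.shutdown()
```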
Ali Zakaee	454d6cbe52	fix(telegram): finalize sealed overflow chunk so split streamed replies render formatting The existing-message overflow split path in stream_consumer.run() sealed the first chunk via _send_or_edit(chunk) (finalize=False) then reset _message_id to None — so that chunk was never edited again and never received the adapter's final rich-text pass. On Telegram, MarkdownV2 formatting is applied on the finalize edit, so early split messages of a long multi-part streamed reply rendered raw markdown (##, bold, code fences) while only the last chunk rendered correctly. Fix: seal the overflow chunk with finalize=True so it gets its final formatting pass before _message_id is cleared. Salvaged from #32609 (the streaming-format portion only; the PR's send_draft parse_mode change is already superseded on main, and its media-roots change conflicts with the current denylist + recency-window delivery model).	2026-06-04 17:11:12 -07:00
flooryyyy	e7a7872a87	fix(tui_gateway): dedup re-queued process notifications flooding TUI _notification_poller_loop re-emits status.update every cycle when a background process completes while the session is busy. The same completion event gets re-queued and re-emitted to the TUI every few ms, flooding the transcript with duplicate lines. Add _notification_event_dedup_key(evt) that returns a tuple identity for each notification event. Only emit status.update on first sight per identity: - completions: (sid, type) — one-shot per process session - watch_match: (sid, type, command, pattern, output, ...) - watch_overflow/disabled: (sid, type, command, message, ...) The dedup key design was refined from an initial sid:type approach after @lordbuffcloud identified that distinct watch_match events (READY vs DONE) for the same process would be incorrectly collapsed. Tests from @tymrtn cover distinct watch matches, exact replay dedup, and completion one-shot behavior. Co-authored-by: tymrtn <ty@tmrtn.com>	2026-06-04 16:56:34 -07:00
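A sketch of the identity-based dedup. It assumes events are dicts with hypothetical `session_id`, `type`, `command`, `pattern`, `output` and `message` keys. The key shapes follow the list in the commit message, limited to the fields it names explicitly.

```python
from typing import Any, Hashable


def notification_event_dedup_key(evt: dict[str, Any]) -> Hashable:
    """Return a tuple identity for a process notification event."""
    sid, etype = evt.get("session_id"), evt.get("type")
    if etype == "completion":
        # Completions are one-shot per process session.
        return (sid, etype)
    if etype == "watch_match":
        # Distinct matches for the same process (e.g. READY vs DONE) must not collapse.
        return (sid, etype, evt.get("command"), evt.get("pattern"), evt.get("output"))
    # watch_overflow / watch_disabled and anything else.
    return (sid, etype, evt.get("command"), evt.get("message"))


class NotificationDeduper:
    def __init__(self) -> None:
        self._seen: set[Hashable] = set()

    def should_emit(self, evt: dict[str, Any]) -> bool:
        key = notification_event_dedup_key(evt)
        if key in self._seen:
            return False  # re-queued copy of an event already shown
        self._seen.add(key)
        return True
```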
Shannon Sands	2f0c8e90e6	Add Telegram QR onboarding to dashboard	2026-06-04 16:55:27 -07:00
Teknium	5300727a08	revert: keep Google Chat OAuth secret + active_provider profile-scoped (#39398) * Revert "fix(gateway): anchor Google Chat OAuth client secret to default Hermes root" This reverts commit `fff0561441`. * Revert "fix(cli): honor global-root active_provider fallback for named profiles" This reverts commit `3858cf4307`. * docs(google_chat): describe OAuth client secret as profile-scoped, not host-wide The setup docs, oauth docstring, and the adapter's 'no credentials' error message all described the Google Chat OAuth client secret as host-wide shared infrastructure. That contradicts profile isolation: profiles are separate auth boundaries, so two profiles can point at different Google OAuth apps / accounts. Reword all three to say the secret is profile-scoped and each profile registers its own.	2026-06-04 16:54:40 -07:00
bluefishs	6ad015255d	chore: enforce LF line endings for container entrypoints (#12181) Windows contributors checking out on NTFS with git's default core.autocrlf will end up with CRLF in docker/entrypoint.sh. When COPY'd into the image and invoked as ENTRYPOINT, the kernel interprets the trailing \r as part of the interpreter path, producing a confusing 'no such file or directory' despite the file being present and executable. Lock LF for the usual suspects (.sh, Dockerfile, .dockerfile, and the specific docker/entrypoint.sh). The existing tree is already LF; this is preventive against future Windows regressions only.	2026-06-05 09:54:01 +10:00
zer0 spirits	eb43a5b5d8	chore: improve .dockerignore with Python and common patterns (#6092) Co-authored-by: 欧阳 <archer@ouyangdeMac-mini.local>	2026-06-05 09:53:42 +10:00
Ben Barclay	b434f8c3e0	fix(deps): promote markdown to a core dependency so rich delivery works out of the box (#32486) (#38649) `markdown` was declared only in the `matrix` optional extra, and the official Docker image installs `--extra all --extra messaging --extra anthropic --extra bedrock --extra azure-identity --extra hindsight` — notably NOT `--extra matrix` (the matrix extra is deliberately routed to lazy-install because `mautrix[encryption]`/`python-olm` can't build on Windows/macOS — see the 2026-05-12 policy comment in `[all]`). Result: `markdown` never lands in the image venv, so the Markdown->HTML conversion on the DEFAULT delivery path silently falls back to plain text. Cron/agent deliveries render raw `##`/`**`/tables in clients like Element (no `formatted_body`). The conversion is now used by BOTH `gateway/platforms/matrix.py` and `tools/send_message_tool.py`, so it is no longer matrix-specific. `markdown` is a pure-Python `py3-none-any` wheel (~108KB, no compiled extensions, no platform constraints), so none of the reasons the matrix extra was lazy-routed apply to it. Promote it to a core dependency so it ships in the wheel, the Docker image, and every install; drop the now-redundant copies from the `matrix` extra and the `platform.matrix` lazy-deps group; refresh the stale "installed with the matrix extra" docstring. Verified against a real build: ran the image's exact `uv sync` command (same extras, no `--extra matrix`) in a clean container off the new lockfile -> `import markdown` succeeds (3.10.2). On `origin/main` the same command leaves markdown absent. 223 targeted tests pass (test_matrix.py + test_lazy_deps.py). Closes #32486.	2026-06-04 16:46:36 -07:00
Dusk	495c3733d8	fix(config): bridge docker_volumes and docker_forward_env in config set (#38611) Co-authored-by: Ben Barclay <ben@nousresearch.com>	2026-06-05 09:31:01 +10:00
Ben	825629424d	fix(tui): persist timed-out/cancelled clarify prompts in transcript When a clarify prompt times out (backend _block returns an empty answer after the configured timeout) or is dismissed with Esc/Ctrl+C, the live ClarifyPrompt overlay was torn down by turnController.idle() -> resetFlowOverlays() with no persistent transcript record. The question and options vanished from the screen while the agent's follow-up still referred to "the options above". The answered path already persists the question + answer; only the unanswered exits left no trace. This asymmetry is the bug. Fix (TUI layer only, no Python/protocol change): - formatAbandonedClarify() in lib/text.ts renders the question + the same 1-based numbered option list shown by ClarifyPrompt, plus a reason ('timed out' / 'cancelled'). - Timeout: createGatewayEventHandler flushes a still-live clarify into the transcript as a plain system line when the clarify tool's own tool.complete fires. A live capture of the event stream confirmed this is the only point where the overlay is still set after a timeout: the sequence is clarify.request -> (timeout) -> tool.complete -> message.complete, with NO intervening message.start/tool.start. On a real answer, answerClarify() clears the overlay before tool.complete arrives, so the hook no-ops there (no double-write); a per-requestId guard set is belt-and-braces. - Explicit cancel: answerClarify('') persists the prompt as a system line instead of a transient 'prompt cancelled' flash. System lines always render (unlike trail lines, which /details can hide), so the record reliably survives on screen as standard output. Verified live in the TUI: an Esc-cancelled clarify now leaves the question + options + '(cancelled - no selection)' in the transcript after the turn ends. 
Tests: formatAbandonedClarify unit cases + gateway-handler behavioral cases (persist on clarify tool.complete, no flush on a non-clarify tool.complete, no double-persist on repeat tool.complete, no-op when the overlay was already cleared by an answer).	2026-06-04 16:25:54 -07:00
Austin Pickett	dfd6bcf1ff	fix(desktop): restore accordion expand for credential settings rows (#39327) * fix(desktop): restore accordion expand for credential settings rows Reintroduce collapsible provider and tool key rows so descriptions, docs links, and advanced fields stay hidden until a row is expanded. Co-authored-by: Cursor <cursoragent@cursor.com> * docs(desktop): add credential settings accordion screenshots for PR 39327 Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-04 19:10:44 -04:00
helix4u	d29caf3828	fix(desktop): satisfy slash metadata typecheck	2026-06-04 17:56:36 -04:00
ethernet	1eeb7da2e6	fix(desktop): slash commands bypass queue when busy and chip id suffix leak (#39289) Two fixes for desktop app slash command handling: 1. Slash commands submitted while the agent is busy now execute immediately instead of being queued. Previously submitDraft() unconditionally queued any draft when busy, but slash commands are client-side operations or self-contained gateway RPCs that should run regardless of busy state (matching TUI behavior). executeSlashCommand already has its own per-command busy guard for commands that genuinely need an idle session. 2. Slash command trigger items no longer leak the "|index" suffix from their item.id into the serialized chip text. The toItem callback now sets rawText in metadata so hermesDirectiveFormatter.serialize takes the direct-insertion path instead of the legacy @type:id fallback. This also means slash commands enter the composer as plain text (not chips), matching selectSkinSlashCommand and TUI behavior.	2026-06-04 16:06:45 -05:00
Austin Pickett	acce1a2452	feat(desktop): polish credentials settings and messaging env routing (#39217) * feat(desktop): polish credentials settings and messaging env routing Align Provider API Keys and Tools & Keys with Advanced ListRow inputs, add Tools & Keys sidebar subnav, move platform env vars to Messaging via channel_managed discovery, strip toolset emojis, and condense cron actions. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(desktop): align Messaging credential inputs with settings ListRow style Remove monospace inputs and use CREDENTIAL_CONTROL_CLASS + ListRow layout to match Provider API Keys and Tools & Keys. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-04 14:01:15 -04:00
liuhao1024	a3fb48b2ce	fix(state): keep /branch sessions visible after parent reopen /branch (aka /fork) sessions vanished from /resume and /sessions. Both surfaces funnel through list_sessions_rich(include_children=False), which hid any session with a parent_session_id unless identified as a branch via a heuristic — parent.end_reason == 'branched' AND child.started_at >= parent.ended_at. Two ways that heuristic failed: 1. CLI/gateway branches: once the parent was reopened (e.g. resumed) and re-ended with a different end_reason (tui_shutdown overwriting 'branched'), the heuristic stopped matching and the branch was hidden permanently. 2. TUI branches (tui_gateway session.branch): the TUI never ends the parent as 'branched' — it creates the child while the parent is still live — so the heuristic NEVER matched and TUI branches were hidden from the moment they were created (this is the macOS desktop app's primary symptom). Fix: persist a stable '_branched_from' marker in the branch session's model_config at creation time across ALL THREE branch paths (CLI cli.py, gateway gateway/run.py, and TUI tui_gateway/server.py), and OR a json_extract(model_config, '$._branched_from') IS NOT NULL check into the list_sessions_rich filter. The marker is immutable across the parent's lifecycle, so the branch stays visible regardless of how/whether the parent is ended. The legacy end_reason heuristic is kept (OR'd) so pre-existing branches remain visible. Subagent/compression children (no marker, parent not 'branched') stay correctly hidden. Fixes #20856. Approach by liuhao1024 (PR #20864); reimplemented on current main, extended to the TUI branch path (which the original missed), with regression tests for the reopen+re-end scenario and the TUI marker persistence.	2026-06-04 10:07:20 -07:00
teknium1	d1367355d5	chore(release): map jeffrobodie@gmail.com -> jeffrobodie-glitch for salvage	2026-06-04 12:18:38 -04:00
Jeff	1f347ee543	fix(uv): move venv aside instead of gutting it in place on Windows rebuild hermes update can brick a Windows install. When 'hermes update --force' runs past the concurrent-process guard, rebuild_venv runs while the venv is still in use: shutil.rmtree(ignore_errors=True) deletes site-packages + certifi's cert bundle but can't remove the locked python.exe, leaving a half-gutted venv that uv venv then refuses to overwrite. Every later HTTPS call dies with FileNotFoundError for the missing cacert and there is no recovery. --clear alone (the `c136eb4de` retry path) does not fix the real lock case: when the locked interpreter is inside the venv being rebuilt, neither rmtree nor uv venv --clear can delete it. os.replace of the parent directory is allowed on Windows (a running .exe is tracked by handle, not path), so we move the old venv aside atomically to <venv>.old, rebuild with --clear in its place, and the still-running gateway/desktop keep using the moved-aside copy until they restart. If the venv genuinely can't be moved, we abort cleanly and leave it fully intact; if the rebuild fails, we restore the moved-aside copy. Folds in the call-site guards from #38511 (@f3rs3n): - rebuild_venv() returns False (and restores the backup) if uv exits 0 without producing an interpreter. - both hermes update venv-rebuild call sites abort with RuntimeError instead of continuing into dependency install when rebuild_venv() returns False. Also gitignore /venv.old/ so the update autostash (git stash --include-untracked) doesn't sweep the moved-aside venv on every run. Root-cause fix for #37881. Supersedes the --clear-only retry from `c136eb4de`. Co-authored-by: f3rs3n <32328813+f3rs3n@users.noreply.github.com>	2026-06-04 12:18:38 -04:00
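A sketch of the move-aside rebuild. It assumes a caller-supplied `create` callback stands in for `uv venv --clear`, and that a usable venv contains `bin/python` (POSIX) or `Scripts/python.exe` (Windows). The helper name mirrors `rebuild_venv()` from the message, but the body is illustrative.

```python
import os
import shutil
from pathlib import Path
from typing import Callable


def _has_interpreter(venv: Path) -> bool:
    return (venv / "bin" / "python").exists() or (venv / "Scripts" / "python.exe").exists()


def rebuild_venv(venv: Path, create: Callable[[Path], None]) -> bool:
    backup = venv.with_name(venv.name + ".old")
    if backup.exists():
        # Leftover from a previous update; best effort, since it may still be in use.
        shutil.rmtree(backup, ignore_errors=True)
    moved = False
    if venv.exists():
        try:
            # Atomic rename of the whole directory. Per the commit message, Windows
            # allows this even while a python.exe inside it is running.
            os.replace(venv, backup)
            moved = True
        except OSError:
            return False  # cannot move it: abort and leave the venv fully intact
    try:
        create(venv)  # stands in for `uv venv --clear <venv>`
        if not _has_interpreter(venv):
            raise RuntimeError("venv creation produced no interpreter")
    except Exception:
        shutil.rmtree(venv, ignore_errors=True)
        if moved:
            os.replace(backup, venv)  # restore the previous venv
        return False
    return True
```

On success the `.old` copy is deliberately left in place so still-running processes keep a working interpreter until they restart.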
rexdotsh	ee7948ea6e	fix(deps): exclude dev tooling from all extra	2026-06-04 08:54:38 -07:00
kshitijk4poor	8077e7d2fb	fix(tui): narrow resume lock to avoid blocking session.close The salvaged fix held _session_resume_lock across _make_agent (MCP discovery + AIAgent construction, seconds), serializing it against session.close. Since session.close runs on the main RPC dispatch thread (not a _LONG_HANDLER), a close racing a mid-build resume would stall all fast-path RPCs (approval.respond, session.interrupt). Restructure to double-checked locking: build the agent outside the lock, then re-check _find_live_session_by_key under the lock before _init_session. A losing concurrent resume discards its just-built agent (no worker/poller wired yet) and reuses the winner. Updated the concurrent-resume regression test to assert the real invariant (one surviving live session + loser agent closed) rather than the implementation detail of a single _make_agent call.	2026-06-04 08:18:26 -07:00
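A sketch of the double-checked locking described above. The `Agent` and `SessionRegistry` classes are hypothetical stand-ins for the gateway's real session table. The slow build runs with the lock released, and the second check under the lock decides the winner.

```python
import threading
from typing import Callable, Dict


class Agent:
    """Hypothetical stand-in for an expensive-to-build agent."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.closed = False

    def close(self) -> None:
        self.closed = True


class SessionRegistry:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._live: Dict[str, Agent] = {}

    def resume(self, key: str, make_agent: Callable[[str], Agent]) -> Agent:
        with self._lock:  # first check: cheap, under the lock
            live = self._live.get(key)
        if live is not None:
            return live
        agent = make_agent(key)  # slow build runs without holding the lock
        with self._lock:
            live = self._live.get(key)  # second check: did a concurrent resume win?
            if live is not None:
                agent.close()  # discard the loser's freshly built agent
                return live
            self._live[key] = agent
            return agent
```

Forcing two concurrent resumes to overlap (for example with a `threading.Barrier` inside `make_agent`) should leave one registered agent and one closed loser, the invariant the updated regression test asserts.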
rexdotsh	bd6d098762	fix(tui): keep resumed live history current	2026-06-04 08:18:26 -07:00
rexdotsh	98903d0313	fix(tui): reuse live session on resume	2026-06-04 08:18:26 -07:00
kyssta-exe	30412a9771	fix(cron): re-validate stale cron-output entries before deletion (#37721) quick() and dry_run() previously trusted the stored category from tracked.json without re-validating at delete time. Stale entries from before #34840 could carry category="cron-output" for cron control-plane paths (e.g. cron/jobs.json), causing quick() to delete the live scheduler registry. Fix: - Fix guess_category() to only classify cron/output/** as cron-output (was classifying ALL cron/* paths, missing the #34840 fix). - Re-validate cron-output entries via guess_category() at delete time in quick() and dry_run(); stale entries that are no longer classified as cron-output are skipped and removed from tracked.json. - Add _is_protected_cron_path() as a hard defense-in-depth guard that blocks deletion of cron/cronjobs directories and known control-plane files (jobs.json, .tick.lock) regardless of stored category. - Update test_cron_subtree_categorised to match fixed guess_category (only cron/output/* is cron-output, not all of cron/). Tests: add 5 regression tests in TestStaleCronEntryMigration.	2026-06-04 07:52:04 -07:00
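A sketch of the delete-time re-validation. It assumes paths are relative to the Hermes home and use forward slashes, and it reads "cron/cronjobs directories" as two directory names, `cron` and `cronjobs` (an assumption). The protected file names come from the message. The `'other'` fallback category and `should_delete` are hypothetical.

```python
from pathlib import PurePosixPath

CRON_DIRS = {"cron", "cronjobs"}  # assumed reading of "cron/cronjobs directories"
PROTECTED_CRON_FILES = {"jobs.json", ".tick.lock"}


def guess_category(rel_path: str) -> str:
    parts = PurePosixPath(rel_path).parts
    # Only files under cron/output/ are disposable run output.
    if len(parts) >= 3 and parts[:2] == ("cron", "output"):
        return "cron-output"
    return "other"  # hypothetical fallback category


def is_protected_cron_path(rel_path: str) -> bool:
    p = PurePosixPath(rel_path)
    if len(p.parts) == 1 and p.parts[0] in CRON_DIRS:
        return True  # the cron directories themselves
    return bool(p.parts) and p.parts[0] in CRON_DIRS and p.name in PROTECTED_CRON_FILES


def should_delete(rel_path: str, stored_category: str) -> bool:
    if is_protected_cron_path(rel_path):
        return False  # hard guard, regardless of what tracked.json says
    if stored_category == "cron-output":
        # Re-validate: a stale entry may carry cron-output for a control-plane path.
        return guess_category(rel_path) == "cron-output"
    return True
```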
CryptoByz	693f4c7e9c	fix(gateway): clear zombie agent slot when session_reset races in-flight run A session_reset (/new, /cc) that bumps the run generation while an agent turn is in flight left the dead agent in the _running_agents slot: the in-flight run's own release is generation-guarded and correctly returns False, and the outer finally's sentinel-only check also missed the leftover real agent. The session then silently dropped every subsequent message as 'agent busy' until a full gateway restart. (#28686) - _process_message_or_command outer finally now calls the unconditional, idempotent _release_running_agent_state(key) on all exit paths instead of the sentinel-vs-else branch that could strand a dead agent. - _handle_reset_command evicts the slot right after bumping the generation, so the zombie is cleared at reset time regardless of how the in-flight run unwinds. Co-authored-by: CryptoByz <cryptobyz.airdrop@gmail.com>	2026-06-04 07:50:45 -07:00
teknium1	2982122be7	fix(gateway): deliver $HOME deliverables on root-run gateways Root-run gateways have $HOME=/root, which is on the MEDIA system-path denylist, so the gateway silently dropped agent-generated deliverables under /root (e.g. /root/work/proposal.docx) — the user got a 'here is your file' reply with nothing attached. _path_under_denied_prefix now treats the running user's own home as deliverable: the home tree itself is no longer denied, while the more-specific denied paths inside it (~/.ssh, ~/.aws, ~/.hermes/.env, auth.json, config.yaml) stay blocked because they are separate denylist entries. The exception only matches when the denied prefix IS $HOME, so a non-root gateway still can't deliver another user's home. Diagnosis, reproduction, and the failing-case analysis are from @GodsBoy (#38108 / #38106). Implemented here as the minimal denylist fix rather than a staging/copy subsystem. Co-authored-by: GodsBoy <dhuysamen@gmail.com>	2026-06-04 07:50:22 -07:00
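A sketch of the home-directory exception. It assumes POSIX paths that are already resolved; a real check would also resolve symlinks. The function name and parameters are hypothetical.

```python
import posixpath


def path_under_denied_prefix(path: str, denied: list[str], home: str) -> bool:
    target = posixpath.normpath(path)
    home = posixpath.normpath(home)
    for prefix in denied:
        prefix = posixpath.normpath(prefix)
        if prefix == home:
            # The running user's own home is deliverable; sensitive paths inside
            # it (~/.ssh, ~/.hermes/.env, ...) are separate denylist entries.
            continue
        if target == prefix or target.startswith(prefix + "/"):
            return True
    return False
```

Because the exception only fires when the denied prefix equals the gateway user's `$HOME`, a gateway running as another user still treats `/root` as denied.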
Teknium	580d924097	perf(desktop): make session-id search SQL-bounded, not O(n) search_sessions_by_id previously fetched up to 10k sessions via list_sessions_rich and filtered them in Python — O(n) per keystroke. Push the id match into SQL instead. - list_sessions_rich gains an optional id_query param: a case-insensitive LIKE pushed into the outer WHERE, matched against each surfaced row's id AND every id in its forward compression chain (via the existing chain CTE). Searching a compression root id or a tip id both resolve to the same projected conversation. LIKE wildcards in the needle are escaped. - search_sessions_by_id now fetches only matching rows (limit*4) and ranks exact > prefix > substring in Python over that small set. - web_server /api/sessions/search: route ID matches and content matches through one lineage-keyed dedup helper so an id-hit and a content-hit on the same conversation collapse to a single result (the contributor's version keyed ID hits by raw sid and content hits by root, which could double-list a compression tip). - command-center haystack also matches _lineage_root_id for parity. E2E verified against a real DB: exact match over 3000+ sessions materializes 1 row in Python (was ~3000), 5ms; root-id resolves to tip; LIKE-wildcard escaping holds. Follow-up to @0xharryriddle's feat(desktop): search sessions by id.	2026-06-04 07:49:34 -07:00
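A sketch of the SQL-bounded id search. It assumes a flat `sessions(id)` table and omits the compression-chain CTE matching. LIKE wildcards in the needle are escaped with SQLite's `ESCAPE` clause, and the exact > prefix > substring ranking runs in Python over the small candidate set. Function names are hypothetical.

```python
import sqlite3


def escape_like(needle: str) -> str:
    # Escape the escape character first, then the LIKE wildcards % and _.
    return needle.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def search_sessions_by_id(conn: sqlite3.Connection, query: str, limit: int = 10) -> list[str]:
    needle = query.lower()
    rows = conn.execute(
        "SELECT id FROM sessions WHERE lower(id) LIKE ? ESCAPE '\\' LIMIT ?",
        (f"%{escape_like(needle)}%", limit * 4),  # fetch a bounded candidate set
    ).fetchall()

    def rank(session_id: str) -> int:
        s = session_id.lower()
        return 0 if s == needle else 1 if s.startswith(needle) else 2  # exact > prefix > substring

    return sorted((r[0] for r in rows), key=lambda sid: (rank(sid), sid))[:limit]
```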
Harry Riddle	9ecc331be8	feat(desktop): search sessions by id	2026-06-04 07:49:34 -07:00
teknium	62f0cfd902	fix(kanban-dashboard): use context-local board pin in specify/decompose endpoints The dashboard specify and decompose endpoints run as sync FastAPI threadpool handlers and pinned the active board by mutating the process-global HERMES_KANBAN_BOARD env var. Two concurrent requests for different boards race on that shared global and cross-write — the same bug class as the CLI path (#38323), now using the scoped_current_board() contextvar introduced by the CLI fix.	2026-06-04 07:39:53 -07:00
worlldz	081694c111	fix(kanban): isolate board override per concurrent call	2026-06-04 07:39:53 -07:00
AhmetArif0	de370fd10f	fix(dashboard): prevent stale desc-save indicator when requests overlap handleSaveDesc and handleAutoDescribe both set their loading flag in a try block but cleared it unconditionally in finally. When a user opened profile A's description editor, clicked Save, then quickly switched to profile B's editor and saved, profile A's resolving request would clear descSaving/describing while profile B's request was still in flight, making the "Saving…" indicator disappear prematurely. Track concurrent in-flight counts with descSavingCount and describingCount refs (mirrors the existing activeDescRequest guard pattern). The loading flag is cleared only when the counter reaches zero, i.e. all overlapping requests have settled.	2026-06-04 07:23:22 -07:00
AhmetArif0	c2d11cc95d	fix(dashboard): surface model-write failure when creating a profile POST /api/profiles returns model_set: false when the model assignment step fails (e.g. filesystem error) while the profile itself was created successfully. handleCreate discarded the response, so the user received a "Profile created" success toast with no indication that their chosen model was not persisted. Capture the response and show an error toast when a model was selected but model_set is explicitly false, directing the user to set it from the profile editor.	2026-06-04 07:23:22 -07:00
AhmetArif0	6feb40e702	fix(desktop): wait for backend exit before reloading on connection-config apply The apply handler sent SIGTERM then fired a 150 ms setTimeout to reload the renderer. If the backend took longer than that to shut down, the port was still bound when startHermes() ran after reload, causing an "address already in use" failure. Capture the process reference before resetHermesConnection() nulls it, then await the actual exit event. A 5 s SIGKILL fallback ensures the wait never hangs if the backend ignores SIGTERM.	2026-06-04 07:23:22 -07:00
Teknium	fef04a197e	fix(desktop): purge electron cache unconditionally, not via stdlib zipfile gate The salvaged detector validated each cached electron-*.zip with zipfile.testzip() and only purged ones it judged corrupt. But stdlib zipfile reads from the end-of-central-directory backward, so it silently tolerates prepended/concatenated junk — which is exactly the corruption the bug report names ('86257938 extra bytes at beginning or within zipfile', a partial download resumed into the same file). testzip() returns clean on those zips, so the self-heal never fired for the reported failure mode. Drop the self-rolled validator: on any packaged-build failure, purge the version's cached zips AND the half-written unpacked dir, then retry once. @electron/get re-downloads with its own SHASUM verification — the real source of truth, which catches prepend/concat/truncate alike. An unrelated failure just costs one clean re-download and fails the same way. Verified empirically: zipfile.testzip() returns None (clean) on a prepended-junk zip; the unconditional purge removes it correctly.	2026-06-04 07:17:33 -07:00

10615 Commits
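The home-directory exception in 2982122be7 can be illustrated with a small sketch. It is not the gateway's actual `_path_under_denied_prefix`: the function name `path_under_denied_prefix`, its signature, and the sample denylist below are hypothetical, and paths are assumed to be already resolved (no `..` components, no symlinks).

```python
from pathlib import PurePosixPath


def path_under_denied_prefix(path: str, denied_prefixes: list[str], home: str) -> bool:
    """Hypothetical sketch: True if `path` sits at or under a denied prefix.

    A denied prefix that is exactly the running user's $HOME is skipped, so the
    home tree becomes deliverable while more specific denylist entries inside
    it (e.g. ~/.ssh) still match on their own.
    Assumes `path` is already resolved (no '..' components, no symlinks).
    """
    target = PurePosixPath(path)
    home_path = PurePosixPath(home)
    for prefix in denied_prefixes:
        denied = PurePosixPath(prefix)
        if denied == home_path:
            # Only an exact match on $HOME is exempt; another user's home is not.
            continue
        if target == denied or denied in target.parents:
            return True
    return False


# Sample denylist (an assumption for illustration, not the gateway's real list).
DENIED = ["/etc", "/root", "/root/.ssh", "/root/.aws", "/root/.hermes/.env"]

print(path_under_denied_prefix("/root/work/proposal.docx", DENIED, home="/root"))        # False
print(path_under_denied_prefix("/root/.ssh/id_ed25519", DENIED, home="/root"))           # True
print(path_under_denied_prefix("/root/work/proposal.docx", DENIED, home="/home/alice"))  # True
```

Exempting by equality rather than by prefix keeps the change narrow: only the gateway's own home is opened up, and a gateway running as another user still treats `/root` as denied.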
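The search change in 580d924097 rests on two small pieces: escaping LIKE wildcards so the user's needle matches literally, and ranking the small SQL-bounded result set exact > prefix > substring in Python. The sketch below shows both against an in-memory SQLite table. The `sessions(id)` schema and the helper names `escape_like` and `search_session_ids` are assumptions, and the sketch omits the compression-chain CTE the commit also matches against.

```python
import sqlite3


def escape_like(needle: str, esc: str = "\\") -> str:
    """Escape the LIKE wildcards % and _ (and the escape char itself)."""
    return needle.replace(esc, esc + esc).replace("%", esc + "%").replace("_", esc + "_")


def search_session_ids(conn: sqlite3.Connection, needle: str, limit: int = 10) -> list[str]:
    """Hypothetical sketch: SQL does the filtering, Python only ranks the few hits."""
    pattern = "%" + escape_like(needle) + "%"
    rows = conn.execute(
        # SQLite's LIKE is case-insensitive for ASCII; ESCAPE makes \% and \_ literal.
        "SELECT id FROM sessions WHERE id LIKE ? ESCAPE '\\' LIMIT ?",
        (pattern, limit * 4),
    ).fetchall()

    lowered = needle.lower()

    def rank(session_id: str) -> int:
        sid = session_id.lower()
        if sid == lowered:
            return 0  # exact match
        if sid.startswith(lowered):
            return 1  # prefix match
        return 2  # substring match

    ids = [row[0] for row in rows]
    return sorted(ids, key=lambda sid: (rank(sid), sid))[:limit]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO sessions VALUES (?)",
    [("abc",), ("abcd",), ("abc_1",), ("xabc",), ("a%c",), ("zzz",)],
)
print(search_session_ids(conn, "abc"))  # ['abc', 'abc_1', 'abcd', 'xabc']
print(search_session_ids(conn, "a%c"))  # ['a%c'], since % is matched literally
```

Without `escape_like`, the needle `a%c` would act as a wildcard and also match `abc`, `abcd`, `abc_1`, and `xabc`.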
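The dashboard fix in de370fd10f lives in React refs, but the counting rule is language-neutral. It is shown here as a small Python/asyncio sketch for illustration only: the `InFlightFlag` class and the `save()` coroutine are hypothetical stand-ins for the `descSavingCount` ref and `handleSaveDesc`.

```python
import asyncio


class InFlightFlag:
    """Loading flag that stays on until every overlapping request has settled."""

    def __init__(self) -> None:
        self.count = 0
        self.active = False

    def start(self) -> None:
        self.count += 1
        self.active = True

    def finish(self) -> None:
        self.count -= 1
        if self.count == 0:  # clear only when nothing else is still in flight
            self.active = False


async def save(flag: InFlightFlag, delay: float, seen: list[bool]) -> None:
    flag.start()
    try:
        await asyncio.sleep(delay)  # stands in for the network request
    finally:
        flag.finish()
        seen.append(flag.active)  # what the indicator shows right after settling


async def main() -> list[bool]:
    flag = InFlightFlag()
    seen = []
    # Profile A's save finishes first while profile B's save is still pending.
    await asyncio.gather(save(flag, 0.01, seen), save(flag, 0.1, seen))
    return seen


print(asyncio.run(main()))  # [True, False]
```

Clearing the flag unconditionally in `finally` would instead record `[False, False]`: the indicator would vanish while profile B's request was still pending.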
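The connection-config fix in 6feb40e702 replaces a fixed 150 ms delay with waiting for the backend's actual exit, escalating to SIGKILL after 5 s. The desktop code is Electron, so the Python `subprocess` sketch below only mirrors the shape of that logic; `stop_backend()` is a hypothetical name.

```python
import subprocess
import sys


def stop_backend(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Terminate `proc` and block until it has really exited.

    Hypothetical sketch: SIGTERM first, then SIGKILL if it is still alive after
    `timeout` seconds, so the caller never waits forever.
    """
    proc.terminate()  # SIGTERM on POSIX; TerminateProcess on Windows
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


if __name__ == "__main__":
    backend = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    code = stop_backend(backend)
    # Assuming no child inherited its listening socket, the port is free now.
    print("backend exited with", code)
```

As in the commit, the caller needs to keep the process handle until the wait completes; if it has already been cleared, there is nothing left to wait on.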
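The observation in fef04a197e that `zipfile.testzip()` passes a zip with prepended junk is easy to reproduce. In this minimal sketch, the member name, payload, and junk size are arbitrary choices. Because `zipfile` locates the end-of-central-directory record from the end of the file and shifts member offsets by any leading bytes, every CRC check still succeeds.

```python
import io
import zipfile


def build_zip(payload: bytes) -> bytes:
    """Return the bytes of a one-member zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("electron/payload.bin", payload)
    return buf.getvalue()


def stdlib_says_clean(data: bytes) -> bool:
    """True when zipfile.testzip() reports no bad member (it returns None)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.testzip() is None


payload = b"electron" * 1024
clean = build_zip(payload)
# Roughly model the reported corruption: junk bytes in front of a valid archive.
corrupted = b"\x00" * 4096 + clean

print(stdlib_says_clean(clean))      # True
print(stdlib_says_clean(corrupted))  # True: the prepended bytes go unnoticed
```

A validator gated on `testzip()` therefore never fires for this failure mode. That is why the commit purges the cache unconditionally and relies on @electron/get's SHASUM verification instead.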