Commit Graph

10391 Commits

Author SHA1 Message Date
827f251426 perf(observability): gate tool-hook emit on has_hook; slim per-tool footprint
The salvaged observer contract gated the API-request hot path on has_hook()
but left the per-tool emit ungated: every tool call ran result-field
derivation + payload dict build + invoke_hook dispatch even with zero
plugins registered.

- _emit_post_tool_call_hook now short-circuits on has_hook("post_tool_call")
  and derives status/error fields lazily (after the gate, only when a
  listener will consume them). status defaults to None -> derived; explicit
  blocked/cancelled callers still pass status through.
- transform_tool_result emit (pre-existing hook) likewise gated on
  has_hook(); skips _tool_result_observer_fields when no listener.
- Removed the now-redundant _tool_result_observer_fields pre-computation at
  the three ok-path call sites (model_tools, agent_runtime_helpers,
  tool_executor) — the helper derives them, so the no-listener path costs
  one dict lookup and the call sites shrink.
- Tests: stub has_hook=True where payload correctness is asserted; add a
  no-listener regression proving post_tool_call/transform_tool_result emit
  is skipped when nothing is registered.
2026-06-03 06:36:46 -07:00
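The gating described in 827f251426 can be sketched as follows. This is a minimal illustration, not the project's code: `HookRegistry`, `emit_post_tool_call`, and the `"ok"`/`"error"` status values are hypothetical stand-ins. Only `has_hook`, `invoke_hook`, the `post_tool_call` hook name, and the pass-through of an explicit status come from the commit message.

```python
from typing import Any, Callable, Dict, List, Optional


class HookRegistry:
    """Hypothetical stand-in for the plugin hook registry."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Callable[..., Any]]] = {}

    def register(self, name: str, callback: Callable[..., Any]) -> None:
        self._hooks.setdefault(name, []).append(callback)

    def has_hook(self, name: str) -> bool:
        # One dict lookup: the entire cost of the no-listener path.
        return bool(self._hooks.get(name))

    def invoke_hook(self, name: str, **payload: Any) -> None:
        for callback in self._hooks.get(name, []):
            callback(**payload)


def emit_post_tool_call(
    registry: HookRegistry,
    tool_name: str,
    result: Any,
    status: Optional[str] = None,
) -> bool:
    """Emit post_tool_call only when a listener exists; return True if emitted."""
    if not registry.has_hook("post_tool_call"):
        return False  # gate first: no field derivation, no payload dict
    if status is None:
        # Lazy derivation, done only after the gate (assumed result shape).
        failed = isinstance(result, dict) and bool(result.get("error"))
        status = "error" if failed else "ok"
    registry.invoke_hook(
        "post_tool_call", tool_name=tool_name, status=status, result=result
    )
    return True
```

With nothing registered the function returns before touching `result`, which is the behavior the no-listener regression test in the commit is said to assert.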
432325933a test: restore unrelated trailing newlines in cwd/tool-search tests
The salvaged PR incidentally stripped a trailing blank line from two
unrelated test files (test_file_tools_cwd_resolution.py,
test_tool_search.py). Restore them to keep the salvage diff scoped to
the observability feature.
2026-06-03 06:36:46 -07:00
0d9b7132ff feat(observability): observer-grade telemetry hooks + NeMo-Relay plugin
Adds backend-neutral observer hooks for plugins: session, turn, API
request, tool, approval, and subagent lifecycle events with stable
correlation IDs (session_id, task_id, turn_id, api_request_id,
tool_call_id, parent/child subagent ids). Extends VALID_HOOKS with
api_request_error and subagent_start.

Hot path is zero-cost when no plugin subscribes: has_hook()/presence
checks gate all payload construction, request payloads are returned
by reference when no middleware rewrites, and the sanitized response
payload no longer embeds raw response objects.

Bundles the optional NeMo-Relay observability plugin
(plugins/observability/nemo_relay) as an in-repo consumer of the new
hooks, peer to the existing langfuse plugin. Fails open when the
optional nemo-relay package is not installed.

Authored-by: Bryan Bednarski <bbednarski@nvidia.com>
Salvaged from #29722 onto current main.
2026-06-03 06:36:46 -07:00
a78c73f3aa Merge pull request #38224 from NousResearch/hermes/hermes-79601e59
fix(tui): stop persisting full tool output in trail lines (silent OOM death)
2026-06-03 08:24:39 -05:00
4c544b633d fix(kanban): don't permanently block tasks that hit a provider rate limit (#38223)
A kanban worker that exhausted its retries purely on a provider rate
limit / quota wall (e.g. opencode-go's 5-hour window) exited with code 1.
The dispatcher counted that as a crash, and with DEFAULT_FAILURE_LIMIT=2
two quota-wall hits permanently blocked the card. Fanning out many
workers against one shared quota made this routine.

Now a rate-limited worker exits with EX_TEMPFAIL (75); the dispatcher
classifies that as a 'rate_limited' exit, releases the task back to
'ready' WITHOUT incrementing consecutive_failures (the breaker can't trip
on a transient throttle), and the respawn guard defers the next attempt
on a cooldown (default 5min, HERMES_KANBAN_RATE_LIMIT_COOLDOWN_SECONDS)
until the quota window clears. Genuine crashes still count and trip the
breaker as before. The 120s Retry-After cap is unchanged — no worker
parks for hours holding a slot.

- conversation_loop.py: surface failure_reason in the exhaustion return
- cli.py: kanban worker picks exit 75 on rate_limit/billing failure
- kanban_db.py: rate_limited exit kind, no-count requeue, cooldown guard
2026-06-03 06:19:32 -07:00
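A rough sketch of the dispatcher-side classification described in 4c544b633d, under stated assumptions: the `Task` dataclass, its field names, `handle_worker_exit`, and the handling of exit code 0 are hypothetical. The exit code 75, the failure limit of 2, the 5-minute default cooldown, and the no-count requeue come from the commit message.

```python
from dataclasses import dataclass

EX_TEMPFAIL = 75            # exit code a rate-limited worker uses
DEFAULT_FAILURE_LIMIT = 2   # consecutive crashes before the breaker trips
COOLDOWN_SECONDS = 300      # default cooldown (5 min)


@dataclass
class Task:
    # Hypothetical task record; the real schema lives in kanban_db.py.
    status: str = "running"
    consecutive_failures: int = 0
    not_before: float = 0.0  # earliest time the respawn guard allows a retry


def handle_worker_exit(task: Task, exit_code: int, now: float) -> Task:
    """Requeue, defer, or block a task based on its worker's exit code."""
    if exit_code == 0:
        # Assumed success path, included only to make the sketch complete.
        task.status = "done"
        task.consecutive_failures = 0
    elif exit_code == EX_TEMPFAIL:
        # Transient throttle: back to 'ready' without counting a failure,
        # deferred until the cooldown has elapsed.
        task.status = "ready"
        task.not_before = now + COOLDOWN_SECONDS
    else:
        # Genuine crash: count it and trip the breaker at the limit.
        task.consecutive_failures += 1
        limit_hit = task.consecutive_failures >= DEFAULT_FAILURE_LIMIT
        task.status = "blocked" if limit_hit else "ready"
    return task
```

Two consecutive exits with code 75 leave `consecutive_failures` at 0, whereas two exits with code 1 block the task, which is the distinction the fix introduces.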
60b6352fe5 Merge pull request #38221 from NousResearch/hermes/hermes-45accc84
fix(desktop): stop chat scroll bounce — at-rest backward jump + wheel-up snap-back
2026-06-03 08:05:28 -05:00
e76d8bf5aa fix(tui): stop persisting full tool output in trail lines (silent OOM death)
A heavy --tui session (browser snapshots, large tool outputs) silently
OOM-killed the Node parent within minutes — closing the gateway child's
stdin, which the user saw only as a bare "gateway exited" / stdin EOF.
CLI was immune. Root cause: each completed tool's verbose trail line
embedded up to 16KB of result_text, persisted in transcript Msg.tools[]
for the whole session and rendered EXPANDED by default, so an Ink
render-node tree was built for every one of up to 800 messages at once.
That tree blew past Node's heap at a few hundred MB — far below the 2.5GB
memory-monitor exit threshold, so the death was never even attributed.

- text.ts: persisted verbose tool-trail blocks now cap to a small preview
  (VERBOSE_TRAIL_MAX_CHARS=800/12 lines), not the 16KB live-render budget.
  Retained trail strings drop ~17x (12.2MB -> 0.7MB at 800 msgs); the live
  streaming tail still uses the larger LIVE_RENDER budget.
- tui_gateway/server.py: lower the gateway-side verbose text cap to match
  (1KB/16 lines) so we stop shipping output the TUI no longer renders.
- memoryMonitor.ts: derive critical/high thresholds from the real V8 heap
  ceiling (~88%/70%) instead of the hardcoded 2.5GB that killed the process
  at 31% of an 8GB ceiling; add a one-shot onWarn early-warning on fast
  sub-threshold heap growth so the next such death is diagnosable, not silent.
- entry.tsx: wire onWarn to a crash-log breadcrumb + stderr line.

Full tool output is unchanged in the agent context and SQLite session — this
is display/transport only, no behavior or context change.

Fixes #34095. Related #27282.

Tests: ui-tui text + new memoryMonitor suites (33 pass), python verbose-cap
guard (5 pass); full ui-tui suite shows no new failures vs pristine main.
E2E repro confirms the retention drop.
2026-06-03 06:00:22 -07:00
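The preview cap described in e76d8bf5aa amounts to bounding a string by both line count and character count. The sketch below is an assumption about the shape of that logic: `cap_verbose_text` is a hypothetical name, the real TUI-side cap lives in TypeScript (text.ts), and whether the actual code appends a truncation marker is not stated in the commit.

```python
def cap_verbose_text(text: str, max_chars: int = 800, max_lines: int = 12) -> str:
    """Keep at most max_lines lines, then at most max_chars characters."""
    lines = text.splitlines()
    capped = "\n".join(lines[:max_lines])
    return capped[:max_chars]
```

The gateway-side cap mentioned in the same commit would then be something like `cap_verbose_text(text, max_chars=1024, max_lines=16)`, assuming "1KB" means 1024 characters.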
c5d199eada feat(dashboard): check-before-update flow on the System page (#38205)
The dashboard's update button ran 'hermes update' immediately with no
preview. Now the System page shows whether an update is available and
asks the user to confirm before applying it.

- New GET /api/hermes/update/check: reports install method, current
  version, and commits-behind (via banner.check_for_updates, 6h-cached;
  ?force=1 busts the cache). Soft-fails to behind=null on network error;
  marks docker/nix/homebrew as can_apply=false with the out-of-band cmd.
- System page: update-status badge on the Hermes version row (latest /
  N behind), a Check-for-updates button, and an Update-now button that
  opens a ConfirmDialog showing the commit count before POST /api/hermes/
  update fires. Cached status loads with the rest of the page.
- Docs + 5 endpoint tests (git/up-to-date/docker/soft-failure + auth gate).
2026-06-03 05:57:15 -07:00
c930a49ce9 fix(desktop): honor upward wheel scroll in long threads 2026-06-03 05:54:49 -07:00
3aa24e2619 fix(desktop): stop chat scroll backward-jump from content-growth interim scrolls (#37997)
The thread scroll-anchor hook in apps/desktop/src/components/assistant-ui/
thread-virtualizer.tsx was disarming sticky-bottom whenever scrollTop
decreased by >1px between scroll events. That check was too eager: when
content height grows mid-frame (virtualizer measurement of a newly visible
turn, streaming token, Streamdown/Shiki re-tokenization, composer chip
toggle), the browser emits an interim 'scroll' event whose scrollTop is
smaller than the previous frame's because scrollHeight just jumped. The
rAF-scheduled pinToBottom hasn't run yet, so programmaticScrollPendingRef
is 0 and the disarm fired. With sticky-bottom disarmed the scroller stuck
~50px above bottom — the visible at-rest backward jump that #37997
describes (and the same root cause as the wheel-up variant in #37527).

Fix:
- Track scrollHeight per frame (lastHeightRef). Disarm on scrollTop
  decrease ONLY when scrollHeight did not grow this frame. Real upward
  user intent (scrollbar drag, keyboard PgUp, programmatic scrollIntoView)
  still disarms because it moves scrollTop without growing the content.
  Wheel-up and touchmove continue to disarm via their own listeners.
- Stop observing the scroller element itself in the ResizeObserver; only
  observe its content child. Viewport-only resizes (window resize,
  devtools panel toggle) no longer trigger spurious pins, matching the
  intent of the auto-stick-to-bottom behavior.

Verified:
- apps/desktop `tsc -b` clean.
- apps/desktop `vitest run src/components/assistant-ui/streaming.test.tsx`
  passes (9/9), including the existing wheel-up disarm regression test
  that asserts scrollTop stays at 420 after a wheel-up + content growth.
2026-06-03 05:54:45 -07:00
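The disarm rule from 3aa24e2619 reduces to a small predicate. The real hook is TypeScript; the Python below is only a language-neutral sketch, and the function and parameter names are hypothetical. The 1px threshold and the "only when scrollHeight did not grow" condition are taken from the commit message.

```python
def should_disarm_sticky_bottom(
    prev_top: float,
    top: float,
    prev_height: float,
    height: float,
    threshold: float = 1.0,
) -> bool:
    """Disarm only when scrollTop moved up and the content did not grow."""
    moved_up = (prev_top - top) > threshold
    content_grew = height > prev_height
    return moved_up and not content_grew
```

An upward move with unchanged content height (scrollbar drag, PgUp) disarms; the same move in a frame where the content grew is treated as an interim scroll and ignored.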
ba57ebec33 fix(nix): bump npmDepsHash for refreshed lockfile
Lockfile regeneration invalidated the flake's pinned npm-deps hash.
Hash taken from fetchNpmDeps' authoritative 'got:' line (the
prefetch-npm-deps Diagnose helper reports a different, wrong value
due to a fetcherVersion normalization discrepancy).
2026-06-03 05:50:36 -07:00
b98b645f87 chore: regenerate lockfile + map vladkvlchk for salvaged #36978
- Add @testing-library/dom to apps/desktop devDeps in package-lock.json
  so npm ci validates against the manifest change (contributor left the
  lockfile out of the PR intentionally).
- Removes stale 'peer: true' flags now that dom is an explicit devDep.
- AUTHOR_MAP: prostoandrei9@gmail.com -> vladkvlchk (CI author gate).
2026-06-03 05:50:36 -07:00
f45d7dee7d fix(desktop): add @testing-library/dom as explicit dev dependency
@testing-library/react@16 declares @testing-library/dom as a peerDependency
and re-exports waitFor/fireEvent/screen/within from it. Without dom installed
as a direct dependency, tsc -b fails with TS2305 in every test file that
imports those names — which breaks the apps/desktop build during installer
bootstrap (Hermes Setup → "INSTALL DIDN'T FINISH").
2026-06-03 05:50:36 -07:00
1b302a0474 feat(debug): include desktop.log in hermes debug share / /debug / hermes logs (#38203)
The Electron desktop app writes boot failures, backend spawn output, and
Python tracebacks to HERMES_HOME/logs/desktop.log, but debug-share only
captured agent/errors/gateway — so desktop boot issues never made it into
shared debug reports.

- logs.py: register desktop -> desktop.log (enables 'hermes logs desktop')
- debug.py: capture desktop snapshot, add to summary report, upload full
  desktop.log in 'share', update privacy notice
- gateway /debug inherits the desktop tail via collect_debug_report()
- main.py + docs: help text and log-name table (also adds missing gui row)
- tests: desktop seed in fixture, new report test, three_pastes -> four_pastes
2026-06-03 05:41:35 -07:00
1d90b23982 fix(mcp): banner shows 'disabled' not 'failed' for enabled:false servers (#38204)
get_mcp_status() treated every non-connected server as a failure, so a
server configured with enabled: false rendered as red '— failed' in the
startup banner even though it was intentionally off. Add a 'disabled'
field derived from the enabled flag and render disabled servers dim as
'— disabled' instead.
2026-06-03 05:41:13 -07:00
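The status distinction in 1d90b23982 can be illustrated with a three-way label. This is a sketch only: `mcp_status_label` and the dict keys `enabled` and `connected` are assumptions about the data shape, not the actual return value of `get_mcp_status()`.

```python
def mcp_status_label(server: dict) -> str:
    """Return 'disabled', 'connected', or 'failed' for one server entry."""
    # Check the enabled flag first so an intentionally-off server is
    # never reported as a failure.
    if not server.get("enabled", True):
        return "disabled"
    return "connected" if server.get("connected") else "failed"
```

The order of the checks is the point: before the fix, every non-connected server fell through to the failure branch.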
ef65298103 docs: make the Desktop App remote-backend section self-contained (#38194)
The section explained why the Session token is hidden but punted the actual
setup steps to the web-dashboard page via a link — a bounce for someone on
the Desktop App page trying to connect. Inline the concrete steps instead:
backend command block (mint token -> .env -> hermes dashboard --insecure),
the in-app Remote gateway steps, the env-var override, Tailscale guidance,
and a troubleshooting list. Keep a short pointer to the web-dashboard page
for the same setup from that angle.
2026-06-03 05:27:38 -07:00
50ba36dcab chore: add bbednarski9 to AUTHOR_MAP for #29722 salvage (#38189)
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-06-03 05:25:35 -07:00
5fca754ee3 fix(desktop): pass live backend PID to in-app update so its own dashboard is spared
The Python half (#37538) reads HERMES_DESKTOP_CHILD_PID to exclude the
desktop-managed backend from _kill_stale_dashboard_processes, but nothing
set it. applyUpdatesPosixInApp now passes the live backend PID in the
`hermes update` env, completing the #37532 fix end-to-end.
2026-06-03 04:59:49 -07:00
192020992d fix(cli): exclude desktop-managed backend from stale-dashboard kill
Fixes #37532
2026-06-03 04:59:49 -07:00
d833b1eff7 docs: add remote-backend section to the Desktop App page (#38180)
The Desktop App page covered install, settings, and chat but not how to
connect the app to a backend on another machine — the exact thing
@PedjaDrazic asked about. Add a 'Connecting to a remote backend' section
that explains the Session token is the dashboard token Hermes never
surfaces (pin it via HERMES_DASHBOARD_SESSION_TOKEN + run --insecure),
and link to the web-dashboard page for the full backend setup rather than
duplicating it. Add a reciprocal link from the web-dashboard remote section
back to the Desktop App page.
2026-06-03 04:59:04 -07:00
a1264e9967 fix(matrix): make bang-command resolution robust + fix dead skill-command branch
Follow-up to the salvaged contributor commit:

- Underscore→hyphen tolerance now emits a resolvable token. Previously
  the detect set accepted the hyphenated variant but emit returned the
  raw token, so '!set_home' produced '/set_home' which the dispatcher
  could not resolve. Now emits '/set-home'. Aliases are left as-is — the
  gateway dispatcher canonicalizes them itself.
- Fix dead skill-command branch: skill command keys are stored
  slash-prefixed (e.g. '/arxiv') in get_skill_commands(), but the check
  compared the bare token, so '!arxiv' never normalized. Now compares
  the '/candidate' form, making skill aliases (e.g. !gif-search) work.
- Re-run bang normalization after Matrix reply-fallback stripping so a
  quoted reply whose content is a bang command reaches command parity
  with the slash form.
- Replace silent 'except Exception: pass' with logger.debug(exc_info=True).
- Add AUTHOR_MAP entry for @nepenth.

Tests: +5 (underscore-alias, skill-command branch, quoted-reply bang +
slash parity). 162 Matrix tests pass.
2026-06-03 17:19:27 +05:30
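The two resolution fixes in a1264e9967 (emit the hyphenated token, compare skill commands in their slash-prefixed form) can be sketched as below. `normalize_bang_command` and its parameters are hypothetical; the sketch assumes built-in command names are stored bare and skill command keys are stored slash-prefixed, as the commit message describes. Alias handling and reply-fallback stripping are left out.

```python
from typing import Set


def normalize_bang_command(
    text: str, known_commands: Set[str], skill_commands: Set[str]
) -> str:
    """Rewrite '!cmd args' to '/cmd args' when the command resolves."""
    if not text.startswith("!"):
        return text
    head, sep, rest = text[1:].partition(" ")
    if not head:
        return text
    hyphenated = head.replace("_", "-")
    if head in known_commands:
        resolved = head
    elif hyphenated in known_commands:
        # Emit the resolvable hyphenated form, not the raw token.
        resolved = hyphenated
    elif "/" + head in skill_commands:
        # Skill keys are slash-prefixed, so compare the '/candidate' form.
        resolved = head
    else:
        return text  # not a known command: leave the message untouched
    return "/" + resolved + sep + rest
```

For example, with `known_commands={"set-home"}` the input `!set_home` becomes `/set-home`, and with `skill_commands={"/arxiv"}` the input `!arxiv 2401.1` becomes `/arxiv 2401.1`.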
0022e94d74 feat(matrix): support bang command aliases 2026-06-03 17:19:27 +05:30
6038bfb66e docs: explain remote-gateway session token for Hermes Desktop (#38144)
The desktop Remote gateway field asks for a session token that Hermes never
surfaces — by default web_server.py mints an ephemeral token per boot and
injects it into the served HTML, so there is nothing in config.yaml, /gateway,
or env to copy. Document that you pin it yourself via
HERMES_DASHBOARD_SESSION_TOKEN, run the backend with --insecure (keeps the
legacy token auth path instead of engaging the OAuth gate), then paste that
value into the desktop app.

- web-dashboard.md: new 'Connecting Hermes Desktop to a remote backend' section
  (backend + desktop steps, --insecure vs OAuth-gate nuance, HERMES_DESKTOP_*
  env override, Tailscale guidance, troubleshooting).
- environment-variables.md: new 'Web Dashboard & Hermes Desktop' env-var table
  (HERMES_DASHBOARD_SESSION_TOKEN, HERMES_DESKTOP_REMOTE_URL/TOKEN, the OAuth
  and public-url vars) — none were previously documented.
2026-06-03 04:16:00 -07:00
047e7cf36f fix(docs): remove remaining stale submodule references missed by #38089 (#38105)
Follow-up to #38089. The merged PR removed --recurse-submodules from the
installer, CI, and getting-started docs, but missed the same stale clause in:
- CONTRIBUTING.md (Prerequisites table)
- website/docs/developer-guide/contributing.md (table + clone command)
- zh-Hans mirror of the developer-guide contributing doc

git-lfs is kept in the Git requirement rows since it's a separate, real
prerequisite. No .gitmodules has existed since the Atropos RL submodule was
removed in #26106.
2026-06-03 03:11:19 -07:00
43fd63b4b5 fix(windows): rip out unused submodule support in installer & docker & docs
We have no submodules anymore, so #37702 was partly right, but we can delete the submodule support entirely.
2026-06-03 03:01:37 -07:00
64202200a6 chore: remove committed RELEASE_v*.md changelogs from repo root (#37855)
These per-release changelog files are transient working files used only to
feed `gh release create --notes-file` at release time; the GitHub Release
itself permanently stores the published notes. They were never a build
artifact (no package-data glob, no MANIFEST.in include, no CI reference)
and don't belong in the tracked tree.

- Delete all 15 (v0.2.0 through v0.15.1)
- Add RELEASE_v*.md to .gitignore so an accidental `git add -A` can't
  recommit them

The hermes-release skill is updated separately to write the changelog to
/tmp/ for the whole release process and never stage it.
2026-06-03 01:55:59 -07:00
f019a9c491 Merge pull request #37975 from kshitijk4poor/fix/desktop-session-view-bleed
fix(desktop): stop background session messages bleeding into the active transcript
2026-06-03 01:03:50 -07:00
46ea0a184d Merge pull request #37999 from kshitijk4poor/desktop-slash-nav-dom-regression-test
fix(desktop): slash/@ menu keyboard nav — cycle all items + Esc dismiss
2026-06-03 00:51:54 -07:00
49f1b9e4b4 fix(desktop): stop Esc reopening the slash/@ menu; harden keyup guard
Follow-up to #37937. That fix guarded the composer's keyup with
`shouldSkipTriggerRefreshOnKeyUp(key, trigger !== null)`. The `trigger !== null`
check is timing-fragile for Escape: Escape's *keydown* sets `trigger = null`
and closes the menu, but in a real browser the *keyup* fires after a re-render,
so the handler closure sees `trigger === null`, the guard returns false,
`refreshTrigger` runs, re-detects the still-present `/` in the input, and
instantly reopens the menu. (jsdom batches state synchronously so a unit test
could not observe this -- only the running app does.)

Replace the value-based guard with a `triggerKeyConsumedRef` set synchronously
in keydown whenever the open popover consumes a nav/control key
(Arrow/Enter/Tab/Escape). keyup consults and clears that ref, so it is immune
to the keydown->re-render->keyup timing. Applied to both the main composer
(chat/composer/index.tsx) and the message-edit composer
(assistant-ui/thread.tsx).

Removes the now-unused `shouldSkipTriggerRefreshOnKeyUp` helper and its unit
test. The real-DOM regression test now fires keydown+keyup pairs through the
ref-based handlers and asserts Esc closes and stays closed.

Verified by running a production renderer build (Vite v8) under Electron
against a local backend: ArrowDown/ArrowUp cycle the full list and Esc
dismisses the menu without reopening.
2026-06-03 13:15:08 +05:30
c77c470d27 test(desktop): real-DOM regression for slash/@ menu keyboard nav
The existing slash-menu fix (PR #37937) shipped a unit test that drove the
keydown reducer directly. It did not exercise the actual DOM event path —
specifically the keyup-driven `refreshTrigger` that was the root cause — so
it would not have caught a regression in that path.

This adds a faithful @testing-library reproduction that mounts the real
`useLiveCompletionAdapter` plus the index.tsx trigger wiring and fires real
`keyDown` + `keyUp` event pairs on a contentEditable. It asserts:

- ArrowDown cycles through ALL items (0,1,2,3,4,0,1), not just the first two
- Escape closes the menu and keyup does not reopen it

Reverting the fix (always-refresh keyup + unconditional setTriggerActive(0))
makes this test fail with the highlight stuck at the top — confirming it
guards the real bug.
2026-06-03 12:46:14 +05:30
e114b31eda test(dashboard): direct unit coverage for internal WS credential + docstring fix
Follow-up to Ben's PR #37892. Adds a TestInternalCredential block to
test_dashboard_auth_ws_tickets.py exercising the mint-once stability,
multi-use, unminted-rejection, empty-value, wrong-value, reset-and-remint,
and ticket-store-independence branches directly (previously only covered
indirectly via _ws_auth_ok, which left the unminted and empty-value
branches unexercised).

Also corrects the consume_internal_credential docstring: the returned
identity dict is discarded by the current _ws_auth_ok caller (which only
needs the boolean outcome), so the prior 'carry it into its session log'
wording over-promised.
2026-06-02 23:43:27 -07:00
Ben
fd1ec8033d fix(dashboard): authenticate server-spawned PTY child WS with a process-internal credential
The embedded-TUI PTY child attaches to two server-internal WebSockets:
/api/ws (its primary JSON-RPC gateway backend) and /api/pub (the event
sidecar). Both URLs are built server-side in web_server.py and handed to
the child via its environment.

In OAuth-gated mode (auth_required=true, every hosted Fly agent), _ws_auth_ok
unconditionally rejects the legacy ?token=<_SESSION_TOKEN> path — a leaked
session token must not grant WS access once the gate is engaged. But
_build_gateway_ws_url() still only emitted ?token=, with no gated-mode
branch (its sibling _build_sidecar_url had been given a ticket branch; the
gateway-url builder was missed). So the TUI child's /api/ws upgrade was
rejected 4401 -> 'gateway websocket connection failed' -> 'gateway startup
timeout', leaving the embedded chat unusable on every gated deployment.

A single-use 30s browser ticket is the wrong shape for this link: the child
reads its attach URL once at startup and reuses it on every reconnect, and
on a slow cold boot it may not dial within the TTL. (_build_sidecar_url's
own docstring already flagged this fragility.)

Fix: add a process-lifetime, multi-use internal credential to
dashboard_auth.ws_tickets (internal_ws_credential / consume_internal_credential),
minted once per process and NEVER injected into the SPA — it only leaves the
process via a spawned child's env, so browser-side XSS can't read it, and a
leak grants no more than a ticket already does. _ws_auth_ok accepts it via
?internal= in gated mode only. Both _build_gateway_ws_url and
_build_sidecar_url now use it, so the child can reconnect both sockets.

Loopback / --insecure behavior is unchanged (still ?token=).

Needs review: touches _ws_auth_ok + dashboard_auth (core auth surface).
2026-06-02 23:43:27 -07:00
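A minimal sketch of a mint-once, multi-use process credential of the kind fd1ec8033d describes. The function names match the commit message, but the bodies are assumptions: the token length, the module-level storage, and the boolean return value are illustrative (the follow-up commit e114b31eda notes that the real `consume_internal_credential` returns an identity dict).

```python
import hmac
import secrets
from typing import Optional

# Process-lifetime secret; None until first minted.
_internal_credential: Optional[str] = None


def internal_ws_credential() -> str:
    """Mint the credential on first use and return the same value afterwards."""
    global _internal_credential
    if _internal_credential is None:
        _internal_credential = secrets.token_urlsafe(32)
    return _internal_credential


def consume_internal_credential(presented: Optional[str]) -> bool:
    """Check a presented value; valid credentials can be reused (multi-use)."""
    if not presented or _internal_credential is None:
        return False  # empty value, or nothing minted yet
    # Constant-time comparison to avoid leaking the secret through timing.
    return hmac.compare_digest(
        presented.encode("utf-8"), _internal_credential.encode("utf-8")
    )
```

Unlike a single-use ticket with a short TTL, the same value validates on every reconnect, which matches the child's behavior of reading its attach URL once at startup.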
28f1590b7a fix(desktop): stop background session messages bleeding into the active transcript
A still-busy background session (one the user toggled away from) keeps
emitting updateSessionState() heartbeats — stream deltas, and especially
the 'session busy' prompt-rejection errors from auto-drained queued turns.
Each call invoked syncSessionStateToView() unconditionally, staging that
session's messages into the shared $messages view.

flushPendingViewState() guarded against the wrong session reaching the
view, but only one requestAnimationFrame is scheduled per frame and
pendingViewStateRef holds just the latest writer. So within a single
frame a background write could overwrite an already-pending foreground
write, and the stale background transcript (e.g. the red 'session busy'
rows) would render on top of whatever session the user switched to —
appearing to 'bleed' into every session.

Guard at the staging site: a session may only stage into the view when
it is the currently-active session. Background sessions still update
their own cache entry; they just never touch $messages. Pure render
fix, no behavior change to queuing, interrupt, or drain.
2026-06-03 12:09:18 +05:30
ada04573a9 Merge pull request #37948 from kshitijk4poor/fix/desktop-stop-button-interrupt
fix(desktop): make Stop button actually interrupt when a turn is queued
2026-06-02 23:20:30 -07:00
a23728dfcc fix(desktop): make Stop button actually interrupt when a turn is queued
When a follow-up message is queued during a busy turn, the composer
clears and the primary button switches back to the Stop affordance. But
clicking Stop ran interruptAndSendNextQueued(), which cancelled the turn
and *immediately* re-sent the head of the queue. The auto-drain effect
(which fires when busy flips from true to false) compounded this: any
explicit cancel flipped busy to false and re-fired the queue. The net
effect was that Stop appeared to never interrupt -- the agent kept
running on the queued prompt.

Fix:
- Stop button (busy + empty composer) now always performs a pure
  interrupt via onCancel(); it no longer hijacks the queue.
- An explicit interrupt latches userInterruptedRef so the busy
  true-to-false auto-drain skips exactly one drain. Queued turns are
  preserved and the
  user resumes them deliberately (Cmd/Ctrl+K, Enter, or the per-row
  send-now arrow), matching the documented Esc=cancel / Cmd+K=send-next
  affordances.
- Extracted the settle decision into shouldAutoDrainOnSettle() with unit
  tests covering natural completion vs. explicit interrupt.
2026-06-03 11:46:02 +05:30
9b43ab8de5 Merge pull request #37937 from kshitijk4poor/fix/desktop-slash-menu-keyup-nav
fix(desktop): keep slash/@ completion menu navigable and Esc-dismissable
2026-06-02 22:54:05 -07:00
188e52db91 fix(desktop): keep slash/@ completion menu navigable and Esc-dismissable
The desktop composer's `onKeyUp` handler unconditionally re-ran
`refreshTrigger` on every keyup, including the Arrow/Enter/Tab/Escape keys
the open-trigger `onKeyDown` branch had already fully handled. Because
`refreshTrigger` re-detects the trigger and resets the active index to 0,
this produced two bugs in the `/` (and `@`) completion popover:

- ArrowDown/ArrowUp moved the highlight on keydown, then keyup snapped it
  straight back to the top — so the user could never cycle past the first
  couple of items.
- Escape closed the menu on keydown, then keyup re-detected the still-present
  `/` and immediately reopened it — so Esc appeared to do nothing.

Fix: skip the keyup-driven refresh for the navigation/control keys while a
trigger menu is open (they never edit text, so refreshing is pointless), and
only reset the highlight in `refreshTrigger` when the detected trigger query
actually changed. Applied to both the main composer (chat/composer/index.tsx)
and the message-edit composer (assistant-ui/thread.tsx), which shared the
same bug. New `shouldSkipTriggerRefreshOnKeyUp` helper is unit-tested.
2026-06-03 11:19:07 +05:30
5005b79bc3 Merge pull request #37932 from NousResearch/bb/desktop-remote-flicker
fix(desktop): disable GPU acceleration on remote displays to stop flicker
2026-06-03 00:43:37 -05:00
d0ea4caf7f fix(desktop): don't treat WSLg as a remote display
WSLg renders Linux GUIs locally through a vGPU surface rather than
shipping frames over the wire, so it doesn't show the remote-compositor
flicker — confirmed by a WSL user seeing zero flickering. Drop the WSL
branch from detectRemoteDisplay so WSLg keeps hardware acceleration;
detection now covers only genuinely-remote displays (SSH X11 forwarding,
VNC, RDP). The HERMES_DESKTOP_DISABLE_GPU override still works for anyone
who does hit it.
2026-06-03 00:42:05 -05:00
6a2909fe5a fix(desktop): disable GPU acceleration on remote displays to stop flicker
Users on remote/forwarded displays (SSH X11 forwarding, VNC, RDP, WSLg)
reported the window flickering during scroll/streaming; nobody on native
Windows/macOS ever saw it.

Root cause: the app shipped with Chromium's default GPU hardware
acceleration and no remote-display handling. Over a remote connection the
GPU compositor can't present accelerated layers cleanly across the wire,
so the surface flashes on repaint. Local sessions composite on the GPU
and never hit it.

Detect a remote display before app `ready` (detectRemoteDisplay in
bootstrap-platform.cjs) and fall back to software rendering via
app.disableHardwareAcceleration() + --disable-gpu-compositing. Software
compositing is rock-steady over the wire and the CPU cost is negligible
next to the connection's latency. HERMES_DESKTOP_DISABLE_GPU overrides
detection both ways for VNC/screen-sharing setups we can't sniff or
remote hosts that do have working acceleration.
2026-06-03 00:36:59 -05:00
9272e4019a fix(docker): point TUI launcher at prebuilt bundle via HERMES_TUI_DIR (#37923)
The embedded dashboard Chat tab dies on hosted images with a 502 /
"[session ended]": the PTY child's `hermes --tui` spawn runs a runtime
`npm install` that fails.

Root cause: the root package-lock.json describes the WHOLE npm monorepo
workspace set (root + web + ui-tui + apps/*), but the image only installs
root/web/ui-tui — apps/* (the desktop app) is never `npm install`ed here, and
its deps hoist into the shared root node_modules. So the actualized
node_modules permanently disagrees with the canonical lock,
`_tui_need_npm_install()` returns True on every launch, and the runtime
`npm install` it triggers (a) can never converge against the partial monorepo
and (b) races itself across concurrent /api/pty connections -> ENOTEMPTY ->
the launcher `sys.exit(1)`s, the slow install blows past Fly's WS-upgrade
window -> 502 -> the browser shows "[session ended]".

Fix: set `ENV HERMES_TUI_DIR=/opt/hermes/ui-tui` so `_make_tui_argv` takes the
prebuilt-bundle fast path (`node --expose-gc /opt/hermes/ui-tui/dist/entry.js`)
and never reaches the install check — exactly the nix/packaged-release path
the launcher was designed for. The bundle is already built at Layer 8
(`ui-tui && npm run build`); this just tells the launcher to use it.

Verified on a freshly-built image: HERMES_TUI_DIR is set, the prebuilt
dist/entry.js is present, `_make_tui_argv` resolves to the prebuilt node
invocation (no npm), and `docker run ... --tui` no longer prints
"npm install failed". New regression guard: tests/docker/test_tui_prebuilt_bundle.py.

A separate launcher hardening (make _tui_need_npm_install tolerant of
partial-monorepo installs) is tracked independently; this Docker-side fix
resolves the hosted-chat symptom on its own.

Area: docker (Dockerfile + tests/docker).
2026-06-03 15:30:45 +10:00
feb50eee70 Merge pull request #37908 from NousResearch/bb/desktop-concurrent-session-loss
fix(desktop): keep in-flight new chats from vanishing on refresh
2026-06-03 00:29:13 -05:00
e0a999aa8a fix(desktop): label in-flight new chats with the first message
The send path created the optimistic sidebar row with a null preview, so
a new chat read "Untitled session" until its turn persisted and auto-title
ran. With concurrent new chats now preserved across refreshes, several
"Untitled session" rows could show at once.

Seed the optimistic preview with the user's first message (the branch path
already does this) so each in-flight row is labeled immediately. The
server's own preview/title supersedes it once the turn persists.
2026-06-03 00:25:19 -05:00
55a76ec669 fix(desktop): keep in-flight new chats from vanishing on refresh
Creating several sessions in a row (Ctrl-N, type, send, repeat) and
waiting for one to finish made the other still-running chats disappear
from the sidebar.

Root cause: a new session's first user message isn't flushed to the
SessionDB until its turn is persisted, so the row's message_count stays
0 mid-response. `refreshSessions()` lists with min_messages=1 and then
hard-replaces $sessions. Because every message.complete triggers a
refresh, the moment one session finished, the others (still at
message_count 0) were filtered out of the server page and dropped from
the list.

Fix: merge instead of replace. `mergeWorkingSessions()` preserves any
session that is still in $workingSessionIds but absent from the server
page, so concurrent new chats stay visible until their own turn persists.
Optimistic deletes/archives already remove the row from the previous
list, so a removed session can't be resurrected by the merge.
2026-06-03 00:21:05 -05:00
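The merge-instead-of-replace rule in 55a76ec669 can be sketched as a pure function. The real `mergeWorkingSessions()` is TypeScript; this Python version is a language-neutral illustration in which the dict shape, the parameter names, and the ordering of the result (preserved in-flight rows first) are assumptions.

```python
from typing import Dict, List, Set


def merge_working_sessions(
    server_page: List[Dict],
    previous: List[Dict],
    working_ids: Set[str],
) -> List[Dict]:
    """Keep in-flight sessions that the server page filtered out."""
    server_ids = {session["id"] for session in server_page}
    preserved = [
        session
        for session in previous
        if session["id"] in working_ids and session["id"] not in server_ids
    ]
    return preserved + list(server_page)
```

A session that was optimistically deleted is no longer in `previous`, so the merge cannot resurrect it, which is the property the commit message calls out.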
d9f7e7ac81 fix(docker): seed gateway_state.json from HERMES_GATEWAY_BOOTSTRAP_STATE on first boot (#37896)
On a fresh volume there is no gateway_state.json, so the boot reconciler
(cont-init.d/02-reconcile-profiles) registers the gateway-default s6 slot
but leaves it down — it only auto-starts when the last recorded state was
"running". A freshly-provisioned container therefore comes up with the
gateway down until something starts it (e.g. the dashboard's start button).

Add a generic, first-boot-only env-seed in stage2-hook.sh (which runs
before 02-reconcile-profiles): when HERMES_GATEWAY_BOOTSTRAP_STATE=running
and no gateway_state.json exists yet, seed {"gateway_state":"running"} so
the reconciler brings the supervised slot up on the very first boot.

This mirrors the existing HERMES_AUTH_JSON_BOOTSTRAP pattern: it seeds the
same state file the reconciler already consults, guarded by [ ! -f ] so
persisted runtime state always wins on later boots (a deliberately-stopped
gateway stays stopped across restarts). Only the literal "running" is
honoured (the sole value in the reconciler's _AUTOSTART_STATES).

Generic container contract — no host-specific code. Useful to any
orchestrator that provisions a blank volume and wants the gateway up from
first boot (the supervised gateway/dashboard already work on such hosts;
only the first-boot autostart was missing because the CLI lifecycle
commands can't drive the s6 layer when container self-detection misses).

Adds a shell-level contract test and documents the env var.
2026-06-03 15:11:15 +10:00
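The first-boot guard in d9f7e7ac81 is implemented in shell (stage2-hook.sh); the decision it makes can be sketched in Python as below. `bootstrap_state_payload` is a hypothetical helper, written as a pure function so the file check is passed in rather than performed. The environment variable name, the seeded JSON content, and the "existing file always wins" rule come from the commit message.

```python
import json
from typing import Mapping, Optional


def bootstrap_state_payload(
    state_file_exists: bool, env: Mapping[str, str]
) -> Optional[str]:
    """Return the JSON to seed into gateway_state.json, or None to do nothing."""
    if state_file_exists:
        return None  # persisted runtime state always wins on later boots
    if env.get("HERMES_GATEWAY_BOOTSTRAP_STATE") != "running":
        return None  # only the literal "running" is honoured
    return json.dumps({"gateway_state": "running"})
```

A caller would write the returned string to the state file only when it is not `None`, which mirrors the `[ ! -f ]` guard in the shell hook.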
e618cbee44 feat(desktop): custom zoom shortcuts at half default step
Replace Electron's built-in zoomIn/zoomOut/resetZoom menu roles with
custom implementations that use a 0.1 zoom-level step instead of
Chromium's default 0.2. This makes Ctrl/Cmd + +/-/0 zoom feel more
granular and less jumpy.

Also adds installZoomShortcuts() which intercepts the keyboard shortcuts
via before-input-event. This is necessary on Linux/Windows where the
application menu is set to null, so Chromium's default handler would
otherwise apply the full 0.2 step.
2026-06-03 01:07:44 -04:00
2f0ee66467 Merge pull request #37877 from NousResearch/bb/desktop-sticky-msg-clamp
feat(desktop): clamp sticky human messages to ~2 lines until hover/focus
2026-06-02 23:45:13 -05:00
cbc1d901ba chore: uptick 2026-06-02 23:44:51 -05:00
84eb5f1f89 fix(desktop): restore sticky human clamp transition at 0.75s 2026-06-02 23:44:06 -05:00
e5472da584 fix(desktop): drop sticky human clamp max-height transition 2026-06-02 23:43:52 -05:00