fix(docker): point TUI launcher at prebuilt bundle via HERMES_TUI_DIR (#37923)

The embedded dashboard Chat tab dies on hosted images with a 502 /
"[session ended]": the PTY child's `hermes --tui` spawn runs a runtime
`npm install` that fails.

Root cause: the root package-lock.json describes the WHOLE npm monorepo
workspace set (root + web + ui-tui + apps/*), but the image only installs
root/web/ui-tui — apps/* (the desktop app) is never `npm install`ed here, and
its deps hoist into the shared root node_modules. So the actualized
node_modules permanently disagrees with the canonical lock,
`_tui_need_npm_install()` returns True on every launch, and the runtime
`npm install` it triggers (a) can never converge against the partial monorepo
and (b) races itself across concurrent /api/pty connections -> ENOTEMPTY ->
the launcher `sys.exit(1)`s, the slow install blows past Fly's WS-upgrade
window -> 502 -> the browser shows "[session ended]".

Fix: set `ENV HERMES_TUI_DIR=/opt/hermes/ui-tui` so `_make_tui_argv` takes the
prebuilt-bundle fast path (`node --expose-gc /opt/hermes/ui-tui/dist/entry.js`)
and never reaches the install check — exactly the nix/packaged-release path
the launcher was designed for. The bundle is already built at Layer 8
(`ui-tui && npm run build`); this just tells the launcher to use it.

Verified on a freshly-built image: HERMES_TUI_DIR is set, the prebuilt
dist/entry.js is present, `_make_tui_argv` resolves to the prebuilt node
invocation (no npm), and `docker run ... --tui` no longer prints
"npm install failed". New regression guard: tests/docker/test_tui_prebuilt_bundle.py.

A separate launcher hardening (make _tui_need_npm_install tolerant of
partial-monorepo installs) is tracked independently; this Docker-side fix
resolves the hosted-chat symptom on its own.

Area: docker (Dockerfile + tests/docker).
This commit is contained in:
Ben Barclay
2026-06-03 15:30:45 +10:00
committed by GitHub
parent feb50eee70
commit 9272e4019a
2 changed files with 96 additions and 0 deletions

View File

@ -243,6 +243,23 @@ COPY --chmod=0755 docker/cont-init.d/02-reconcile-profiles /etc/cont-init.d/02-r
# ---------- Runtime ----------
ENV HERMES_WEB_DIST=/opt/hermes/hermes_cli/web_dist
# Point the TUI launcher at the prebuilt bundle baked at build time (Layer 8:
# `ui-tui && npm run build`). This makes _make_tui_argv take the prebuilt-bundle
# fast path (`node --expose-gc /opt/hermes/ui-tui/dist/entry.js`) and skip the
# _tui_need_npm_install / runtime `npm install` branch entirely — exactly the
# nix/packaged-release path the launcher was designed for.
#
# Why this is required (not just an optimization): the root package-lock.json
# describes the WHOLE monorepo workspace set (root + web + ui-tui + apps/*),
# but the image only installs root/web/ui-tui (apps/* — the desktop app — is
# never `npm install`ed here). So the actualized node_modules permanently
# disagrees with the canonical lock, _tui_need_npm_install() returns True on
# every launch, and the runtime `npm install` it triggers (a) can never
# converge against the partial monorepo and (b) races itself across concurrent
# embedded-chat (/api/pty) connections → ENOTEMPTY → the chat tab dies with a
# 502 / "[session ended]". Pointing at the prebuilt bundle sidesteps the whole
# check. (A separate launcher hardening is tracked independently.)
ENV HERMES_TUI_DIR=/opt/hermes/ui-tui
ENV HERMES_HOME=/opt/data
# `docker exec` privilege-drop shim. When operators run