Files
hermes-agent/hermes_cli
Teknium 6717914e0a fix(dashboard): explain WHY a chat WS connection was refused (#38743)
* Port from google-gemini/gemini-cli#21541: back up corrupted config.yaml

When config.yaml fails to parse, load_config() silently falls back to
DEFAULT_CONFIG and leaves the broken file on disk. If the user then re-runs
the setup wizard or hermes config set (both rewrite config.yaml), their
broken-but-recoverable overrides are lost for good.

Adapts the policy-file recovery from gemini-cli#21541: on the first parse
warning for a given broken file, snapshot it to config.yaml.corrupt.<ts>.bak
(best-effort, symlink-guarded, size-deduped) and tell the user where it
landed. Unlike Gemini's version we deliberately do NOT reset config.yaml to a
clean state — hermes never silently mutates user config, and leaving it means
a hand-fixed file is re-read on the next load.

Tests: 3 new cases (backup created + content preserved + original untouched;
same-size backup dedup; symlink not copied). E2E verified with isolated
HERMES_HOME and a real tab-indented broken config.

* fix(dashboard): explain WHY a chat WS connection was refused

The embedded-chat PTY WebSocket (/api/pty) collapsed every rejection
into a bare close code: 4401 for any auth failure, 4403 for three
unrelated failures (host mismatch, origin mismatch, peer-IP). Neither
the server log nor the browser said which gate fired or why, so a
"chat won't connect" report was undiagnosable without a repro.

Server (web_server.py):
- _ws_auth_reason / _ws_host_origin_reason / _ws_client_reason return a
  short machine-parseable reason; old bool wrappers kept for callers/tests.
- pty_ws splits the overloaded 4403 into 4401 (auth), 4403 (host/origin),
  4408 (peer not allowed), 4404 (chat disabled), and sends the reason on
  the close frame (clamped to the 123-byte RFC6455 limit).
- Each path logs one line: 'pty auth rejected reason=.. mode=.. cred=.. peer=..'
  / 'pty refused: <reason> ..'. Accepted path logs 'pty accepted peer=..
  mode=.. cred=..' so an audit shows HOW a peer authed, not just that it did.

tui_gateway/ws.py:
- 'ws send/write failed' now logs error_type=<ExcName> so an exception
  whose str() is empty (closed-transport sends) no longer logs 'error='.

web/src/pages/ChatPage.tsx:
- console.warn the real close code + server reason on every close.
- Map 4404/4408 to specific banners; 4401/4403 banners echo the server
  reason; [session ended] prints the close code.

E2E verified all five reject paths + accepted path produce matching
close code, wire reason, and server log line.
2026-06-04 00:36:03 -07:00
..
2026-05-31 17:46:56 -05:00