Files
hermes-agent/plugins
Ben Barclay c10ccaaf51 feat(dashboard-auth): rotate dashboard sessions via refresh token (#37247)
* feat(dashboard-auth): rotate dashboard sessions via refresh token

The dashboard auth-code grant now issues a 24h rotating refresh token
(server side: NousResearch/nous-account-service#293). This wires up the
Hermes client half so an expired access token is transparently refreshed
instead of bouncing the user to /login every 15 minutes.

plugins/dashboard_auth/nous:
- refresh_session() now POSTs grant_type=refresh_token to Portal's token
  endpoint and returns a Session carrying the ROTATED refresh token (was
  an unconditional RefreshExpiredError under the old "no RT in V1"
  contract). The RT is sent in BOTH the request body (Portal's schema
  requires it there) and the X-Refresh-Token header (log redaction) —
  verified against the #293 preview deploy: header-only is rejected as
  invalid_request, body is accepted.
- A 400 from Portal (expired / revoked / reuse-detected) maps to
  RefreshExpiredError so the middleware forces a clean re-login; network
  errors map to ProviderError; empty RT fast-fails without a network call.
- complete_login now captures the initial refresh token Portal returns
  (forward-tolerant: empty string if a deploy omits it).
- Extracted the shared token-response handling into
  _token_response_to_session, parameterised on the 400 exception type so
  the auth-code path raises InvalidCodeError and the refresh path raises
  RefreshExpiredError.
- revoke_session stays a best-effort no-op: Portal exposes no public
  token-endpoint revocation grant (revocation is the authenticated
  /sessions UI, keyed by sessionId+userId), so logout is cookie-clearing
  and the 24h session expires on its own. Documented for a future
  revoke grant.

hermes_cli/dashboard_auth/middleware:
- On an expired/invalid access token the gate now attempts refresh via
  the session's RT BEFORE forcing re-login. On success it serves the
  request and re-sets the rotated cookies on the response (mandatory:
  Portal rotates the RT every refresh and reuse-detects, so a stale RT
  cookie would revoke the whole session on the next refresh). On
  RefreshExpiredError (or no RT) it falls through to clear-and-relogin.
- ProviderError during refresh (Portal unreachable) forces a clean
  re-login rather than 500-ing the request.
- Uses the existing REFRESH_SUCCESS / REFRESH_FAILURE audit events.

Validation:
- 176 dashboard-auth unit/integration tests pass.
- Live E2E against the #293 preview deploy: refresh_session(bad rt) ->
  RefreshExpiredError through the real token endpoint; live JWKS fetch +
  RS256 verification rejects a forged token; empty-RT fast-fail. The
  successful happy-path rotation is covered by unit tests (a live run
  needs an interactive browser OAuth round trip + registered agent:*
  client).

Depends on: NousResearch/nous-account-service#293 (server-side RT issuance).

* fix(dashboard-auth): use Portal's x-nous-refresh-token header name

The refresh-token header must match Portal's REFRESH_TOKEN_HEADER exactly
("x-nous-refresh-token"); the initial cut used "X-Refresh-Token", which
Portal silently ignores (harmless since the RT is also in the body, which
is what the schema requires — but the header redaction was a no-op).
Confirmed against the NAS token route + re-validated live against the
#293 preview deploy.

* fix(dashboard-auth): refresh session when access-token cookie has been evicted

The gated middleware bounced users to /login the instant the access-token
cookie was absent, without ever consulting the refresh token:

    at, _rt = read_session_cookies(request)
    if not at:
        return _unauth_response(...)   # bailed here

This made transparent refresh effectively dead for the common case. The
access-token cookie is set with Max-Age = access_token_expires_in (~15 min),
so a real browser EVICTS hermes_session_at the moment the token lapses while
hermes_session_rt persists (30-day Max-Age). From that point the browser
sends only the refresh-token cookie — and the old guard rejected it before
_attempt_refresh could run. The _attempt_refresh path only fired for a
present-but-invalid access token, which never happens in a browser.

Fix: only hard-bounce when NEITHER cookie is present. A request carrying
just the refresh token now skips verification (no AT to verify) and flows
into the existing refresh path, which rotates both cookies and serves the
request transparently. A dead/expired RT still raises RefreshExpiredError
and falls through to clear-and-relogin.

This failure mode escaped the original tests + manual refresh button because
both kept the access-token cookie present; only a real browser evicting the
cookie at Max-Age exposes it. Added 3 regression tests covering: AT-evicted +
RT-present (transparent refresh), no-cookies (still bounces), and RT-only with
a dead RT (clean 401, no 500).
2026-06-02 21:16:41 +10:00
..
2026-06-01 20:13:42 -07:00