The Nous dashboard OAuth login rejected any http:// redirect_uri whose host was not localhost/127.0.0.1, surfacing "redirect_uri may only use http:// for localhost/127.0.0.1" on the login screen. This broke self-hosted dashboards reached over plain HTTP — LAN IPs, internal hostnames, and reverse proxies that terminate TLS upstream. The Portal-side check (agent-redirect-uri.ts) is authoritative on which redirect_uris are permitted; this client-side _validate_redirect_uri is only a fast-fail for obvious operator error and should not second-guess valid http:// deployments. Fix: drop the localhost-only branch on the http scheme. Validation now enforces only that the scheme is http(s) and the path ends with /auth/callback. Updated the docstring to explain the relaxed contract, and replaced test_rejects_http_with_non_localhost (which pinned the old behavior) with test_allows_http_with_arbitrary_host covering a Fly hostname, a LAN IP, and an internal hostname.
666 lines
28 KiB
Python
666 lines
28 KiB
Python
"""NousDashboardAuthProvider — Nous Portal OAuth (authorization-code + PKCE).
|
|
|
|
Implements ``nous-account-service/docs/agent-dashboard-oauth-contract.md``
|
|
(PR #180). The plugin auto-loads (bundled, kind=backend) but only registers
|
|
its provider when a client_id is configured — either via ``config.yaml`` or
|
|
via the Portal-injected env var — so loopback / ``--insecure`` operators
|
|
are unaffected.
|
|
|
|
Configuration surfaces (env wins over config.yaml when set non-empty):
|
|
|
|
``config.yaml`` — canonical surface::
|
|
|
|
dashboard:
|
|
oauth:
|
|
client_id: agent:{agent_instance_id} # required
|
|
portal_url: https://portal.example # optional
|
|
|
|
Environment overrides — used by Fly.io's platform-secret injection so
|
|
per-deploy values don't need to bake into ``config.yaml``:
|
|
|
|
HERMES_DASHBOARD_OAUTH_CLIENT_ID — shape ``agent:{agent_instance_id}``
|
|
HERMES_DASHBOARD_PORTAL_URL — defaults to
|
|
``https://portal.nousresearch.com``
|
|
(production Portal). Override only
|
|
for staging (``portal.rewbs.uk``)
|
|
or a custom deployment.
|
|
|
|
Empty env var values are treated as unset so a provisioned-but-not-populated
|
|
Fly secret can't shadow a valid config.yaml entry.
|
|
|
|
Key contract points encoded here:
|
|
|
|
- client_id is per-instance (``agent:{instance_id}``); the suffix is also
|
|
cross-checked against the token's ``agent_instance_id`` claim as
|
|
defense-in-depth.
|
|
- scope is ``agent_dashboard:access`` only (no OIDC scopes).
|
|
- tokens are RS256 JWTs verified against ``/.well-known/jwks.json``;
|
|
JWKS is cached for 5 minutes.
|
|
- the dashboard auth-code grant issues a 24h rotating refresh token
|
|
(Portal NAS PR #293). ``refresh_session`` posts ``grant_type=refresh_token``
|
|
to rotate the access token; ``complete_login`` and ``refresh_session``
|
|
both populate ``Session.refresh_token`` with the (rotating) value the
|
|
middleware persists back to the HttpOnly cookie. On a dead/expired/
|
|
reuse-detected refresh token Portal returns 400 → ``RefreshExpiredError``
|
|
→ middleware redirects to ``/auth/login``.
|
|
- audience claim is the bare ``client_id`` (no ``hermes-cli:`` prefix).
|
|
- tolerant ``oauth_contract_version`` check: missing → warn + proceed;
|
|
present and ``!= 1`` → refuse.
|
|
|
|
The cookie payload returned by ``start_login`` stashes the PKCE
|
|
``code_verifier`` and the OAuth ``state`` parameter for the
|
|
``/auth/callback`` handler to retrieve. The auth-route layer is the owner
|
|
of cookie names; this provider just hands back ``{"code_verifier": …,
|
|
"state": …}`` and the route serializes those into the ``hermes_session_pkce``
|
|
cookie.
|
|
|
|
Refresh-token rotation: Portal rotates the refresh token on every
|
|
successful refresh and runs reuse-detection (replaying a rotated token
|
|
outside Portal's 60s grace revokes the whole session). The host
|
|
middleware therefore MUST persist the rotated ``Session.refresh_token``
|
|
back to the cookie on every refresh.
|
|
|
|
Skip reasons:
|
|
The plugin exposes a module-level ``LAST_SKIP_REASON`` that the gate's
|
|
fail-closed branch reads to surface a useful operator error message
|
|
("Set HERMES_DASHBOARD_OAUTH_CLIENT_ID …") instead of the bare "no
|
|
providers registered" the gate would otherwise emit.
|
|
"""
|
|
|
|
from __future__ import annotations
|
|
|
|
import base64
|
|
import hashlib
|
|
import logging
|
|
import os
|
|
import secrets
|
|
import urllib.parse
|
|
from typing import Any, Dict, Optional
|
|
|
|
import httpx
|
|
|
|
from hermes_cli.dashboard_auth import (
|
|
DashboardAuthProvider,
|
|
InvalidCodeError,
|
|
LoginStart,
|
|
ProviderError,
|
|
RefreshExpiredError,
|
|
Session,
|
|
)
|
|
|
|
logger = logging.getLogger(__name__)
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
# Defaults
|
|
# ---------------------------------------------------------------------------
|
|
|
|
# Production Portal URL. Override via HERMES_DASHBOARD_PORTAL_URL for
|
|
# staging (portal.rewbs.uk) or a custom deployment. Contract docs name
|
|
# this as the production issuer.
|
|
_DEFAULT_PORTAL_URL = "https://portal.nousresearch.com"
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
# Skip-reason channel for operator-friendly error messages
|
|
# ---------------------------------------------------------------------------
|
|
#
|
|
# When the plugin loads but refuses to register (missing / malformed
|
|
# env vars), the auth gate downstream just sees "zero providers" and
|
|
# emits a generic "install a provider" error. That's misleading for the
|
|
# common case where the provider IS installed but mis-configured. The
|
|
# plugin writes the *specific* reason to this module-level slot; the
|
|
# gate reads it back when building its fail-closed SystemExit message.
|
|
#
|
|
# Cleared on every register() call so repeated dashboard starts in the
|
|
# same process (tests, hot-reload) don't leak stale reasons.
|
|
|
|
LAST_SKIP_REASON: str = ""
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
# Contract constants
|
|
# ---------------------------------------------------------------------------
|
|
|
|
# Contract C3: scope name for the dashboard flow.
|
|
_SCOPE = "agent_dashboard:access"
|
|
|
|
# Contract C11: emitted claim should equal 1; tolerant (warn) if missing.
|
|
_EXPECTED_CONTRACT_VERSION = 1
|
|
|
|
# Contract C7: JWKS Cache-Control max-age=300.
|
|
_JWKS_CACHE_SECONDS = 300
|
|
|
|
# httpx timeout for the token endpoint POST.
|
|
_TOKEN_ENDPOINT_TIMEOUT_SEC = 10.0
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
# Helpers
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
def _b64url_no_pad(raw: bytes) -> str:
|
|
"""Base64url-encode without ``=`` padding (RFC 7636 §4)."""
|
|
return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
# Provider
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
class NousDashboardAuthProvider(DashboardAuthProvider):
|
|
"""Nous Portal OAuth via authorization-code + PKCE (S256)."""
|
|
|
|
name = "nous"
|
|
display_name = "Nous Research"
|
|
|
|
def __init__(self, *, client_id: str, portal_url: str) -> None:
|
|
if not client_id.startswith("agent:"):
|
|
# Defense-in-depth. The plugin entry point already filters, but
|
|
# the provider should never be constructible with a malformed id.
|
|
raise ValueError(
|
|
"client_id must match contract shape 'agent:{instance_id}', "
|
|
f"got {client_id!r}"
|
|
)
|
|
self._client_id = client_id
|
|
self._agent_instance_id = client_id[len("agent:") :]
|
|
self._portal_url = portal_url.rstrip("/")
|
|
self._jwks_url = f"{self._portal_url}/.well-known/jwks.json"
|
|
self._authorize_url = f"{self._portal_url}/oauth/authorize"
|
|
self._token_url = f"{self._portal_url}/api/oauth/token"
|
|
# PyJWKClient is lazily imported so plugin discovery doesn't pay the
|
|
# crypto-import cost when the provider isn't activated.
|
|
self._jwks_client: Any = None
|
|
|
|
# ---- public API (DashboardAuthProvider) -------------------------------
|
|
|
|
def start_login(self, *, redirect_uri: str) -> LoginStart:
|
|
self._validate_redirect_uri(redirect_uri)
|
|
|
|
code_verifier = _b64url_no_pad(secrets.token_bytes(64)) # ~86 chars
|
|
code_challenge = _b64url_no_pad(
|
|
hashlib.sha256(code_verifier.encode("ascii")).digest()
|
|
)
|
|
state = _b64url_no_pad(secrets.token_bytes(32))
|
|
|
|
params = {
|
|
"response_type": "code",
|
|
"client_id": self._client_id,
|
|
"redirect_uri": redirect_uri,
|
|
"scope": _SCOPE,
|
|
"state": state,
|
|
"code_challenge": code_challenge,
|
|
"code_challenge_method": "S256",
|
|
}
|
|
redirect_url = f"{self._authorize_url}?{urllib.parse.urlencode(params)}"
|
|
# The auth-route layer expects ``cookie_payload[\"hermes_session_pkce\"]``
|
|
# as a single semicolon-delimited string of ``key=value`` segments,
|
|
# matching the stub provider's shape. The route handler prepends
|
|
# ``provider=`` so the callback knows which plugin to dispatch to.
|
|
cookie_payload = {
|
|
"hermes_session_pkce": f"state={state};verifier={code_verifier}",
|
|
}
|
|
return LoginStart(redirect_url=redirect_url, cookie_payload=cookie_payload)
|
|
|
|
def complete_login(
|
|
self,
|
|
*,
|
|
code: str,
|
|
state: str,
|
|
code_verifier: str,
|
|
redirect_uri: str,
|
|
) -> Session:
|
|
# ``state`` is verified by the auth-route layer before this call
|
|
# (it checks the cookie-stashed state matches the query-param state);
|
|
# we just receive it for symmetry with the protocol. Nous Portal
|
|
# doesn't re-check state at the token endpoint, so we ignore it here.
|
|
_ = state
|
|
|
|
try:
|
|
response = httpx.post(
|
|
self._token_url,
|
|
data={
|
|
"grant_type": "authorization_code",
|
|
"code": code,
|
|
"redirect_uri": redirect_uri,
|
|
"client_id": self._client_id,
|
|
"code_verifier": code_verifier,
|
|
},
|
|
headers={"Accept": "application/json"},
|
|
timeout=_TOKEN_ENDPOINT_TIMEOUT_SEC,
|
|
)
|
|
except httpx.RequestError as exc:
|
|
raise ProviderError(f"Portal token endpoint unreachable: {exc}") from exc
|
|
|
|
# The dashboard auth-code grant now issues a rotating refresh token
|
|
# (24h session, reuse-detected) — Portal NAS PR #293. A 400 here means
|
|
# the code/PKCE/redirect_uri failed, surfaced as InvalidCodeError.
|
|
return self._token_response_to_session(
|
|
response, bad_request_exc=InvalidCodeError
|
|
)
|
|
|
|
def refresh_session(self, *, refresh_token: str) -> Session:
|
|
"""Rotate the access token using the refresh token.
|
|
|
|
Posts ``grant_type=refresh_token`` to Portal's token endpoint. The
|
|
refresh token is sent in the ``X-Refresh-Token`` header (not the body)
|
|
so it never lands in Portal's request-body access logs — mirroring the
|
|
device-flow CLI convention; Portal reconciles header vs. body and
|
|
rejects conflicts.
|
|
|
|
Portal rotates the refresh token on every successful refresh, so the
|
|
returned ``Session.refresh_token`` is a NEW value the caller MUST
|
|
persist (replacing the old cookie). Failing to persist it means the
|
|
next refresh replays a rotated token and — outside Portal's 60s grace
|
|
— trips reuse-detection and revokes the whole session.
|
|
|
|
Raises ``RefreshExpiredError`` on a 400 (expired / revoked / reuse-
|
|
detected), so the middleware clears cookies and forces re-login.
|
|
Raises ``ProviderError`` if Portal is unreachable.
|
|
"""
|
|
if not refresh_token:
|
|
# No RT to present — treat as a dead session so middleware
|
|
# forces a clean re-login rather than emitting a malformed POST.
|
|
raise RefreshExpiredError("no refresh token present in session")
|
|
|
|
try:
|
|
response = httpx.post(
|
|
self._token_url,
|
|
# The refresh token goes in BOTH the body and the
|
|
# ``x-nous-refresh-token`` header. Portal's token endpoint
|
|
# requires ``refresh_token`` in the body (its request schema
|
|
# rejects a header-only request as ``invalid_request``), and
|
|
# additionally reconciles the header against the body — sending
|
|
# both lets Portal keep the value out of body-access-logs while
|
|
# still satisfying the schema. The header name must match
|
|
# Portal's ``REFRESH_TOKEN_HEADER`` exactly (``x-nous-refresh-
|
|
# token``); any other name is silently ignored. (Verified
|
|
# against the NAS #293 preview deploy: header-only → 400
|
|
# invalid_request; body → accepted.)
|
|
data={
|
|
"grant_type": "refresh_token",
|
|
"client_id": self._client_id,
|
|
"refresh_token": refresh_token,
|
|
},
|
|
headers={
|
|
"Accept": "application/json",
|
|
"x-nous-refresh-token": refresh_token,
|
|
},
|
|
timeout=_TOKEN_ENDPOINT_TIMEOUT_SEC,
|
|
)
|
|
except httpx.RequestError as exc:
|
|
raise ProviderError(
|
|
f"Portal token endpoint unreachable: {exc}"
|
|
) from exc
|
|
|
|
# A 400 on refresh means the RT is expired / revoked / reuse-detected;
|
|
# surface as RefreshExpiredError so middleware forces re-login.
|
|
return self._token_response_to_session(
|
|
response, bad_request_exc=RefreshExpiredError
|
|
)
|
|
|
|
def _token_response_to_session(
|
|
self,
|
|
response: httpx.Response,
|
|
*,
|
|
bad_request_exc: type[Exception],
|
|
) -> Session:
|
|
"""Translate a Portal ``/api/oauth/token`` response into a Session.
|
|
|
|
Shared by ``complete_login`` (auth-code grant) and ``refresh_session``
|
|
(refresh grant). ``bad_request_exc`` is the exception type raised on a
|
|
400 — ``InvalidCodeError`` for the auth-code path, ``RefreshExpiredError``
|
|
for the refresh path — so the middleware's distinct handling
|
|
(400-on-callback vs. force-relogin) is preserved.
|
|
"""
|
|
if response.status_code == 400:
|
|
# Contract: invalid_code / invalid_grant / redirect_uri_mismatch
|
|
# (auth-code) and expired / revoked / reuse-detected (refresh) all
|
|
# surface as 400 with an OAuth-shaped JSON error envelope.
|
|
body = self._parse_json_body(response)
|
|
error_code = body.get("error", "invalid_request")
|
|
raise bad_request_exc(f"Portal rejected token request: {error_code}")
|
|
if response.status_code != 200:
|
|
raise ProviderError(
|
|
f"Portal token endpoint returned {response.status_code}: "
|
|
f"{response.text[:200]!r}"
|
|
)
|
|
|
|
payload = self._parse_json_body(response)
|
|
access_token = payload.get("access_token")
|
|
if not access_token or not isinstance(access_token, str):
|
|
raise ProviderError("Portal token response missing access_token")
|
|
|
|
token_type = str(payload.get("token_type", "")).lower()
|
|
if token_type and token_type != "bearer":
|
|
raise ProviderError(f"unexpected token_type={token_type!r}")
|
|
|
|
claims = self._verify_jwt(access_token)
|
|
# The dashboard grant issues a rotating refresh token; capture it so
|
|
# the caller can persist it. Empty string if Portal omitted it (the
|
|
# session then behaves as access-token-only until expiry).
|
|
refresh_token = payload.get("refresh_token") or ""
|
|
if not isinstance(refresh_token, str):
|
|
refresh_token = ""
|
|
return self._session_from_claims(access_token, refresh_token, claims)
|
|
|
|
|
|
def verify_session(self, *, access_token: str) -> Optional[Session]:
|
|
# Contract: returns None on expiry/invalidity (middleware then
|
|
# triggers redirect-to-login since refresh_session can never succeed
|
|
# under V1); raises ProviderError if the IDP is unreachable.
|
|
try:
|
|
claims = self._verify_jwt(access_token)
|
|
except InvalidCodeError:
|
|
# Expired/invalid token — middleware contract is None, not raise.
|
|
return None
|
|
except ProviderError:
|
|
# JWKS unreachable, etc. Bubble up so middleware emits 503.
|
|
raise
|
|
# verify_session has no access to the original refresh_token; pass
|
|
# "" because in contract V1 there is none anyway.
|
|
return self._session_from_claims(access_token, "", claims)
|
|
|
|
def revoke_session(self, *, refresh_token: str) -> None:
|
|
# Portal exposes no public refresh-token revocation grant on its token
|
|
# endpoint (revocation is driven from the authenticated /sessions UI,
|
|
# keyed by sessionId + userId, not by the RT value). So logout is
|
|
# client-side cookie clearing; the server-side refresh session simply
|
|
# expires within its 24h TTL. Best-effort no-op, must not raise.
|
|
#
|
|
# If Portal later adds a token-endpoint revoke grant (e.g.
|
|
# grant_type=... + X-Refresh-Token), implement it here so logout
|
|
# invalidates the RT server-side immediately rather than waiting out
|
|
# the TTL.
|
|
_ = refresh_token
|
|
return None
|
|
|
|
# ---- internals --------------------------------------------------------
|
|
|
|
def _validate_redirect_uri(self, redirect_uri: str) -> None:
|
|
"""Surface obviously-broken redirect_uris before bouncing to Portal.
|
|
|
|
The Portal-side check (``agent-redirect-uri.ts``) is authoritative;
|
|
this is a fast-fail for the common operator-error case. We allow any
|
|
``http://`` host (not just localhost) so self-hosted dashboards reached
|
|
over plain HTTP — LAN IPs, internal hostnames, reverse proxies that
|
|
terminate TLS upstream — are not rejected here; Portal makes the final
|
|
call on which redirect_uris are permitted.
|
|
"""
|
|
parsed = urllib.parse.urlparse(redirect_uri)
|
|
if parsed.scheme not in ("https", "http"):
|
|
raise ProviderError(
|
|
f"redirect_uri must be http(s), got {redirect_uri!r}"
|
|
)
|
|
if not parsed.path or not parsed.path.endswith("/auth/callback"):
|
|
raise ProviderError(
|
|
"redirect_uri path must end with '/auth/callback', "
|
|
f"got {redirect_uri!r}"
|
|
)
|
|
|
|
def _parse_json_body(self, response: httpx.Response) -> Dict[str, Any]:
|
|
ctype = response.headers.get("content-type", "")
|
|
if not ctype.startswith("application/json"):
|
|
return {}
|
|
try:
|
|
body = response.json()
|
|
except ValueError:
|
|
return {}
|
|
return body if isinstance(body, dict) else {}
|
|
|
|
def _get_jwks_client(self) -> Any:
|
|
if self._jwks_client is None:
|
|
from jwt import PyJWKClient # lazy import
|
|
|
|
self._jwks_client = PyJWKClient(
|
|
self._jwks_url,
|
|
cache_keys=True,
|
|
lifespan=_JWKS_CACHE_SECONDS,
|
|
)
|
|
return self._jwks_client
|
|
|
|
def _verify_jwt(self, access_token: str) -> Dict[str, Any]:
|
|
# Lazy import — keeps startup fast for operators who never trigger
|
|
# the gated path.
|
|
import jwt
|
|
|
|
try:
|
|
signing_key = self._get_jwks_client().get_signing_key_from_jwt(
|
|
access_token
|
|
)
|
|
except jwt.PyJWKClientError as exc:
|
|
raise ProviderError(f"JWKS lookup failed: {exc}") from exc
|
|
except Exception as exc: # pragma: no cover - defensive
|
|
raise ProviderError(f"JWKS lookup failed: {exc!r}") from exc
|
|
|
|
try:
|
|
claims = jwt.decode(
|
|
access_token,
|
|
signing_key.key,
|
|
algorithms=["RS256"],
|
|
# Contract C2: aud is the bare client_id.
|
|
audience=self._client_id,
|
|
# Contract: issuer is the Portal base URL.
|
|
issuer=self._portal_url,
|
|
options={"require": ["exp", "iat", "aud", "iss", "sub"]},
|
|
)
|
|
except jwt.ExpiredSignatureError as exc:
|
|
# verify_session() catches this and returns None per protocol.
|
|
raise InvalidCodeError(f"access token expired: {exc}") from exc
|
|
except jwt.InvalidTokenError as exc:
|
|
# Surface the actual claim values that failed verification so
|
|
# operators don't have to dig into the JWT to debug config drift
|
|
# between HERMES_DASHBOARD_PORTAL_URL / HERMES_DASHBOARD_OAUTH_CLIENT_ID
|
|
# and what Portal is actually emitting. Decoding without verification
|
|
# is safe here: we've already failed to verify, and we never trust
|
|
# these values — they're surfaced for diagnostics only.
|
|
details = ""
|
|
try:
|
|
unverified = jwt.decode(
|
|
access_token,
|
|
options={"verify_signature": False, "verify_exp": False},
|
|
)
|
|
details = (
|
|
f" [token iss={unverified.get('iss')!r} "
|
|
f"aud={unverified.get('aud')!r}; "
|
|
f"expected iss={self._portal_url!r} "
|
|
f"aud={self._client_id!r}]"
|
|
)
|
|
except Exception:
|
|
pass
|
|
raise ProviderError(
|
|
f"access token verification failed: {exc}{details}"
|
|
) from exc
|
|
|
|
self._check_agent_instance_id(claims)
|
|
self._check_contract_version(claims)
|
|
return claims
|
|
|
|
def _check_agent_instance_id(self, claims: Dict[str, Any]) -> None:
|
|
"""Contract C9: cross-check agent_instance_id against our config."""
|
|
token_instance_id = claims.get("agent_instance_id")
|
|
if token_instance_id is None:
|
|
# Tolerated — the claim is documented as "should" not "must".
|
|
# Our audience check on the bare client_id already binds the
|
|
# token to this instance; agent_instance_id is defense-in-depth.
|
|
return
|
|
if token_instance_id != self._agent_instance_id:
|
|
raise ProviderError(
|
|
f"agent_instance_id mismatch: token={token_instance_id!r} "
|
|
f"vs configured={self._agent_instance_id!r}"
|
|
)
|
|
|
|
def _check_contract_version(self, claims: Dict[str, Any]) -> None:
|
|
"""Contract C11 — tolerant treatment per OQ-C2."""
|
|
contract_version = claims.get("oauth_contract_version")
|
|
if contract_version is None:
|
|
logger.warning(
|
|
"Nous Portal token missing oauth_contract_version claim "
|
|
"(contract says it should be %d); proceeding anyway.",
|
|
_EXPECTED_CONTRACT_VERSION,
|
|
)
|
|
return
|
|
if contract_version != _EXPECTED_CONTRACT_VERSION:
|
|
raise ProviderError(
|
|
f"unsupported oauth_contract_version={contract_version!r}, "
|
|
f"expected {_EXPECTED_CONTRACT_VERSION}"
|
|
)
|
|
|
|
def _session_from_claims(
|
|
self,
|
|
access_token: str,
|
|
refresh_token: str,
|
|
claims: Dict[str, Any],
|
|
) -> Session:
|
|
# Contract C4: no email / display_name in tokens. AuthWidget will
|
|
# show user_id (truncated). Session fields kept for forward-compat.
|
|
user_id = str(claims.get("sub", ""))
|
|
if not user_id:
|
|
raise ProviderError("token missing 'sub' (user_id) claim")
|
|
return Session(
|
|
user_id=user_id,
|
|
email="",
|
|
display_name="",
|
|
org_id=str(claims.get("org_id") or ""),
|
|
provider=self.name,
|
|
expires_at=int(claims["exp"]),
|
|
access_token=access_token,
|
|
refresh_token=refresh_token,
|
|
)
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
# Plugin entry point
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
def _load_config_oauth_section() -> dict:
|
|
"""Return the ``dashboard.oauth`` block from ``config.yaml`` if it
|
|
exists and is a dict; otherwise an empty dict.
|
|
|
|
Robust to (a) load_config() raising (malformed YAML, IO error,
|
|
config.yaml absent — common in fresh installs), (b) the
|
|
``dashboard`` key being absent or non-dict, and (c) the ``oauth``
|
|
sub-key being present but not a dict (user typo). Each shape falls
|
|
through to ``{}`` so register() can rely on `.get(...)` access.
|
|
"""
|
|
try:
|
|
from hermes_cli.config import cfg_get, load_config
|
|
|
|
cfg = load_config()
|
|
except Exception as exc: # noqa: BLE001 — broad catch is intentional
|
|
logger.debug(
|
|
"dashboard-auth-nous: load_config() raised %s; "
|
|
"falling back to env-only configuration",
|
|
exc,
|
|
)
|
|
return {}
|
|
section = cfg_get(cfg, "dashboard", "oauth", default=None)
|
|
return section if isinstance(section, dict) else {}
|
|
|
|
|
|
def _resolve_client_id() -> str:
|
|
"""Resolve the OAuth client_id with env-overrides-config precedence.
|
|
|
|
Order:
|
|
1. ``HERMES_DASHBOARD_OAUTH_CLIENT_ID`` env var (when non-empty
|
|
after strip — empty values are treated as unset so a
|
|
provisioned-but-not-populated Fly secret can't shadow a valid
|
|
config.yaml entry).
|
|
2. ``dashboard.oauth.client_id`` in ``config.yaml``.
|
|
3. Empty string — signals "no client_id configured" to the caller.
|
|
"""
|
|
env = os.environ.get("HERMES_DASHBOARD_OAUTH_CLIENT_ID", "").strip()
|
|
if env:
|
|
return env
|
|
cfg_value = _load_config_oauth_section().get("client_id", "")
|
|
return str(cfg_value).strip()
|
|
|
|
|
|
def _resolve_portal_url() -> str:
|
|
"""Resolve the Portal URL with env-overrides-config precedence.
|
|
|
|
Order:
|
|
1. ``HERMES_DASHBOARD_PORTAL_URL`` env var (non-empty after strip).
|
|
2. ``dashboard.oauth.portal_url`` in ``config.yaml``.
|
|
3. :data:`_DEFAULT_PORTAL_URL` (production Portal).
|
|
"""
|
|
env = os.environ.get("HERMES_DASHBOARD_PORTAL_URL", "").strip()
|
|
if env:
|
|
return env
|
|
cfg_value = str(
|
|
_load_config_oauth_section().get("portal_url", "")
|
|
).strip()
|
|
return cfg_value or _DEFAULT_PORTAL_URL
|
|
|
|
|
|
def register(ctx) -> None:
|
|
"""Plugin entry — called by the plugin loader at startup.
|
|
|
|
Registers ``NousDashboardAuthProvider`` only when a client_id is
|
|
configured (either via ``HERMES_DASHBOARD_OAUTH_CLIENT_ID`` env var
|
|
or via ``dashboard.oauth.client_id`` in ``config.yaml``). The env
|
|
var wins when set non-empty — Fly.io's platform-secret injection
|
|
pushes the per-deploy value through this path.
|
|
|
|
When skipping, writes a short human-readable reason to the module-
|
|
level :data:`LAST_SKIP_REASON` so the dashboard's fail-closed branch
|
|
can surface "Set HERMES_DASHBOARD_OAUTH_CLIENT_ID …" instead of the
|
|
bare "no providers registered" the gate would otherwise emit. The
|
|
reason mentions BOTH configuration surfaces so operators don't
|
|
guess wrong about which one to populate.
|
|
|
|
Operator-owned dashboards (loopback / ``--insecure``) leave both
|
|
surfaces unset, so this plugin is a no-op for them. The gate-
|
|
engagement layer (``hermes_cli.web_server.should_require_auth`` +
|
|
the fail-closed check in ``start_server``) handles the "public bind
|
|
with zero providers" case independently.
|
|
"""
|
|
global LAST_SKIP_REASON
|
|
LAST_SKIP_REASON = ""
|
|
|
|
client_id = _resolve_client_id()
|
|
portal_url = _resolve_portal_url()
|
|
|
|
if not client_id:
|
|
LAST_SKIP_REASON = (
|
|
"HERMES_DASHBOARD_OAUTH_CLIENT_ID is not set (and "
|
|
"dashboard.oauth.client_id in config.yaml is empty). The "
|
|
"Nous Portal provisions this env var (shape "
|
|
"'agent:{instance_id}') when it deploys a Hermes Agent "
|
|
"instance — set it to your provisioned client id (either "
|
|
"as an env var or under dashboard.oauth.client_id in "
|
|
"config.yaml), or pass --insecure to skip the OAuth gate "
|
|
"entirely."
|
|
)
|
|
logger.debug("dashboard-auth-nous: %s", LAST_SKIP_REASON)
|
|
return
|
|
|
|
if not client_id.startswith("agent:"):
|
|
LAST_SKIP_REASON = (
|
|
f"HERMES_DASHBOARD_OAUTH_CLIENT_ID={client_id!r} doesn't match "
|
|
f"the contract shape 'agent:{{instance_id}}'. The Nous Portal "
|
|
f"provisions this value at deploy time; check your Fly app's "
|
|
f"secrets or override with the value from the Portal admin UI."
|
|
)
|
|
logger.warning("dashboard-auth-nous: %s", LAST_SKIP_REASON)
|
|
return
|
|
|
|
try:
|
|
provider = NousDashboardAuthProvider(
|
|
client_id=client_id, portal_url=portal_url
|
|
)
|
|
except ValueError as exc:
|
|
LAST_SKIP_REASON = f"NousDashboardAuthProvider construction failed: {exc}"
|
|
logger.warning("dashboard-auth-nous: %s", LAST_SKIP_REASON)
|
|
return
|
|
|
|
ctx.register_dashboard_auth_provider(provider)
|
|
logger.info(
|
|
"dashboard-auth-nous: registered provider (client_id=%s, portal=%s)",
|
|
client_id,
|
|
portal_url,
|
|
)
|