fix(dashboard-auth): don't abort verify chain on one provider's ProviderError
The gated dashboard verifies a session cookie by trying each registered DashboardAuthProvider's verify_session in turn (the session cookie stores only the access token, not which provider issued it). A provider that doesn't recognise a token returns None; a provider whose IDP/JWKS is unreachable raises ProviderError.

The loop used to return HTTP 503 on the FIRST ProviderError, before any later provider got a turn. With multiple providers stacked, an unreachable IDP for a session you didn't even use therefore blocks login through a different, reachable provider.

Concrete repro: a self-hosted-OIDC session hits the 'nous' provider first (registered earlier). nous tries to reach Nous Portal's JWKS, which is unreachable in a self-hosted deployment, so it raises, and the gate 503s before the 'self-hosted' provider can verify the token. Hit this live while testing the new self-hosted OIDC plugin against a local Keycloak.

Fix: a ProviderError from one provider is logged and the loop continues to the next. A 503 is returned only if NO provider verified the token AND at least one was unreachable. This distinguishes a transient IDP outage (don't force a needless re-login) from a token that's genuinely invalid (fall through to refresh/relogin). Single-provider behaviour is unchanged.

Tests: adds an _UnreachableProvider stub and three cases: an unreachable provider first must not block a working second; all-unreachable still 503s; reachable-but-unrecognised falls through to 401/relogin (not 503). Mutation-tested: reverting the fix makes the first case fail with the exact 503 bug.
@@ -207,6 +207,22 @@ async def gated_auth_middleware(
     # good refresh token — defeating the whole transparent-refresh feature.
     session = None
     if at:
+        # Try every registered provider's verify_session in turn. A provider
+        # that doesn't recognise the token returns None and we move on; the
+        # first provider that returns a Session wins.
+        #
+        # A provider may instead raise ProviderError (its IDP/JWKS is
+        # unreachable, so it can neither confirm nor deny the token). With
+        # multiple providers stacked, that MUST NOT abort the chain — the
+        # token may belong to a *different*, reachable provider. (Concretely:
+        # a self-hosted-OIDC session hits the `nous` provider first, which
+        # tries to reach Nous Portal's JWKS; if that's unreachable it raises,
+        # but the `self-hosted` provider can still verify the token.) So we
+        # remember the unreachable error and keep going. Only if NO provider
+        # verifies the token AND at least one was unreachable do we surface a
+        # 503 — distinguishing "transient IDP outage" (don't force re-login)
+        # from "token genuinely invalid" (fall through to refresh/relogin).
+        unreachable_provider: str | None = None
         for provider in list_providers():
             try:
                 session = provider.verify_session(access_token=at)
@@ -221,12 +237,19 @@ async def gated_auth_middleware(
                     reason="provider_unreachable",
                     ip=_client_ip(request),
                 )
-                return JSONResponse(
-                    {"detail": f"Auth provider {provider.name!r} unreachable"},
-                    status_code=503,
-                )
+                if unreachable_provider is None:
+                    unreachable_provider = provider.name
+                continue
             if session is not None:
                 break
+        if session is None and unreachable_provider is not None:
+            # No provider could verify the token and at least one couldn't be
+            # reached — treat as a transient outage rather than forcing a
+            # re-login through a (possibly also-unreachable) refresh.
+            return JSONResponse(
+                {"detail": f"Auth provider {unreachable_provider!r} unreachable"},
+                status_code=503,
+            )
 
     if session is None:
         # Access token is expired/invalid. Before forcing re-login, try to
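The mutation-test claim can be illustrated by modelling the removed `return JSONResponse(..., status_code=503)` and its replacement as two small loops. This is a minimal sketch under stated assumptions: providers are hypothetical `(name, verify)` pairs rather than DashboardAuthProvider objects, 200 stands in for letting the request through, and 401 stands in for the refresh/relogin path. `old_status`, `new_status`, `nous` and `self_hosted` are illustrative names, not code from the repository.

```python
from typing import Callable, Optional, Sequence, Tuple


class ProviderError(Exception):
    """Stand-in for 'this provider's IDP/JWKS is unreachable'."""


# Hypothetical model: a provider is a (name, verify) pair where verify maps an
# access token to a subject, or None if it doesn't recognise the token.
Verify = Callable[[str], Optional[str]]
Chain = Sequence[Tuple[str, Verify]]


def old_status(providers: Chain, at: str) -> int:
    """Pre-fix loop: the first ProviderError aborts the whole chain."""
    for _name, verify in providers:
        try:
            if verify(at) is not None:
                return 200  # stands in for "request passes through"
        except ProviderError:
            return 503  # later providers never get a turn
    return 401  # stands in for the refresh/relogin path


def new_status(providers: Chain, at: str) -> int:
    """Post-fix loop: remember the first unreachable provider and keep going."""
    unreachable: Optional[str] = None
    for name, verify in providers:
        try:
            if verify(at) is not None:
                return 200
        except ProviderError:
            if unreachable is None:
                unreachable = name
    # 503 only if nobody verified the token and someone was unreachable.
    return 503 if unreachable is not None else 401


def nous(token: str) -> Optional[str]:
    raise ProviderError("Nous Portal JWKS unreachable")


def self_hosted(token: str) -> Optional[str]:
    return "alice" if token == "kc-token" else None


chain = [("nous", nous), ("self-hosted", self_hosted)]
print(old_status(chain, "kc-token"))  # 503: the bug from the repro
print(new_status(chain, "kc-token"))  # 200: self-hosted verifies the token
```

With a single provider, both functions return the same status for every input (verified, unrecognised, or unreachable), which is the "single-provider behaviour is unchanged" claim in miniature.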