fix(docker): seed gateway_state.json from HERMES_GATEWAY_BOOTSTRAP_STATE on first boot (#37896)

On a fresh volume there is no gateway_state.json, so the boot reconciler
(cont-init.d/02-reconcile-profiles) registers the gateway-default s6 slot
but leaves it down: it only auto-starts when the last recorded state was
"running". A freshly-provisioned container therefore comes up with the
gateway down until something starts it (e.g. the dashboard's start button).

Add a generic, first-boot-only env-seed in stage2-hook.sh (which runs
before 02-reconcile-profiles): when HERMES_GATEWAY_BOOTSTRAP_STATE=running
and no gateway_state.json exists yet, seed {"gateway_state":"running"} so
the reconciler brings the supervised slot up on the very first boot.

This mirrors the existing HERMES_AUTH_JSON_BOOTSTRAP pattern: it seeds the
same state file the reconciler already consults, guarded by [ ! -f ] so
persisted runtime state always wins on later boots (a deliberately-stopped
gateway stays stopped across restarts). Only the literal "running" is
honoured (the sole value in the reconciler's _AUTOSTART_STATES).

Generic container contract with no host-specific code. Useful to any
orchestrator that provisions a blank volume and wants the gateway up from
first boot (the supervised gateway/dashboard already work on such hosts;
only the first-boot autostart was missing because the CLI lifecycle
commands can't drive the s6 layer when container self-detection misses).

Adds a shell-level contract test and documents the env var.
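
The contract described above can be exercised outside the container. The following is a minimal sketch, not the test shipped in this commit: it wraps the seed logic in a hypothetical function, seed_gateway_state, that takes the home directory as an argument, and it omits the chown to hermes:hermes because that user may not exist on the machine running the sketch. The "stopped" payload in the second case is only a placeholder for whatever state a previous boot persisted.

```shell
#!/bin/sh
# Hypothetical wrapper around the seed logic; the real hook runs it inline
# against $HERMES_HOME and additionally chowns the file to hermes:hermes.
seed_gateway_state() {
    home="$1"
    if [ ! -f "$home/gateway_state.json" ] && \
       [ "${HERMES_GATEWAY_BOOTSTRAP_STATE:-}" = "running" ]; then
        printf '{"gateway_state":"running"}\n' > "$home/gateway_state.json"
        chmod 644 "$home/gateway_state.json"
    fi
}

tmp="$(mktemp -d)"
mkdir "$tmp/fresh" "$tmp/existing" "$tmp/typo"

# Case 1: blank volume and the literal "running" => the file is seeded.
HERMES_GATEWAY_BOOTSTRAP_STATE=running
seed_gateway_state "$tmp/fresh"
[ "$(cat "$tmp/fresh/gateway_state.json")" = '{"gateway_state":"running"}' ] \
    && echo "case 1: seeded"

# Case 2: a state file already exists => persisted state wins, untouched.
# The "stopped" value is a placeholder for any previously persisted state.
printf '{"gateway_state":"stopped"}\n' > "$tmp/existing/gateway_state.json"
seed_gateway_state "$tmp/existing"
[ "$(cat "$tmp/existing/gateway_state.json")" = '{"gateway_state":"stopped"}' ] \
    && echo "case 2: existing state preserved"

# Case 3: any value other than the literal "running" => nothing is written.
HERMES_GATEWAY_BOOTSTRAP_STATE=runing
seed_gateway_state "$tmp/typo"
[ ! -e "$tmp/typo/gateway_state.json" ] && echo "case 3: typo ignored"

rm -rf "$tmp"
```

The three cases correspond to the three guarantees in the message: seed on a blank volume, never overwrite persisted state, and ignore anything other than the literal "running".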
@@ -278,6 +278,38 @@ if [ ! -f "$HERMES_HOME/auth.json" ] && [ -n "${HERMES_AUTH_JSON_BOOTSTRAP:-}" ]
     chmod 600 "$HERMES_HOME/auth.json"
 fi
 
+# gateway_state.json: declare the gateway's INITIAL supervised state on a
+# fresh volume. Same first-boot-only env-seed pattern as auth.json above.
+#
+# On a blank volume there is no gateway_state.json, so the boot reconciler
+# (cont-init.d/02-reconcile-profiles → container_boot.reconcile_profile_gateways)
+# registers the gateway-default s6 slot but leaves it DOWN — it only
+# auto-starts when the last recorded state was "running". That means a
+# freshly-provisioned container comes up with the gateway down until
+# someone starts it (e.g. from the dashboard). An orchestrator that
+# provisions a fresh volume and wants the gateway running from first boot
+# can set HERMES_GATEWAY_BOOTSTRAP_STATE=running; we seed the state file
+# here, BEFORE 02-reconcile-profiles runs (cont-init.d scripts run in
+# lexicographic order), so the reconciler sees prior_state=running and
+# brings the supervised slot up on the very first boot.
+#
+# This is a generic container contract, not specific to any host: it seeds
+# the SAME gateway_state.json the reconciler already consults, exactly as
+# HERMES_AUTH_JSON_BOOTSTRAP seeds auth.json. The [ ! -f ] guard is the
+# load-bearing part — on every subsequent boot the persisted state wins,
+# so a gateway the operator deliberately stopped stays stopped across
+# restarts and we never clobber real runtime state.
+#
+# Only a literal "running" is honoured (the sole value in the reconciler's
+# _AUTOSTART_STATES); any other value is ignored so a typo can't write a
+# bogus state the reconciler would treat as "no prior state" anyway.
+if [ ! -f "$HERMES_HOME/gateway_state.json" ] && \
+   [ "${HERMES_GATEWAY_BOOTSTRAP_STATE:-}" = "running" ]; then
+    printf '{"gateway_state":"running"}\n' > "$HERMES_HOME/gateway_state.json"
+    chown hermes:hermes "$HERMES_HOME/gateway_state.json" 2>/dev/null || true
+    chmod 644 "$HERMES_HOME/gateway_state.json"
+fi
+
 # --- Sync bundled skills ---
 # Invoke the venv's python by absolute path so we don't need a `sh -c`
 # wrapper to source the activate script. This is safe because