Resolves the explicit "Known follow-up" left by commit 2f8ceeab9 and the resulting CI failures in tests/docker/test_dashboard.py and tests/docker/test_s6_profile_gateway_integration.py. The product gap --------------- Every hermes runtime operation inside the container runs as the hermes user (UID 10000) via s6-setuidgid. But s6-supervise — spawned by s6-svscan running as PID 1 — creates each service's supervise/ and top-level event/ directories with mode 0700 owned by its effective UID (root). That left every s6-svc / s6-svstat / s6-svwait call from hermes hitting EACCES on the supervise/control FIFO and supervise/status — i.e. the entire S6ServiceManager lifecycle (register, start, stop, unregister) was inert in production. The 2f8ceeab9 commit message called this out and deferred the fix. The audit changes that landed alongside it (defaulting docker_exec to -u hermes) made the integration tests reproduce the bug deterministically; the fix below resolves it. The fix: pre-create the supervise/ skeleton hermes-owned ---------------------------------------------------------- Reading s6's source (src/supervision/s6-supervise.c::trymkdir + control_init), the mkdir and mkfifo calls that build the supervise tree are EEXIST-safe: if the directory or FIFO is already present, s6-supervise reuses it and skips the chown/chmod fix-up that would normally make event/ 03730 root:root. So if we lay the skeleton down with hermes ownership before triggering s6-svscanctl -a, s6-supervise inherits our layout and never touches it. The death_tally / lock / status regular files written later by s6-supervise (still as root) land mode 0644 — world-readable — which is all s6-svstat needs. New module-level helper _seed_supervise_skeleton(svc_dir) in hermes_cli/service_manager.py lays down: svc_dir/event/ hermes:hermes 03730 svc_dir/supervise/ hermes:hermes 0755 svc_dir/supervise/event/ hermes:hermes 03730 svc_dir/supervise/control hermes:hermes 0660 (FIFO) svc_dir/log/event/ hermes:hermes 03730 (if log/ present) svc_dir/log/supervise/ hermes:hermes 0755 svc_dir/log/supervise/event/ hermes:hermes 03730 svc_dir/log/supervise/control hermes:hermes 0660 (FIFO) The log/ branch matters because the logger is a second s6-supervise instance — without it, unregister rmtree races on the logger's root-owned supervise dir even after the parent slot's supervise/ is hermes-owned. The helper is idempotent and swallows PermissionError on chown so it works equally well when called from root (cont-init.d) or hermes (runtime register). Wiring ------ 1. S6ServiceManager.register_profile_gateway calls _seed_supervise_skeleton(tmp_dir) just before publishing the slot via Path.replace. Runtime-registered profile gateways are set up by hermes. 2. container_boot._register_service does the same in the cont-init.d reconciliation path so boot-time-restored profile slots inherit the same layout. 3. New cont-init.d/015-supervise-perms script chowns the supervise/ and event/ trees for STATIC s6-rc services (dashboard, main-hermes). These are spawned by s6-rc before cont-init.d gets to run, so the EEXIST-trick doesn't apply; we chown the already-existing tree instead. s6-supervise keeps using the same files; it never re-asserts ownership on a running service. The script skips s6-overlay internal services (s6rc-*, s6-linux-*) so the supervision tree itself stays root-only. 015- slot is intentional: lex-sorts between 01-hermes-setup and 02-reconcile-profiles in the container's C-locale, so the chown finishes before the reconciler walks the scandir. Unregister teardown reordering ------------------------------ S6ServiceManager.unregister_profile_gateway now fires s6-svscanctl -an BEFORE rmtree (with a 200ms grace), so s6-svscan reaps the supervise child and releases its file handles on supervise/lock + supervise/status before we try to remove the directory. Previously rmtree raced s6-supervise on a set of files inside the supervise dir, and even with the parent supervise/ now hermes-owned, the contained files (death_tally, lock, status, written by root) could still be in use. Dashboard down-state redesign ----------------------------- The original PR #30136 review fix wrote a 'down' marker file into /run/service/dashboard/ via cont-init.d/03-dashboard-toggle. That approach was broken in two ways: (a) /run/service/dashboard is a symlink to a TRANSIENT /run/s6-rc:s6-rc-init:<tmpdir>/ directory while s6-rc is mid-transaction; the touch landed in a soon-to-be-discarded tmp. (b) Even when written to the final /run/s6-rc/servicedirs/ location, the 'down' file is only consulted by s6-supervise at slot startup. s6-rc's user-bundle explicitly transitions 'dashboard' to 'up' on every boot, overriding any down marker. The right fix is the canonical s6 pattern: when HERMES_DASHBOARD is unset, the dashboard run script exits 0 and a companion finish script exits 125. Per s6-supervise(8), exit code 125 from the finish script is the 'permanent failure, do not restart' marker — equivalent to s6-svc -O. The slot reports as 'down' to s6-svstat, matching the reality that no dashboard process is running. When HERMES_DASHBOARD IS truthy, finish exits 0 and restart-on-crash semantics apply. 03-dashboard-toggle is removed (its function is now subsumed by the run/finish pair). Tests ----- Adds four unit tests for _seed_supervise_skeleton covering the produced layout, the log/ subservice case, the skip-when-no-log case, and idempotency. The live-container verification continues to live in tests/docker/test_s6_profile_gateway_integration.py and tests/docker/test_dashboard.py — both now pass against the rebuilt image. References ---------- * Skarnet skaware mailing list 2020-02-02 (Laurent Bercot + Guillermo Diaz Hartusch) on unprivileged s6 tool semantics: http://skarnet.org/lists/skaware/1424.html * just-containers/s6-overlay#130 — same EEXIST-preseed pattern, community-validated 2016 onward * https://skarnet.org/software/s6/servicedir.html — exit-code 125 semantics in finish scripts (cherry picked from commit c41f908ad46043728d884f4b1929435636cf1bcb)
326 lines
13 KiB
Python
326 lines
13 KiB
Python
"""Container-boot reconciliation of per-profile gateway s6 services.
|
|
|
|
Service directories under /run/service/ live on **tmpfs** and are wiped
|
|
on every container restart. Profile directories under
|
|
``$HERMES_HOME/profiles/<name>/`` live on the persistent VOLUME, and
|
|
each one records its gateway's last state in ``gateway_state.json``.
|
|
This module bridges the two: on every container boot, walk the
|
|
persistent profiles, recreate the s6 service slots, and auto-start
|
|
only those whose last recorded state was ``running``.
|
|
|
|
Wired into the image as /etc/cont-init.d/02-reconcile-profiles by the
|
|
Dockerfile (Phase 4 Task 4.0). Runs as root after 01-hermes-setup
|
|
(the stage2 hook) has chowned the volume and seeded $HERMES_HOME, but
|
|
before s6-rc starts user services.
|
|
|
|
Without this module, every ``docker restart`` would silently wipe
|
|
every per-profile gateway, even though the user's profiles still
|
|
exist on disk.
|
|
"""
|
|
from __future__ import annotations
|
|
|
|
import json
|
|
import logging
|
|
import os
|
|
from dataclasses import dataclass
|
|
from pathlib import Path
|
|
from typing import Literal
|
|
|
|
log = logging.getLogger(__name__)
|
|
|
|
# Only this prior state triggers automatic restart. Everything else
|
|
# (startup_failed, starting, stopped, missing) registers the slot in
|
|
# the down state and waits for explicit user action — this avoids the
|
|
# crash-loop where a broken gateway keeps being restarted across
|
|
# `docker restart` cycles.
|
|
_AUTOSTART_STATES = frozenset({"running"})
|
|
|
|
# Stale runtime files we sweep before recreating service slots. These
|
|
# all hold container-namespaced state (PIDs, process tables) that's
|
|
# garbage post-restart — a numerically-equal PID in the new container
|
|
# is a different process. See the Risk Register in the plan.
|
|
_STALE_RUNTIME_FILES = ("gateway.pid", "processes.json")
|
|
|
|
ReconcileActionLabel = Literal["started", "registered", "skipped"]
|
|
|
|
|
|
@dataclass(frozen=True)
|
|
class ReconcileAction:
|
|
"""One profile's outcome from a single reconciliation pass."""
|
|
profile: str
|
|
prior_state: str | None
|
|
action: ReconcileActionLabel
|
|
|
|
|
|
def reconcile_profile_gateways(
|
|
*,
|
|
hermes_home: Path,
|
|
scandir: Path,
|
|
dry_run: bool = False,
|
|
) -> list[ReconcileAction]:
|
|
"""Recreate s6 service registrations for every persistent profile.
|
|
|
|
Always registers a ``gateway-default`` slot for the root profile
|
|
(the implicit profile that lives at the top of ``$HERMES_HOME``,
|
|
not under ``profiles/``). The dispatcher in ``hermes_cli.gateway``
|
|
maps an empty profile suffix to ``gateway-default``, so this slot
|
|
is what ``hermes gateway start`` (no ``-p``) targets. Without it,
|
|
bare ``hermes gateway start`` inside the container would land on
|
|
``s6-svc -u /run/service/gateway-default`` → uncaught
|
|
``CalledProcessError`` → traceback to the user (PR #30136 review).
|
|
|
|
The default slot's prior state is read from
|
|
``$HERMES_HOME/gateway_state.json`` (sibling to the profile root,
|
|
not under ``profiles/``); stale runtime files there are swept the
|
|
same way as for named profiles.
|
|
|
|
Args:
|
|
hermes_home: The container's HERMES_HOME (typically /opt/data).
|
|
Profiles live under ``<hermes_home>/profiles/<name>/``;
|
|
the default profile lives at ``<hermes_home>`` itself.
|
|
scandir: The s6 dynamic scandir (typically /run/service). Service
|
|
directories are created at ``<scandir>/gateway-<profile>/``.
|
|
dry_run: When True, walk and return the action list without
|
|
touching the filesystem. For tests and `--dry-run` debug.
|
|
|
|
Returns:
|
|
One :class:`ReconcileAction` per profile, in this order:
|
|
``default`` first, then named profiles in directory order.
|
|
"""
|
|
actions: list[ReconcileAction] = []
|
|
|
|
# Default profile — always register, even if nothing has ever
|
|
# populated the root profile dir. The slot exists so
|
|
# ``hermes gateway start`` (no ``-p``) has somewhere to land;
|
|
# auto-up only when the prior state was "running" (same rule as
|
|
# named profiles).
|
|
default_prior_state = _read_prior_state(hermes_home)
|
|
default_should_start = default_prior_state in _AUTOSTART_STATES
|
|
if not dry_run:
|
|
_cleanup_stale_runtime_files(hermes_home)
|
|
_register_service(scandir, "default", start=default_should_start)
|
|
actions.append(ReconcileAction(
|
|
profile="default",
|
|
prior_state=default_prior_state,
|
|
action="started" if default_should_start else "registered",
|
|
))
|
|
|
|
profiles_root = hermes_home / "profiles"
|
|
if profiles_root.is_dir():
|
|
for entry in sorted(profiles_root.iterdir()):
|
|
if not entry.is_dir():
|
|
continue
|
|
# SOUL.md is always seeded by `hermes profile create` (config.yaml
|
|
# is not — that comes later via `hermes setup`). Use it as the
|
|
# "real profile" marker so stray dirs (backups, manual mkdir)
|
|
# aren't picked up.
|
|
if not (entry / "SOUL.md").exists():
|
|
continue
|
|
# The "default" service name is reserved for the root
|
|
# profile (above) — if a user has somehow created a
|
|
# ``profiles/default/`` directory, skip it to avoid the
|
|
# slot collision. Their gateway would still be reachable
|
|
# via ``hermes -p default-named gateway start`` if they
|
|
# rename the directory; we don't try to disambiguate here.
|
|
if entry.name == "default":
|
|
log.warning(
|
|
"profiles/default/ exists — skipping to avoid colliding "
|
|
"with the reserved root-profile s6 slot",
|
|
)
|
|
continue
|
|
|
|
prior_state = _read_prior_state(entry)
|
|
should_start = prior_state in _AUTOSTART_STATES
|
|
|
|
if not dry_run:
|
|
_cleanup_stale_runtime_files(entry)
|
|
_register_service(scandir, entry.name, start=should_start)
|
|
|
|
actions.append(ReconcileAction(
|
|
profile=entry.name,
|
|
prior_state=prior_state,
|
|
action="started" if should_start else "registered",
|
|
))
|
|
|
|
if not dry_run:
|
|
_write_reconcile_log(hermes_home, actions)
|
|
return actions
|
|
|
|
|
|
def _read_prior_state(profile_dir: Path) -> str | None:
|
|
"""Read gateway_state.json's ``gateway_state`` field, or None if
|
|
missing or unparseable. Unparseable counts as "no prior state" so
|
|
we don't bork the whole reconciliation on a corrupt file."""
|
|
state_file = profile_dir / "gateway_state.json"
|
|
if not state_file.exists():
|
|
return None
|
|
try:
|
|
return json.loads(state_file.read_text()).get("gateway_state")
|
|
except (OSError, json.JSONDecodeError):
|
|
log.warning(
|
|
"could not read %s; treating as no prior state", state_file,
|
|
)
|
|
return None
|
|
|
|
|
|
def _cleanup_stale_runtime_files(profile_dir: Path) -> None:
|
|
"""Remove gateway.pid and processes.json — they reference PIDs in
|
|
the dead container's process namespace and would otherwise confuse
|
|
the newly-started gateway's process-mismatch checks."""
|
|
for name in _STALE_RUNTIME_FILES:
|
|
(profile_dir / name).unlink(missing_ok=True)
|
|
|
|
|
|
def _register_service(scandir: Path, profile: str, *, start: bool) -> None:
|
|
"""Recreate the s6 service slot for one profile.
|
|
|
|
Mirrors the rendering in :func:`S6ServiceManager.register_profile_gateway`,
|
|
but here we control the start state directly via the ``down`` marker
|
|
file (s6-svscan honors it on rescan). Cannot use the manager
|
|
directly because the cont-init.d phase runs as root before
|
|
s6-svscan starts scanning the dynamic scandir — the manager's
|
|
``s6-svscanctl -a`` call would fail with no control socket.
|
|
|
|
Atomicity: build the new layout in a sibling temp directory and
|
|
rename it into place via :meth:`Path.replace`. This matches
|
|
:meth:`S6ServiceManager.register_profile_gateway` (PR #30136
|
|
review item O4) — even though cont-init.d runs before s6-svscan
|
|
starts scanning, an atomic publication keeps the contract uniform
|
|
between the two registration paths and protects against a
|
|
half-populated dir if the script is interrupted mid-write.
|
|
"""
|
|
import shutil
|
|
|
|
from hermes_cli.service_manager import (
|
|
S6ServiceManager,
|
|
_seed_supervise_skeleton,
|
|
validate_profile_name,
|
|
)
|
|
|
|
validate_profile_name(profile)
|
|
service_dir = scandir / f"gateway-{profile}"
|
|
tmp_dir = service_dir.with_name(service_dir.name + ".tmp")
|
|
|
|
# Wipe any leftover tmp from a previous interrupted run.
|
|
if tmp_dir.exists():
|
|
shutil.rmtree(tmp_dir, ignore_errors=True)
|
|
tmp_dir.mkdir(parents=True)
|
|
|
|
try:
|
|
(tmp_dir / "type").write_text("longrun\n")
|
|
|
|
# Reuse the manager's run-script rendering — single source of
|
|
# truth so register_profile_gateway and reconcile_profile_gateways
|
|
# stay consistent. extra_env is empty here; users who need
|
|
# per-profile env can set it via the profile's config.yaml
|
|
# (which the gateway itself loads).
|
|
run = tmp_dir / "run"
|
|
run.write_text(S6ServiceManager._render_run_script(profile, extra_env={}))
|
|
run.chmod(0o755)
|
|
|
|
# Persistent log rotation (OQ8-C).
|
|
log_subdir = tmp_dir / "log"
|
|
log_subdir.mkdir()
|
|
log_run = log_subdir / "run"
|
|
log_run.write_text(S6ServiceManager._render_log_run(profile))
|
|
log_run.chmod(0o755)
|
|
|
|
# The presence of a `down` file tells s6-supervise to NOT
|
|
# start the service when s6-svscan picks it up. User brings
|
|
# it up explicitly with `hermes -p <profile> gateway start`
|
|
# (which routes through the Phase 4
|
|
# _dispatch_via_service_manager_if_s6 helper to `s6-svc -u`).
|
|
if not start:
|
|
(tmp_dir / "down").touch()
|
|
|
|
# Pre-create the supervise/ skeleton with hermes ownership
|
|
# BEFORE we publish the slot. Mirrors the same pre-creation
|
|
# step in S6ServiceManager.register_profile_gateway — when
|
|
# s6-svscan picks the published slot up, the s6-supervise it
|
|
# spawns will EEXIST our dirs/FIFOs and inherit hermes
|
|
# ownership, so runtime s6-svc / s6-svstat / s6-svwait calls
|
|
# (all dispatched as the hermes user) won't hit EACCES. See
|
|
# ``_seed_supervise_skeleton`` in service_manager.py for the
|
|
# full rationale.
|
|
_seed_supervise_skeleton(tmp_dir)
|
|
|
|
# Publish atomically. Path.replace handles the existing-target
|
|
# case the same way os.rename does on POSIX: the target is
|
|
# silently replaced, so a previous reconcile pass's slot is
|
|
# cleanly overwritten in one operation.
|
|
if service_dir.exists():
|
|
shutil.rmtree(service_dir)
|
|
tmp_dir.replace(service_dir)
|
|
except Exception:
|
|
shutil.rmtree(tmp_dir, ignore_errors=True)
|
|
raise
|
|
|
|
|
|
def _write_reconcile_log(
|
|
hermes_home: Path, actions: list[ReconcileAction],
|
|
) -> None:
|
|
"""Append one line per profile to $HERMES_HOME/logs/container-boot.log.
|
|
|
|
Operators inspect this to debug "why didn't my profile come back
|
|
up". Keeping a separate log file (vs. mixing into agent.log) lets
|
|
troubleshooters grep for "profile=foo" without wading through
|
|
unrelated activity.
|
|
|
|
Size-bounded: when the file exceeds ``_LOG_ROTATE_BYTES``
|
|
(defaults to 256 KiB ≈ 3000 reconcile lines), the current file
|
|
is renamed to ``container-boot.log.1`` (replacing any previous
|
|
rotation) before the new entries are appended. This gives long-
|
|
lived containers a soft cap of ~512 KiB across the two files
|
|
without pulling in logrotate or s6-log machinery just for this
|
|
one append-only file (PR #30136 review item O3).
|
|
"""
|
|
import time
|
|
log_dir = hermes_home / "logs"
|
|
log_dir.mkdir(parents=True, exist_ok=True)
|
|
log_path = log_dir / "container-boot.log"
|
|
|
|
# Rotate before opening to append, so the new entries always land
|
|
# in a fresh file when we crossed the threshold last time.
|
|
try:
|
|
if log_path.exists() and log_path.stat().st_size >= _LOG_ROTATE_BYTES:
|
|
log_path.replace(log_dir / "container-boot.log.1")
|
|
except OSError as exc:
|
|
# Rotation failure is non-fatal — keep appending to the
|
|
# existing file rather than losing the entry entirely.
|
|
log.warning("could not rotate %s: %s", log_path, exc)
|
|
|
|
ts = time.strftime("%Y-%m-%dT%H:%M:%S%z")
|
|
with log_path.open("a", encoding="utf-8") as f:
|
|
for a in actions:
|
|
f.write(
|
|
f"{ts} profile={a.profile} prior_state={a.prior_state} "
|
|
f"action={a.action}\n"
|
|
)
|
|
|
|
|
|
# 256 KiB soft cap on container-boot.log; rotated to .1 when crossed.
|
|
# At ~80 B per reconcile-action line this is ~3000 lines, or about a
|
|
# year of daily reboots on a 5-profile container. Two files = ~512 KiB
|
|
# worst case. Tuned for visibility (small enough to grep / cat without
|
|
# scrolling forever) more than space (the persistent volume has GB).
|
|
_LOG_ROTATE_BYTES = 256 * 1024
|
|
|
|
|
|
def main() -> int:
|
|
"""Entry point invoked from /etc/cont-init.d/02-reconcile-profiles."""
|
|
hermes_home = Path(os.environ.get("HERMES_HOME", "/opt/data"))
|
|
scandir = Path(os.environ.get("S6_PROFILE_GATEWAY_SCANDIR", "/run/service"))
|
|
actions = reconcile_profile_gateways(
|
|
hermes_home=hermes_home, scandir=scandir,
|
|
)
|
|
for a in actions:
|
|
print(
|
|
f"reconcile: profile={a.profile} "
|
|
f"prior_state={a.prior_state} action={a.action}"
|
|
)
|
|
return 0
|
|
|
|
|
|
if __name__ == "__main__":
|
|
raise SystemExit(main())
|