refactor: consolidate symlink-safe atomic replace into shared helper

Extract the islink/realpath guard from the #16743 fix into a single
atomic_replace() helper in utils.py, then migrate every os.replace()
call site in the codebase to use it.

PR #16777 correctly identified and fixed the bug, but patched only 9
of ~24 call sites. The same bug class (managed deployments that
symlink state files silently lose the link on every write) still
existed for auth.json, the sessions file, gateway config, env_loader,
webhook subscriptions, the debug store, the model catalog, pairing,
Google OAuth, the nous rate guard, and more.

Rather than add another 10+ copies of the same three-line guard,
consolidate into atomic_replace(tmp, target) which:
- resolves symlinks via os.path.realpath before os.replace
- returns the resolved real path so callers can re-apply permissions
- is a drop-in replacement for os.replace at the use sites
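
A minimal sketch of the behavior described above, assuming a POSIX
filesystem where symlinks can be created. The helper body mirrors the
one added in this commit; demo(), _write(), _read() and the file
names are hypothetical and exist only for illustration.

```python
import os
import tempfile


def atomic_replace(tmp_path, target):
    """Same guard as the helper in this commit: resolve a symlinked target first."""
    target_str = str(target)
    real_path = os.path.realpath(target_str) if os.path.islink(target_str) else target_str
    os.replace(str(tmp_path), real_path)
    return real_path


def _write(path, text):
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def _read(path):
    with open(path, encoding="utf-8") as f:
        return f.read()


def demo():
    """Return (guarded_keeps_link, real_file_updated, plain_replace_drops_link)."""
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical layout: ./auth.json is a symlink into ./profile/.
        real = os.path.join(d, "profile", "auth.json")
        os.makedirs(os.path.dirname(real))
        link = os.path.join(d, "auth.json")
        _write(real, "old")
        os.symlink(real, link)

        # Guarded replace: the link survives and the real file is updated.
        tmp = os.path.join(d, "auth.json.tmp")
        _write(tmp, "new")
        atomic_replace(tmp, link)
        guarded_keeps_link = os.path.islink(link)
        real_file_updated = _read(real) == "new"

        # Plain os.replace: the symlink itself becomes a regular file and
        # the real file no longer receives writes.
        _write(tmp, "newer")
        os.replace(tmp, link)
        plain_replace_drops_link = not os.path.islink(link) and _read(real) == "new"
        return guarded_keeps_link, real_file_updated, plain_replace_drops_link
```

Calling demo() exercises both paths in a throwaway directory, so it
can be run without touching any real state files.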

Changes:
- utils.py: new atomic_replace() helper + atomic_json_write /
  atomic_yaml_write now call it instead of inlining the guard
- 16 files: all os.replace() call sites migrated to atomic_replace()
  - agent/{google_oauth, nous_rate_guard, shell_hooks}.py
  - cron/jobs.py
  - gateway/{pairing, session, platforms/telegram}.py
  - hermes_cli/{auth, config, debug, env_loader, model_catalog, webhook}.py
  - tools/{memory_tool, skill_manager_tool, skills_sync}.py

Tests: tests/test_atomic_replace_symlinks.py pins the invariant for
atomic_replace, atomic_json_write, and atomic_yaml_write, and covers
plain files, first-time creates, broken symlinks, and permission
preservation.
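
The edge cases named above can be sketched as plain assertion
functions. This is not the content of
tests/test_atomic_replace_symlinks.py; the check_* names, run_checks()
and the file names are hypothetical, and the sketch assumes a platform
that allows creating symlinks.

```python
import os
import tempfile


def atomic_replace(tmp_path, target):
    """Same guard as the helper in this commit: resolve a symlinked target first."""
    target_str = str(target)
    real_path = os.path.realpath(target_str) if os.path.islink(target_str) else target_str
    os.replace(str(tmp_path), real_path)
    return real_path


def _make_tmp(d, text):
    path = os.path.join(d, "payload.tmp")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path


def _read(path):
    with open(path, encoding="utf-8") as f:
        return f.read()


def check_plain_file(d):
    target = os.path.join(d, "plain.json")
    with open(target, "w", encoding="utf-8") as f:
        f.write("old")
    assert atomic_replace(_make_tmp(d, "new"), target) == target
    assert _read(target) == "new"


def check_first_time_create(d):
    target = os.path.join(d, "fresh.json")
    assert atomic_replace(_make_tmp(d, "data"), target) == target
    assert not os.path.islink(target)
    assert _read(target) == "data"


def check_broken_symlink(d):
    # The link points at a file that does not exist yet.
    target = os.path.join(d, "missing.json")
    link = os.path.join(d, "link.json")
    os.symlink(target, link)
    real_path = atomic_replace(_make_tmp(d, "data"), link)
    assert os.path.islink(link)
    assert os.path.realpath(real_path) == os.path.realpath(target)
    assert _read(link) == "data"


def run_checks():
    """Run each check in its own temporary directory."""
    for check in (check_plain_file, check_first_time_create, check_broken_symlink):
        with tempfile.TemporaryDirectory() as d:
            check(d)
    return True
```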

Refs #16743
Builds on #16777 by @vominh1919.

Author: Teknium
Date: 2026-04-28 04:51:38 -07:00
Committer: Teknium
Parent: 3ab97a32d1
Commit: b61d9b297a
18 changed files with 225 additions and 46 deletions

@@ -58,6 +58,30 @@ def _restore_file_mode(path: Path, mode: "int | None") -> None:
         pass
 
 
+def atomic_replace(tmp_path: Union[str, Path], target: Union[str, Path]) -> str:
+    """Atomically move *tmp_path* onto *target*, preserving symlinks.
+
+    ``os.replace(tmp, target)`` atomically swaps ``tmp`` into place at
+    ``target``. When ``target`` is a symlink, the symlink itself is
+    replaced with a regular file — silently detaching managed deployments
+    that symlink ``config.yaml`` / ``SOUL.md`` / ``auth.json`` etc. from
+    ``~/.hermes/`` to a git-tracked profile package or dotfiles repo
+    (GitHub #16743).
+
+    This helper resolves the symlink first so ``os.replace`` writes to
+    the real file in-place while the symlink survives. For non-symlink
+    and non-existent paths the behavior is identical to a plain
+    ``os.replace`` call.
+
+    Returns the resolved real path used for the replace, so callers that
+    need to re-apply permissions can target it instead of the symlink.
+    """
+    target_str = str(target)
+    real_path = os.path.realpath(target_str) if os.path.islink(target_str) else target_str
+    os.replace(str(tmp_path), real_path)
+    return real_path
+
+
 def atomic_json_write(
     path: Union[str, Path],
     data: Any,
@@ -99,10 +123,8 @@ def atomic_json_write(
             )
             f.flush()
             os.fsync(f.fileno())
-        # Resolve symlinks so os.replace writes to the real file instead of
-        # replacing the symlink with a regular file (GitHub #16743).
-        real_path = os.path.realpath(path) if os.path.islink(path) else path
-        os.replace(tmp_path, real_path)
+        # Preserve symlinks — swap in-place on the real file (GitHub #16743).
+        real_path = atomic_replace(tmp_path, path)
         _restore_file_mode(real_path, original_mode)
     except BaseException:
         # Intentionally catch BaseException so temp-file cleanup still runs for
@@ -153,10 +175,8 @@ def atomic_yaml_write(
                 f.write(extra_content)
             f.flush()
             os.fsync(f.fileno())
-        # Resolve symlinks so os.replace writes to the real file instead of
-        # replacing the symlink with a regular file (GitHub #16743).
-        real_path = os.path.realpath(path) if os.path.islink(path) else path
-        os.replace(tmp_path, real_path)
+        # Preserve symlinks — swap in-place on the real file (GitHub #16743).
+        real_path = atomic_replace(tmp_path, path)
         _restore_file_mode(real_path, original_mode)
     except BaseException:
         # Match atomic_json_write: cleanup must also happen for process-level