fix(file-safety): add sandbox-mirror soft guard for writes to per-task .hermes mirrors (#32213)

#32049 reports that under terminal.backend: docker, write_file / patch calls to authoritative profile state (SOUL.md, memories, etc.) land on the sandbox-local mirror at ``<HERMES_HOME>/profiles/<name>/sandboxes/<backend>/<task>/home/.hermes/...`` — a path the host Hermes process never reads. The tool reports success, the user sees no behavior change, and on disk two divergent copies of SOUL.md (or any other profile file) accumulate. The existing classify_cross_profile_target guard does not catch this: its parts[2] check sees "sandboxes" and returns None, and the path is in-profile from the inner-mirror perspective so even a fixed version would not fire. Add a parallel sandbox-mirror classifier in agent/file_safety: * classify_sandbox_mirror_target() detects the ``…/sandboxes/<backend>/<task>/home/.hermes/…`` shape via path parts. Detection is path-shape only — backend-agnostic, does not require the file to exist, and works regardless of which HERMES_HOME resolves. * get_sandbox_mirror_warning() returns a model-facing warning that names the mirror root and the inner authoritative path the agent likely meant. Wire both detectors through tools/file_tools._check_cross_profile_path so the existing write_file and v4a patch call sites pick up the new guard with no API change. The bypass kwarg (``cross_profile=True``) remains shared between the two guards — same "I know what I'm doing" escape valve after explicit user direction. This is the defense-in-depth piece of the proposal in #32049 ("any …/sandboxes/<backend>/…/home/…hermes/… path as sandbox-mirror"). It catches the host-side speculation case where the agent writes a literal sandbox-mirror path. The inner-container case (where the bind mount strips the ``sandboxes/`` prefix from the agent's path view) is out of scope for this surgical change — that requires either a dispatch-layer host-side check before the container handoff, or the host-side ``profile_state`` / ``soul`` tool the issue also proposes. Soft guard, NOT a security boundary — matches the existing classify_cross_profile_target contract. Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com> Co-authored-by: Ben Barclay <ben@nousresearch.com>
2026-06-02 02:29:24 +01:00
parent 4f7fe9bcff
commit 162c7856ca
3 changed files with 360 additions and 11 deletions
--- a/tools/file_tools.py
+++ b/tools/file_tools.py
@ -290,20 +290,32 @@ def _check_sensitive_path(filepath: str, task_id: str = "default") -> str | None


 def _check_cross_profile_path(filepath: str, task_id: str = "default") -> str | None:
-    """Return a cross-profile warning string when ``filepath`` lands in
-    another Hermes profile's skills/plugins/cron/memories directory.
+    """Return a soft-guard warning when ``filepath`` lands in another Hermes
+    profile's scoped area or a sandbox-mirror of authoritative profile state.

-    Returns ``None`` when the write is in-scope (same profile) or outside
-    Hermes scope entirely. Soft guard — the agent can override by passing
-    ``cross_profile=True`` to its write tool after explicit user direction.
+    Two detectors run in order:

-    Defense-in-depth, NOT a security boundary — the terminal tool runs
-    as the same OS user and can write any of these paths directly.
-    See ``agent/file_safety.classify_cross_profile_target`` for the
-    detection rules.
+    * cross-profile (#TBD) — writes that hit another profile's
+      ``skills/plugins/cron/memories`` directory.
+    * sandbox-mirror (#32049) — writes that hit the
+      ``…/sandboxes/<backend>/<task>/home/.hermes/…`` mirror created by a
+      non-local terminal backend (Docker, Daytona, etc.), where the host
+      Hermes process never reads the mirror and the authoritative file is
+      left untouched.
+
+    Returns ``None`` when the write is in-scope or outside Hermes scope.
+    Both detectors are soft guards — the agent can override either by
+    passing ``cross_profile=True`` to its write tool after explicit user
+    direction. Defense-in-depth, NOT a security boundary — the terminal
+    tool runs as the same OS user and can write any of these paths
+    directly. See ``agent/file_safety.classify_cross_profile_target`` and
+    ``classify_sandbox_mirror_target`` for the detection rules.
    """
    try:
-        from agent.file_safety import get_cross_profile_warning
+        from agent.file_safety import (
+            get_cross_profile_warning,
+            get_sandbox_mirror_warning,
+        )
    except Exception:
        # Fail open on import error — the existing sensitive-path guard
        # plus the write_denied list still apply.
@ -317,7 +329,10 @@ def _check_cross_profile_path(filepath: str, task_id: str = "default") -> str |
    except (OSError, ValueError):
        resolved = filepath

-    return get_cross_profile_warning(resolved)
+    warning = get_cross_profile_warning(resolved)
+    if warning is not None:
+        return warning
+    return get_sandbox_mirror_warning(resolved)


 def _is_expected_write_exception(exc: Exception) -> bool: