* docs(code-execution): document HERMES_* env narrowing + passthrough workaround
The execute_code sandbox-child env scrub (108397726, #27303) deliberately
dropped the broad HERMES_ prefix passthrough, keeping only an operational
4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a
non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES_*_WEBHOOK,
or a plugin-defined one) now sees it unset in the child.
Document the behavior change and the two recovery routes (terminal.env_passthrough
in config.yaml, or required_environment_variables in skill frontmatter), plus
the debug log line that surfaces the drop for diagnosis.
* fix: drain thread no longer crashes on fd-less stdout streams
The _wait_for_process drain thread called proc.stdout.fileno()
unconditionally. ProcessHandle implementations whose stdout is not
backed by a real OS fd (iterator-style in-memory streams, mock procs)
raised 'list_iterator' object has no attribute 'fileno' (or
'fileno() returned a non-integer' from select.select), killing the
daemon thread and silently losing all process output.
Resolve the fd defensively at the top of _drain; when stdout has no
usable integer fileno, fall back to draining it as an iterable (the
legacy 'for line in proc.stdout' contract). The real subprocess /
os.pipe-backed select() fast path is unchanged.
297 lines
12 KiB
Markdown
---
sidebar_position: 8
title: "Code Execution"
description: "Programmatic Python execution with RPC tool access: collapse multi-step workflows into a single turn"
---

# Code Execution (Programmatic Tool Calling)

The `execute_code` tool lets the agent write Python scripts that call Hermes tools programmatically, collapsing multi-step workflows into a single LLM turn. The script runs in a child process on the agent host, communicating with Hermes over a Unix domain socket RPC.

## How It Works

1. The agent writes a Python script using `from hermes_tools import ...`
2. Hermes generates a `hermes_tools.py` stub module with RPC functions
3. Hermes opens a Unix domain socket and starts an RPC listener thread
4. The script runs in a child process; tool calls travel over the socket back to Hermes
5. Only the script's `print()` output is returned to the LLM; intermediate tool results never enter the context window

```python
# The agent can write scripts like:
from hermes_tools import web_search, web_extract

results = web_search("Python 3.13 features", limit=5)
for r in results["data"]["web"]:
    content = web_extract([r["url"]])
    # ... filter and process ...
print(summary)
```

**Available tools inside scripts:** `web_search`, `web_extract`, `read_file`, `write_file`, `search_files`, `patch`, `terminal` (foreground only).

## When the Agent Uses This

The agent uses `execute_code` when there are:

- **3+ tool calls** with processing logic between them
- Bulk data filtering or conditional branching
- Loops over results

The key benefit: intermediate tool results never enter the context window. Only the final `print()` output comes back, so token usage tracks the size of the printed report rather than the size of every intermediate result.

## Practical Examples

### Data Processing Pipeline

```python
from hermes_tools import search_files, read_file
import json

# Find all config files and extract database settings
matches = search_files("database", path=".", file_glob="*.yaml", limit=20)
configs = []
for match in matches.get("matches", []):
    content = read_file(match["path"])
    configs.append({"file": match["path"], "preview": content["content"][:200]})

print(json.dumps(configs, indent=2))
```

### Multi-Step Web Research

```python
from hermes_tools import web_search, web_extract
import json

# Search, extract, and summarize in one turn
results = web_search("Rust async runtime comparison 2025", limit=5)
summaries = []
for r in results["data"]["web"]:
    page = web_extract([r["url"]])
    for p in page.get("results", []):
        if p.get("content"):
            summaries.append({
                "title": r["title"],
                "url": r["url"],
                "excerpt": p["content"][:500]
            })

print(json.dumps(summaries, indent=2))
```

### Bulk File Refactoring

```python
from hermes_tools import search_files, read_file, patch

# Find all Python files using deprecated API and fix them
matches = search_files("old_api_call", path="src/", file_glob="*.py")
fixed = 0
for match in matches.get("matches", []):
    result = patch(
        path=match["path"],
        old_string="old_api_call(",
        new_string="new_api_call(",
        replace_all=True
    )
    if "error" not in str(result):
        fixed += 1

print(f"Fixed {fixed} files out of {len(matches.get('matches', []))} matches")
```

### Build and Test Pipeline

```python
from hermes_tools import terminal
import json
import re

# Run tests, parse results, and report
result = terminal("cd /project && python -m pytest --tb=short -q 2>&1", timeout=120)
output = result.get("output", "")


def count(label):
    """Read the number in front of a label in pytest's summary, e.g. "3 passed"."""
    found = re.findall(rf"(\d+) {label}\b", output)
    # The summary line comes last, so prefer the final occurrence.
    return int(found[-1]) if found else 0


report = {
    "passed": count("passed"),
    "failed": count("failed"),
    "errors": count("errors?"),
    "exit_code": result.get("exit_code", -1),
    "summary": output[-500:],  # last 500 characters (or everything if shorter)
}

print(json.dumps(report, indent=2))
```

## Execution Mode

`execute_code` has two execution modes controlled by `code_execution.mode` in `~/.hermes/config.yaml`:

| Mode | Working directory | Python interpreter |
|------|-------------------|--------------------|
| **`project`** (default) | The session's working directory (same as `terminal()`) | Active `VIRTUAL_ENV` / `CONDA_PREFIX` python, falling back to Hermes's own python |
| `strict` | A temp staging directory isolated from the user's project | `sys.executable` (Hermes's own python) |

**When to leave it on `project`:** you want `import pandas`, `from my_project import foo`, or relative paths like `open(".env")` to work the same way they do in `terminal()`. This is almost always what you want.

**When to flip to `strict`:** you need maximum reproducibility: the same interpreter every session regardless of which venv the user activated, and scripts quarantined from the project tree (no risk of accidentally reading project files through a relative path).

```yaml
# ~/.hermes/config.yaml
code_execution:
  mode: project  # or "strict"
```

Fallback behavior in `project` mode: if `VIRTUAL_ENV` / `CONDA_PREFIX` is unset, broken, or points at a Python older than 3.8, the resolver falls back cleanly to `sys.executable`, so the agent is never left without a working interpreter.

Security-critical invariants are identical across both modes:

- environment scrubbing (API keys, tokens, credentials stripped)
- tool allowlist (scripts cannot call `execute_code` recursively, `delegate_task`, or MCP tools)
- resource limits (timeout, stdout cap, tool-call cap)

Switching mode changes where scripts run and which interpreter runs them, not what credentials they can see or which tools they can call.

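The interpreter fallback described for `project` mode can be pictured with a short sketch. This is an illustration under stated assumptions, not Hermes's actual resolver: the function name `resolve_interpreter` is hypothetical, the lookup order (`VIRTUAL_ENV` before `CONDA_PREFIX`) and the POSIX `bin/python` layout are assumed, and the version check mentioned above is left out to keep the example short.

```python
import os
import sys
from typing import Mapping, Optional


def resolve_interpreter(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical sketch: pick a project interpreter, else fall back."""
    env = os.environ if env is None else env
    for var in ("VIRTUAL_ENV", "CONDA_PREFIX"):  # lookup order is an assumption
        prefix = env.get(var)
        if not prefix:
            continue  # variable unset or empty
        candidate = os.path.join(prefix, "bin", "python")  # POSIX layout assumed
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Nothing usable was found: use the interpreter running this code.
    return sys.executable


# A broken VIRTUAL_ENV does not leave the caller without an interpreter.
print(resolve_interpreter({"VIRTUAL_ENV": "/nonexistent/venv"}) == sys.executable)
```

The point of the shape is that every failure path ends at `sys.executable`, which is also what `strict` mode uses unconditionally.
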
## Resource Limits

| Resource | Limit | Notes |
|----------|-------|-------|
| **Timeout** | 5 minutes (300s) | Script is killed with SIGTERM, then SIGKILL after 5s grace |
| **Stdout** | 50 KB | Output truncated with `[output truncated at 50KB]` notice |
| **Stderr** | 10 KB | Included in output on non-zero exit for debugging |
| **Tool calls** | 50 per execution | Error returned when limit reached |

All limits are configurable via `config.yaml`:

```yaml
# In ~/.hermes/config.yaml
code_execution:
  mode: project       # project (default) | strict
  timeout: 300        # Max seconds per script (default: 300)
  max_tool_calls: 50  # Max tool calls per execution (default: 50)
```

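To make the stdout cap concrete, here is a minimal sketch of how a 50 KB limit with a truncation notice could be applied. The helper name `cap_stdout` is hypothetical, and keeping the head of the output (rather than the tail) is an assumption; only the limit and the notice text come from the table above.

```python
STDOUT_CAP_BYTES = 50 * 1024  # 50 KB, as in the table above
NOTICE = "[output truncated at 50KB]"


def cap_stdout(data: bytes, limit: int = STDOUT_CAP_BYTES) -> str:
    """Hypothetical helper: keep at most `limit` bytes, then append the notice."""
    if len(data) <= limit:
        return data.decode("utf-8", errors="replace")
    # Slicing bytes may split a multi-byte character; errors="replace" absorbs that.
    kept = data[:limit].decode("utf-8", errors="replace")
    return kept + "\n" + NOTICE


print(cap_stdout(b"short output"))
print(cap_stdout(b"x" * 60_000).endswith(NOTICE))
```

For script authors the practical consequence is the same whatever the exact implementation: print a compact summary rather than raw tool results.
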
## How Tool Calls Work Inside Scripts

When your script calls a function like `web_search("query")`:

1. The call is serialized to JSON and sent over a Unix domain socket to the parent process
2. The parent dispatches through the standard `handle_function_call` handler
3. The result is sent back over the socket
4. The function returns the parsed result

This means tool calls inside scripts behave identically to normal tool calls: same rate limits, same error handling, same capabilities. The only restriction is that `terminal()` is foreground-only (no `background` or `pty` parameters).

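The four steps above can be sketched as a self-contained round trip. Everything here is illustrative: the wire format (one JSON object per line), the field names `tool`, `args`, and `result`, and the helpers `serve_one_call`, `call_tool`, and `fake_web_search` are assumptions, and `socket.socketpair()` stands in for the Unix domain socket that Hermes opens.

```python
import json
import socket
import threading


def serve_one_call(conn: socket.socket, handlers: dict) -> None:
    """Parent side: read one JSON request line, dispatch it, send the result."""
    with conn, conn.makefile("rwb") as stream:
        request = json.loads(stream.readline())
        handler = handlers[request["tool"]]
        result = handler(**request["args"])
        stream.write(json.dumps({"result": result}).encode() + b"\n")
        stream.flush()


def call_tool(conn: socket.socket, tool: str, **args):
    """Child side: roughly what a generated stub function might do per call."""
    with conn.makefile("rwb") as stream:
        stream.write(json.dumps({"tool": tool, "args": args}).encode() + b"\n")
        stream.flush()
        return json.loads(stream.readline())["result"]


def fake_web_search(query: str, limit: int = 5) -> dict:
    # Stand-in handler; the real tool is dispatched by Hermes itself.
    hits = [{"title": f"Result for {query}", "url": "https://example.com"}]
    return {"data": {"web": hits[:limit]}}


parent_end, child_end = socket.socketpair()
server = threading.Thread(
    target=serve_one_call, args=(parent_end, {"web_search": fake_web_search})
)
server.start()
response = call_tool(child_end, "web_search", query="python", limit=1)
server.join()
child_end.close()
print(response["data"]["web"][0]["title"])  # prints "Result for python"
```

Because the child only ever sees the parsed return value, the script needs no credentials of its own: the parent performs the real tool call.
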
## Error Handling

When a script fails, the agent receives structured error information:

- **Non-zero exit code**: stderr is included in the output so the agent sees the full traceback
- **Timeout**: Script is killed and the agent sees `"Script timed out after 300s and was killed."`
- **Interruption**: If the user sends a new message during execution, the script is terminated and the agent sees `[execution interrupted — user sent a new message]`
- **Tool call limit**: When the 50-call limit is hit, subsequent tool calls return an error message

The response always includes `status` (success/error/timeout/interrupted), `output`, `tool_calls_made`, and `duration_seconds`.

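As a rough picture of how the outcomes above map onto the response fields, the sketch below runs a script in a child process and classifies the result. It is an assumption-laden illustration, not the Hermes implementation: `run_script` is a hypothetical name, interruption and the tool-call limit are not modeled, and `tool_calls_made` is fixed at zero because the sketch has no RPC channel.

```python
import subprocess
import sys
import time


def run_script(source: str, timeout: float = 300.0, grace: float = 5.0) -> dict:
    """Hypothetical runner: execute `source` and classify the outcome."""
    started = time.monotonic()
    proc = subprocess.Popen(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    try:
        out, err = proc.communicate(timeout=timeout)
        if proc.returncode == 0:
            status, output = "success", out
        else:
            # Non-zero exit: append stderr so the traceback is visible.
            status, output = "error", out + err
    except subprocess.TimeoutExpired:
        proc.terminate()  # polite stop first (SIGTERM on POSIX)
        try:
            proc.communicate(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # forced stop after the grace period (SIGKILL on POSIX)
            proc.communicate()
        status = "timeout"
        output = f"Script timed out after {timeout:g}s and was killed."
    return {
        "status": status,
        "output": output,
        "tool_calls_made": 0,  # no RPC channel in this sketch, so nothing to count
        "duration_seconds": round(time.monotonic() - started, 2),
    }


print(run_script("print('hello')")["status"])
```

A caller would typically branch on `status` first and only then read `output`, since on `error` the output also carries the traceback.
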
## Security

:::danger Security Model
The child process runs with a **minimal environment**. API keys, tokens, and credentials are stripped by default. The script accesses tools exclusively via the RPC channel; it cannot read secrets from environment variables unless explicitly allowed.
:::

Environment variables containing `KEY`, `TOKEN`, `SECRET`, `PASSWORD`, `CREDENTIAL`, `PASSWD`, or `AUTH` in their names are excluded. Only safe system variables (`PATH`, `HOME`, `LANG`, `SHELL`, `PYTHONPATH`, `VIRTUAL_ENV`, etc.) are passed through.

### Skill Environment Variable Passthrough

When a skill declares `required_environment_variables` in its frontmatter, those variables are **automatically passed through** to both `execute_code` and `terminal` child processes after the skill is loaded. This lets skills use their declared API keys without weakening the security posture for arbitrary code.

For non-skill use cases, you can explicitly allowlist variables in `config.yaml`:

```yaml
terminal:
  env_passthrough:
    - MY_CUSTOM_KEY
    - ANOTHER_TOKEN
```

See the [Security guide](/user-guide/security#environment-variable-passthrough) for full details.

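The scrubbing and passthrough rules in this section can be illustrated with a small filter. This is a sketch under assumptions, not the Hermes code: `scrub_env` is a hypothetical name, the safe-name set is only the subset spelled out in the text (which ends with "etc."), and the rule that Hermes-managed provider credentials can never be re-allowed is not modeled.

```python
SECRET_SUBSTRINGS = ("KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL", "PASSWD", "AUTH")
# Illustrative subset of the safe system variables named in the text.
SAFE_NAMES = frozenset({"PATH", "HOME", "LANG", "SHELL", "PYTHONPATH", "VIRTUAL_ENV"})


def scrub_env(env, passthrough=()):
    """Hypothetical filter: build the child environment from the parent's."""
    explicit = set(passthrough)
    child = {}
    for name, value in env.items():
        looks_secret = any(part in name.upper() for part in SECRET_SUBSTRINGS)
        # Keep a variable if it was explicitly allowlisted, or if it is a
        # known-safe system variable whose name does not look like a secret.
        if name in explicit or (name in SAFE_NAMES and not looks_secret):
            child[name] = value
    return child


parent = {
    "PATH": "/usr/bin",
    "OPENAI_API_KEY": "sk-example",
    "MY_CUSTOM_KEY": "abc",
    "EDITOR": "vim",
}
print(scrub_env(parent))  # {'PATH': '/usr/bin'}
print(scrub_env(parent, passthrough=["MY_CUSTOM_KEY"]))  # {'PATH': '/usr/bin', 'MY_CUSTOM_KEY': 'abc'}
```

Note that the default is deny: a variable such as `EDITOR` is dropped even though its name contains no secret substring, because it is neither on the safe list nor explicitly passed through.
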
### `HERMES_*` variables in the child

The child process receives only a small, fixed set of operational `HERMES_*`
variables by exact name:

- `HERMES_HOME`
- `HERMES_PROFILE`
- `HERMES_CONFIG`
- `HERMES_ENV`

(plus `HERMES_RPC_DIR` / `HERMES_RPC_SOCKET` / `TZ` / `HOME`, which Hermes
injects explicitly so the RPC channel works).

:::note Behavior change
Earlier versions passed **any** variable whose name began with `HERMES_`
through to the child. That broad prefix was removed for security hardening: it
could leak `HERMES_*`-named configuration that doesn't match a secret substring
(for example `HERMES_BASE_URL`, `HERMES_KANBAN_DB`, or a `HERMES_*_WEBHOOK`
endpoint) into arbitrary sandboxed code.

If an `execute_code` script (or a repo/plugin module it imports at import time)
relied on a `HERMES_*` variable outside the four operational names above, it
will now find that variable **unset** in the child. The drop is intentional,
not a bug.
:::

**Workaround: opt the variable back in explicitly.** Both routes pass the
variable through `execute_code` *and* `terminal` children, and neither weakens
the secret-stripping guarantee (Hermes-managed provider credentials can never
be re-allowed this way):

1. **Per-machine, in `config.yaml`**: add the exact variable name to the
   passthrough allowlist:

   ```yaml
   terminal:
     env_passthrough:
       - HERMES_KANBAN_DB
       - HERMES_BASE_URL
   ```

2. **Per-skill, in the skill's frontmatter**: declare it so it is registered
   automatically whenever that skill is loaded:

   ```yaml
   required_environment_variables:
     - HERMES_KANBAN_DB
   ```

**Diagnosing it.** When the child drops one or more non-allowlisted `HERMES_*`
variables, Hermes emits a one-line `debug` log naming them and pointing at the
`env_passthrough` escape hatch. Run with debug logging
(`hermes logs --level DEBUG`, or check `~/.hermes/logs/agent.log`) and look for
`execute_code: dropped N non-allowlisted HERMES_* var(s)` if a script behaves
as though a `HERMES_*` variable is missing.

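On the script side, a small guard can turn a silently missing variable into an explicit error that names both recovery routes. The helper below is a hypothetical convenience, not part of `hermes_tools`; the variable name `HERMES_KANBAN_DB` is taken from the examples above.

```python
import os


def require_env(name: str) -> str:
    """Hypothetical guard: return the variable or fail with a pointer to the fix."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(
            f"{name} is unset in the execute_code child. Add it to "
            "terminal.env_passthrough in config.yaml, or declare it under "
            "required_environment_variables in the skill's frontmatter."
        )
    return value


try:
    db_path = require_env("HERMES_KANBAN_DB")
    print("kanban db:", db_path)
except RuntimeError as exc:
    # Only print() output reaches the agent, so surface the hint there.
    print(exc)
```

Failing early like this is usually easier to diagnose than a later `KeyError` or a connection to a default location the script never intended to use.
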
Hermes always writes the script and the auto-generated `hermes_tools.py` RPC stub into a temp staging directory that is cleaned up after execution. In `strict` mode the script also *runs* there; in `project` mode it runs in the session's working directory (the staging directory stays on `PYTHONPATH` so imports still resolve). The child process runs in its own process group so it can be cleanly killed on timeout or interruption.

## execute_code vs terminal

| Use Case | execute_code | terminal |
|----------|-------------|----------|
| Multi-step workflows with tool calls between | ✅ | ❌ |
| Simple shell command | ❌ | ✅ |
| Filtering/processing large tool outputs | ✅ | ❌ |
| Running a build or test suite | ❌ | ✅ |
| Looping over search results | ✅ | ❌ |
| Interactive/background processes | ❌ | ✅ |
| Needs API keys in environment | ⚠️ Only via [passthrough](/user-guide/security#environment-variable-passthrough) | ✅ (most pass through) |

**Rule of thumb:** Use `execute_code` when you need to call Hermes tools programmatically with logic between calls. Use `terminal` for running shell commands, builds, and processes.

## Platform Support

Code execution requires Unix domain sockets and is available on **Linux and macOS only**. It is automatically disabled on Windows, where the agent falls back to regular sequential tool calls.