Commit Graph

12 Commits

Author SHA1 Message Date
827f251426 perf(observability): gate tool-hook emit on has_hook; slim per-tool footprint
The salvaged observer contract gated the API-request hot path on has_hook()
but left the per-tool emit ungated: every tool call ran result-field
derivation + payload dict build + invoke_hook dispatch even with zero
plugins registered.

- _emit_post_tool_call_hook now short-circuits on has_hook("post_tool_call")
  and derives status/error fields lazily (after the gate, only when a
  listener will consume them). status defaults to None -> derived; explicit
  blocked/cancelled callers still pass status through.
- transform_tool_result emit (pre-existing hook) likewise gated on
  has_hook(); skips _tool_result_observer_fields when no listener.
- Removed the now-redundant _tool_result_observer_fields pre-computation at
  the three ok-path call sites (model_tools, agent_runtime_helpers,
  tool_executor) — the helper derives them, so the no-listener path costs
  one dict lookup and the call sites shrink.
- Tests: stub has_hook=True where payload correctness is asserted; add a
  no-listener regression proving post_tool_call/transform_tool_result emit
  is skipped when nothing is registered.
2026-06-03 06:36:46 -07:00
0d9b7132ff feat(observability): observer-grade telemetry hooks + NeMo-Relay plugin
Adds backend-neutral observer hooks for plugins: session, turn, API
request, tool, approval, and subagent lifecycle events with stable
correlation IDs (session_id, task_id, turn_id, api_request_id,
tool_call_id, parent/child subagent ids). Extends VALID_HOOKS with
api_request_error and subagent_start.

Hot path is zero-cost when no plugin subscribes: has_hook()/presence
checks gate all payload construction, request payloads are returned
by reference when no middleware rewrites, and the sanitized response
payload no longer embeds raw response objects.

Bundles the optional NeMo-Relay observability plugin
(plugins/observability/nemo_relay) as an in-repo consumer of the new
hooks, peer to the existing langfuse plugin. Fails open when the
optional nemo-relay package is not installed.

Authored-by: Bryan Bednarski <bbednarski@nvidia.com>
Salvaged from #29722 onto current main.
2026-06-03 06:36:46 -07:00
66827f8947 chore: prune unused imports and duplicate import redefinitions
Remove unused imports (F401) and duplicate/shadowed import
redefinitions (F811) across the codebase using ruff's safe
autofixes. No behavioral changes -- imports only.

- ~1400 safe autofixes applied across 644 files (net -1072 lines)
- __init__.py re-exports preserved (excluded from F401 removal so
  public re-export surfaces stay intact)
- Re-exports that are imported or monkeypatched by tests but look
  unused in their defining module are kept with explicit # noqa:
  F401 (gateway/run.py load_dotenv; run_agent re-exports from
  agent.message_sanitization, agent.context_compressor,
  agent.retry_utils, agent.prompt_builder, agent.process_bootstrap,
  agent.codex_responses_adapter)
- Unsafe F841 (unused-variable) fixes deliberately skipped -- those
  can change behavior when the RHS has side effects
- ruff lints remain disabled in pyproject.toml (only PLW1514 is
  selected); this is a one-time cleanup, not a config change

Verification:
- python -m compileall: clean
- pytest --collect-only: all 27161 tests collect (zero import errors)
- core entry points import clean (run_agent, model_tools, cli,
  toolsets, hermes_state, batch_runner, gateway)
- static scan: every name any test imports directly from an edited
  module still resolves
2026-05-28 22:26:25 -07:00
5af672c753 chore: remove Atropos RL environments and tinker-atropos integration (#26106)
* chore: remove Atropos RL environments, tools, tests, skill, and tinker-atropos submodule

Delete:
- environments/ (43 files — base env, agent loop, tool call parsers, benchmarks)
- rl_cli.py (standalone RL training CLI)
- tools/rl_training_tool.py (all 10 rl_* tools)
- tests: test_rl_training_tool, test_tool_call_parsers, test_managed_server_tool_support,
  test_agent_loop, test_agent_loop_vllm, test_agent_loop_tool_calling,
  test_terminalbench2_env_security
- optional-skills/mlops/hermes-atropos-environments/
- tinker-atropos git submodule + .gitmodules

* chore: remove RL/Atropos references from Python source

- toolsets.py: remove rl toolset block + update comment
- model_tools.py: remove rl_tools group + update async bridging comment
- hermes_cli/tools_config.py: remove RL display entry, _DEFAULT_OFF_TOOLSETS,
  setup block, and rl_training post-setup handler
- tools/budget_config.py: remove RL environment reference in docstring
- tests/test_model_tools.py: remove rl_tools from expected groups
- tests/run_agent/test_streaming_tool_call_repair.py: fix stale cross-reference

* chore: remove rl/yc-bench extras and tinker-atropos refs from pyproject.toml

- Remove rl extra (atroposlib, tinker, fastapi, uvicorn, wandb)
- Remove yc-bench extra
- Remove rl_cli from py-modules
- Remove [tool.ty.src] exclude for tinker-atropos
- Remove [tool.ruff] exclude for tinker-atropos
- Regenerate uv.lock

* chore: remove tinker-atropos from install/setup scripts

- setup-hermes.sh: remove entire tinker-atropos submodule install block
- scripts/install.sh: remove both tinker-atropos blocks (Termux + standard)
- scripts/install.ps1: remove tinker-atropos block
- nix/hermes-agent.nix: remove tinker-atropos pip install line

* chore: remove RL references from cli-config.yaml.example

* docs: remove Atropos/RL references from README, CONTRIBUTING, AGENTS.md

* docs: remove RL/Atropos references from website

- Delete: environments.md, rl-training.md, mlops-hermes-atropos-environments.md
- sidebars.ts: remove rl-training and environments sidebar entries
- optional-skills-catalog.md: remove hermes-atropos-environments row
- tools-reference.md: remove entire rl toolset section
- toolsets-reference.md: remove rl row + update example
- integrations/index.md: remove RL Training bullet
- architecture.md: remove environments/ from tree + RL section
- contributing.md: remove tinker-atropos setup
- updating.md: remove tinker-atropos install + stale submodule update

* chore: remove remaining RL/Atropos stragglers

- hermes_cli/config.py: remove TINKER_API_KEY + WANDB_API_KEY env var defs
- hermes_cli/doctor.py: remove Submodules check section (tinker-atropos)
- hermes_cli/setup.py: remove RL Training status check
- hermes_cli/status.py: remove Tinker + WandB from API key status display
- agent/display.py: remove both rl_* tool preview/activity blocks
- website/docs: remove RL references from providers.md + env-variables.md
- tests: remove TINKER_API_KEY from conftest, set_config_value, setup_script

* chore: remove RL training section from .env.example
2026-05-15 10:36:38 +05:30
9be3ab1a5b fix(plugins): stop firing pre_tool_call hook twice per tool execution (#17611)
The skip_pre_tool_call_hook flag was added to prevent double-firing of
pre_tool_call when run_agent._invoke_tool pre-checks for a block
directive and then dispatches via handle_function_call. But the
implementation added an else: branch that fired invoke_hook again for
'observers', without noticing that get_pre_tool_call_block_message() in
hermes_cli.plugins already fires invoke_hook('pre_tool_call', ...) as
part of its block-directive poll.

Result: every tool call ran through the run_agent loop fired the hook
twice — reported by community users whose observer / audit plugins
logged each tool invocation twice with identical timestamps.

Fix: delete the else: branch. The single-fire contract is now:
  - skip=False (direct handle_function_call): hook fires once inside
    get_pre_tool_call_block_message().
  - skip=True (run_agent._invoke_tool path): caller fires the hook
    once via get_pre_tool_call_block_message(); handle_function_call
    must not fire it again.

Tightened the existing skip-flag test (renamed to
test_skip_flag_prevents_double_fire) to assert pre_tool_call fires
zero times when skip=True, and added
test_run_agent_pattern_fires_pre_tool_call_exactly_once to lock in
end-to-end that the full block-check + dispatch sequence fires the
hook exactly once.
2026-04-29 12:43:39 -07:00
59b56d445c feat(hooks): add duration_ms to post_tool_call + transform_tool_result (#15429)
Plugin hooks fired after a tool dispatch now receive an integer
duration_ms kwarg measuring how long the tool's registry.dispatch()
call took (time.monotonic() before/after). Inspired by Claude Code
2.1.119 which added the same field to PostToolUse hook inputs.

Wire points:
- model_tools.py: measure dispatch latency, pass duration_ms to
  invoke_hook("post_tool_call", ...) and invoke_hook("transform_tool_result", ...)
- hermes_cli/hooks.py: include duration_ms in the synthetic payload
  used by 'hermes hooks test' and 'hermes hooks doctor' so shell-hook
  authors see the same shape at development time as runtime
- shell hooks (agent/shell_hooks.py): no code change needed;
  _serialize_payload already surfaces non-top-level kwargs under
  payload['extra'], so duration_ms lands at extra.duration_ms for
  shell-hook scripts

Plugin authors can now build latency dashboards, per-tool SLO alerts,
and regression canaries without having to wrap every tool manually.

Test: tests/test_model_tools.py::test_post_tool_call_receives_non_negative_integer_duration_ms
E2E: real PluginManager + dispatch monkey-patched with a 50ms sleep,
hook callback observes duration_ms=50 (int).

Refs: https://code.claude.com/docs/en/changelog (2.1.119, Apr 23 2026)
2026-04-25 22:13:12 -07:00
ef9355455b test: regression coverage for checkpoint dedup and inf/nan coercion
Covers the two bugs salvaged from PR #15161:

- test_batch_runner_checkpoint: TestFinalCheckpointNoDuplicates asserts
  the final aggregated completed_prompts list has no duplicate indices,
  and keeps a sanity anchor test documenting the pre-fix pattern so a
  future refactor that re-introduces it is caught immediately.

- test_model_tools: TestCoerceNumberInfNan asserts _coerce_number
  returns the original string for inf/-inf/nan/Infinity inputs and that
  the result round-trips through strict (allow_nan=False) json.dumps.
2026-04-24 14:32:21 -07:00
04068c5891 feat(plugins): add transform_tool_result hook for generic tool-result rewriting (#12972)
Closes #8933 more fully, extending the per-tool transform_terminal_output
hook from #12929 to a generic seam that fires after every tool dispatch.
Plugins can rewrite any tool's result string (normalize formats, redact
fields, summarize verbose output) without wrapping individual tools.

Changes
- hermes_cli/plugins.py: add "transform_tool_result" to VALID_HOOKS
- model_tools.py: invoke the hook in handle_function_call after
  post_tool_call (which remains observational); first valid str return
  replaces the result; fail-open
- tests/test_transform_tool_result_hook.py: 9 new tests covering no-op,
  None return, non-string return, first-match wins, kwargs, hook
  exception fallback, post_tool_call observation invariant, ordering
  vs post_tool_call, and an end-to-end real-plugin integration
- tests/hermes_cli/test_plugins.py: assert new hook in VALID_HOOKS
- tests/test_model_tools.py: extend the hook-call-sequence assertion
  to include the new hook

Design
- transform_tool_result runs AFTER post_tool_call so observers always
  see the original (untransformed) result. This keeps post_tool_call's
  observational contract.
- transform_terminal_output (from #12929) still runs earlier, inside
  terminal_tool, so plugins can canonicalize BEFORE the 50k truncation
  drops middle content. Both hooks coexist; they target different layers.
2026-04-20 03:48:08 -07:00
eabc0a2f66 feat(plugins): let pre_tool_call hooks block tool execution
Plugins can now return {"action": "block", "message": "reason"} from
their pre_tool_call hook to prevent a tool from executing. The error
message is returned to the model as a tool result so it can adjust.

Covers both execution paths: handle_function_call (model_tools.py) and
agent-level tools (run_agent.py _invoke_tool + sequential/concurrent).
Blocked tools skip all side effects (counter resets, checkpoints,
callbacks, read-loop tracker).

Adds skip_pre_tool_call_hook flag to avoid double-firing the hook when
run_agent.py already checked and then calls handle_function_call.

Salvaged from PR #5385 (gianfrancopiana) and PR #4610 (oredsecurity).
2026-04-13 22:01:49 -07:00
9e820dda37 Add request-scoped plugin lifecycle hooks 2026-04-05 23:31:29 -07:00
5c867fd79f test: strengthen assertions across 3 more test files (batch 2)
test_run_agent.py (2 weak → 0, +13 assertions):
  - Session ID validated against actual YYYYMMDD_HHMMSS_hex format
  - API failure verifies error message propagation
  - Invalid JSON args verifies empty dict fallback + message structure
  - Context compression verifies final_response + completed flag
  - Invalid tool name retry verifies api_calls count
  - Invalid response verifies completed/failed/error structure

test_model_tools.py (3 weak → 0):
  - Unknown tool error includes tool name in message
  - Exception returns dict with 'error' key + non-empty message
  - get_all_tool_names verifies both web_search AND terminal present

test_approval.py (1 weak → 0, assert ratio 1.1 → 2.2):
  - Dangerous commands verify description content (delete, shell, drop, etc.)
  - Safe commands explicitly assert key AND desc are None
  - Pre/post condition checks for state management
2026-03-05 18:46:30 -08:00
ffbdd7fcce test: add unit tests for 8 modules (batch 2)
Cover model_tools, toolset_distributions, context_compressor,
prompt_caching, cronjob_tools, session_search, process_registry,
and cron/scheduler with 127 new test cases.
2026-02-26 13:54:20 +03:00