hermes-agent


Teknium e5af1dd633 fix(review): tell background reviewer not to capture transient env failures as skills (#23004)

Closes #6051.

Reported failure mode: agent migrated to WSL2, browser launch failed
because Playwright wasn't installed yet. Background reviewer captured
the failure as a durable skill (`browser-tool-launch-issue`) and the
agent kept refusing the browser tool for weeks after Playwright was
installed and verified working. Negative claims also propagated into
unrelated skills ("browser tools do not work", "cannot use Y from
execute_code").

Root cause: `_SKILL_REVIEW_PROMPT` and `_COMBINED_REVIEW_PROMPT` both
lean hard on "be active, save things, a pass that does nothing is a
missed learning opportunity." Neither distinguished durable knowledge
from transient environment state. The reviewer was doing what it was
told.

Fix at the write site — both prompts now carry a "Do NOT capture"
section calling out:
  • Environment-dependent failures (missing binaries, fresh-install
    errors, post-migration path mismatches, 'command not found',
    unconfigured credentials, uninstalled packages)
  • Negative claims about tools or features ("X does not work")
    that harden into self-cited refusals
  • Session-specific transient errors that resolved before the
    conversation ended
  • One-off task narratives ("summarize today's market", "analyze
    this PR") — also addresses the #12812 / #4538 family

Plus a positive-reframing line: when a tool fails because of setup
state, capture the FIX (install command, config step, env var)
under an existing setup/troubleshooting skill — never "this tool
doesn't work" as a standalone constraint.
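Here is a minimal sketch of how such a section could be appended to a review prompt. `DO_NOT_CAPTURE_SECTION`, `with_do_not_capture`, and the one-line base prompt are hypothetical. The bullet wording mirrors the list above, but the actual `_SKILL_REVIEW_PROMPT` / `_COMBINED_REVIEW_PROMPT` text is not reproduced here.

```python
# Hypothetical sketch: illustrates the shape of the added section only,
# not the repository's actual prompt text.
DO_NOT_CAPTURE_SECTION = """\
Do NOT capture:
- Environment-dependent failures (missing binaries, fresh-install errors,
  post-migration path mismatches, 'command not found', unconfigured
  credentials, uninstalled packages).
- Negative claims about tools or features ("X does not work").
- Session-specific transient errors that resolved before the conversation ended.
- One-off task narratives ("summarize today's market", "analyze this PR").

When a tool fails because of setup state, capture the FIX (install command,
config step, env var) under an existing setup/troubleshooting skill, never
"this tool doesn't work" as a standalone constraint.
"""


def with_do_not_capture(base_prompt: str) -> str:
    """Append the exclusion section to a review prompt (hypothetical helper)."""
    # Strip trailing whitespace so the section is separated by exactly one blank line.
    return base_prompt.rstrip() + "\n\n" + DO_NOT_CAPTURE_SECTION


SKILL_REVIEW_PROMPT = with_do_not_capture(
    "Review the conversation and save durable, reusable knowledge as skills."
)
```

Appending one shared constant to both prompts keeps the skill-only and combined reviewers from drifting apart on what counts as durable knowledge.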

Targeted tests: 24/24 passing in tests/run_agent/test_review_prompt_class_first.py
(2 new + all existing review-prompt assertions). Substring-based
checks so future prompt edits don't false-fail.

2026-05-09 22:51:25 -07:00

__init__.py

refactor(tests): re-architect tests + fix CI failures (#5946)

2026-04-07 17:19:07 -07:00

conftest.py

test: speed up slow tests (backoff + subprocess + IMDS network) (#11797)

2026-04-17 14:21:22 -07:00

test_413_compression.py

fix(agent): surface preflight compression status

2026-05-04 01:41:51 -07:00

test_860_dedup.py

fix: lazy session creation — defer DB row until first message (#18370)

2026-05-01 18:39:12 +05:30

test_1630_context_overflow_loop.py

fix(tests): make AIAgent constructor calls self-contained (#11755)

2026-04-17 12:32:03 -07:00

test_agent_guardrails.py

fix(agent): include name field on every role:tool message for Gemini compatibility (#16478)

2026-05-04 05:06:33 -07:00

test_agent_loop_tool_calling.py

refactor(tests): re-architect tests + fix CI failures (#5946)

2026-04-07 17:19:07 -07:00

test_agent_loop_vllm.py

refactor(tests): re-architect tests + fix CI failures (#5946)

2026-04-07 17:19:07 -07:00

test_agent_loop.py

refactor(tests): re-architect tests + fix CI failures (#5946)

2026-04-07 17:19:07 -07:00

test_anthropic_error_handling.py

feat(providers): extend request_timeout_seconds to all client paths

2026-04-19 11:23:00 -07:00

test_anthropic_prompt_cache_policy.py

fix(minimax): enable Anthropic prompt caching for MiniMax's own models (#17425)

2026-04-29 04:56:55 -07:00

test_anthropic_third_party_oauth_guard.py

fix(anthropic): complete third-party Anthropic-compatible provider support (#12846)

2026-04-19 22:43:09 -07:00

test_anthropic_truncation_continuation.py

refactor: remove _nr_to_assistant_message shim + fix flush_memories guard

2026-04-23 02:30:05 -07:00

test_api_max_retries_config.py

feat(agent): make API retry count configurable via agent.api_max_retries (#14730)

2026-04-23 13:59:32 -07:00

test_async_httpx_del_neuter.py

fix(copilot): send vision header for Copilot vision requests

2026-04-27 08:35:50 -07:00

test_background_review_summary.py

fix(agent): exclude prior-history tool messages from background review summary

2026-04-24 03:10:19 -07:00

test_background_review_toolset_restriction.py

fix(ci): stabilize main test suite regressions (#17660)

2026-04-29 23:18:55 -07:00

test_background_review.py

fix(cli): surface self-improvement review summaries from bg thread

2026-04-30 14:07:22 -07:00

test_codex_multimodal_tool_result.py

feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955)

2026-05-09 21:06:19 -07:00

test_commit_memory_session_context_engine.py

fix(agent): notify context engine on commit_memory_session (#22764)

2026-05-09 12:28:42 -07:00

test_compress_focus_plugin_fallback.py

refactor(memory): remove flush_memories entirely (#15696)

2026-04-25 08:21:14 -07:00

test_compression_boundary_hook.py

fix: signal compression boundary to context engine

2026-04-26 19:07:18 -07:00

test_compression_boundary.py

refactor(tests): re-architect tests + fix CI failures (#5946)

2026-04-07 17:19:07 -07:00

test_compression_feasibility.py

refactor(memory): remove flush_memories entirely (#15696)

2026-04-25 08:21:14 -07:00

test_compression_persistence.py

fix(tests): make AIAgent constructor calls self-contained (#11755)

2026-04-17 12:32:03 -07:00

test_compression_trigger_excludes_reasoning.py

fix(compression): exclude completion tokens from compression trigger (#12026)

2026-04-20 05:12:10 -07:00

test_compressor_fallback_update.py

refactor(tests): re-architect tests + fix CI failures (#5946)

2026-04-07 17:19:07 -07:00

test_concurrent_interrupt.py

test: remove 50 stale/broken tests to unblock CI (#22098)

2026-05-08 14:55:40 -07:00

test_context_token_tracking.py

feat(providers): extend request_timeout_seconds to all client paths

2026-04-19 11:23:00 -07:00

test_copilot_native_vision_headers.py

fix(copilot): mark native image requests as vision

2026-04-27 08:35:50 -07:00

test_create_openai_client_kwargs_isolation.py

fix(tests): make AIAgent constructor calls self-contained (#11755)

2026-04-17 12:32:03 -07:00

test_create_openai_client_proxy_env.py

test(proxy): regression tests for NO_PROXY bypass on keepalive client

2026-04-24 03:04:42 -07:00

test_create_openai_client_reuse.py

fix(tests): make AIAgent constructor calls self-contained (#11755)

2026-04-17 12:32:03 -07:00

test_deepseek_reasoning_content_echo.py

fix(deepseek): use non-empty reasoning_content placeholder for V4 Pro thinking mode

2026-04-30 23:04:23 -07:00

test_deepseek_v4_thinking_live.py

fix(deepseek): preserve v4 reasoning_content on replay

2026-04-30 11:18:39 -07:00

test_dict_tool_call_args.py

fix(tests): fix 78 CI test failures and remove dead test (#9036)

2026-04-13 10:50:24 -07:00

test_empty_response_recovery_persistence.py

fix(run_agent): break permanent empty-response loop from orphan tool-tail (#21385)

2026-05-07 08:35:10 -07:00

test_exit_cleanup_interrupt.py

test: speed up slow tests (backoff + subprocess + IMDS network) (#11797)

2026-04-17 14:21:22 -07:00

test_fallback_model.py

fix(fallback): resolve api_key_env in fallback chain entries (carve-out of #22665)

2026-05-09 17:53:56 -07:00

test_image_rejection_fallback.py

fix(computer-use): harden image-rejection fallback + AUTHOR_MAP

2026-05-08 11:07:38 -07:00

test_image_shrink_recovery.py

feat(image-input): native multimodal routing based on model vision capability (#16506)

2026-04-27 06:27:59 -07:00

test_init_fallback_on_exhausted_pool.py

fix(agent): try fallback providers at init when primary credential pool is exhausted (#17929)

2026-05-02 02:09:46 -07:00

test_interactive_interrupt.py

refactor(tests): re-architect tests + fix CI failures (#5946)

2026-04-07 17:19:07 -07:00

test_interrupt_propagation.py

test: stop testing mutable data — convert change-detectors to invariants (#13363)

2026-04-20 23:20:33 -07:00

test_invalid_context_length_warning.py

fix(tests): resolve CI test failures — pool auto-seeding, stale assertions, mock isolation

2026-04-15 22:05:21 -07:00

test_iteration_budget_race.py

fix(run_agent): acquire lock in IterationBudget.used property

2026-05-04 12:37:28 -07:00
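The locking fix named in the commit above can be illustrated with a small thread-safe counter. Only the class name and the `used` property come from the commit subject. The constructor, `consume`, and the `threading.Lock` field are assumptions for illustration, not the project's implementation.

```python
import threading


class IterationBudget:
    """Hypothetical sketch; the real class in run_agent may differ."""

    def __init__(self, limit: int) -> None:
        self._limit = limit
        self._used = 0
        self._lock = threading.Lock()

    def consume(self) -> bool:
        # Check-and-increment under the lock so two threads can't both
        # take the last remaining iteration.
        with self._lock:
            if self._used >= self._limit:
                return False
            self._used += 1
            return True

    @property
    def used(self) -> int:
        # Read under the same lock so the value is ordered with respect to consume().
        with self._lock:
            return self._used
```

Guarding the getter with the same lock as `consume()` avoids relying on incidental interpreter-level atomicity for reads.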

test_jsondecodeerror_retryable.py

fix(agent): retry on json.JSONDecodeError instead of treating it as a local validation error (#15107)

2026-04-24 05:02:58 -07:00
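One plausible reason this case needs explicit handling is that `json.JSONDecodeError` is a subclass of `ValueError`. A classifier that treats every `ValueError` as a local validation error will therefore catch it. The following hypothetical `is_retryable` sketch checks the subclass first. The set of retryable exception types here is an assumption, not the project's list.

```python
import json


def is_retryable(exc: BaseException) -> bool:
    """Hypothetical classifier, not the project's actual logic."""
    # json.JSONDecodeError subclasses ValueError, so it must be checked
    # before the generic "ValueError means bad local input" rule below.
    if isinstance(exc, json.JSONDecodeError):
        return True  # e.g. a truncated or garbled provider response body
    if isinstance(exc, ValueError):
        return False  # other ValueErrors: treat as local validation errors
    return isinstance(exc, (ConnectionError, TimeoutError))
```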

test_last_reasoning_per_turn.py

test: pin per-turn reasoning extraction semantics

2026-05-05 05:00:05 -07:00

test_long_context_tier_429.py

refactor(tests): re-architect tests + fix CI failures (#5946)

2026-04-07 17:19:07 -07:00

test_memory_nudge_counter_hydration.py

fix(agent): hydrate memory-nudge counters from conversation_history (#22774)

2026-05-09 12:48:03 -07:00

test_memory_provider_init.py

fix(memory): keep Honcho provider opt-in

2026-04-18 22:50:55 -07:00

test_memory_sync_interrupted.py

feat(memory): notify providers on mid-process session_id rotation (#17409)

2026-04-29 04:57:22 -07:00

test_message_sequence_repair.py

fix(run_agent): break permanent empty-response loop from orphan tool-tail (#21385)

2026-05-07 08:35:10 -07:00

test_openai_client_lifecycle.py

refactor(tests): re-architect tests + fix CI failures (#5946)

2026-04-07 17:19:07 -07:00

test_percentage_clamp.py

fix: update 6 test files broken by dead code removal

2026-04-10 03:44:43 -07:00

test_plugin_context_engine_init.py

fix(tests): make AIAgent constructor calls self-contained (#11755)

2026-04-17 12:32:03 -07:00

test_primary_runtime_restore.py

fix(agent): only set rate-limit cooldown when leaving primary; add tests

2026-04-24 05:35:43 -07:00

test_provider_attribution_headers.py

refactor(gmi): move User-Agent to profile.default_headers

2026-05-08 03:22:11 -07:00

test_provider_fallback.py

fix(fallback): skip chain entries matching current provider/model/base_url (#22780)

2026-05-09 12:48:19 -07:00
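The skip rule from the commit above can be sketched as follows. This assumes an entry counts as a match only when provider, model, and base_url all equal the current endpoint. `Endpoint` and `usable_fallbacks` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    # Hypothetical shape of a fallback-chain entry.
    provider: str
    model: str
    base_url: Optional[str] = None


def usable_fallbacks(chain: list[Endpoint], current: Endpoint) -> list[Endpoint]:
    # Dataclass equality compares provider, model, and base_url together, so this
    # drops only entries that would retry the exact endpoint already in use.
    return [entry for entry in chain if entry != current]
```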

test_provider_parity.py

fix(aux): remove hardcoded Codex fallback model, drop Codex from auto chain (#17765)

2026-04-29 23:23:50 -07:00

test_real_interrupt_subagent.py

fix(tests): fix 78 CI test failures and remove dead test (#9036)

2026-04-13 10:50:24 -07:00

test_redirect_stdout_issue.py

refactor(tests): re-architect tests + fix CI failures (#5946)

2026-04-07 17:19:07 -07:00

test_repair_tool_call_arguments.py

fix(run_agent): handle unescaped control chars in tool_call arguments (#15356)

2026-04-24 15:06:41 -07:00
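Python's standard `json.loads` accepts `strict=False`, which allows literal control characters such as raw newlines and tabs inside JSON strings. The hypothetical `parse_tool_args` helper below shows that fallback. The listing does not say whether the project uses this exact approach.

```python
import json
from typing import Any


def parse_tool_args(raw: str) -> Any:
    """Hypothetical helper: parse tool_call arguments, tolerating raw control chars."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # strict=False lets literal control characters inside string values
        # through instead of raising "Invalid control character".
        return json.loads(raw, strict=False)
```

Trying the strict parse first keeps well-formed arguments on the normal path and only relaxes parsing when a model emits unescaped characters.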

test_repair_tool_call_name.py

fix(agent): repair CamelCase + _tool suffix tool-call emissions (#15124)

2026-04-24 05:32:08 -07:00
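Here is a hypothetical sketch of the name repair. It converts CamelCase to snake_case, strips a trailing `_tool`, and accepts the result only if it matches a registered tool. `repair_tool_name` and the example tool names are assumptions.

```python
import re
from typing import Optional


def repair_tool_name(name: str, known: set[str]) -> Optional[str]:
    """Hypothetical sketch: map a malformed tool name onto a registered one."""
    if name in known:
        return name
    # CamelCase -> snake_case: insert "_" before each capital except the first.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()
    for candidate in (snake, snake.removesuffix("_tool")):
        if candidate in known:
            return candidate
    return None  # unknown: let the caller report it rather than guess
```

This simple regex does not handle acronym runs such as `HTTPGet`. Returning `None` leaves the unknown-tool error path to the caller.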

test_review_prompt_class_first.py

fix(review): tell background reviewer not to capture transient env failures as skills (#23004)

2026-05-09 22:51:25 -07:00

test_run_agent_codex_responses.py

fix(memory): drop scrub from interim commentary + final response

2026-04-27 12:37:33 -07:00

test_run_agent_multimodal_prologue.py

refactor: unify transport dispatch + collapse normalize shims

2026-04-22 18:34:25 -07:00

test_run_agent.py

fix(agent): extract thinking from content-list blocks for DeepSeek V4 Pro

2026-05-09 13:36:12 -07:00

test_sequential_chats_live.py

test: regression guards for the keepalive/transport bug class (#10933) (#11266)

2026-04-16 16:36:33 -07:00

test_session_meta_filtering.py

refactor(tests): re-architect tests + fix CI failures (#5946)

2026-04-07 17:19:07 -07:00

test_session_reset_fix.py

refactor(tests): re-architect tests + fix CI failures (#5946)

2026-04-07 17:19:07 -07:00

test_steer.py

refactor(steer): simplify injection marker to 'User guidance:' prefix (#13340)

2026-04-20 22:18:49 -07:00

test_stream_drop_logging.py

feat(stream-retry): add upstream + timing diagnostics to drop log (#23005)

2026-05-09 22:49:35 -07:00

test_stream_interrupt_retry.py

fix: /stop now immediately aborts streaming retry loop

2026-04-25 09:51:39 -07:00

test_streaming_tool_call_repair.py

fix: repair malformed tool call args in streaming assembly before flagging as truncated

2026-04-24 15:03:07 -07:00

test_streaming.py

fix(copilot-acp): disable streaming path for CopilotACPClient

2026-04-28 11:33:07 -07:00

test_strict_api_validation.py

refactor(tests): re-architect tests + fix CI failures (#5946)

2026-04-07 17:19:07 -07:00

test_strip_reasoning_tags_cli.py

fix(display): strip standalone tool-call XML tags from visible text

2026-04-22 18:12:42 -07:00

test_switch_model_context.py

fix: pass config_context_length to switch_model context compressor

2026-04-10 05:52:45 -07:00

test_switch_model_fallback_prune.py

fix(agent): default missing fallback chain on switch

2026-04-24 05:35:43 -07:00

test_thinking_only_sanitizer.py

fix(agent): drop thinking-only assistant turns before provider call (#16959)

2026-04-28 03:50:51 -07:00
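Here is a hypothetical filter illustrating the commit above. It assumes Anthropic-style content blocks in which reasoning appears as `thinking` or `redacted_thinking` blocks. The message shape and function names are assumptions.

```python
from typing import Any


def is_thinking_only(msg: dict[str, Any]) -> bool:
    """Hypothetical check for an assistant turn that carries only reasoning blocks."""
    if msg.get("role") != "assistant" or msg.get("tool_calls"):
        return False
    content = msg.get("content")
    if not isinstance(content, list) or not content:
        return False
    return all(block.get("type") in ("thinking", "redacted_thinking") for block in content)


def drop_thinking_only_turns(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Filter just before the provider call; all other messages pass through unchanged.
    return [m for m in messages if not is_thinking_only(m)]
```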

test_token_persistence_non_cli.py

fix: make session search initialize session db

2026-05-09 14:36:58 -07:00

test_tool_arg_coercion.py

fix(tools): wrap bare scalars in single-element list for array-typed args

2026-05-04 05:00:37 -07:00
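Here is a minimal sketch of the coercion, assuming tool parameters are described by a JSON Schema `properties` map. Scalars (`str`, `int`, `float`, `bool`) passed for an `array`-typed parameter are wrapped in a one-element list. Lists, objects, and `None` pass through unchanged. `coerce_array_args` is hypothetical.

```python
from typing import Any


def coerce_array_args(args: dict[str, Any], schema: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch: wrap bare scalars where the schema expects an array."""
    props = schema.get("properties", {})
    out = dict(args)  # leave the caller's dict untouched
    for key, value in args.items():
        expected = props.get(key, {}).get("type")
        if expected == "array" and isinstance(value, (str, int, float, bool)):
            out[key] = [value]
    return out
```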

test_tool_call_args_sanitizer.py

fix(agent): include name field on every role:tool message for Gemini compatibility (#16478)

2026-05-04 05:06:33 -07:00
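Here is a hypothetical sketch of the fix. It assumes OpenAI-style messages in which each assistant `tool_calls` entry carries an `id` and `function.name`, and each `role: tool` reply references that entry via `tool_call_id`. The `"unknown"` fallback for orphaned replies is an assumption.

```python
from typing import Any


def add_tool_names(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical sketch: copy the function name onto each role:tool message."""
    names: dict[str, str] = {}
    out = []
    for msg in messages:
        # Remember which function each tool_call id belongs to.
        for call in msg.get("tool_calls") or []:
            names[call["id"]] = call["function"]["name"]
        if msg.get("role") == "tool" and not msg.get("name"):
            # Build a new dict so the original history is not mutated.
            msg = {**msg, "name": names.get(msg.get("tool_call_id"), "unknown")}
        out.append(msg)
    return out
```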

test_tool_call_guardrail_runtime.py

fix(agent): make tool loop guardrails warning-first

2026-04-30 20:43:15 -07:00

test_tool_executor_contextvar_propagation.py

fix(agent): propagate ContextVars to concurrent tool worker threads (#18123)

2026-04-30 16:26:26 -07:00
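In standard CPython builds, threads run by `concurrent.futures.ThreadPoolExecutor` do not see the submitting thread's `contextvars` values. A common fix is to snapshot the context with `contextvars.copy_context()` at submit time and run each task inside it. `current_session`, `run_tool`, and `run_tools_concurrently` are hypothetical names, not the project's code.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical context variable standing in for per-session state.
current_session = contextvars.ContextVar("current_session", default="none")


def run_tool(name: str) -> str:
    # Reads whatever value is visible in the context this call runs in.
    return f"{name}@{current_session.get()}"


def run_tools_concurrently(names: list[str]) -> list[str]:
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Snapshot the caller's context once per task (in the calling thread)
        # and execute the tool inside that snapshot on the worker thread.
        futures = [pool.submit(contextvars.copy_context().run, run_tool, n) for n in names]
        return [f.result() for f in futures]
```

Each task gets its own copy because a single `Context` object cannot be entered by two threads at the same time.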

test_unicode_ascii_codec.py

fix: always retry on ASCII codec UnicodeEncodeError — don't gate on per-component sanitization

2026-04-15 15:03:28 -07:00

test_vision_aware_preprocessing.py

feat(image-input): native multimodal routing based on model vision capability (#16506)

2026-04-27 06:27:59 -07:00