hermes-agent

xxxigm b5ea6a5c80 test(xai-oauth): regression coverage for the bad-credentials disambiguator (#29344)

Eleven new tests pinning the #29344 fix.  Layout mirrors the existing
"Fix D" entitlement section so the bad-credentials disambiguator
sits alongside the entitlement-block tests it complements.

Classifier-level coverage:

* ``test_is_entitlement_failure_false_for_bad_credentials_wke_suffix``
  — verbatim shape from the reporter's wire capture
  (``{code: 'caller does not have permission', error: 'OAuth2 access
  token could not be validated. [WKE=unauthenticated:bad-credentials]'}``)
  ↦ classifier must return False so the refresh path runs.
* ``test_is_entitlement_failure_false_for_wke_suffix_in_normalized_shape``
  — same body after ``_extract_api_error_context`` has rewritten it
  to ``{reason, message}``.  The disambiguator must fire in BOTH
  shapes; without this guard the production call site at
  ``_recover_with_credential_pool`` (which goes through the
  normalised extractor) would still misclassify.
* ``test_is_entitlement_failure_false_for_any_wke_unauthenticated_variant``
  — parametrised forward-compat: ``bad-credentials``,
  ``expired-token``, ``revoked``, ``some-future-reason``.  xAI
  documents the prefix as stable, the suffix after the colon as a
  reason code that can grow; every variant under
  ``unauthenticated:`` must route to refresh.
* ``test_is_entitlement_failure_false_via_oauth2_validation_phrase_alone``
  — belt-and-braces guard: if a future API revision drops the WKE
  suffix but keeps "OAuth2 access token could not be validated", we
  still classify correctly.
* ``test_is_entitlement_failure_wke_signal_overrides_entitlement_keywords``
  — defensive: if a body ever carries BOTH the WKE suffix and
  entitlement language, the WKE signal wins.  Auth is recoverable;
  entitlement isn't, and a refreshed token will resurface the
  entitlement message on the next request.
* ``test_is_entitlement_failure_case_insensitive_wke_match`` —
  pins that the classifier lowercases the haystack so a future xAI
  build that uppercases the prefix doesn't reintroduce the bug.
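
The classifier behaviour these tests pin can be sketched roughly as
follows.  This is a minimal illustration, not the project's code: the
marker constants, the helper ``_haystack``, and the entitlement keyword
list are assumptions made for the example; only the wire body and the
expected True/False outcomes come from the description above.

```python
import re

# Assumed markers: the stable "unauthenticated:" prefix with any reason code.
_WKE_UNAUTHENTICATED = re.compile(r"\[wke=unauthenticated:[^\]]*\]")
_OAUTH2_PHRASE = "oauth2 access token could not be validated"
# Hypothetical entitlement keywords; the real list may differ.
_ENTITLEMENT_KEYWORDS = ("caller does not have permission", "not subscribed")


def _haystack(body: dict) -> str:
    """Join every value and lowercase it, so both the raw {code, error}
    shape and the normalised {reason, message} shape are searched alike."""
    return " ".join(str(value) for value in body.values()).lower()


def is_entitlement_failure(body: dict) -> bool:
    text = _haystack(body)
    # Auth signals win over entitlement language: auth is recoverable.
    if _WKE_UNAUTHENTICATED.search(text) or _OAUTH2_PHRASE in text:
        return False
    return any(keyword in text for keyword in _ENTITLEMENT_KEYWORDS)
```

Checking the auth signals first is what makes the WKE suffix override
entitlement keywords when a body carries both.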

Recovery-path coverage (end-to-end through
``_recover_with_credential_pool``):

* ``test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403``
  — the headline test the reporter requested: a bad-credentials 403
  with the exact wire body must call ``try_refresh_current()``
  exactly once and ``_swap_credential`` once.  Pre-fix this returned
  ``(False, _)`` because the entitlement classifier over-matched and
  short-circuited the refresh path.
* ``test_recover_with_credential_pool_still_blocks_real_entitlement``
  — companion regression guard for #26847: a pure unsubscribed-
  account body (no WKE suffix, no OAuth2-validation phrase) must
  still surface as entitlement and skip refresh.  The new
  disambiguator must not weaken the original loop-protection it
  was added to preserve.
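
The two recovery-path expectations can be illustrated with a small
stand-in.  Everything here is hypothetical scaffolding for the sketch:
``FakePool``, ``recover``, and the simplified classifier are not the
project's ``_FakePool`` or ``_recover_with_credential_pool``, and the
keyword used for the entitlement case is assumed.  Only the call counts
and the ``(False, _)`` result on a blocked entitlement come from the
description above.

```python
from dataclasses import dataclass


@dataclass
class FakePool:
    """Hypothetical pool double that counts refresh attempts."""
    refresh_calls: int = 0

    def try_refresh_current(self) -> str:
        self.refresh_calls += 1
        return "refreshed-token"


def is_entitlement_failure(body: dict) -> bool:
    # Simplified classifier: auth markers are checked before keywords.
    text = " ".join(str(value) for value in body.values()).lower()
    if "[wke=unauthenticated:" in text or "oauth2 access token could not be validated" in text:
        return False
    return "caller does not have permission" in text  # assumed keyword


def recover(pool: FakePool, status: int, body: dict, swaps: list) -> tuple:
    """Refresh and swap unless a 403 is a genuine entitlement block."""
    if status == 403 and is_entitlement_failure(body):
        return (False, None)  # skip refresh to avoid a refresh loop
    credential = pool.try_refresh_current()
    swaps.append(credential)  # stands in for the credential swap
    return (True, credential)
```

A bad-credentials 403 passes through to one refresh and one swap, while
a body with no auth markers stops before the pool is touched.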

The scaffolding reuses ``_make_codex_agent``, ``_FakePool``, and the
existing ``MagicMock`` patterns from the surrounding tests so the
new section reads as a natural extension of "Fix D" rather than a
separate test file.

2026-05-23 02:48:13 -07:00

acp

test(acp): pin parse_model_input in slash-command tests

2026-05-23 01:44:56 -07:00

acp_adapter

feat(azure-foundry): add Microsoft Entra ID auth

2026-05-18 10:14:38 -07:00

agent

test: keep tirith checks hermetic

2026-05-23 02:20:14 -07:00

cli

refactor(session-log): drop branch/compress re-point of session_log_file

2026-05-20 11:44:10 -07:00

cron

Fix unsafe gateway media path delivery

2026-05-23 01:40:35 -07:00

e2e

refactor(gateway): migrate Discord adapter to bundled plugin (full Teams parity)

2026-05-22 14:21:41 -07:00

fakes

fix: streaming tool call parsing, error handling, and fake HA state mutation

2026-03-14 14:27:20 +03:00

gateway

feat(telegram): edit status messages in place instead of appending (#30864)

2026-05-23 02:42:10 -07:00

hermes_cli

fix(tests): allowlist tmp_path for kanban_notify artifact delivery (#30852)

2026-05-23 02:34:34 -07:00

hermes_state

feat(session_search): single-shape tool with discovery, scroll, browse — no LLM (#27590)

2026-05-17 23:28:45 -07:00

honcho_plugin

chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355)

2026-05-17 02:29:41 -07:00

integration

refactor(gateway): migrate Discord adapter to bundled plugin (full Teams parity)

2026-05-22 14:21:41 -07:00

openviking_plugin

fix(openviking): pre-check fs/stat to route file URIs before hitting directory-only endpoints

2026-04-30 02:35:29 -07:00

plugins

fix(opencode-go): emit Kimi reasoning_effort, match KimiProfile shape

2026-05-23 02:20:28 -07:00

providers

fix(custom): pass custom provider extra body

2026-05-21 07:48:53 -07:00

run_agent

test(xai-oauth): regression coverage for the bad-credentials disambiguator (#29344)

2026-05-23 02:48:13 -07:00

scripts

feat(acp-registry): switch to uvx distribution, drop npm launcher

2026-05-14 22:27:09 -07:00

skills

fix(skills): add timeout to Google OAuth urlopen calls

2026-05-19 00:11:44 -07:00

stress

docs: align kanban readiness docs and smoke tests

2026-05-18 21:07:03 -07:00

tools

fix(skills_guard): explain why --force is rejected on dangerous verdicts

2026-05-23 02:37:30 -07:00

tui_gateway

chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355)

2026-05-17 02:29:41 -07:00

website

docs(skills): explain restoring bundled skills

2026-05-05 13:46:20 -07:00

__init__.py

A bit of restructuring for simplicity and organization

2025-10-01 23:29:25 +00:00

conftest.py

test: keep tirith checks hermetic

2026-05-23 02:20:14 -07:00

run_interrupt_test.py

fix: thread safety for concurrent subagent delegation (#1672)

2026-03-17 02:53:33 -07:00

test_account_usage.py

feat(account-usage): add per-provider account limits module

2026-04-21 01:56:35 -07:00

test_atomic_replace_symlinks.py

refactor: consolidate symlink-safe atomic replace into shared helper

2026-04-28 04:58:22 -07:00

test_base_url_hostname.py

security(runtime_provider): close OLLAMA_API_KEY substring-leak sweep miss (#13522)

2026-04-21 06:06:16 -07:00

test_batch_runner_checkpoint.py

test: regression coverage for checkpoint dedup and inf/nan coercion

2026-04-24 14:32:21 -07:00

test_bitwarden_secrets.py

feat(secrets): Bitwarden Secrets Manager integration with lazy bws install (#30035)

2026-05-21 14:10:34 -07:00

test_cli_file_drop.py

fix(tui): improve macOS paste and shortcut parity

2026-04-21 08:00:00 -07:00

test_cli_manual_compress.py

fix(tests): catch up six stale tests after compression/aux/kanban changes (#28465)

2026-05-18 21:43:59 -07:00

test_cli_skin_integration.py

fix(ci): stabilize main test suite regressions (#17660)

2026-04-29 23:18:55 -07:00

test_ctx_halving_fix.py

fix(cache): kill long-lived prefix layout — system prompt is now byte-static within a session (#24778)

2026-05-12 20:46:04 -07:00

test_empty_model_fallback.py

fix: fall back to provider's default model when model config is empty (#8303)

2026-04-12 03:53:30 -07:00

test_env_loader_secret_sources.py

feat(secrets): label detected credentials with their source (Bitwarden) (#30364)

2026-05-22 03:32:58 -07:00

test_evidence_store.py

feat: add OSS Security Forensics skill (Skills Hub) (#1482)

2026-03-15 21:59:53 -07:00

test_gateway_streaming_nested_config.py

fix(gateway): load streaming config from nested gateway.streaming key

2026-05-14 14:51:07 -07:00

test_get_tool_definitions_cache_isolation.py

fix(tools): isolate get_tool_definitions quiet_mode cache + dedup LCM injection (#17335)

2026-04-30 04:32:06 -07:00

test_hermes_bootstrap.py

fix(entry-points): guard hermes_bootstrap import so partial updates don't brick hermes (#22091)

2026-05-08 14:43:13 -07:00

test_hermes_constants.py

fix(security): guard os.chmod(parent) against / and top-level dirs

2026-05-20 22:56:55 -07:00

test_hermes_home_profile_warning.py

fix(constants): warn once when get_hermes_home() falls back under an active profile (#18746)

2026-05-02 01:49:55 -07:00

test_hermes_logging.py

fix(tests): catch up 25 stale tests after recent merges (#28626)

2026-05-19 01:28:32 -07:00

test_hermes_state_wal_fallback.py

fix(sqlite): fall back to journal_mode=DELETE on NFS/SMB/FUSE (#22043)

2026-05-09 02:09:35 -07:00

test_hermes_state.py

fix(gateway): separate observed Telegram group context

2026-05-23 01:33:42 -07:00

test_honcho_client_config.py

feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623)

2026-04-02 15:33:51 -07:00

test_install_sh_browser_install.py

fix(install): support non-sudo service-user installs on apt distros (#25814)

2026-05-14 09:05:31 -07:00

test_install_sh_pythonpath_sanitization.py

fix: harden install.sh against inherited Python env leakage

2026-05-06 04:02:02 -07:00

test_install_sh_setup_wizard_tty_probe.py

fix(install): widen /dev/tty open-probe to sibling gates (#16746)

2026-04-28 06:45:55 -07:00

test_install_sh_symlink_stomp.py

fix(install): preserve pip entry point when re-running on symlinked install

2026-05-14 07:08:45 -07:00

test_install_sh_termux_network_prereqs.py

fix: strengthen termux install network prerequisites

2026-05-07 13:04:08 -07:00

test_ipv4_preference.py

feat: add network.force_ipv4 config to fix IPv6 timeout issues (#8196)

2026-04-11 23:12:11 -07:00

test_lazy_session_regressions.py

fix: resolve lazy session creation regressions (#18370 fallout) (#20363)

2026-05-06 01:11:49 +05:30

test_lint_config.py

lint: enable PLW1514 as a blocking ruff rule

2026-05-08 14:27:40 -07:00

test_live_system_guard_self_test.py

chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355)

2026-05-17 02:29:41 -07:00

test_mcp_serve.py

fix(mcp): unwrap platforms key in channels_list

2026-05-07 13:41:16 -07:00

test_mini_swe_runner.py

fix(kimi): omit temperature entirely for Kimi/Moonshot models (#13157)

2026-04-20 12:23:05 -07:00

test_minimax_model_validation.py

fix(models): validate MiniMax models against static catalog (#12611, #12460, #12399, #12547)

2026-04-19 22:44:47 -07:00

test_minimax_oauth.py

fix(minimax-oauth): refresh short-lived access tokens per request (#30619)

2026-05-22 15:16:15 -07:00

test_minisweagent_path.py

chore: remove all remaining mini-swe-agent references

2026-03-24 08:19:23 -07:00

test_model_picker_scroll.py

fix: CLI/UX batch — ChatConsole errors, curses scroll, skin-aware banner, git state banner (#5974)

2026-04-07 17:59:42 -07:00

test_model_tools_async_bridge.py

fix(model_tools): cancel coroutine on timeout so worker thread exits + log full traceback

2026-04-29 05:00:40 -07:00

test_model_tools.py

chore: remove Atropos RL environments and tinker-atropos integration (#26106)

2026-05-15 10:36:38 +05:30

test_ollama_num_ctx.py

fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983)

2026-04-07 22:23:28 -07:00

test_package_json_lazy_deps.py

fix(update): make Camofox lazy-installed instead of eager (#27055)

2026-05-16 12:15:45 -07:00

test_packaging_metadata.py

chore: prepare Hermes for Homebrew packaging (#4099)

2026-03-30 17:34:43 -07:00

test_plugin_skills.py

fix(skills): support category-qualified local skill names

2026-05-05 10:15:31 -07:00

test_process_loop_event_loop_warning.py

fix(cli): replace get_event_loop() with get_running_loop() to silence RuntimeWarning in process_loop thread (#19285)

2026-05-07 06:35:54 -07:00

test_project_metadata.py

fix(packaging): ship dashboard plugin assets in wheel

2026-05-18 20:35:00 -07:00

test_retry_utils.py

feat(agent): add jittered retry backoff

2026-04-08 00:41:36 -07:00

test_run_tests_parallel.py

test: use subprocesses for each test file (#29016)

2026-05-21 16:40:04 +05:30

test_sanitize_tool_error.py

security: sanitize tool error strings before injecting into model context (#26823)

2026-05-16 00:57:39 -07:00

test_sql_injection.py

fix(security): eliminate SQL string formatting in execute() calls

2026-03-19 15:16:35 +01:00

test_subprocess_home_isolation.py

fix: avoid process-wide cron profile home mutation

2026-05-18 17:39:50 +00:00

test_termux_all_extra_compat.py

fix: add termux-all install profile and safe fallbacks

2026-05-07 13:04:08 -07:00

test_timezone.py

chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355)

2026-05-17 02:29:41 -07:00

test_toolset_distributions.py

test: add unit tests for 8 modules (batch 2)

2026-02-26 13:54:20 +03:00

test_toolsets.py

test(toolsets): lock web search into default platform coverage

2026-05-14 08:03:33 -07:00

test_trajectory_compressor_async.py

fix(kimi): omit temperature entirely for Kimi/Moonshot models (#13157)

2026-04-20 12:23:05 -07:00

test_trajectory_compressor.py

fix(kimi): omit temperature entirely for Kimi/Moonshot models (#13157)

2026-04-20 12:23:05 -07:00

test_transform_llm_output_hook.py

test+docs: cover transform_llm_output hook + release author map

2026-05-07 05:46:05 -07:00

test_transform_tool_result_hook.py

test: stop testing mutable data — convert change-detectors to invariants (#13363)

2026-04-20 23:20:33 -07:00

test_tui_gateway_server.py

fix(tui): surface verbose tool details (#30225)

2026-05-22 00:16:52 -05:00

test_utils_truthy_values.py

Gate tool-gateway behind an env var, so it's not in users' faces until we're ready. Even if users enable it, it'll be blocked server-side for now, until we unlock for non-admin users on tool-gateway.

2026-03-30 13:28:10 +09:00

test_yuanbao_integration.py

yuanbao platform (#16298)

2026-04-26 18:50:49 -07:00

test_yuanbao_markdown.py

yuanbao platform (#16298)

2026-04-26 18:50:49 -07:00

test_yuanbao_pipeline.py

yuanbao platform (#16298)

2026-04-26 18:50:49 -07:00

test_yuanbao_proto.py

yuanbao platform (#16298)

2026-04-26 18:50:49 -07:00