hermes-agent

Author	SHA1	Message	Date
teknium1	fd09b2c55e	fix(gateway): trust adapter-owned access policy over env default-deny (#34515 ) Config-driven platform policies (dm_policy / group_policy / allow_from / group_allow_from) for WeCom, Weixin, Yuanbao, and QQBot now work without also setting a PLATFORM_ALLOWED_USERS env var. These adapters enforce their access policy at intake — a message is dropped inside the adapter and never dispatched unless it already passed the policy. The gateway's env-based check (_is_user_authorized) ran afterward and, with no env allowlist set, fell through to an env-only default-deny — silently rejecting `dm_policy: open` and config-only allowlists the adapter had already authorized. Rather than re-implement each adapter's policy a second time in run.py (which would drift), adapters that own their gate now declare it via a new BasePlatformAdapter.enforces_own_access_policy property (default False). The gateway trusts that flag and skips the env-only default-deny for those platforms. Env allowlists still take precedence when set. Also resolves unauthorized DM behavior from config dm_policy so allowlist / disabled policies drop unauthorized DMs silently instead of leaking pairing codes, while an explicit pairing policy opts back in. Co-authored-by: Frowtek <frowte3k@gmail.com>	2026-05-29 04:22:41 -07:00
teknium1	58e1b04665	chore(release): map tillfalko to GitHub login for PR #29987 salvage	2026-05-29 03:58:56 -07:00
teknium1	a87f0a82a5	test(tool-search): redact secrets from harness transcripts + console The live harness runs against a real OpenRouter key; record['error'] is a full traceback that, on an auth failure, could echo a request header or URL containing the key. _redact_secrets() now masks the live OPENROUTER_API_KEY, any sk-/sk-or- bearer token, and Authorization/Bearer headers before final_response and error enter the transcript or the console print. Addresses the CodeQL clear-text-storage/logging findings at the source.	2026-05-29 02:04:12 -07:00
teknium1	1709776120	test(tool-search): add live A/B harness, drop checked-in transcripts Brings in the tool_search live-test harness from the original PR but leaves out the 11 checked-in scripts/out/*.json transcript files — those are non-deterministic model output that goes stale the moment the model changes and were the bulk of the diff. scripts/out/ is now gitignored so a harness run never re-commits them. Fixes on top: - API-key loading goes through hermes_cli.env_loader.load_hermes_dotenv instead of hand-parsing ~/.hermes/.env and assigning the value to a local. The canonical loader never materializes the secret in a local variable in this module, which clears the four CodeQL high alerts (py/clear-text-storage / py/clear-text-logging-sensitive-data at the transcript write/print sites — they were tracing the key from the hand-rolled parser into the records) and removes a hand-rolled parser. - encoding='utf-8' on every write_text/read_text in both harness scripts (Windows-footgun hygiene). Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>	2026-05-29 02:04:12 -07:00
Teknium	0384398c65	chore(release): map blackpilledsoftware-prog email to GitHub login Required by CI author validation after salvaging PR #16780.	2026-05-29 00:31:44 -07:00
Teknium	c1485d52e3	chore(release): add moikapy AUTHOR_MAP for PR #31527 salvage	2026-05-29 00:28:02 -07:00
teknium1	8d57281650	chore: add AUTHOR_MAP entry for Interstellar-code	2026-05-29 00:21:54 -07:00
teknium1	592a4ffb6b	fix(kanban): close three blocked/iteration-exhausted handling gaps (#29747 ) Reporter diagnosed three independent gaps that together allowed infinite 'unblock → re-stuck' loops with no surfacing or escalation: GAP 1: `_rule_stuck_in_blocked` resets timer on any `commented`/`unblocked` event, so a task that cycles every few minutes is invisible to it regardless of how many times it cycles. Fix: new `_rule_block_unblock_cycling` rule (`hermes_cli/kanban_diagnostics.py`) that counts block→unblock cycles in a sliding window. Default threshold 3 cycles within 24h, configurable via `block_cycle_threshold` / `block_cycle_window_seconds`. Walks events in arrival order (event id) since multiple events can share the same `created_at` second. Fires as a warning with a CLI hint to inspect the block reasons. GAP 2: Iteration-budget-exhausted runs in kanban workers map to `kanban_block` (status=blocked, but a clean exit from the kernel's perspective). `_rule_repeated_failures` reads `consecutive_failures`, which `_record_task_failure` increments only for crashed/timed_out/ spawn_failed — `blocked` outcome bypasses the failure counter, so the `kanban.failure_limit` circuit breaker never trips on budget-exhaustion loops. Fix: `agent/conversation_loop.py` budget-exhaustion path now calls `_record_task_failure(outcome="timed_out")` instead of `kanban_block`. Budget exhaustion is genuinely a timeout-shaped failure (the task ran out of allowed iterations), so this is more honest semantics; it also routes through the unified failure counter, so repeated budget exhaustions trip the circuit breaker and the task auto-blocks with `gave_up` after `failure_limit` retries. GAP 3: `release_stale_claims` uses `_pid_alive(worker_pid)` only and ignores `last_heartbeat_at`. Reporter observed a 91-min run that held its claim with frozen heartbeat because the worker entered a logic loop with no tool calls — `_pid_alive` kept returning True so the claim was extended every 15 minutes indefinitely. Fix: heartbeat-stale backstop. If `last_heartbeat_at` is set AND older than `DEFAULT_CLAIM_HEARTBEAT_MAX_STALE_SECONDS` (default 1h), reclaim even if the PID is alive. NULL `last_heartbeat_at` preserves backward compatibility (no heartbeat yet = extend, as before). The reclaim event payload now includes a `heartbeat_stale` boolean so operators see why a live-PID worker was reclaimed. This works cleanly in concert with PR #34418 (#31752 runtime → heartbeat bridge): once `_touch_activity` keeps `last_heartbeat_at` fresh as a side effect of normal API traffic, the backstop only fires for genuinely wedged workers (no chunks, no tool results, no progress at all). Co-authored-by: baofuen <45189813+baofuen@users.noreply.github.com>	2026-05-29 00:13:29 -07:00
teknium1	bc31ee5cf8	fix(kanban): bridge worker runtime activity to board heartbeat (#31752 ) The dispatcher watchdog (release_stale_claims) reads tasks.last_heartbeat_at to decide whether to reclaim a running task. The agent maintains its own in-process `_last_activity_ts` for every chunk/tool result, but those liveness ticks never reach the board unless the model explicitly calls the `kanban_heartbeat` tool — so a worker actively executing a long run without tool-level heartbeats can be reclaimed mid-flight as 'stale', returning the task to ready and orphaning the in-flight worker's progress. Fix: in `_touch_activity` (the canonical 'we just did work' hook in run_agent.py), call a new `heartbeat_current_worker_from_env` helper in `tools/kanban_tools.py` that: - No-ops outside dispatcher-spawned worker context (no HERMES_KANBAN_TASK). - Rate-limited to one DB write per 60s (runtime activity ticks too often to faithfully mirror; we just need the watchdog to see liveness). - Best-effort: never raises. heartbeat_claim + heartbeat_worker calls are individually try/except'd; any DB error logs at debug and returns. - Uses worker env identity: HERMES_KANBAN_TASK + HERMES_KANBAN_RUN_ID + HERMES_KANBAN_CLAIM_LOCK (all pinned by the dispatcher at spawn time). - No durable note on auto-heartbeats — that's reserved for the explicit `kanban_heartbeat` tool which carries a model-supplied note. The explicit `kanban_heartbeat` tool stays available unchanged for workers that want to attach a note or pre-emptively extend a claim across a known-long single tool call. Co-authored-by: faisfamilytravel <223516181+faisfamilytravel@users.noreply.github.com>	2026-05-29 00:05:58 -07:00
teknium1	40217aa194	fix(kanban): tell workers not to use clarify; route to kanban_block instead (#32167 ) Kanban workers run headless — no live user is on the other side of `clarify`, so the call times out (~120s default) and the task sits silently in `running` with no signal to the operator that input is needed. Reporter observed a real incident where a worker asked 'promote to production, or check staging first?' via clarify, the call timed out, the agent hallucinated a fallback, and the task sat 'running' for hours. Fix: explicit 'do not call clarify' bullet in two surfaces every kanban worker sees — - `agent/prompt_builder.py` KANBAN_GUIDANCE `## Do NOT` section (auto-injected into every dispatcher-spawned worker run). - `skills/devops/kanban-worker/SKILL.md` `## Do NOT` section (the bundled worker skill). Both point at the right pattern: `kanban_comment` (context) + `kanban_block` (decision needed) — the task surfaces on the board as blocked, the operator sees it, unblocks with their answer in a comment, and the worker respawns with the thread. Co-authored-by: kweiner <17778+kweiner@users.noreply.github.com>	2026-05-28 23:57:20 -07:00
teknium1	ae6817f7f7	fix(kanban): add --reason flag to unblock for symmetry with block (#30897 ) `hermes kanban unblock <id> review-required: ...` parsed every trailing word as another task_id (since `task_ids` is `nargs='+'`), then quietly failed on each non-existent id with "cannot unblock review-required: (not blocked/scheduled?)". Reporter saw this as asymmetric with `block <id> <reason...>` which accepts positional reason words. Fix: add a `--reason "..."` flag that, when provided, is appended as a `UNBLOCK: <reason>` comment before the unblock transition. Bulk syntax (`unblock t_a t_b t_c`) is preserved unchanged. Co-authored-by: julio-cloudvisor <211828103+julio-cloudvisor@users.noreply.github.com>	2026-05-28 23:41:44 -07:00
Teknium	71ae98b792	chore(release): map seppe@fushia.be to GitHub login Required by CI author validation after salvaging PR #33193.	2026-05-28 23:30:39 -07:00
Gabor Barany	1386a7e478	fix(xai-sanitize): deepcopy tools_for_api before in-place mutation (#27907 ) The xAI tool-schema sanitizers (strip_slash_enum, strip_pattern_and_format) mutate their input in place — that's their documented contract. The two call sites (chat_completion_helpers.build_api_kwargs and the auxiliary client) were passing agent.tools straight through, so the first xAI request would permanently strip slash-containing enum constraints and pattern/format keywords from the per-agent tool registry. Effect: any subsequent non-xAI call from the same agent (auxiliary task routed to Anthropic, OpenRouter fallback, mid-session model switch) saw the already-stripped schema with no way for the user to notice from their config. Fix: deepcopy tools_for_api before sanitizing at both call sites. The slash-enum bug itself (xAI 400ing on enums with '/') was fixed earlier by #32443 (Nami4D) — that PR landed the strip but used the sanitizers directly without copying. This salvages #27907's correctness contribution (the deepcopy) while skipping its redundant parallel sanitizer (strip_xai_incompatible_enum_values is functionally equivalent to the existing strip_slash_enum) and its preflight- neutrality argument (we chose model-gated preflight in #32443). 3 new tests in tests/run_agent/test_run_agent_codex_responses.py: - strips_slash_enum_from_outgoing_request — outgoing kwargs has no slash-containing enum values (functional contract preserved). - does_not_mutate_agent_tools — headline #27907 regression. Snapshot agent.tools before build_api_kwargs, assert it survives intact after. Pre-fix this assertion would have caught the mutation. - is_idempotent_across_repeated_calls — three xAI requests in a row each strip cleanly AND don't progressively erode the source schema. 344/344 across tests/agent/test_auxiliary_client.py, tests/agent/transports/test_codex_transport.py, tests/run_agent/test_run_agent_codex_responses.py, and tests/tools/test_schema_sanitizer.py. Co-authored-by: Gabor Barany <barany.gabor@gmail.com>	2026-05-28 23:29:59 -07:00
kshitijk4poor	66827f8947	chore: prune unused imports and duplicate import redefinitions Remove unused imports (F401) and duplicate/shadowed import redefinitions (F811) across the codebase using ruff's safe autofixes. No behavioral changes -- imports only. - ~1400 safe autofixes applied across 644 files (net -1072 lines) - __init__.py re-exports preserved (excluded from F401 removal so public re-export surfaces stay intact) - Re-exports that are imported or monkeypatched by tests but look unused in their defining module are kept with explicit # noqa: F401 (gateway/run.py load_dotenv; run_agent re-exports from agent.message_sanitization, agent.context_compressor, agent.retry_utils, agent.prompt_builder, agent.process_bootstrap, agent.codex_responses_adapter) - Unsafe F841 (unused-variable) fixes deliberately skipped -- those can change behavior when the RHS has side effects - ruff lints remain disabled in pyproject.toml (only PLW1514 is selected); this is a one-time cleanup, not a config change Verification: - python -m compileall: clean - pytest --collect-only: all 27161 tests collect (zero import errors) - core entry points import clean (run_agent, model_tools, cli, toolsets, hermes_state, batch_runner, gateway) - static scan: every name any test imports directly from an edited module still resolves	2026-05-28 22:26:25 -07:00
kshitijk4poor	d464d08a5f	chore: add devwdave to AUTHOR_MAP Maps both commit emails (david@memorilabs.ai, dave@devwdave.com) used on #28065 to the devwdave GitHub account so the contributor audit in scripts/release.py passes.	2026-05-29 02:16:43 +05:30
teknium1	c5e496e1c0	chore: map yanghongda@jackyun.com -> yangguangjin in AUTHOR_MAP	2026-05-28 12:26:53 -07:00
Teknium	7050c052e3	fix(skills): pull full skills.sh catalog via sitemap (858 → 19,932) (#34025 ) The skills.sh source was returning ~858 unique skills from a hardcoded list of 28 popular keyword searches (each capped at 50 results). The real catalog is ~20k — exposed via sitemap-skills-{1,2}.xml linked from the site's sitemap index. Switch the empty-query path in SkillsShSource.search() to walk the sitemap instead of scraping the homepage's curated featured strip. Falls back to the homepage scrape if the sitemap is unreachable. build_skills_index.crawl_skills_sh() now just calls search("", limit=0) instead of running 28 keyword searches — same result in one HTTP round instead of 28. Also handle a httpx + brotlicffi interaction: the per-skill sitemaps are ~900 KB brotli-compressed and the cffi backend's streaming decode chokes on them. Forcing Accept-Encoding to gzip dodges the bug without requiring a brotli library upgrade. E2E against live skills.sh: 19,932 unique skills walked in 0.7s. Tests: 137 pass (+1 new regression test exercising the sitemap path). Floor for skills.sh raised 100 → 10,000 in EXPECTED_FLOORS so a future regression hard-fails the build.	2026-05-28 11:28:12 -07:00
Teknium	0c859a1c04	chore: release v0.15.0 (2026.5.28) (#34008 ) Some checks are pending Publish to PyPI / Build distribution 📦 (push) Waiting to run Details Publish to PyPI / Publish to PyPI (push) Blocked by required conditions Details Publish to PyPI / Sign and attach to GitHub Release (push) Blocked by required conditions Details * chore: release v0.15.0 (2026.5.28) The Velocity Release. Run_agent.py refactor (16k→3.8k LOC, -76%), kanban grows into a multi-agent platform (104 PRs), cold-start perf wave continues (-240ms / -47% per-turn function calls / -195ms per tool call), session_search rebuilt (4500x faster, no LLM), promptware defense lands, Bitwarden Secrets Manager integration, two new image_gen providers (Krea 2, FAL plugin port), Nous-approved MCP catalog, OpenHands skill, ntfy as 23rd messaging platform, deep xAI integration round. 15 P0 + 65 P1 closures. 747 PRs, 1,302 commits, 321 contributors. * chore(release): bump acp_registry/agent.json to 0.15.0 (sync with pyproject)	2026-05-28 10:45:33 -07:00
kshitij	0554ef1aa3	fix(agent): fallback immediately on provider content-policy blocks (#33883 ) * fix(agent): fallback immediately on provider content-policy blocks Provider safety-filter refusals (e.g. OpenAI Codex 'flagged for possible cybersecurity risk', OpenAI moderation 'violates our usage policies', Anthropic safety-system rejections, Azure content_filter) are deterministic decisions about a specific prompt. Retrying the same prompt up to api_max_retries times just reproduces the same refusal and burns paid attempts before surfacing the generic 'API failed after 3 retries — <provider message>' to Telegram / cron with no indication that the failure came from the model provider rather than Hermes itself. Classify these as a new FailoverReason.content_policy_blocked (non-retryable, should_fallback=True) and route them through the existing is_client_error path so the loop: - skips the 3x retry backoff - activates a configured fallback model immediately - emits a clear provider-safety message to the user (not the generic 'Non-retryable error (HTTP None)') and surfaces actionable guidance when no fallback is configured (rephrase, narrow context, or set fallback_model in hermes config) - returns a final_response that explicitly tells the user this came from the model provider, so gateway delivery is unambiguous and cron last_status reflects the safety block rather than a vague 'agent reported failure' Patterns are intentionally narrow — verbatim refusal phrasings keyed to specific provider safety pipelines, not generic words like 'policy' or 'violation' that would collide with billing / format / auth errors. Regression guards in test_18028_content_policy_blocked.py verify billing 402s, generic 400s, and OpenRouter account-level provider_policy_blocked remain distinct classifications. Salvaged from #18164 onto current main (file restructure: loop logic moved from run_agent.py to agent/conversation_loop.py, _emit_status → _buffer_status), broadened patterns beyond the original OpenAI Codex cybersecurity case to cover OpenAI moderation, Anthropic safety system, and Azure content_filter; added user-actionable guidance and a clear final_response so cron/gateway surfaces the policy block instead of a generic non-retryable error, and added a regression-guard test module mirroring the is_client_error predicate. Addresses #18028. Co-authored-by: Kuan-Chieh Huang <kchuang1015@users.noreply.github.com> * chore: add kchuang1015 to AUTHOR_MAP --------- Co-authored-by: Kuan-Chieh Huang <kchuang1015@users.noreply.github.com>	2026-05-28 07:28:24 -07:00
Pluviobyte	eafe11d456	fix(gateway): backfill Discord thread context Discord threads where the bot has already participated bypass mention gating by default, but the backfill check was still tied to the mention-needed condition. That meant follow-up thread messages could trigger a response without providing recent thread history to the session. Run history backfill for thread messages whenever backfill is enabled, while keeping DMs skipped and channel mention backfill behavior unchanged. Add a regression test for a known thread follow-up without an explicit mention. Fixes #33666 Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-28 04:52:02 -07:00
teknium1	f8896dedc8	chore(release): map biser@bisko.be -> bisko in AUTHOR_MAP	2026-05-28 03:21:00 -07:00
teknium1	247b24b49f	chore(release): add AUTHOR_MAP entry for AdityaRajeshGadgil	2026-05-28 02:45:25 -07:00
Dusk1e	aa3466063b	fix(android): reject unsafe tar members in psutil compatibility installer	2026-05-28 02:36:09 -07:00
Teknium	bb0ac5ced2	chore(release): AUTHOR_MAP entry for vynxevainglory-ai PR #29233 salvage.	2026-05-28 02:33:51 -07:00
Teknium	fb9f3a4ef9	fix(skills): pull full ClawHub catalog into the skills index (200 → 20k+) (#33748 ) * fix(skills): pull full ClawHub catalog into the skills index The website was showing 200 ClawHub skills out of 20k+ because `ClawHubSource.search("")` for empty queries went straight to a single unpaginated request. ClawHub's API caps any single page at 200 items and returns a `nextCursor`; we grabbed page 1 and stopped, so the cached index served from hermes-agent.nousresearch.com had a silent 99% truncation. End users never hit clawhub.ai directly (the index is rebuilt twice daily by .github/workflows/skills-index.yml and served as a static JSON on the docs site), so the cap-and-cache architecture is correct — it just wasn't being filled. Changes: - `ClawHubSource.search(query="")` now routes through the existing `_load_catalog_index()` paginating walker instead of the unpaginated listing fallback (non-empty queries still hit the fast catalog search). - `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live catalog is ~20k as of May 2026, with headroom for growth). - `build_skills_index.py`: per-source crawl limits split out — ClawHub and LobeHub get 100k, others keep their effective caps. - `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination regression hard-fails the CI build instead of silently shipping a degenerate index. Test plan: - New unit test `test_search_empty_query_paginates_full_catalog` exercises the cursor-following path with three mocked pages (450 total items) and asserts all pages are walked. - Existing 9 ClawHub tests + 127 broader skills_hub tests all pass. - E2E against live ClawHub API: walker reached 9700+ skills across 49 pages before this commit landed, paginating well past the previous 50-page cap. * fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k E2E walk against live ClawHub API hit my initial 250-page cap at 49,698 skills with cursor=yes still pending. The catalog is roughly 2.5x larger than the docstring estimate. - max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None well before this in practice) - SOURCE_LIMITS['clawhub'] 100k → 200k - EXPECTED_FLOORS['clawhub'] 5000 → 20000	2026-05-28 01:42:19 -07:00
kshitijk4poor	7b778db472	chore(release): map MoonRay305 contributor email for #32759 salvage Adds `squiddy@2rook.ai → MoonRay305` to AUTHOR_MAP so contributor_audit.py passes for the salvaged commits in #33482-followup PR.	2026-05-27 23:28:51 -07:00
kshitijk4poor	2d5dcfabc3	test(kanban): update dispatcher tick counter for hoisted zombie reaper The reaper hoist in the prior commit adds an extra `asyncio.to_thread(_kb.reap_worker_zombies)` call at the top of every dispatcher tick (before the per-board work). The existing `test_gateway_dispatcher_disables_corrupt_board_without_traceback` mocks `to_thread` with a 4-call cap that previously matched 2 full dispatch ticks. With the reaper hoist each tick is now 3 `to_thread` calls instead of 2, so the cap is raised to 6 to preserve the same number of dispatch ticks. The `connect == 5` assertion is unchanged. Also add the contributor's `steveonjava@gmail.com` to AUTHOR_MAP alongside `steve@steveonjava.com` so contributor-audit passes for both identities used across the salvaged commits. Salvage follow-up for PR #32857.	2026-05-27 14:31:55 -07:00
Stephen Chin	ffdc937c18	fix(kanban): hoist zombie reaper out of dispatch_once Reaper now runs at the top of every dispatcher tick regardless of per-board connect() failures. Previously the reaper sat inside dispatch_once after the kanban_db.connect() call — any EIO during connect would skip reaping for that tick, accumulating zombie workers and stale claim_lock rows. Also: reap_worker_zombies now returns the list of reaped pids (the dispatcher logs them) and a test indentation fix. Squashes three sibling commits from PR #32301 into one logical change for batch review.	2026-05-27 14:31:55 -07:00
Wesley Simplicio	4efb40c325	fix(install): set world-readable uv python dirs for root FHS layout When installing as root on Linux with the default FHS layout (/usr/local/lib/hermes-agent), `uv python install` placed the managed Python under /root/.local/share/uv/python/, which non-root users cannot traverse. The shared /usr/local/bin/hermes wrapper then failed for them with "bad interpreter: Permission denied" when execing the venv python. Export UV_PYTHON_INSTALL_DIR and UV_PYTHON_BIN_DIR to /usr/local/share/uv/ in the root-FHS branch of resolve_install_layout so the managed Python is world-readable and the shared wrapper works for any user. Closes #21457	2026-05-27 13:55:51 -07:00
teknium1	5deb384b53	chore(release): map donovan-yohan for #33263 salvage	2026-05-27 11:48:23 -07:00
teknium1	8386f84454	chore(release): map Brixyy for #33136 salvage	2026-05-27 11:30:55 -07:00
teknium1	3476509f97	chore(release): map sanghyuk-seo-nexcube for #33383 salvage	2026-05-27 11:19:55 -07:00
Erosika	eccbbe4b1b	chore(release): map adopted Honcho contributors	2026-05-27 10:49:33 -07:00
kshitij	dd0d5d5a82	chore: add JohnC1009 to AUTHOR_MAP (#33351 ) Pre-requisite for PR #32020 salvage (auth: global auth.json fallback in _load_provider_state). Contributor_audit strict mode fails if any commit author email on main is unmapped. Co-authored-by: kshitijk4poor <kshitijk4poor@gmail.com>	2026-05-27 09:37:50 -07:00
Teknium	b4eea187d5	fix(xai-oauth): gate slash-enum strip on model name + add regression tests (#28490 ) Three additions on top of @Nami4D's salvage: 1. Gate the preflight slash-enum strip on the model name pattern (grok-* / x-ai/grok-*). The original PR stripped slash-containing enum values from every codex_responses request, but native Codex (OpenAI) and GitHub Models DO accept slash enums — stripping them there would silently degrade tool-schema constraints. xAI is the only Responses-API surface that rejects the shape. 2. Resolve the merge conflict in agent/transports/codex.py by preserving both the timeout-forwarding block that landed on main between the PR's branch point and now AND the new service_tier strip. Behavioural intent of both is preserved. 3. Six new tests in tests/agent/transports/test_codex_transport.py covering: - TestCodexTransportXaiServiceTierStrip (3 tests): xAI strips service_tier from request_overrides; non-xAI codex_responses and GitHub Models both KEEP service_tier (regression guards so the strip stays xAI-only). - TestPreflightSlashEnumStrip (3 tests): Grok and aggregator- prefixed Grok model names both trigger the safety-net strip; non-Grok models preserve slash enums as a regression guard against the strip becoming too broad. 51/51 in tests/agent/transports/test_codex_transport.py. Co-authored-by: Nami4D <hello@nami4d.tech>	2026-05-27 05:25:38 -07:00
Teknium	4feb181eb4	chore(release): map sir-ad + rdasilva1016-ui in AUTHOR_MAP	2026-05-27 02:41:24 -07:00
Teknium	581b0215a5	chore(release): map chaconne67 noreply for #31629 salvage	2026-05-27 02:40:03 -07:00
Teknium	c819bc575b	chore(release): map kpadilha noreply for #11038 salvage	2026-05-27 02:25:59 -07:00
Teknium	c752205635	chore(release): map superearn-fisher noreply for #33122 salvage	2026-05-27 02:06:21 -07:00
orcool	f0fdb5e67d	feat(catalog): add qwen3.7-max to alibaba + alibaba-coding-plan model lists Alibaba's latest flagship Qwen model is released but not yet present in the DashScope (alibaba) or Alibaba Coding Plan curated catalogs. Add it so it shows up in the /model picker and setup wizard for those providers. OpenCode Go routing for qwen3.7-max already landed via #32780 (commit `2fc77c53f`). OpenRouter + Nous catalog entries already landed via #32809 (commit `ccd3d04fc`). This salvage picks up the remaining alibaba / alibaba-coding-plan entries from #32806 — the AI Gateway entry is dropped because Vercel AI Gateway was removed in #33067.	2026-05-27 02:05:58 -07:00
Teknium	96223265b9	chore(api-server): mark skills_api capability True now that /v1/skills shipped #33016 added GET /v1/skills + /v1/toolsets on the API server; the capability flag introduced in this branch was placeholder-False. Flip to True so capability probers see the truth.	2026-05-27 01:56:55 -07:00
Teknium	f0be32232d	chore(release): map EvilHumphrey noreply for #33034 salvage	2026-05-27 01:52:34 -07:00
Teknium	b6ca56f651	fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144 ) (#33035 ) * fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144) When an OpenAI-compatible Responses API surface accepts an initial request but later rejects the replayed `codex_reasoning_items` encrypted blob with HTTP 400 `invalid_encrypted_content`, the session previously got stuck retrying the same poisoned payload. Recovery: classify the error as a dedicated FailoverReason, and on the first hit disable encrypted reasoning replay for the rest of the session, strip cached items from message history, and retry once. Changes: * error_classifier: add FailoverReason.invalid_encrypted_content branch in _classify_400 (before context_overflow so the messages that mention 'encrypted content … could not be verified' don't trip context heuristics), in _classify_by_error_code, and extend _extract_error_code to peek inside wrapped JSON in error.message and ignore the bare '400' as a code. * agent_init: initialize `_codex_reasoning_replay_enabled = True` on every agent. * run_agent: add AIAgent._disable_codex_reasoning_replay() helper that flips the flag and pops cached items. * codex_responses_adapter: thread a `replay_encrypted_reasoning` kwarg through _chat_messages_to_responses_input so that when the flag is False we don't replay codex_reasoning_items. * transports/codex.py: read `replay_encrypted_reasoning` from params, thread it into the adapter, and gate the `include=['reasoning.encrypted_content']` request hint on it. * chat_completion_helpers: pass the agent's replay flag through to the transport. * conversation_loop: in the retry loop, add an invalid_encrypted_content recovery branch that fires once per session, only when api_mode == codex_responses, only when replay is still enabled, and only when at least one assistant message in history actually carries cached reasoning items (otherwise the 400 has nothing to do with our cache and the normal retry path handles it). Tests: * test_error_classifier: new wrapped-JSON _extract_error_code case; new TestClassifyApiError cases proving the 400 is retryable with no fallback, that the broad message match doesn't catch a generic 'parsed' message, and that the error code match is case-insensitive. * test_run_agent_codex_responses: end-to-end test of the recovery branch firing once and disabling replay, plus a sibling test that proves the branch does not fire (and the flag stays True) when history has no cached reasoning items. Salvages PR #10144 onto the post-refactor module layout (error_classifier / codex_responses_adapter / transports/codex / conversation_loop / agent_init) since the original diff was written against the pre-refactor monolithic run_agent.py. * chore(release): map victorGPT in AUTHOR_MAP for #10144 salvage --------- Co-authored-by: victorGPT <wuxuebin1993@gmail.com>	2026-05-26 22:01:17 -07:00
teknium1	9feadc2734	chore(release): map ticketclosed-wontfix noreply to GitHub login	2026-05-26 20:51:59 -07:00
Teknium	16e86ce6a7	chore(release): map wangpuv contributor email for #32933 (#33005 ) Pre-stages the AUTHOR_MAP entry so the contributor-check workflow passes when Will Falcon's image-gen SSE fix lands.	2026-05-26 20:40:17 -07:00
Teknium	2a8d217417	chore(release): map carltonawong noreply to GitHub login Added AUTHOR_MAP entry for the cherry-picked fix in the preceding commit so the release contributor audit can resolve Carlton's noreply email.	2026-05-26 19:37:37 -07:00
teknium1	f05a47309e	fix(gateway): refresh cached agent tools on /reload-mcp When the gateway processes /reload-mcp, it reconnects MCP servers and updates the global _servers registry, but cached AIAgent instances in _agent_cache keep the tools list they were built with. The user had to also run /new (discarding conversation history) before the agent could see the new tools — even though /reload-mcp had succeeded. This patch refreshes each cached agent's .tools and .valid_tool_names in _execute_mcp_reload after discovery returns, so existing sessions pick up new MCP tools on their next turn. The slash-confirm gate in _handle_reload_mcp_command already obtains user consent for the implied prompt-cache invalidation before this code runs. Mirrors the equivalent behaviour the CLI already does in cli.py _reload_mcp. Per-agent enabled_toolsets and disabled_toolsets are preserved so an agent that was scoped to a subset of toolsets does not silently gain disabled tools after the reload. Original diagnosis + initial implementation in #23812 from @fujinice. The auto-reload watcher half of that PR is intentionally dropped — users want /reload-mcp to remain explicit. Co-authored-by: fujinice <45688690+fujinice@users.noreply.github.com>	2026-05-26 14:28:51 -07:00
kshitij	66851dc413	chore: add krislidimo to AUTHOR_MAP for PR #29775 (#32434 )	2026-05-25 23:15:56 -07:00
Teknium	d8703e27f5	feat(skills-hub): health checks, freshness badge, and a watchdog cron (#32345 ) Layered safety so the Skills Hub at /docs/skills stays in sync without silent rot. Three pieces: 1. build_skills_index.py — refuses to ship a degenerate index. EXPECTED_FLOORS per source (skills.sh ≥100, lobehub ≥100, clawhub ≥50, official ≥50, github ≥30, browse-sh ≥50) and MIN_TOTAL=1500. Any source collapsing to zero (the silent OpenAI breakage that hid for weeks) now fails the workflow loud — broken index never reaches the live site. 2. extract-skills.py + the React page — visible freshness signal. Sidecar website/src/data/skills-meta.json carries the index's generated_at timestamp, plus per-source counts. Skills Hub renders a 'Catalog refreshed N hours ago · auto-rebuilt twice daily' line under the hero copy. If the cron stalls, users see the staleness immediately. 3. .github/workflows/skills-index-freshness.yml — watchdog cron. Every 4 hours, fetches the live /docs/api/skills-index.json, validates shape, checks age (>26h is stale), checks the same per-source floors, and opens (or appends to) a GitHub issue when anything is off. The issue is title-prefixed [skills-index-watchdog] so subsequent failures append a comment instead of spamming new issues. Net effect: - A silent regression like 'OpenAI tap moved its skills' now fails the build instead of shipping a quietly broken catalog. - A stuck cron (like the landingpage breakage that ran red for weeks) now files an issue within 4 hours. - Users see how fresh the catalog is on the page itself. Test plan: - Local: built skills-meta.json from the live index → 'Catalog refreshed N minutes ago' rendered correctly in the static HTML. - Probe logic dry-run against the live index: total=2456, all 6 sources above floor, age 0.1h — issues=NONE. - Triggered skills-index.yml manually; both jobs green, deploy-site.yml dispatch fired.	2026-05-25 23:10:45 -07:00
Teknium	cea87d9139	fix(skills-hub): show every catalog source on /docs/skills (skills.sh, ClawHub, browse.sh, OpenAI, …) (#32336 ) The Skills Hub page was stuck on a stale Feb 25 snapshot, showing only Built-in + Optional + Anthropic + LobeHub. The unified index already has 2078 skills from skills.sh / ClawHub / LobeHub / GitHub taps / Claude Marketplace, and BrowseShSource adds another ~330 — none of it was reaching the page. Changes: - website/scripts/extract-skills.py: read website/static/api/skills-index.json (the unified multi-source catalog, rebuilt twice daily) as the canonical external source. Keep the legacy skills/index-cache/ fallback for offline builds. Add friendly per-source labels (skills.sh, ClawHub, browse.sh, OpenAI, HuggingFace, Anthropic, LobeHub, etc.) and per-entry installCmd. - website/src/pages/skills/index.tsx: add source pills + ordering for the 11 new sources; render installCmd from the index entry. - website/scripts/prebuild.mjs: when no local skills-index.json exists, fetch the live one from hermes-agent.nousresearch.com so local 'npm run build' matches production without burning GitHub API quota. - scripts/build_skills_index.py: crawl BrowseShSource so browse.sh entries land in the unified index. Adjust source_order. - tools/skills_hub.py: GitHubSource.DEFAULT_TAPS — openai/skills moved its skills into skills/.curated/ and skills/.system/, so add both as explicit taps (the listing code skips dotted dirs by design). Drop VoltAgent/awesome-agent-skills (README-only, no SKILL.md files) and MiniMax-AI/cli (singular skill, not a tap directory). Net effect: github source jumps from 83 → 143 skills, with OpenAI properly included. - .github/workflows/deploy-site.yml: build the unified index BEFORE running extract-skills.py — previous order meant extract-skills always fell back to the legacy cache. Drop the 'skip if file exists' guard; the file is gitignored and must be rebuilt every deploy. - .github/workflows/skills-index.yml: drop the broken 'deploy-with-index' job (it cp'd 'landingpage/\*' which no longer exists, failing every cron run since the landingpage move). Replace it with a workflow_dispatch trigger of deploy-site.yml so the index refresh still reaches production on schedule. - website/docs/user-guide/features/skills.md: drop VoltAgent from the default-taps doc list to match the code. Before: 695 skills (Built-in 90, Optional 84, Anthropic 16, LobeHub 505). After: 2168 skills across 9 source pills, including the 1212 skills.sh entries the user expected to see.	2026-05-25 18:34:54 -07:00

1 2 3 4 5 ...

964 Commits