hermes-agent

Author	SHA1	Message	Date
Zhipeng Li	020601d41e	fix(compression): drop conflicting 'resume Active Task' directive in summary prefix SUMMARY_PREFIX previously contained two contradictory directives: 1. "treat it as background reference, NOT as active instructions" "Do NOT answer questions or fulfill requests mentioned in this summary" "Respond ONLY to the latest user message that appears AFTER this summary" 2. "Your current task is identified in the '## Active Task' section of the summary — resume exactly from there." When the latest user message contradicted Active Task (e.g. 'stop the i18n refactor', 'never mind, look at grafana instead'), models tended to follow (2) anyway because 'resume exactly' is a strong, unambiguous directive — leading to repeated re-surfacing of already-cancelled work across turns, even after explicit 'stop'/'don't keep bringing that up' messages from the user. This change: - Removes the conflicting 'resume exactly from Active Task' clause. - Makes the precedence explicit: latest user message is the single source of truth; it WINS on conflict; cancelled Active Task / In Progress / Pending User Asks / Remaining Work must be discarded entirely (no 'wrap up the old task first'). - Names canonical reverse signals (stop, undo, roll back, never mind, just verify, topic change) so the model recognizes them as cancellation triggers, not background context. - Updates the summarizer template instruction so the LLM doesn't mechanically copy a cancelled task into Active Task on the next compaction (it's instructed to copy the reverse signal verbatim). - Preserves: REFERENCE ONLY framing, MEMORY.md/USER.md authority, and the 'don't repeat work already reflected in session state' clause. Adds tests/agent/test_summary_prefix_semantics.py to pin invariants so the conflict can't regress. 
Tested: - All compaction tests pass: tests/agent/test_context_compressor.py, tests/agent/test_context_compressor_summary_continuity.py, tests/run_agent/test_413_compression.py, tests/run_agent/test_compression_persistence.py, tests/run_agent/test_compression_boundary_hook.py, tests/cli/test_manual_compress.py — 117/117 passing. - Tested on macOS.	2026-05-30 07:29:21 -07:00
teknium1	182739fcda	test(interrupt): assert no leaked tid instead of no-op block Follow-up on the #35309 regression test: the trailing `with _lock: pass` asserted nothing. Replace it with a concrete assertion that _interrupted_threads is empty after the worker exits, directly verifying the leak the fix prevents.	2026-05-30 07:28:11 -07:00
liuhao1024	bede3cf12d	fix(tools): wrap _run_tool cleanup in finally to prevent interrupt state leak When _invoke_tool raises a BaseException (CancelledError, KeyboardInterrupt), the cleanup code at the end of _run_tool was bypassed because it sat outside the except block (which only catches Exception). ThreadPoolExecutor recycles thread IDs, so the leaked tid in _interrupted_threads poisons the next tool scheduled on that thread — it instantly aborts with 'Interrupted'. Move the discard + _set_interrupt(False) into a finally block so cleanup runs regardless of how the worker exits. Fixes #35309	2026-05-30 07:28:11 -07:00
Teknium	2b16b756a7	fix(gateway): recover model on post-interrupt turn; gate fallback status (#35381) Empty model could reach the API on a recovery turn after stream_interrupt_abort, failing HTTP 400 "No models provided" with no recovery — the session went silent until the user manually re-sent (#35314). - gateway/run.py: cache last-successfully-resolved model per session (+ a process-wide slot); when a fresh config read returns an empty model on a recovery turn, reuse the last-known-good instead of building model="". - run_agent.py + agent/conversation_loop.py: only emit "trying fallback..." status when a fallback chain actually exists, so the UI stops announcing a fallback that will never run (also #17446). - tests: empty-model recovery + _has_pending_fallback gate.	2026-05-30 07:28:06 -07:00
Teknium	10dec7c6dc	fix(kanban): respect mobile safe areas in task detail drawer (#35378) * fix(file-tools): handle UTF-8 BOM in read_file / write_file / patch Some Windows editors prepend an invisible UTF-8 BOM (U+FEFF) to text files. We had no awareness of it, so: read_file surfaced a phantom U+FEFF as the first character; patch matches against the true first line could miss; and a write/patch round-trip silently stripped the marker, changing the file's byte signature. Now: - read_file / read_file_raw strip a single leading BOM so the model never sees it (only on the first chunk — the marker lives at byte 0). - patch_replace strips the BOM before fuzzy-matching (so an exact first-line match works) and its post-write verification compares BOM-stripped content. - write_file restores the BOM when the original file had one and the new content doesn't, mirroring the existing line-ending preservation (detect on disk via a cheap `head -c 3` probe or reuse pre_content, re-prepend across the edit). Guards against double-BOM. Mid-content U+FEFF is left alone (it's data there, not a file marker). Tests: TestBomHandling (real LocalEnvironment) — read-strips, raw-read strips, write preserves, no-BOM-when-original-had-none, no-double-BOM, patch round-trip preserves, patch matches first line through a BOM, plus helper unit tests. 208 file-tool tests green. * fix(kanban): respect mobile safe areas in task detail drawer The task detail drawer is a body-level z-60 fixed overlay using height:100vh starting at the viewport top. On mobile this puts the drawer header behind the dashboard's fixed top bar (min-h-14, z-40) and lets the bottom comment input sit under the browser's collapsing nav bar. 
- drawer: 100vh -> 100dvh (+ max-height:100dvh), 100vh kept as fallback - head: padding-top honors env(safe-area-inset-top); mobile (<1024px, matching the lg breakpoint where the fixed bar shows) clears the 3.5rem header - comment-row + body: bottom padding extended with env(safe-area-inset-bottom) so the bottom-most element clears the mobile browser chrome Mirrors the host shell idiom (100dvh + env(safe-area-inset-bottom) in web/), and web/index.html already sets viewport-fit=cover so the insets resolve. max()/calc() fallbacks leave desktop unchanged. Closes #35324	2026-05-30 07:13:26 -07:00
Teknium	ea6eaabd8f	perf(read_file): compact line-number gutter — ~14% fewer tokens per read (#35368) read_file's gutter used a fixed-width zero/space-padded prefix (" 1|content"). The padding is pure token overhead: measured with cl100k on real Hermes source, the padded gutter costs ~48% more tokens than bare content and ~16% more than a compact "<n>|content" gutter, because the leading spaces tokenize into extra tokens on every line. Switched the default to the compact "<n>|content" form. An A/B (Sonnet 4.6 via OpenRouter, 2 passes, 4-task battery, every claim verified against ground truth) showed: - padded : 4/4 PASS both passes - compact : 4/4 PASS both passes ← keeps line-referencing + patch - none : 3/4 PASS both passes ← dropping numbers entirely made the model hand-count lines and answer off-by-one (33 vs 34) So we keep the line numbers (the model genuinely uses them to reference lines) but drop the wasteful padding — capturing ~14% of the read-token cost with zero measured accuracy change. Dropping numbers entirely (the larger 33% saving) is rejected: it regresses line-referencing. patch/fuzzy_match never consumed the gutter (they match old_string text and compute char offsets internally), so editing is unaffected. No downstream parser keys on the fixed-width columns. HERMES_READ_GUTTER=padded restores the legacy format for anyone relying on alignment. Tests: updated the 3 format assertions to the compact gutter; added an env-override test for the legacy padded format. 209 file-tool tests green.	2026-05-30 07:01:22 -07:00
Teknium	5f84c9144a	fix(file-tools): handle UTF-8 BOM in read_file / write_file / patch (#35278) Some Windows editors prepend an invisible UTF-8 BOM (U+FEFF) to text files. We had no awareness of it, so: read_file surfaced a phantom U+FEFF as the first character; patch matches against the true first line could miss; and a write/patch round-trip silently stripped the marker, changing the file's byte signature. Now: - read_file / read_file_raw strip a single leading BOM so the model never sees it (only on the first chunk — the marker lives at byte 0). - patch_replace strips the BOM before fuzzy-matching (so an exact first-line match works) and its post-write verification compares BOM-stripped content. - write_file restores the BOM when the original file had one and the new content doesn't, mirroring the existing line-ending preservation (detect on disk via a cheap `head -c 3` probe or reuse pre_content, re-prepend across the edit). Guards against double-BOM. Mid-content U+FEFF is left alone (it's data there, not a file marker). Tests: TestBomHandling (real LocalEnvironment) — read-strips, raw-read strips, write preserves, no-BOM-when-original-had-none, no-double-BOM, patch round-trip preserves, patch matches first line through a BOM, plus helper unit tests. 208 file-tool tests green.	2026-05-30 06:25:50 -07:00
sprmn24	5a1aa9e68c	fix(nous_account): add threading lock to prevent TOCTOU race on cache Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-30 06:25:43 -07:00
teknium1	44f3e51865	fix(gateway): run adapter config hooks for nested-only platform blocks The plugin apply_yaml_config_fn dispatch loop only ran when a top-level platform block (e.g. `discord:`) existed. Configs that defined a platform only under `platforms.<name>` or `gateway.platforms.<name>` skipped the hook, so `platforms.discord.extra.allow_from` never reached DISCORD_ALLOWED_USERS. Fall back to those nested blocks when the top-level one is absent. Also map byquenox@gmail.com -> Que0x for the salvaged commits.	2026-05-30 05:23:55 -07:00
quen0xi	6d2727ef1c	fix(discord): bridge explicit allow_from configuration to env var mapping	2026-05-30 05:23:55 -07:00
quen0xi	0bfe19ba17	fix(gateway): merge nested gateway.platforms configuration block	2026-05-30 05:23:55 -07:00
Teknium	61268ff7a9	feat(cli): add hermes prompt-size diagnostic (#35276) Adds a 'hermes prompt-size' command that reports the fixed prompt budget for a fresh session: system prompt total, skills index, memory, user profile, prompt tiers, and tool-schema JSON bytes. Runs offline (dummy credentials force the direct-construction path, no network call). Lets users see which block dominates their per-call payload — the skills index is often the largest single block when many skills are installed (issue #34667). Zero model-tool footprint: it's a top-level CLI subcommand, not an agent tool. --platform <name> simulates a channel's platform hint; --json emits a machine-readable breakdown. Closes #34667	2026-05-30 02:53:42 -07:00
kshitijk4poor	cbf851ae1d	perf(tui): stop slow/dead MCP servers from freezing TUI startup The 'summoning hermes…' phase blocked on gateway.ready, which ran MCP tool discovery inline. Any configured-but-unreachable MCP server burned its full connect-retry backoff (1+2+4s ≈ 7s) before the composer appeared — startup went from instant to ~7.5s of dead air for anyone with a down stdio/http server in mcp_servers. Move discovery into a background daemon thread so gateway.ready fires immediately; tools register into the shared registry as servers connect, and the agent isn't built until the first prompt. Measured spawn→ready: ~7500ms → ~115ms (dead twozero_td server in config). Also drop rich.console + prompt_toolkit off banner.py's import path (lazy-imported inside cprint/build_welcome_banner). tui_gateway.server imports banner only to reach the lightweight prefetch_update_check helper; the eager rich/pt imports added ~45ms before gateway.ready for no benefit. tui_gateway.server import: ~115ms → ~69ms.	2026-05-30 02:53:37 -07:00
teknium1	bfc4a26032	fix(tools): point email home-channel error at EMAIL_HOME_ADDRESS The no-home-channel error for send_message derived the env var name generically as <PLATFORM>_HOME_CHANNEL, producing EMAIL_HOME_CHANNEL for the email platform. But gateway/config.py reads EMAIL_HOME_ADDRESS, so a user following the error's guidance would set a variable that is never consulted. Add a per-platform override map so the email hint names the variable actually read; all other platforms keep the generic hint.	2026-05-30 02:39:08 -07:00
liuhao1024	d3724c0be6	fix(tools): recognize email addresses as explicit targets in send_message When using send_message with the email platform, valid email addresses like user@example.com were not recognized as explicit targets by _parse_target_ref(). This caused the function to return (None, None, False), forcing the system into channel-name resolution which has no way to resolve a raw email address, resulting in 'No home channel set for email' errors. Add _EMAIL_TARGET_RE pattern and email platform handler in _parse_target_ref() so email addresses are treated as explicit targets and routed directly without requiring a home target configuration.	2026-05-30 02:39:08 -07:00
teknium1	622e534379	test(auxiliary): e2e routing assertions for custom-provider aux resolution Adds two real-client tests on top of the salvaged #34783 fix: - config-less custom:<name> endpoint routes via the carried live base_url (guards the #34777 symptom directly, not just the wiring) - named custom:<name> WITH a config entry still resolves via the named-custom branch (regression guard against collapsing to bare custom)	2026-05-30 02:38:59 -07:00
liuhao1024	40fcb96585	fix(auxiliary): pass base_url/api_key/api_mode through set_runtime_main for custom providers When a user configures a custom: provider (e.g. custom:openclaw-router), set_runtime_main() only stored provider and model in process-local globals. _resolve_auto() then had no base_url or api_key for the custom endpoint, causing Step 1 to fail and auxiliary tasks (approval, compression, title generation) to fall through to the aggregator chain and route to wrong providers. Fix: extend set_runtime_main() to accept base_url, api_key, and api_mode keyword arguments; store them in new globals alongside the existing provider and model; fall back to these globals in _resolve_auto() when the main_runtime dict is empty. The call site in conversation_loop.py now passes all five fields from the agent object. Fixes #34777	2026-05-30 02:38:59 -07:00
Teknium	2475244ca0	fix(update/windows): robustly exclude launcher-shim ancestors from concurrent check (#35257) hermes update on Windows still aborted with 'Another hermes.exe is running', listing its own launcher shim(s) as concurrent instances (issues #29341, #34795). The distlib Scripts\hermes.exe launcher spawns python.exe and waits; detection runs in the python child, so the launcher shim shows up in process_iter. The prior fix walked the ancestor chain with per-hop current.parent() inside 'except: break' — the first psutil AccessDenied/NoSuchProcess (common on Windows across session/elevation boundaries) bailed the walk early, leaving the launcher in the candidate set and re-triggering the false positive. - Switch to proc.parents() (whole ancestor list in one call), evaluate each ancestor independently so one unreadable hop never strands the launcher. - Only exclude ancestors whose exe is itself a shim, so a genuine second hermes.exe under a non-Hermes parent (Desktop backend child) is still flagged. - Message now prints a copy-pasteable 'taskkill /PID … /F' for the exact stale PIDs so a user who already closed everything can self-remediate. Conservative shim-only ancestor approach credited to the parallel attempts in PRs #29358 (xxxigm) and #31808 (jquesnelle).	2026-05-30 02:38:40 -07:00
Donovan Yohan	8bd00607dc	fix(google-workspace): handle Gmail header casing case-insensitively Normalize Gmail API message header names to lowercase before lookup so gmail get/search/reply populate to/subject/from regardless of the casing the message was stored with. Emit conventional MIME header casing (To/Subject/Cc/From) on send and reply. Fixes #34806 Co-authored-by: Donovan Yohan <donovan-yohan@users.noreply.github.com>	2026-05-30 02:38:18 -07:00
beardthelion	6baf0016be	fix(run_agent): gate concurrent checkpoint preflight on block_result (fixes #34827) In the concurrent tool-execution path, checkpoint preflight (write_file, patch, destructive terminal) fired BEFORE plugin guardrail block_result was computed. A blocked write_file could still dirty checkpoint state (doc_modified_this_turn, _last_write_file_call_id, turn_counter). Move checkpoint preflight to AFTER block_result computation, gated on `if block_result is None:` — matching the invariant the sequential path already enforces.	2026-05-30 02:38:12 -07:00
teknium1	e1945ff697	test(state): cover update_session_model overwrite + getattr-guard text path Follow-up to LengR's #35181 salvage: - gateway text-path uses getattr(self, '_session_db', None) to match the picker callback path (defensive for object.__new__() gateway test pattern). - add SessionDB.update_session_model test asserting it overwrites the COALESCE-pinned model and survives subsequent token updates (#34850).	2026-05-30 02:35:36 -07:00
lengr	794519c6ad	fix(state): persist mid-session model switch to database When a user switches models mid-session via /model, the gateway updates the in-memory agent and session overrides, but the database was never updated. The COALESCE(model, ?) in update_token_counts() only fills NULL values, so the dashboard always showed the original model. Fix: Add SessionDB.update_session_model() that unconditionally sets the model column, and call it from both the interactive picker and direct /model command paths in the gateway. Fixes #34850	2026-05-30 02:35:36 -07:00
teknium1	c9e31a8e4b	chore(release): map tuancookiez-hub for #34865 salvage	2026-05-30 02:08:36 -07:00
Tuna Dev	296fcdfa52	fix(lsp): handle Windows .cmd shims in LSP process spawn asyncio.create_subprocess_exec cannot run .cmd/.bat files on Windows because CreateProcess expects a valid PE executable. npm-installed LSP servers (intelephense, typescript-language-server, etc.) ship as .cmd shims on Windows, causing WinError 193 on spawn. Detect .cmd/.bat extensions and wrap with cmd.exe /c before spawning. Gated behind sys.platform == 'win32' — no code path changes elsewhere. Fixes #34864	2026-05-30 02:08:36 -07:00
Sylw3ster	460771bf0f	fix(lsp): detect Windows wrapper binaries in installer probes	2026-05-30 02:08:36 -07:00
teknium1	41decf2c4a	test(mcp): import os and pytest in test_mcp_stability The salvaged grandchild-reaping tests reference os.getpgid/os.killpg and pytest.mark/skip/importorskip directly, but the file only imported asyncio, signal, and unittest.mock. Add the missing imports so collection succeeds on current main.	2026-05-30 02:08:29 -07:00
konsisumer	a29d64e50c	fix(mcp): reap stdio MCP grandchildren via process-group signal The orphan reaper for stdio MCP subprocesses only tracked the direct child PID spawned by ``stdio_client`` (e.g. ``openclaw mcp serve``). When that wrapper itself spawned a helper (``claude mcp serve``) and then exited, the helper reparented to ``systemd --user`` and survived shutdown. The MCP SDK already spawns stdio children with ``start_new_session=True``, so the wrapper is its own pgroup leader and same-pgroup descendants are reachable via ``killpg``. Capture the pgid at spawn time and reap via ``killpg(pgid, sig)`` so reparented grandchildren are reaped alongside the direct child, even after the wrapper itself exits. Falls back to per-pid ``os.kill`` on Windows or when no pgid was recorded. Fixes part 2 (orphan ``claude mcp serve``) of #23799. Part 1 (per-invocation respawn) was confirmed by the reporter to be an environmental artifact, not a code bug.	2026-05-30 02:08:29 -07:00
teknium1	4d7ea3fd36	chore(release): map inchargeautomation-lab author email	2026-05-30 02:08:11 -07:00
teknium1	2334228eca	fix(update): handle pipx installs + --system fallback in _cmd_update_pip Extends the uv-tool detection (briandevans, #29703) to cover the remaining no-venv install layouts that hit the same uv 'No virtual environment found' error: - pipx-managed installs (sys.prefix under .../pipx/...) -> 'pipx upgrade', matching scripts/auto-update.sh (pipx-detection idea from inchargeautomation-lab, #29852) - bare pip outside any venv -> 'uv pip install --system --upgrade' - venv (launcher shim) keeps the VIRTUAL_ENV overlay from #35224 and never gets --system, so the install always targets the venv, not system Python The four branches are mutually exclusive; VIRTUAL_ENV is exported only for the uv-pip-in-venv path (uv tool / pipx upgrade ignore it). Co-authored-by: Joshua Kimbrell <incharge.automation@gmail.com>	2026-05-30 02:08:11 -07:00
briandevans	bebd4f8516	fix(cli): restrict uv-tool-install detection to running interpreter Copilot review on PR #29703 flagged two issues with the `uv tool list` fallback in `is_uv_tool_install`: 1. False positive: `uv tool list` returns the machine's installed tools, not the active install. A regular pip/venv Hermes on a host that also has `uv tool install hermes-agent` available would be misclassified as a uv-tool install, and `hermes update` would upgrade the wrong copy. 2. Overhead: the subprocess call (up to a 15s timeout) was triggered even from `recommended_update_command_for_method`, which just computes a display string. Restrict detection to properties of the running interpreter (`sys.prefix` and `sys.executable` — both can carry the uv-tool layout marker depending on entry point). Drop the `uv tool list` fallback and the `uv_path` parameter entirely. `_cmd_update_pip` now also surfaces a clear hint when the runtime looks like a uv-tool install but `uv` is missing from PATH, instead of silently falling back to `python -m pip`.	2026-05-30 02:08:11 -07:00
briandevans	1bdb29d938	fix(cli): use `uv tool upgrade` when Hermes is a uv tool install (#29700) Hermes installed via `uv tool install hermes-agent` lives outside any venv. `_cmd_update_pip` previously ran `uv pip install --upgrade`, which errors with `No virtual environment found; run uv venv ...`. The user hits this on the very first `hermes update` after a standard non-`--system` install with `uv` on PATH. Add `is_uv_tool_install()` in `hermes_cli/config.py`: fast path inspects `sys.prefix` for the standard `uv/tools/hermes-agent/` layout, falls back to `uv tool list` for non-standard prefixes. Both the user-facing `recommended_update_command_for_method("pip")` string and the actual subprocess invocation in `_cmd_update_pip` now switch to `uv tool upgrade hermes-agent` when detected. Non-tool installs and the no-`uv` fallback keep their existing commands unchanged.	2026-05-30 02:08:11 -07:00
Teknium	39f6b6e9d2	fix(file-tools): make write_file/patch atomic (temp-file + rename) (#35252) * Inspired by Claude Code: /compress here [N] — boundary-aware 'summarize up to here' Adds a user-chosen compression boundary to the existing /compress command. /compress here [N] summarizes everything except the most recent N exchanges (default 2), which are preserved verbatim — letting the user pick the compression boundary instead of relying on the automatic token-budget heuristic. Inspired by Claude Code's Rewind 'Summarize up to here' action (v2.1.139, Week 20, May 2026): https://code.claude.com/docs/en/whats-new/2026-w20 - hermes_cli/partial_compress.py: pure split/parse helpers + seam-alternation guard (shared by CLI and gateway). - cli.py / gateway/run.py: route 'here [N]' / '--keep N' to partial compression; compress only the head, re-append the verbatim tail through the seam guard. - Preserves message-flow role alternation (seam guard merges any illegal user->user / assistant->assistant adjacency). - Reuses the existing _compress_context session-rotation/lock machinery — no changes to the compression core. - Bare /compress (full) and /compress <focus> behavior unchanged. Tests: 12 helper unit tests + 5 CLI integration tests + E2E (interleaved tool-call transcript, degenerate/multimodal seams, real handler path). * fix(file-tools): make write_file/patch atomic (temp-file + rename) write_file streamed content straight into the target via `cat > path`, so a crash, SIGKILL, or truncated pipe mid-write left the file half-written and corrupt. patch_replace routes through write_file, so it shared the flaw. Now writes stream into a temp file in the SAME directory and `mv` it over the target — a real same-filesystem rename, which is atomic on POSIX and on every terminal backend (local/docker/ssh/modal). A failed write leaves the original byte-intact and leaks no temp file. 
The existing file's mode is preserved across the swap (stat + chmod, GNU/BSD), and content still rides stdin so there's no ARG_MAX limit. A trap cleans the temp on any error path. Tests: added TestAtomicWrite (real LocalEnvironment, no mocks) covering inode-change-on-overwrite, mode preservation, failed-write-leaves-original, no-temp-leak, special chars, and patch routing. Updated two mocks in test_file_operations.py that keyed on the literal `cat >` write command to key on the stdin_data behavioral signal instead. 200 file-tool tests green.	2026-05-30 02:07:50 -07:00
teknium1	6a08fd3c3f	test(skills): assert restore via synced[copied], not manifest re-read The hermetic CI env (slice 4/6) redirects HERMES_HOME, so a post-restore _read_manifest() can resolve to an empty/redirected manifest path and return {}. Assert on sync_skills's in-memory return value (synced["copied"]) instead, which is the resilient signal that the skill was re-copied and is no longer in limbo.	2026-05-30 02:05:10 -07:00
teknium1	8ae0802d59	fix(skills): make _rmtree_writable handle read-only directories, not just files The cherry-picked fix's onerror handler chmod'd only the failing path, but unlinking a child requires write permission on its PARENT directory. On a true Nix-store copy (r-xr-xr-x dirs + files) rmtree still failed. Now chmod the parent dir as well before retrying. Also rewrites the regression test: the original asserted the helper FAILS on a read-only dir (documenting the limitation), which is the wrong success criterion. Split into two tests — restore succeeds on a full read-only tree (real Nix case), and manifest is preserved when removal genuinely cannot proceed (monkeypatched).	2026-05-30 02:05:10 -07:00
annguyenNous	83a7d0b601	fix(skills): fix transaction ordering in reset_bundled_skill and handle read-only files in rmtree Two related bugs in tools/skills_sync.py affecting Nix-store and immutable-package installs: #34972 — reset_bundled_skill corrupts manifest on rmtree failure: The function deleted the manifest entry BEFORE attempting rmtree. If rmtree failed (read-only files from Nix store), the function returned early — leaving the skill in a manifest-less limbo state where future syncs silently skip it forever. Fix: reorder steps — attempt rmtree FIRST, only delete manifest entry after rmtree succeeds. If rmtree fails, nothing is changed. #34860 — stale .bak directories after sync: sync_skills() called shutil.rmtree(backup, ignore_errors=True) which silently failed on read-only files, leaving persistent .bak dirs. Fix: add _rmtree_writable() helper that makes files writable via an onerror callback before retrying removal. Used in both sync_skills() backup cleanup and reset_bundled_skill(). Fixes #34972 Fixes #34860	2026-05-30 02:05:10 -07:00
liuhao1024	a57cc00081	fix(packaging): include mcp_serve in py-modules so hermes mcp serve works on pip installs mcp_serve.py was missing from the setuptools py-modules list, causing hermes mcp serve to crash with ModuleNotFoundError on standard pip installs. Fixes #34871	2026-05-30 01:45:30 -07:00
Teknium	93e6a05efc	feat(model-picker): group multi-endpoint providers under one row (#35227) * Inspired by Claude Code: /compress here [N] — boundary-aware 'summarize up to here' Adds a user-chosen compression boundary to the existing /compress command. /compress here [N] summarizes everything except the most recent N exchanges (default 2), which are preserved verbatim — letting the user pick the compression boundary instead of relying on the automatic token-budget heuristic. Inspired by Claude Code's Rewind 'Summarize up to here' action (v2.1.139, Week 20, May 2026): https://code.claude.com/docs/en/whats-new/2026-w20 - hermes_cli/partial_compress.py: pure split/parse helpers + seam-alternation guard (shared by CLI and gateway). - cli.py / gateway/run.py: route 'here [N]' / '--keep N' to partial compression; compress only the head, re-append the verbatim tail through the seam guard. - Preserves message-flow role alternation (seam guard merges any illegal user->user / assistant->assistant adjacency). - Reuses the existing _compress_context session-rotation/lock machinery — no changes to the compression core. - Bare /compress (full) and /compress <focus> behavior unchanged. Tests: 12 helper unit tests + 5 CLI integration tests + E2E (interleaved tool-call transcript, degenerate/multimodal seams, real handler path). * feat(model-picker): group multi-endpoint providers under one row The interactive provider pickers (hermes model, setup wizard, Telegram /model) listed every provider slug flat, so vendors with several endpoints (Kimi/Moonshot, MiniMax, xAI Grok, Google Gemini, OpenAI, OpenCode, GitHub Copilot) each occupied multiple top-level rows. Now related slugs fold into one top-level row that drills down to the specific endpoint. - models.py: add PROVIDER_GROUPS table + group_providers() fold (display only — CANONICAL_PROVIDERS, slugs, --provider, /model <provider:model> all unchanged and individually addressable). 
- hermes model (main.py): group rows drill into a member sub-picker, then dispatch to the existing _model_flow_* unchanged. setup wizard inherits it. - Telegram /model: new mpg:<group> callback expands to member mp:<slug> buttons; single authenticated member degrades to a direct button. - Grouping is the single shared fold across all three surfaces. Validation: 163 targeted tests pass; E2E confirms group->member->model resolves to the correct concrete slug for all families.	2026-05-30 01:41:33 -07:00
LeonSGP43	14517ac1f5	fix(update): export launcher virtualenv to uv	2026-05-30 01:41:29 -07:00
teknium1	8e5a6854c3	fix(kanban): align recompute_ready guard with breaker's configured failure_limit Follow-up to the budget-exhaustion recovery fix. recompute_ready's new circuit-breaker guard resolved its effective limit from per-task max_retries -> DEFAULT_FAILURE_LIMIT, skipping the dispatcher's configured kanban.failure_limit. _record_task_failure resolves max_retries -> failure_limit(config) -> DEFAULT, so the two disagreed whenever an operator set kanban.failure_limit != 2: - config > 2: a task could get stuck at DEFAULT(2) before reaching its allowed retry count. - config < 2: a task the breaker already blocked could be auto-recovered back to ready, defeating the stricter limit. Thread the dispatcher's failure_limit through dispatch_once into recompute_ready so the guard and the breaker share one resolution order. Updated test_circuit_breaker_block_still_auto_promotes (it asserted a failures=5 block auto-recovers and resets the counter — that's the pre-#35072 behavior the loop fix removes); it now exercises a below-limit transient block, with the at-limit case covered in test_kanban_db.py. Added two tests for the config-tier and per-task override resolution.	2026-05-30 01:40:57 -07:00
liuhao1024	6ab71d3bb4	fix(kanban): prevent infinite retry loop when worker exhausts iteration budget recompute_ready() previously reset consecutive_failures to 0 when auto-recovering a blocked task. This defeated the circuit-breaker: a task that repeatedly exhausted its iteration budget would cycle forever (block → auto-recover with counter=0 → respawn → budget exhausted → block → …) with no signal to the operator. Fix: don't auto-recover tasks whose consecutive_failures has reached the effective failure limit (per-task max_retries or DEFAULT_FAILURE_LIMIT). The counter is also preserved across recovery so the breaker can accumulate across cycles. Fixes #35072	2026-05-30 01:40:57 -07:00
teknium1	c70dca3a88	fix(kanban): rebuild legacy TEXT-PK tables to INTEGER AUTOINCREMENT on open Legacy kanban boards (pre-AUTOINCREMENT schema) crashed the gateway notifier on every tick — int(None) on a NULL id in unseen_events_for_sub — silently losing all kanban notifications. CREATE TABLE IF NOT EXISTS skips existing tables regardless of schema and _add_column_if_missing only adds columns, so neither could fix a drifted primary-key type. _rebuild_drifted_tables() detects the legacy shape via PRAGMA table_info and rebuilds task_events/task_comments/task_runs (TEXT PK -> INTEGER AUTOINCREMENT) and kanban_notify_subs.last_event_id (TEXT/NULL -> INTEGER NOT NULL DEFAULT 0), preserving data. The whole pass is one transaction so an interruption can't leave a table half-renamed, and recreates every index DROP TABLE would otherwise take down (including idx_events_run). Co-authored-by: liuhao1024 <liuhao1024@users.noreply.github.com>	2026-05-30 01:40:49 -07:00
teknium1	16882cfded	refactor(tui): simplify base64 clipboard write to a stdin flag The per-entry psScript callback was identical for every PowerShell entry, so the function-valued union member added structure without behavior. Collapse WriteCmd to a plain stdin boolean and apply the one shared base64 script in the write loop. Document the CP936 root cause inline. Co-authored-by: BROCCOLO1D <279959838+BROCCOLO1D@users.noreply.github.com>	2026-05-30 01:40:44 -07:00
annguyenNous	64998fa93e	fix(tui): use base64 encoding for PowerShell clipboard writes to preserve UTF-8 When writing text to the clipboard via PowerShell (WSL2 and native Windows), the previous implementation piped text through stdin using `Set-Clipboard -Value $input`. PowerShell reads stdin using the Windows system's default ANSI code page (e.g. CP936 for Chinese Windows), causing all non-ASCII characters (CJK, emoji, accented) to become garbled. Fix: encode the text as base64 in Node.js and pass it as a command argument. PowerShell decodes it from base64 using explicit UTF-8, bypassing the code page issue entirely. Fixes #35107	2026-05-30 01:40:44 -07:00
Teknium	b4cf114f68	fix(vision): fail fast on non-retryable image download errors (#35221) _download_image() wrapped every download attempt in a blanket `except Exception` and retried 3x with 2s/4s/8s backoff regardless of cause. A 404/403 image URL would never resolve on retry, so it just burned up to 6s of wall-clock + extra GETs before failing, inflating latency for a deterministic failure (issue #32296, umbrella #35114). Add _is_retryable_download_error(): 4xx client errors (except 429), website-policy PermissionError, and too-large/SSRF ValueError now raise on the first attempt. 429, 5xx, and unclassified network errors stay retryable. Removed the now-unreachable fall-through branch since the loop always returns on success or re-raises on the final/terminal attempt.	2026-05-30 01:40:39 -07:00
kshitij	e481b15333	Merge pull request #35216 from kshitijk4poor/fix/agents-nudge-single-delegate fix: surface /agents nudge for single-delegate fan-out (TUI + CLI)	2026-05-30 00:57:15 -07:00
kshitijk4poor	9d2571c86a	fix: surface /agents nudge while delegate_task is in-flight (TUI + CLI) The subagent spawn-observability overlay added a `(/agents)` hint, but only on the standalone "Spawn tree" panel, gated behind `!inlineDelegateKey` — it never showed for a single delegate_task call, and only appeared once subagents had already registered. A nudge that arrives at the end (or only after spawn) is useless for the actual goal: letting users open the live monitor while delegation is running. Surface it the moment delegation starts, on both surfaces: TUI (ui-tui/src/components/thinking.tsx) - Show `(/agents)` on any "Delegate Task" tool group as soon as it appears (in-flight, before any subagent registers), not gated on subagents already existing. Same `startsWith('Delegate Task')` predicate already used for delegateGroups. CLI (agent/tool_executor.py) - Append `· /agents to monitor` to the delegate spinner label, which is displayed for the full duration of the delegate_task call. The previous attempt put the hint on the completion line (get_cute_tool_message), which only renders after the call finishes — reverted. TUI tsc clean (pre-existing execFileNoThrow type errors unrelated); subagentTree 35/35; display.py reverted to upstream.	2026-05-30 13:22:45 +05:30
Teknium	bb79bcde61	fix: detect pyproject.toml / __init__.py version drift in hermes doctor (#35142) A git conflict resolution (reset --hard or merge) can revert hermes_cli/__init__.py to a stale __version__ while pyproject.toml stays current, so 'hermes --version' silently reports the wrong version. Nothing cross-checked the two files. Add a version-consistency check to the doctor 'Python Environment' section: reads the [project] version from pyproject.toml and compares it to hermes_cli.__version__. Reports OK when they match, fails with a re-sync hint when they drift, and is a silent no-op for installed wheels where pyproject.toml isn't present. Closes #35070	2026-05-30 00:32:05 -07:00
teknium1	e5765e61fa	chore(release): map wei.chen.coder@gmail.com -> wenchengxucool	2026-05-30 00:30:55 -07:00
weichengxu	84ee80eb5d	feat: set process title to 'hermes' in ps/top/htop Adds _set_process_title() in hermes_cli/main.py, called first thing in main(). Tries setproctitle (optional) for a full ps-args rewrite, then falls back to ctypes prctl(PR_SET_NAME) on Linux / pthread_setname_np on macOS. No-op on Windows and on any failure. No new dependency: the setproctitle path is best-effort via ImportError guard. Fixes #35108	2026-05-30 00:30:55 -07:00
teknium1	17103a1f11	chore: add SeaXen to AUTHOR_MAP for salvaged PR #33278	2026-05-30 00:23:44 -07:00
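
The fail-fast rule described in commit b4cf114f68 can be illustrated with a small sketch. This is not the repository's code: `HttpStatusError`, `is_retryable_download_error`, and `download_with_retry` are hypothetical names, the HTTP error type stands in for whatever the real client raises, and the 2s/4s backoff between three attempts is read off the commit message.

```python
import time
from typing import Callable


class HttpStatusError(Exception):
    """Hypothetical stand-in for an HTTP client's status-code error."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_retryable_download_error(exc: Exception) -> bool:
    """Classify a download failure as worth retrying or not."""
    # Policy blocks (PermissionError) and too-large/SSRF rejections
    # (ValueError) are deterministic: retrying cannot change the outcome.
    if isinstance(exc, (PermissionError, ValueError)):
        return False
    if isinstance(exc, HttpStatusError):
        if exc.status_code == 429:  # rate limited: may succeed later
            return True
        # Other 4xx client errors are terminal; 5xx stays retryable.
        return not (400 <= exc.status_code < 500)
    # Unclassified errors (timeouts, connection resets) stay retryable.
    return True


def download_with_retry(
    fetch: Callable[[str], bytes],
    url: str,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch(url), retrying only errors classified as retryable."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:
            # Re-raise on the final attempt or on a terminal error.
            if attempt == max_attempts or not is_retryable_download_error(exc):
                raise
            sleep(2 ** attempt)  # 2s, then 4s
```

Passing `sleep` as a parameter is a choice made for this sketch so the backoff can be observed without waiting; with it, a 404 costs one GET and no delay, while a 503 is attempted three times.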

10069 Commits
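
Commits 6ab71d3bb4 and 8e5a6854c3 describe a shared resolution order for the kanban failure limit (per-task max_retries, then the configured kanban.failure_limit, then DEFAULT_FAILURE_LIMIT) and a guard that refuses to auto-recover a task once its counter reaches that limit. A minimal sketch of that logic follows. The function names and signatures are hypothetical, and the default value of 2 is inferred from the "DEFAULT(2)" wording in the commit message rather than taken from the code.

```python
from typing import Optional

# Assumed value, inferred from "DEFAULT(2)" in the commit message.
DEFAULT_FAILURE_LIMIT = 2


def effective_failure_limit(
    max_retries: Optional[int],
    config_limit: Optional[int],
) -> int:
    """Resolve the limit: per-task override, then config, then default."""
    if max_retries is not None:
        return max_retries
    if config_limit is not None:
        return config_limit
    return DEFAULT_FAILURE_LIMIT


def may_auto_recover(
    consecutive_failures: int,
    max_retries: Optional[int] = None,
    config_limit: Optional[int] = None,
) -> bool:
    """Allow auto-recovery only while the task is below its limit."""
    limit = effective_failure_limit(max_retries, config_limit)
    return consecutive_failures < limit
```

Because the guard and the breaker would both call the same resolver in this sketch, a configured limit of 5 lets a task with two failures recover, while a configured limit of 1 keeps a task with one failure blocked.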