f736d2be86b8a76d2ced3d41ced8b172c24a1eeb
980 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
| e7bc6189cf |
feat(cli): resume relaunches in the directory the session was started from (#38562)
hermes -c / --resume now reopen a session in its original working directory. The sessions table already had a cwd column; the classic CLI just never wrote or read it. - run_agent._ensure_db_session stamps cwd for local CLI sessions only (new _launch_cwd_for_session gates out gateway/cron and non-local terminal backends, where a host cwd is meaningless to restore). - cli._restore_session_cwd chdir's the process AND retargets TERMINAL_CWD so the terminal tool, code-exec tool, and relative-path resolution all land in the restored dir. Called from both resume paths (interactive run() and the -q single-query path). - Robust degradation: no-op when no cwd recorded, when already there, or when the dir is gone (single dim warning, stays put — no crash). |
|||
| e8c3ac2f5c |
fix: strip extra_content from tool_calls for strict APIs (Fireworks, Mistral)
Fireworks/Mistral reject HTTP 400 'Extra inputs are not permitted, field: messages[N].tool_calls[M].extra_content' on any session whose history contains prior Gemini tool calls. Gemini 3 thinking models attach extra_content (thought_signature) to tool_calls; it survived to the wire because the sanitize paths only stripped call_id/response_item_id. Strip extra_content from the outgoing wire copy in both sanitize paths (ChatCompletionsTransport.convert_messages + _sanitize_tool_calls_for_strict_api), but gate it on the target model: keep extra_content for Gemini-family targets (the thought_signature MUST be replayed or Gemini 400s), strip it for everyone else — including non-Gemini models that inherit a stale Gemini signature earlier in a mixed-provider session. Native Gemini is unaffected (GeminiNativeClient bypasses these paths). Original stored history is never mutated (only the per-call copy). Fixes #17986. |
|||
| 0d9b7132ff |
feat(observability): observer-grade telemetry hooks + NeMo-Relay plugin
Adds backend-neutral observer hooks for plugins: session, turn, API request, tool, approval, and subagent lifecycle events with stable correlation IDs (session_id, task_id, turn_id, api_request_id, tool_call_id, parent/child subagent ids). Extends VALID_HOOKS with api_request_error and subagent_start. Hot path is zero-cost when no plugin subscribes: has_hook()/presence checks gate all payload construction, request payloads are returned by reference when no middleware rewrites, and the sanitized response payload no longer embeds raw response objects. Bundles the optional NeMo-Relay observability plugin (plugins/observability/nemo_relay) as an in-repo consumer of the new hooks, peer to the existing langfuse plugin. Fails open when the optional nemo-relay package is not installed. Authored-by: Bryan Bednarski <bbednarski@nvidia.com> Salvaged from #29722 onto current main. |
|||
| 023149f665 |
fix(agent): stop reporting broken streams as output-length truncation (#36705)
A stream that drops mid-response after tokens are delivered (peer-closed connection, stale-stream reconnect) is converted into a synthetic finish_reason="length" stub. The conversation loop treated that network stall as a max-output-tokens truncation: when the dropped content was a tool call it retried exactly once, then hard-failed with "Response truncated due to output length limit" — even on large-output models that never hit any cap (e.g. Opus). - Tool-call truncation now retries up to 3 times (was 1) with a progressive max_tokens boost, and is stub-aware: a PARTIAL_STREAM_STUB_ID stall prints "Stream interrupted mid tool-call — retrying (n/3)" instead of the false "model hit max output tokens", and the give-up message distinguishes a network drop from a real truncation. - Length-continuation retries preserve the original request's output cap as a floor, so a high provider/model default isn't silently downshifted to 8K/12K on retry. - Added _requested_output_cap_from_api_kwargs() helper. Tests: stub-stall mid-tool-call recovery within 3 retries; continuation preserves a large provider-default output cap. Fixes #26425. Salvages the substance of #26427 (cap floor) and #9525 (retry bump), adapted to the post-refactor conversation_loop.py which handles all three api_modes uniformly. Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com> Co-authored-by: ygd58 <ygd58@users.noreply.github.com> |
|||
| 51c68d4ab1 |
Add Hermes desktop app (#20059)
* feat: better composer etc * docs: add desktop and dashboard run instructions * fix(desktop): address security scan findings * fix(dashboard): resolve @nous-research/ui path under npm workspaces The sync-assets prebuild step shelled out to 'cp -r node_modules/@nous-research/ui/dist/fonts ...' with a path relative to apps/dashboard/. That works only when the dep is installed locally in the dashboard workspace, but 'npm install' at the repo root (the documented setup — see apps/desktop/README.md) hoists shared deps to the root node_modules under npm workspaces. The relative cp then fails with 'No such file or directory', sync-assets exits 1, the Vite build aborts, and 'hermes dashboard' surfaces a generic 'Web UI build failed' message. Replace the shell one-liner with scripts/sync-assets.cjs, which walks up from the dashboard directory looking for node_modules/ @nous-research/ui — working in both the hoisted (workspaces) and co-located (standalone) layouts. Also guards against a missing dist/fonts or dist/assets with a clearer error pointing at a rebuild of the UI package rather than silently copying nothing. * feat(desktop): support connecting to a remote Hermes backend Add HERMES_DESKTOP_REMOTE_URL and HERMES_DESKTOP_REMOTE_TOKEN env vars that, when set, short-circuit the local-child spawn in startHermes() and connect the Electron renderer to an already- running 'hermes dashboard' server reachable over the network. Motivating use case: WSL2 users who want to run the Hermes core (agent loop, tools, filesystem access) inside their WSL distribution while rendering the Electron GUI on native Windows. Before this change, the desktop app always spawned a local Python child on the same host as the renderer, which doesn't cross the WSL/Windows boundary. The remote path reuses waitForHermes() as a liveness probe (/api/status is in the backend's public endpoint allowlist), so the connection is only returned once the backend is actually ready. WebSocket URL derivation picks ws:// or wss:// based on the input scheme. URL validation rejects non-http(s) schemes and requires both env vars together to avoid a half-configured connection that would silently fall through to the spawn path. No behaviour change when the env vars are unset — the default local-spawn flow is untouched. Typical usage: # in WSL2 hermes dashboard --tui --no-open --host 0.0.0.0 --port 9119 --insecure # on Windows set HERMES_DESKTOP_REMOTE_URL=http://localhost:9119 set HERMES_DESKTOP_REMOTE_TOKEN=<session token> set HERMES_DESKTOP_IGNORE_EXISTING=1 (launch Hermes desktop) * ci(desktop): automate desktop releases Add GitHub Actions release channels for signed desktop installers and document the stable/nightly download paths. * feat: file tabs * refactor(desktop): tighten right-rail tab close API Promote closeRightRailTab/closeActiveRightRailTab as the single public entry point. Drops the activeTabRef + handleCloseDocument indirection in ChatPreviewRail, the unused $rightRailHasContent atom, and the legacy dismissFilePreviewTarget alias. -70 LOC. * feat(desktop): polish composer pill toward reference look Solid foreground-on-background send/voice-conversation circle (black-on-white in light, white-on-black in dark) anchors the right edge as the primary CTA instead of the orange theme primary. Bumps the primary control to 2.125rem so it visually outranks the ghost mic/plus controls. Opens up the surface padding (0.625rem x / 0.5rem y) so the input row breathes around its controls, and nudges the corner radius from 20 to 24px for a slightly pill-ier silhouette. LiquidGlass distortion is preserved. * feat(desktop): add startup and onboarding flow Add phase-based desktop boot progress, fresh-install sandbox testing, and first-run provider credential onboarding so packaged installs can start cleanly without manual settings detours. * fix(desktop): gate prompts on provider setup Show the desktop provider onboarding flow before prompt submission when no inference provider is configured, preventing fresh installs from falling through to backend credential errors. * fix(desktop): surface provider onboarding from session warnings Propagate credential warnings through session runtime info and open desktop onboarding whenever a session reports no usable provider, so unconfigured installs cannot fall through to prompt errors. * fix(desktop): route gateway provider errors to onboarding The "No inference provider configured" auth error reaches the renderer through gateway error events, not the prompt.submit promise; the previous patch only caught the latter, so the error toast still surfaced and onboarding never opened. Also strip credential-shaped env vars from the test:desktop:fresh sandbox so the packaged backend can't see provider keys leaking from the launching shell. * fix(desktop): use strict runtime check to drive onboarding setup.status returned True whenever any provider auth state was discoverable, including indirect fallbacks like a gh-CLI Copilot token. That made desktop think the user was set up while the agent's actual resolve_runtime_provider call still raised AuthError, leaving the user with a useless toast and no onboarding. Add a setup.runtime_check gateway method that runs the same resolver the agent uses on session creation, and switch the desktop onboarding overlay and prompt precheck to use it. * feat(desktop): OAuth-first onboarding using existing dashboard provider API Replace the engineer-flavored API key form with a Sign-in-first onboarding overlay that uses the dashboard's existing /api/providers/oauth catalog and PKCE/device-code endpoints (Anthropic, Nous, OpenAI Codex, etc.). API key entry is now a fallback tab with friendly provider names instead of env var prefixes, and the loud raw resolver error is gone in favor of a one-line welcome message. * fix(desktop): polish onboarding provider list Reorder OAuth providers so Nous Portal is first, give the segmented Sign in / API key control equal column widths, and replace the engineer-flavored backend names like "Anthropic (Claude API)" / "MiniMax (OAuth)" with friendlier in-app titles. External-CLI providers now show a softer subtitle and an external-link icon instead of a chevron. * refactor(desktop): split onboarding overlay into store + view Move the OAuth state machine, runtime check, copy-to-clipboard, and api-key save into store/onboarding.ts (matching the boot.ts pattern), leaving the overlay as a presentation layer that subscribes via useStore. Tabs are now table-driven, child panels read flow from the store instead of prop-drilling, and the polling/PKCE/error/success branches share a small Status atom. * fix(desktop): external CLI providers + center mode tabs External-CLI providers (Claude Code, Qwen Code) now open an in-overlay panel with the CLI command, copy button, and an "I've signed in" recheck instead of firing an invisible toast. Center the Sign in / API key tab control so it sits under the heading instead of hugging the left edge. * fix(desktop): drop onboarding tabs for an inline link, group device-code waiting state Replace the Sign in / API key tab pair with an "I have an API key" footer link under the OAuth provider list, with a "Back to sign in" affordance inside the API key form. Group the device-code "Waiting for you to authorize..." status next to the Cancel button so the alignment matches the action. * refactor(desktop): tighten onboarding store + overlay Drop the dead isOnboardingBusy/BUSY set, factor the catch-fallback dance into safeReq, and share a single reloadAndConnect helper between PKCE submit, device-code success, external recheck, and api-key save. In the overlay, extract Step / CodeBlock / FlowFooter / CancelBtn / DocsLink atoms so the four sign-in panels share the same chrome instead of repeating it inline. Net effect: fewer literal divs, one place to touch the spacing, and the code-block + footer rows are reusable across future flows. * fix(desktop): mount onboarding from frame 1 to kill the FOUT Default onboarding.configured to null (unknown until the runtime check resolves) and have the onboarding overlay render whenever it's not yet confirmed true. The boot overlay now yields to it, so the very first paint is the Welcome card with a "While we get you set up..." progress strip instead of a flash of the chat shell between boot dismiss and onboarding mount. The picker swaps in cleanly once the gateway opens and the runtime check confirms the user is not configured. Already-configured users see the same prep card briefly while their existing runtime warms up, then the overlay dismisses without touching the chat shell. * fix(desktop): top-align empty sessions placeholder The "Start a chat to build your history." empty state used a min-h-35 grid place-items-center container, which floated the text in a tall dead zone. Render it as a flat paragraph that sits right under the section header like the empty pinned state does. * refactor(desktop): drop dead boot overlay Onboarding overlay subsumes the boot card now that it mounts from frame 1 and renders boot progress inline. The standalone DesktopBootOverlay is unreachable in every flow (yields whenever onboarding has not confirmed configured, dismisses once it has). * fix(desktop): hide pinned/recents sections until first session A fresh sidebar showed the Pinned and Recent chats headers with floating empty-state copy underneath. Drop both sections (and the now-orphan SidebarEmptySessionState) when there are no sessions yet — they reappear after the first chat. Skeletons during initial load are unchanged. * feat(gui): route embedded TUI through dashboard gateway (#21979) Inject HERMES_TUI_GATEWAY_URL into dashboard PTY sessions so embedded ui-tui instances attach to the in-process websocket gateway, with coverage for the new env wiring. * Add desktop remote gateway settings Make the desktop gateway connection configurable from settings so local remains the default while remote backends can be saved, tested, and applied without environment variables. * feat(gui): first-class Messaging page + gateway menu redesign - Add Messaging page to the desktop app with per-platform setup, status, and inline guidance. Catalog derives from gateway.config Platform enum + plugin registry, so every messaging adapter the CLI supports (Telegram, Discord, Slack, Mattermost, Matrix, WhatsApp, Signal, BlueBubbles, Home Assistant, Email, SMS, DingTalk, Feishu, WeCom, Weixin, QQ, Yuanbao, API server, Webhooks, plugins) shows up without per-platform code. - New REST endpoints: GET /api/messaging/platforms, PUT and POST /test on the same path. Secrets go through the existing .env pipeline; enable/disable writes config.yaml. - Replace gateway statusbar dropdown with a richer panel: status row, icon-only restart + system-panel actions, recent activity (with timestamps trimmed in display, full text on hover), platform list. - Auto-poll the messaging page every 6s (paused when hidden) so status updates without a manual check. - Drop Settings / Command Center from the sidebar nav (still reachable via shortcuts and the titlebar cog). - Flatten top corners on Messaging/Skills/Artifacts/Chat panes. - Share new StatusDot component across messaging + gateway menu. - Fix gateway/config.py so an explicit platforms.<name>.enabled=false in config.yaml is honored when env tokens are present. - pb-9 on the chat content area for breathing room above the composer. * Potential fix for pull request finding 'CodeQL / Clear-text logging of sensitive information' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * pin electron version * hide application menu on non-mac systems * interpret compactPreview for non-string vlaues as JSON or an empty string * fix(desktop): keep composer contenteditable mounted across stacked toggle The composer rendered {input} inside two different parent fragments depending on `stacked`. When auto-expand flipped `stacked` (e.g. the moment typed text wrapped past two lines), React reconciled the two branches as different positions and unmounted/remounted the contenteditable. The fresh mount started empty, so any in-flight characters — most reliably reproduced by holding a key — were lost. Replace the conditional with a single CSS Grid whose template-areas swap on `stacked`. The three children (menu, input, controls) keep stable identities across the toggle; only their grid placement changes, which the browser handles without React tearing down the editor. * refactor(desktop): align install layout with install.ps1 / install.sh Make the desktop app's runtime layout match what scripts/install.ps1 and scripts/install.sh produce, so a desktop-only user and a CLI-only user end up with the same files in the same places and can share one install. Layout - ACTIVE_HERMES_ROOT = HERMES_HOME/hermes-agent (was: process.resourcesPath/hermes-agent, read-only) - VENV_ROOT = HERMES_HOME/hermes-agent/venv (was: userData/hermes-runtime) - desktop.log = HERMES_HOME/logs/desktop.log (was: userData/desktop.log) - HERMES_HOME default: %LOCALAPPDATA%\hermes on Windows, ~/.hermes elsewhere The packaged .app/.exe still ships a read-only payload at process.resourcesPath/hermes-agent (FACTORY_HERMES_ROOT). On first launch or after an installer-driven upgrade we sync factory -> active, then provision the venv and run pip install -e . against the active root. Key behaviors - Pin HERMES_HOME in the spawned Python's env so get_hermes_home() resolves to the same path resolveHermesHome() picked. Without this, Python falls back to ~/.hermes on every platform - fine on mac/linux, a split-state bug on Windows where our default is %LOCALAPPDATA%\hermes. - Detect developer installs by .git presence at ACTIVE; never overwrite a user's checkout via factory sync. - Marker at ACTIVE/.hermes-desktop-runtime.json (schema v4) tracks pyproject hash + factory version + runtime schema version. depsFresh fast-paths when nothing changed. - Dev (npm run dev) prefers SOURCE_REPO_ROOT over ACTIVE so devs run their local edits, not whatever's under HERMES_HOME. - Better error messages distinguish "no payload" from "no Python". - Preserve a legacy ~/.hermes on Windows when no %LOCALAPPDATA%\hermes exists, so users with prior pip/manual installs aren't orphaned. pyproject.toml - Promote fastapi, uvicorn[standard], ptyprocess (non-Windows), and pywinpty (Windows) to main dependencies. The dashboard backend (hermes dashboard) needs them at runtime; the previous lazy-import fallback was a footgun for fresh installs. - Empty the [pty] optional-extra; kept as a no-op back-compat alias for any existing pip install hermes-agent[pty] invocations. Drops the hardcoded BUNDLED_RUNTIME_REQUIREMENTS list in main.cjs - the desktop now installs whatever pyproject.toml says, single source of truth. Files - apps/desktop/electron/main.cjs: runtime layout, HERMES_HOME pin, factory->active sync, marker v4 - apps/desktop/scripts/test-desktop.mjs: track new venv location - apps/desktop/README.md: new Setup, Runtime Bootstrap, and Debugging sections - pyproject.toml: fastapi/uvicorn/pty backends in main dependencies; [pty] extra emptied Tested locally on Windows: npm run dev boots cleanly, sessions land at the new location, type-check + lint + test:desktop:platforms all pass. Verified end-to-end on a fresh Win11 VM via dist:win installer. Known gaps (filed as follow-ups, not in this PR): - Skills not seeded on packaged installs (sync_skills only runs in cmd_chat, not cmd_dashboard). Need to move to shared pre-dispatch. - Git Bash not bundled or detected; agent's terminal tool errors out with a useful message but desktop bootstrapper should pre-flight it. - install.ps1 / install.sh should be decomposed into composable phase libraries so the desktop bootstrapper can reuse them as a single source of truth across all install surfaces. * feat(desktop): theme polish, prose chat typography, composer chrome - DS tokens/midground, Backdrop, scoped scrollbars, typography plugin + prose - Composer liquid/radius utilities, thread font parity, tool/thinking cues - File tree label scale, preview flex, thread retry loading + streaming tests * feat(desktop): NSIS prereq detection page + auto-install via winget The packaged Windows installer now detects Python 3.11+ and Git for Windows at install time and offers to install missing prereqs via winget. Mirrors the prereq logic scripts/install.ps1 already runs for CLI installs, so desktop installer users get the same out-of-the-box experience as install.ps1 users. Why - Hermes' terminal tool calls bash.exe directly (tools/environments/ local.py); on Windows that's Git Bash from Git for Windows. Without it, the agent fails on the first terminal() call. - Hermes' Python runtime needs 3.11+. Without it, the desktop bootstrapper errors out at venv creation. - Both gaps surfaced on a fresh Windows 11 VM smoke test: VM had Python pre-installed but no Git, so the agent's first terminal call failed with "Git Bash isn't installed." - install.ps1 has had Install-Git + Install-Uv functions for ages. The desktop installer was the asymmetric outlier. How — NSIS prereq page - New file: apps/desktop/installer/prereq-check.nsh (plugged into electron-builder via build.nsis.include) - Real Wizard page using nsDialogs, inserted via customPageAfterChangeDir hook (between the Directory page and InstFiles). - Group boxes for Python and Git, each showing detection status. - Pre-checked install checkboxes when winget is available. - Auto-skips silently if both prereqs are already installed. - Falls back to manual download URLs when winget itself is missing. - Detection: - Python: probes `py -3.11`/`-3.12`/`-3.13`/`-3.14` via the Python launcher. Microsoft Store "Python stub" (no py.exe) is correctly classified as not-installed. - Git: `where git`. - winget: `where winget` (Win10 1809+ / Win11 with App Installer). - Install execution (in customInstall macro): - Python: nsExec::ExecToLog with `--scope user --silent`. Per-user install, no UAC prompt, output streams to install log. - Git: ExecShellWait via Windows ShellExecute. Critical because Git always installs per-machine and triggers UAC; ShellExecute preserves the foreground focus chain across non-elevated → elevated process spawns, so UAC actually comes to the foreground. nsExec::ExecToLog breaks the chain because winget runs hidden. - Both pass `--disable-interactivity --accept-package-agreements --accept-source-agreements` to suppress winget's own dialogs. - Verification: probes Git's standard install locations via FileExists rather than `where git`. NSIS's process inherits PATH at startup, so a freshly-installed Git won't be visible to `where` until restart. - Silent installs (/S) skip the prompts; managed deploys handle prereqs out-of-band via Group Policy / Intune. How — Electron-side safety net - New findGitBash() in main.cjs, parallel to findSystemPython(). Probes the same locations as tools/environments/local.py:_find_bash() so a positive result here means the agent's terminal tool will work. - ensureRuntime now throws a clear, actionable error on Windows when Git Bash isn't found, matching the existing "Python 3.11+ is required" error path. - Catches users the NSIS page doesn't: .msi installer users (NSIS prereq page doesn't run for MSI), `npm run dev` users, manual installers, anyone who unchecked the install boxes on the NSIS prereq page. - All gated on `IS_WINDOWS`; macOS / Linux unaffected. NSIS build issue (resolved) - electron-builder defaults to `-WX` (warnings as errors). NSIS optimizer emits "warning 6010: function not referenced" for our page functions because Page custom directives don't count as references in its static-analysis pass. The functions ARE called at runtime when NSIS invokes the page; the optimizer just can't see it statically. - Set `build.nsis.warningsAsErrors=false` in package.json so this spurious warning doesn't fail the build. (Documented option from electron-builder's nsisOptions.) Out of scope (filed for future work) - MSI prereq detection: Windows Installer custom actions are a different mechanism. Enterprise deploys typically handle prereqs via GP/Intune. - Bundle PortableGit + python-build-standalone in extraResources for zero-network installs. ~80MB increase. - Mac / Linux GUI prereq flows (different installer formats; Xcode CLT covers most macOS prereqs already; Linux is per-distro hard). Files - apps/desktop/installer/prereq-check.nsh (new, ~290 lines NSIS) - apps/desktop/package.json (build.nsis.include + warningsAsErrors) - apps/desktop/electron/main.cjs (findGitBash + preflight) - apps/desktop/README.md (Runtime prerequisites section) Cross-platform impact - macOS / Linux builds (dist:mac, dist:mac:dmg, dist:mac:zip): nsis config is ignored entirely; .nsh is dormant. - npm run dev: .nsh dormant; main.cjs preflight gated on IS_WINDOWS. - scripts/install.ps1, scripts/install.sh: no reference to any new files; CLI install paths untouched. - Hermes CLI / dashboard / gateway: no reference; runtime untouched. - All checks: node --check on main.cjs and test-desktop.mjs pass; npm run test:desktop:platforms 4/4 passing; node --test green. Tested - npm run dist:win produces signed .exe and .msi without errors. - Fresh Win11 VM (Python pre-installed, no Git): prereq page renders, Python check shows detected, Git checkbox pre-checked. Click Next → Git installs via winget with UAC prompt in foreground. - After install completes, Hermes launches and the agent's terminal tool can run bash commands. Verified Git Bash is detected at `C:\Program Files\Git\bin\bash.exe` by ensureRuntime's preflight. * feat: theme changes, composer tweaks, in app update ux, finesse * fix(cli): seed bundled skills on dashboard + gateway entrypoints `sync_skills(quiet=True)` was only being called from inside `cmd_chat`, which meant `hermes dashboard` (the desktop GUI's backend) and `hermes gateway` (Telegram/Discord/Slack/etc daemons) never seeded the bundled skill library into ~/.hermes/skills/. This surfaced as "No skills found" in the desktop GUI's skills panel on fresh installs, despite the agent having access to the full bundled library when invoked via `hermes chat`. scripts/install.ps1 worked around it by running skills_sync.py as part of Copy-ConfigTemplates, but that's not part of the desktop installer's bootstrap chain. Fix - Extract the skills-sync block from cmd_chat into a module-level `_sync_bundled_skills_quietly()` helper. - Call the helper from cmd_chat (preserving existing behavior), cmd_dashboard (after the --status/--stop early-return paths and fastapi import check, so we don't run skills_sync on management commands or when deps aren't installed), and cmd_gateway. Why these three entrypoints - cmd_chat: the user's primary CLI entrypoint - cmd_dashboard: the desktop GUI's backend; this is what `hermes dashboard --tui` invokes when the desktop bootstrapper spawns Hermes - cmd_gateway: long-running daemons where the user expects the agent to have full skill access Other entrypoints (cmd_config, cmd_doctor, cmd_login, cmd_status, etc.) are management commands that don't need skill discovery and were never running skills_sync in the first place — leaving them alone. Idempotence - tools/skills_sync.py is manifest-based: skipped skills cost milliseconds. Calling it from multiple entrypoints adds no real cost, and users running `hermes chat` then `hermes dashboard` get two fast no-ops on the second call. Failure handling - Helper wraps skills_sync in try/except. Skills are an enhancement, not a hard dependency — Hermes runs fine with an empty skills/ dir. Files - hermes_cli/main.py: + new helper `_sync_bundled_skills_quietly()` at module level + cmd_chat: replace inline block with helper call + cmd_dashboard: add helper call after fastapi import succeeds + cmd_gateway: add helper call before delegating to gateway_command * feat(desktop): hoisted todo widget, JSON tool summaries, history grouping & timer fixes - Hoist todo to first-class widget (shadcn checkboxes, brand colors, no tool-accordion). Header derives label from active task; non-active rows fade. - Replace raw JSON dumps with structured key/value summaries via formatToolResultSummary; nested error extraction for clearer failures. - Fix loaded-session grouping: stitch interleaved assistant/tool iterations into one bubble instead of orphaned synthetic messages. - Stable tool/thinking timers via keyed registry so unmount/scroll doesn't reset elapsed counts; gate "running" on real live thread state. - Reorganize chat-only assistant-ui components under components/chat/. * fix(desktop): address CodeQL alerts on PR #20059 - settings/helpers.ts: harden setNested against prototype pollution. POLLUTING_PATH_PARTS check is now applied at every assignment site (loop + leaf) and uses Object.defineProperty so CodeQL can see the guard inline rather than via a helper function call. - lib/markdown-preprocess.ts: rebuild the dangling-fence close regex from a fence-char + length instead of marker.replace(...). The marker is captured by `(`{3,}|~{3,})` so it can only be backticks or tildes, but CodeQL was tracing tainted input text into the RegExp source and flagging hostname dots from input as part of the pattern (false positive js/incomplete-hostname-regexp on the test fixture URLs). Reconstructing from a literal char breaks the dataflow. - scripts/notarize-artifact.cjs: drop args from the run() rejection message. Args carry --key-id / --issuer / key file path; the existing outer catch already squashes errors to a generic line, but CodeQL was flagging the args.join(' ') as clear-text logging of APPLE_API_KEY_ID. Composer DOM-text-as-HTML alerts (composer/index.tsx:379, :547) are already addressed in 4dd9732a9 — innerHTML assignment was replaced with renderComposerContents which builds DOM via replaceChildren / append text nodes (no HTML interpretation). * fix(desktop): inline prototype-pollution guard so CodeQL sees it CodeQL's dataflow doesn't follow the helper-function guard inside `safeSet`, so it kept flagging Object.defineProperty as prototype- polluting. Inline the literal `__proto__`/`constructor`/`prototype` check at the assignment site to break the dataflow. Behavior unchanged — same set of disallowed keys, same throw. * feat(ui-tui): resolve links to readable page titles Mirror desktop pretty-link behavior in the TUI by resolving HTTP links to page titles with shared caching and safe fetch filters, plus slug-based fallbacks so chat links stay readable even when title fetch fails. * fix(desktop): drop RegExp from dangling-fence close detection Previous attempt tried to break the dataflow by reconstructing the close-fence regex from a literal char + marker.length, but CodeQL still traced marker.length back to input and kept flagging the test-fixture URLs as hostname-regex sources (js/incomplete-hostname-regexp). Replace `new RegExp(...)` + `closeRe.test(body)` with a string-only hasCloseFenceLine() helper that splits on '\n' and uses ===. No regex on this path now, so input data can no longer reach a RegExp source. Behavior preserved: matches lines that are (whitespace + marker + whitespace), which is what the original `\n[ \t]*${marker}[ \t]*(?=\n|$)` matched. All 12 markdown-text tests still pass. * fix(process-registry): suppress windows-footgun false positive on guarded killpg Keep the existing POSIX-only process-group teardown path, but make the signal selection explicit via getattr and add an inline windows-footgun suppression marker on the guarded os.killpg line so the Windows footgun check no longer blocks CI on this intentionally platform-gated code. * feat(desktop): reconcile live tool events, polish thread chrome, harden boot - chat-messages: match tool rows by overlapping query/context/preview values so preview-first `tool.progress` rows reliably adopt later stable-id `tool.start` payloads instead of spawning ghost rows or mis-merging parallel same-name calls; preserve prior args/result across phases. - tui_gateway: emit full args + parsed result on `tool.start` / `tool.complete`, drop redundant `tool.started` re-emit from `tool.progress`. - electron/main: prefer SOURCE_REPO_ROOT before PATH `hermes` in dev so local backend edits actually run; split hardening helpers into `electron/hardening.cjs` with tests. - thread/tool UI: one-shot enter animation keyed by stable ids, braille spinner for running rows, Cursor-like disclosure rows, drill-down + duration/count formatting via new tool-fallback-model. - composer: extract `text-utils`, drop liquid-glass overrides. - right-rail: split preview-pane into preview-console / preview-file. - runtime: incremental external-store runtime + runtime-readiness gate; onboarding store + tests; route-resume hook test. - regression tests for live tool reconciliation (parallel tools, id-less progress, preview-first rows, structured args/results). * feat(desktop): add ripgrep to NSIS prereq page + polish layout Add ripgrep as a third (recommended) prereq alongside Python and Git in the NSIS prereq detection page, and clean up the page layout based on on-VM testing. Why ripgrep - Hermes' search_files tool calls `rg` directly for content + filename search (tools/file_operations.py:1382). Falls back to grep/find from Git Bash when missing — works but slower and noisier (no .gitignore awareness). - ~5MB winget install via `BurntSushi.ripgrep.MSVC --scope user` — no UAC prompt, parallel to how Python installs. - scripts/install.ps1 already installs ripgrep as part of Install-SystemPackages; this brings the desktop installer to parity. Why "recommended" not "required" - Python and Git are hard requirements: without them the agent runtime or terminal tool refuses to start. The bootstrapper preflight throws. - ripgrep is a performance enhancement: missing it just means slower searches. Page wording reflects this; failure to install is logged but doesn't show a MessageBox or block. Layout polish (response to on-VM screenshot review) - Wizard header now correctly reads "System Requirements" instead of the leftover "Choose Install Location" from the previous page. Set via `GetDlgItem $HWNDPARENT 1037/1038` + WM_SETTEXT — the standard NSIS pattern for overriding the page header on a custom Page. - Removed redundant in-body title + verbose intro paragraph; the wizard header IS the title now. Body has one short intro line. - Group boxes tightened to 26u with content positioned just below the groupbox title (not top-anchored status + bottom-anchored checkbox with empty space in the middle). All three panels + footer fit comfortably in 126u, well under the 140u page limit. - Checkbox labels simplified: dropped "(per-user, no admin prompt)" and "(administrator approval required)" suffixes. The footer note still calls out UAC for Git when relevant. - Footer text trimmed to fit cleanly without clipping. Install order (in customInstall macro) - Python → ripgrep → Git - Python and ripgrep are silent and run first; Git's UAC prompt comes last so the user's approval interaction isn't interrupted by silent activity afterwards. Skip behavior unchanged - All three detected → page auto-skips via Abort - Silent install (/S) → customInstall winget block skips - User unchecks all → page advances without running winget Files - apps/desktop/installer/prereq-check.nsh: ripgrep detection block, ripgrep page panel + checkbox, ripgrep customInstall block, GetDlgItem header override, layout reflow - apps/desktop/README.md: Runtime prerequisites section updated to list ripgrep as recommended, with manual winget command * feat(desktop): add model-confirmation step to onboarding After OAuth/API-key login completes, onboarding now shows a confirmation card with the curated default model and a Change button before dropping the user into chat. Closes the gap where the desktop's `model.default` was empty after first launch and the agent had to fall back to whatever heuristic happened to fire — leaving users wondering "why am I getting sonnet-4 when I logged into Nous Portal?" Why - Desktop onboarding only persisted credentials, never `model.default`. The CLI's `hermes model` command pairs provider + model selection, but the desktop's onboarding skipped the model step entirely. - Result: users saw whichever model the agent's auto-fallback picked, unpredictably and undocumented. - For the BUILD demo we want users to land on the model they expect for their provider, with a clear "this is what you're getting" UI and a one-click path to change it before chatting. How - New `confirming_model` flow status carries the just-authenticated provider slug, current default model, label, and a saving flag. - `completeWithModelConfirm()` runs after credentials succeed: reloads env, verifies runtime, fetches /api/model/options to find the curated first-model for the provider, persists it via /api/model/set, then transitions into `confirming_model`. - If anything fails (no providers returned, network error), falls through to the previous behaviour — onboarding completes without the confirm step. Polish, not a hard requirement. - All four credential paths (device_code OAuth, PKCE OAuth, external CLI flow, API key) now use completeWithModelConfirm instead of reloadAndConnect. UI - `ConfirmingModelPanel` shows: green "<provider> connected" banner, card with "Default model: <name>" + Change button, and a "Start chatting" CTA that finalises onboarding. - Reuses the existing `ModelPickerDialog` (the same picker available from the chat shell) for the change-model UX. Search, filtering, multi-provider listing — all already built. - Stacking: ModelPickerDialog defaults to z-130, which renders UNDER the onboarding overlay (z-1300) and breaks pointer events. Added optional `contentClassName` prop to ModelPickerDialog so callers can override; onboarding passes `z-[1310]`. Provider-slug matching - For OAuth flows: pass `provider.id` directly as the preferred slug. - For API-key flows: `OPENROUTER_API_KEY` → "openrouter" via env-key prefix strip. Also includes the user-visible label as a fallback candidate. - fetchProviderDefaultModel falls back to the first authenticated provider in the response if no preferred slug matches — so even a miss still surfaces a reasonable default. Files - apps/desktop/src/store/onboarding.ts: + new `confirming_model` flow variant + fetchProviderDefaultModel + completeWithModelConfirm helpers + setOnboardingModel (optimistic update + revert on failure) + confirmOnboardingModel (finalises onboarding from the card) - reloadAndConnect (replaced; the four call sites now go through completeWithModelConfirm) - apps/desktop/src/components/desktop-onboarding-overlay.tsx: + ConfirmingModelPanel component + new branch in FlowPanel for status `confirming_model` + ModelPickerDialog usage with z-[1310] content class - apps/desktop/src/components/model-picker.tsx: + optional `contentClassName` prop on ModelPickerDialog so the dialog can be stacked on top of other fixed overlays Tested - `npm run type-check` passes - `npx eslint` clean on touched files - Live test in `npm run dev`: cleared onboarding cache, walked through Nous device-code flow, saw confirm card with curated default, clicked Change → ModelPickerDialog rendered above the onboarding overlay with working pointer events, picked a different model, "Start chatting" persisted to ~/.hermes/config.yaml. * fix(desktop): suppress generic provider warning in onboarding Hide the red setup notice when the message is the generic missing-provider guidance, since onboarding already presents provider auth actions. Centralize provider-setup matching across desktop hooks and add coverage for the matcher. * fix(desktop): add 2u clearance below prereq checkboxes Group box bottom border was clipping the checkboxes by 1-2px. Bumped each box height 26u→30u; checkboxes now sit 2u above the bottom border. * fix(nix): refresh dashboard lockfile hash Update the web npm deps hash in nix/web.nix to match the committed apps/dashboard/package-lock.json so bb/gui passes the nix lockfile check. * fix(desktop): install TUI deps in release workflow Ensure desktop release builds install the standalone ui-tui package before bundling the TUI payload. * fix(desktop): run release builder from app package Invoke the desktop builder through the package script so electron-builder uses apps/desktop/package.json. * fix(desktop): expand release artifact names safely Build desktop artifact names from workflow version/channel while preserving electron-builder platform macros. * fix(desktop): use package artifact naming in release workflow Let electron-builder's desktop package config provide platform-specific artifact extensions while the workflow injects the release version/channel metadata. * fix(nix): fetch dashboard npm deps from package root Point the dashboard npm dependency fetch at apps/dashboard so Nix can find the package lockfile after the dashboard move. * fix(nix): build dashboard from package directory Set the web package source root to apps/dashboard so npm patch/build phases run beside the dashboard lockfile while keeping apps/shared available as a sibling. * feat(desktop): render LaTeX math via KaTeX after streaming completes Add @streamdown/math plugin to the chat markdown renderer. Inline ($x^2$) and block ($$...$$) math both supported with singleDollarTextMath enabled. Plugin is gated to non-streaming state to match the existing pattern for syntax highlighting — math renders when the message completes, avoiding KaTeX re-render churn during streaming. KaTeX CSS is imported in styles.css; ~30KB CSS + ~430KB JS added to the bundle. Smoothness improvements during streaming deferred to a follow-up. * perf(desktop): memoize KaTeX renders so math streams without re-rendering Wrap rehype-katex with a per-equation LRU cache (keyed by displayMode + source text) and re-enable math during streaming. Stock @streamdown/math runs rehype-katex on every markdown commit, so each new token re-katexes every equation in the message. For math-heavy responses (an equation derived step-by-step) that's hundreds of ms of wasted work per token and the streaming UI chokes. With memoization, each equation pays katex.renderToString exactly once; subsequent tokens re-walk the tree but hit cache for unchanged equations. The wrapper mirrors rehype-katex's semantics exactly: same class detection (language-math, math-inline, math-display), same <pre>-walk-up for fenced math blocks, same parent.children.splice replacement, same SKIP traversal, same strict-then-lenient render strategy with VFile message reporting. Cached children are structuredCloned on each splice so downstream rehype plugins or toJsxRuntime can't mutate the cache. * fix(desktop): declare katex-memo deps directly + drop per-app lockfile katex-memo.ts (added in 112cad59b) imports hast-util-from-html-isomorphic, hast-util-to-text, remark-math, katex, and unist-util-visit-parents but those were never added to apps/desktop/package.json. They were silently resolving via @streamdown/math at the workspace root, which broke the moment `npm i --prefix apps/desktop` ran with the per-workspace lockfile because that install only consults apps/desktop/package.json. Add them as direct deps, plus unified/vfile/@types/hast for the type imports. Also delete apps/desktop/package-lock.json — root package.json declares workspaces: ["apps/*"], so npm manages all lockfile state at the root. The stale per-app lockfile is what made `npm i --prefix apps/desktop` diverge from the workspace install in the first place and left an empty apps/desktop/node_modules/@assistant-ui/ stub that Vite's dep optimizer then tried (and failed) to open at @assistant-ui/core/dist/internal.js. * feat(desktop): disable Backdrop noise overlay by default The noise overlay defaulted to on, which adds a busy speckle layer over the whole window for every new user. Flip the Leva default to off; the toggle stays in Backdrop / Noise for anyone who wants it back. * fix(desktop): polish LaTeX rendering — currency, code blocks, brackets Five distinct bugs surfaced from a math-heavy stress test: 1. Adjacent code fences glued together. scrubBacktickNoise's second-pass regex /``\s*``/g matched the LAST 2 backticks of one fence + whitespace + FIRST 2 backticks of the next, collapsing two blocks into one. Fixed with lookbehind/lookahead so we only match exactly 2 backticks not part of a longer run. 2. Whitespace eaten between fences and following content. stripPreviewTargets internally calls .trim() which strips leading/ trailing whitespace from each split-segment. For segments between two fences this collapsed \n\n to '', gluing fence close to next block. Fixed by capturing leading/trailing whitespace at the call site and restoring it after the transform. 3. Currency dollar signs eaten as math. With singleDollarTextMath:true remark-math greedy-matched any pair of $, so '$5 ... $10' became one inline math span. Added escapeCurrencyDollars to escape $<digit> patterns to \$<digit> in prose segments (not in code). Trade-off: math expressions starting with a digit (rare — '$5x = 10$') get escaped too. Mirrors the convention in ChatGPT/Claude's UIs. 4. \(...\) and \[...\] LaTeX brackets unsupported. Models often emit these instead of $...$ / $$...$$. Added rewriteLatexBracketDelimiters preprocessor pass. 5. ```latex / ```tex blocks were being routed to KaTeX via a rewrite to ```math. Aligns with GitHub markdown convention: ```math = render as math; ```latex / ```tex = LaTeX/TeX source code (syntax highlighted, not rendered). Conflating them broke teaching/showing-source use cases. MATH_FENCE_LANGUAGES pruned to {'math'} only. Also flipped parseIncompleteMarkdown to true (was !isStreaming) so the math parser can't see $ inside streaming-but-not-yet-closed code fences. Shiki was already deferred via defer={isStreaming} so this doesn't introduce new tokenization cost. Test: 18/18 existing tests still pass; one test updated to expect escaped \$ in currency-prose-with-URL case. * fix(desktop): detect Python via registry/filesystem; pin to 3.11–3.13 Two related fixes for Python detection on Windows: 1. py.exe (Python launcher) is missing from per-user installs that didn't check the launcher option, so 'py -3.X --version' alone misses real Python installs. User-reported case: clean Win11 + official Python.org 3.14 install -> 'where py' returned nothing, our installer offered to install Python again. Both NSIS prereq page and main.cjs now probe in this order: 1. py.exe launcher (when present) 2. PEP 514 registry: HKLM/HKCU\SOFTWARE\Python\PythonCore\<v>\InstallPath 3. Filesystem: %ProgramFiles%\Python<v>, %LocalAppData%\Programs\Python\Python<v> Crucially, we never fall back to running 'python.exe' from PATH on Windows — the WindowsApps stub at %LOCALAPPDATA%\Microsoft\ WindowsApps\python.exe is a redirector that opens the Microsoft Store window if no Store Python is installed. Triggering that during boot would be terrible UX. Registry/filesystem probes never execute the binary. 2. Drop 3.14 from the supported version set. Several Hermes deps (notably pywinpty, which carries Rust crates like windows_x86_64_msvc) don't yet publish 3.14 wheels. With wheels missing, 'pip install -e .' falls back to building from sdist, which needs a Rust toolchain — users see 'could not compile windows_x86_64_msvc build script' on first run. install.ps1 sidesteps this by pinning to 3.11 via uv; the desktop installer doesn't yet have the same uv-managed-Python pathway, so for now we accept 3.11/3.12/3.13 and tell winget to install 3.11 if none of those are present. Revisit when the wheel ecosystem catches up to 3.14 (~early 2026). * feat(desktop): Cron, Profiles, usage analytics, and titlebar fixes - Add Cron and Profiles sidebar routes with full CRUD-style flows and API wiring. - Extend Command Center with auxiliary task overrides and a Usage panel (7d/30d/90d). - Fix titlebar geometry for WSL/Windows (native overlay width, tool spacing). - Remove stray merge conflict markers from pyproject.toml optional deps. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(title-bar): position sidebar toggle button * feat(desktop): composer queue — queue many, edit/delete/cancel-edit, Cursor-style Press Enter while busy with a draft to queue it; with no draft to interrupt and send the next queued turn. Auto-drains one queued turn each time the session settles, same as Cursor. Queue persists across reloads so an interrupted-and-queued turn isn't lost on refresh. Each queued row supports edit-in-composer (with explicit Save/Cancel), send-now (↑), and delete. Drain skips only the entry currently being edited so the rest of the queue keeps flowing. Queue dequeue is transactional — an entry only leaves the queue after `prompt.submit` is accepted, so a rejected submit doesn't drop the turn. Also shrinks the `[interrupted]` marker to a muted one-liner and drops its assistant footer so it stops looking like a real reply. * fix(desktop): handle empty usage analytics totals Co-authored-by: Cursor <cursoragent@cursor.com> * fix(desktop): address PR review titlebar and usage races Co-authored-by: Cursor <cursoragent@cursor.com> * feat(desktop): add MCP settings and live subagent tree Surface configured MCP servers in Settings with JSON edit/save and a gateway-backed reload action so users can manage tool servers without falling back to slash commands. Track live subagent gateway events in a desktop store, show active subagent counts in the Agents statusbar item, and replace the Agents overlay stub with a live spawn tree for the active session. * fix(desktop): move power-user views out of sidebar Keep Cron and Profiles available through lower-prominence chrome entry points so the workspace sidebar stays focused on core chat navigation. Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(desktop): subagent overlay reads like a live transcript, not a dashboard Strip the card chrome and rewire /agents to feel like peeking into the child agent's stream: - subagents store: single `stream` of typed entries (thinking/tool/progress/ summary) replaces the parallel notes/thinking/tools arrays. Drop unused fields (toolsets, depth, apiCalls, reasoningTokens, sessionId). - agents view: no OverlayCards, no boxed stream, no per-row borders. Goal + status pill + indented stream lines, full row width. - Group root spawns into "Delegation N" sections when batch shape + spawn time match — hides task-index interleaving and makes hierarchy obvious. - Sort tree by spawn time, then task_index. Step indicator is one colored pill (primary while running, emerald when done) inside the row, not a trailing pill that wrapped under the chevron. - Tree picks up `subagent.start` (not only `spawn_requested`) and prunes delegate-tool fallback rows once native subagent events land for the session — fixes duplicate "Delegated task" rows alongside the real ones. * feat(desktop): Esc closes every OverlayView-based overlay Lift the keyboard handler into the shared OverlayView so Agents, Settings, Command Center — and anything we build on top of it later — all dismiss on Esc by default. Nested Radix dialogs stop propagation themselves, so a modal opened inside an overlay (e.g. model picker inside Settings) still closes the modal first, not the overlay underneath. Drop the now-redundant Esc handlers in Settings (kept Cmd/Ctrl+P) and Command Center. * fix(desktop): drop numbered step pill on subagent rows The pill was getting clipped at the overlay edge anyway. Just use the status glyph (●/✓/✗/■/○) — the delegation header already conveys "3 workers, 3 active", and order in the list implies which step you're looking at. * fix(desktop): drop noisy "returned N items / empty object" stub strings When a tool returns nothing useful, the row should be silent — the title ("Search Files", etc.) already tells the user what happened. Counting the fields in an opaque payload is engineer-noise. `formatToolResultSummary` and `minimalValueSummary` now return '' for empty arrays / records / unrecognized values; tool-fallback already hides the detail section when its body is empty. * refactor(desktop): subagent rows borrow chat tool patterns (fade-in, lucide glyphs, shimmer) Pull the agents view closer to how chat tool blocks render: - statusGlyph() returns the same lucide BrailleSpinner / CheckCircle2 / AlertCircle vocabulary as tool-fallback's statusGlyph - Stream lines fade-in via useEnterAnimation (one-shot WAAPI), keyed per entry so streamed deltas settle in instead of popping - Subagent rows fade in too, and pick up the existing data-slot=tool-block spacing rules between blocks - Active stream line trails a BrailleSpinner instead of a hand-rolled pulsing rectangle - Goal text drops FadeText (which forces nowrap); keep FadeText only for the single-line meta subtitle - Running rows shimmer the title — same affordance the chat thinking row uses * refactor(desktop): make /agents subagent-only, drop sidebar + dead sections Activity rail and History stub were both noise. Strip the split layout, sidebar, route enum, and the rail/stub helpers — the overlay is now just the spawn tree, centered in a max-w-3xl column so it stops claiming the whole screen for one section's worth of content. * feat: update cron modals * Add dedicated GUI log stream for dashboard debugging. Capture dashboard and PTY websocket lifecycle failures in gui.log and expose it via hermes logs. * Improve desktop runtime UX by surfacing inference readiness in gateway status and hardening WSL link opening. This also stabilizes markdown code/table block spacing and adds root-install guards so desktop dev runs use a healthy workspace dependency tree. * Log detailed GUI websocket failure metadata. Capture richer reject/disconnect/send/parse context for dashboard gateway websocket flows so GUI connection failures are diagnosable from logs. * Default dashboard startup logging to GUI mode. Detect the dashboard subcommand during early CLI bootstrap so gui.log is attached from process start and GUI startup failures are always captured. * Clean up gateway status conditionals and logging bootstrap mode detection. Simplify nested dashboard gateway status branches for readability and use a concise first-subcommand check when selecting early GUI logging mode. * add logging to nsis installer * feat: glass ui pass * fix(desktop): persist inline assistant errors across hydrate/resume - Detect provider failure text arriving via message.complete (HTTP 4xx, "API call failed after N retries", Provider/Gateway error: ...) and persist as an inline assistant error instead of regular completion text, blocking the hydrate that was wiping it. - preserveLocalAssistantErrors: merge by id so same-id hydrated messages keep their local error, and preserve the optimistic user+error pair as a unit (with tail-user dedupe). - Hook all hydrate/resume writers (use-session-actions resume + fallback, hydrateFromStoredSession, syncSessionStateToView) into the merge so stale snapshots can't clobber a failed turn. - Add error to chatMessagesEquivalent so the resume diff actually sees error-only changes and paints them. - editMessage on a failed turn now submits a plain resend (no truncate_before_user_ordinal) and retries plainly on the "no longer in session history" race. Style polish on touched files: - Inline error: text-only treatment (no card). - User stop / edit-composer send: shared Tabler IconPlayerStopFilled glyph + shared icon-button class slot for parity. * feat(desktop): theme xterm with active light/dark mode The right-sidebar terminal hardcoded a light palette, which read poorly on the dark glass surface. Subscribe to `useTheme().resolvedMode` and hot-swap `term.options.theme` so Shift+X (and any other mode change) updates the terminal in place without tearing down the PTY session. Dark mode uses xterm's built-in defaults (white fg/cursor + vivid ANSI 16) with just a transparent background so the glass shows through; light mode keeps the existing hand-tuned overrides for legibility on a bright surface. * feat(sidebar): right-click + drag-reorder sessions and workspaces - Wire right-click on session rows to open the same actions menu; suppresses the OS-native context menu so Windows stops looking awful. - Share dropdown + context menu items via useSessionActions() driving a single declarative ItemSpec[]; render polymorphic over MenuItem. - New shadcn ContextMenu primitive mirroring DropdownMenu styling. - Restore drag-and-drop reordering for Agents (lost during the cwd cleanup) and add reordering of workspace groups via a right-side grab handle. Pinned reorder unchanged. - Generic orderByIds<T> replaces the duplicated session/group orderers; useSortableBindings() hook collapses the two Sortable wrappers. - cursor-pointer on every actionable element; cursor-grab on handles. - KISS pass: baseName() helper, AGE_TICKS table, single WORKSPACE_PAGE constant, flatter SidebarSessionsSection render. * feat(desktop): solarize the xterm palette in both light & dark xterm's default ANSI 16 is tuned for dark and reads candy-bright on the light glass surface (vivid cyans/greens). Ship the canonical Solarized palette (Schoonover) for both modes — same 16 accents either way, only fg/cursor swap between `base00/01` (light) and `base0/1` (dark), so a prompt's colors look uniform across a Shift+X toggle. Background stays transparent in both modes — Solarized's cream/slate backgrounds would fight the glass. * feat(desktop): virtualize chat thread + sidebar via TanStack Virtual Replaces `use-stick-to-bottom` and per-row session rendering with `@tanstack/react-virtual`, matching what Cursor uses. Chat thread (`thread-virtualizer.tsx`): - Natural-flow virtualization (padding spacers, not absolute items) so `position: sticky` on the human bubble still resolves cleanly against the scroller. - Custom at-bottom anchor: pins when armed, disarms on user-driven upward scroll, re-arms at bottom, jumps on session switch + `thread.runStart`. - Loading indicator and `--thread-last-message-clearance` move to a real `[data-slot=aui_composer-clearance]` node; drops the brittle `:nth-last-child(1 of …)` rule that can't fire reliably under virtualization. Sidebar (`virtual-session-list.tsx`): - Flat agents list virtualizes at >=25 rows; pinned and workspace-grouped paths stay direct-render. - `SortableContext` keeps all IDs; only the window mounts; dnd-kit's `setNodeRef` is merged with `virtualizer.measureElement` so rows participate in both DnD hit-testing and TanStack measurement. Drops `use-stick-to-bottom`. Streaming test gets a global `offsetWidth/offsetHeight` stub so the virtualizer's viewport sizing works in jsdom; the scroll-up-doesn't-pull-back invariant still passes. * feat: more ui qa * fix(desktop): trim sidebar terminal startup spacer Drop zsh's initial spacer row before writing the first terminal prompt so new sidebar terminal sessions do not open with a selectable blank line. * chore: uptick * feat(desktop): thin installer + first-launch install.ps1 bootstrap Converges the Windows packaged desktop installer onto a single canonical install topology: drop the Electron shell only (~80MB instead of ~500MB), clone Hermes Agent at a build-time-pinned commit on first launch via install.ps1's stage protocol, and treat the resulting git checkout at %LOCALAPPDATA%\hermes\hermes-agent\ as the canonical install location (same path the CLI installer uses). Future updates flow through the existing applyUpdates() git-pull path. Replaces the previous fat-installer architecture where the .exe bundled a pre-staged hermes-agent source tree under resources/hermes-agent/ that was then sync'd into ACTIVE_HERMES_ROOT at launch -- a complicated factory-vs-active dance with several footguns (FACTORY_HERMES_ROOT mismatch on path resolve, isGitCheckout guard regressions, pyproject hash drift detection inside the sync loop). Architecture overview --------------------- Build time apps/desktop/scripts/write-build-stamp.cjs writes apps/desktop/build/install-stamp.json with {commit, branch, builtAt, dirty}. Honours $GITHUB_SHA / $GITHUB_REF_NAME in CI, falls back to `git rev-parse HEAD` locally. apps/desktop/scripts/stage-native-deps.cjs copies the runtime subset of @homebridge/node-pty-prebuilt-multiarch from the workspace-root node_modules into apps/desktop/build/native-deps/. Workspace dedup hoists this dep to the root, out of reach of electron-builder's `files:`-restricted collector; staging gives us a deterministic path to extraResources. electron-builder ships both into resources/install-stamp.json and resources/native-deps/ respectively. Boot resolver (electron/main.cjs) Resolver order: 1. HERMES_DESKTOP_HERMES_ROOT override 2. SOURCE_REPO_ROOT (dev mode) 3. ACTIVE_HERMES_ROOT git checkout WITH .hermes-bootstrap-complete marker -- the post-install fast path 4. `hermes` on PATH (CLI-installed user adding the desktop) 5. pip-installed hermes_cli via system Python 6. bootstrap-needed sentinel -> hand off to runBootstrap Deletes the entire FACTORY_HERMES_ROOT / RUNTIME_MARKER / syncTreeExcludingVenv machinery (-200 lines). The isGitCheckout guard that bit us in the install.ps1 PR is gone. First-launch bootstrap (electron/bootstrap-runner.cjs) 1. Resolve install.ps1: prefer SOURCE_REPO_ROOT/scripts (dev), else download from GitHub raw at INSTALL_STAMP.commit (cached at HERMES_HOME\bootstrap-cache\install-<sha>.ps1). 2. Fetch the stage manifest via install.ps1 -Manifest -Commit X -Branch Y. 3. Iterate stages: install.ps1 -Stage <name> -NonInteractive -Json -Commit X -Branch Y per stage. 4. On all stages green: write the .hermes-bootstrap-complete marker with {schemaVersion, pinnedCommit, pinnedBranch, completedAt, desktopVersion}. Per-run log to HERMES_HOME\logs\bootstrap-<ts>.log. Cancellation via AbortSignal. Manifest cache so retries don't re-download. Install overlay (src/components/desktop-install-overlay.tsx) Mounted alongside the existing onboarding overlay; flexbox card with header (static) + middle (scrollable) + footer (failure-only, static). Subscribes to hermes:bootstrap:event IPC + resyncs from hermes:bootstrap:get on mount/reload. Renders: - 14-stage checklist with per-stage state icons - Overall progress bar + current-stage spotlight - Auto-expanded installer-output panel on failure - "Copy output" button (full ring buffer + error to clipboard) - "Reload and retry" wired through hermes:bootstrap:reset to clear main.cjs's latched failure Synthetic empty-manifest event from main.cjs flips the overlay to 'active' immediately so the slow install.ps1 download doesn't leave the user staring at the generic Preparing splash. Failure latching (main.cjs) bootstrapFailure module-scope variable holds the rejection after install.ps1 fails. startHermes() throws the latched error immediately when set, bypassing the entire ensureRuntime + runBootstrap chain. Without this, the renderer's ensureGatewayOpen retries would re-run install.ps1 in a 5-10 min hot loop while the user was still reading the failure overlay. Cleared via hermes:bootstrap:reset on user-driven retry. Unsupported-platform overlay (1F) macOS / Linux packaged builds (no install.sh stage protocol yet) emit an unsupported-platform event with a copy-pasteable install command + docs URL. Dedicated overlay branch with "Copy command" + "I've run it -- retry" buttons. install.ps1 additions (Phase 1F.3 + 1F.5) ----------------------------------------- New -Commit and -Tag string params. Precedence Commit > Tag > Branch. Honoured by all three code paths (update / fresh clone / ZIP fallback), with archive URL selection that handles each ref-type variant. Detached-HEAD checkouts intentionally -- they're pins, not branches the user pulls into. EAP=Continue wrap around the new pin-step git invocations. `git fetch origin <commit>` writes the routine 'From <url>' info line to stderr; under the script's global EAP=Stop that terminates the script even though fetch+checkout succeed. Matches the established pattern in Install-Uv, Test-Python, _Run-NpmInstall. Backend fix (hermes_cli/web_server.py) -------------------------------------- CORS allow_origin_regex now accepts Origin: 'null'. Packaged Electron loads index.html via file://; Chromium sets the WebSocket upgrade Origin header to the opaque origin 'null', which the old regex rejected with HTTP 403 before gateway_ws() ever ran. This failure mode was masked in the older FACTORY_HERMES_ROOT architecture because the resolver often found an existing hermes on PATH with different binding behavior. Security maintained: localhost-only bind keeps cross-machine pages out; per-process session token still gates every authenticated /api/ endpoint regardless of Origin. Desktop QoL ----------- DevTools is now enabled in packaged builds (F12 / Cmd+Opt+I). Field-debugging trade-off: tiny attack surface increase versus a much better support story when CSP / WS / theme issues surface. NSIS prereq-check page deleted (-767 lines). The standard Welcome -> License -> Directory -> InstallFiles -> Finish wizard now installs without custom Python/Git/ripgrep detection -- those prereqs are install.ps1's job at first launch. Test infrastructure (Phase 1G) ------------------------------ apps/desktop/scripts/test-desktop.mjs rewritten as a cross-platform bundle validator (was darwin-only and asserted on dead factory- payload paths): NEGATIVE: hermes_cli/main.py is NOT shipped (regression guard) POSITIVE: install-stamp.json carries a real commit + branch POSITIVE: node-pty native deps shipped under resources/native-deps POSITIVE: renderer dist/index.html reachable (asar or unpacked) New nsis mode and npm run test:desktop:nsis script. Validated end-to-end on clean Win10 VM -------------------------------------- Confirmed: NSIS installer drops Electron shell, app launches, install overlay shows progress, install.ps1 clones the pinned commit, 14 stages run to completion, marker written, backend spawns, WebSocket connects, onboarding overlay asks for API key, main UI loads, integrated terminal works. Failures handled: bootstrap stays failed (no hot-loop retry), "Copy output" gives actionable transcript, "Reload and retry" explicitly re-runs install.ps1. What's deferred --------------- - MSIX wrapping (Phase 2): same Electron .exe under MSIX manifest with runFullTrust, signed and submitted to Microsoft Store. - install.sh stage protocol parity (Phase 2): once shipped, the unsupported-platform overlay becomes drive-it-yourself and macOS/Linux packaged installers gain feature parity with Windows. * feat(desktop): persistent terminal pane + fullscreen takeover Adds a VSCode-style "focus terminal" toggle to the right sidebar's Terminal tab that takes over the chat pane area without unmounting the shell. The xterm host is mounted once at the layout root and CSS-overlayed onto whichever <TerminalSlot /> is currently active, so the PTY session, scrollback, selection, focus, and WebGL renderer survive every toggle. Also: - WebGL renderer (matching dashboard ChatPage) so Hermes' TUI skins paint faithfully instead of muting through xterm's default DOM renderer - File drag/drop from the project tree or OS into xterm — paths are shell-quoted (zsh/bash/pwsh/cmd) and written straight into the PTY - Solarized dark canvas with brights promoted to real accent variants (Schoonover's UI-gray brights washed out every TUI accent) - Strip NO_COLOR/FORCE_COLOR/COLORFGBG/TERM=dumb leaking from non-tty parents (CI runners, Cursor's agent shell) so the embedded shell gets truecolor regardless of how Electron was launched - rAF-debounced ResizeObserver — running fit.fit() synchronously during sibling pane transitions crashed the WebGL texture-atlas rebuild * fix(install.ps1): strip UTF-8 BOM regression that broke 'irm | iex' The canonical install flow irm https://raw.githubusercontent.com/.../scripts/install.ps1 | iex fails on PowerShell 5.1 with a cascade of 'The assignment expression is not valid' errors at every param() default value: [string]$Branch = 'main', ~~~~~~ The assignment expression is not valid. The input to an assignment operator must be an object that is able to accept assignments... Root cause: scripts/install.ps1 carries a UTF-8 BOM (0xEF 0xBB 0xBF) as its first three bytes. 'irm' returns the response body as a string; on PS 5.1 the BOM survives into that string as a leading \ufeff character. 'iex' then evaluates the string and PS's parser chokes on the invisible character before param() -- error recovery proceeds into the body but every assignment is reported as broken. This was the exact failure mode the install.ps1 hardening pass (PR #27224) deliberately fixed by stripping the BOM and ensuring the file body is pure ASCII. Commit |
|||
| 9b78f411c8 |
fix(security): neutralize file paths in mutation-verifier footer (#35584) (#35684)
The per-turn file-mutation verifier footer rendered failed-write paths as bare absolute paths in the user-facing response. The gateway's extract_local_files() scans response text for bare paths ending in a deliverable extension (.yaml/.json/etc.), validates os.path.isfile(), and auto-attaches matches as native uploads — so a denied write to ~/.hermes/config.yaml surfaced the path in the footer and got the credential file silently uploaded to the messaging channel. The gateway denylist (validate_media_delivery_path) already blocks the config.yaml case after #35634. This is defense-in-depth at the source: backtick-wrap every path the footer emits — both the bullet path and any path echoed inside the tool's error preview (the protected-file denial message embeds the path in single quotes, which do NOT block the extractor regex). extract_local_files skips paths inside inline-code spans, so wrapping defeats auto-attachment for ANY protected file while keeping the path human-readable. - run_agent.py: _format_file_mutation_failure_footer wraps bullet paths; new _neutralize_footer_paths backticks any remaining bare path (covers the preview echo). staticmethod -> classmethod (caller unaffected). - tests: backtick-wrap assertion + end-to-end extract_local_files leak test. |
|||
| 2b16b756a7 |
fix(gateway): recover model on post-interrupt turn; gate fallback status (#35381)
Empty model could reach the API on a recovery turn after stream_interrupt_abort, failing HTTP 400 "No models provided" with no recovery — the session went silent until the user manually re-sent (#35314). - gateway/run.py: cache last-successfully-resolved model per session (+ a process-wide slot); when a fresh config read returns an empty model on a recovery turn, reuse the last-known-good instead of building model="". - run_agent.py + agent/conversation_loop.py: only emit "trying fallback..." status when a fallback chain actually exists, so the UI stops announcing a fallback that will never run (also #17446). - tests: empty-model recovery + _has_pending_fallback gate. |
|||
| fb0ab27649 |
fix(agent): register explainer config key + shorten footer prefix
Follow-up to the salvaged #34452 turn-completion explainer: - Register display.turn_completion_explainer: True in DEFAULT_CONFIG so the setting is discoverable, matching the file_mutation_verifier precedent. - Shorten the repeated footer prefix from 'Turn ended without a usable reply: ' to 'No reply: ' so the 10 reason variants don't all open with the same 8-word boilerplate. - Update the 7 assertions that referenced the old prefix. |
|||
| 59b0ea98c8 |
fix(agent): explain abnormal turn endings instead of blank/partial reply
When a turn ends abnormally after substantive tool calls (empty content after retries, a partial/truncated stream, exhausted retries, or an iteration/budget limit), the CLI/TUI response area was left blank or showed only a fragment (e.g. "The") with no consolidated reason. The internal turn_exit_reason values (empty_response_exhausted, partial_stream_recovery, etc.) were never surfaced to the user. Add a turn-completion explainer that mirrors the existing file-mutation verifier footer: at turn end, map an abnormal turn_exit_reason to a short, actionable message and either replace the bare "(empty)" sentinel or append the reason after a partial fragment. Normal text_response exits (e.g. a terse "Done.") stay quiet. Gated by display.turn_completion_explainer (default on) with HERMES_TURN_COMPLETION_EXPLAINER env override, matching the file-mutation verifier seam. Closes #34452 |
|||
| a22c250001 |
refactor(auth): remove vestigial Nous min_key_ttl/inference_auth_mode params
After the legacy session-key path was removed, two parameters became dead surface on the Nous runtime-resolution chain: - min_key_ttl_seconds: del'd inside refresh_nous_oauth_pure and pass-through / telemetry-only in refresh_nous_oauth_from_state, _try_import_shared_nous_state, _nous_device_code_login, and resolve_nous_runtime_credentials. It controlled the now-deleted agent-key mint TTL and drives no behavior. - inference_auth_mode: with the legacy mode gone, AUTO and FRESH are behaviorally identical; the value only fed _normalize_nous_inference_auth_mode validation and oauth trace output, never a branch. Removing inference_auth_mode orphaned its whole supporting cluster (NOUS_INFERENCE_AUTH_MODE_AUTO/FRESH, NOUS_INFERENCE_AUTH_MODES, _normalize_nous_inference_auth_mode), and dropping min_key_ttl_seconds orphaned DEFAULT_AGENT_KEY_MIN_TTL_SECONDS — all deleted here. Updated every caller (run_agent, auxiliary_client, credential_pool, proxy adapter, runtime_provider, web_server, main, auth_commands, setup) and pruned the matching test kwargs. Deleted two tests that exercised the removed surface (test_legacy_auth_mode_is_rejected, test_try_refresh_..._accepts_explicit_auth_mode). No behavior change: net -134 LOC of dead code. |
|||
| 41ff6e5937 | refactor(auth): Disable Nous legacy session key fallback | |||
| bc31ee5cf8 |
fix(kanban): bridge worker runtime activity to board heartbeat (#31752)
The dispatcher watchdog (release_stale_claims) reads tasks.last_heartbeat_at to decide whether to reclaim a running task. The agent maintains its own in-process `_last_activity_ts` for every chunk/tool result, but those liveness ticks never reach the board unless the model explicitly calls the `kanban_heartbeat` tool — so a worker actively executing a long run without tool-level heartbeats can be reclaimed mid-flight as 'stale', returning the task to ready and orphaning the in-flight worker's progress. Fix: in `_touch_activity` (the canonical 'we just did work' hook in run_agent.py), call a new `heartbeat_current_worker_from_env` helper in `tools/kanban_tools.py` that: - No-ops outside dispatcher-spawned worker context (no HERMES_KANBAN_TASK). - Rate-limited to one DB write per 60s (runtime activity ticks too often to faithfully mirror; we just need the watchdog to see liveness). - Best-effort: never raises. heartbeat_claim + heartbeat_worker calls are individually try/except'd; any DB error logs at debug and returns. - Uses worker env identity: HERMES_KANBAN_TASK + HERMES_KANBAN_RUN_ID + HERMES_KANBAN_CLAIM_LOCK (all pinned by the dispatcher at spawn time). - No durable note on auto-heartbeats — that's reserved for the explicit `kanban_heartbeat` tool which carries a model-supplied note. The explicit `kanban_heartbeat` tool stays available unchanged for workers that want to attach a note or pre-emptively extend a claim across a known-long single tool call. Co-authored-by: faisfamilytravel <223516181+faisfamilytravel@users.noreply.github.com> |
|||
| f61fd59b62 | docs(run_agent): clarify why F401 re-exports stay | |||
| e371bf5d68 |
fix: re-export pruned names for tests that mock.patch or from-import them
The mechanical ruff prune in the previous commit removed several names that
`appear` unused inside their defining module but are external test/runtime
anchors:
run_agent
OpenAI, _SafeWriter
get_tool_definitions, handle_function_call, check_toolset_requirements
estimate_request_tokens_rough
DEFAULT_AGENT_IDENTITY, build_context_files_prompt,
build_environment_hints, build_nous_subscription_prompt
_is_destructive_command, _extract_parallel_scope_path, _paths_overlap,
_append_subdir_hint_to_multimodal, _trajectory_normalize_msg
tools/web_tools
Firecrawl, _get_firecrawl_client
These get accessed via four channels that are invisible to ruff's
in-module usage analysis:
1. `mock.patch('module.name', ...)` in tests — resolves the attribute
lazily, so `pytest --collect-only` passes even when the name is
gone, but every test using the patch fails at runtime with
AttributeError.
2. `from run_agent import X` in production siblings (agent/transports
/codex.py, etc.).
3. The `_ra().X` indirection pattern in agent/system_prompt.py et al.
— explicitly documented ("Many tests patch('run_agent.load_soul_md')")
to preserve the patch contract.
4. `from tools.web_tools import _get_firecrawl_client` in tests.
Each re-added import carries an explicit `# noqa: F401` with a comment
naming the channel, so future cleanup passes won't strip them again.
|
|||
| 66827f8947 |
chore: prune unused imports and duplicate import redefinitions
Remove unused imports (F401) and duplicate/shadowed import redefinitions (F811) across the codebase using ruff's safe autofixes. No behavioral changes -- imports only. - ~1400 safe autofixes applied across 644 files (net -1072 lines) - __init__.py re-exports preserved (excluded from F401 removal so public re-export surfaces stay intact) - Re-exports that are imported or monkeypatched by tests but look unused in their defining module are kept with explicit # noqa: F401 (gateway/run.py load_dotenv; run_agent re-exports from agent.message_sanitization, agent.context_compressor, agent.retry_utils, agent.prompt_builder, agent.process_bootstrap, agent.codex_responses_adapter) - Unsafe F841 (unused-variable) fixes deliberately skipped -- those can change behavior when the RHS has side effects - ruff lints remain disabled in pyproject.toml (only PLW1514 is selected); this is a one-time cleanup, not a config change Verification: - python -m compileall: clean - pytest --collect-only: all 27161 tests collect (zero import errors) - core entry points import clean (run_agent, model_tools, cli, toolsets, hermes_state, batch_runner, gateway) - static scan: every name any test imports directly from an edited module still resolves |
|||
| 5a95fb2e14 |
feat: expose completed-turn message context to memory providers
Adds an optional `messages` keyword to the `MemoryProvider.sync_turn` contract so external/community memory plugins can receive the OpenAI-style conversation message list for the completed turn — including assistant tool calls and tool result content — not just the final assistant text. Dispatch uses signature inspection (`_provider_sync_accepts_messages`): only providers that declare a `messages` parameter (or `**kwargs`) receive it; all existing in-tree providers keep their legacy text-only signature and are called unchanged. No structured-trace envelope is added to core — providers reconstruct whatever they need from the standard message list. Also documents Memori as a standalone community memory provider. Salvaged from #28065 — rebased onto current main. Co-authored-by: Dave Heritage <david@memorilabs.ai> |
|||
| 67011cc0d7 |
feat(agent): buffer retry/fallback status, surface only on terminal failure (#33816)
Users report that the CLI/gateway floods them with confusing retry chatter
during transient failures: a single 429 can produce 10+ "Provider/Endpoint/
Retrying in 5s..." lines before the request eventually succeeds. The same
firehose hits Telegram, Discord, Slack, etc. via _emit_status.
This patch defers all retry/fallback/compression status messages until we
know the outcome:
- if the turn ultimately succeeds (any path: primary recovers, fallback
activates, compression unsticks the request), the buffer is silently
dropped — the user sees nothing.
- if every retry and fallback exhausts and the turn fails, the buffer
is flushed at the terminal-failure return so the user sees the full
retry trace alongside the final error.
Backend logging (agent.log) is unchanged — every emission site still
writes to logger.warning/info, so post-mortem diagnosis is intact.
## What changed
run_agent.py: four new methods on AIAgent:
_buffer_status(msg) — defer an _emit_status call
_buffer_vprint(msg) — defer a _vprint(force=True) line
_clear_status_buffer() — drop pending messages on success
_flush_status_buffer() — replay pending messages on terminal failure
agent/conversation_loop.py:
- converted ~30 mid-process emit/vprint sites in the retry, fallback,
compression, empty-response, and stream-watchdog paths to the buffered
helpers
- added _flush_status_buffer() at every terminal-failure return so users
still see the trace when it actually matters
- added _clear_status_buffer() at the "non-empty assistant content"
point (NOT at "API call returned bytes" — empty responses still loop
through the empty-retry path and would otherwise lose their trace
between iterations)
- silenced the two "(´;ω;`) oops, retrying..." / "(╥_╥) error,
retrying..." spinner final-frame messages — the spinner now stops
cleanly so retries leave no visible residue
agent/chat_completion_helpers.py: same conversion for codex TTFB / stale-
stream / fallback-activation status messages.
agent/stream_diag.py: _emit_stream_drop now buffers instead of emitting
directly.
## Tests
tests/run_agent/test_retry_status_buffer.py: 7 unit tests covering
accumulate→flush, clear-on-success, mixed kinds, empty-buffer no-op,
re-buffer after flush, exception swallowing.
Updated 3 existing tests that mocked _emit_status to also mock (or use)
_buffer_status:
- tests/run_agent/test_run_agent.py::test_empty_response_emits_status_for_gateway
- tests/run_agent/test_stream_drop_logging.py (2 tests)
- tests/agent/test_codex_ttfb_watchdog.py (TTFB hint test)
## Validation
Live test: hermes chat -q against an unreachable endpoint with no fallback
exhausts retries and prints the full trace at the end. Same flow against
a working endpoint prints zero retry chatter.
|
|||
| b5495db701 |
fix(agent): re-pad reasoning_content on cross-provider fallback to require-side providers
api_messages is built once before the retry loop while the primary provider
is active. When a mid-conversation fallback switches to a require-side thinking
provider (DeepSeek/Kimi/MiMo), assistant turns built under a non-require primary
(e.g. Codex) go out without reasoning_content and the new provider rejects the
request with HTTP 400 ("reasoning_content must be passed back").
Re-apply the echo-back pad against the current provider immediately before
building the request kwargs. Idempotent and a no-op unless the active provider
enforces echo-back, so it covers all fallback paths without affecting normal or
reject-side operation.
Drafted by Claude (Opus 4.7) under human review while fixing a personal deployment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|||
| 9b5dae17a5 |
feat(context-engine): host contract for external context engines
Condenses the substance of PRs #16453, #17453, #16451, #17600, and #13373 into a minimal generic host contract that external context engine plugins (e.g. hermes-lcm) need to integrate cleanly. Drops scaffolding that duplicated existing infrastructure or had marginal value. Five concrete changes: 1. `_transition_context_engine_session()` on AIAgent — generic lifecycle helper that fires on_session_end → on_session_reset → on_session_start → optional carry_over_new_session_context. Engines implement only the hooks they need; missing hooks are skipped. Built-in compressor keeps its existing reset-only behavior because callers default to no metadata. `reset_session_state()` now optionally accepts previous_messages / old_session_id / carry_over_context and delegates to the transition helper when provided. (#16453) 2. `conversation_id` passed to `on_session_start()` — both the agent-init call site and the compression-boundary call site now forward `self._gateway_session_key` so plugin engines have a stable conversation identity that survives session_id rotation (compression splits, /new, resume). The key already existed on AIAgent; it just wasn't reaching engines. (#16453) 3. Canonical cache buckets forwarded to engines — the usage dict passed to `update_from_response()` now includes input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, and reasoning_tokens on top of the legacy prompt/completion/total keys. Engines can make decisions on cache-hit ratios and reasoning costs instead of only aggregates. ABC docstring updated. (#17453) 4. Plugin-registered context engines visible in the picker — `_discover_context_engines()` in plugins_cmd.py now also includes engines registered via `ctx.register_context_engine()` from plugin manifests, deduplicating by name so repo-shipped descriptions win on collision. (#16451) 5. `_EngineCollector.register_command()` — context engines using the standard `register(ctx)` pattern can now expose slash commands (e.g. `/lcm`). Routes to the global plugin command registry with the same conflict-rejection policy regular plugins use (no shadowing built-ins, no clobbering other plugins). Previously these calls hit a no-op and the slash commands silently never appeared. (#17600) Dropped from the original 5 PRs: - Compression boundary signal (`boundary_reason="compression"`) from #16453 — already on main at `agent/conversation_compression.py:412-424`, landed via the bg-review extraction. - `discover_plugins()` before fallback in run_agent.py from #16451 — redundant: `get_plugin_context_engine()` already routes through `_ensure_plugins_discovered()` which is idempotent. - Runtime identity diagnostics method + helpers from #13373 (+251 LOC) — operators can already read engine state via `engine.get_status()`; the diagnostics view added marginal value relative to its surface area. - The 553-LOC slash-command machinery from #17600 — replaced with a 20-LOC `register_command` method on the collector that reuses the existing plugin command registry instead of building a parallel one. Net: ~215 LOC of host-contract changes + 282 LOC of focused tests, vs ~1,176 LOC across the original 5 PRs. Co-authored-by: Tosko4 <1294707+Tosko4@users.noreply.github.com> Closes #16453. Closes #17453. Closes #16451. Closes #17600. Closes #13373. Related: stephenschoettler/hermes-lcm#68. |
|||
| 406901b27d | feat(auth) normalise the way in which we check whether a user has free/paid access to nous portal so we can expose behaviour and error messages accordingly. | |||
| 2e3c6627ce |
Add Honcho runtime peer mapping
(cherry picked from commit 864cdb3d2e64a46edfca4158646752b163b90ba0) |
|||
| 4243b6dc45 | fix(codex): update silent-hang workaround hint | |||
| febc4cfec0 |
remove Vercel AI Gateway and Vercel Sandbox (#33067)
* remove Vercel AI Gateway provider and Vercel Sandbox terminal backend Both Vercel-hosted integrations are removed end-to-end. Users on the AI Gateway should switch to OpenRouter or one of the other aggregators (Nous Portal, Kilo Code). Users on the Vercel Sandbox backend should switch to Docker, Modal, Daytona, or SSH. What's removed: - `plugins/model-providers/ai-gateway/` provider plugin - `hermes_cli/vercel_auth.py` Vercel-Sandbox auth helper - `tools/environments/vercel_sandbox.py` terminal backend - `ai-gateway` provider wiring across auth, doctor, setup, models, config, status, providers, main, web_server, model_normalize, dump - `vercel_sandbox` backend wiring across terminal_tool, file_tools, code_execution_tool, file_operations, approval, skills_tool, environments/local, credential_files, lazy_deps, prompt_builder, cli, gateway/run - `AI_GATEWAY_BASE_URL` constant, `_AI_GATEWAY_HEADERS` auxiliary-client header set, run_agent base-URL header/reasoning special-cases - `[vercel]` pyproject extra and `vercel`/`vercel-workers` from uv.lock - env vars: `AI_GATEWAY_API_KEY`, `AI_GATEWAY_BASE_URL`, `VERCEL_TOKEN`, `VERCEL_PROJECT_ID`, `VERCEL_TEAM_ID`, `VERCEL_OIDC_TOKEN`, `TERMINAL_VERCEL_RUNTIME` - Tests: deletes test_ai_gateway_models.py and test_vercel_sandbox_environment.py; scrubs references across 23 surviving test files (no entire tests deleted unless they were dedicated to AI Gateway / Sandbox) - Docs: provider tables, env-var reference, setup guides, security notes, tool config, terminal-backend tables — English plus zh-Hans i18n parity - `hermes-agent` skill: provider table entry and remote-backend list What stays (intentional): - `popular-web-designs/templates/vercel.md` — CSS design reference, unrelated to Vercel-the-AI-product - `x-vercel-id` in `stream_diag.py` headers — generic Vercel CDN response header, useful diag signal on any Vercel-hosted endpoint - `vercel-labs/agent-browser` URL in browser config — lightpanda browser project, different OSS effort - `userStories.json` historical contributor entry mentioning Vercel Sandbox — archive, not active docs Validation: - 1153 tests in the 22 targeted files pass (`scripts/run_tests.sh`) - Full repo `py_compile` clean - Live import of every touched module + invariant check (no `ai-gateway` in `PROVIDER_REGISTRY`, no `_AI_GATEWAY_HEADERS`, no `vercel_sandbox` in `_REMOTE_TERMINAL_BACKENDS`) * test: convert profile-count check from change-detector to invariant The hardcoded "== 34" assertion broke when ai-gateway was removed. Per AGENTS.md change-detector-test guidance, assert the relationship (registry count >= number of plugin dirs) instead of a literal count. Counts shift when providers are added/removed; that's expected. |
|||
| b6ca56f651 |
fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144) (#33035)
* fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144) When an OpenAI-compatible Responses API surface accepts an initial request but later rejects the replayed `codex_reasoning_items` encrypted blob with HTTP 400 `invalid_encrypted_content`, the session previously got stuck retrying the same poisoned payload. Recovery: classify the error as a dedicated FailoverReason, and on the first hit disable encrypted reasoning replay for the rest of the session, strip cached items from message history, and retry once. Changes: * error_classifier: add FailoverReason.invalid_encrypted_content branch in _classify_400 (before context_overflow so the messages that mention 'encrypted content … could not be verified' don't trip context heuristics), in _classify_by_error_code, and extend _extract_error_code to peek inside wrapped JSON in error.message and ignore the bare '400' as a code. * agent_init: initialize `_codex_reasoning_replay_enabled = True` on every agent. * run_agent: add AIAgent._disable_codex_reasoning_replay() helper that flips the flag and pops cached items. * codex_responses_adapter: thread a `replay_encrypted_reasoning` kwarg through _chat_messages_to_responses_input so that when the flag is False we don't replay codex_reasoning_items. * transports/codex.py: read `replay_encrypted_reasoning` from params, thread it into the adapter, and gate the `include=['reasoning.encrypted_content']` request hint on it. * chat_completion_helpers: pass the agent's replay flag through to the transport. * conversation_loop: in the retry loop, add an invalid_encrypted_content recovery branch that fires once per session, only when api_mode == codex_responses, only when replay is still enabled, and only when at least one assistant message in history actually carries cached reasoning items (otherwise the 400 has nothing to do with our cache and the normal retry path handles it). Tests: * test_error_classifier: new wrapped-JSON _extract_error_code case; new TestClassifyApiError cases proving the 400 is retryable with no fallback, that the broad message match doesn't catch a generic 'parsed' message, and that the error code match is case-insensitive. * test_run_agent_codex_responses: end-to-end test of the recovery branch firing once and disabling replay, plus a sibling test that proves the branch does *not* fire (and the flag stays True) when history has no cached reasoning items. Salvages PR #10144 onto the post-refactor module layout (error_classifier / codex_responses_adapter / transports/codex / conversation_loop / agent_init) since the original diff was written against the pre-refactor monolithic run_agent.py. * chore(release): map victorGPT in AUTHOR_MAP for #10144 salvage --------- Co-authored-by: victorGPT <wuxuebin1993@gmail.com> |
|||
| b1adb95038 |
fix(codex): surface actionable hint when stale-call detector fires on known silent-reject pattern
The ChatGPT Codex backend (chatgpt.com/backend-api/codex) has historically silently dropped certain model requests: the connection is accepted but no stream events are emitted and no error is raised. PR #31967 lowered the implicit stale-call default from 300s to 90s so fallbacks kick in faster, but users still see an opaque "No response from provider for 90s (non-streaming, ...)" message that gives no path forward. This patch adds a narrow heuristic — gpt-5.5 family on the Codex backend via codex_responses api_mode — that substitutes the generic timeout message with actionable text naming the gpt-5.4-codex workaround and pointing at #21444 for symptom history. Changes: - run_agent.py — new ``AIAgent._codex_silent_hang_hint(model=...)`` method. Returns ``None`` for any request that does not match all three guards (codex_responses api_mode, openai-codex provider or chatgpt.com Codex base URL, gpt-5.5-family model name with word-boundary regex anchoring to avoid false-positives on e.g. ``gpt-5.50``). - agent/chat_completion_helpers.py — the non-stream stale-call site consults the hint via ``getattr(...)`` so the call site stays robust if the helper is ever removed or stubbed in tests. Hint is appended to both the ``_emit_status`` warning and the ``TimeoutError`` message so the user sees it in their terminal AND it lands in any retry-loop diagnostics. - tests/run_agent/test_codex_silent_hang_hint.py — 10 regression tests covering positive cases (bare gpt-5.5, vendor-prefixed openai/gpt-5.5, gpt-5.5-codex SKU, model=None fallback to self.model) and negative cases (gpt-5.4-codex workaround, gpt-5.50 false-positive guard, non-codex api_mode, non-codex provider, empty/None model, unrelated models on Codex). Does NOT fix the backend-side issue (that's an upstream OpenAI/ChatGPT problem we cannot patch from here). Only converts an opaque timeout into text that names the workaround so users do not have to dig through logs or wait for a forum post to learn what to do. Closes #22046 |
|||
| 2d422720b5 |
fix(codex): size and propagate timeouts for Responses-API requests; lower stale defaults
Codex / Responses-API requests had three latent timeout bugs that combined into the long silent hangs reported on #21444: 1. The non-stream stale-call detector estimated context tokens from ``api_kwargs["messages"]`` only. Codex / Responses-API payloads carry their conversational load in ``input`` (with ``instructions`` and ``tools``), so every Codex turn logged ``context=~0 tokens`` and the detector never applied its >50k / >100k tier bumps. 2. ``providers.<id>.request_timeout_seconds`` was silently dropped on the main Codex path. The chat_completions path and the auxiliary Codex adapter both forwarded it; the main path skipped it through three places (``build_api_kwargs``, ``ResponsesApiTransport.build_kwargs``, ``_preflight_codex_api_kwargs``). 3. The streaming stale detector had the same payload-shape bug for ``codex_responses`` requests, which route through the non-streaming detector (it's the path that emits the user-facing "No response from provider for 300s (non-streaming, ...)" warning that reporters keep pasting). This commit: - Adds ``estimate_request_context_tokens`` in ``chat_completion_helpers``, used by both the non-stream and stream detectors. Handles ``messages`` (Chat Completions), ``input + instructions + tools`` (Responses API), bare lists, and an unknown-dict fallback. - Forwards ``timeout`` through ``ResponsesApiTransport.build_kwargs`` and ``_preflight_codex_api_kwargs`` (with guards against zero/negative/inf/bool values), and wires ``_resolved_api_call_timeout()`` into the Codex branch of ``build_api_kwargs``. - Lowers the implicit non-stream stale defaults so fallback providers kick in faster when upstream stalls: * base 300s -> 90s * >50k 450s -> 150s * >100k 600s -> 240s These only apply when the user has *not* set ``providers.<id>.stale_timeout_seconds`` or ``HERMES_API_CALL_STALE_TIMEOUT``. Explicit config still wins. - Adds regression tests for the estimator shapes, the new defaults, the context-tier scaling, transport timeout pass-through, and preflight timeout pass-through / rejection of invalid values. Closes #21444 Supersedes #21652 #24126 #31855 Co-authored-by: Hoang V. Pham <26063003+hehehe0803@users.noreply.github.com> |
|||
| dcc163ee28 |
fix(security): redact credentials before persistence in session capture
Two-layer redaction at the persistence boundary so credentials never reach
state.db, session_*.json, or compression:
1. agent/chat_completion_helpers.py :: build_assistant_message
- Redact assistant content before the message dict is constructed
(catches PATs / API keys the model inlines into natural language)
- Redact tool_call.function.arguments at the same site (catches secrets
inlined into tool args, e.g. terminal command=curl -H 'Authorization: ...')
Tool execution uses the raw API response object, not this dict, so
redacting the persisted shape is safe.
2. run_agent.py :: _save_session_log
- Add _redact_message_content() static helper that handles both string
content and OpenAI/Anthropic multimodal list-of-parts (image parts
pass through untouched, only text/content fields are redacted)
- Apply to every message + the cached system prompt before writing
session_*.json
Both layers respect HERMES_REDACT_SECRETS via redact_sensitive_text —
no-op when disabled.
Tests (TestSaveSessionLogRedactsSecrets, 4 cases):
- api key in tool content
- api key in user message
- api key in system prompt
- multimodal list-of-parts (image part preserved, text redacted)
Tests use an autouse fixture to force _REDACT_ENABLED=True because the
hermetic conftest defaults the env var to false.
Salvaged from PR #24758 by @vgocoder (build_assistant_message + session_log)
+ PR #19855 by @liuhao1024 (multimodal list helper, system_prompt redaction).
Kept only the redaction concern from #19855; its unrelated whatsapp npm
timeout + PATCH_SCHEMA changes are out of scope and dropped.
Refs #19798 (PAT leak via assistant inline mention), #19845 (session capture
credential leak).
Co-authored-by: liuhao1024 <liuhao03@bilibili.com>
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
|
|||
| 8b3cb930c9 |
fix(xai-oauth): honor [WKE=unauthenticated:...] disambiguator in entitlement classifier (#29344)
``_is_entitlement_failure`` over-matched on xAI 403s. xAI returns the
same permission-denied ``code`` text for two distinct conditions:
1. Unsubscribed account ("active Grok subscription. Manage at
https://grok.com" in the ``error`` field).
2. Stale OAuth access token ("OAuth2 access token could not be
validated. [WKE=unauthenticated:bad-credentials]" in the ``error``
field).
The classifier's "does not have permission + grok" substring heuristic
treated both identically, so the credential-pool refresh path was
short-circuited for case (2) — long-running TUI sessions stuck on a
stale OAuth token surfaced a non-retryable client error and the user
had to exit + reopen the TUI to recover (the startup-resolve path
bypasses the classifier entirely, which is why bridge adapters with
proactive refresh cadences didn't see this in practice).
This patch adopts the reporter's recommended fix (option 1, tightest):
honor xAI's explicit ``[WKE=unauthenticated:...]`` suffix and the
``OAuth2 access token could not be validated`` phrasing as
authoritative "this is auth, not entitlement" signals. When either
appears anywhere in the body's text fields, the classifier returns
False eagerly — *before* the entitlement keyword checks run — so the
refresh-on-401 path takes over and the existing loop-protection still
guards against runaway refresh storms if the refresh itself fails.
Two small adjustments fall out of this:
* The haystack now also covers ``code`` and ``error`` keys directly,
not just the ``message``/``reason`` shape ``_extract_api_error_context``
produces. Real runtime paths use the normalised shape, but the test
suite and any future call sites that pass raw bodies get the same
treatment. Backwards compatible: missing keys default to empty
strings, the haystack still skips when everything is blank.
* Both disambiguator checks fire BEFORE the entitlement keyword
checks. If a future xAI body somehow lands with both an entitlement
message AND the WKE suffix, the WKE suffix wins (correct — auth is
recoverable; entitlement is not, and a refreshed token will surface
the entitlement message on the next request anyway).
Existing tests (``test_is_entitlement_failure_matches_real_xai_bodies``,
``test_is_entitlement_failure_false_for_unrelated_auth_errors``,
``test_recover_with_credential_pool_skips_refresh_on_entitlement_403``,
``test_recover_with_credential_pool_still_refreshes_genuine_auth_failure``)
continue to pass unchanged — the unsubscribed-account path, the
generic auth-error path, and the refresh-on-401 path are all left
intact.
|
|||
| 30c22f1158 |
fix(api-call): defer client.close() to owning worker thread on interrupt (#29507)
Layer-2 defense for the FD-recycling race: even with ``force_close_tcp_sockets`` reduced to shutdown-only, the followup ``client.close()`` in ``_close_openai_client`` still walks the httpx pool and closes sockets — and if called from a stranger thread (the interrupt-check loop, the stale-call detector) it has the same FD-recycling exposure that wrote a TLS record on top of ``kanban.db``. Stamp the request_client_holder with the owning thread's ident at ``_set_request_client`` time. In ``_close_request_client_once``: * Owning thread (the worker's ``finally``) → pop + ``client.close()`` via ``_close_request_openai_client``, exactly as before. * Stranger thread → ``_abort_request_openai_client`` (new): only ``shutdown(SHUT_RDWR)`` the pool sockets and log a deferred-close marker. The holder stays populated so the worker's eventual ``finally`` performs the real close from its own thread context, where the FD release races nothing. Applied symmetrically to both the non-streaming ``interruptible_api_call`` and the streaming variant — both routinely get hit by stranger-thread interrupts. The log field ``tcp_force_closed=N`` keeps its existing shape; the new abort path adds ``deferred_close=stranger_thread`` so production triage can distinguish the two close kinds. |
|||
| c769be344a |
fix(agent): recover from providers rejecting list-type tool content (#27344) (#30259)
Some providers (Xiaomi MiMo, some Alibaba endpoints, a long tail of OpenAI-compatible servers) follow the OpenAI spec strictly and require tool message `content` to be a string — they reject our list-type content (text + image_url parts) with HTTP 400 'text is not set' / 'tool message content must be a string'. Instead of an allowlist of known-good providers (maintenance burden, guaranteed to miss aggregators like OpenRouter where the underlying model determines support, not the aggregator name), this lands a reactive recovery: 1. New `FailoverReason.multimodal_tool_content_unsupported` with a small pattern list covering the common 400 wordings. 2. `AIAgent._try_strip_image_parts_from_tool_messages` walks the API message list, downgrades any `role:tool` message whose content is list-with-image to a plain text summary (preserves text parts) in place, AND records the active (provider, model) in a session-scoped `_no_list_tool_content_models` set. 3. `_tool_result_content_for_active_model` short-circuits to a text summary when (provider, model) is in the cache — so after the first 400 + retry, subsequent screenshots in the same session skip the round trip entirely. 4. Retry hook in `agent.conversation_loop` mirrors the existing `image_too_large` recovery: detect the reason, run the helper, retry once, fall through to the normal error path if no list-type tool content was actually present. Cache is transient (per-session) by design — next session retries in case the provider added support, no persistent state to maintain. Fixes #27344. Closes #27351 (allowlist approach superseded by reactive recovery). |
|||
| 32aea113f0 |
fix(agent): consult supports_vision override in auto-mode routing
The contributor PR (#17936) only patched the strip path in `_model_supports_vision()`. The auto-mode router in `agent/image_routing._lookup_supports_vision` still only read models.dev, so a custom-provider model declared as vision-capable would still get its images routed through vision_analyze in the default `agent.image_input_mode: auto` setting. Users had to set both `supports_vision: true` AND `image_input_mode: native` to bypass the text pipeline. Single-knob behavior now: `supports_vision: true` alone is enough in auto mode. The strip path and the routing path consult the same resolver. - Extract override resolution into `_supports_vision_override()` in agent/image_routing.py and wire it into `_lookup_supports_vision()`. - Refactor `run_agent._model_supports_vision` to call the same helper (DRY, single source of truth for the resolution order). - Strict YAML boolean coercion: `supports_vision: "false"` (quoted — a common YAML mistake) no longer coerces to True via bool() truthiness. Recognised tokens: true/false/yes/no/on/off/1/0 plus real bools and 0/1. Unrecognised values return None and fall through to models.dev. - Add @CNSeniorious000 to AUTHOR_MAP for release attribution. Tests: 26 new (TestCoerceCapabilityBool, TestSupportsVisionOverride, TestLookupSupportsVisionOverride, TestAutoModeRespectsOverride). Existing contributor tests + image_routing + vision_native_fast_path + native_image_buffer_isolation all green (92/92). |
|||
| 1c76689b28 |
fix(agent): resolve supports_vision override for named custom providers
Named custom providers are rewritten to provider="custom" at runtime (hermes_cli/runtime_provider.py:_resolve_named_custom_runtime), so a config under providers.my-vllm.models.my-llava.supports_vision was unreachable via self.provider alone. Also try cfg.model.provider as a candidate provider key, covering both runtime and config naming. Adds a regression test for the named-provider path. |
|||
| 24c7ce0fb8 |
feat(agent): allow declaring supports_vision via user config
Custom/local provider models absent from models.dev get classified as
non-vision and have their image content stripped before reaching the
upstream API. Surface a user-facing override:
model:
supports_vision: true
providers:
my-vllm:
models:
my-llava:
supports_vision: true
The override short-circuits the models.dev lookup in
_model_supports_vision(), which is the single gate guarding image-strip
preprocessing on every transport path.
Refs #8731.
|
|||
| eeb747de25 |
feat(sessions): opt-in per-session JSON snapshot writer
PR #29182 deleted the per-session JSON snapshot writer outright because state.db is canonical and the snapshots had no in-tree consumer. Some users have external tooling that reads `~/.hermes/sessions/session_{sid}.json` directly, so reintroduce the writer behind a config flag that defaults to off. - Add `sessions.write_json_snapshots` (default False) to DEFAULT_CONFIG - Restore `AIAgent._save_session_log` + `_clean_session_content` as gated methods. When the flag is off the call is a fast no-op; when on, the writer behaves as before (atomic write, truncation guard preserved, REASONING_SCRATCHPAD → think tag normalization) - Re-derive the target path from `agent.session_id` on each call so `/branch` and `/compress` re-points happen automatically — no need to restore the explicit re-point bookkeeping at call sites - Wire the single call site in `_persist_session` (the cleanup-on-exit hook). Did NOT restore the 7 intra-turn calls the original PR deleted — those were redundant writes within the same turn that doubled disk I/O without adding any persistence guarantee `_persist_session` does not already provide - Read the flag once at agent init via `load_config()`, cache as `agent._session_json_enabled` - Update `TestNoSessionJsonSnapshot` → `TestSessionJsonSnapshotOptIn` to pin behavior: default off (no file), opt-in true (file written), no-op method on default agents, logs_dir retained unconditionally - Update CONTRIBUTING.md and the bundled `hermes-agent` skill to document the flag and its default |
|||
| 6f1a5f8597 |
refactor(session-log): delete dead _clean_session_content helper
Only caller was the removed _save_session_log. Also removes the unused convert_scratchpad_to_think and has_incomplete_scratchpad imports from run_agent.py (both still used elsewhere via their own imports). |
|||
| ce26785187 |
refactor(session-log): delete _save_session_log and all callers
state.db now stores every message field the JSON snapshot stored. Removed the method, all 7 call-sites, and ~13 test stubs that suppressed its file I/O. Body is in git history if it ever needs to come back. |
|||
| 544c31b50b |
perf(agent-loop): cut 47% of per-conversation function calls via 3 targeted hot-path optimizations (#28866)
* perf(config): add load_config_readonly() fast path for hot agent loop
`load_config()` is called from the agent loop's per-API-call hot path via
`get_provider_request_timeout()` and `get_provider_stale_timeout()` —
both invoked once per turn from `_resolved_api_call_timeout()` in
run_agent.py.
Profiling a synthetic 20-tool-call agent run revealed:
- 21 invocations of `load_config()` cumulating 56ms (~17% of agent loop)
- 34,398 deepcopy calls totaling 37ms (config defensive deepcopy + chain)
- 8,652 `_expand_env_vars` invocations (~412 per turn)
Microbench (cache-hit, real config.yaml present):
load_config() 265us/call (125us deepcopy + 140us infra)
load_config_readonly() 138us/call (~48% faster)
`load_config_readonly()` returns the cached dict directly without the
defensive deepcopy. Documented contract: caller must not mutate. Returns
plain dict (not MappingProxyType) so downstream `isinstance(x, dict)`
guards keep working — caught during initial implementation when
MappingProxyType broke get_provider_request_timeout's guard logic.
Wired into hermes_cli/timeouts.py (the two functions called per agent
turn). load_config() is unchanged for the 263 other call sites that
mutate the result before save_config(), are not in the hot path, or
where the safety guarantee matters more than the perf.
Profile A/B (cached config, 21-turn agent loop):
BEFORE AFTER delta
get_provider_request_timeout 55ms 16ms -71%
total function calls 399k 160k -60%
deepcopy calls (in hotspots) 34,398 ~0 ~elim
Verified:
- isinstance(load_config_readonly(), dict) is True
- timeout/stale resolutions correct
- load_config() still returns isolated mutable deepcopies
- tests/hermes_cli/test_config*.py / test_timeouts.py: 102/102 pass
- tests/cli/ + tests/agent/test_auxiliary_client.py: 883/883 pass
* perf(redact): substring pre-screens skip non-matching regex chains
Every log record passes through `RedactingFormatter.format` which calls
`redact_sensitive_text`, which historically ran ALL 13 secret-pattern
regexes against every line — including DB connection strings, JWTs,
Discord mentions, Signal phone numbers, etc. — even for typical clean
log records like 'INFO run_agent: API call completed'.
Add cheap substring pre-checks before each regex pass. False positives
still run the regex (which then matches nothing); false negatives are
impossible because every pattern requires the gated substring to match
its leading anchor:
- `_PREFIX_RE` gated on any of 33 known credential prefix substrings
- `_ENV_ASSIGN_RE` gated on `=` in text
- `_JSON_FIELD_RE` gated on `:` and `"` in text
- `_AUTH_HEADER_RE` gated on `uthorization`/`UTHORIZATION` in text
- `_TELEGRAM_RE` gated on `:` in text
- `_PRIVATE_KEY_RE` gated on `BEGIN` and `-----`
- `_DB_CONNSTR_RE` gated on `://` in text
- `_JWT_RE` gated on `eyJ` in text
- URL userinfo/query gated on `://`
- `_redact_form_body` gated on `&` and `=`
- `_DISCORD_MENTION_RE` gated on `<@`
- `_SIGNAL_PHONE_RE` gated on `+`
Microbench (5 typical log records, 20k iterations each):
BEFORE AFTER delta
redact_sensitive_text per call 5.63us 1.79us -68%
Real-world impact: ~244 log records emitted in a 30-turn agent loop, so
the chain saves ~1ms of CPU per conversation. Bigger win is the
reduction in regex execution and GC pressure during heavy logging
sessions (verbose logging, gateway message processing).
Security regression test: 30 secret-containing inputs (sk-/ghp_/JWT/DB
connstr/Auth-Bearer/private key/URL userinfo/Discord/Signal/etc.)
verified to produce identical redacted output before/after. All 75
existing tests/agent/test_redact.py cases pass.
The `?access_token=foo&code=bar` (bare query string, no scheme) case
that 'leaks' is pre-existing behavior — the URL query redaction
requires a well-formed URL with scheme+host. Not a regression.
* perf(run_agent): cache _needs_thinking_reasoning_pad result per (provider, model, base_url)
Profile of a 31-turn synthetic agent run shows `_needs_thinking_reasoning_pad`
fires 495 times (~16 per turn) and each call ran 3 helper methods, each
hitting `base_url_host_matches` 1-4 times via `urlparse`. Total cost:
3,342 base_url_host_matches calls + 3,373 urlparse calls accounting for
~36ms of agent-loop overhead (~7% of the entire post-network work).
Provider / model / base_url don't change during a conversation except via
`switch_model` and fallback activation — both of which already overwrite
those attributes atomically. Cache the result on a tuple key; since the
key is derived from the very fields that would change, the cache
auto-invalidates on the next read after a switch. No manual invalidation
needed in switch_model / _try_activate_fallback.
Profile A/B (31-turn cached-config agent run):
BEFORE AFTER delta
_needs_thinking_reasoning_pad cum 18ms 1ms -94%
_copy_reasoning_content_for_api cum 17ms 1ms -94%
base_url_host_matches calls 3,342 372 -89%
urlparse calls 3,373 403 -88%
total function calls 296k 223k -25%
Verified:
- tests/run_agent/test_deepseek_reasoning_content_echo.py: 36/36 pass
- tests/run_agent/ (full): 1383/1383 pass + 3 skipped
|
|||
| 700f3b13e7 |
fix: recognize emoji and caret as natural response endings
GLM models via Ollama report finish_reason='stop' even when the response was truncated by max_tokens. The continuation mechanism uses _has_natural_response_ending() as one of the heuristics to detect whether the response was genuinely finished. Currently only ASCII punctuation and CJK punctuation are recognized. This means any response ending with an emoji (e.g. ⚡, 👍) or the caret character ^ (common in French ^^ smiley) is not recognized as naturally ended, triggering a false-positive continuation where the model receives 'Continue where you left off' and produces garbled output. Add: - ^ (caret) to the punctuation set - Unicode emoji range (codepoint >= 0x1F300) as natural ending This only affects GLM/Ollama users but the fix is safe for all backends since _has_natural_response_ending() is only consulted inside the continuation flow. |
|||
| 1634397ddb |
fix(compress): abort instead of dropping messages when summary LLM fails (#28102)
When auxiliary compression's summary generation returns None (aux model errored, returned non-JSON, timed out, etc.) the compressor previously still dropped every middle message between compress_start..compress_end and replaced them with a static 'Summary generation was unavailable' placeholder. The session kept going but the user silently lost N turns of context for nothing. New behavior: on summary failure, compress() aborts entirely — returns the input messages unchanged and sets _last_compress_aborted=True. The existing _summary_failure_cooldown_until gate (30-60s) keeps the aux model from being burned on every turn. Auto-compress callers detect the no-op (len(after) == len(before)) and stop looping. The chat is 'frozen' at its current size until the next /compress or /new. Manual /compress (CLI + gateway) now passes force=True which clears the cooldown so users can retry immediately after an auto-abort. If the manual retry also fails, the user gets a visible warning telling them nothing was dropped and how to retry. - agent/context_compressor.py: compress() gains force= kwarg; failure branch sets _last_compress_aborted and returns messages unchanged instead of inserting placeholder. - run_agent.py: _compress_context() detects abort, surfaces warning, skips session-rotation entirely, returns messages unchanged. - cli.py + gateway/run.py: manual /compress paths pass force=True. - gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted and emit the new 'Compression aborted' warning (gateway.compress.aborted) instead of the old 'N historical messages were removed' message. - locales/*.yaml: new gateway.compress.aborted key in all 16 locales. - tests: updated to assert the abort contract (messages preserved, compression_count not incremented, abort flag set, no placeholder leaked). New test_force_true_bypasses_failure_cooldown covers the manual-retry path. |
|||
| 9df9816dab |
feat(azure-foundry): add Microsoft Entra ID auth
Use azure-identity DefaultAzureCredential for keyless Foundry auth. Preserve refreshable callable credentials through OpenAI and Anthropic client paths. Add setup, doctor, auth status, docs, and tests for Entra auth. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
|||
| 20bffa5b37 | refactor(auth): mostly cleanups and style changes | |||
| 0bac7dd05b | refactor(auth): collapse Nous inference fallback controls | |||
| 532b209f01 | fix(run_agent): scope kimi tool-reasoning trigger to host, not model name substring | |||
| 5fba236644 |
chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355)
Six days after #23937 (608 fixes) the codebase had accumulated 241 new PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix, same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the two are semantically equivalent for hashable scalar membership tests. All 241 instances fixed via `ruff check --select PLR6201 --fix --unsafe-fixes`, zero remaining. Every changed value is a hashable scalar (str/int/None/enum/signal); no risk of unhashable runtime errors. No behavior change. Test plan: - 119 files changed, +244/-244 (net zero) — exactly one-line edits - `ruff check` clean afterward - Compile checks pass on the largest touched files (cli.py, run_agent.py, gateway/run.py, gateway/platforms/discord.py, model_tools.py) - Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/ tests/tools/: 18187 passed, 59 pre-existing failures (verified against origin/main with the same shape — identical failure count, identical category — all xdist test-order flakes unrelated to this change) Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)). |
|||
| aa05ffba53 |
fix(xai): surface provider 'error' SSE frame in Codex fallback stream (#27184)
Original commit
|
|||
| fe4c87eb28 |
fix(agent): retry malformed anthropic stream parser errors — port to extracted modules
Original commit
|
|||
| 6975a2d9ae |
fix(xai-oauth): entitlement-403 chain — final state (ce0e189d3 + 9818b9a1a + 6784c8079 + dffb602f3)
Collapses the four-commit xAI entitlement-403 chain to its final
on-main state, ported to the post-refactor module layout:
- Added _is_entitlement_failure on AIAgent (run_agent.py) — detects
Grok subscription-shape 403s on (401|403|None) status codes.
- Added entitlement-skip branch to recover_with_credential_pool
(agent/agent_runtime_helpers.py) — breaks the refresh-loop that
Don's 100-iteration trace exposed when a Premium+ user hit a real
entitlement issue.
- Removed _decorate_xai_entitlement_error and unwrapped its two
_summarize_api_error call sites — xAI's own body text already
points users at grok.com/?_s=usage so we surface that verbatim
(
|
|||
| 6362e71973 |
fix(xai-oauth): recover from prelude SSE errors, gate reasoning replay, surface entitlement 403s
Original commit
|
|||
| 27df249564 |
feat(nvidia): add NIM billing origin header — port to extracted modules
Original commit
|
|||
| b07524e53a |
feat(xai-oauth): add xAI Grok OAuth (SuperGrok Subscription) provider — port to extracted modules
Original commit
|