hermes-agent

Author	SHA1	Message	Date
kyssta-exe	6bdbe30763	fix(vision): guard image pixel dimensions, not just bytes (#37677 ) Anthropic enforces two independent ceilings per image: 1. 5 MB encoded byte size 2. 8000 px longest side Hermes only guarded #1. A tall screenshot (e.g. 1200x12000 at 0.06 MB) passes every byte check but fails the pixel check, returning a non-retryable HTTP 400 that permanently bricks the conversation thread. Fixes: - error_classifier: add 'image dimensions exceed' pattern to _IMAGE_TOO_LARGE_PATTERNS so the 400 is classified as image_too_large and triggers the shrink/retry path instead of falling through to non-retryable error. - conversation_compression: check pixel dimensions (via Pillow) even when byte size is under the 4 MB target. If max(dims) > 8000, force shrink. - vision_tools._resize_image_for_vision: add optional max_dimension param. When set, images exceeding the pixel cap are downscaled even if they're under the byte budget. The resize loop now checks both byte AND pixel limits before accepting a candidate. Closes #37677	2026-06-04 06:16:45 -07:00
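The two independent ceilings described in this commit can be checked with plain arithmetic. The following is a minimal sketch, not the Hermes implementation: the helper names `exceeds_limits` and `fit_within` are hypothetical, and it works on width/height numbers directly instead of reading them through Pillow.

```python
MAX_IMAGE_BYTES = 5 * 1024 * 1024  # 5 MB encoded-size ceiling quoted in the commit
MAX_IMAGE_DIMENSION = 8000         # 8000 px longest-side ceiling quoted in the commit


def exceeds_limits(size_bytes, width, height,
                   max_bytes=MAX_IMAGE_BYTES,
                   max_dimension=MAX_IMAGE_DIMENSION):
    """Return True if either the byte ceiling or the pixel ceiling is exceeded."""
    return size_bytes > max_bytes or max(width, height) > max_dimension


def fit_within(width, height, max_dimension=MAX_IMAGE_DIMENSION):
    """Scale (width, height) down so the longest side is at most max_dimension.

    Integer arithmetic avoids float rounding; the aspect ratio is preserved
    up to one pixel.
    """
    longest = max(width, height)
    if longest <= max_dimension:
        return width, height
    return (max(1, width * max_dimension // longest),
            max(1, height * max_dimension // longest))
```

With the tall screenshot from the commit message (1200x12000 at roughly 0.06 MB), the byte check alone passes, but `exceeds_limits` flags the image and `fit_within` proposes 800x8000.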
teknium1	7314757876	refactor(feishu): slim meeting-invite parser; add AUTHOR_MAP entry Collapse the payload-shape normalization helpers into one _as_dict and drop unused dataclass fields (user_type/user_role, duplicate id, bot) on the meeting-invite handler. Module 274->212 LOC, behavior unchanged. Add zhaolei.vc@bytedance.com -> zhaoleibd to release.py AUTHOR_MAP.	2026-06-04 06:15:23 -07:00
zhaolei.vc	f3bbfda6d1	feat(gateway): handle Feishu meeting invitations Change-Id: I8cf5638393dd9adb1d7be5e170ce5082b41f77fa	2026-06-04 06:15:23 -07:00
Teknium	38d3c49aaf	refactor(skills): clean up bundled skill set + add environments: relevance gate (#39028 ) * refactor(skills): clean up bundled skill set + add environments: relevance gate Bundled skills cleanup pass plus a new offer-time relevance gate. Removals (redundant / dead): - spotify (covered by the spotify plugin's 7 native tools) - linear (covered by `hermes mcp install linear`) - kanban-codex-lane, debugging-hermes-tui-commands - empty category markers: diagramming, gifs, inference-sh, mlops/training, mlops/vector-databases - domain (stale orphan dup of optional/research/domain-intel) Bundled -> optional: - baoyu-article-illustrator, baoyu-comic, creative-ideation, pixel-art - dspy, subagent-driven-development - minecraft-modpack-server, pokemon-player - hermes-s6-container-supervision (-> optional/devops) Consolidation: - webhook-subscriptions + native-mcp folded into the hermes-agent skill as references/webhooks.md + references/native-mcp.md with SKILL.md pointers - writing-plans merged into plan (v2.0.0); related_skills + prose refs updated New: environments: frontmatter gate (agent/skill_utils.skill_matches_environment) - Offer-time relevance filter (kanban / docker / s6), parallel to platforms:. - Wired into the 3 OFFER surfaces only (prompt_builder skills index, skills_tool.list_skills, skill_commands slash discovery). - Explicit loads (skill_view, --skills preload) intentionally BYPASS it, so load-bearing force-loads like the kanban dispatcher's `--skills kanban-worker` always resolve. Verified via E2E. - kanban-orchestrator/kanban-worker tagged environments: [kanban]; hermes-s6-container-supervision tagged environments: [s6] + platforms: [linux]. Validation: 8/8 E2E gating assertions (incl force-load invariant); 442 targeted tests green (agent, skills_tool, skill_commands, kanban worker). 
* docs: regenerate skill catalogs + pages for the bundled cleanup Regenerated per-skill doc pages, catalogs, and sidebar to match the skill moves/removals in the parent commit. Moved skills' pages relocate bundled -> optional (history preserved); removed skills' pages deleted; edited skills' pages refreshed (hermes-agent now embeds the webhook + native-mcp reference pointers). zh-Hans i18n mirror: stale bundled pages and catalog rows for moved/removed skills pruned (new optional translations land via the translation pipeline). * test: drop regression test for removed kanban-codex-lane skill The kanban-codex-lane skill was removed in the bundled-skills cleanup; its dedicated regression test read the now-deleted SKILL.md and failed with FileNotFoundError on CI shard 6.	2026-06-04 06:11:22 -07:00
teknium1	c136eb4de1	fix(update): harden venv rebuild + verify core deps after install Two complementary fixes for a silent partial-install failure that bit ``hermes update`` in the wild: a fresh checkout pulled 145 commits, ``rebuild_venv`` failed to recreate the venv on Windows because ``shutil.rmtree(ignore_errors=True)`` couldn't delete files held open by the running ``hermes.exe`` shim. ``uv venv`` then refused with "A directory already exists at: venv" and the update fell back to installing on top of the stale venv. The resulting partial install missed exactly one newly-added base dep — ``pathspec==1.1.1`` — which ``hermes desktop --build-only`` imports at the top of its content-hash check. The desktop rebuild died with ModuleNotFoundError and the parent update only logged "⚠ Desktop build failed (non-fatal)". Same root cause made the "default: sync failed" line in the skill-sync stage, because that sync subprocess hit the same missing import. Fix 1: ``rebuild_venv`` retries with ``--clear`` ------------------------------------------------ If ``uv venv`` fails with "already exists" in stderr (which is what uv prints, and what uv's own hint tells you to fix with --clear), retry once with ``--clear``. Only this specific failure pattern triggers the retry — disk-full / interpreter-download failures still surface as before so we don't mask real problems. Fix 2: post-install dep verification ------------------------------------ Belt-and-suspenders so future uv resolver quirks (or any other cause of partial installs) surface immediately instead of hours later in a downstream subprocess. After ``_install_python_dependencies_with_optional_fallback`` runs, ``_verify_core_dependencies_installed``: 1. Reads ``[project.dependencies]`` straight from pyproject.toml (so we don't trust the venv's stale metadata). 2. 
Filters by environment markers via ``packaging.requirements.Requirement`` so cross-platform exclusions (``ptyprocess ; sys_platform != 'win32'``) don't false-positive on Windows. 3. Runs ``importlib.metadata.version()`` for each remaining dep inside the target venv interpreter (resolved from ``VIRTUAL_ENV``, not ``sys.executable``). 4. If anything is missing, reinstalls the base group with ``--reinstall`` to force re-resolution. If a second probe still reports missing deps, force-installs each one with its pinned spec. 5. Treats final failure as a warning rather than a hard error — a single broken-on-PyPI dep shouldn't block an otherwise-successful update — but the message points at ``hermes update --force`` and names the missing packages so the user knows what's wrong. Tests ----- - ``TestRebuildVenv::test_retries_with_clear_when_dir_already_exists`` — simulates the rmtree-couldn't-delete-it failure mode and asserts the ``--clear`` retry path is taken and succeeds. - ``TestRebuildVenv::test_does_not_retry_when_first_failure_is_not_dir_exists`` — guards against masking real failures (disk full, etc.). - ``test_verify_core_dependencies.py`` — 7 tests covering the happy path, the regression (missing pathspec triggers --reinstall), the per-package fallback when --reinstall doesn't help, the platform- marker filter so Windows doesn't try to install ptyprocess, the missing-pyproject noop, and the VIRTUAL_ENV resolver. Co-authored-by: Kyssta <218078013+kyssta-exe@users.noreply.github.com>	2026-06-04 06:05:41 -07:00
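The probe in step 3 above can be sketched with the standard library alone. This is an illustration under assumptions: `missing_distributions` is a hypothetical helper, it runs in the current interpreter rather than the target venv interpreter, and it skips the environment-marker filtering the commit performs with `packaging`.

```python
from importlib import metadata


def missing_distributions(names):
    """Return the distribution names that have no installed metadata."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is not installed
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

A caller would pass the dependency names parsed from `pyproject.toml` and trigger the reinstall path only when the returned list is non-empty.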
AhmetArif0	cd68b8f0e8	fix(auth): set active_provider after hermes auth add qwen-oauth hermes auth add qwen-oauth called pool.add_entry() but never wrote to providers["qwen-oauth"] or set active_provider in auth.json. _model_section_has_credentials() checks get_active_provider() first; with active_provider unset and no api_key_env_vars configured for oauth_external providers, the setup wizard reported "No inference provider configured" even after a successful Qwen CLI OAuth login. Add _mark_qwen_oauth_active() in auth.py: writes a minimal provider state entry (base_url for display only) and calls _save_provider_state() to set active_provider. The function deliberately does not copy the api_key — that lives in the Qwen CLI credential file managed by _save_qwen_cli_tokens / resolve_qwen_runtime_credentials and must not be duplicated in auth.json where it would become stale. pool.add_entry() is retained so "hermes auth list" continues to show the entry. Runtime credential resolution continues to use resolve_qwen_runtime_credentials. Mirrors the fix applied to openai-codex (#37517) and xai-oauth (#37576).	2026-06-04 05:58:33 -07:00
Frowtek	71a9f44e80	fix(gateway): retry startup auto-resume when a failed platform reconnects	2026-06-04 05:56:45 -07:00
Fearvox	fa8e2f935b	polish(minimax): address Copilot review comments on M3 default-aux fix Three Copilot inline review comments on #37664, two worth landing in a polish pass before merge: 1. auxiliary_client.py:270 — Copilot suggested keeping the minimax-* entries in _API_KEY_PROVIDER_AUX_MODELS_FALLBACK as a safety net for environments where the profile-based resolution can't import or run plugin discovery. Declined. The deepseek precedent (commit `773a0faca`) explicitly removed deepseek from the same dict for the same reason — the profile layer is the source of truth and the dict is a legacy pre-profiles-system fallback. We do not want to fragment the codebase by provider: either the profile layer is authoritative or the dict is. The minimax PR picks profile (matching deepseek) and the dict stays cleaned up. The risk Copilot raises is real but theoretical — plugin discovery runs at import time of the providers module, which is the first thing any modern Hermes entrypoint imports. 2. tests/agent/test_minimax_provider.py:162 — Copilot flagged that the test class relies on _get_aux_model_for_provider() resolving via provider profiles but doesn't explicitly trigger plugin discovery. Fixed. Added 'import model_tools # noqa: F401' at the top of both test_minimax_aux_is_standard and test_minimax_aux_not_highspeed. The fixtures in the parallel test_minimax_profile.py already did this; the legacy test in test_minimax_provider.py was order-dependent and would silently break if anyone reorganised the test ordering. Pinned the dependency explicitly so the test is order-independent. 3. tests/plugins/model_providers/test_minimax_profile.py:46 — Copilot flagged that the docstring referenced a hard-coded line number 'hermes_cli/models.py:298' that would go stale. Fixed. Replaced with the symbol reference 'hermes_cli.models._PROVIDER_MODELS[\'minimax\']' which is stable under file edits and grep-friendly. 
The new docstring also reads more naturally — readers don't have to look up 'what's at line 298' to follow the reasoning. All 221 minimax-related tests still pass.	2026-06-04 05:53:35 -07:00
Fearvox	b531b5d12a	fix(minimax): update AUTHOR_MAP entry + test_minimax_oauth_aux_model_registered Two follow-ups to the M3 default-aux-model PR (#37664): 1. AUTHOR_MAP entry: add fearvox1015@gmail.com -> Fearvox so the check-attribution CI job recognises Nolan's real contributor email. The previous run of the attribution check on #37664 failed because the commit was authored as nolan@0xvox.com (wrong local git config) which isn't in AUTHOR_MAP. The commit itself is now re-authored to fearvox1015@gmail.com so both the per-commit check and the AUTHOR_MAP lookup pass. 2. tests/hermes_cli/test_api_key_providers.py::TestMinimaxOAuthProvider ::test_minimax_oauth_aux_model_registered was pinning the aux model in the legacy _API_KEY_PROVIDER_AUX_MODELS dict, which the PR correctly removed (mirrors the deepseek cleanup in `773a0faca`). The test now asserts the new world order: the aux model comes from ProviderProfile.default_aux_model on the minimax-oauth profile, not the fallback dict. This is the same pattern that the profile-layer deepseek fix introduced.	2026-06-04 05:53:35 -07:00
Fearvox	3d1d0a49fe	fix(minimax): align default_aux_model with M3 frontier on minimax + minimax-cn The minimax / minimax-cn / minimax-oauth profiles still advertised M2.7 (and M2.7-highspeed for OAuth) as their default_aux_model, predating the M3 release (2026-06-01). The user-facing _PROVIDER_MODELS['minimax'] catalog top entry is M3, and the recommended config for a Token-Plan install now sets model.default: MiniMax-M3, so the aux default was the only remaining drift. Updates: * minimax default_aux_model: M2.7 -> M3 * minimax-cn default_aux_model: M2.7 -> M3 * minimax-oauth default_aux_model: M2.7-highspeed -> M2.7 (M3 is not on the OAuth / Coding Plan tier per platform docs as of this PR; the highspeed variant was the 2x-cost regression from #4082 that PR #6082 collapsed to plain M2.7 for minimax / minimax-cn but missed OAuth) * agent/auxiliary_client.py: drop the three legacy _API_KEY_PROVIDER_AUX_MODELS_FALLBACK entries for the minimax family. _get_aux_model_for_provider() reads from ProviderProfile.default_aux_model first (line 250) and only falls back to the dict when the profile has no aux model or the profile import fails. With the profile now set, the dict entries are dead code and a drift hazard. Mirrors the deepseek cleanup in `773a0faca`. * tests/agent/test_minimax_provider.py: update the existing TestMinimaxAuxModel assertions from MiniMax-M2.7 to MiniMax-M3 (the intent — 'standard, not highspeed' — is unchanged; the pin value is). * tests/plugins/model_providers/test_minimax_profile.py: new file mirroring tests/plugins/model_providers/test_deepseek_profile.py. Pins each of the three profiles' default_aux_model and asserts _get_aux_model_for_provider() returns it. A second class guards against the highspeed regression coming back. 
Refs: - Closes #36196 in spirit (M3 support — the catalog half of that issue is #36212; this PR covers the profile half) - Related: #4082 (M2.7-highspeed 2x-cost), #6082 (previous M2.7-highspeed -> M2.7 fix that missed OAuth + the auxiliary_client.py fallback dict) - Pattern: `773a0faca` (same profile-layer fix for deepseek)	2026-06-04 05:53:35 -07:00
AhmetArif0	5f62ba8e4b	fix(auth): use _save_xai_oauth_tokens in auth_commands to set active_provider hermes auth add xai-oauth called pool.add_entry() directly, writing only the credential-pool entry (source "manual:xai_pkce") without touching providers["xai-oauth"] or setting active_provider in auth.json. _model_section_has_credentials() checks get_active_provider() first; with active_provider unset and no api_key_env_vars configured for oauth_external providers, the setup wizard reported "No inference provider configured" even after a successful OAuth login. Use _save_xai_oauth_tokens() — the canonical path already called from the hermes model xAI login flow — which writes providers["xai-oauth"]["tokens"] (setting active_provider) and lets _seed_from_singletons seed the pool with a "loopback_pkce" entry on the next load_pool() call. Mirrors the fix applied to openai-codex in #37517.	2026-06-04 05:48:50 -07:00
AhmetArif0	34a2903527	fix(auth): set active_provider after hermes auth add google-gemini-cli hermes auth add google-gemini-cli called pool.add_entry() but never wrote to providers["google-gemini-cli"] or set active_provider in auth.json. _model_section_has_credentials() checks get_active_provider() first; with active_provider unset and no api_key_env_vars configured for oauth_external providers, the setup wizard reported "No inference provider configured" even after a successful OAuth login. Add _mark_google_gemini_cli_active() in auth.py: writes a minimal provider state entry (email for display only) and calls _save_provider_state() to set active_provider. The function deliberately does not copy access_token or refresh_token — those are managed by agent.google_oauth in the Google credential file and must not be duplicated in auth.json where they would become stale. pool.add_entry() is retained so "hermes auth list" continues to show the entry. Runtime credential resolution continues to use agent.google_oauth directly. Mirrors the fix applied to openai-codex (#37517) and xai-oauth (#37576).	2026-06-04 05:44:22 -07:00
Teknium	9fbfeb31b9	fix(cron): make sequential jobs non-blocking too + sweep MCP after jobs finish Follow-up on the parallel-dispatch decoupling: the sequential pass for workdir/profile jobs still ran inline in the ticker thread, so a long workdir/profile job reintroduced the exact starvation #37312 describes, just for env-mutating jobs. And the MCP orphan sweep ran immediately after dispatch in sync=False mode — before jobs finished — defeating its own 'runs after every job' contract and racing jobs still spawning MCP children. - Sequential jobs now queue to a persistent single-thread cron-seq pool (preserves one-at-a-time ordering across ticks, never blocks the tick). - Same in-flight dedup guard now covers sequential jobs. - MCP orphan sweep runs via a done-callback after the LAST dispatched job completes in async mode; inline after as_completed in sync mode. Verified E2E: tick(sync=False) returns in ~1ms with a 1.5s sequential job in flight; sweep fires only after that job ends.	2026-06-04 05:40:13 -07:00
Vynxe Vainglory	eb9cde7346	fix(cron): decouple job dispatch from completion in tick() PR #13021 fixed serial starvation by adding ThreadPoolExecutor to tick(), but kept as_completed(timeout=600) which still blocks the ticker thread until the slowest job finishes. This causes the same starvation pattern: when one job runs long (15+ min), other jobs' next_run_at expires past the grace window and they get perpetually fast-forwarded instead of running. This PR decouples dispatch from completion: - Persistent ThreadPoolExecutor (reused across ticks, no auto-join) - Fire-and-forget dispatch: tick submits and returns immediately - Running-job guard: prevents re-dispatching active jobs - sync parameter: defaults to True (backward compatible), callers opt into sync=False for non-blocking behavior - atexit shutdown handler for clean pool teardown - gateway/run.py: production ticker opts into sync=False Refs #33315 (complementary — that issue's PRs fix grace handling in jobs.py; this PR prevents the grace from expiring in the first place)	2026-06-04 05:40:13 -07:00
ashishpatel26	c9b62061d4	fix(cli): launchd KeepAlive unconditional restart (#37388) Replace KeepAlive.SuccessfulExit=false dict with <key>KeepAlive</key><true/> so launchd restarts hermes-gateway on any exit, matching the documented drain-then-exit restart protocol used by --graceful-restart.	2026-06-04 05:38:12 -07:00
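The difference between the two `KeepAlive` forms can be seen by serializing them with the standard-library `plistlib`. This is only a sketch of the one key: a real launchd plist also needs entries such as `Label` and `ProgramArguments`, and `keepalive_plist` is a hypothetical helper, not part of Hermes.

```python
import plistlib


def keepalive_plist(unconditional):
    """Serialize only the KeepAlive key in its old (dict) or new (bool) form."""
    if unconditional:
        payload = {"KeepAlive": True}  # restart on any exit
    else:
        payload = {"KeepAlive": {"SuccessfulExit": False}}  # restart only after a failed exit
    return plistlib.dumps(payload).decode("utf-8")
```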
teknium	153fe28474	fix(vision): use MiniMax type="video" block (not input_video) + tests The salvaged conversion emitted type:"input_video", which MiniMax M3 rejects just like the original video_url block. Per MiniMax's Anthropic-compat docs, the video content block is type:"video" with an image-style source (base64 or url). Fixes the block type, converts URL-based videos too, and adds 4 video conversion tests (none shipped with the original PR).	2026-06-04 05:38:11 -07:00
AhmetArif0	9756dff5fd	fix(model_metadata): drop stale ≤256,000 cache entries for Grok-4.3 The ``grok-4.3`` (1M context) catalog entry was added on 2026-05-15 (`ce0e189d3`). Between 2026-04-10 (when ``grok-4`` at 256,000 was first added by `b57769718`) and 2026-05-15, grok-4.3 slugs resolved via the generic ``grok-4`` substring catch-all and that 256,000 value was persisted to context_length_cache.yaml. Users who first queried grok-4.3 in that 35-day window are stuck at 256K forever — the cache is read at step 1 before the hardcoded defaults in step 8, so the correct 1M entry is never reached. Mirror the existing Kimi/Codex/MiniMax-M3 stale-cache guards: add _model_name_suggests_grok_4_3() and an elif branch that drops any cached value ≤ 256,000 for a grok-4.3 slug so the next lookup falls through to the 1M hardcoded default. Adds 4 regression tests: helper unit test, stale-drop-and-re-resolve, correct-cache-preserved, and no-clobber for plain grok-4 (256K correct).	2026-06-04 05:36:34 -07:00
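The stale-cache guard described in this commit can be sketched as below. It is an illustration under assumptions: the lookup is reduced to two steps (cache, then hardcoded defaults), `resolve_context_length` and `DEFAULTS` are hypothetical names, and "1M" is taken as 1,000,000 tokens.

```python
STALE_CEILING = 256_000  # value inherited from the generic grok-4 catch-all

# More specific needle first, because "grok-4" is a substring of "grok-4.3".
DEFAULTS = [("grok-4.3", 1_000_000), ("grok-4", 256_000)]


def _model_name_suggests_grok_4_3(model):
    return "grok-4.3" in model.lower()


def resolve_context_length(model, cache):
    """Return the context length, dropping a stale cached value for grok-4.3."""
    cached = cache.get(model)
    if cached is not None:
        if _model_name_suggests_grok_4_3(model) and cached <= STALE_CEILING:
            del cache[model]  # stale: fall through to the hardcoded default
        else:
            return cached
    for needle, length in DEFAULTS:
        if needle in model.lower():
            return length
    return None
```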
Teknium	b04c6e95f6	fix(approval): catch perl/ruby -i as a separate flag token The salvaged pattern matched -i only inside the first flag token, so `perl -p -i -e '...' config.yaml` (the -i split out after -p) slipped through. Widen to match a -...i flag token anywhere in the args; still no false positive on `perl -e` code eval or config reads. Adds tests for the separate-token, backup-suffix, and read-safe forms.	2026-06-04 05:36:30 -07:00
AhmetArif0	a6a4e6f9d7	fix(approval): gate perl/ruby -i in-place edits of Hermes config/env sed -i coverage for ~/.hermes/config.yaml and .env was added in #14639, but perl -i and ruby -i, which perform the same direct file mutation, were not covered. The existing perl/ruby pattern only catches -e/-c (code evaluation), not -i (file mutation), so: perl -i -pe 's/approvals.mode: on/approvals.mode: off/' ~/.hermes/config.yaml bypasses the approval gate entirely, letting the agent flip approvals.mode off mid-session via the mtime-keyed config cache reload. Add a single pattern mirroring the sed -i lines: `\b(?:perl|ruby)\s+-[^\s]*i` against both _HERMES_CONFIG_PATH and _HERMES_ENV_PATH. Three regression tests pin the new coverage.	2026-06-04 05:36:30 -07:00
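The behaviour of the pattern quoted in this commit (with the alternation written as a plain `|`) can be checked in isolation. This is a sketch: `HERMES_PATH` is a hypothetical stand-in for `_HERMES_CONFIG_PATH` / `_HERMES_ENV_PATH`, whose real definitions are not shown in the log, and `edits_hermes_config` is not a Hermes function.

```python
import re

# Flag token containing "i" directly after the interpreter name (from the commit).
INPLACE_FLAG = r"\b(?:perl|ruby)\s+-[^\s]*i"
# Assumed stand-in for the guarded config/env paths.
HERMES_PATH = r"\.hermes/(?:config\.yaml|\.env)"

_PATTERN = re.compile(INPLACE_FLAG + r".*" + HERMES_PATH)


def edits_hermes_config(command):
    """Return True if the command looks like a perl/ruby in-place edit of Hermes config."""
    return _PATTERN.search(command) is not None
```

The pattern catches `perl -i -pe ...` and `ruby -pi -e ...`, leaves `perl -e` code evaluation alone, and does not match `perl -p -i -e ...`, where `-i` is a separate token; that last case is the gap the follow-up commit b04c6e95f6 closes.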
Sol Aitken	de60bf40c6	fix(memory): register parent packages for user-installed provider imports User-installed memory providers load under the synthetic _hermes_user_memory.<name> package, but the loader never registered that parent namespace in sys.modules (it only registers "plugins" and "plugins.memory" for bundled providers). As a result any external provider using a relative import failed to load: from . import config ModuleNotFoundError: No module named '_hermes_user_memory' The same gap in discover_plugin_cli_commands() meant an external provider's cli.py with a relative import could never be discovered, so the documented "hermes <plugin>" CLI integration did not work for standalone plugins. Register the synthetic parent namespace before loading user-installed providers, mirror it for cli.py discovery (including the per-provider parent package, without executing the plugin's __init__.py), and make _load_provider_from_dir() reuse only modules actually loaded from disk so a parent shell registered by CLI discovery is never mistaken for the loaded provider. Regressions cover: a flat provider with a sibling relative import, a provider with its implementation in a nested subpackage (including a namespace intermediate directory), cli.py discovery with a relative import, and provider load after CLI discovery ran first. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 05:35:43 -07:00
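Registering a synthetic parent package before loading a provider from disk can be sketched with `importlib`. This is not the Hermes loader: `load_user_provider` is a hypothetical helper, and only the parent name `_hermes_user_memory` comes from the commit message.

```python
import importlib.util
import sys
import types
from pathlib import Path


def load_user_provider(name, provider_dir, parent="_hermes_user_memory"):
    """Load provider_dir/__init__.py as the package '<parent>.<name>'."""
    if parent not in sys.modules:
        pkg = types.ModuleType(parent)
        pkg.__path__ = []  # an empty __path__ marks the shell as a package
        sys.modules[parent] = pkg

    full_name = f"{parent}.{name}"
    spec = importlib.util.spec_from_file_location(
        full_name,
        Path(provider_dir) / "__init__.py",
        submodule_search_locations=[str(provider_dir)],
    )
    module = importlib.util.module_from_spec(spec)
    sys.modules[full_name] = module  # must be registered before exec for "from . import x"
    spec.loader.exec_module(module)
    return module
```

A provider whose `__init__.py` contains `from . import config` then resolves `config` as `_hermes_user_memory.<name>.config` from its own directory.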
AhmetArif0	4ae3c988b5	fix(gateway): bridge shared-key loop to nested platform config blocks The shared-key bridging loop (allow_from, require_mention, free_response_channels, …) read only the top-level yaml platform block (yaml_cfg.get(plat.value)). When a user configured a platform solely under ``platforms:`` or ``gateway.platforms:`` with no top-level block, the loop skipped that platform entirely and all bridged keys were silently dropped into PlatformConfig.extra — making allow_from, require_mention, etc. ineffective for nested-only configs. The apply_yaml_config_fn dispatch already received this same fallback in `44f3e51` to handle plugin adapters (e.g. Discord allow_from). The shared-key loop now mirrors it: if yaml_cfg.get(plat.value) is absent, fall back to gateway.platforms.<name> then platforms.<name>. The enabled field is deliberately excluded from the nested fallback (guarded by _cfg_toplevel): _merge_platform_map already merged it with the correct precedence, so re-applying it from a single nested source would overwrite the correctly-merged value. Two new regression tests assert that allow_from and require_mention configured under platforms.telegram and gateway.platforms.telegram are bridged into PlatformConfig.extra. All 54 existing config tests pass.	2026-06-04 05:31:47 -07:00
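The fallback order described in this commit (top-level block, then `gateway.platforms.<name>`, then `platforms.<name>`) can be sketched as a small lookup. `platform_block` is a hypothetical helper operating on an already-parsed YAML dict; it does not reproduce the special handling of the `enabled` field.

```python
def platform_block(yaml_cfg, name):
    """Return the config block for a platform, preferring the top-level one."""
    top = yaml_cfg.get(name)
    if top:
        return top
    gateway_nested = (yaml_cfg.get("gateway") or {}).get("platforms") or {}
    if gateway_nested.get(name):
        return gateway_nested[name]
    return (yaml_cfg.get("platforms") or {}).get(name) or {}
```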
Teknium	df9fb8e5e6	fix(tools): stop hermes tools reporting kanban as removed (#38918) The hermes tools save summary printed '- kanban' (and would print '+ kanban') for a platform even though kanban is never offered as a checklist option. kanban is a check_fn-gated toolset whose tools are a subset of the platform composite, so _get_platform_tools resolves it as enabled, but _prompt_toolset_checklist only renders CONFIGURABLE_TOOLSETS, so it can never survive into the returned selection. The added/removed diff (current_enabled - new_enabled) then surfaced kanban as removed. Scope the printed diff to the checklist's actual universe via the new _checklist_toolset_keys() helper at all three diff sites (first-install, all-platforms, per-platform). The persisted config is unaffected; _save_platform_tools already preserves non-configurable entries, so this was purely a false signal in the UI.	2026-06-04 03:31:43 -07:00
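Scoping the diff to the checklist's universe amounts to intersecting both set differences with the set of offered keys. A minimal sketch, assuming plain Python sets; `checklist_diff` and the toolset names other than `kanban` are hypothetical.

```python
def checklist_diff(current_enabled, new_enabled, checklist_keys):
    """Return (added, removed), restricted to toolsets the checklist can offer."""
    added = (new_enabled - current_enabled) & checklist_keys
    removed = (current_enabled - new_enabled) & checklist_keys
    return sorted(added), sorted(removed)
```

With `kanban` enabled on the platform but absent from the checklist keys, it no longer shows up as removed.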
Ben	616c0a36b6	fix(dashboard-auth): don't abort verify chain on one provider's ProviderError The gated dashboard verifies a session cookie by trying each registered DashboardAuthProvider's verify_session in turn (the session cookie stores only the access token, not which provider issued it). A provider that doesn't recognise a token returns None; a provider whose IDP/JWKS is unreachable raises ProviderError. The loop used to return HTTP 503 on the FIRST ProviderError, before any later provider got a turn. With multiple providers stacked, that means an unreachable IDP for a session you didn't even use blocks login through a different, reachable provider. Concrete repro: a self-hosted-OIDC session hits the 'nous' provider first (registered earlier); nous tries to reach Nous Portal's JWKS, which is unreachable in a self-hosted deployment, so it raises — and the gate 503s before the 'self-hosted' provider can verify the token. Hit live while testing the new self-hosted OIDC plugin against a local Keycloak. Fix: a ProviderError from one provider is logged and the loop continues to the next. A 503 is returned only if NO provider verified the token AND at least one was unreachable — distinguishing a transient IDP outage (don't force a needless re-login) from a token that's genuinely invalid (fall through to refresh/relogin). Single-provider behaviour is unchanged. Tests: adds an _UnreachableProvider stub and three cases — unreachable provider first must not block a working second; all-unreachable still 503s; reachable-but-unrecognised falls through to 401/relogin (not 503). Mutation-tested: reverting the fix makes the first case fail with the exact 503 bug.	2026-06-04 03:23:45 -07:00
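The revised verify loop can be sketched as follows. Only `ProviderError` and `verify_session` are names taken from the commit; `verify_with_providers` and the `(status, session)` return shape are assumptions made for the illustration.

```python
class ProviderError(Exception):
    """Raised when a provider's IDP/JWKS cannot be reached."""


def verify_with_providers(providers, token):
    """Try every provider in turn; return (http_status, session_or_None)."""
    any_unreachable = False
    for provider in providers:
        try:
            session = provider.verify_session(token)
        except ProviderError:
            any_unreachable = True  # remember the outage, keep trying the others
            continue
        if session is not None:
            return 200, session
    # 503 only when nobody verified the token AND at least one provider was down.
    return (503 if any_unreachable else 401), None
```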
Ben	f57ce341dc	feat(dashboard-auth): add generic self-hosted OIDC provider Adds a bundled dashboard-auth provider plugin that authenticates the web dashboard against any conformant self-hosted OpenID Connect server (Authentik, Keycloak, Zitadel, Authelia, Auth0, Okta, Google, …) using standard OIDC — no per-IDP code. It's a pure drop-in plugin implementing the DashboardAuthProvider protocol; it touches no core auth/runtime/login paths. Mechanics: - OIDC discovery from {issuer}/.well-known/openid-configuration (cached; issuer pinned; endpoints required HTTPS, loopback http allowed for local-dev IDPs) - authorization-code + PKCE (S256), public client - verifies the OIDC ID token (RS256/ES256) against the discovered jwks_uri with iss/aud pinned to the configured issuer/client_id, and maps standard claims (sub/email/name/preferred_username, groups→org) onto a Session - standard refresh_token grant for silent re-auth; RFC 7009 revocation on logout when advertised Verifies the ID token (not the access token) because OIDC guarantees the ID token is a signed JWT carrying identity, while access-token format is opaque to the client per spec — the only universally-correct choice across self-hosted IDPs. Config via dashboard.oauth.self_hosted.{issuer,client_id,scopes} in config.yaml or HERMES_DASHBOARD_OIDC_{ISSUER,CLIENT_ID,SCOPES} env vars (env-wins-config, empty-is-unset — same convention as the nous plugin). Confidential clients (client_secret) left as a documented TODO seam. Docs: adds a Self-hosted OIDC section to the web-dashboard guide, including a copy-paste Keycloak worked example (realm import + docker run + dashboard wiring + login walkthrough). Tests: 65 cases covering construction, discovery (incl. issuer mismatch + https enforcement), start_login/PKCE, complete_login, ID token verification, refresh/revoke, and env/config precedence.	2026-06-04 03:23:45 -07:00
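Two of the mechanics listed in this commit, the discovery URL and the PKCE S256 challenge, follow directly from the OIDC Discovery and RFC 7636 specifications and can be sketched with the standard library. The function names below are hypothetical and not taken from the plugin.

```python
import base64
import hashlib
import secrets


def discovery_url(issuer):
    """Build the OIDC discovery document URL for an issuer."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def pkce_challenge(verifier):
    """S256 challenge: base64url(SHA-256(verifier)) with padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def new_pkce_pair():
    """Generate a fresh (code_verifier, code_challenge) pair."""
    verifier = secrets.token_urlsafe(64)  # about 86 chars, within the 43-128 range
    return verifier, pkce_challenge(verifier)
```

The challenge goes into the authorization request with `code_challenge_method=S256`, and the verifier is sent later with the token request.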
Ben	cae6b5486f	feat(dashboard): always enable embedded chat; remove dashboard --tui flag The dashboard's embedded Chat surface (/chat, /api/ws, /api/pty) was gated behind `hermes dashboard --tui` / HERMES_DASHBOARD_TUI=1. The desktop app and the dashboard's own Chat tab both drive the agent over the /api/ws + /api/pty WebSockets, so a dashboard started without the flag would pass the /api/status health check but slam the chat WebSocket shut with WS code 4403 — the app connects, reports "ready", and chat stays dead. This was the root cause behind multiple user reports of the desktop app failing to connect to a self-hosted gateway/dashboard, and it bit Docker and host installs alike. Make the embedded chat unconditional: - web_server.py: _DASHBOARD_EMBEDDED_CHAT_ENABLED defaults to True; drop the embedded_chat parameter and the runtime reassignment from start_server(). The WS gates still read the constant (now always true) so the seam — and its "rejects when disabled" contract test — stays meaningful. - main.py: remove the `--tui` argument from the dashboard subparser and the `embedded_chat = args.tui or HERMES_DASHBOARD_TUI==1` derivation. - web/: isDashboardEmbeddedChatEnabled() returns true unconditionally; drop the deprecated __HERMES_DASHBOARD_TUI__ alias and the dead LEGACY_TUI_RE scrape in the vite dev-token plugin. - apps/desktop/electron/main.cjs: drop `--tui` from the spawned dashboardArgs (it would now error with "unrecognized arguments: --tui") and the redundant HERMES_DASHBOARD_TUI env injection. - Docker: no s6 run-script change needed — the script never passed --tui; the HERMES_DASHBOARD_TUI env var is now simply a no-op, so the image works out of the box with no extra var. - Docs: remove every dashboard --tui / HERMES_DASHBOARD_TUI reference across the CLI reference, env-var reference, docker/desktop/web-dashboard guides, in-app tips, and the zh-Hans translations. The terminal `hermes --tui` / HERMES_TUI references are intentionally left untouched. 
Tests: 270 passing across web_server, dashboard lifecycle, host-header, auth-gate, and docker-override-scripts suites.	2026-06-04 03:03:35 -07:00
alt-glitch	aeec88c77f	fix(installer): symlink bundled node/npm into command bin dir for FHS root installs Root installs on Linux (FHS layout, #15608) put the `hermes` command in `/usr/local/bin` (on PATH) but symlinked the bundled node/npm/npx into `~/.local/bin`, which isn't on PATH for a stock root shell. `node`/`npm` were 'command not found' and `hermes dashboard` failed with 'npm is not available' because its build-on-demand fallback couldn't find npm. Fix: `install_node()` now symlinks into `get_command_link_dir()` — the same helper the `hermes` command link already uses — so node/npm/npx land wherever the command does (`/usr/local/bin` on FHS root, `~/.local/bin` otherwise, `$PREFIX/bin` on Termux). Non-root and Termux installs are unchanged. Also fixes: - `scripts/lib/node-bootstrap.sh`: adds `_nb_get_link_dir()` mirroring the same root/Termux/user logic for the standalone bootstrap path (used by `hermes update`, TUI node bootstrap, etc.) - `hermes_cli/uninstall.py`: `remove_node_symlinks()` now checks all candidate directories (`~/.local/bin`, `/usr/local/bin`, `$PREFIX/bin`) so root FHS uninstalls don't leave orphan symlinks Regression from #15608, which created the FHS path for the command but left `install_node` pointed at the legacy user-local dir.	2026-06-04 02:31:49 -07:00
Teknium	4ed63170e4	fix(update): don't fail desktop rebuild / skills sync on mid-rebuild venv (#38885 ) When 'hermes update' rebuilds the project venv (rmtree + uv venv on the first managed-uv migration), the desktop-rebuild and profile-skills-sync steps that follow both spawn sys.executable. Firing while the venv is mid-rewrite makes the child interpreter abort with the bare stderr line 'No pyvenv.cfg file', surfacing as a spurious 'Desktop build failed' / 'default: sync failed' on an update that actually succeeded. Add _wait_for_interpreter_venv_ready(): resolve the venv hosting sys.executable and poll briefly for pyvenv.cfg to (re)appear before each of those subprocess steps. No-op when the interpreter isn't venv-hosted. The desktop rebuild also retries once after re-waiting, and keeps streaming its output live (no capture). Best-effort throughout — callers proceed regardless, so a genuinely broken venv still surfaces the real error.	2026-06-04 02:20:11 -07:00
Teknium	fe709a4210	fix(test): expect 4404 close code for disabled embedded chat (#38841) PR #38743 split the dashboard PTY WebSocket refusal codes (4404 = chat disabled, 4403 = host/origin mismatch; see the web_server.py refusal-site comment) but left test_rejects_when_embedded_chat_disabled asserting the old 4403, so the test expected 4403 while the server sent 4404. Main CI has been red on the test (2)/(4) shards since that commit. Update the assertion to 4404 to match the disabled-chat path.	2026-06-04 01:13:03 -07:00
Ben	3a25912c14	test(dashboard-auth): cover password login route, provider, and plugin - test_dashboard_auth_password_login.py: drives /auth/password-login end-to-end through the REAL gated_auth_middleware (login -> session cookie -> authenticated /api/auth/me -> transparent refresh via the RT cookie), plus protocol-extension checks, the generic-401/404 oracle properties, the rate limiter, and login-page rendering (form+script when supports_password, script-free otherwise, both for mixed providers). Reuses the existing StubAuthProvider harness convention. - test_basic_provider.py: scrypt hash/verify, login mint, kind-claim enforcement (access != refresh), cross-secret rejection, and the register() config/env precedence + skip reasons. Mutation-tested: dropping the kind-claim check in verify_session makes test_access_token_not_accepted_as_refresh fail, confirming the test isn't theater.	2026-06-04 01:02:25 -07:00
Ben Barclay	fe74a1acda	fix(dashboard_auth): allow any http:// host in redirect_uri fast-fail (#38827 ) The Nous dashboard OAuth login rejected any http:// redirect_uri whose host was not localhost/127.0.0.1, surfacing "redirect_uri may only use http:// for localhost/127.0.0.1" on the login screen. This broke self-hosted dashboards reached over plain HTTP — LAN IPs, internal hostnames, and reverse proxies that terminate TLS upstream. The Portal-side check (agent-redirect-uri.ts) is authoritative on which redirect_uris are permitted; this client-side _validate_redirect_uri is only a fast-fail for obvious operator error and should not second-guess valid http:// deployments. Fix: drop the localhost-only branch on the http scheme. Validation now enforces only that the scheme is http(s) and the path ends with /auth/callback. Updated the docstring to explain the relaxed contract, and replaced test_rejects_http_with_non_localhost (which pinned the old behavior) with test_allows_http_with_arbitrary_host covering a Fly hostname, a LAN IP, and an internal hostname.	2026-06-04 00:51:44 -07:00
Ben	c2ca3f01ab	fix(dashboard): honor --portal-url / HERMES_DASHBOARD_PORTAL_URL override in register The register command resolved the portal base URL purely from the stored login, ignoring any override. That meant `HERMES_DASHBOARD_PORTAL_URL` (and the absence of any flag) gave no way to point registration at a staging or preview portal — the request always hit the login's portal, returning 404 against a branch that wasn't deployed there. - _resolve_portal_base_url now takes an optional override (precedence: override > stored login portal > prod default). - New --portal-url flag; falls back to HERMES_DASHBOARD_PORTAL_URL env. - Documents that the access token must be valid at the overridden portal (it's minted by whoever you logged into). - 3 new tests for override precedence. Verified live against the PR #324 Vercel preview: CLI -> preview endpoint -> real agent:{id} client_id written to .env.	2026-06-04 00:17:57 -07:00
Ben	bb291b6bbc	feat(dashboard): `hermes dashboard register` for self-hosted OAuth client Adds a CLI command that registers this install as a self-hosted dashboard with the user's Nous Portal account, automating the manual browser flow on /local-dashboards. - New hermes_cli/dashboard_register.py: resolves a fresh Nous access token from auth.json (fast-fails with a `hermes setup` hint when not logged in), POSTs to {portal}/api/oauth/self-hosted-client, and writes HERMES_DASHBOARD_OAUTH_CLIENT_ID into ~/.hermes/.env idempotently. - Docker-style adjective_noun auto-naming; --name and --redirect-uri overrides. - Persists HERMES_DASHBOARD_PORTAL_URL only when non-default and unset (so a Vercel preview / staging portal sticks, prod default stays implicit). - Refuses in managed/hosted installs (the orchestrator stamps the client_id). - Post-register hint explains the OAuth gate only engages on a non-loopback bind. - Nested 'register' subparser leaves bare `hermes dashboard` unchanged. - 9 unit tests (name gen, fast-fails, POST shape, env writes, redirect URI, portal-URL persistence, 401/403 mapping); dashboard lifecycle tests still green. Depends on NousResearch/nous-account-service#324 (the portal endpoint).	2026-06-04 00:17:57 -07:00
kshitij	0401176c7a	Merge pull request #38760 from helix4u/fix/prefill-config-compat fix(config): align prefill messages key handling	2026-06-03 23:52:47 -07:00
Siddharth Balyan	f31c950182	refactor(supermemory): session-level ingest + kebab aliases (salvaged from #32487 ) (#38756 ) * refactor(supermemory): session-level conversation ingest + kebab tool aliases Salvaged from #32487 (by @MaheshtheDev), rebased onto current main. - sync_turn now buffers cleaned turns; the full session is ingested once at session end / switch / shutdown via the conversations endpoint - ingest_conversation() accepts and forwards functional document metadata (type, session_id, message_count, partial) - register kebab-case tool aliases (supermemory-save/search/forget/profile) alongside the snake_case names - README + docs (EN/zh-Hans) updated for the simplified session model Source/vendor-attribution removed per project policy (no telemetry): dropped x-sm-source header, sm_source metadata, and sm_capture_mode tags. Preserved the post-branch atomic_json_write(mode=0o600) hardening that the PR's stale base had reverted. Updated provider tests for the new behavior and added maheshthedev@gmail.com to release.py AUTHOR_MAP. Co-authored-by: alt-glitch <balyan.sid@gmail.com> * feat(supermemory): restore x-sm-source for Spaces routing Reinstates x-sm-source: hermes (SDK default_headers + conversations POST) and sm_source: hermes document metadata. Per @Dhravya (Supermemory), this is a functional routing key, not telemetry: it groups Hermes writes into a dedicated "Hermes" Space in the Supermemory app so users can filter and bulk-manage memories per source agent. sm_capture_mode remains dropped (appears analytics-only; Spaces are routed by sm_source) pending confirmation. Adds README note + a unit test covering _merge_metadata sm_source stamping and legacy source->type migration. --------- Co-authored-by: Mahesh Sanikommu <maheshthedev@gmail.com>	2026-06-04 11:50:02 +05:30
helix4u	ffb53767bf	fix(config): align prefill messages key handling	2026-06-03 23:51:44 -06:00
Ben Barclay	30c7b787d1	fix(memory): fall back to pip when uv is unavailable (salvage #5954) (#38668) `_install_dependencies` (hermes memory setup) hard-aborted with "uv not found — cannot install dependencies" whenever `uv` was not on PATH, even when a perfectly good `pip` was available. Slim container images and some CI environments don't ship uv, so memory-provider dependency installation dead-ended there for no good reason. Now: use `uv pip install` when uv is present, otherwise fall back to `<python> -m pip install` when pip3/pip is available, and only abort (with the uv install hint) when neither is found. The "Run manually:" hints reflect whichever installer was selected. Salvages #5954 by @MustafaKara7. Their patch added redundant local `import subprocess` / `import sys` (both are already in scope: module-level `sys`, function-top `subprocess`); this salvage drops those and adds a regression test (TestInstallDependenciesRunner) covering all three paths (uv / pip-fallback / abort). Verified adversarially: the pip-fallback test fails against origin/main's unfixed code with the exact dead-end symptom and passes with the fix. Closes #5954. Co-authored-by: MustafaKara7 <186085093+MustafaKara7@users.noreply.github.com>	2026-06-04 14:03:02 +10:00
Ben Barclay	03ba06ebfb	fix(docker): chown gateway install tree on UID remap (salvage #37928 ) (#38655 ) Salvage of #37928 (@sarvesh1327), reduced to the still-needed delta. `/opt/hermes/gateway` is a runtime-writable Python package: on first import the supervised gateway writes `__pycache__` beneath it, and the image does not set PYTHONDONTWRITEBYTECODE. When HERMES_UID/PUID is remapped at boot (e.g. Unraid 99), `usermod -u` only re-chowns the hermes home dir; the build trees under /opt/hermes keep the build-time UID (10000). main already chowns `.venv`, `ui-tui`, and `node_modules` on remap (#38556) but missed `gateway`, so the remapped gateway hits EACCES writing `__pycache__` (#27221). Add `/opt/hermes/gateway` to both chown sites — the Dockerfile build-time `chown -R hermes:hermes` line and the stage2-hook build-tree repair — so it tracks the remapped UID like the sibling trees. Differs from #37928 as submitted: dropped the `uid_gid_remapped` flag and the `\|\| [ "$uid_gid_remapped" = true ]` chown gate. main's #38556 already solved that half, and more correctly — it probes the actual tree ownership (`venv_owner != actual_hermes_uid`) rather than tracking same-boot remaps, which also catches pre-existing ownership drift and stays idempotent. Keeping #37928's flag would regress that. The salvage is the `gateway`-tree addition only. Verified end-to-end against a real image build: on baseline main a remap to UID 99 leaves `gateway` owned by 10000 and a write as uid 99 fails EACCES; with this change `gateway` is chowned to 99:100 and the write succeeds, while the default-uid (no-remap) path is unchanged. Fixes #27221. Co-authored-by: Sarvesh <sarveshagl1327@gmail.com>	2026-06-04 13:34:23 +10:00
Teknium	e45dd2b0e7	refactor(web): unify main-slot model assignment base_url/context handling (#38593) Both POST /api/model/set and the profile-model writer hand-rolled the same provider/default/base_url/context_length reconciliation. Extract it into _apply_main_model_assignment so the custom-vs-hosted base_url logic lives in one place, removing the future-drift risk where one site learns about custom base_url persistence and the other forgets. Behavior unchanged; pinned with a direct helper unit test.	2026-06-03 20:25:33 -07:00
Ben Barclay	e2ea648a08	test(docker): make tty-passthrough probe robust to container boot-log noise (#38665 ) `test_tty_passthrough_to_container` asserted `int(numeric_lines[0]) > 0` where `numeric_lines` was every `.isdigit()` token in the FULL PTY stream — but the container's s6 boot output (cont-init diagnostics, the preinit `uid=0 ... egid=0` line, skills-sync summaries like `Done: 90 new, 0 updated, 0 unchanged. 90 total bundled.`) is written to the same PTY before the `tput cols` probe runs. So the test was really asserting on "the first number anywhere in the boot log", which passed only by luck on whatever that first digit happened to be. Any PR that shifts boot output flips the first digit to a stray `0` and breaks the test with `assert 0 > 0` — even when TTY passthrough is working perfectly (`tput cols` returns the right value). This is a latent landmine for every Docker PR that changes boot output (e.g. adding a bundled dependency changes the skills-sync counts). Fix: emit the probe result behind a unique marker (`HERMES_TTY_COLS=<cols>` / `HERMES_TTY_COLS=NO_TTY`) and parse only the marked value, ignoring all boot-log noise. The test's real intent — verify `docker run -t` delivers a real TTY with a positive column count — is preserved (NO_TTY and non-numeric values still fail). Verified against a real build, adversarially: - Built an image with extra boot output (the markdown core-dep change from #38649, which is what surfaced this) so the OLD logic grabs a stray `0` -> reproduced `assert 0 > 0` locally. - The hardened test PASSES against that same image, and against a clean image. `tput cols` correctly returns 123 in both.	2026-06-04 13:19:13 +10:00
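The marker-based parsing described in e2ea648a08 can be sketched as follows. This is a minimal illustration, not the test's actual code: `parse_tty_cols` and the regex are hypothetical names, and only the `HERMES_TTY_COLS=<cols>` / `HERMES_TTY_COLS=NO_TTY` marker format comes from the commit message.

```python
import re

# Marker format taken from the commit message; the helper itself is hypothetical.
_MARKER_RE = re.compile(r"HERMES_TTY_COLS=(\S+)")


def parse_tty_cols(pty_output: str) -> int:
    """Return the column count reported behind the marker.

    Every other token in the stream (boot diagnostics, sync summaries)
    is ignored, so stray digits in the boot log cannot affect the result.
    """
    matches = _MARKER_RE.findall(pty_output)
    if not matches:
        raise ValueError("marker not found in PTY output")
    value = matches[-1]  # use the most recent marked value
    if not value.isdigit():
        # Covers NO_TTY and any other non-numeric probe result.
        raise ValueError(f"probe did not report a column count: {value!r}")
    return int(value)
```

A caller would then assert `parse_tty_cols(output) > 0`, which keeps the original intent (a real TTY with a positive width) while being indifferent to whatever the container prints during boot.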
cornna	7402706c5e	fix(docker): accept Unraid uid mappings (#38098) Co-authored-by: Cornna <96944678+ymylive@users.noreply.github.com>	2026-06-04 12:38:24 +10:00
Dusk1e	2059707fce	fix(gateway-windows): anchor detached/startup cwd at HERMES_HOME	2026-06-03 19:37:29 -07:00
LeonSGP43	40fbb0f3c6	fix(constants): use windows native default hermes home	2026-06-03 19:37:29 -07:00
Teknium	e3313c50a7	feat(dashboard): add Debug Share to the System page (#38600 ) * Port from google-gemini/gemini-cli#21541: back up corrupted config.yaml When config.yaml fails to parse, load_config() silently falls back to DEFAULT_CONFIG and leaves the broken file on disk. If the user then re-runs the setup wizard or hermes config set (both rewrite config.yaml), their broken-but-recoverable overrides are lost for good. Adapts the policy-file recovery from gemini-cli#21541: on the first parse warning for a given broken file, snapshot it to config.yaml.corrupt.<ts>.bak (best-effort, symlink-guarded, size-deduped) and tell the user where it landed. Unlike Gemini's version we deliberately do NOT reset config.yaml to a clean state — hermes never silently mutates user config, and leaving it means a hand-fixed file is re-read on the next load. Tests: 3 new cases (backup created + content preserved + original untouched; same-size backup dedup; symlink not copied). E2E verified with isolated HERMES_HOME and a real tab-indented broken config. * feat(dashboard): add Debug Share to the System page Surface `hermes debug share` in the dashboard. The System > Operations section gets a dedicated card that uploads a redacted report + full logs and returns the paste URLs as real, copyable links instead of a log tail. - debug.py: factor a pure build_debug_share() returning structured {urls, failures, redacted, auto_delete_seconds}; run_debug_share now calls it (CLI output unchanged). - web_server.py: POST /api/ops/debug-share runs the share core in a worker thread and returns the structured payload synchronously (the URLs are the whole point — not a backgrounded action). - api.ts: runDebugShare() + DebugShareResponse. - SystemPage.tsx: share card with a redaction toggle (on by default), per-link + copy-all buttons, and the 6h auto-delete countdown. - tests: build_debug_share core + endpoint (redact toggle, failure 502, token gate).	2026-06-03 19:37:04 -07:00
Ben Barclay	04d620d91f	fix(docker): run config migrations during container boot (salvage #35508 ) (#36627 ) Salvage of #35508 (@dchenk), rebased onto current main. Resolved the tests/tools/test_stage2_hook_puid_pgid.py conflict (kept both the envdir-creation regression test on main and the new config-migration tests). Docker image upgrades replace code under $INSTALL_DIR but preserve $HERMES_HOME on the mounted volume, so the persisted config.yaml never received the schema migrations that non-Docker `hermes update` runs (#35406). This adds scripts/docker_config_migrate.py, invoked from stage2-hook after first-boot seeding and before gateway services start: it backs up config.yaml + .env, runs migrate_config(interactive=False), and honors HERMES_SKIP_CONFIG_MIGRATION=1 for manual control. Also fixes a latent bug in check_config_version(): it called load_config() which deep-merges DEFAULT_CONFIG, so a legacy config with no raw _config_version falsely reported as already-current. It now reads the raw on-disk file so legacy configs are correctly detected for migration. Differs from #35508 as submitted (Option B cleanup): dropped the `_config_version` line added to cli-config.yaml.example and removed the accompanying test_cli_config_example_declares_latest_version change-detector test. The example is a copy-template and has no business asserting a schema version; check_config_version() reads the user's real config.yaml, not the example. This removes a second sync point that drifts on every version bump. Closes #35508. Fixes #35406. Co-authored-by: Dmitriy Cherchenko <17372886+dchenk@users.noreply.github.com>	2026-06-04 11:11:27 +10:00
Ben Barclay	343c54e35b	fix(docker): reject unsupported --user <arbitrary-uid> start with clear guidance (#38579 ) `docker run --user $(id -u):$(id -g)` was a tini-era trick to make container-written files match the host user. Under s6-overlay it no longer works: the bootstrap (UID remap, volume + build-tree chown, config seeding) needs root, and the baked image dirs (/opt/data, /opt/hermes/.venv, ui-tui, node_modules) are owned by the hermes build UID (10000). A pinned arbitrary UID can't write them, so the runtime fails with EACCES on a bind mount or hard-crashes on a named volume (Docker inits the volume from the image as 10000; the non-root start can't even `cd /opt/data`, and the profile reconciler dies with PermissionError on gateway_state.json). Detect that start early in both the cont-init hook (stage2-hook.sh) and the CMD wrapper (main-wrapper.sh) and fail fast with actionable guidance pointing at the supported path: root start + HERMES_UID/HERMES_GID (or the PUID/PGID aliases), which remaps the hermes user and chowns the volume — the same host-UID-matching outcome --user was used for, without breaking s6. The guard fires only when the current UID is neither root NOR the hermes UID. This preserves the supported non-root start from #34648/#34837 (running with `--user 10000:10000`, i.e. pinned to the hermes UID itself), which is unaffected — only the arbitrary-UID variant that #34837 never actually made writable is rejected. Verified live across five scenarios (built image, bind + named volume): arbitrary --user on bind -> rejected with guidance, hermes does not run; arbitrary --user on named volume -> guidance shown, no raw 'can't cd' crash; --user 10000:10000 -> boots; root + HERMES_UID=4242 remap -> boots, guard not tripped; default root start -> boots. Pre-fix control reproduces the raw PermissionError + 'can't cd' crash with no guidance.	2026-06-04 10:51:51 +10:00
Teknium	b0a52d74ac	fix(mcp): resolve ${ENV} in discovery probe so header auth works (#38571 ) `hermes mcp add --auth header` built `Authorization: Bearer ${MCP_X_API_KEY}` and passed it straight to the discovery probe without interpolation, so the probe sent the literal placeholder and auth-requiring servers (e.g. n8n) returned 401. Runtime tool loading worked because `_load_mcp_config()` interpolates, but the four CLI probe call sites (add/test/login/configure) all used unresolved config. Resolve `${ENV}` inside `_probe_single_server` via a new `_resolve_mcp_server_config()` (load_hermes_dotenv + _interpolate_env_vars), mirroring runtime loading. This covers all four call sites, not just add. Also strip a leading `Bearer ` from pasted tokens before saving to `MCP_*_API_KEY`, so a token pasted with the prefix doesn't produce `Bearer Bearer <jwt>` (also a 401). Reported with a precise root-cause analysis in #37792. Co-authored-by: ThyFriendlyFox <116314616+ThyFriendlyFox@users.noreply.github.com>	2026-06-03 17:49:39 -07:00
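The two fixes in b0a52d74ac (resolving `${ENV}` placeholders before probing, and stripping a pasted `Bearer ` prefix) might look roughly like the sketch below. The function names here are hypothetical stand-ins, not the project's `_resolve_mcp_server_config` or `_interpolate_env_vars`, and leaving unset variables as literal placeholders is an assumption made for illustration.

```python
import os
import re

_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate_env(value, env=None):
    """Recursively replace ${NAME} in strings, returning a new structure.

    Assumption: variables missing from `env` are left as the literal
    placeholder rather than being replaced with an empty string.
    """
    env = os.environ if env is None else env
    if isinstance(value, str):
        return _ENV_REF.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: interpolate_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate_env(v, env) for v in value]
    return value


def strip_bearer_prefix(token: str) -> str:
    """Drop a leading 'Bearer ' so the stored key is the bare token."""
    token = token.strip()
    if token[:7].lower() == "bearer ":
        return token[7:].lstrip()
    return token
```

With this shape, a config holding `Authorization: Bearer ${MCP_X_API_KEY}` is resolved just before the probe request is built, and a token pasted as `Bearer <jwt>` is stored without the prefix, so the header is not doubled.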
xxxigm	ca06715721	feat(web): wire local/custom endpoints into model assignment The runtime resolver reads model.base_url from config and ignores the OPENAI_BASE_URL env var, so a self-hosted endpoint could not be configured from the GUI. Two changes enable it: - POST /api/model/set accepts an optional base_url and persists it as model.base_url when provider=custom (still clearing stale base_url for hosted providers). - POST /api/providers/validate now returns the model ids a custom endpoint advertises at /v1/models, so the GUI can auto-pick a default without asking the user to type a model name. Refs desktop onboarding "Local / custom endpoint" bug.	2026-06-03 17:48:55 -07:00
Ben Barclay	5446153c98	fix(docker): chown build trees on UID remap independently of $HERMES_HOME (#35027 regression) (#38556 ) The stage2 hook gates the recursive chown of the build trees under $INSTALL_DIR (.venv, ui-tui, node_modules) so a HERMES_UID/PUID remap leaves them writable by the new runtime UID — needed for lazy_deps 'uv pip install' of platform extras (#15012, #21100) and the TUI esbuild rebuild into ui-tui/dist (#28851). #35027 folded that chown under the $HERMES_HOME ownership check ('stat $HERMES_HOME != hermes_uid'). But 'usermod -u <new> hermes' re-chowns the hermes home dir ($HERMES_HOME == /opt/data) to the new UID as a side effect, so after any remap that stat is already satisfied and needs_chown is false — silently skipping the build-tree chown on the common PUID/NAS path. The venv stays owned by the build-time UID (10000), so lazy installs and TUI rebuilds fail with EACCES. Probe the build trees directly instead: chown only when /opt/hermes/.venv is not already owned by the runtime hermes UID. Independent of $HERMES_HOME ownership, idempotent across restarts. Verified live: built the image, booted with HERMES_UID/HERMES_GID on a fresh named volume, confirmed .venv/ui-tui/node_modules end up owned by the remapped UID and 'uv pip install' into the venv succeeds; confirmed the recursive chown fires once and is skipped on restart.	2026-06-04 10:17:55 +10:00
Ben	a6e47314f9	fix(dashboard): sanction plugin WS/upload auth via SDK helpers (gated mode) Dashboard plugins (kanban, hermes-achievements) read window.__HERMES_SESSION_TOKEN__ directly and hand-assembled WebSocket URLs with ?token=. That works in loopback/--insecure mode but is rejected on OAuth-gated deployments, where the session token is absent and _ws_auth_ok only accepts single-use ?ticket= auth. The result was 401s on plugin REST calls and 1008/403 on the kanban live-events WS whenever the dashboard ran behind OAuth (e.g. hosted Fly agents). Make the plugin SDK the single sanctioned auth surface: - web/src/lib/api.ts: add authedFetch() (raw Response for FormData uploads / blob downloads, token-or-cookie auth, no throw / no 401 redirect) and buildWsUrl() (assembles a ws(s):// URL with the correct auth param for the active mode — fresh single-use ticket in gated mode, token in loopback). - web/src/plugins/registry.ts: expose authedFetch, buildWsUrl, buildWsAuthParam, and sdkVersion on window.__HERMES_PLUGIN_SDK__; add SDK_CONTRACT_VERSION. - web/src/plugins/sdk.d.ts: hand-authored typed contract for the plugin SDK + registry globals (single source of truth for the Window declarations). - plugins/kanban + hermes-achievements dist bundles: stop reading the session token directly; route uploads/downloads through SDK.authedFetch and the live-events WS through SDK.buildWsUrl. - plugins/kanban plugin_api.py: _ws_upgrade_authorized() delegates the /events WS upgrade to the canonical web_server._ws_auth_ok gate, so it transparently accepts loopback token / gated ticket / internal credential and can never drift from core auth again. - tests: guard test asserting no plugin dist reads __HERMES_SESSION_TOKEN__ directly; kanban gated-ticket WS test. Verified live on a gated staging Fly agent: kanban /events upgrades 101 with a minted ticket (ticket_len=43, ws_auth_ok=True) where the old code got 403.	2026-06-03 16:59:36 -07:00
Nate George	e8c3ac2f5c	fix: strip extra_content from tool_calls for strict APIs (Fireworks, Mistral) Fireworks/Mistral reject HTTP 400 'Extra inputs are not permitted, field: messages[N].tool_calls[M].extra_content' on any session whose history contains prior Gemini tool calls. Gemini 3 thinking models attach extra_content (thought_signature) to tool_calls; it survived to the wire because the sanitize paths only stripped call_id/response_item_id. Strip extra_content from the outgoing wire copy in both sanitize paths (ChatCompletionsTransport.convert_messages + _sanitize_tool_calls_for_strict_api), but gate it on the target model: keep extra_content for Gemini-family targets (the thought_signature MUST be replayed or Gemini 400s), strip it for everyone else — including non-Gemini models that inherit a stale Gemini signature earlier in a mixed-provider session. Native Gemini is unaffected (GeminiNativeClient bypasses these paths). Original stored history is never mutated (only the per-call copy). Fixes #17986.	2026-06-03 16:42:52 -07:00
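The model-gated stripping described in e8c3ac2f5c can be illustrated with a short sketch. It assumes messages are plain dicts in the chat-completions shape; `is_gemini_target` and `sanitize_tool_calls` are hypothetical names, and the substring check is a stand-in for whatever model-family detection the project actually uses.

```python
import copy


def is_gemini_target(model: str) -> bool:
    # Assumption: a substring check stands in for real family detection.
    return "gemini" in model.lower()


def sanitize_tool_calls(messages, target_model):
    """Return a per-call wire copy of `messages`.

    extra_content is kept for Gemini-family targets and removed for all
    other targets. The stored history passed in is never mutated.
    """
    wire = copy.deepcopy(messages)
    if is_gemini_target(target_model):
        return wire
    for message in wire:
        for call in message.get("tool_calls") or []:
            call.pop("extra_content", None)
    return wire
```

Working on a deep copy is what allows a mixed-provider session to switch back to a Gemini model later and still find the original signature in its stored history.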

4879 Commits