hermes-agent

Author	SHA1	Message	Date
teknium	62f0cfd902	fix(kanban-dashboard): use context-local board pin in specify/decompose endpoints The dashboard specify and decompose endpoints run as sync FastAPI threadpool handlers and pinned the active board by mutating the process-global HERMES_KANBAN_BOARD env var. Two concurrent requests for different boards race on that shared global and cross-write — the same bug class as the CLI path (#38323), now using the scoped_current_board() contextvar introduced by the CLI fix.	2026-06-04 07:39:53 -07:00
worlldz	081694c111	fix(kanban): isolate board override per concurrent call	2026-06-04 07:39:53 -07:00
AhmetArif0	de370fd10f	fix(dashboard): prevent stale desc-save indicator when requests overlap handleSaveDesc and handleAutoDescribe both set their loading flag in a try block but always cleared it unconditionally in finally. When a user opened profile A's description editor, clicked Save, then quickly switched to profile B's editor and saved, profile A's resolving request would clear descSaving/describing while profile B's request was still in-flight, making the "Saving…" indicator disappear prematurely. Track concurrent in-flight counts with descSavingCount and describingCount refs (mirrors the existing activeDescRequest guard pattern). The loading flag is cleared only when the counter reaches zero, i.e. all overlapping requests have settled.	2026-06-04 07:23:22 -07:00
AhmetArif0	c2d11cc95d	fix(dashboard): surface model-write failure when creating a profile POST /api/profiles returns model_set: false when the model assignment step fails (e.g. filesystem error) while the profile itself was created successfully. handleCreate discarded the response, so the user received a "Profile created" success toast with no indication that their chosen model was not persisted. Capture the response and show an error toast when a model was selected but model_set is explicitly false, directing the user to set it from the profile editor.	2026-06-04 07:23:22 -07:00
AhmetArif0	6feb40e702	fix(desktop): wait for backend exit before reloading on connection-config apply The apply handler sent SIGTERM then fired a 150 ms setTimeout to reload the renderer. If the backend took longer to shut down the port was still bound when startHermes() ran after reload, causing an "address already in use" failure. Capture the process reference before resetHermesConnection() nulls it, then await the actual exit event. A 5 s SIGKILL fallback ensures the wait never hangs if the backend ignores SIGTERM.	2026-06-04 07:23:22 -07:00
Teknium	fef04a197e	fix(desktop): purge electron cache unconditionally, not via stdlib zipfile gate The salvaged detector validated each cached electron-*.zip with zipfile.testzip() and only purged ones it judged corrupt. But stdlib zipfile reads from the end-of-central-directory backward, so it silently tolerates prepended/concatenated junk — which is exactly the corruption the bug report names ('86257938 extra bytes at beginning or within zipfile', a partial download resumed into the same file). testzip() returns clean on those zips, so the self-heal never fired for the reported failure mode. Drop the self-rolled validator: on any packaged-build failure, purge the version's cached zips AND the half-written unpacked dir, then retry once. @electron/get re-downloads with its own SHASUM verification — the real source of truth, which catches prepend/concat/truncate alike. An unrelated failure just costs one clean re-download and fails the same way. Verified empirically: zipfile.testzip() returns None (clean) on a prepended-junk zip; the unconditional purge removes it correctly.	2026-06-04 07:17:33 -07:00
Harry Riddle	f583c6ebd5	fix(desktop): recover from corrupt cached Electron download on build hermes desktop failed on Linux with an ENOENT renaming release/linux-unpacked/electron -> Hermes. Root cause is a corrupt cached Electron zip (~/.cache/electron/electron-.zip): app-builder unpack-electron extracts a partial tree from the bad zip that is missing the electron binary, so electron-builder dies on the final rename. Re-running repeats the broken extraction, leaving the desktop app permanently unlaunchable until the cache is manually purged. - Add _electron_download_cache_dirs() + _purge_corrupt_electron_cache() to hermes_cli/main.py: validate every electron-.zip via zipfile.testzip() and delete corrupt ones; honor electron_config_cache / ELECTRON_CACHE overrides with per-OS defaults. - Wire purge + single retry into cmd_gui packaged-build failure path so a poisoned download self-heals (electron re-downloads clean). - Add beforePack hook (apps/desktop/scripts/before-pack.cjs) to wipe the target unpacked dir before staging, making packaging idempotent across interrupted runs. Cross-platform, best-effort. - Tests: corrupt-zip detector, cmd_gui purge/retry/launch path, no-retry-when-clean path, and node --test for the cleanup helper.	2026-06-04 07:17:33 -07:00
brooklyn!	e003c53b06	chore(desktop): zero eslint/typecheck debt + prettier pass (#39100) - eslint --fix across src/ and electron/ (unused imports, import/prop sort, padding) - flatten empty catch blocks in electron CJS; drop unused applyUpdatesPosixInApp arg - add setMutableRef helper for imperative ref writes (react-compiler clean) - move sidebar cookie persistence into an effect; extract scrollElementToBottom helper	2026-06-04 14:10:38 +00:00
Frowtek	3858cf4307	fix(cli): honor global-root active_provider fallback for named profiles	2026-06-04 07:08:30 -07:00
Frowtek	b7169f9bbb	fix(gateway): keep pending /update completion notifications until the target platform reconnects	2026-06-04 06:56:28 -07:00
ethernet	a6a0a5b1b0	fix(desktop): detect linux arm64 binary	2026-06-04 09:51:26 -04:00
Frowtek	fff0561441	fix(gateway): anchor Google Chat OAuth client secret to default Hermes root	2026-06-04 06:45:32 -07:00
Frowtek	07f5382675	fix(gateway): don't treat dm_policy: pairing as open access on own-policy adapters	2026-06-04 06:31:28 -07:00
annguyenNous	4cca7f569d	fix(tools): add raise_for_status for MiniMax t2a_v2 TTS path The MiniMax t2a_v2 code path calls response.json() without first checking the HTTP status code. If the API returns HTTP 4xx/5xx with non-JSON content (e.g. HTML error page), response.json() raises an opaque JSONDecodeError instead of a clear HTTPError. The non-t2a_v2 path already has response.raise_for_status() at line 1299. Add the same check before response.json() in the t2a_v2 path for consistent error handling.	2026-06-04 06:17:11 -07:00
teknium1	dd4ba4c2c4	fix(vision): cap pixel dimensions proactively at embed time + declare Pillow Follow-up to the salvaged #37727. That PR fixed the reactive recovery path (classifier + post-failure shrinker) but left the PROACTIVE embed-time guard in vision_tools byte-only — a tall small-byte screenshot (e.g. 1200x12000 at 0.06 MB) still baked into immutable history un-resized, relying on a failed round-trip to trigger reactive shrink. - vision_tools: add _image_exceeds_dimension() + _EMBED_MAX_DIMENSION (7900px); the embed-time cap now fires on bytes OR pixels and passes max_dimension to the resizer, so tall small-byte images are shrunk before they're embedded. - vision_tools: best-effort lazy-install of Pillow (tool.vision) in the resize ImportError fallback so the soft dep self-heals (respects allow_lazy_installs). - error_classifier: add two more Anthropic dimension-cap wording variants. - pyproject + lazy_deps: declare Pillow as the [vision] extra / tool.vision lazy dep (it was undeclared everywhere; without it ALL resize recovery no-ops). - tests: cover _image_exceeds_dimension (tall/small/edge/no-Pillow/corrupt). Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>	2026-06-04 06:16:45 -07:00
kyssta-exe	6bdbe30763	fix(vision): guard image pixel dimensions, not just bytes (#37677) Anthropic enforces two independent ceilings per image: 1. 5 MB encoded byte size 2. 8000 px longest side Hermes only guarded #1. A tall screenshot (e.g. 1200x12000 at 0.06 MB) passes every byte check but fails the pixel check, returning a non-retryable HTTP 400 that permanently bricks the conversation thread. Fixes: - error_classifier: add 'image dimensions exceed' pattern to _IMAGE_TOO_LARGE_PATTERNS so the 400 is classified as image_too_large and triggers the shrink/retry path instead of falling through to non-retryable error. - conversation_compression: check pixel dimensions (via Pillow) even when byte size is under the 4 MB target. If max(dims) > 8000, force shrink. - vision_tools._resize_image_for_vision: add optional max_dimension param. When set, images exceeding the pixel cap are downscaled even if they're under the byte budget. The resize loop now checks both byte AND pixel limits before accepting a candidate. Closes #37677	2026-06-04 06:16:45 -07:00
annguyenNous	f7dabd3019	fix(api-server): guard json.loads against corrupted SQLite data in response cache The ResponseStore.get() method calls json.loads(row[0]) without any error handling. If the SQLite responses table contains corrupted JSON data (e.g. from a crash mid-write or disk corruption), this raises an unhandled JSONDecodeError that propagates to the caller. Fix: wrap in try/except (json.JSONDecodeError, TypeError). On parse failure, log a warning, evict the corrupted entry from the cache, and return None (consistent with the function's Optional return type).	2026-06-04 06:15:29 -07:00
teknium1	7314757876	refactor(feishu): slim meeting-invite parser; add AUTHOR_MAP entry Collapse the payload-shape normalization helpers into one _as_dict and drop unused dataclass fields (user_type/user_role, duplicate id, bot) on the meeting-invite handler. Module 274->212 LOC, behavior unchanged. Add zhaolei.vc@bytedance.com -> zhaoleibd to release.py AUTHOR_MAP.	2026-06-04 06:15:23 -07:00
zhaolei.vc	f3bbfda6d1	feat(gateway): handle Feishu meeting invitations Change-Id: I8cf5638393dd9adb1d7be5e170ce5082b41f77fa	2026-06-04 06:15:23 -07:00
kyssta-exe	86c64cfb5b	fix(gateway): visually expire Discord interactive views on timeout All Discord interactive views (ExecApprovalView, SlashConfirmView, UpdatePromptView, ModelPickerView, ClarifyChoiceView) now edit their message when the view times out, disabling buttons and updating the embed to show a 'Prompt expired' footer. Previously, timed-out buttons remained visually clickable in the UI, causing Discord's generic 'Interaction failed' error when clicked. Fixes #38022	2026-06-04 06:14:54 -07:00
Teknium	38d3c49aaf	refactor(skills): clean up bundled skill set + add environments: relevance gate (#39028) * refactor(skills): clean up bundled skill set + add environments: relevance gate Bundled skills cleanup pass plus a new offer-time relevance gate. Removals (redundant / dead): - spotify (covered by the spotify plugin's 7 native tools) - linear (covered by `hermes mcp install linear`) - kanban-codex-lane, debugging-hermes-tui-commands - empty category markers: diagramming, gifs, inference-sh, mlops/training, mlops/vector-databases - domain (stale orphan dup of optional/research/domain-intel) Bundled -> optional: - baoyu-article-illustrator, baoyu-comic, creative-ideation, pixel-art - dspy, subagent-driven-development - minecraft-modpack-server, pokemon-player - hermes-s6-container-supervision (-> optional/devops) Consolidation: - webhook-subscriptions + native-mcp folded into the hermes-agent skill as references/webhooks.md + references/native-mcp.md with SKILL.md pointers - writing-plans merged into plan (v2.0.0); related_skills + prose refs updated New: environments: frontmatter gate (agent/skill_utils.skill_matches_environment) - Offer-time relevance filter (kanban / docker / s6), parallel to platforms:. - Wired into the 3 OFFER surfaces only (prompt_builder skills index, skills_tool.list_skills, skill_commands slash discovery). - Explicit loads (skill_view, --skills preload) intentionally BYPASS it, so load-bearing force-loads like the kanban dispatcher's `--skills kanban-worker` always resolve. Verified via E2E. - kanban-orchestrator/kanban-worker tagged environments: [kanban]; hermes-s6-container-supervision tagged environments: [s6] + platforms: [linux]. Validation: 8/8 E2E gating assertions (incl force-load invariant); 442 targeted tests green (agent, skills_tool, skill_commands, kanban worker). * docs: regenerate skill catalogs + pages for the bundled cleanup Regenerated per-skill doc pages, catalogs, and sidebar to match the skill moves/removals in the parent commit. Moved skills' pages relocate bundled -> optional (history preserved); removed skills' pages deleted; edited skills' pages refreshed (hermes-agent now embeds the webhook + native-mcp reference pointers). zh-Hans i18n mirror: stale bundled pages and catalog rows for moved/removed skills pruned (new optional translations land via the translation pipeline). * test: drop regression test for removed kanban-codex-lane skill The kanban-codex-lane skill was removed in the bundled-skills cleanup; its dedicated regression test read the now-deleted SKILL.md and failed with FileNotFoundError on CI shard 6.	2026-06-04 06:11:22 -07:00
teknium1	c136eb4de1	fix(update): harden venv rebuild + verify core deps after install Two complementary fixes for a silent partial-install failure that bit ``hermes update`` in the wild: a fresh checkout pulled 145 commits, ``rebuild_venv`` failed to recreate the venv on Windows because ``shutil.rmtree(ignore_errors=True)`` couldn't delete files held open by the running ``hermes.exe`` shim. ``uv venv`` then refused with "A directory already exists at: venv" and the update fell back to installing on top of the stale venv. The resulting partial install missed exactly one newly-added base dep — ``pathspec==1.1.1`` — which ``hermes desktop --build-only`` imports at the top of its content-hash check. The desktop rebuild died with ModuleNotFoundError and the parent update only logged "⚠ Desktop build failed (non-fatal)". Same root cause made the "default: sync failed" line in the skill-sync stage, because that sync subprocess hit the same missing import. Fix 1: ``rebuild_venv`` retries with ``--clear`` ------------------------------------------------ If ``uv venv`` fails with "already exists" in stderr (which is what uv prints, and what uv's own hint tells you to fix with --clear), retry once with ``--clear``. Only this specific failure pattern triggers the retry — disk-full / interpreter-download failures still surface as before so we don't mask real problems. Fix 2: post-install dep verification ------------------------------------ Belt-and-suspenders so future uv resolver quirks (or any other cause of partial installs) surface immediately instead of hours later in a downstream subprocess. After ``_install_python_dependencies_with_optional_fallback`` runs, ``_verify_core_dependencies_installed``: 1. Reads ``[project.dependencies]`` straight from pyproject.toml (so we don't trust the venv's stale metadata). 2. 
Filters by environment markers via ``packaging.requirements.Requirement`` so cross-platform exclusions (``ptyprocess ; sys_platform != 'win32'``) don't false-positive on Windows. 3. Runs ``importlib.metadata.version()`` for each remaining dep inside the target venv interpreter (resolved from ``VIRTUAL_ENV``, not ``sys.executable``). 4. If anything is missing, reinstalls the base group with ``--reinstall`` to force re-resolution. If a second probe still reports missing deps, force-installs each one with its pinned spec. 5. Treats final failure as a warning rather than a hard error — a single broken-on-PyPI dep shouldn't block an otherwise-successful update — but the message points at ``hermes update --force`` and names the missing packages so the user knows what's wrong. Tests ----- - ``TestRebuildVenv::test_retries_with_clear_when_dir_already_exists`` — simulates the rmtree-couldn't-delete-it failure mode and asserts the ``--clear`` retry path is taken and succeeds. - ``TestRebuildVenv::test_does_not_retry_when_first_failure_is_not_dir_exists`` — guards against masking real failures (disk full, etc.). - ``test_verify_core_dependencies.py`` — 7 tests covering the happy path, the regression (missing pathspec triggers --reinstall), the per-package fallback when --reinstall doesn't help, the platform- marker filter so Windows doesn't try to install ptyprocess, the missing-pyproject noop, and the VIRTUAL_ENV resolver. Co-authored-by: Kyssta <218078013+kyssta-exe@users.noreply.github.com>	2026-06-04 06:05:41 -07:00
annguyenNous	28ca4460a1	fix(gateway): guard kanban dispatcher against malformed config and empty summaries Two error handling gaps in the gateway kanban dispatcher: 1. float() on dispatch_interval_seconds crashes with ValueError if the config value is a non-numeric string. Wrap in try/except and fall back to the default 60-second interval with a warning log. 2. splitlines()[0] on payload_summary and task.result raises IndexError when the string is whitespace-only (truthy but strip() produces empty string, splitlines() returns []). Guard with a check on the lines list before indexing.	2026-06-04 06:03:05 -07:00
brooklyn!	cbfe1d21d1	docs(guides): Run Nemotron 3 Ultra free in Hermes Agent (launch guide) (#38769) * docs(guides): add "Run Nemotron 3 Ultra free in Hermes Agent" launch guide Day-0 NVIDIA Nemotron 3 Ultra availability on Nous Portal (free June 4-18, in partnership with NVIDIA + Nebius). Quick Setup walkthrough for selecting the nvidia/nemotron-3-ultra:free tier, plus switching/troubleshooting notes. Registered at the top of Guides & Tutorials. * docs(guides): reword Nemotron lead-in to match launch copy Frame as Nemotron Coalition induction (working with NVIDIA) + Nebius partnership for the free tier, rather than a direct NVIDIA partnership, to avoid overstating the relationship. * docs(guides): lead Nemotron guide with desktop app, CLI second Add a one-click desktop-app install track (download → Nous Portal recommended sign-in → pick the Free-tier nemotron-3-ultra model) as the recommended path for non-terminal users, and keep the CLI curl flow as Option B. Update switching/troubleshooting to cover both surfaces.	2026-06-04 09:00:29 -04:00
AhmetArif0	cd68b8f0e8	fix(auth): set active_provider after hermes auth add qwen-oauth hermes auth add qwen-oauth called pool.add_entry() but never wrote to providers["qwen-oauth"] or set active_provider in auth.json. _model_section_has_credentials() checks get_active_provider() first; with active_provider unset and no api_key_env_vars configured for oauth_external providers, the setup wizard reported "No inference provider configured" even after a successful Qwen CLI OAuth login. Add _mark_qwen_oauth_active() in auth.py: writes a minimal provider state entry (base_url for display only) and calls _save_provider_state() to set active_provider. The function deliberately does not copy the api_key — that lives in the Qwen CLI credential file managed by _save_qwen_cli_tokens / resolve_qwen_runtime_credentials and must not be duplicated in auth.json where it would become stale. pool.add_entry() is retained so "hermes auth list" continues to show the entry. Runtime credential resolution continues to use resolve_qwen_runtime_credentials. Mirrors the fix applied to openai-codex (#37517) and xai-oauth (#37576).	2026-06-04 05:58:33 -07:00
Teknium	d12c233378	docs(wecom): stop implying live streaming and typing support (#38990 ) The WeCom adapter delivers each response as a single complete message via aibot_respond_msg / aibot_send_msg — it does not stream tokens incrementally (no edit_message override) and send_typing is a no-op. Reword the 'Reply-mode streaming' feature bullet to 'Reply correlation', retitle the section to 'Reply-Mode Responses', and add a note clarifying that neither token streaming nor typing indicators are supported.	2026-06-04 05:57:01 -07:00
Frowtek	71a9f44e80	fix(gateway): retry startup auto-resume when a failed platform reconnects	2026-06-04 05:56:45 -07:00
Fearvox	fa8e2f935b	polish(minimax): address Copilot review comments on M3 default-aux fix Three Copilot inline review comments on #37664, two worth landing in a polish pass before merge: 1. auxiliary_client.py:270 — Copilot suggested keeping the minimax-* entries in _API_KEY_PROVIDER_AUX_MODELS_FALLBACK as a safety net for environments where the profile-based resolution can't import or run plugin discovery. Declined. The deepseek precedent (commit `773a0faca`) explicitly removed deepseek from the same dict for the same reason — the profile layer is the source of truth and the dict is a legacy pre-profiles-system fallback. We do not want to fragment the codebase by provider: either the profile layer is authoritative or the dict is. The minimax PR picks profile (matching deepseek) and the dict stays cleaned up. The risk Copilot raises is real but theoretical — plugin discovery runs at import time of the providers module, which is the first thing any modern Hermes entrypoint imports. 2. tests/agent/test_minimax_provider.py:162 — Copilot flagged that the test class relies on _get_aux_model_for_provider() resolving via provider profiles but doesn't explicitly trigger plugin discovery. Fixed. Added 'import model_tools # noqa: F401' at the top of both test_minimax_aux_is_standard and test_minimax_aux_not_highspeed. The fixtures in the parallel test_minimax_profile.py already did this; the legacy test in test_minimax_provider.py was order-dependent and would silently break if anyone reorganised the test ordering. Pinned the dependency explicitly so the test is order-independent. 3. tests/plugins/model_providers/test_minimax_profile.py:46 — Copilot flagged that the docstring referenced a hard-coded line number 'hermes_cli/models.py:298' that would go stale. Fixed. Replaced with the symbol reference 'hermes_cli.models._PROVIDER_MODELS[\'minimax\']' which is stable under file edits and grep-friendly. 
The new docstring also reads more naturally — readers don't have to look up 'what's at line 298' to follow the reasoning. All 221 minimax-related tests still pass.	2026-06-04 05:53:35 -07:00
Fearvox	b531b5d12a	fix(minimax): update AUTHOR_MAP entry + test_minimax_oauth_aux_model_registered Two follow-ups to the M3 default-aux-model PR (#37664): 1. AUTHOR_MAP entry: add fearvox1015@gmail.com -> Fearvox so the check-attribution CI job recognises Nolan's real contributor email. The previous run of the attribution check on #37664 failed because the commit was authored as nolan@0xvox.com (wrong local git config) which isn't in AUTHOR_MAP. The commit itself is now re-authored to fearvox1015@gmail.com so both the per-commit check and the AUTHOR_MAP lookup pass. 2. tests/hermes_cli/test_api_key_providers.py::TestMinimaxOAuthProvider ::test_minimax_oauth_aux_model_registered was pinning the aux model in the legacy _API_KEY_PROVIDER_AUX_MODELS dict, which the PR correctly removed (mirrors the deepseek cleanup in `773a0faca`). The test now asserts the new world order: the aux model comes from ProviderProfile.default_aux_model on the minimax-oauth profile, not the fallback dict. This is the same pattern that the profile-layer deepseek fix introduced.	2026-06-04 05:53:35 -07:00
Fearvox	3d1d0a49fe	fix(minimax): align default_aux_model with M3 frontier on minimax + minimax-cn The minimax / minimax-cn / minimax-oauth profiles still advertised M2.7 (and M2.7-highspeed for OAuth) as their default_aux_model, predating the M3 release (2026-06-01). The user-facing _PROVIDER_MODELS['minimax'] catalog top entry is M3, and the recommended config for a Token-Plan install now sets model.default: MiniMax-M3, so the aux default was the only remaining drift. Updates: * minimax default_aux_model: M2.7 -> M3 * minimax-cn default_aux_model: M2.7 -> M3 * minimax-oauth default_aux_model: M2.7-highspeed -> M2.7 (M3 is not on the OAuth / Coding Plan tier per platform docs as of this PR; the highspeed variant was the 2x-cost regression from #4082 that PR #6082 collapsed to plain M2.7 for minimax / minimax-cn but missed OAuth) * agent/auxiliary_client.py: drop the three legacy _API_KEY_PROVIDER_AUX_MODELS_FALLBACK entries for the minimax family. _get_aux_model_for_provider() reads from ProviderProfile.default_aux_model first (line 250) and only falls back to the dict when the profile has no aux model or the profile import fails. With the profile now set, the dict entries are dead code and a drift hazard. Mirrors the deepseek cleanup in `773a0faca`. * tests/agent/test_minimax_provider.py: update the existing TestMinimaxAuxModel assertions from MiniMax-M2.7 to MiniMax-M3 (the intent — 'standard, not highspeed' — is unchanged; the pin value is). * tests/plugins/model_providers/test_minimax_profile.py: new file mirroring tests/plugins/model_providers/test_deepseek_profile.py. Pins each of the three profiles' default_aux_model and asserts _get_aux_model_for_provider() returns it. A second class guards against the highspeed regression coming back. 
Refs: - Closes #36196 in spirit (M3 support — the catalog half of that issue is #36212; this PR covers the profile half) - Related: #4082 (M2.7-highspeed 2x-cost), #6082 (previous M2.7-highspeed -> M2.7 fix that missed OAuth + the auxiliary_client.py fallback dict) - Pattern: `773a0faca` (same profile-layer fix for deepseek)	2026-06-04 05:53:35 -07:00
AhmetArif0	5f62ba8e4b	fix(auth): use _save_xai_oauth_tokens in auth_commands to set active_provider hermes auth add xai-oauth called pool.add_entry() directly, writing only the credential-pool entry (source "manual:xai_pkce") without touching providers["xai-oauth"] or setting active_provider in auth.json. _model_section_has_credentials() checks get_active_provider() first; with active_provider unset and no api_key_env_vars configured for oauth_external providers, the setup wizard reported "No inference provider configured" even after a successful OAuth login. Use _save_xai_oauth_tokens() — the canonical path already called from the hermes model xAI login flow — which writes providers["xai-oauth"]["tokens"] (setting active_provider) and lets _seed_from_singletons seed the pool with a "loopback_pkce" entry on the next load_pool() call. Mirrors the fix applied to openai-codex in #37517.	2026-06-04 05:48:50 -07:00
teknium1	643181b346	chore: add scubamount to AUTHOR_MAP for salvaged PR #37616	2026-06-04 05:46:13 -07:00
scubamount	b6206020d3	fix(desktop): remove session search aux model	2026-06-04 05:46:13 -07:00
AhmetArif0	34a2903527	fix(auth): set active_provider after hermes auth add google-gemini-cli hermes auth add google-gemini-cli called pool.add_entry() but never wrote to providers["google-gemini-cli"] or set active_provider in auth.json. _model_section_has_credentials() checks get_active_provider() first; with active_provider unset and no api_key_env_vars configured for oauth_external providers, the setup wizard reported "No inference provider configured" even after a successful OAuth login. Add _mark_google_gemini_cli_active() in auth.py: writes a minimal provider state entry (email for display only) and calls _save_provider_state() to set active_provider. The function deliberately does not copy access_token or refresh_token — those are managed by agent.google_oauth in the Google credential file and must not be duplicated in auth.json where they would become stale. pool.add_entry() is retained so "hermes auth list" continues to show the entry. Runtime credential resolution continues to use agent.google_oauth directly. Mirrors the fix applied to openai-codex (#37517) and xai-oauth (#37576).	2026-06-04 05:44:22 -07:00
Teknium	9fbfeb31b9	fix(cron): make sequential jobs non-blocking too + sweep MCP after jobs finish Follow-up on the parallel-dispatch decoupling: the sequential pass for workdir/profile jobs still ran inline in the ticker thread, so a long workdir/profile job reintroduced the exact starvation #37312 describes, just for env-mutating jobs. And the MCP orphan sweep ran immediately after dispatch in sync=False mode — before jobs finished — defeating its own 'runs after every job' contract and racing jobs still spawning MCP children. - Sequential jobs now queue to a persistent single-thread cron-seq pool (preserves one-at-a-time ordering across ticks, never blocks the tick). - Same in-flight dedup guard now covers sequential jobs. - MCP orphan sweep runs via a done-callback after the LAST dispatched job completes in async mode; inline after as_completed in sync mode. Verified E2E: tick(sync=False) returns in ~1ms with a 1.5s sequential job in flight; sweep fires only after that job ends.	2026-06-04 05:40:13 -07:00
Vynxe Vainglory	eb9cde7346	fix(cron): decouple job dispatch from completion in tick() PR #13021 fixed serial starvation by adding ThreadPoolExecutor to tick(), but kept as_completed(timeout=600) which still blocks the ticker thread until the slowest job finishes. This causes the same starvation pattern: when one job runs long (15+ min), other jobs' next_run_at expires past the grace window and they get perpetually fast-forwarded instead of running. This PR decouples dispatch from completion: - Persistent ThreadPoolExecutor (reused across ticks, no auto-join) - Fire-and-forget dispatch: tick submits and returns immediately - Running-job guard: prevents re-dispatching active jobs - sync parameter: defaults to True (backward compatible), callers opt into sync=False for non-blocking behavior - atexit shutdown handler for clean pool teardown - gateway/run.py: production ticker opts into sync=False Refs #33315 (complementary — that issue's PRs fix grace handling in jobs.py; this PR prevents the grace from expiring in the first place)	2026-06-04 05:40:13 -07:00
teknium1	c14e6b4edf	chore(release): map ashishpatel26 author email for salvage	2026-06-04 05:38:12 -07:00
ashishpatel26	c9b62061d4	fix(cli): launchd KeepAlive unconditional restart (#37388) Replace KeepAlive.SuccessfulExit=false dict with <key>KeepAlive</key><true/> so launchd restarts hermes-gateway on any exit, matching the documented drain-then-exit restart protocol used by --graceful-restart.	2026-06-04 05:38:12 -07:00
teknium	153fe28474	fix(vision): use MiniMax type="video" block (not input_video) + tests The salvaged conversion emitted type:"input_video", which MiniMax M3 rejects just like the original video_url block. Per MiniMax's Anthropic-compat docs, the video content block is type:"video" with an image-style source (base64 or url). Fixes the block type, converts URL-based videos too, and adds 4 video conversion tests (none shipped with the original PR).	2026-06-04 05:38:11 -07:00
kyssta-exe	0b46c4163a	fix(vision): convert video_url blocks to Anthropic input_video format for MiniMax providers The video_analyze tool sends OpenAI-style 'video_url' content blocks, which breaks Anthropic-protocol providers (minimax, minimax-cn). These providers expect 'input_video' blocks with base64 data instead of data: URLs. Extends _convert_openai_images_to_anthropic() to also handle video_url blocks, converting them to Anthropic's input_video format when targeting Anthropic-compatible endpoints. Fixes #37219	2026-06-04 05:38:11 -07:00
AhmetArif0	9756dff5fd	fix(model_metadata): drop stale ≤256,000 cache entries for Grok-4.3 The ``grok-4.3`` (1M context) catalog entry was added on 2026-05-15 (`ce0e189d3`). Between 2026-04-10 (when ``grok-4`` at 256,000 was first added by `b57769718`) and 2026-05-15, grok-4.3 slugs resolved via the generic ``grok-4`` substring catch-all and that 256,000 value was persisted to context_length_cache.yaml. Users who first queried grok-4.3 in that 35-day window are stuck at 256K forever — the cache is read at step 1 before the hardcoded defaults in step 8, so the correct 1M entry is never reached. Mirror the existing Kimi/Codex/MiniMax-M3 stale-cache guards: add _model_name_suggests_grok_4_3() and an elif branch that drops any cached value ≤ 256,000 for a grok-4.3 slug so the next lookup falls through to the 1M hardcoded default. Adds 4 regression tests: helper unit test, stale-drop-and-re-resolve, correct-cache-preserved, and no-clobber for plain grok-4 (256K correct).	2026-06-04 05:36:34 -07:00
Teknium	b04c6e95f6	fix(approval): catch perl/ruby -i as a separate flag token The salvaged pattern matched -i only inside the first flag token, so `perl -p -i -e '...' config.yaml` (the -i split out after -p) slipped through. Widen to match a -...i flag token anywhere in the args; still no false positive on `perl -e` code eval or config reads. Adds tests for the separate-token, backup-suffix, and read-safe forms.	2026-06-04 05:36:30 -07:00
AhmetArif0	a6a4e6f9d7	fix(approval): gate perl/ruby -i in-place edits of Hermes config/env sed -i coverage for ~/.hermes/config.yaml and .env was added in #14639, but perl -i and ruby -i — which perform the same direct file mutation — were not covered. The existing perl/ruby pattern only catches -e/-c (code evaluation), not -i (file mutation), so: perl -i -pe 's/approvals.mode: on/approvals.mode: off/' ~/.hermes/config.yaml bypasses the approval gate entirely, letting the agent flip approvals.mode off mid-session via the mtime-keyed config cache reload. Add a single pattern mirroring the sed -i lines: `\b(?:perl|ruby)\s+-[^\s]*i` against both _HERMES_CONFIG_PATH and _HERMES_ENV_PATH. Three regression tests pin the new coverage.	2026-06-04 05:36:30 -07:00
teknium1	5f199e610b	chore(release): add AUTHOR_MAP entry for solaitken	2026-06-04 05:35:43 -07:00
Sol Aitken	de60bf40c6	fix(memory): register parent packages for user-installed provider imports User-installed memory providers load under the synthetic _hermes_user_memory.<name> package, but the loader never registered that parent namespace in sys.modules (it only registers "plugins" and "plugins.memory" for bundled providers). As a result any external provider using a relative import failed to load: from . import config ModuleNotFoundError: No module named '_hermes_user_memory' The same gap in discover_plugin_cli_commands() meant an external provider's cli.py with a relative import could never be discovered, so the documented "hermes <plugin>" CLI integration did not work for standalone plugins. Register the synthetic parent namespace before loading user-installed providers, mirror it for cli.py discovery (including the per-provider parent package, without executing the plugin's __init__.py), and make _load_provider_from_dir() reuse only modules actually loaded from disk so a parent shell registered by CLI discovery is never mistaken for the loaded provider. Regressions cover: a flat provider with a sibling relative import, a provider with its implementation in a nested subpackage (including a namespace intermediate directory), cli.py discovery with a relative import, and provider load after CLI discovery ran first. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 05:35:43 -07:00
AhmetArif0	4ae3c988b5	fix(gateway): bridge shared-key loop to nested platform config blocks The shared-key bridging loop (allow_from, require_mention, free_response_channels, …) read only the top-level yaml platform block (yaml_cfg.get(plat.value)). When a user configured a platform solely under ``platforms:`` or ``gateway.platforms:`` with no top-level block, the loop skipped that platform entirely and all bridged keys were silently dropped into PlatformConfig.extra — making allow_from, require_mention, etc. ineffective for nested-only configs. The apply_yaml_config_fn dispatch already received this same fallback in `44f3e51` to handle plugin adapters (e.g. Discord allow_from). The shared-key loop now mirrors it: if yaml_cfg.get(plat.value) is absent, fall back to gateway.platforms.<name> then platforms.<name>. The enabled field is deliberately excluded from the nested fallback (guarded by _cfg_toplevel): _merge_platform_map already merged it with the correct precedence, so re-applying it from a single nested source would overwrite the correctly-merged value. Two new regression tests assert that allow_from and require_mention configured under platforms.telegram and gateway.platforms.telegram are bridged into PlatformConfig.extra. All 54 existing config tests pass.	2026-06-04 05:31:47 -07:00
Teknium	d3fab54933	fix(cli): clear screen on exit so live chrome isn't stranded in scrollback (#38928) The classic CLI left its live bottom chrome — the status bar, input box, and separator rules — frozen in terminal scrollback after exit, on every exit path (/exit, /quit, Ctrl+C, EOF) and on both Linux and Windows. The prior erase_when_done=True fix (`bf82a7f1c`) routes prompt_toolkit's teardown through renderer.erase(), but that walks back by the renderer's internal cursor model and does not reliably wipe the chrome in practice — users still saw a dead status bar + the rest of the session sitting above the resume summary. Clear the screen + scrollback directly at the single exit funnel instead. All exit paths converge on _print_exit_summary() (called from the run-loop finally block after app.run() returns and prompt_toolkit has restored terminal modes), so a new _clear_terminal_on_exit() helper runs there before the summary prints. It writes ESC[3J ESC[2J ESC[H (erase scrollback, erase screen, home cursor) on a real TTY, no-ops silently when stdout is not a terminal (pipes/redirects), and falls back to the platform clear command if the escape write fails. Works on Linux, macOS, and modern Windows terminals (Terminal/conhost with VT processing, already enabled by prompt_toolkit). The resume/goodbye summary now prints at a clean top-left with nothing stranded above it. Fixes #38252.	2026-06-04 04:38:35 -07:00
Teknium	c0435f4fef	docs: remote desktop connect uses username/password, not --insecure + session token (#38926) The documented path for connecting Hermes Desktop to a remote backend was `--insecure` + a pinned HERMES_DASHBOARD_SESSION_TOKEN — an unauthenticated bind plus a copy-pasted token. Replace it everywhere with the bundled username/password dashboard-auth provider: set HERMES_DASHBOARD_BASIC_AUTH_, run `hermes dashboard --host 0.0.0.0` (the non-loopback bind engages the auth gate), and Sign in from the app. - desktop.md: rewrite 'Connecting to a remote backend' for the user/pass + Sign in flow - web-dashboard.md: rewrite both remote-backend sections (overview + dedicated); reframe the auth-gate section so --insecure is a discouraged escape hatch, not a co-equal use case; drop the removed --tui flag from the systemd example - environment-variables.md: lead with HERMES_DASHBOARD_BASIC_AUTH_; drop the session-token / HERMES_DESKTOP_REMOTE_TOKEN remote-connect entries - docker.md: mention the username/password provider as the simplest gate provider	2026-06-04 21:23:59 +10:00
Teknium	df9fb8e5e6	fix(tools): stop hermes tools reporting kanban as removed (#38918) The hermes tools save summary printed '- kanban' (and would print '+ kanban') for a platform even though kanban is never offered as a checklist option. kanban is a check_fn-gated toolset whose tools are a subset of the platform composite, so _get_platform_tools resolves it as enabled, but _prompt_toolset_checklist only renders CONFIGURABLE_TOOLSETS — so it can never survive into the returned selection. The added/removed diff (current_enabled - new_enabled) then surfaced kanban as removed. Scope the printed diff to the checklist's actual universe via the new _checklist_toolset_keys() helper at all three diff sites (first-install, all-platforms, per-platform). The persisted config is unaffected — _save_platform_tools already preserves non-configurable entries; this was purely a false-signal in the UI.	2026-06-04 03:23:45 -07:00
Ben	616c0a36b6	fix(dashboard-auth): don't abort verify chain on one provider's ProviderError The gated dashboard verifies a session cookie by trying each registered DashboardAuthProvider's verify_session in turn (the session cookie stores only the access token, not which provider issued it). A provider that doesn't recognise a token returns None; a provider whose IDP/JWKS is unreachable raises ProviderError. The loop used to return HTTP 503 on the FIRST ProviderError, before any later provider got a turn. With multiple providers stacked, that means an unreachable IDP for a session you didn't even use blocks login through a different, reachable provider. Concrete repro: a self-hosted-OIDC session hits the 'nous' provider first (registered earlier); nous tries to reach Nous Portal's JWKS, which is unreachable in a self-hosted deployment, so it raises — and the gate 503s before the 'self-hosted' provider can verify the token. Hit live while testing the new self-hosted OIDC plugin against a local Keycloak. Fix: a ProviderError from one provider is logged and the loop continues to the next. A 503 is returned only if NO provider verified the token AND at least one was unreachable — distinguishing a transient IDP outage (don't force a needless re-login) from a token that's genuinely invalid (fall through to refresh/relogin). Single-provider behaviour is unchanged. Tests: adds an _UnreachableProvider stub and three cases — unreachable provider first must not block a working second; all-unreachable still 503s; reachable-but-unrecognised falls through to 401/relogin (not 503). Mutation-tested: reverting the fix makes the first case fail with the exact 503 bug.	2026-06-04 03:23:45 -07:00
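
The verify-chain behaviour described in `616c0a36b6` can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `verify_chain`, the `(status, session)` return shape, and the logging call are assumptions made for the example; only `ProviderError`, `verify_session`, and the 503/401 semantics come from the commit message.

```python
import logging
from typing import Any, Optional, Sequence, Tuple

logger = logging.getLogger(__name__)


class ProviderError(Exception):
    """Raised when a provider cannot reach its IDP/JWKS (name from the commit message)."""


def verify_chain(providers: Sequence[Any], token: str) -> Tuple[int, Optional[Any]]:
    """Try each provider in turn; hypothetical helper returning (http_status, session).

    Each provider is assumed to expose verify_session(token), returning a
    session object, or None when it does not recognise the token.
    """
    any_unreachable = False
    for provider in providers:
        try:
            session = provider.verify_session(token)
        except ProviderError as exc:
            # One unreachable provider must not block the ones after it.
            logger.warning("provider %r unreachable: %s",
                           getattr(provider, "name", provider), exc)
            any_unreachable = True
            continue
        if session is not None:
            return 200, session
    if any_unreachable:
        # Nobody verified the token and at least one IDP was down:
        # treat as a transient outage rather than forcing a re-login.
        return 503, None
    # Every provider was reachable and none recognised the token.
    return 401, None
```

With a single provider the result is the same as stopping on the first error, which matches the commit's note that single-provider behaviour is unchanged.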

10571 Commits