fix(codex): surface actionable hint when stale-call detector fires on known silent-reject pattern

The ChatGPT Codex backend (chatgpt.com/backend-api/codex) has historically silently dropped certain model requests: the connection is accepted but no stream events are emitted and no error is raised. PR #31967 lowered the implicit stale-call default from 300s to 90s so fallbacks kick in faster, but users still see an opaque "No response from provider for 90s (non-streaming, ...)" message that gives no path forward. This patch adds a narrow heuristic — gpt-5.5 family on the Codex backend via codex_responses api_mode — that substitutes the generic timeout message with actionable text naming the gpt-5.4-codex workaround and pointing at #21444 for symptom history. Changes: - run_agent.py — new ``AIAgent._codex_silent_hang_hint(model=...)`` method. Returns ``None`` for any request that does not match all three guards (codex_responses api_mode, openai-codex provider or chatgpt.com Codex base URL, gpt-5.5-family model name with word-boundary regex anchoring to avoid false-positives on e.g. ``gpt-5.50``). - agent/chat_completion_helpers.py — the non-stream stale-call site consults the hint via ``getattr(...)`` so the call site stays robust if the helper is ever removed or stubbed in tests. Hint is appended to both the ``_emit_status`` warning and the ``TimeoutError`` message so the user sees it in their terminal AND it lands in any retry-loop diagnostics. - tests/run_agent/test_codex_silent_hang_hint.py — 10 regression tests covering positive cases (bare gpt-5.5, vendor-prefixed openai/gpt-5.5, gpt-5.5-codex SKU, model=None fallback to self.model) and negative cases (gpt-5.4-codex workaround, gpt-5.50 false-positive guard, non-codex api_mode, non-codex provider, empty/None model, unrelated models on Codex). Does NOT fix the backend-side issue (that's an upstream OpenAI/ChatGPT problem we cannot patch from here). Only converts an opaque timeout into text that names the workaround so users do not have to dig through logs or wait for a forum post to learn what to do. Closes #22046
2026-05-25 03:54:07 -07:00
parent 4c64638897
commit b1adb95038
4 changed files with 203 additions and 9 deletions
--- a/run_agent.py
+++ b/run_agent.py
@ -927,6 +927,57 @@ class AIAgent:
            return max(stale_base, 150.0)
        return stale_base

+    def _codex_silent_hang_hint(self, model: Optional[str] = None) -> Optional[str]:
+        """Return an actionable hint when this request matches a known
+        Codex silent-reject configuration, else ``None``.
+
+        The ChatGPT Codex backend (``chatgpt.com/backend-api/codex``) has
+        historically silently dropped certain model requests: the connection
+        is accepted but no stream events are emitted and no error is raised.
+        The stale-call detector ends the hang, but a generic "timed out"
+        message gives the user no path forward.
+
+        This helper substitutes an actionable hint into the stale-timeout
+        warning when the request matches a known silent-reject pattern.
+        Currently flagged: ``gpt-5.5`` family on the Codex backend.  See
+        hermes-agent #21444 for the symptom history.  The upstream backend
+        behavior has historically come and gone with ChatGPT entitlement
+        changes — the heuristic stays in place as future-proofing even when
+        the symptom is dormant.
+
+        Does NOT fix the backend issue.  Only converts an opaque stale-timeout
+        into actionable text so users learn the workaround in seconds rather
+        than digging through logs.
+        """
+        if self.api_mode != "codex_responses":
+            return None
+        is_codex_backend = (
+            self.provider == "openai-codex"
+            or (
+                getattr(self, "_base_url_hostname", "") == "chatgpt.com"
+                and "/backend-api/codex" in (getattr(self, "_base_url_lower", "") or "")
+            )
+        )
+        if not is_codex_backend:
+            return None
+        eff_model = (model if model is not None else self.model) or ""
+        model_lower = eff_model.lower()
+        # Match the gpt-5.5 family — bare ``gpt-5.5``, ``gpt-5.5-codex``,
+        # vendor-prefixed variants like ``openai/gpt-5.5``, and any future
+        # ``gpt-5.5-*`` SKU.  Anchor at a word boundary on either side so
+        # unrelated tokens like ``gpt-5.50`` do not match.
+        if not re.search(r"(?:^|[/\-_])gpt-5\.5(?:$|[\-_])", model_lower):
+            return None
+        return (
+            f"Codex backend appears to be silently rejecting {eff_model!r} "
+            "on chatgpt.com/backend-api/codex (no stream events, no error). "
+            "This is a known backend-side pattern that has affected ChatGPT "
+            "Plus accounts intermittently. "
+            "Workaround: try `gpt-5.4-codex` on the same OAuth profile, "
+            "or switch to a different model/provider in your fallback chain. "
+            "See hermes-agent#21444 for symptom history."
+        )
+
    def _is_openrouter_url(self) -> bool:
        """Return True when the base URL targets OpenRouter."""
        return base_url_host_matches(self._base_url_lower, "openrouter.ai")