fix(codex): surface actionable hint when stale-call detector fires on known silent-reject pattern

The ChatGPT Codex backend (chatgpt.com/backend-api/codex) has historically
silently dropped certain model requests: the connection is accepted but no
stream events are emitted and no error is raised. PR #31967 lowered the
implicit stale-call default from 300s to 90s so fallbacks kick in faster,
but users still see an opaque "No response from provider for 90s
(non-streaming, ...)" message that gives no path forward.

This patch adds a narrow heuristic (gpt-5.5 family on the Codex backend
via codex_responses api_mode) that appends actionable text to the
generic timeout message, naming the gpt-5.4-codex workaround and
pointing at #21444 for symptom history.
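
As a rough illustration of the model-name part of that heuristic, the
sketch below reuses the regex from this commit's diff and runs it against
a few sample names. The ``is_gpt_55_family`` wrapper and the sample list
are hypothetical and exist only for this example; the real check lives
inside ``AIAgent._codex_silent_hang_hint``.

```python
import re

# Pattern copied from the diff in this commit. The wrapper function and
# the sample names below are assumptions made for illustration only.
GPT_55_FAMILY = re.compile(r"(?:^|[/\-_])gpt-5\.5(?:$|[\-_])")


def is_gpt_55_family(model: str) -> bool:
    """Hypothetical helper: True when the lowercased name matches the pattern."""
    return GPT_55_FAMILY.search((model or "").lower()) is not None


if __name__ == "__main__":
    samples = [
        "gpt-5.5",         # bare name
        "openai/gpt-5.5",  # vendor-prefixed
        "gpt-5.5-codex",   # SKU suffix
        "gpt-5.50",        # must not match: digit follows "5.5"
        "gpt-5.4-codex",   # the suggested workaround model
    ]
    for name in samples:
        print(f"{name!r}: {is_gpt_55_family(name)}")
```

The pattern only accepts ``/``, ``-``, ``_`` or the string edges around
``gpt-5.5``, which is why ``gpt-5.50`` is rejected while suffixed SKUs
such as ``gpt-5.5-codex`` are accepted.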

Changes:

- run_agent.py — new ``AIAgent._codex_silent_hang_hint(model=...)`` method.
  Returns ``None`` for any request that does not match all three guards
  (codex_responses api_mode, openai-codex provider or chatgpt.com Codex
  base URL, gpt-5.5-family model name with word-boundary regex anchoring
  to avoid false-positives on e.g. ``gpt-5.50``).
- agent/chat_completion_helpers.py — the non-stream stale-call site
  consults the hint via ``getattr(...)`` so the call site stays robust
  if the helper is ever removed or stubbed in tests. The hint is appended
  to both the ``_emit_status`` warning and the ``TimeoutError`` message, so
  the user sees it in their terminal and it also lands in any retry-loop
  diagnostics.
- tests/run_agent/test_codex_silent_hang_hint.py — 10 regression tests
  covering positive cases (bare gpt-5.5, vendor-prefixed openai/gpt-5.5,
  gpt-5.5-codex SKU, model=None fallback to self.model) and negative
  cases (gpt-5.4-codex workaround, gpt-5.50 false-positive guard,
  non-codex api_mode, non-codex provider, empty/None model, unrelated
  models on Codex).

Does NOT fix the backend-side issue (that's an upstream OpenAI/ChatGPT
problem we cannot patch from here). Only converts an opaque timeout into
text that names the workaround so users do not have to dig through logs
or wait for a forum post to learn what to do.

Closes #22046
Tranquil-Flow
2026-05-25 03:54:07 -07:00
committed by Teknium
parent 4c64638897
commit b1adb95038
4 changed files with 203 additions and 9 deletions

@@ -927,6 +927,57 @@ class AIAgent:
            return max(stale_base, 150.0)
        return stale_base

    def _codex_silent_hang_hint(self, model: Optional[str] = None) -> Optional[str]:
        """Return an actionable hint when this request matches a known
        Codex silent-reject configuration, else ``None``.

        The ChatGPT Codex backend (``chatgpt.com/backend-api/codex``) has
        historically silently dropped certain model requests: the connection
        is accepted but no stream events are emitted and no error is raised.
        The stale-call detector ends the hang, but a generic "timed out"
        message gives the user no path forward.

        This helper substitutes an actionable hint into the stale-timeout
        warning when the request matches a known silent-reject pattern.
        Currently flagged: ``gpt-5.5`` family on the Codex backend. See
        hermes-agent #21444 for the symptom history. The upstream backend
        behavior has historically come and gone with ChatGPT entitlement
        changes; the heuristic stays in place as future-proofing even when
        the symptom is dormant.

        Does NOT fix the backend issue. Only converts an opaque stale-timeout
        into actionable text so users learn the workaround in seconds rather
        than digging through logs.
        """
        if self.api_mode != "codex_responses":
            return None
        is_codex_backend = (
            self.provider == "openai-codex"
            or (
                getattr(self, "_base_url_hostname", "") == "chatgpt.com"
                and "/backend-api/codex" in (getattr(self, "_base_url_lower", "") or "")
            )
        )
        if not is_codex_backend:
            return None
        eff_model = (model if model is not None else self.model) or ""
        model_lower = eff_model.lower()
        # Match the gpt-5.5 family: bare ``gpt-5.5``, ``gpt-5.5-codex``,
        # vendor-prefixed variants like ``openai/gpt-5.5``, and any future
        # ``gpt-5.5-*`` SKU. Anchor at a word boundary on either side so
        # unrelated tokens like ``gpt-5.50`` do not match.
        if not re.search(r"(?:^|[/\-_])gpt-5\.5(?:$|[\-_])", model_lower):
            return None
        return (
            f"Codex backend appears to be silently rejecting {eff_model!r} "
            "on chatgpt.com/backend-api/codex (no stream events, no error). "
            "This is a known backend-side pattern that has affected ChatGPT "
            "Plus accounts intermittently. "
            "Workaround: try `gpt-5.4-codex` on the same OAuth profile, "
            "or switch to a different model/provider in your fallback chain. "
            "See hermes-agent#21444 for symptom history."
        )

    def _is_openrouter_url(self) -> bool:
        """Return True when the base URL targets OpenRouter."""
        return base_url_host_matches(self._base_url_lower, "openrouter.ai")