fix(kanban): don't permanently block tasks that hit a provider rate limit (#38223)

A kanban worker that exhausted its retries purely on a provider rate limit / quota wall (e.g. opencode-go's 5-hour window) exited with code 1. The dispatcher counted that as a crash, and with DEFAULT_FAILURE_LIMIT=2 two quota-wall hits permanently blocked the card. Fanning out many workers against one shared quota made this routine. Now a rate-limited worker exits with EX_TEMPFAIL (75); the dispatcher classifies that as a 'rate_limited' exit, releases the task back to 'ready' WITHOUT incrementing consecutive_failures (the breaker can't trip on a transient throttle), and the respawn guard defers the next attempt on a cooldown (default 5min, HERMES_KANBAN_RATE_LIMIT_COOLDOWN_SECONDS) until the quota window clears. Genuine crashes still count and trip the breaker as before. The 120s Retry-After cap is unchanged — no worker parks for hours holding a slot. - conversation_loop.py: surface failure_reason in the exhaustion return - cli.py: kanban worker picks exit 75 on rate_limit/billing failure - kanban_db.py: rate_limited exit kind, no-count requeue, cooldown guard
2026-06-03 06:19:32 -07:00
parent 60b6352fe5
commit 4c544b633d
4 changed files with 408 additions and 16 deletions
--- a/agent/conversation_loop.py
+++ b/agent/conversation_loop.py
@ -3378,6 +3378,12 @@ def run_conversation(
                        "completed": False,
                        "failed": True,
                        "error": _final_summary,
+                        # Surface the classified reason so callers (notably the
+                        # kanban worker path in cli.py) can distinguish a
+                        # transient throttle from a real failure and choose a
+                        # different exit code. ``rate_limit`` / ``billing`` here
+                        # mean "quota wall, not a task error".
+                        "failure_reason": classified.reason.value,
                    }

                # For rate limits, respect the Retry-After header if present