polish(gateway): address Copilot review comments on fd-leak fix
Seven Copilot inline review comments on #37679, four worth landing in a polish pass before merge: 1. _dispose_unused_adapter signature: 'BasePlatformAdapter' -> 'BasePlatformAdapter | None'. The function explicitly handles None and the reconnect watcher calls it with None in the except arm, so the annotation now matches the actual contract. 2. (duplicate of #1 on a different line) — same fix. 3. except Exception in _dispose_unused_adapter — the reviewer asked about asyncio.CancelledError swallowing. On Python 3.8+ (Hermes requires 3.13, see pyproject.toml), CancelledError inherits from BaseException, NOT Exception, so the existing 'except Exception' does NOT swallow task cancellation. Added an explicit comment explaining the contract so future readers don't repeat the analysis. We don't re-raise because the watcher loop intentionally treats dispose failures as best-effort: a failed dispose on an unowned adapter should not take down the watcher that's keeping the gateway alive. 4. _response_store = None after close in api_server.py — the reviewer flagged this for idempotency. Decided to keep the non-None state intentionally: setting it to None cascades to ~9 callers that access self._response_store without a None check, and 'close() is idempotent on a closed sqlite3 Connection' means the current code is already safe. The type stays stable; LSP doesn't flag a cascade of reportOptionalMemberAccess errors. (This matches the pre-existing pattern in the codebase — e.g. _mark_disconnected doesn't reset state to None either.) 5. _build_adapter_with_store: reviewer worried about disconnect() failing on the self.name property if __init__ wasn't called. Already handled: we set 'adapter.platform = Platform.API_SERVER' so the 'self.platform.value.title()' property returns 'Api_Server' without raising. The exception-swallowing branch in disconnect() does call self.name via the logger.debug format, so this is a real path that needs the platform attribute, and we have it. 6. test_disconnect_closes_response_store: bare 'pytest.raises(Exception)' -> 'pytest.raises(sqlite3.ProgrammingError)'. The bare Exception matcher would silently accept AttributeError, OperationalError, env-related issues, etc. The specific exception type ('Cannot operate on a closed database') is the actual signal we want — proves the SQLite conn is closed, not just that *something* raised. 7. test_nonretryable_failure_disposes_unowned_adapter: assertion tightened from '>= 1' to '== 1' on adapter._disconnect_calls. The docstring said 'exactly once', the assertion now matches. Catches the hypothetical 'watcher disposes the same adapter twice' regression that '>=' would have missed.
This commit is contained in:
@ -1755,7 +1755,7 @@ def _preserve_queued_followup_history_offset(
|
||||
return merged
|
||||
|
||||
|
||||
async def _dispose_unused_adapter(adapter: "BasePlatformAdapter") -> None:
|
||||
async def _dispose_unused_adapter(adapter: "BasePlatformAdapter | None") -> None:
|
||||
"""Best-effort dispose for an adapter that never made it onto ``self.adapters``.
|
||||
|
||||
The reconnect watcher in ``GatewayRunner._platform_reconnect_watcher``
|
||||
@ -1778,6 +1778,11 @@ async def _dispose_unused_adapter(adapter: "BasePlatformAdapter") -> None:
|
||||
failure paths in the reconnect watcher can all call it without
|
||||
each one having to know that ``disconnect()`` may itself raise
|
||||
on a half-constructed adapter.
|
||||
|
||||
``adapter`` may be ``None``: the reconnect watcher initialises
|
||||
``adapter = None`` before the ``try`` so the ``except Exception``
|
||||
arm can dispose a half-constructed object, and also early-returns
|
||||
here when ``_create_adapter()`` returned ``None``.
|
||||
"""
|
||||
if adapter is None:
|
||||
return
|
||||
@ -1788,6 +1793,15 @@ async def _dispose_unused_adapter(adapter: "BasePlatformAdapter") -> None:
|
||||
# crashed during aiohttp app setup) can raise from
|
||||
# disconnect() on objects that never finished initializing.
|
||||
# We must not let that escape and abort the watcher loop.
|
||||
#
|
||||
# On Python 3.8+, ``asyncio.CancelledError`` inherits from
|
||||
# ``BaseException`` (not ``Exception``), so this ``except
|
||||
# Exception`` does not swallow task cancellation. We don't
|
||||
# re-raise explicitly because the watcher loop intentionally
|
||||
# treats dispose failures as best-effort: a failed ``disconnect``
|
||||
# call should not take down the reconnect watcher that
|
||||
# itself is what's keeping the gateway alive during a partial
|
||||
# outage.
|
||||
logger.debug(
|
||||
"Adapter dispose raised on unowned adapter %r",
|
||||
getattr(adapter, "name", type(adapter).__name__),
|
||||
|
||||
@ -165,11 +165,18 @@ class TestReconnectFDLeakRegression:
|
||||
new=AsyncMock(return_value=False)):
|
||||
await _run_watcher_one_iteration(runner)
|
||||
|
||||
assert adapter._disconnect_calls >= 1, (
|
||||
# The intent of this test is "the watcher calls disconnect()
|
||||
# exactly once on the unowned adapter" — not "at least once".
|
||||
# An accidental double-dispose would be a new bug to catch
|
||||
# (e.g. the watcher's two failure paths both calling dispose
|
||||
# for the same adapter instance). Tighten to == 1.
|
||||
assert adapter._disconnect_calls == 1, (
|
||||
f"non-retryable reconnect failure must call adapter.disconnect() "
|
||||
f"at least once; got {adapter._disconnect_calls} calls. "
|
||||
f"exactly once; got {adapter._disconnect_calls} calls. "
|
||||
"Without it, 2 fds leak per retry at the 300s backoff cap "
|
||||
"(#37011)."
|
||||
"(#37011). More than one call would also be a bug — the "
|
||||
"adapter has already been disposed once, a second call is "
|
||||
"wasted work and may itself raise."
|
||||
)
|
||||
assert adapter._open_fds == 0, (
|
||||
f"adapter fds not released after disconnect(); "
|
||||
@ -303,15 +310,21 @@ class TestAPIServerDisconnectClosesResponseStore:
|
||||
to ``~/.hermes/response_store.db`` (or :memory: as a fallback),
|
||||
which is exactly the resource that was leaking pre-fix.
|
||||
"""
|
||||
import sqlite3
|
||||
|
||||
store = ResponseStore(max_size=10, db_path=str(tmp_path / "rs.db"))
|
||||
adapter = self._build_adapter_with_store(store)
|
||||
|
||||
await adapter.disconnect()
|
||||
|
||||
# Post-disconnect, the underlying sqlite3 conn should be closed.
|
||||
# Any further query raises ``ProgrammingError: Cannot operate
|
||||
# on a closed database``.
|
||||
with pytest.raises(Exception):
|
||||
# sqlite3 raises ``ProgrammingError: Cannot operate on a closed
|
||||
# database`` for any further operation. We assert on the
|
||||
# specific exception type (not bare ``Exception``) so the test
|
||||
# only passes when the close actually took effect — a generic
|
||||
# ``Exception`` catcher would mask unrelated failures (env
|
||||
# issues, AttributeError, etc.).
|
||||
with pytest.raises(sqlite3.ProgrammingError):
|
||||
store._conn.execute("SELECT 1").fetchone()
|
||||
|
||||
@pytest.mark.asyncio
|
||||
|
||||
Reference in New Issue
Block a user