Python 3.14 compatibility + EZSP-over-TCP stability fixes #720
silenthooligan wants to merge 2 commits into zigpy:dev
Five focused changes that together unbreak EZSP/EmberZNet on Python 3.14 and fix several long-standing rough edges with TCP-bridged serial radios (ser2net, ESPHome stream_server / serial_proxy).

## Python 3.14 compatibility

- `bellows/thread.py`: `asyncio.iscoroutinefunction` is deprecated since 3.14 and slated for removal in 3.16; use `inspect.iscoroutinefunction` instead. Same semantics, no deprecation warning.
- `bellows/thread.py`, `bellows/uart.py`: replace four call sites of `asyncio.get_event_loop()` (deprecated outside a running loop) with `asyncio.get_running_loop()`. All four sites are reached from `async def` paths, so they have a running loop.
- `bellows/cli/util.py`: the `background()` decorator wrapped CLI commands with `asyncio.get_event_loop().run_until_complete(...)`, which raises `RuntimeError: no current event loop` on 3.14 from a fresh sync entry point. Replaced with `asyncio.run(...)`, which is the documented modern equivalent.

## ThreadsafeProxy: surface closed-loop as ConnectionError

When the secondary event-loop thread is gone, `ThreadsafeProxy` used to log a warning and return `None`, which propagated as `TypeError: 'NoneType' object can't be awaited` at the call site. That error is opaque, and on Python 3.14 it left the integration in an indefinite retry loop without recovery. Raise `ConnectionError` instead so callers (EZSP startup_reset, retries) can handle the failure cleanly. Also guards against the loop being closed between the `is_closed()` check and the `run_coroutine_threadsafe()` dispatch.

## EZSP startup race over TCP

When the underlying transport is TCP, reset frames from the NCP can arrive on the wire before `wait_for_startup_reset()` is reached. The first reset gets dropped, `enter_failed_state()` fires, and the integration never recovers without a manual restart.

- `bellows/uart.py::_connect`: pre-create `gateway._startup_reset_future` before opening the connection so any reset frame received during setup is caught by `reset_received()` instead of being treated as spurious.
- `bellows/uart.py::wait_for_startup_reset`: replace the `assert is None` with `if is None` so the pre-created future isn't rejected.

## TCP serial-bridge connection lifecycle

- `bellows/ash.py::eof_received`: return `True` to suppress the transport's auto-close. Serial-over-TCP bridges (ser2net, ESPHome stream_server / serial_proxy) sometimes signal EOF during initialization handshakes without intending to close the socket; the default auto-close orphans the connection.
- `bellows/ezsp/__init__.py::_startup_reset`: raise a clear `EzspError` when `self._gw` is `None` instead of failing with an opaque `AttributeError`.
- `bellows/ezsp/__init__.py::disconnect`: tolerate `_gw is None`, and on `ConnectionError` from the gateway's disconnect (secondary loop dead) force-close the underlying TCP socket so ser2net/stream_server releases the port for subsequent attempts.

## Tests

- `tests/test_thread.py::test_proxy_loop_closed`: now asserts `ConnectionError` is raised (was: silently returns).
- `tests/test_ezsp.py::test_startup_reset_gw_none`: covers the new null-gateway guard.
- `tests/test_ezsp.py::test_disconnect_gw_none`: covers tolerance of null gateway in `disconnect()`.

## Validation

Patched bellows was vendored into a Home Assistant 2026.5.0b2 image and exercised against a live Nabu Casa Connect ZBT-2 (EFR32MG24 Zigbee NCP) behind an ESPHome `stream_server` raw-TCP bridge on Python 3.14.2:

- `bellows.ezsp.EZSP.connect()` + `startup_reset()` complete; protocol version 13 read; `get_board_info()` returns NCP identity.
- ZHA boots, forms a fresh network, pairs 5 IAS-Zone water-leak sensors.
- After OTA-ing the dongle from `stream_server` to ESPHome `serial_proxy` (encrypted native API), reconfiguring ZHA to `esphome-hass://esphome/<entry_id>?port_name=<port>` reuses the existing zigbee.db; all 5 sensors auto-rejoin without re-pairing.

The full upstream test suite (112 tests) passes.

## Provenance

The bulk of the runtime fixes (and their tests) were originally drafted by @aautem in zigpy#711 (closed by author with no review). This PR rebases that work onto current `dev`, drops the diagnostic-only `LOGGER.warning` calls that zigpy#711 carried alongside the fixes, restores the post-zigpy#714 `NETWORK_COORDINATOR_STARTUP_RESET_WAIT = 2`, and adds the `bellows/cli/util.py` get_event_loop fix and the EZSP-over-TCP end-to-end validation against real hardware.

Co-Authored-By: Alex Autem <autem.alex@gmail.com>
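The three Python 3.14 API replacements named in the description can be illustrated with a minimal sketch. Only the decorator name `background` comes from the text above; its body here, and the helper names `current_loop_time`, `demo_command`, and `is_async`, are assumptions made for illustration and are not taken from the bellows source.

```python
import asyncio
import functools
import inspect


def background(f):
    """Run an async CLI command to completion from a synchronous entry point.

    Hypothetical body: a sketch of the pattern, not the bellows implementation.
    """

    @functools.wraps(f)
    def inner(*args, **kwargs):
        # asyncio.run() creates a fresh event loop, runs the coroutine, and
        # closes the loop, so it does not rely on a pre-existing current loop.
        return asyncio.run(f(*args, **kwargs))

    return inner


async def current_loop_time() -> float:
    # Inside a coroutine a loop is always running, so get_running_loop()
    # is the safe replacement for get_event_loop().
    loop = asyncio.get_running_loop()
    return loop.time()


@background
async def demo_command(value: int) -> int:
    # Hypothetical command body used only to exercise the decorator.
    await asyncio.sleep(0)
    await current_loop_time()
    return value * 2


def is_async(func) -> bool:
    # inspect.iscoroutinefunction() stands in for the deprecated asyncio variant.
    return inspect.iscoroutinefunction(func)
```

Calling `demo_command(21)` from plain synchronous code runs the coroutine on a fresh loop and returns its result, which is the situation a CLI entry point is in.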
Codecov Report

✅ All modified and coverable lines are covered by tests.

| | dev | #720 | +/- |
|---|---|---|---|
| Coverage | 99.54% | 99.54% | |
| Files | 61 | 61 | |
| Lines | 4147 | 4166 | +19 |
| Hits | 4128 | 4147 | +19 |
| Misses | 19 | 19 | |
|
The ZBT-2 Zigbee YAML now ships ESPHome `serial_proxy` (encrypted native API on :6053) as the default. ZHA reaches the radio via esphome-hass://esphome/<entry_id>?port_name=MG24%20Zigbee%20NCP on HA 2026.5+.

The previous `stream_server` raw-TCP variant moves to zbt-2-zigbee/legacy-stream-server/ as the pre-2026.5 / pre-bellows-fix fallback.

Status and dependencies are documented in zbt-2-zigbee/README.md and the top-level README. Two upstream gates: HA >= 2026.5 (for the `esphome-hass://` URL handler in homeassistant/components/esphome/serial_proxy.py) and zigpy/bellows#720 (Python 3.14 + EZSP-over-TCP runtime fixes; until merged, vendor patched bellows into the HA image).

Validated against HA 2026.5.0b2 + the patched bellows fork: EZSP startup completes, ZHA pairs IAS-Zone end devices and routers, and attribute reports flow.

The migration walkthrough (existing stream_server -> serial_proxy without re-pairing devices) is in zbt-2-zigbee/README.md. Key gotcha: ZHA doesn't support reconfiguring the radio path, so the integration must be deleted and recreated using "Advanced -> Reuse settings" so that the existing zigbee.db is honored.

Closes #3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds three focused tests for the new code paths introduced in this PR:

- `tests/test_thread.py::test_proxy_coroutine_loop_closed_mid_dispatch` exercises the `RuntimeError` catch around `run_coroutine_threadsafe()` when the loop is closed between the `is_closed()` check and dispatch: asserts `ConnectionError` is raised and the orphaned coroutine is closed (no un-awaited-coroutine warning).
- `tests/test_ezsp.py::test_disconnect_force_closes_socket_on_connection_error` exercises the `disconnect()` path that, on `ConnectionError` from the gateway, walks the proxy/transport chain to force-close the underlying TCP socket so ser2net / stream_server / serial_proxy releases the port.
- `tests/test_ezsp.py::test_disconnect_socket_force_close_swallows_exceptions` exercises the inner `except Exception: pass` fallback when the proxy/transport chain isn't fully populated (e.g. `_obj` has no `_transport` attribute), confirming `_gw` is still cleared.

Lifts patch coverage from 63.6% to 100% on `bellows/thread.py` and brings the EZSP changes within the project's 99.25% gate.
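The closed-loop and mid-dispatch guards that the first test targets can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the bellows `ThreadsafeProxy`: the classes `LoopProxy`, `Radio`, and `RacingLoop` and the helper `dispatch_error` are hypothetical names. Only `asyncio.run_coroutine_threadsafe()`, `is_closed()`, and the `ConnectionError` behaviour come from the text.

```python
import asyncio


class LoopProxy:
    """Hypothetical stand-in for a thread-safe proxy (not the bellows class)."""

    def __init__(self, obj, loop):
        self._obj = obj
        self._loop = loop

    def call(self, name, *args):
        # Fast path: the target loop is already known to be closed.
        if self._loop.is_closed():
            raise ConnectionError("event loop is closed")

        coro = getattr(self._obj, name)(*args)
        try:
            return asyncio.run_coroutine_threadsafe(coro, self._loop)
        except RuntimeError as exc:
            # The loop was closed between the check and the dispatch.
            # Close the coroutine so it does not trigger a
            # "coroutine was never awaited" warning.
            coro.close()
            raise ConnectionError("event loop closed during dispatch") from exc


class Radio:
    """Hypothetical target object with one coroutine method."""

    async def ping(self):
        return "pong"


class RacingLoop:
    """Stub loop that reports itself open but refuses dispatch (the race)."""

    def is_closed(self):
        return False

    def call_soon_threadsafe(self, callback, *args):
        raise RuntimeError("Event loop is closed")


def dispatch_error(loop) -> str:
    # Returns the ConnectionError message, or "no error" if dispatch succeeded.
    try:
        LoopProxy(Radio(), loop).call("ping")
    except ConnectionError as exc:
        return str(exc)
    return "no error"
```

In both failure modes the caller sees a `ConnectionError` it can catch, rather than a `None` that fails later when awaited.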
|
Can you attach some debug logs of the issue you're having and how this PR solves it? ZHA is used with a ton of TCP coordinators and we haven't really had reports of issues like this.
|
Live logs against a Nabu Casa Connect ZBT-2 (EFR32MG24, ESPHome […]

Full bellows DEBUG output for all three runs + the test scripts are in this gist: […]

Summary:

- 01: stock 0.49.1, first connect (no teardown) […]
- 02: stock 0.49.1, secondary loop teardown then dispatch […] Caller has no signal that the connection died (just a […]
- 03: patched bellows (this PR), same teardown […] ConnectionError is raised on dispatch (catchable, signals "rebuild the connection"), and […]

Three conditions for the bug to fire in production: closed secondary loop, caller mid-dispatch, Python 3.14's stricter […]

Other bits in this PR ([…]
|
Can you attach a log from the live device that you're communicating with? |
Five focused changes that together unbreak EZSP/EmberZNet on Python 3.14 and fix several long-standing rough edges with TCP-bridged serial radios (ser2net, ESPHome `stream_server` / `serial_proxy`).

Supersedes #711, which was closed unmerged by its author with no review. This PR rebases that work onto current `dev`, drops the diagnostic-only `LOGGER.warning` calls #711 carried alongside the fixes, restores the post-#714 `NETWORK_COORDINATOR_STARTUP_RESET_WAIT = 2`, adds the `bellows/cli/util.py` `get_event_loop` fix, and is validated end-to-end against real hardware (see below).

## Python 3.14 compatibility

- `bellows/thread.py`: `asyncio.iscoroutinefunction` → `inspect.iscoroutinefunction`. Same semantics, drops the deprecation warning.
- `bellows/thread.py`, `bellows/uart.py`: replace four `asyncio.get_event_loop()` calls (deprecated outside a running loop on 3.14) with `asyncio.get_running_loop()`. All four sites are reached from `async def` paths.
- `bellows/cli/util.py`: `background()` decorator wrapped CLI commands with `asyncio.get_event_loop().run_until_complete(...)`, which raises `RuntimeError: no current event loop` on 3.14 from a fresh sync entry point. Replaced with `asyncio.run(...)`.

## ThreadsafeProxy: closed-loop as ConnectionError

When the secondary event-loop thread is gone, `ThreadsafeProxy` used to log a warning and return `None`, which propagated as `TypeError: 'NoneType' object can't be awaited` and (on 3.14) left ZHA in an indefinite retry loop. Raise `ConnectionError` so callers can handle the failure cleanly. Also guards the loop being closed between the `is_closed()` check and `run_coroutine_threadsafe()` dispatch.

## EZSP startup race over TCP

Reset frames from the NCP can arrive faster than `wait_for_startup_reset()` is reached when the transport is TCP. The first reset gets dropped, `enter_failed_state()` fires, and the integration never recovers without a manual restart.

- `bellows/uart.py::_connect`: pre-create `gateway._startup_reset_future` before opening the connection so any reset frame received during setup is caught by `reset_received()`.
- `bellows/uart.py::wait_for_startup_reset`: replace `assert is None` with `if is None` so the pre-created future isn't rejected.

## TCP serial-bridge connection lifecycle

- `bellows/ash.py::eof_received`: return `True` to suppress the transport's auto-close. Serial-over-TCP bridges (ser2net, ESPHome `stream_server` / `serial_proxy`) sometimes signal EOF during initialization without intending to close the socket; the default auto-close orphans the connection.
- `bellows/ezsp/__init__.py::_startup_reset`: raise a clear `EzspError` when `self._gw` is `None`.
- `bellows/ezsp/__init__.py::disconnect`: tolerate `_gw is None`, and on `ConnectionError` from the gateway force-close the underlying TCP socket so ser2net/stream_server releases the port for subsequent attempts.

## Test plan

- `tests/test_thread.py::test_proxy_loop_closed` now asserts `ConnectionError` (was: silently returns).
- `tests/test_ezsp.py::test_startup_reset_gw_none` covers the new null-gateway guard.
- `tests/test_ezsp.py::test_disconnect_gw_none` covers tolerance of null gateway in `disconnect()`.

## End-to-end validation against real hardware

Patched bellows was vendored into a Home Assistant 2026.5.0b2 image and exercised against a live Nabu Casa Connect ZBT-2 (EFR32MG24 Zigbee NCP) behind an ESPHome `stream_server` raw-TCP bridge on Python 3.14.2:

1. `bellows.ezsp.EZSP.connect()` + `startup_reset()` complete cleanly. `EZSP_VERSION = 13` negotiated. `get_board_info()` returns NCP identity (`Nabu Casa`, `Connect ZBT`).
2. […] `water`).
3. […] `serial_proxy` (encrypted native API on `:6053` only) and reconfiguring ZHA to `esphome-hass://esphome/<entry_id>?port_name=...`, the existing `zigbee.db` is reused via `setup_strategy_advanced -> reuse_settings`. All five sensors auto-rejoin without re-pair, attribute reports flow (battery, temperature, water-leak state).

Without this patch, step 1 fails immediately on Python 3.14 with `'NoneType' object can't be awaited` (closed event loop) or `AttributeError` (`get_event_loop` no current loop), depending on which call path fires first. Step 3 also exercises the EOF-during-init suppression: the encrypted native-API tunnel issued an EOF during the noise-protocol handshake that the `eof_received -> return True` change keeps open.

## Provenance

The bulk of the runtime fixes (and their tests) were originally drafted by @aautem in #711. Credited via `Co-Authored-By` trailer in the commit.

🤖 Validation report and PR drafted with Claude Code.
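The startup race described in this PR body, and why pre-creating the future closes it, can be modelled with a small sketch. Everything here is a simplified assumption for illustration: the `Gateway` class, the `spurious_resets` counter, the `connect()` helper, the 0.05 s timeout, and the reset code `0x0B` are hypothetical, and only the names `_startup_reset_future`, `reset_received()`, and `wait_for_startup_reset()` are borrowed from the text.

```python
import asyncio


class Gateway:
    """Hypothetical, heavily simplified gateway (not the bellows class)."""

    def __init__(self):
        self._startup_reset_future = None
        self.spurious_resets = 0

    def reset_received(self, code: int) -> None:
        fut = self._startup_reset_future
        if fut is not None and not fut.done():
            # Someone is (or will be) waiting for this reset: hand it over.
            fut.set_result(code)
        else:
            # No future yet: the reset is lost and counted as spurious.
            self.spurious_resets += 1

    async def wait_for_startup_reset(self) -> int:
        # "if is None" rather than "assert is None": a future that was
        # pre-created before the connection opened is reused, not rejected.
        if self._startup_reset_future is None:
            self._startup_reset_future = asyncio.get_running_loop().create_future()
        try:
            return await self._startup_reset_future
        finally:
            self._startup_reset_future = None


async def connect(pre_create: bool):
    gateway = Gateway()
    if pre_create:
        # Create the future before any frame can arrive.
        gateway._startup_reset_future = asyncio.get_running_loop().create_future()

    # Simulate a reset frame arriving during connection setup, i.e. before
    # wait_for_startup_reset() is reached. 0x0B is an arbitrary example code.
    gateway.reset_received(0x0B)

    try:
        code = await asyncio.wait_for(gateway.wait_for_startup_reset(), timeout=0.05)
    except asyncio.TimeoutError:
        code = None
    return code, gateway.spurious_resets
```

With `pre_create=True` the early reset is captured and the wait returns immediately; with `pre_create=False` the same reset is dropped and the wait times out, which mirrors the failure mode the PR describes.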