fix(opcua): prevent BadSessionIdInvalid race condition during session reconnection #2123
Open
palandri wants to merge 1 commit into thingsboard:master
Move `__connected` flag clearing into `disconnect_if_connected()` to ensure connection state is properly synchronized. Add guard at monitor loop entry to skip server reads when not connected, preventing stale data access during reconnection sequences.
# fix(opcua): prevent `BadSessionIdInvalid` race condition on slow servers

## Summary

Fixes a race condition in the OPC-UA connector that caused `BadSessionIdInvalid` errors when reconnecting to slow servers:

- `disconnect_if_connected()` now sets `__connected = False` before awaiting the actual disconnect, preventing the monitor loop from observing a stale `True` state.
- `_monitor_server_loop` skips the `server_state.read_value()` heartbeat call whenever `__connected` is `False`.

## Motivation
On slow (or misconfigured) OPC-UA servers, the session timeout can be long (e.g. 1200 s in our case; the OPC-UA server is not configured by us). The watchdog interval is derived from it (`session_timeout / 2`), so `_monitor_server_loop` sleeps for tens of seconds between heartbeats.

When the main loop or an error handler called `disconnect_if_connected()`, the asyncio scheduler yielded control during `await self.__client.disconnect()`, but `__connected` was only set to `False` later (in the `finally` block of `start_client`, and only when `__stopped` was `True`). If `_monitor_server_loop` woke up in that window, it saw `__connected = True` and attempted `read_value()` on an already-closed session, producing `BadSessionIdInvalid`.

## Implementation Details
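
The window described under Motivation, and the effect of clearing the flag before the `await`, can be reproduced with a small standalone simulation. This is only a sketch under stated assumptions: `FakeSession`, `Connector`, `monitor_once`, and the `clear_flag_first` switch are hypothetical stand-ins, not the connector's real classes or the asyncua API, and the "old" ordering is simplified to clearing the flag right after the disconnect returns.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for an OPC-UA session (not the asyncua client)."""

    def __init__(self):
        self.open = True

    async def disconnect(self):
        # The session is invalid immediately, but a slow server takes a
        # while to complete the disconnect, so control returns to the loop.
        self.open = False
        await asyncio.sleep(0.1)

    async def read_value(self):
        if not self.open:
            raise RuntimeError("BadSessionIdInvalid")
        return "Running"


class Connector:
    """Hypothetical model of the connector's connection flag handling."""

    def __init__(self, clear_flag_first):
        self.session = FakeSession()
        self.connected = True
        self.clear_flag_first = clear_flag_first
        self.errors = []

    async def disconnect_if_connected(self):
        if not self.connected:
            return
        if self.clear_flag_first:
            # Fixed ordering: no await between the check and the assignment.
            self.connected = False
        await self.session.disconnect()
        if not self.clear_flag_first:
            # Simplified old ordering: flag cleared only after the await.
            self.connected = False

    async def monitor_once(self, delay):
        await asyncio.sleep(delay)  # watchdog wakes up mid-disconnect
        if self.connected:
            try:
                await self.session.read_value()
            except RuntimeError as exc:
                self.errors.append(str(exc))


async def run(clear_flag_first):
    connector = Connector(clear_flag_first)
    await asyncio.gather(
        connector.disconnect_if_connected(),
        connector.monitor_once(delay=0.01),
    )
    return connector.errors


old_errors = asyncio.run(run(clear_flag_first=False))
new_errors = asyncio.run(run(clear_flag_first=True))
print("old ordering:", old_errors)
print("new ordering:", new_errors)
```

With the old ordering the simulated watchdog still sees the flag as set and reads from the closed session; with the new ordering it skips the read.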
### Root cause fix: `disconnect_if_connected()`

Because asyncio scheduling is cooperative and there is no `await` between the guard check and the flag mutation, the check-and-assign is effectively atomic. Any coroutine that wakes up during `await disconnect()` will already observe `__connected = False`.

### Defensive guard: `_monitor_server_loop()`

A second layer of protection: even if `__connected` was not cleared through `disconnect_if_connected()` (e.g. it was set directly by the monitor's own exception handlers), the heartbeat read is always skipped while `__connected` is `False`.

## Behavior Change
- Monitor loop wakes during a disconnect: before, `read_value()` → `BadSessionIdInvalid`; after, `__connected = False`, skips the read
- `disconnect_if_connected()` called twice

## Backward Compatibility
No breaking changes. The `__connected` flag semantics are preserved: it is set back to `True` only after a successful `connect()` in `start_client()`.

## Touched Code
`thingsboard_gateway/connectors/opcua/opcua_connector.py`

- `disconnect_if_connected()`: set `__connected = False` before awaiting disconnect
- `_monitor_server_loop()`: skip heartbeat read when not connected

## Tests / Verification
- `tests/unit/connectors/opcua/`: 35 tests, all passing with this fix applied.
- `sessionTimeoutInMillis: 120000`.

## Checklist
- `disconnect_if_connected` flag order
- `_monitor_server_loop`
- `disconnect_if_connected` is now idempotent
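
The two properties claimed in this description, a heartbeat that is skipped while disconnected and an idempotent `disconnect_if_connected`, can be checked with a minimal sketch like the one below. All names here (`SketchConnector`, `_client_disconnect`, the counters) are hypothetical and only model the flag logic; the heartbeat counter stands in for `server_state.read_value()`, and the tiny timeout is chosen so the example finishes quickly.

```python
import asyncio


class SketchConnector:
    """Hypothetical model of the guard and the idempotent disconnect."""

    def __init__(self, session_timeout_s):
        self.connected = True
        self.stopped = False
        self.disconnect_calls = 0
        self.heartbeats = 0
        # Watchdog interval derived from the session timeout (timeout / 2).
        self.interval = session_timeout_s / 2

    async def _client_disconnect(self):
        # Stand-in for the real client disconnect; counts actual calls.
        self.disconnect_calls += 1
        await asyncio.sleep(0)

    async def disconnect_if_connected(self):
        if not self.connected:
            return  # a second call is a no-op
        self.connected = False  # cleared before the first await
        await self._client_disconnect()

    async def monitor_server_loop(self):
        while not self.stopped:
            if self.connected:
                # Stand-in for the server_state.read_value() heartbeat.
                self.heartbeats += 1
            await asyncio.sleep(self.interval)


async def main():
    connector = SketchConnector(session_timeout_s=0.02)
    monitor = asyncio.create_task(connector.monitor_server_loop())
    await asyncio.sleep(0.005)  # one heartbeat happens while connected
    await connector.disconnect_if_connected()
    await connector.disconnect_if_connected()  # must not disconnect again
    await asyncio.sleep(0.05)  # several watchdog ticks while disconnected
    connector.stopped = True
    await monitor
    return connector


connector = asyncio.run(main())
print("heartbeats:", connector.heartbeats)
print("disconnect calls:", connector.disconnect_calls)
```

Only the heartbeat issued before the disconnect is counted, and the underlying disconnect runs once even though `disconnect_if_connected` is called twice.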