fix(opcua): prevent BadSessionIdInvalid race condition during session reconnection by palandri · Pull Request #2123 · thingsboard/thingsboard-gateway

palandri · 2026-04-04T15:22:57Z

fix(opcua): prevent `BadSessionIdInvalid` race condition on slow servers

Summary

Fixes a race condition in the OPC-UA connector that caused BadSessionIdInvalid errors when reconnecting to slow servers:

disconnect_if_connected() now sets __connected = False before awaiting the actual disconnect, preventing the monitor loop from observing a stale True state.
_monitor_server_loop skips the server_state.read_value() heartbeat call whenever __connected is False.

Motivation

On slow (or misconfigured) OPC-UA servers, the session timeout might be long (e.g. 1200 s in our case, OPC-UA server not configured by us). The watchdog interval is derived from it (session_timeout / 2), so _monitor_server_loop sleeps for tens of seconds between heartbeats.

When the main loop or an error handler called disconnect_if_connected(), the asyncio scheduler yielded control during await self.__client.disconnect() — but __connected was only set to False later (in the finally block of start_client, and only when __stopped was True). If _monitor_server_loop woke up in that window, it saw __connected = True and attempted read_value() on an already-closed session, producing BadSessionIdInvalid.

Implementation Details

Root cause fix — `disconnect_if_connected()`

async def disconnect_if_connected(self):
    if self.__connected:
+       self.__connected = False   # atomic in asyncio: no await between check and set
        try:
            ...
            await self.__client.disconnect()

Because asyncio is cooperative, there is no await between the guard check and the flag mutation — the assignment is effectively atomic. Any coroutine that wakes up during await disconnect() will already observe __connected = False.

Defensive guard — `_monitor_server_loop()`

while not self.__client._closing and not self.__stopped:
    await asyncio.sleep(timeout)
+   if not self.__connected:
+       continue
    _ = await self.__client.nodes.server_state.read_value()

A second layer of protection: even if __connected were not zeroed through disconnect_if_connected() (e.g. set directly by the monitor's own exception handlers), the heartbeat read is unconditionally skipped when not connected.

Behavior Change

Scenario	Before	After
Monitor wakes up during disconnect	Calls `read_value()` → `BadSessionIdInvalid`	Sees `__connected = False`, skips
`disconnect_if_connected()` called twice	Second call disconnects again	Second call is a no-op (idempotent)
Monitor in pending-reconnect state	Keeps polling	Sleeps idle until reconnected

Backward Compatibility

No breaking changes. The __connected flag semantics are preserved — it is set back to True only after a successful connect() in start_client().

Touched Code

thingsboard_gateway/connectors/opcua/opcua_connector.py
- disconnect_if_connected(): set __connected = False before awaiting disconnect
- _monitor_server_loop(): skip heartbeat read when not connected

Tests / Verification

Existing unit tests: tests/unit/connectors/opcua/ — 35 tests, all passing with this fix applied.
Validated against packet capture showing the error eliminated on a slow server with sessionTimeoutInMillis: 120000.
Validated in production server (Siemens S7-1200) with 1200s of Max. Session Timeout.

Checklist

Root cause identified and fixed (disconnect_if_connected flag order)
Defensive guard added in _monitor_server_loop
disconnect_if_connected is now idempotent
No breaking changes to existing behavior
Existing unit tests pass (35/35)
Validation on slow server in production environment (Siemens S7-1200)

Move `__connected` flag clearing into `disconnect_if_connected()` to ensure connection state is properly synchronized. Add guard at monitor loop entry to skip server reads when not connected, preventing stale data access during reconnection sequences.

github-actions bot added bug Connector:OPC-UA labels Apr 4, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(opcua): prevent BadSessionIdInvalid race condition during session reconnection#2123

fix(opcua): prevent BadSessionIdInvalid race condition during session reconnection#2123
palandri wants to merge 1 commit intothingsboard:masterfrom
palandri:fix/opcua-badsessionidinvalid-race-condition

palandri commented Apr 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

palandri commented Apr 4, 2026

fix(opcua): prevent BadSessionIdInvalid race condition on slow servers

Summary

Motivation

Implementation Details

Root cause fix — disconnect_if_connected()

Defensive guard — _monitor_server_loop()

Behavior Change

Backward Compatibility

Touched Code

Tests / Verification

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fix(opcua): prevent `BadSessionIdInvalid` race condition on slow servers

Root cause fix — `disconnect_if_connected()`

Defensive guard — `_monitor_server_loop()`