[jazzy] Reduce flakiness in rosbag2 recorder end-to-end tests (backport #2370)#2415
mergify[bot] wants to merge 1 commit into jazzy
* Reduce flakiness in rosbag2 recorder end-to-end tests

Reduced flakiness in rosbag2 recorder tests by strengthening PublicationManager::wait_for_matched().

The helper `PublicationManager::wait_for_matched()` now waits on ROS graph events instead of relying only on fixed sleep-polling. It still preserves the existing API and boolean return behavior, but uses a hybrid loop:
• check the publisher’s current subscription count
• wait for a graph change
• re-check periodically with a short bounded wait slice

This improves readiness detection for recorder subscriptions that appear asynchronously during topic discovery, while still handling cases where graph notifications are delayed or not sufficient on their own.

No test call sites were changed. Existing recorder tests continue to use pub_manager.wait_for_matched(...), but they now benefit from the stronger synchronization automatically.

Signed-off-by: Michael Orlov <morlovmr@gmail.com>

* Fix a possible negative remaining time in wait_for_matched(..)

Refactor logic to check deadline before calculating remaining time

Signed-off-by: Michael Orlov <morlovmr@gmail.com>

---------

Signed-off-by: Michael Orlov <morlovmr@gmail.com>
(cherry picked from commit 86fbdb7)
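The second commit's fix, checking the deadline before calculating the remaining time, can be illustrated with a small helper. This is a hedged sketch under stated assumptions: `next_wait_slice` and its "return `std::nullopt` once the deadline has passed" contract are hypothetical and are not taken from the PR's code.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <optional>

using Clock = std::chrono::steady_clock;

// Hypothetical helper (not from the PR): returns how long the next wait
// should last, or std::nullopt when the deadline has already been reached.
// The deadline is checked *before* subtracting, so the returned duration
// is never negative.
std::optional<std::chrono::nanoseconds> next_wait_slice(
  Clock::time_point now, Clock::time_point deadline,
  std::chrono::nanoseconds slice)
{
  if (now >= deadline) {
    return std::nullopt;  // timed out: the caller should stop waiting
  }
  const auto remaining =
    std::chrono::duration_cast<std::chrono::nanoseconds>(deadline - now);
  // Never wait longer than one slice, and never past the deadline.
  return std::min(remaining, slice);
}
```

Computing `deadline - now` first and checking afterwards would yield a negative duration once the deadline has passed, which is at best meaningless as a wait timeout; returning an empty optional makes the timeout case explicit to the caller.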
MichaelOrlov left a comment:
LGTM with green CI.
Pulls: #2415
@mjcarroll @cottsay The Windows build on Jazzy consistently fails in multiple jobs with the following error:
[ RUN ] RosBag2PlayTestFixture.burst_with_false_preconditions
[ERROR] [1776977464.035681700] [rmw_connextdds]: Failed to serialize rmw_dds_common::msg::dds_::ParticipantEntitiesInfo_ sample: Not enough memory in the buffer stream
[ERROR] [1776977464.035968800] [rmw_connextdds]: failed to write message to DDS
[ERROR] [1776977464.036026000] [rmw_connextdds]: failed to publish discovery sample
[ERROR] [1776977464.036058600] [rmw_connextdds]: failed to publish discovery sample
[ERROR] [1776977464.036081500] [rmw_connextdds]: failed to update graph for node
ERROR [0x0101CB47,0xFDF91636,0xE3F71A8B:0x80000003{Entity=DW,Topic=ros_discovery_info,Type=rmw_dds_common::msg::dds_::ParticipantEntitiesInfo_,Domain=147}|WRITE|ADD TO WRITER QUEUE|PROCESS SAMPLE] PRESWriterHistoryDriver_serializeSample:FAILED TO SERIALIZE | Sample in topic ros_discovery_info with type 'rmw_dds_common::msg::dds_::ParticipantEntitiesInfo_' and encapsulation ID 1
ERROR [0x0101CB47,0xFDF91636,0xE3F71A8B:0x80000003{Entity=DW,Topic=ros_discovery_info,Type=rmw_dds_common::msg::dds_::ParticipantEntitiesInfo_,Domain=147}|WRITE|ADD TO WRITER QUEUE] PRESWriterHistoryDriver_initializeSample:FAILED TO SERIALIZE | Sample with sequence number (0, 1) in encapsulation 1.
ERROR [0x0101CB47,0xFDF91636,0xE3F71A8B:0x80000003{Entity=DW,Topic=ros_discovery_info,Type=rmw_dds_common::msg::dds_::ParticipantEntitiesInfo_,Domain=147}|WRITE|ADD TO WRITER QUEUE] WriterHistoryMemoryPlugin_addEntryToSession:FAILED TO INITIALIZE | Session sample
ERROR [0x0101CB47,0xFDF91636,0xE3F71A8B:0x80000003{Entity=DW,Topic=ros_discovery_info,Type=rmw_dds_common::msg::dds_::ParticipantEntitiesInfo_,Domain=147}|WRITE|ADD TO WRITER QUEUE] WriterHistoryMemoryPlugin_addEntryToSessions:FAILED TO ADD | Entry to session with ID 0
ERROR [0x0101CB47,0xFDF91636,0xE3F71A8B:0x80000003{Entity=DW,Topic=ros_discovery_info,Type=rmw_dds_common::msg::dds_::ParticipantEntitiesInfo_,Domain=147}|WRITE|ADD TO WRITER QUEUE] WriterHistoryMemoryPlugin_getEntry:FAILED TO ADD | Virtual sample to sessions
ERROR [0x0101CB47,0xFDF91636,0xE3F71A8B:0x80000003{Entity=DW,Topic=ros_discovery_info,Type=rmw_dds_common::msg::dds_::ParticipantEntitiesInfo_,Domain=147}|WRITE|ADD TO WRITER QUEUE] WriterHistoryMemoryPlugin_addSample:FAILED TO GET | Entry
ERROR [0x0101CB47,0xFDF91636,0xE3F71A8B:0x80000003{Entity=DW,Topic=ros_discovery_info,Type=rmw_dds_common::msg::dds_::ParticipantEntitiesInfo_,Domain=147}|WRITE] PRESPsWriter_writeInternal:!collator addWrite
unknown file: error: C++ exception with description "failed to initialize rcl node: failed to write message to DDS, at C:\ci\ws\src\ros2\rmw_connextdds\rmw_connextdds_common\src\ndds\dds_api_ndds.cpp:778, at C:\ci\ws\src\ros2\rcl\rcl\src\rcl\node.c:253" thrown in the test fixture's constructor.
@mjcarroll @cottsay @claraberendsen Friendly ping for comments on the failure above.
Description
Reduced flakiness in rosbag2 recorder tests by strengthening PublicationManager::wait_for_matched().
Please see the root-cause analysis (RCA) in the related issue #2369 (comment).
The helper PublicationManager::wait_for_matched() now waits on ROS graph events rather than relying solely on fixed sleep polling. It still preserves the existing API and boolean return behavior, but uses a hybrid loop:
• check the publisher’s current subscription count
• wait for a graph change
• re-check periodically with a short bounded wait slice
This improves readiness detection for recorder subscriptions that appear asynchronously during topic discovery, while still handling cases where graph notifications are delayed or not sufficient on their own.
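A minimal, self-contained sketch of the hybrid loop described above, for illustration only. It is not the PR's implementation: `FakeGraph` is a hypothetical stand-in for the ROS graph, and the `wait_for_matched` signature shown, the `>=` comparison against the expected count, and the 10 ms slice are all assumptions. In a real rclcpp-based helper, the graph side would presumably come from calls such as `Node::get_graph_event()`, `Node::wait_for_graph_change()`, and the publisher's `get_subscription_count()`.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the ROS graph (not rosbag2 or rclcpp code): it
// tracks one publisher's matched-subscription count and notifies waiters on
// every change, the way a graph event would.
class FakeGraph
{
public:
  void set_subscription_count(std::size_t n)
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      count_ = n;
      ++generation_;
    }
    cv_.notify_all();
  }

  std::size_t subscription_count() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return count_;
  }

  // Block until the graph changes or `slice` elapses, whichever comes first.
  void wait_for_graph_change(std::chrono::nanoseconds slice)
  {
    std::unique_lock<std::mutex> lock(mutex_);
    const std::size_t start = generation_;
    cv_.wait_for(lock, slice, [&] {return generation_ != start;});
  }

private:
  mutable std::mutex mutex_;
  std::condition_variable cv_;
  std::size_t count_ = 0;
  std::size_t generation_ = 0;
};

// Hybrid wait: check the count, wait for a graph change, and re-check after
// at most one short slice even if no notification arrives.
bool wait_for_matched(
  FakeGraph & graph, std::size_t expected, std::chrono::nanoseconds timeout,
  std::chrono::nanoseconds slice = std::chrono::milliseconds(10))
{
  using Clock = std::chrono::steady_clock;
  const auto deadline = Clock::now() + timeout;
  while (true) {
    if (graph.subscription_count() >= expected) {
      return true;
    }
    const auto now = Clock::now();
    if (now >= deadline) {
      return false;  // checked before computing `remaining`, so it is never negative
    }
    const auto remaining =
      std::chrono::duration_cast<std::chrono::nanoseconds>(deadline - now);
    graph.wait_for_graph_change(std::min(remaining, slice));
  }
}

// Usage: a "recorder" subscription shows up asynchronously after ~20 ms.
bool demo_late_subscriber()
{
  FakeGraph graph;
  std::thread recorder([&graph] {
      std::this_thread::sleep_for(std::chrono::milliseconds(20));
      graph.set_subscription_count(1);
    });
  const bool matched = wait_for_matched(graph, 1, std::chrono::seconds(2));
  recorder.join();
  return matched;
}
```

The bounded slice is what makes the loop "hybrid": if the count changes between the check and the start of `wait_for_graph_change()`, that notification is missed, but the loop still re-checks within one slice instead of sleeping until the overall timeout.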
No test call sites were changed. Existing recorder tests continue to use pub_manager.wait_for_matched(...), but they now benefit from the stronger synchronization automatically.
Is this user-facing behavior change?
No.
Did you use Generative AI?
Yes. Codex gpt-5.4
Additional Information
Can be backported.
This is an automatic backport of pull request #2370 done by [Mergify](https://mergify.com).