OSIDB-4914: Update rescheduler's querying logic in sync manager#1251
Conversation
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough (summary by CodeRabbit): check_for_reschedules now detects candidates via database queryset filters (using Q/F) and reschedules by sync_id. It defines three reschedule querysets and uses an inner helper with a shared `rescheduled` set to avoid duplicate reschedules.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@osidb/sync_manager.py`:
- Around line 325-327: The filters for stale-run rescheduling currently exclude
rows where last_scheduled_dt is NULL (e.g., started_not_scheduled) because
comparisons like last_scheduled_dt__lt=F(...) don't match NULL; update the
started_not_finished and started_failed_not_permanently filter clauses to
explicitly include NULLs by using Q(last_scheduled_dt__isnull=True) |
Q(last_scheduled_dt__lt=F(...)) (referencing last_scheduled_dt, last_started_dt,
last_failed_dt, F and Q) so managers with last_scheduled_dt=None are
rescheduled, and add unit tests that create managers with last_scheduled_dt=None
plus stale last_started_dt or last_failed_dt to verify they get rescheduled.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 364c5ae6-c98c-4d22-8f34-4d00d822ea57
📒 Files selected for processing (1)
osidb/sync_manager.py
```python
started_not_scheduled = sync_managers.filter(
    last_scheduled_dt__isnull=True, last_started_dt__isnull=False
)
```
🧩 Analysis chain (collapsed): shell scripts inspected `osidb/sync_manager.py` and `osidb/tests/test_sync_manager.py`, confirming that `F` and `Q` are imported, that no existing tests cover `check_for_reschedules` or managers with `last_scheduled_dt=None`, and that in SQL `NULL < X` evaluates to NULL (falsy in a WHERE clause), so filters like `last_scheduled_dt__lt=F("last_started_dt")` exclude rows where `last_scheduled_dt` IS NULL.
Handle NULL schedules in the stale-run reschedule filters.
Line 325 explicitly identifies the known bad state where a manager has last_started_dt but no last_scheduled_dt. Those rows are logged but never rescheduled: the later last_scheduled_dt__lt=F(...) filters exclude NULL values, so a long-running or failed manager stuck in that state will remain unscheduled instead of being rescheduled.
In SQL, NULL < X evaluates to NULL (falsy in WHERE clauses), so rows with last_scheduled_dt IS NULL never match filters like last_scheduled_dt__lt=F("last_started_dt"). This leaves stale managers in a permanent hung state.
Use Q(last_scheduled_dt__isnull=True) | Q(last_scheduled_dt__lt=F(...)) in both started_not_finished and started_failed_not_permanently filters to explicitly include NULL cases.
Also add a test case for last_scheduled_dt=None with stale last_started_dt or last_failed_dt to ensure rescheduling coverage.
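The NULL behavior driving this finding can be reproduced with a small standalone sketch (plain sqlite3 rather than the project's Django models; the table and column values are illustrative):

```python
import sqlite3

# Two "sync managers": one with a stale schedule, one never scheduled (NULL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sm (sync_id TEXT, last_scheduled_dt TEXT, last_started_dt TEXT)"
)
conn.executemany(
    "INSERT INTO sm VALUES (?, ?, ?)",
    [
        ("stale", "2024-01-01", "2024-01-02"),    # scheduled before it started
        ("never_scheduled", None, "2024-01-02"),  # started, last_scheduled_dt IS NULL
    ],
)

# NULL < X evaluates to NULL, which is falsy in WHERE: the NULL row drops out.
naive = [r[0] for r in conn.execute(
    "SELECT sync_id FROM sm WHERE last_scheduled_dt < last_started_dt"
)]

# Explicitly including IS NULL matches both rows.
fixed = [r[0] for r in conn.execute(
    "SELECT sync_id FROM sm WHERE last_scheduled_dt IS NULL"
    " OR last_scheduled_dt < last_started_dt"
)]

print(naive)  # ['stale']
print(fixed)  # ['stale', 'never_scheduled']
```

The second query mirrors what `Q(last_scheduled_dt__isnull=True) | Q(last_scheduled_dt__lt=F("last_started_dt"))` compiles to.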
Force-pushed from b6cc5a9 to 00da65e
🧹 Nitpick comments (1)
osidb/sync_manager.py (1)
310-406: Materializing full model rows partially defeats the stated OOM goal. The PR description targets OOM reduction, but each filtered QuerySet is still fully loaded into memory when iterated by the inner `reschedule` helper (Django caches the result list by default). For the two reason-constant branches (`scheduled_not_started`, `started_not_finished`) only `sync_id` is used, and for `started_failed_not_permanently` only `sync_id` and `last_consecutive_failures` are needed. Consider projecting just what's needed and/or streaming with `.iterator()` to avoid holding entire `SyncManager` instances in memory:
♻️ Suggested refinement
```diff
-        def reschedule(sync_managers, msg=None, msg_fn=None):
-            for sync_manager in sync_managers:
-                sync_id = sync_manager.sync_id
-                if sync_id in rescheduled:
-                    continue
-
-                reason = msg_fn(sync_manager) if msg_fn is not None else msg
-                cls.reschedule(sync_id, reason)
-                rescheduled.add(sync_id)
+        def reschedule(sync_managers, msg=None, msg_fn=None):
+            for sm in sync_managers:
+                # sm may be a bare sync_id (values_list), a dict (values),
+                # or a full model instance
+                if isinstance(sm, dict):
+                    sync_id = sm["sync_id"]
+                elif isinstance(sm, str):
+                    sync_id = sm
+                else:
+                    sync_id = sm.sync_id
+                if sync_id in rescheduled:
+                    continue
+                reason = msg_fn(sm) if msg_fn is not None else msg
+                cls.reschedule(sync_id, reason)
+                rescheduled.add(sync_id)
```
And at the call sites, narrow the fields, e.g.:

```python
reschedule(
    scheduled_not_started.values_list("sync_id", flat=True).iterator(),
    "Sync did not start after MAX_SCHEDULE_DELAY",
)
...
reschedule(
    started_failed_not_permanently.values(
        "sync_id", "last_consecutive_failures"
    ).iterator(),
    msg_fn=lambda sm: f"Failed {sm['last_consecutive_failures']} times",
)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@osidb/sync_manager.py` around lines 310 - 406, The QuerySets (scheduled_not_started, started_not_finished, started_failed_not_permanently) are being fully materialized when iterated by reschedule; to avoid OOM, project only the needed fields and stream results (use .values_list("sync_id", flat=True).iterator() for branches that only need sync_id, and .values("sync_id","last_consecutive_failures").iterator() for the branch that needs the failure count). Update the reschedule helper (reschedule) to accept these streamed iterables and handle both simple sync_id values and dict/row objects (adjust msg_fn use to read sm for model, sm['last_consecutive_failures'] for dicts or support tuples) so we don't load full SyncManager instances into memory.
Force-pushed from 00da65e to c4f00f5
JakubFrejlach left a comment
Change LGTM! Just a thought about another possible bottleneck.
```python
        f"Started but not scheduled, this should NEVER happen"
    )
    continue
...
sync_managers = SyncManager.objects.filter(name=cls.__name__)
```
The move from Python-level filtering to DB queries is a solid improvement, but I think there's still a potential bottleneck: if many sync managers match the reschedule conditions, we're still calling reschedule() for each one individually (multiple DB queries + a broker call per sync manager). On stage, where we're seeing OOM kills due to a large volume of failed sync managers, this could still be problematic.
Would it make sense to cap each reschedule check to a reasonable batch size? Since most collectors run every minute, we'd eventually process all pending reschedules across successive runs.
To prevent starvation (the same rows getting picked every time while others wait indefinitely), we could order candidates by last_scheduled_dt ascending so the oldest-waiting items get rescheduled first; once rescheduled, their timestamp moves to the back of the queue.
That sounds like a good idea. Scheduling in batches makes sense. Will convert the rescheduler into a capped queue-like system.
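The capped, oldest-first batching discussed above could be sketched like this (pure Python; `BATCH_SIZE` and the dict shape are illustrative assumptions, not OSIDB code):

```python
BATCH_SIZE = 100  # illustrative cap, not an actual OSIDB setting

def pick_reschedule_batch(candidates, batch_size=BATCH_SIZE):
    """Pick at most batch_size candidates, oldest-waiting first.

    None (never scheduled) sorts before any timestamp so stuck
    managers are not starved behind recently scheduled ones.
    """
    ordered = sorted(
        candidates,
        key=lambda c: (c["last_scheduled_dt"] is not None,
                       c["last_scheduled_dt"] or ""),
    )
    return ordered[:batch_size]

candidates = [
    {"sync_id": "b", "last_scheduled_dt": "2024-01-02"},
    {"sync_id": "a", "last_scheduled_dt": "2024-01-01"},
    {"sync_id": "c", "last_scheduled_dt": None},
]
batch = pick_reschedule_batch(candidates, batch_size=2)
print([c["sync_id"] for c in batch])  # ['c', 'a']
```

Because a rescheduled manager's timestamp is refreshed, it naturally moves to the back of the ordering on the next run, which is the anti-starvation property described above.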
Adjusted the logic behind sync manager's rescheduling.
The original code checked every sync manager, of a particular class, to determine if they should be rescheduled. The new logic filters out each conditional branch for rescheduling instead. This should minimize the OOM issue and allow the collectors to run.
The `rescheduled` set is there to prevent re-reschedules in the case that the conditions overlap.