Skip to content

Fix unoptimized prefetch use during v1 cache rebuild#1266

Open
ethangreen-dev wants to merge 1 commit intomasterfrom
cache-build-prefetch-fix
Open

Fix unoptimized prefetch use during v1 cache rebuild#1266
ethangreen-dev wants to merge 1 commit intomasterfrom
cache-build-prefetch-fix

Conversation

@ethangreen-dev
Copy link
Copy Markdown
Member

This fixes a somewhat major performance issue in cache serialization where querysets were ignored in favour of refetching PackageListing for each query batch.

Also adds a regression test for cache rebuilds.

This fixes a somewhat major performance issue in cache serialization
where querysets were ignored in favor of refetching PackageListing for
each query batch.

Also adds a regression test for cache rebuilds
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 17, 2026

Walkthrough

This pull request refactors the package listing serialization and caching pipeline. The PackageListingSerializer now computes rating_score dynamically and retrieves versions from prefetch caches with custom filtering and sorting. The cache update task was simplified to iterate directly over communities instead of per-site entries. The serialization function consolidates queryset construction to a single base query that gets batched by ID. Queryset prefetching was enhanced with additional select/prefetch relationships, custom prefetch strategies for versions, and a new rating score annotation. A test was added to verify cache updates deduplicate communities with multiple sites.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: fixing unoptimized prefetch usage during v1 cache rebuilds, which aligns with the core performance fix across multiple files.
Description check ✅ Passed The description explains the performance issue being fixed (querysets ignored, causing refetching) and mentions the added regression test, both directly relevant to the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (5)
django/thunderstore/repository/api/v1/tests/test_caches.py (1)

217-245: Good regression test; consider also asserting exact call count.

The sorted(...) == sorted(...) assertion catches duplicates (list-vs-set-style mismatch would fail), which is the main thing. As a small hardening, asserting len(updated_communities) == Community.objects.count() makes the intent explicit and produces a clearer failure message if someone reintroduces the per-site iteration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@django/thunderstore/repository/api/v1/tests/test_caches.py` around lines 217
- 245, In
test_update_api_v1_caches__deduplicates_communities_with_multiple_sites add an
explicit call-count assertion to ensure each community is updated exactly once:
after the existing equality check on updated_communities, assert that
len(updated_communities) == Community.objects.count() (use
Community.objects.count() as the expected value and the updated_communities list
length as the actual), referencing the updated_communities variable and the
Community model to make the intent and failure message clearer.
django/thunderstore/repository/cache.py (1)

16-17: Consider distinct=True on the rating count as future-proofing.

Count("package__package_ratings") is correct today because no other joined m2m is combined into the same queryset. If a future .filter() or .annotate() introduces another join on this queryset, the count will inflate silently. A cheap guard:

Proposed nit
-    ).annotate(
-        _rating_score=Count("package__package_ratings"),
-    ).prefetch_related(
+    ).annotate(
+        _rating_score=Count("package__package_ratings", distinct=True),
+    ).prefetch_related(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@django/thunderstore/repository/cache.py` around lines 16 - 17, The Count used
in the annotate for _rating_score (Count("package__package_ratings")) should
include distinct=True to guard against inflated counts if additional joins are
added later; update the annotate call that defines _rating_score to use
Count("package__package_ratings", distinct=True) so the rating count remains
correct even if the queryset gains extra joins.
django/thunderstore/repository/api/v1/serializers.py (3)

75-91: Minor: prefetch loads inactive versions that are then filtered out in Python.

The Prefetch in cache.py has no is_active=True filter, so every inactive PackageVersion for every listing in the batch is pulled into memory and discarded here. For communities with long-lived packages and many deactivated versions, this can materially inflate the prefetch payload. If that matters, push the filter down into the Prefetch queryset:

Suggested change in cache.py
         Prefetch(
             "package__versions",
-            queryset=PackageVersion.objects.select_related(
+            queryset=PackageVersion.objects.filter(is_active=True).select_related(
                 "package",
                 "package__owner",
             ).prefetch_related(
                 "dependencies__package__owner",
             ),
         ),

Note this would change the semantics of versions in the output to only include active versions — which is already what available_versions did, and what this code does post-filter, so it should be safe.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@django/thunderstore/repository/api/v1/serializers.py` around lines 75 - 91,
The prefetch currently pulls all PackageVersion rows and then get_versions
(which checks instance.package._prefetched_objects_cache.get("versions"))
filters out inactive ones in Python; update the Prefetch created in cache.py to
include a queryset filter is_active=True so only active versions are prefetched
(matching available_versions and the later Python filter), ensuring get_versions
still serializes sorted active versions via PackageVersionSerializer without
loading inactive rows into memory.

105-109: DRY: delegate to PackageListing.rating_score.

PackageListing.rating_score (see community/models/package_listing.py:290-295) already implements the exact same _rating_score-annotation-with-fallback pattern. The serializer could just return that and avoid duplicating the fallback logic in two places.

Proposed simplification
-    def get_rating_score(self, instance):
-        rating_score = getattr(instance, "_rating_score", None)
-        if rating_score is not None:
-            return rating_score
-        return instance.package.rating_score
+    def get_rating_score(self, instance):
+        return instance.rating_score
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@django/thunderstore/repository/api/v1/serializers.py` around lines 105 - 109,
Replace the duplicate fallback logic in the serializer's get_rating_score method
by delegating to the model property that already implements it: return the
PackageListing.rating_score value instead of re-implementing the _rating_score
check. Concretely, change get_rating_score(self, instance) to simply return
instance.rating_score (which uses PackageListing.rating_score), removing the
manual getattr fallback code.

1-1: distutils.version.StrictVersion is removed in Python 3.12.

distutils is deprecated since 3.10 and removed in 3.12. If migrating to Python 3.12+, consider using packaging.version.Version instead. Not a blocker for this PR since the import pattern already exists elsewhere in the codebase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@django/thunderstore/repository/api/v1/serializers.py` at line 1, Replace the
deprecated distutils import by switching from "from distutils.version import
StrictVersion" to using packaging's version API (e.g., "from packaging.version
import Version" or "from packaging import version") and update any subsequent
uses of StrictVersion in this serializer module to call Version(...) (or
version.parse(...)) so behavior remains equivalent on Python 3.12+. Ensure you
add packaging to requirements if not already present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@django/thunderstore/repository/api/v1/serializers.py`:
- Around line 75-91: The prefetch currently pulls all PackageVersion rows and
then get_versions (which checks
instance.package._prefetched_objects_cache.get("versions")) filters out inactive
ones in Python; update the Prefetch created in cache.py to include a queryset
filter is_active=True so only active versions are prefetched (matching
available_versions and the later Python filter), ensuring get_versions still
serializes sorted active versions via PackageVersionSerializer without loading
inactive rows into memory.
- Around line 105-109: Replace the duplicate fallback logic in the serializer's
get_rating_score method by delegating to the model property that already
implements it: return the PackageListing.rating_score value instead of
re-implementing the _rating_score check. Concretely, change
get_rating_score(self, instance) to simply return instance.rating_score (which
uses PackageListing.rating_score), removing the manual getattr fallback code.
- Line 1: Replace the deprecated distutils import by switching from "from
distutils.version import StrictVersion" to using packaging's version API (e.g.,
"from packaging.version import Version" or "from packaging import version") and
update any subsequent uses of StrictVersion in this serializer module to call
Version(...) (or version.parse(...)) so behavior remains equivalent on Python
3.12+. Ensure you add packaging to requirements if not already present.

In `@django/thunderstore/repository/api/v1/tests/test_caches.py`:
- Around line 217-245: In
test_update_api_v1_caches__deduplicates_communities_with_multiple_sites add an
explicit call-count assertion to ensure each community is updated exactly once:
after the existing equality check on updated_communities, assert that
len(updated_communities) == Community.objects.count() (use
Community.objects.count() as the expected value and the updated_communities list
length as the actual), referencing the updated_communities variable and the
Community model to make the intent and failure message clearer.

In `@django/thunderstore/repository/cache.py`:
- Around line 16-17: The Count used in the annotate for _rating_score
(Count("package__package_ratings")) should include distinct=True to guard
against inflated counts if additional joins are added later; update the annotate
call that defines _rating_score to use Count("package__package_ratings",
distinct=True) so the rating count remains correct even if the queryset gains
extra joins.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 26e9e0b5-9298-45e4-ac55-e83c51874080

📥 Commits

Reviewing files that changed from the base of the PR and between 210fd90 and 2554dc1.

📒 Files selected for processing (5)
  • django/thunderstore/repository/api/v1/serializers.py
  • django/thunderstore/repository/api/v1/tasks.py
  • django/thunderstore/repository/api/v1/tests/test_caches.py
  • django/thunderstore/repository/api/v1/viewsets.py
  • django/thunderstore/repository/cache.py

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.93%. Comparing base (210fd90) to head (2554dc1).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1266   +/-   ##
=======================================
  Coverage   92.93%   92.93%           
=======================================
  Files         344      344           
  Lines       10695    10703    +8     
  Branches      961      964    +3     
=======================================
+ Hits         9939     9947    +8     
  Misses        632      632           
  Partials      124      124           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant