Skip to content

Optimize package index cache serialization#1260

Open
ethangreen-dev wants to merge 1 commit intomasterfrom
cache-query-optimization
Open

Optimize package index cache serialization#1260
ethangreen-dev wants to merge 1 commit intomasterfrom
cache-query-optimization

Conversation

@ethangreen-dev
Copy link
Copy Markdown
Member

This removes a N+1 query from both the chunked and non-chunked index serialization paths.

Previously, serializing each listing could trigger its own rating count query, plus its own available_versions query, with version deps prefetched again per package. Query count was roughly scaling with the number of packages in the index.

This change fetches listings in batches, prefetches active versions and dependencies, and annotates rating counts on the queryset. Serialization then reads from prefetch data instead of performing per-package queries.

Complexity reduction goes from around O(package_count) to O(batches).

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 30, 2026

Walkthrough

The PR changes package listing retrieval and serialization across three modules. Serializers: import StrictVersion, change rating_score to a SerializerMethodField returning instance.rating_score, and compute versions from package.versions filtered to active and sorted by StrictVersion(version_number) descending. Viewsets: add internal _get_prefetched_listing_queryset(ids) that orders listings, adds select_related/prefetch_related including a Prefetch for active package versions and annotates a _rating_score = Count("package__package_ratings"); callers now use this helper. Models/cache: get_package_listing_chunk now returns a concrete List, replaces SQL Case/When and Subquery with an annotate Count, prefetches filtered active versions (with dependencies), reconstructs original ordering client-side, and adds _get_sorted_active_versions and _version_download_url helpers; listing_to_json now derives package/owner/versions from the prefetched package and uses the new helpers.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main objective: optimizing package index cache serialization by removing N+1 queries.
Description check ✅ Passed The description clearly explains the problem (N+1 queries), the solution (batch fetching with prefetch), and the performance improvement (O(package_count) to O(batches)).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.96%. Comparing base (39b4260) to head (f5e6373).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1260      +/-   ##
==========================================
+ Coverage   92.92%   92.96%   +0.03%     
==========================================
  Files         344      344              
  Lines       10689    10712      +23     
  Branches      961      964       +3     
==========================================
+ Hits         9933     9958      +25     
+ Misses        632      631       -1     
+ Partials      124      123       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Eliminate N+1 queries in both chunked and non-chunked index
serializations by prefetching package versions and deps, annotating
ratings up front, and consuming prefetch data during serialization
@ethangreen-dev ethangreen-dev force-pushed the cache-query-optimization branch from 6dd5298 to f5e6373 Compare March 30, 2026 18:43
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
django/thunderstore/repository/models/cache.py (1)

285-288: Let _get_sorted_active_versions() enforce its own contract.

Right now this only returns active versions because get_package_listing_chunk() prefetches a filtered relation into the manager cache. Any future caller that passes a plain package will serialize inactive versions too.

Suggested fix
 def _get_sorted_active_versions(package):
-    versions = list(package.versions.all())
-    versions.sort(key=lambda v: StrictVersion(v.version_number), reverse=True)
+    versions = [version for version in package.versions.all() if version.is_active]
+    versions.sort(
+        key=lambda version: StrictVersion(version.version_number),
+        reverse=True,
+    )
     return versions
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@django/thunderstore/repository/models/cache.py` around lines 285 - 288,
_get_sorted_active_versions currently relies on callers (like
get_package_listing_chunk) to have prefiltered package.versions; change it to
always query only active versions itself by using the related manager filter
(e.g. package.versions.filter(is_active=True)) before converting to a list and
sorting with StrictVersion; keep the existing sort key and reverse=True behavior
so the function becomes self-contained and safe to call with an unmodified
package instance.
django/thunderstore/repository/api/v1/serializers.py (1)

4-4: Replace StrictVersion with a modern semver parser when Python 3.12+ support is targeted.

distutils.version is removed in Python 3.12+. While the current target is Python ^3.8 (no immediate issue), StrictVersion is deprecated and fragile. The codebase already restricts version format to Major.Minor.Patch via regex validation, so switching to a supported parser like packaging.version.Version is straightforward and future-proofs the code.

This affects the sorting key at line 78 and the validator at line 48 of django/thunderstore/repository/validators.py.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@django/thunderstore/repository/api/v1/serializers.py` at line 4, Replace use
of distutils.version.StrictVersion with packaging.version.Version: update the
import (remove StrictVersion and import packaging.version.Version and
packaging.version.InvalidVersion), change any sorting key that currently
constructs StrictVersion (e.g. in the serializer sorting lambda/key used in
django/thunderstore/repository/api/v1/serializers.py) to use
Version(version_string) and handle InvalidVersion exceptions, and update the
validator in django/thunderstore/repository/validators.py (the function that
currently calls StrictVersion on the trimmed version string) to call Version and
catch packaging.version.InvalidVersion to raise the same validation error; keep
existing regex pre-checks but rely on Version for parsing and comparisons to
future-proof Python 3.12+ compatibility.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@django/thunderstore/repository/api/v1/serializers.py`:
- Line 69: The serializer's rating_score SerializerMethodField should read the
annotated alias _rating_score when present and fall back to the original package
rating_score for unannotated querysets: update the serializer method (the method
backing rating_score on the PackageListing serializer) to return
getattr(instance, "_rating_score", instance.package.rating_score) (or use
hasattr/getattr with a default) instead of accessing
instance.package.rating_score unconditionally; apply the same change where
rating is read in the cache logic (the function in
django/thunderstore/repository/models/cache.py that currently accesses
package.rating_score) to prefer _rating_score and fall back to
instance.package.rating_score.

In `@django/thunderstore/repository/models/cache.py`:
- Around line 281-282: The sort key uses a single-letter variable 'l' which
triggers E741; change the lambda in the return statement to use a clearer name
(e.g., "listing") so it reads return sorted(listings, key=lambda listing:
order_map[listing.id]); update references to the loop variable only in that
lambda, leaving order_map, listing_ids and listings unchanged.

---

Nitpick comments:
In `@django/thunderstore/repository/api/v1/serializers.py`:
- Line 4: Replace use of distutils.version.StrictVersion with
packaging.version.Version: update the import (remove StrictVersion and import
packaging.version.Version and packaging.version.InvalidVersion), change any
sorting key that currently constructs StrictVersion (e.g. in the serializer
sorting lambda/key used in django/thunderstore/repository/api/v1/serializers.py)
to use Version(version_string) and handle InvalidVersion exceptions, and update
the validator in django/thunderstore/repository/validators.py (the function that
currently calls StrictVersion on the trimmed version string) to call Version and
catch packaging.version.InvalidVersion to raise the same validation error; keep
existing regex pre-checks but rely on Version for parsing and comparisons to
future-proof Python 3.12+ compatibility.

In `@django/thunderstore/repository/models/cache.py`:
- Around line 285-288: _get_sorted_active_versions currently relies on callers
(like get_package_listing_chunk) to have prefiltered package.versions; change it
to always query only active versions itself by using the related manager filter
(e.g. package.versions.filter(is_active=True)) before converting to a list and
sorting with StrictVersion; keep the existing sort key and reverse=True behavior
so the function becomes self-contained and safe to call with an unmodified
package instance.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 52200b46-b12e-43d0-836a-44ec3d01550a

📥 Commits

Reviewing files that changed from the base of the PR and between 39b4260 and 6dd5298.

📒 Files selected for processing (3)
  • django/thunderstore/repository/api/v1/serializers.py
  • django/thunderstore/repository/api/v1/viewsets.py
  • django/thunderstore/repository/models/cache.py

Comment thread django/thunderstore/repository/api/v1/serializers.py
Comment thread django/thunderstore/repository/models/cache.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (3)
django/thunderstore/repository/models/cache.py (2)

303-318: ⚠️ Potential issue | 🔴 Critical

Serialize the annotated score here too.

get_package_listing_chunk() annotates _rating_score, but Line 318 still reads listing.rating_score. PackageListing does not expose that attribute, so chunk generation will fail on this path.

Suggested fix
-            "rating_score": listing.rating_score,
+            "rating_score": (
+                listing._rating_score
+                if hasattr(listing, "_rating_score")
+                else package.rating_score
+            ),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@django/thunderstore/repository/models/cache.py` around lines 303 - 318, The
serializer listing_to_json uses listing.rating_score which doesn't exist; use
the annotated field set by get_package_listing_chunk instead
(listing._rating_score) and fall back to listing.rating_score or None if not
present; update listing_to_json to read the score via something like score =
getattr(listing, "_rating_score", getattr(listing, "rating_score", None)) and
include that score in the JSON under "rating_score" so chunk generation no
longer fails.

281-282: ⚠️ Potential issue | 🟡 Minor

Rename the sort lambda parameter.

l will keep tripping Ruff/Flake8 E741 here.

Suggested fix
-    return sorted(listings, key=lambda l: order_map[l.id])
+    return sorted(listings, key=lambda listing: order_map[listing.id])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@django/thunderstore/repository/models/cache.py` around lines 281 - 282, The
sort lambda uses a single-letter parameter `l` which triggers Ruff/Flake8 E741;
update the lambda in the return statement to use a clearer name (e.g.,
`listing`) so change "return sorted(listings, key=lambda l: order_map[l.id])" to
use "lambda listing: order_map[listing.id]" while keeping the existing variables
`order_map`, `listing_ids`, and `listings` unchanged.
django/thunderstore/repository/api/v1/serializers.py (1)

83-84: ⚠️ Potential issue | 🔴 Critical

Read the annotated rating score, then fall back lazily.

PackageListing does not expose rating_score, so this will raise on the optimized queryset. Read _rating_score when present and only fall back to instance.package.rating_score for unannotated instances.

Suggested fix
     def get_rating_score(self, instance):
-        return instance.rating_score
+        return (
+            instance._rating_score
+            if hasattr(instance, "_rating_score")
+            else instance.package.rating_score
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@django/thunderstore/repository/api/v1/serializers.py` around lines 83 - 84,
get_rating_score currently accesses instance.rating_score which fails for
optimized (annotated) querysets; change it to first read the annotated attribute
_rating_score when present and only fall back to instance.package.rating_score
for unannotated instances. Specifically, in get_rating_score check
getattr(instance, "_rating_score", _MISSING) or hasattr(instance,
"_rating_score") and return instance._rating_score if present; otherwise return
instance.package.rating_score; keep the fall back behavior on the PackageListing
serializer but prefer the annotated _rating_score when available.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@django/thunderstore/repository/api/v1/serializers.py`:
- Around line 83-84: get_rating_score currently accesses instance.rating_score
which fails for optimized (annotated) querysets; change it to first read the
annotated attribute _rating_score when present and only fall back to
instance.package.rating_score for unannotated instances. Specifically, in
get_rating_score check getattr(instance, "_rating_score", _MISSING) or
hasattr(instance, "_rating_score") and return instance._rating_score if present;
otherwise return instance.package.rating_score; keep the fall back behavior on
the PackageListing serializer but prefer the annotated _rating_score when
available.

In `@django/thunderstore/repository/models/cache.py`:
- Around line 303-318: The serializer listing_to_json uses listing.rating_score
which doesn't exist; use the annotated field set by get_package_listing_chunk
instead (listing._rating_score) and fall back to listing.rating_score or None if
not present; update listing_to_json to read the score via something like score =
getattr(listing, "_rating_score", getattr(listing, "rating_score", None)) and
include that score in the JSON under "rating_score" so chunk generation no
longer fails.
- Around line 281-282: The sort lambda uses a single-letter parameter `l` which
triggers Ruff/Flake8 E741; update the lambda in the return statement to use a
clearer name (e.g., `listing`) so change "return sorted(listings, key=lambda l:
order_map[l.id])" to use "lambda listing: order_map[listing.id]" while keeping
the existing variables `order_map`, `listing_ids`, and `listings` unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2f269046-76a2-4de1-ac36-c4243ec63aef

📥 Commits

Reviewing files that changed from the base of the PR and between 6dd5298 and f5e6373.

📒 Files selected for processing (3)
  • django/thunderstore/repository/api/v1/serializers.py
  • django/thunderstore/repository/api/v1/viewsets.py
  • django/thunderstore/repository/models/cache.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant