Optimize package index cache serialization#1260
Optimize package index cache serialization#1260ethangreen-dev wants to merge 1 commit intomasterfrom
Conversation
WalkthroughThe PR changes package listing retrieval and serialization across three modules. Serializers: import StrictVersion, change rating_score to a SerializerMethodField returning instance.rating_score, and compute versions from 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1260 +/- ##
==========================================
+ Coverage 92.92% 92.96% +0.03%
==========================================
Files 344 344
Lines 10689 10712 +23
Branches 961 964 +3
==========================================
+ Hits 9933 9958 +25
+ Misses 632 631 -1
+ Partials 124 123 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Eliminate N+1 queries in both chunked and non-chunked index serializations by prefetching package versions and deps, annotating ratings up front, and consuming prefetch data during serialization
6dd5298 to
f5e6373
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
django/thunderstore/repository/models/cache.py (1)
285-288: Let_get_sorted_active_versions()enforce its own contract.Right now this only returns active versions because
get_package_listing_chunk()prefetches a filtered relation into the manager cache. Any future caller that passes a plain package will serialize inactive versions too.Suggested fix
def _get_sorted_active_versions(package): - versions = list(package.versions.all()) - versions.sort(key=lambda v: StrictVersion(v.version_number), reverse=True) + versions = [version for version in package.versions.all() if version.is_active] + versions.sort( + key=lambda version: StrictVersion(version.version_number), + reverse=True, + ) return versions🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@django/thunderstore/repository/models/cache.py` around lines 285 - 288, _get_sorted_active_versions currently relies on callers (like get_package_listing_chunk) to have prefiltered package.versions; change it to always query only active versions itself by using the related manager filter (e.g. package.versions.filter(is_active=True)) before converting to a list and sorting with StrictVersion; keep the existing sort key and reverse=True behavior so the function becomes self-contained and safe to call with an unmodified package instance.django/thunderstore/repository/api/v1/serializers.py (1)
4-4: ReplaceStrictVersionwith a modern semver parser when Python 3.12+ support is targeted.
distutils.versionis removed in Python 3.12+. While the current target is Python ^3.8 (no immediate issue),StrictVersionis deprecated and fragile. The codebase already restricts version format toMajor.Minor.Patchvia regex validation, so switching to a supported parser likepackaging.version.Versionis straightforward and future-proofs the code.This affects the sorting key at line 78 and the validator at line 48 of
django/thunderstore/repository/validators.py.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@django/thunderstore/repository/api/v1/serializers.py` at line 4, Replace use of distutils.version.StrictVersion with packaging.version.Version: update the import (remove StrictVersion and import packaging.version.Version and packaging.version.InvalidVersion), change any sorting key that currently constructs StrictVersion (e.g. in the serializer sorting lambda/key used in django/thunderstore/repository/api/v1/serializers.py) to use Version(version_string) and handle InvalidVersion exceptions, and update the validator in django/thunderstore/repository/validators.py (the function that currently calls StrictVersion on the trimmed version string) to call Version and catch packaging.version.InvalidVersion to raise the same validation error; keep existing regex pre-checks but rely on Version for parsing and comparisons to future-proof Python 3.12+ compatibility.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@django/thunderstore/repository/api/v1/serializers.py`:
- Line 69: The serializer's rating_score SerializerMethodField should read the
annotated alias _rating_score when present and fall back to the original package
rating_score for unannotated querysets: update the serializer method (the method
backing rating_score on the PackageListing serializer) to return
getattr(instance, "_rating_score", instance.package.rating_score) (or use
hasattr/getattr with a default) instead of accessing
instance.package.rating_score unconditionally; apply the same change where
rating is read in the cache logic (the function in
django/thunderstore/repository/models/cache.py that currently accesses
package.rating_score) to prefer _rating_score and fall back to
instance.package.rating_score.
In `@django/thunderstore/repository/models/cache.py`:
- Around line 281-282: The sort key uses a single-letter variable 'l' which
triggers E741; change the lambda in the return statement to use a clearer name
(e.g., "listing") so it reads return sorted(listings, key=lambda listing:
order_map[listing.id]); update references to the loop variable only in that
lambda, leaving order_map, listing_ids and listings unchanged.
---
Nitpick comments:
In `@django/thunderstore/repository/api/v1/serializers.py`:
- Line 4: Replace use of distutils.version.StrictVersion with
packaging.version.Version: update the import (remove StrictVersion and import
packaging.version.Version and packaging.version.InvalidVersion), change any
sorting key that currently constructs StrictVersion (e.g. in the serializer
sorting lambda/key used in django/thunderstore/repository/api/v1/serializers.py)
to use Version(version_string) and handle InvalidVersion exceptions, and update
the validator in django/thunderstore/repository/validators.py (the function that
currently calls StrictVersion on the trimmed version string) to call Version and
catch packaging.version.InvalidVersion to raise the same validation error; keep
existing regex pre-checks but rely on Version for parsing and comparisons to
future-proof Python 3.12+ compatibility.
In `@django/thunderstore/repository/models/cache.py`:
- Around line 285-288: _get_sorted_active_versions currently relies on callers
(like get_package_listing_chunk) to have prefiltered package.versions; change it
to always query only active versions itself by using the related manager filter
(e.g. package.versions.filter(is_active=True)) before converting to a list and
sorting with StrictVersion; keep the existing sort key and reverse=True behavior
so the function becomes self-contained and safe to call with an unmodified
package instance.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 52200b46-b12e-43d0-836a-44ec3d01550a
📒 Files selected for processing (3)
django/thunderstore/repository/api/v1/serializers.pydjango/thunderstore/repository/api/v1/viewsets.pydjango/thunderstore/repository/models/cache.py
There was a problem hiding this comment.
♻️ Duplicate comments (3)
django/thunderstore/repository/models/cache.py (2)
303-318:⚠️ Potential issue | 🔴 CriticalSerialize the annotated score here too.
get_package_listing_chunk()annotates_rating_score, but Line 318 still readslisting.rating_score.PackageListingdoes not expose that attribute, so chunk generation will fail on this path.Suggested fix
- "rating_score": listing.rating_score, + "rating_score": ( + listing._rating_score + if hasattr(listing, "_rating_score") + else package.rating_score + ),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@django/thunderstore/repository/models/cache.py` around lines 303 - 318, The serializer listing_to_json uses listing.rating_score which doesn't exist; use the annotated field set by get_package_listing_chunk instead (listing._rating_score) and fall back to listing.rating_score or None if not present; update listing_to_json to read the score via something like score = getattr(listing, "_rating_score", getattr(listing, "rating_score", None)) and include that score in the JSON under "rating_score" so chunk generation no longer fails.
281-282:⚠️ Potential issue | 🟡 MinorRename the sort lambda parameter.
lwill keep tripping Ruff/Flake8 E741 here.Suggested fix
- return sorted(listings, key=lambda l: order_map[l.id]) + return sorted(listings, key=lambda listing: order_map[listing.id])🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@django/thunderstore/repository/models/cache.py` around lines 281 - 282, The sort lambda uses a single-letter parameter `l` which triggers Ruff/Flake8 E741; update the lambda in the return statement to use a clearer name (e.g., `listing`) so change "return sorted(listings, key=lambda l: order_map[l.id])" to use "lambda listing: order_map[listing.id]" while keeping the existing variables `order_map`, `listing_ids`, and `listings` unchanged.django/thunderstore/repository/api/v1/serializers.py (1)
83-84:⚠️ Potential issue | 🔴 CriticalRead the annotated rating score, then fall back lazily.
PackageListingdoes not exposerating_score, so this will raise on the optimized queryset. Read_rating_scorewhen present and only fall back toinstance.package.rating_scorefor unannotated instances.Suggested fix
def get_rating_score(self, instance): - return instance.rating_score + return ( + instance._rating_score + if hasattr(instance, "_rating_score") + else instance.package.rating_score + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@django/thunderstore/repository/api/v1/serializers.py` around lines 83 - 84, get_rating_score currently accesses instance.rating_score which fails for optimized (annotated) querysets; change it to first read the annotated attribute _rating_score when present and only fall back to instance.package.rating_score for unannotated instances. Specifically, in get_rating_score check getattr(instance, "_rating_score", _MISSING) or hasattr(instance, "_rating_score") and return instance._rating_score if present; otherwise return instance.package.rating_score; keep the fall back behavior on the PackageListing serializer but prefer the annotated _rating_score when available.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@django/thunderstore/repository/api/v1/serializers.py`:
- Around line 83-84: get_rating_score currently accesses instance.rating_score
which fails for optimized (annotated) querysets; change it to first read the
annotated attribute _rating_score when present and only fall back to
instance.package.rating_score for unannotated instances. Specifically, in
get_rating_score check getattr(instance, "_rating_score", _MISSING) or
hasattr(instance, "_rating_score") and return instance._rating_score if present;
otherwise return instance.package.rating_score; keep the fall back behavior on
the PackageListing serializer but prefer the annotated _rating_score when
available.
In `@django/thunderstore/repository/models/cache.py`:
- Around line 303-318: The serializer listing_to_json uses listing.rating_score
which doesn't exist; use the annotated field set by get_package_listing_chunk
instead (listing._rating_score) and fall back to listing.rating_score or None if
not present; update listing_to_json to read the score via something like score =
getattr(listing, "_rating_score", getattr(listing, "rating_score", None)) and
include that score in the JSON under "rating_score" so chunk generation no
longer fails.
- Around line 281-282: The sort lambda uses a single-letter parameter `l` which
triggers Ruff/Flake8 E741; update the lambda in the return statement to use a
clearer name (e.g., `listing`) so change "return sorted(listings, key=lambda l:
order_map[l.id])" to use "lambda listing: order_map[listing.id]" while keeping
the existing variables `order_map`, `listing_ids`, and `listings` unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2f269046-76a2-4de1-ac36-c4243ec63aef
📒 Files selected for processing (3)
django/thunderstore/repository/api/v1/serializers.pydjango/thunderstore/repository/api/v1/viewsets.pydjango/thunderstore/repository/models/cache.py
This removes a N+1 query from both the chunked and non-chunked index serialization paths.
Previously, serializing each listing could trigger its own rating count query, plus its own available_versions query, with version deps prefetched again per package. Query count was roughly scaling with the number of packages in the index.
This change fetches listings in batches, prefetches active versions and dependencies, and annotates rating counts on the queryset. Serialization then reads from prefetch data instead of performing per-package queries.
Complexity reduction goes from around O(package_count) to O(batches).