Optimize package index cache serialization by ethangreen-dev · Pull Request #1260 · thunderstore-io/Thunderstore

ethangreen-dev · 2026-03-30T18:33:28Z

This removes a N+1 query from both the chunked and non-chunked index serialization paths.

Previously, serializing each listing could trigger its own rating count query, plus its own available_versions query, with version deps prefetched again per package. Query count was roughly scaling with the number of packages in the index.

This change fetches listings in batches, prefetches active versions and dependencies, and annotates rating counts on the queryset. Serialization then reads from prefetch data instead of performing per-package queries.

Complexity reduction goes from around O(package_count) to O(batches).

coderabbitai · 2026-03-30T18:33:53Z

Walkthrough

The PR changes package listing retrieval and serialization across three modules. Serializers: import StrictVersion, change rating_score to a SerializerMethodField returning instance.rating_score, and compute versions from package.versions filtered to active and sorted by StrictVersion(version_number) descending. Viewsets: add internal _get_prefetched_listing_queryset(ids) that orders listings, adds select_related/prefetch_related including a Prefetch for active package versions and annotates a _rating_score = Count("package__package_ratings"); callers now use this helper. Models/cache: get_package_listing_chunk now returns a concrete List, replaces SQL Case/When and Subquery with an annotate Count, prefetches filtered active versions (with dependencies), reconstructs original ordering client-side, and adds _get_sorted_active_versions and _version_download_url helpers; listing_to_json now derives package/owner/versions from the prefetched package and uses the new helpers.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main objective: optimizing package index cache serialization by removing N+1 queries.
Description check	✅ Passed	The description clearly explains the problem (N+1 queries), the solution (batch fetching with prefetch), and the performance improvement (O(package_count) to O(batches)).

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-03-30T18:34:05Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.96%. Comparing base (39b4260) to head (f5e6373).

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1260      +/-   ##
==========================================
+ Coverage   92.92%   92.96%   +0.03%     
==========================================
  Files         344      344              
  Lines       10689    10712      +23     
  Branches      961      964       +3     
==========================================
+ Hits         9933     9958      +25     
+ Misses        632      631       -1     
+ Partials      124      123       -1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Eliminate N+1 queries in both chunked and non-chunked index serializations by prefetching package versions and deps, annotating ratings up front, and consuming prefetch data during serialization

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

django/thunderstore/repository/models/cache.py (1)
285-288: Let _get_sorted_active_versions() enforce its own contract.

Right now this only returns active versions because get_package_listing_chunk() prefetches a filtered relation into the manager cache. Any future caller that passes a plain package will serialize inactive versions too.
Suggested fix
 def _get_sorted_active_versions(package):
-    versions = list(package.versions.all())
-    versions.sort(key=lambda v: StrictVersion(v.version_number), reverse=True)
+    versions = [version for version in package.versions.all() if version.is_active]
+    versions.sort(
+        key=lambda version: StrictVersion(version.version_number),
+        reverse=True,
+    )
     return versions
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@django/thunderstore/repository/models/cache.py` around lines 285 - 288,
_get_sorted_active_versions currently relies on callers (like
get_package_listing_chunk) to have prefiltered package.versions; change it to
always query only active versions itself by using the related manager filter
(e.g. package.versions.filter(is_active=True)) before converting to a list and
sorting with StrictVersion; keep the existing sort key and reverse=True behavior
so the function becomes self-contained and safe to call with an unmodified
package instance.
django/thunderstore/repository/api/v1/serializers.py (1)
4-4: Replace StrictVersion with a modern semver parser when Python 3.12+ support is targeted.

distutils.version is removed in Python 3.12+. While the current target is Python ^3.8 (no immediate issue), StrictVersion is deprecated and fragile. The codebase already restricts version format to Major.Minor.Patch via regex validation, so switching to a supported parser like packaging.version.Version is straightforward and future-proofs the code.

This affects the sorting key at line 78 and the validator at line 48 of django/thunderstore/repository/validators.py.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@django/thunderstore/repository/api/v1/serializers.py` at line 4, Replace use
of distutils.version.StrictVersion with packaging.version.Version: update the
import (remove StrictVersion and import packaging.version.Version and
packaging.version.InvalidVersion), change any sorting key that currently
constructs StrictVersion (e.g. in the serializer sorting lambda/key used in
django/thunderstore/repository/api/v1/serializers.py) to use
Version(version_string) and handle InvalidVersion exceptions, and update the
validator in django/thunderstore/repository/validators.py (the function that
currently calls StrictVersion on the trimmed version string) to call Version and
catch packaging.version.InvalidVersion to raise the same validation error; keep
existing regex pre-checks but rely on Version for parsing and comparisons to
future-proof Python 3.12+ compatibility.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@django/thunderstore/repository/api/v1/serializers.py`:
- Line 69: The serializer's rating_score SerializerMethodField should read the
annotated alias _rating_score when present and fall back to the original package
rating_score for unannotated querysets: update the serializer method (the method
backing rating_score on the PackageListing serializer) to return
getattr(instance, "_rating_score", instance.package.rating_score) (or use
hasattr/getattr with a default) instead of accessing
instance.package.rating_score unconditionally; apply the same change where
rating is read in the cache logic (the function in
django/thunderstore/repository/models/cache.py that currently accesses
package.rating_score) to prefer _rating_score and fall back to
instance.package.rating_score.

In `@django/thunderstore/repository/models/cache.py`:
- Around line 281-282: The sort key uses a single-letter variable 'l' which
triggers E741; change the lambda in the return statement to use a clearer name
(e.g., "listing") so it reads return sorted(listings, key=lambda listing:
order_map[listing.id]); update references to the loop variable only in that
lambda, leaving order_map, listing_ids and listings unchanged.

---

Nitpick comments:
In `@django/thunderstore/repository/api/v1/serializers.py`:
- Line 4: Replace use of distutils.version.StrictVersion with
packaging.version.Version: update the import (remove StrictVersion and import
packaging.version.Version and packaging.version.InvalidVersion), change any
sorting key that currently constructs StrictVersion (e.g. in the serializer
sorting lambda/key used in django/thunderstore/repository/api/v1/serializers.py)
to use Version(version_string) and handle InvalidVersion exceptions, and update
the validator in django/thunderstore/repository/validators.py (the function that
currently calls StrictVersion on the trimmed version string) to call Version and
catch packaging.version.InvalidVersion to raise the same validation error; keep
existing regex pre-checks but rely on Version for parsing and comparisons to
future-proof Python 3.12+ compatibility.

In `@django/thunderstore/repository/models/cache.py`:
- Around line 285-288: _get_sorted_active_versions currently relies on callers
(like get_package_listing_chunk) to have prefiltered package.versions; change it
to always query only active versions itself by using the related manager filter
(e.g. package.versions.filter(is_active=True)) before converting to a list and
sorting with StrictVersion; keep the existing sort key and reverse=True behavior
so the function becomes self-contained and safe to call with an unmodified
package instance.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 52200b46-b12e-43d0-836a-44ec3d01550a

📥 Commits

Reviewing files that changed from the base of the PR and between 39b4260 and 6dd5298.

📒 Files selected for processing (3)

django/thunderstore/repository/api/v1/serializers.py
django/thunderstore/repository/api/v1/viewsets.py
django/thunderstore/repository/models/cache.py

coderabbitai

♻️ Duplicate comments (3)

django/thunderstore/repository/models/cache.py (2)

303-318: ⚠️ Potential issue | 🔴 Critical

Serialize the annotated score here too.

get_package_listing_chunk() annotates _rating_score, but Line 318 still reads listing.rating_score. PackageListing does not expose that attribute, so chunk generation will fail on this path.

Suggested fix

-            "rating_score": listing.rating_score,
+            "rating_score": (
+                listing._rating_score
+                if hasattr(listing, "_rating_score")
+                else package.rating_score
+            ),

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@django/thunderstore/repository/models/cache.py` around lines 303 - 318, The
serializer listing_to_json uses listing.rating_score which doesn't exist; use
the annotated field set by get_package_listing_chunk instead
(listing._rating_score) and fall back to listing.rating_score or None if not
present; update listing_to_json to read the score via something like score =
getattr(listing, "_rating_score", getattr(listing, "rating_score", None)) and
include that score in the JSON under "rating_score" so chunk generation no
longer fails.

281-282: ⚠️ Potential issue | 🟡 Minor

Rename the sort lambda parameter.

l will keep tripping Ruff/Flake8 E741 here.

Suggested fix

-    return sorted(listings, key=lambda l: order_map[l.id])
+    return sorted(listings, key=lambda listing: order_map[listing.id])

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@django/thunderstore/repository/models/cache.py` around lines 281 - 282, The
sort lambda uses a single-letter parameter `l` which triggers Ruff/Flake8 E741;
update the lambda in the return statement to use a clearer name (e.g.,
`listing`) so change "return sorted(listings, key=lambda l: order_map[l.id])" to
use "lambda listing: order_map[listing.id]" while keeping the existing variables
`order_map`, `listing_ids`, and `listings` unchanged.

django/thunderstore/repository/api/v1/serializers.py (1)

83-84: ⚠️ Potential issue | 🔴 Critical

Read the annotated rating score, then fall back lazily.

PackageListing does not expose rating_score, so this will raise on the optimized queryset. Read _rating_score when present and only fall back to instance.package.rating_score for unannotated instances.

Suggested fix

     def get_rating_score(self, instance):
-        return instance.rating_score
+        return (
+            instance._rating_score
+            if hasattr(instance, "_rating_score")
+            else instance.package.rating_score
+        )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@django/thunderstore/repository/api/v1/serializers.py` around lines 83 - 84,
get_rating_score currently accesses instance.rating_score which fails for
optimized (annotated) querysets; change it to first read the annotated attribute
_rating_score when present and only fall back to instance.package.rating_score
for unannotated instances. Specifically, in get_rating_score check
getattr(instance, "_rating_score", _MISSING) or hasattr(instance,
"_rating_score") and return instance._rating_score if present; otherwise return
instance.package.rating_score; keep the fall back behavior on the PackageListing
serializer but prefer the annotated _rating_score when available.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@django/thunderstore/repository/api/v1/serializers.py`:
- Around line 83-84: get_rating_score currently accesses instance.rating_score
which fails for optimized (annotated) querysets; change it to first read the
annotated attribute _rating_score when present and only fall back to
instance.package.rating_score for unannotated instances. Specifically, in
get_rating_score check getattr(instance, "_rating_score", _MISSING) or
hasattr(instance, "_rating_score") and return instance._rating_score if present;
otherwise return instance.package.rating_score; keep the fall back behavior on
the PackageListing serializer but prefer the annotated _rating_score when
available.

In `@django/thunderstore/repository/models/cache.py`:
- Around line 303-318: The serializer listing_to_json uses listing.rating_score
which doesn't exist; use the annotated field set by get_package_listing_chunk
instead (listing._rating_score) and fall back to listing.rating_score or None if
not present; update listing_to_json to read the score via something like score =
getattr(listing, "_rating_score", getattr(listing, "rating_score", None)) and
include that score in the JSON under "rating_score" so chunk generation no
longer fails.
- Around line 281-282: The sort lambda uses a single-letter parameter `l` which
triggers Ruff/Flake8 E741; update the lambda in the return statement to use a
clearer name (e.g., `listing`) so change "return sorted(listings, key=lambda l:
order_map[l.id])" to use "lambda listing: order_map[listing.id]" while keeping
the existing variables `order_map`, `listing_ids`, and `listings` unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2f269046-76a2-4de1-ac36-c4243ec63aef

📥 Commits

Reviewing files that changed from the base of the PR and between 6dd5298 and f5e6373.

📒 Files selected for processing (3)

django/thunderstore/repository/api/v1/serializers.py
django/thunderstore/repository/api/v1/viewsets.py
django/thunderstore/repository/models/cache.py

Optimize package index cache serialization

f5e6373

Eliminate N+1 queries in both chunked and non-chunked index serializations by prefetching package versions and deps, annotating ratings up front, and consuming prefetch data during serialization

ethangreen-dev force-pushed the cache-query-optimization branch from 6dd5298 to f5e6373 Compare March 30, 2026 18:43

coderabbitai bot reviewed Mar 30, 2026

View reviewed changes

Comment thread django/thunderstore/repository/api/v1/serializers.py

Comment thread django/thunderstore/repository/models/cache.py

coderabbitai bot reviewed Mar 30, 2026

View reviewed changes

ethangreen-dev requested review from Roffenlund and x753 April 17, 2026 15:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize package index cache serialization#1260

Optimize package index cache serialization#1260
ethangreen-dev wants to merge 1 commit intomasterfrom
cache-query-optimization

ethangreen-dev commented Mar 30, 2026

Uh oh!

coderabbitai bot commented Mar 30, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

codecov bot commented Mar 30, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ethangreen-dev commented Mar 30, 2026

Uh oh!

coderabbitai bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

❌ Failed checks (1 warning)

Uh oh!

codecov bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai bot commented Mar 30, 2026 •

edited

Loading

codecov bot commented Mar 30, 2026 •

edited

Loading