Optimize community download count aggregation#1263
Optimize community download count aggregation#1263ethangreen-dev wants to merge 1 commit intomasterfrom
Conversation
WalkthroughThe change updates 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1263 +/- ##
==========================================
- Coverage 92.93% 92.91% -0.02%
==========================================
Files 344 344
Lines 10695 10696 +1
Branches 961 960 -1
==========================================
- Hits 9939 9938 -1
- Misses 632 633 +1
- Partials 124 125 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Community download counts were previously performed as an iterative sum over all active listings. One aggr. query per community per hour. This fixes that be replacing it with a DB-level aggregate across the listings queryset. Improves complexity from something like O(listing_count) to like O(1) (per community).
5f409b6 to
dd7613b
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
django/thunderstore/community/models/community.py (1)
370-376: Extract the download-sum rule into a shared helper.This now duplicates the logic behind
PackageListing.total_downloadsindjango/thunderstore/community/models/package_listing.py:297-302andPackage.downloadsindjango/thunderstore/repository/models/package.py:194-197. Centralizing the aggregate keepsupdate_for_communityfrom drifting the next time download semantics change.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@django/thunderstore/community/models/community.py` around lines 370 - 376, The download-sum aggregation used in update_for_community (setting community.aggregated_fields.download_count) duplicates the logic in PackageListing.total_downloads and Package.downloads; extract that aggregation into a single shared helper (e.g., a function like compute_total_downloads(listings_qs) placed in a common module) and have update_for_community call that helper instead of repeating the filter/aggregate logic, and likewise refactor PackageListing.total_downloads and Package.downloads to delegate to the same helper so all three locations (update_for_community, PackageListing.total_downloads, Package.downloads) use the centralized compute_total_downloads implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@django/thunderstore/community/models/community.py`:
- Around line 368-375: The download aggregation logic in
community.aggregated_fields is duplicated with Package.downloads; replace the
inline Sum over PackageVersion (the block using
PackageVersion.objects.filter(package_id__in=listings.values("package_id")).aggregate(...))
by calling the existing aggregation on the Package model (e.g., use
Package.downloads for the set of packages represented by listings) or extract
the Sum logic into a shared helper function and call it from both places so the
aggregation lives in one spot; update references around
community.aggregated_fields.package_count and listings to use the chosen single
aggregation implementation.
---
Nitpick comments:
In `@django/thunderstore/community/models/community.py`:
- Around line 370-376: The download-sum aggregation used in update_for_community
(setting community.aggregated_fields.download_count) duplicates the logic in
PackageListing.total_downloads and Package.downloads; extract that aggregation
into a single shared helper (e.g., a function like
compute_total_downloads(listings_qs) placed in a common module) and have
update_for_community call that helper instead of repeating the filter/aggregate
logic, and likewise refactor PackageListing.total_downloads and
Package.downloads to delegate to the same helper so all three locations
(update_for_community, PackageListing.total_downloads, Package.downloads) use
the centralized compute_total_downloads implementation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 30a92fd0-8dbe-49b6-bd66-3af0f0ff1a3e
📒 Files selected for processing (1)
django/thunderstore/community/models/community.py
| community.aggregated_fields.package_count = listings.count() | ||
| community.aggregated_fields.download_count = sum( | ||
| listing.total_downloads for listing in listings | ||
|
|
||
| from thunderstore.repository.models import PackageVersion | ||
|
|
||
| community.aggregated_fields.download_count = ( | ||
| PackageVersion.objects.filter( | ||
| package_id__in=listings.values("package_id") | ||
| ).aggregate(total=Sum("downloads"))["total"] |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
files="$(fd -i 'package_listing.py')"
[ -n "$files" ] || { echo "package_listing.py not found"; exit 1; }
echo "== PackageListing model / Meta constraints =="
while IFS= read -r file; do
echo "-- $file --"
rg -n -C5 --type=py 'class PackageListing\b|class Meta\b|constraints\s*=|UniqueConstraint|unique_together|community\s*=|package\s*=' "$file"
done <<< "$files"
echo
echo "== Any PackageListing uniqueness declarations elsewhere =="
rg -n -C3 --type=py 'PackageListing|UniqueConstraint|unique_together'Repository: thunderstore-io/Thunderstore
Length of output: 189
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find the actual PackageListing model file
echo "== Locating PackageListing model =="
fd -type f -name "*.py" | xargs grep -l "class PackageListing" | head -5
echo
echo "== PackageListing model definition with Meta class =="
fd -type f -name "*.py" -path "*/models/*" | xargs grep -l "class PackageListing" | while read file; do
echo "-- $file --"
cat -n "$file"
doneRepository: thunderstore-io/Thunderstore
Length of output: 297
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find files containing PackageListing class definition
echo "== Files containing PackageListing class =="
find . -name "*.py" -type f -exec grep -l "class PackageListing" {} \; | head -10
echo
echo "== PackageListing model file in community/models =="
cat -n django/thunderstore/community/models/package_listing.py | head -100Repository: thunderstore-io/Thunderstore
Length of output: 5272
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Continue reading the PackageListing model to find the Meta class
echo "== Full PackageListing model including Meta class =="
wc -l django/thunderstore/community/models/package_listing.py
echo
sed -n '100,300p' django/thunderstore/community/models/package_listing.py | cat -nRepository: thunderstore-io/Thunderstore
Length of output: 7972
Consolidate download aggregation logic into the Package model.
The (package, community) uniqueness is already enforced at the database level, so the download count aggregation is safe. However, the Sum logic is now duplicated—it's defined both here and in Package.downloads. Consider using Package.downloads directly or extracting the aggregation into a shared helper to avoid divergence.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@django/thunderstore/community/models/community.py` around lines 368 - 375,
The download aggregation logic in community.aggregated_fields is duplicated with
Package.downloads; replace the inline Sum over PackageVersion (the block using
PackageVersion.objects.filter(package_id__in=listings.values("package_id")).aggregate(...))
by calling the existing aggregation on the Package model (e.g., use
Package.downloads for the set of packages represented by listings) or extract
the Sum logic into a shared helper function and call it from both places so the
aggregation lives in one spot; update references around
community.aggregated_fields.package_count and listings to use the chosen single
aggregation implementation.
Community download counts were previously performed as an iterative sum over all active listings. One aggregate query per community per hour.
This fixes that by replacing it with a DB-level aggregate across the listings queryset. Improves complexity from something like O(listing_count) to like O(1) (per community).