Skip to content

Optimize community download count aggregation#1263

Open
ethangreen-dev wants to merge 1 commit intomasterfrom
faster-comm-download-aggregation
Open

Optimize community download count aggregation#1263
ethangreen-dev wants to merge 1 commit intomasterfrom
faster-comm-download-aggregation

Conversation

@ethangreen-dev
Copy link
Copy Markdown
Member

Community download counts were previously performed as an iterative sum over all active listings. One aggregate query per community per hour.

This fixes that by replacing it with a DB-level aggregate across the listings queryset. Improves complexity from something like O(listing_count) to like O(1) (per community).

@ethangreen-dev ethangreen-dev requested a review from x753 April 1, 2026 01:05
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 1, 2026

Walkthrough

The change updates update_for_community to compute CommunityAggregatedFields.download_count via a database aggregate over PackageVersion.downloads for packages in the filtered package_listings queryset (using Sum with a default of 0) instead of summing listing.total_downloads in Python. The existing queryset filters (active, optional approved, and filter_with_single_community) and the package_count derived from listings.count() are unchanged. A runtime import of PackageVersion is added to perform the aggregate query.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: optimizing community download count aggregation from a Python-level iterative sum to a database-level aggregate query.
Description check ✅ Passed The description is directly related to the changeset, explaining the performance improvement from O(listing_count) to O(1) complexity by moving from iterative Python summation to database-level aggregation.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.91%. Comparing base (210fd90) to head (dd7613b).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1263      +/-   ##
==========================================
- Coverage   92.93%   92.91%   -0.02%     
==========================================
  Files         344      344              
  Lines       10695    10696       +1     
  Branches      961      960       -1     
==========================================
- Hits         9939     9938       -1     
- Misses        632      633       +1     
- Partials      124      125       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ethangreen-dev ethangreen-dev marked this pull request as draft April 1, 2026 01:14
Community download counts were previously performed as an iterative sum
over all active listings. One aggr. query per community per hour.

This fixes that be replacing it with a DB-level aggregate across the
listings queryset. Improves complexity from something like
O(listing_count) to like O(1) (per community).
@ethangreen-dev ethangreen-dev force-pushed the faster-comm-download-aggregation branch from 5f409b6 to dd7613b Compare April 2, 2026 09:32
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
django/thunderstore/community/models/community.py (1)

370-376: Extract the download-sum rule into a shared helper.

This now duplicates the logic behind PackageListing.total_downloads in django/thunderstore/community/models/package_listing.py:297-302 and Package.downloads in django/thunderstore/repository/models/package.py:194-197. Centralizing the aggregate keeps update_for_community from drifting the next time download semantics change.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@django/thunderstore/community/models/community.py` around lines 370 - 376,
The download-sum aggregation used in update_for_community (setting
community.aggregated_fields.download_count) duplicates the logic in
PackageListing.total_downloads and Package.downloads; extract that aggregation
into a single shared helper (e.g., a function like
compute_total_downloads(listings_qs) placed in a common module) and have
update_for_community call that helper instead of repeating the filter/aggregate
logic, and likewise refactor PackageListing.total_downloads and
Package.downloads to delegate to the same helper so all three locations
(update_for_community, PackageListing.total_downloads, Package.downloads) use
the centralized compute_total_downloads implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@django/thunderstore/community/models/community.py`:
- Around line 368-375: The download aggregation logic in
community.aggregated_fields is duplicated with Package.downloads; replace the
inline Sum over PackageVersion (the block using
PackageVersion.objects.filter(package_id__in=listings.values("package_id")).aggregate(...))
by calling the existing aggregation on the Package model (e.g., use
Package.downloads for the set of packages represented by listings) or extract
the Sum logic into a shared helper function and call it from both places so the
aggregation lives in one spot; update references around
community.aggregated_fields.package_count and listings to use the chosen single
aggregation implementation.

---

Nitpick comments:
In `@django/thunderstore/community/models/community.py`:
- Around line 370-376: The download-sum aggregation used in update_for_community
(setting community.aggregated_fields.download_count) duplicates the logic in
PackageListing.total_downloads and Package.downloads; extract that aggregation
into a single shared helper (e.g., a function like
compute_total_downloads(listings_qs) placed in a common module) and have
update_for_community call that helper instead of repeating the filter/aggregate
logic, and likewise refactor PackageListing.total_downloads and
Package.downloads to delegate to the same helper so all three locations
(update_for_community, PackageListing.total_downloads, Package.downloads) use
the centralized compute_total_downloads implementation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 30a92fd0-8dbe-49b6-bd66-3af0f0ff1a3e

📥 Commits

Reviewing files that changed from the base of the PR and between 5f409b6 and dd7613b.

📒 Files selected for processing (1)
  • django/thunderstore/community/models/community.py

Comment on lines 368 to +375
community.aggregated_fields.package_count = listings.count()
community.aggregated_fields.download_count = sum(
listing.total_downloads for listing in listings

from thunderstore.repository.models import PackageVersion

community.aggregated_fields.download_count = (
PackageVersion.objects.filter(
package_id__in=listings.values("package_id")
).aggregate(total=Sum("downloads"))["total"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

files="$(fd -i 'package_listing.py')"
[ -n "$files" ] || { echo "package_listing.py not found"; exit 1; }

echo "== PackageListing model / Meta constraints =="
while IFS= read -r file; do
  echo "-- $file --"
  rg -n -C5 --type=py 'class PackageListing\b|class Meta\b|constraints\s*=|UniqueConstraint|unique_together|community\s*=|package\s*=' "$file"
done <<< "$files"

echo
echo "== Any PackageListing uniqueness declarations elsewhere =="
rg -n -C3 --type=py 'PackageListing|UniqueConstraint|unique_together'

Repository: thunderstore-io/Thunderstore

Length of output: 189


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find the actual PackageListing model file
echo "== Locating PackageListing model =="
fd -type f -name "*.py" | xargs grep -l "class PackageListing" | head -5

echo
echo "== PackageListing model definition with Meta class =="
fd -type f -name "*.py" -path "*/models/*" | xargs grep -l "class PackageListing" | while read file; do
  echo "-- $file --"
  cat -n "$file"
done

Repository: thunderstore-io/Thunderstore

Length of output: 297


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find files containing PackageListing class definition
echo "== Files containing PackageListing class =="
find . -name "*.py" -type f -exec grep -l "class PackageListing" {} \; | head -10

echo
echo "== PackageListing model file in community/models =="
cat -n django/thunderstore/community/models/package_listing.py | head -100

Repository: thunderstore-io/Thunderstore

Length of output: 5272


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Continue reading the PackageListing model to find the Meta class
echo "== Full PackageListing model including Meta class =="
wc -l django/thunderstore/community/models/package_listing.py
echo
sed -n '100,300p' django/thunderstore/community/models/package_listing.py | cat -n

Repository: thunderstore-io/Thunderstore

Length of output: 7972


Consolidate download aggregation logic into the Package model.

The (package, community) uniqueness is already enforced at the database level, so the download count aggregation is safe. However, the Sum logic is now duplicated—it's defined both here and in Package.downloads. Consider using Package.downloads directly or extracting the aggregation into a shared helper to avoid divergence.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@django/thunderstore/community/models/community.py` around lines 368 - 375,
The download aggregation logic in community.aggregated_fields is duplicated with
Package.downloads; replace the inline Sum over PackageVersion (the block using
PackageVersion.objects.filter(package_id__in=listings.values("package_id")).aggregate(...))
by calling the existing aggregation on the Package model (e.g., use
Package.downloads for the set of packages represented by listings) or extract
the Sum logic into a shared helper function and call it from both places so the
aggregation lives in one spot; update references around
community.aggregated_fields.package_count and listings to use the chosen single
aggregation implementation.

@ethangreen-dev ethangreen-dev marked this pull request as ready for review April 2, 2026 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant