fix(celery): update worker concurrency defaults #1228

Merged

mihow merged 2 commits into main from fix/celery-worker-concurrency on Apr 15, 2026

Conversation

@mihow (Collaborator) commented Apr 14, 2026

Summary

  • Add explicit CELERY_WORKER_CONCURRENCY = env.int("CELERY_WORKER_CONCURRENCY", default=8) to config/settings/base.py, next to the existing CELERY_WORKER_PREFETCH_MULTIPLIER / CELERY_WORKER_ENABLE_PREFETCH_COUNT_REDUCTION block (a sketch of the resulting block follows this list).
  • Overridable per deployment via the CELERY_WORKER_CONCURRENCY env var.
  • Production should override to 16 — committed as an example in .envs/.production/.django-example with guidance.
  • Local dev can lower to 2–4 on memory-constrained laptops — guidance added to .envs/.local/.django.
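
A minimal sketch of what the resulting block in config/settings/base.py could look like; the neighbouring lines and comments are illustrative rather than verbatim, and only the new setting line is exact:

```python
# Existing worker-fairness settings (multiplier is 1, per "What this does not change";
# the prefetch-count-reduction flag keeps whatever value it already has).
CELERY_WORKER_PREFETCH_MULTIPLIER = 1
# CELERY_WORKER_ENABLE_PREFETCH_COUNT_REDUCTION = ...  # existing, unchanged

# New: explicit prefork pool size. Celery's built-in default is os.cpu_count();
# pinning 8 keeps small environments predictable, and each deployment can override
# it via the CELERY_WORKER_CONCURRENCY env var (the production example sets 16).
CELERY_WORKER_CONCURRENCY = env.int("CELERY_WORKER_CONCURRENCY", default=8)
```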

Why

The default celery worker concurrency when the setting is unset is os.cpu_count(). On the current production celery worker host (8 cores) this means an 8-process prefork pool. The dominant tasks on the antenna queue — process_nats_pipeline_result and create_detection_images — are DB/Redis-bound rather than CPU-bound: each task spends most of its time waiting on postgres/pgbouncer and Redis round-trips, not crunching numbers.

Direct observation during a high-throughput async_api job: the prefork pool sat mostly idle on DB/Redis waits while the antenna queue backlog kept growing.

Raising the prefork pool size directly addresses the bottleneck. A hotfix override of 16 was applied in production via the env var ahead of this PR and confirmed to drain the backlog on the active jobs.

Why default=8, prod override=16

The previous revision of this PR used default=16 everywhere. Review flagged the blast radius across local dev laptops and the smaller staging/demo VMs — 16 prefork workers = 16 separate Python processes with imports + persistent DB connections, which is a real memory bump where it isn't needed.

The current default of 8 is Celery's typical out-of-the-box behaviour on an 8-core host (matches os.cpu_count() there) and a safer memory footprint everywhere else. Production overrides to 16 via env var — committed as an example in .envs/.production/.django-example with the rationale inline. This keeps the bottleneck fix in place where it matters without making every other environment pay for it.
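
For illustration, the committed production example entry could look roughly like the following; the comment wording is illustrative, while the value of 16 is the one committed to .envs/.production/.django-example:

```
# .envs/.production/.django-example
# Pipeline tasks are DB/Redis-bound, so a pool of ~2x CPU cores keeps the
# 8-core worker host busy without saturating pgbouncer.
CELERY_WORKER_CONCURRENCY=16
```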

16 is the smallest power-of-2 step above 8 that roughly matches the empirical gap between ingress and drain on the production incident that motivated this PR, without risking pgbouncer saturation. A larger default can be considered once we have measured postgres connection-pool headroom (see "what we still need to verify" below).

What this does not change

  • Prefetch multiplier stays at 1 — that was already set and fairness behaviour is unchanged.
  • Routing / queue topology is unchanged. Splitting the antenna queue into a dedicated "ingest fast path" vs "housekeeping / status-check" queue is a larger follow-up, filed separately.
  • Pool class stays prefork. Switching to gevent for this queue may give much higher effective concurrency on an IO-bound workload, but every task on this queue would need to be audited for gevent-safety (blocking C extensions, thread-locals in PyTorch paths, etc.) first. Out of scope here.

Deploy note

The production host is already running with CELERY_WORKER_CONCURRENCY=16 in its env file from the earlier hotfix — lowering the settings default to 8 does not regress prod. Confirm the env var is still set at next deploy; if for any reason it isn't, prod will drop to 8 until the env is fixed.
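
One way to confirm at deploy time that the running workers picked up the intended pool size, sketched under the assumption that the Celery app is importable as config.celery_app (adjust the import to this project's actual app module):

```python
# Run from a Django/Celery shell on the deployed environment.
from config.celery_app import app  # assumed import path; adjust if different

# inspect().stats() returns a dict keyed by worker hostname; for the prefork
# pool it includes the pool's max-concurrency, which should read 16 in prod.
stats = app.control.inspect().stats() or {}
for worker_name, info in stats.items():
    print(worker_name, info.get("pool", {}).get("max-concurrency"))
```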

What we still need to verify

  • Postgres / pgbouncer connection pool usage after deploy — 16 prefork workers × persistent connections should be well within pgbouncer's default_pool_size, but worth confirming under load (a quick check is sketched after this list).
  • Whether the 16-prod override is still the right ceiling, or whether larger backlogs warrant raising it (gated on pgbouncer headroom).
  • Whether this change exposes any new memory-pressure pattern at peak load (current --max-tasks-per-child=100 / --max-memory-per-child=2 GiB already bound each process).
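
A rough sketch for the pgbouncer/Postgres check in the first bullet, run from a Django shell. It counts server-side connections for this database, which approximates pool usage on the assumption that application traffic reaches Postgres through pgbouncer:

```python
from django.db import connection

# Count server-side connections to this database, grouped by state, and compare
# the total against pgbouncer's default_pool_size (and Postgres max_connections).
with connection.cursor() as cursor:
    cursor.execute(
        "SELECT state, count(*) FROM pg_stat_activity "
        "WHERE datname = current_database() GROUP BY state"
    )
    for state, count in cursor.fetchall():
        print(state or "background/unknown", count)
```

For the authoritative per-pool view, connecting to pgbouncer's admin console and running SHOW POOLS reports active and waiting client and server connections per database.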

Related

Summary by CodeRabbit

  • New Features

    • Added a configurable worker concurrency setting to control parallel background task processing (default: 8, adjustable via environment variable).
  • Documentation

    • Included example environment entries for local (commented example: 4) and production (example: 16) configurations.

The default celery worker concurrency (os.cpu_count()) underutilises the
worker pool for process_nats_pipeline_result and create_detection_images,
which are DB/Redis-bound rather than CPU-bound. On a prefork pool sized
to CPU count, the pool is idle most of the time while the antenna queue
backlogs during high-throughput NATS async_api jobs.

Override via CELERY_WORKER_CONCURRENCY env var per deployment; 16 is the
new default.
Copilot AI review requested due to automatic review settings April 14, 2026 17:36

netlify Bot commented Apr 14, 2026

Deploy Preview for antenna-ssec canceled.

🔨 Latest commit: 07a1c2b
🔍 Latest deploy log: https://app.netlify.com/projects/antenna-ssec/deploys/69ded9afa62465000866203d

netlify Bot commented Apr 14, 2026

Deploy Preview for antenna-preview canceled.

🔨 Latest commit: 07a1c2b
🔍 Latest deploy log: https://app.netlify.com/projects/antenna-preview/deploys/69ded9af7866ed00072dfad6

coderabbitai Bot (Contributor) commented Apr 14, 2026

📝 Walkthrough

Added a new Celery worker concurrency setting exposed via an environment variable; default is read as 8, local env provides a commented example, and the production example sets it to 16.

Changes

  • Celery config (config/settings/base.py): Added CELERY_WORKER_CONCURRENCY = env.int("CELERY_WORKER_CONCURRENCY", default=8) — new module-level setting controlling Celery prefork worker pool size.
  • Local env example (.envs/.local/.django): Added a commented-out CELERY_WORKER_CONCURRENCY=4 line with explanatory comments (no runtime change unless uncommented).
  • Production env example (.envs/.production/.django-example): Added CELERY_WORKER_CONCURRENCY=16 to the production example file to suggest a higher default for production deployments.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • carlosgjs

Poem

"🐰 I nibble configs in the night,
A concurrency carrot gleams so bright,
Defaults at eight, examples show four and sixteen,
Hopping queues kept calm and keen,
Cheers to tuned tasks and dreams unseen!"

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description check (✅ Passed): The PR description comprehensively covers the summary, list of changes, motivation, deployment notes, and related issues as specified in the template, with clear rationale for design decisions.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Title check (✅ Passed): The title 'fix(celery): update worker concurrency defaults' accurately summarizes the main change—adding and configuring CELERY_WORKER_CONCURRENCY settings across environments.


Copilot AI (Contributor) left a comment


Pull request overview

This PR adjusts the default Celery worker prefork pool size by introducing an explicit CELERY_WORKER_CONCURRENCY setting in the Django base settings, while keeping it overridable per deployment via an environment variable.

Changes:

  • Add CELERY_WORKER_CONCURRENCY = env.int("CELERY_WORKER_CONCURRENCY", default=16) to config/settings/base.py.
  • Document rationale and override behavior inline next to existing worker prefetch settings.


coderabbitai Bot (Contributor) left a comment

🧹 Nitpick comments (1)
config/settings/base.py (1)

401-401: Consider documenting CELERY_WORKER_CONCURRENCY in env templates/runbooks.

Optional, but adding it to .env.example/deployment docs will make per-environment tuning easier (especially smaller staging/demo stacks).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/settings/base.py` at line 401, Add documentation for the
CELERY_WORKER_CONCURRENCY environment variable (used where
CELERY_WORKER_CONCURRENCY = env.int("CELERY_WORKER_CONCURRENCY", default=16)) to
the project's environment templates and deployment/runbook, e.g., update
.env.example and relevant runbooks to include the variable name, its purpose
(controls Celery worker concurrency), allowed values, and the default of 16,
plus a note recommending smaller values for staging/demo and guidance for tuning
per-environment.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL


📥 Commits

Reviewing files that changed from the base of the PR and between 1c6be7a and 44dd942.

📒 Files selected for processing (1)
  • config/settings/base.py

Drop the default from 16 to 8 so local dev, staging, and demo stacks don't
spawn 16 prefork workers by default. Production keeps 16 via the existing
CELERY_WORKER_CONCURRENCY env var (hotfix already in place).

Also add guidance + example values to .envs/.local/.django and
.envs/.production/.django-example so per-environment tuning is discoverable.

@mihow (Collaborator, Author) commented Apr 15, 2026

Claude says: Addressed the nitpick about env-template documentation in 07a1c2b2. Added CELERY_WORKER_CONCURRENCY with guidance to both .envs/.local/.django (suggesting 2-4 on memory-constrained dev laptops) and .envs/.production/.django-example (recommending 16 for prod, with rationale inline). Also took the opportunity to lower the settings default from 16 → 8 after review feedback about blast radius on smaller deployments; prod keeps its 16 via the existing env var override.

@mihow mihow changed the title from "fix(celery): bump worker concurrency default to 16" to "fix(celery): update worker concurrency defaults" on Apr 15, 2026
@mihow mihow merged commit 32af89e into main Apr 15, 2026
7 checks passed
@mihow mihow deleted the fix/celery-worker-concurrency branch April 15, 2026 00:22
