
fix(settings): make DATA_UPLOAD_MAX_MEMORY_SIZE env-configurable #1224

Merged

mihow merged 6 commits into main from fix/data-upload-max-env on Apr 15, 2026


Conversation

mihow (Collaborator) commented Apr 11, 2026

Summary

`DATA_UPLOAD_MAX_MEMORY_SIZE` in `config/settings/base.py` was hardcoded at 100 MB. In practice, ADC workers posting ML results for a full batch (detection coordinates + classifications) hit that ceiling: result payloads for the `global_moths_2024` pipeline have been observed at 139–321 MB on a staging deployment. The hardcoded value meant every environment that saw one of those bodies returned HTTP 413 until an operator patched the file on the server and restarted Django.

This PR exposes the value as an env var:

```python
DATA_UPLOAD_MAX_MEMORY_SIZE = (
    env.int("DJANGO_DATA_UPLOAD_MAX_MEMORY_MB", default=100) * 1024 * 1024
)
```

The limit is read from `DJANGO_DATA_UPLOAD_MAX_MEMORY_MB` (an integer number of megabytes, converted to bytes with the binary 1024 × 1024 multiplier). The default stays at 100 MB, so existing deployments are unaffected unless they opt in.

Scope of enforcement

`DATA_UPLOAD_MAX_MEMORY_SIZE` covers multipart form data and direct `request.body` access, but does not apply to DRF JSON bodies: DRF parsers read from the raw WSGI stream, bypassing the `request.body` property where Django enforces the limit.

To cover JSON bodies (used by the ML result endpoint), a `MaxSizeJSONParser` (`ami/base/parsers.py`) is added as the default JSON parser in `REST_FRAMEWORK`. It checks the `Content-Length` header before parsing and returns HTTP 400 if the body exceeds the limit. This is effective for all well-behaved clients including ADC workers, which always send `Content-Length`. nginx's `client_max_body_size` remains the hard outer limit for all request types.
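A minimal sketch of the approach (the actual implementation in `ami/base/parsers.py` may differ in details):

```python
from django.conf import settings
from rest_framework.exceptions import ParseError
from rest_framework.parsers import JSONParser


class MaxSizeJSONParser(JSONParser):
    """Reject JSON bodies whose Content-Length exceeds DATA_UPLOAD_MAX_MEMORY_SIZE."""

    def parse(self, stream, media_type=None, parser_context=None):
        request = (parser_context or {}).get("request")
        max_bytes = settings.DATA_UPLOAD_MAX_MEMORY_SIZE
        if request is not None and max_bytes is not None:
            content_length = int(request.META.get("CONTENT_LENGTH") or 0)
            if content_length > max_bytes:
                # ParseError renders as HTTP 400 in DRF.
                raise ParseError(
                    f"JSON body of {content_length} bytes exceeds the "
                    f"{max_bytes}-byte limit (DATA_UPLOAD_MAX_MEMORY_SIZE)."
                )
        return super().parse(stream, media_type=media_type, parser_context=parser_context)
```

The parser is then wired in as the default JSON parser via `REST_FRAMEWORK["DEFAULT_PARSER_CLASSES"]` in the settings.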

Why this and not a bigger hardcoded default

Permanently raising the ceiling in code is premature. The root problem — that a single ADC POST can carry hundreds of MB — is best fixed worker-side by incremental result posting (see #1223), not by bumping server-side limits until the next pipeline blows through them. Until #1223 lands, env-configurability is the minimum-regret escape valve: environments that need a larger ceiling can set it, and there is no ongoing maintenance burden of hot-patching the settings file on staging/production after every deploy.

Paired infra change

nginx's `client_max_body_size` on the fronting proxy must be raised in lockstep with `DJANGO_DATA_UPLOAD_MAX_MEMORY_MB` — nginx rejects anything above its own limit before the request ever reaches Django. That value lives in the deployment's proxy config (outside this repo) and is separately configurable. Both the in-code comment and `.envs/.production/.django-example` call this out explicitly.
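A hypothetical example of the pairing (values are illustrative, not recommendations; the nginx directive lives in the proxy config outside this repo):

```
# .envs/.production/.django
DJANGO_DATA_UPLOAD_MAX_MEMORY_MB=400

# nginx proxy config (must be at least as large as the Django limit)
# client_max_body_size 400m;
```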

Test plan

  • `env.int("DJANGO_DATA_UPLOAD_MAX_MEMORY_MB", default=100)` parses correctly with and without the var set (verified with a small repl; see the sketch after this list)
  • `black`, `isort`, `flake8`, `pyupgrade`, `django-upgrade` clean on the touched files (pre-commit hooks all green)
  • Deploy to staging and confirm 200 MB result POSTs land without a 413 once the env var is set
  • Verify `MaxSizeJSONParser` returns HTTP 400 for a request exceeding the configured limit
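The repl check in the first item amounts to roughly the following (a sketch assuming django-environ, which `config/settings/base.py` already uses):

```python
import os

import environ

env = environ.Env()

# Without the var set, the default applies.
assert env.int("DJANGO_DATA_UPLOAD_MAX_MEMORY_MB", default=100) == 100

# With the var set, the value parses as an integer number of MB.
os.environ["DJANGO_DATA_UPLOAD_MAX_MEMORY_MB"] = "400"
limit_mb = env.int("DJANGO_DATA_UPLOAD_MAX_MEMORY_MB", default=100)
assert limit_mb * 1024 * 1024 == 419_430_400  # bytes, binary multiplier
```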

🤖 Generated with Claude Code

Previously hardcoded at 100 MB in base.py. In practice ADC workers
post ML result payloads for a full batch (detection coordinates +
classifications for tens of images) in a single POST, and those have
been observed in the 139–321 MB range on staging for the
global_moths_2024 pipeline — well above the 100 MB ceiling.

Raising the limit in code would be both premature (proper fix is
worker-side incremental posting, tracked in #1223) and environment-
specific (staging may need to tolerate today's payloads; production
may want a tighter ceiling to catch regressions). Making it an env
override lets each deployment tune without a code change and without
maintaining a hot-patch on the server.

Reads from ``DJANGO_DATA_UPLOAD_MAX_MEMORY_MB`` (integer, in MB).
Default stays at 100 MB so existing deployments see no change unless
they opt in.

Nginx's ``client_max_body_size`` still needs to be raised in lockstep
on the fronting proxy — that is independently configurable and lives
outside this repo.
Copilot AI review requested due to automatic review settings April 11, 2026 23:30
netlify bot commented Apr 11, 2026

Deploy Preview for antenna-ssec canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 7b32c62 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/antenna-ssec/deploys/69dee44b3a726d0008141129 |

netlify bot commented Apr 11, 2026

Deploy Preview for antenna-preview canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 7b32c62 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/antenna-preview/deploys/69dee44b81ea2d00089ad32a |

coderabbitai bot (Contributor) commented Apr 11, 2026

Warning

Rate limit exceeded

@mihow has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 11 minutes and 54 seconds before requesting another review.

After the wait time has elapsed, a review can be triggered with the @coderabbitai review command as a PR comment, or by pushing new commits.


📥 Commits

Reviewing files that changed from the base of the PR and between 46c7541 and 7b32c62.

📒 Files selected for processing (3)
  • .envs/.local/.django
  • .envs/.production/.django-example
  • config/settings/base.py
📝 Walkthrough

The DATA_UPLOAD_MAX_MEMORY_SIZE setting is modified to accept environment variable configuration via DJANGO_DATA_UPLOAD_MAX_MEMORY_MB, with a default of 100MB. This allows tuning the upload limit without altering code while maintaining backward compatibility.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration Tuning: `config/settings/base.py` | Modified DATA_UPLOAD_MAX_MEMORY_SIZE to read from environment variable DJANGO_DATA_UPLOAD_MAX_MEMORY_MB with a default fallback of 100 MB, enabling runtime configuration without code changes. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 A setting once carved in stone,
Now dances free, environment-grown,
No code to change, just vars to set,
Flexibility—the rabbit's best bet! 🎛️

🚥 Pre-merge checks: ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: making DATA_UPLOAD_MAX_MEMORY_SIZE environment-configurable, which matches the core objective of this PR. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description check | ✅ Passed | PR description is comprehensive and follows the required template with all major sections present. |



Copilot AI (Contributor) left a comment

Pull request overview

Exposes Django’s DATA_UPLOAD_MAX_MEMORY_SIZE setting as an environment-configurable value to prevent ADC worker result uploads (often >100 MB) from triggering HTTP 413s without requiring server-side code patches.

Changes:

  • Replace hardcoded DATA_UPLOAD_MAX_MEMORY_SIZE = 100 * 1024 * 1024 with an env-controlled value (DJANGO_DATA_UPLOAD_MAX_MEMORY_MB, default 100).
  • Expand inline documentation explaining why larger uploads are needed and pointing to the longer-term worker-side fix.


Comment thread on config/settings/base.py (outdated)
mihow (Collaborator, Author) commented Apr 15, 2026

Code review

Found 2 issues:

  1. DATA_UPLOAD_MAX_MEMORY_SIZE may not enforce limits on JSON POST bodies. Django enforces this setting inside the request.body property and MultiPartParser, but DRF's JSONParser reads directly from the raw WSGI input stream without touching request.body, bypassing the check entirely. If the ML result endpoint receives application/json, the Django-side limit may have no effect — nginx's client_max_body_size is the actual gate. This is worth verifying empirically (e.g. temporarily set the limit to 1 byte and confirm whether a JSON POST raises RequestDataTooBig). This is a documented DRF limitation (#4760).

```python
# Allow large request bodies from ML workers posting classification results.
# ML detection+classification payloads for a single batch can easily exceed
# the Django default (2.5 MB) and even the previous hardcoded 100 MB ceiling.
# Configurable via env (MB) so staging and production can tune without a
# code change. See RolnickLab/antenna#1223 for the longer-term fix (worker-
# side incremental result posting).
DATA_UPLOAD_MAX_MEMORY_SIZE = (
    env.int("DJANGO_DATA_UPLOAD_MAX_MEMORY_MB", default=100) * 1024 * 1024  # type: ignore[no-untyped-call]
)
```
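One way to run the empirical check suggested in point 1 (a hypothetical sketch, not part of this PR; the endpoint path is a placeholder):

```python
from django.test import Client, override_settings


@override_settings(DATA_UPLOAD_MAX_MEMORY_SIZE=1)  # 1 byte: any real body exceeds it
def check_json_bypass() -> None:
    client = Client()
    response = client.post(
        "/api/v2/ml/results/",  # placeholder: substitute the real ML result endpoint
        data='{"detections": []}',
        content_type="application/json",
    )
    # If DRF's JSONParser bypasses request.body, Django's limit never fires and
    # the status will not be the 400 that RequestDataTooBig normally produces.
    print(response.status_code)
```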

  2. New env var not added to .envs/.production/.django-example. PR #1228 (fix(celery): update worker concurrency defaults, which added CELERY_WORKER_CONCURRENCY) established the pattern of documenting newly configurable env vars in both .envs/.local/.django and .envs/.production/.django-example with a rationale comment. DJANGO_DATA_UPLOAD_MAX_MEMORY_MB is absent from both, which defeats the goal of letting operators tune this without reading source.


🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

mihow and others added 5 commits April 14, 2026 17:51
DATA_UPLOAD_MAX_MEMORY_SIZE does not apply to DRF JSON bodies —
DRF parsers read from the raw WSGI stream, bypassing the request.body
check where Django enforces this limit. Add MaxSizeJSONParser as the
default JSON parser in REST_FRAMEWORK to enforce the same ceiling for
JSON bodies via Content-Length before parsing begins (effective for
all well-behaved clients including ADC workers; nginx client_max_body_size
remains the hard outer limit for chunked transfers).

Also:
- Update DATA_UPLOAD_MAX_MEMORY_SIZE comment to document the scope of
  enforcement and the relationship to MaxSizeJSONParser and nginx
- Remove stray # type: ignore[no-untyped-call] (inconsistent with all
  other env.int() calls in this file)
- Add DJANGO_DATA_UPLOAD_MAX_MEMORY_MB to .envs/.production/.django-example
  with rationale and nginx coupling note
- Add commented example to .envs/.local/.django

Co-Authored-By: Claude <noreply@anthropic.com>
The purpose of this PR is to raise the Django limit, not add DRF-level
enforcement. nginx client_max_body_size is the intended hard cap.
Retain the comment noting that DATA_UPLOAD_MAX_MEMORY_SIZE does not
apply to DRF JSON bodies so the scope is clear to future readers.

Co-Authored-By: Claude <noreply@anthropic.com>
Address Copilot review: the env var name said "MB" but the multiplier
is 1024*1024 (binary MiB). Comments now spell that out so operators
don't misjudge limits when tuning alongside nginx client_max_body_size.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow merged commit 4686340 into main Apr 15, 2026
7 checks passed
@mihow mihow deleted the fix/data-upload-max-env branch April 15, 2026 01:18
