
fix(settings): make DATA_UPLOAD_MAX_MEMORY_SIZE env-configurable #1224

Merged

mihow merged 6 commits into main from fix/data-upload-max-env on Apr 15, 2026


Conversation

mihow (Collaborator) commented Apr 11, 2026

Summary

`DATA_UPLOAD_MAX_MEMORY_SIZE` in `config/settings/base.py` was hardcoded at 100 MB. In practice, ADC workers posting ML results for a full batch (detection coordinates + classifications) hit that ceiling: result payloads for the `global_moths_2024` pipeline have been observed at 139–321 MB on a staging deployment. The hardcoded value meant every environment that saw one of those bodies returned HTTP 413 until an operator patched the file on the server and restarted Django.

This PR exposes the value as an env var:

```python
DATA_UPLOAD_MAX_MEMORY_SIZE = (
    env.int("DJANGO_DATA_UPLOAD_MAX_MEMORY_MB", default=100) * 1024 * 1024
)
```

The limit is read from `DJANGO_DATA_UPLOAD_MAX_MEMORY_MB` (an integer number of megabytes, converted to bytes with the binary 1024 × 1024 multiplier). The default stays at 100 MB, so existing deployments are unaffected unless they opt in.

Scope of enforcement

`DATA_UPLOAD_MAX_MEMORY_SIZE` covers multipart form data and direct `request.body` access, but does not apply to DRF JSON bodies: DRF parsers read from the raw WSGI stream, bypassing the `request.body` property where Django enforces the limit.

To cover JSON bodies (used by the ML result endpoint), a `MaxSizeJSONParser` (`ami/base/parsers.py`) is added as the default JSON parser in `REST_FRAMEWORK`. It checks the `Content-Length` header before parsing and returns HTTP 400 if the body exceeds the limit. This is effective for all well-behaved clients including ADC workers, which always send `Content-Length`. nginx's `client_max_body_size` remains the hard outer limit for all request types.
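A minimal sketch of the approach (the actual implementation in `ami/base/parsers.py` may differ in details):

```python
from django.conf import settings
from rest_framework.exceptions import ParseError
from rest_framework.parsers import JSONParser


class MaxSizeJSONParser(JSONParser):
    """Reject JSON bodies whose Content-Length exceeds DATA_UPLOAD_MAX_MEMORY_SIZE."""

    def parse(self, stream, media_type=None, parser_context=None):
        request = (parser_context or {}).get("request")
        max_bytes = settings.DATA_UPLOAD_MAX_MEMORY_SIZE
        if request is not None and max_bytes is not None:
            content_length = int(request.META.get("CONTENT_LENGTH") or 0)
            if content_length > max_bytes:
                # ParseError renders as HTTP 400 in DRF.
                raise ParseError(
                    f"JSON body of {content_length} bytes exceeds the "
                    f"{max_bytes}-byte limit (DATA_UPLOAD_MAX_MEMORY_SIZE)."
                )
        return super().parse(stream, media_type=media_type, parser_context=parser_context)
```

The parser is then wired in as the default JSON parser via `REST_FRAMEWORK["DEFAULT_PARSER_CLASSES"]` in the settings.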

Why this and not a bigger hardcoded default

Permanently raising the ceiling in code is premature. The root problem — that a single ADC POST can carry hundreds of MB — is best fixed worker-side by incremental result posting (see #1223), not by bumping server-side limits until the next pipeline blows through them. Until #1223 lands, env-configurability is the minimum-regret escape valve: environments that need a larger ceiling can set it, and there is no ongoing maintenance burden of hot-patching the settings file on staging/production after every deploy.

Paired infra change

nginx's `client_max_body_size` on the fronting proxy must be raised in lockstep with `DJANGO_DATA_UPLOAD_MAX_MEMORY_MB` — nginx rejects anything above its own limit before the request ever reaches Django. That value lives in the deployment's proxy config (outside this repo) and is separately configurable. Both the in-code comment and `.envs/.production/.django-example` call this out explicitly.
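A hypothetical example of the pairing (values are illustrative, not recommendations; the nginx directive lives in the proxy config outside this repo):

```
# .envs/.production/.django
DJANGO_DATA_UPLOAD_MAX_MEMORY_MB=400

# nginx proxy config (must be at least as large as the Django limit)
# client_max_body_size 400m;
```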

Test plan

  • `env.int("DJANGO_DATA_UPLOAD_MAX_MEMORY_MB", default=100)` parses correctly with and without the var set (verified with a small repl; see the sketch after this list)
  • `black`, `isort`, `flake8`, `pyupgrade`, `django-upgrade` clean on the touched files (pre-commit hooks all green)
  • Deploy to staging and confirm 200 MB result POSTs land without a 413 once the env var is set
  • Verify `MaxSizeJSONParser` returns HTTP 400 for a request exceeding the configured limit
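The repl check in the first item amounts to roughly the following (a sketch assuming django-environ, which `config/settings/base.py` already uses):

```python
import os

import environ

env = environ.Env()

# Without the var set, the default applies.
assert env.int("DJANGO_DATA_UPLOAD_MAX_MEMORY_MB", default=100) == 100

# With the var set, the value parses as an integer number of MB.
os.environ["DJANGO_DATA_UPLOAD_MAX_MEMORY_MB"] = "400"
limit_mb = env.int("DJANGO_DATA_UPLOAD_MAX_MEMORY_MB", default=100)
assert limit_mb * 1024 * 1024 == 419_430_400  # bytes, binary multiplier
```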

🤖 Generated with Claude Code

Previously hardcoded at 100 MB in base.py. In practice ADC workers
post ML result payloads for a full batch (detection coordinates +
classifications for tens of images) in a single POST, and those have
been observed in the 139–321 MB range on staging for the
global_moths_2024 pipeline — well above the 100 MB ceiling.

Raising the limit in code would be both premature (proper fix is
worker-side incremental posting, tracked in #1223) and environment-
specific (staging may need to tolerate today's payloads; production
may want a tighter ceiling to catch regressions). Making it an env
override lets each deployment tune without a code change and without
maintaining a hot-patch on the server.

Reads from ``DJANGO_DATA_UPLOAD_MAX_MEMORY_MB`` (integer, in MB).
Default stays at 100 MB so existing deployments see no change unless
they opt in.

Nginx's ``client_max_body_size`` still needs to be raised in lockstep
on the fronting proxy — that is independently configurable and lives
outside this repo.
Copilot AI review requested due to automatic review settings April 11, 2026 23:30
netlify bot commented Apr 11, 2026

Deploy Preview for antenna-ssec canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 7b32c62 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/antenna-ssec/deploys/69dee44b3a726d0008141129 |

netlify bot commented Apr 11, 2026

Deploy Preview for antenna-preview canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 7b32c62 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/antenna-preview/deploys/69dee44b81ea2d00089ad32a |

coderabbitai bot (Contributor) commented Apr 11, 2026

Warning

Rate limit exceeded

@mihow has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 11 minutes and 54 seconds before requesting another review.

After the wait time has elapsed, a review can be triggered with the @coderabbitai review command as a PR comment, or by pushing new commits.


📥 Commits

Reviewing files that changed from the base of the PR and between 46c7541 and 7b32c62.

📒 Files selected for processing (3)
  • .envs/.local/.django
  • .envs/.production/.django-example
  • config/settings/base.py
📝 Walkthrough

The DATA_UPLOAD_MAX_MEMORY_SIZE setting is modified to accept environment variable configuration via DJANGO_DATA_UPLOAD_MAX_MEMORY_MB, with a default of 100MB. This allows tuning the upload limit without altering code while maintaining backward compatibility.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration Tuning: `config/settings/base.py` | Modified DATA_UPLOAD_MAX_MEMORY_SIZE to read from environment variable DJANGO_DATA_UPLOAD_MAX_MEMORY_MB with a default fallback of 100 MB, enabling runtime configuration without code changes. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 A setting once carved in stone,
Now dances free, environment-grown,
No code to change, just vars to set,
Flexibility—the rabbit's best bet! 🎛️

🚥 Pre-merge checks: ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: making DATA_UPLOAD_MAX_MEMORY_SIZE environment-configurable, which matches the core objective of this PR. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description check | ✅ Passed | PR description is comprehensive and follows the required template with all major sections present. |



Copilot AI (Contributor) left a comment

Pull request overview

Exposes Django’s DATA_UPLOAD_MAX_MEMORY_SIZE setting as an environment-configurable value to prevent ADC worker result uploads (often >100 MB) from triggering HTTP 413s without requiring server-side code patches.

Changes:

  • Replace hardcoded DATA_UPLOAD_MAX_MEMORY_SIZE = 100 * 1024 * 1024 with an env-controlled value (DJANGO_DATA_UPLOAD_MAX_MEMORY_MB, default 100).
  • Expand inline documentation explaining why larger uploads are needed and pointing to the longer-term worker-side fix.


Comment thread on config/settings/base.py (outdated)
mihow (Collaborator, Author) commented Apr 15, 2026

Code review

Found 2 issues:

  1. DATA_UPLOAD_MAX_MEMORY_SIZE may not enforce limits on JSON POST bodies. Django enforces this setting inside the request.body property and MultiPartParser, but DRF's JSONParser reads directly from the raw WSGI input stream without touching request.body, bypassing the check entirely. If the ML result endpoint receives application/json, the Django-side limit may have no effect — nginx's client_max_body_size is the actual gate. This is worth verifying empirically (e.g. temporarily set the limit to 1 byte and confirm whether a JSON POST raises RequestDataTooBig). This is a documented DRF limitation (#4760).

```python
# Allow large request bodies from ML workers posting classification results.
# ML detection+classification payloads for a single batch can easily exceed
# the Django default (2.5 MB) and even the previous hardcoded 100 MB ceiling.
# Configurable via env (MB) so staging and production can tune without a
# code change. See RolnickLab/antenna#1223 for the longer-term fix (worker-
# side incremental result posting).
DATA_UPLOAD_MAX_MEMORY_SIZE = (
    env.int("DJANGO_DATA_UPLOAD_MAX_MEMORY_MB", default=100) * 1024 * 1024  # type: ignore[no-untyped-call]
)
```
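One way to run the empirical check suggested in point 1 (a hypothetical sketch, not part of this PR; the endpoint path is a placeholder):

```python
from django.test import Client, override_settings


@override_settings(DATA_UPLOAD_MAX_MEMORY_SIZE=1)  # 1 byte: any real body exceeds it
def check_json_bypass() -> None:
    client = Client()
    response = client.post(
        "/api/v2/ml/results/",  # placeholder: substitute the real ML result endpoint
        data='{"detections": []}',
        content_type="application/json",
    )
    # If DRF's JSONParser bypasses request.body, Django's limit never fires and
    # the status will not be the 400 that RequestDataTooBig normally produces.
    print(response.status_code)
```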

  2. New env var not added to .envs/.production/.django-example. PR #1228 (fix(celery): update worker concurrency defaults, which added CELERY_WORKER_CONCURRENCY) established the pattern of documenting newly configurable env vars in both .envs/.local/.django and .envs/.production/.django-example with a rationale comment. DJANGO_DATA_UPLOAD_MAX_MEMORY_MB is absent from both, which defeats the goal of letting operators tune this without reading source.


🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

mihow and others added 5 commits April 14, 2026 17:51
DATA_UPLOAD_MAX_MEMORY_SIZE does not apply to DRF JSON bodies —
DRF parsers read from the raw WSGI stream, bypassing the request.body
check where Django enforces this limit. Add MaxSizeJSONParser as the
default JSON parser in REST_FRAMEWORK to enforce the same ceiling for
JSON bodies via Content-Length before parsing begins (effective for
all well-behaved clients including ADC workers; nginx client_max_body_size
remains the hard outer limit for chunked transfers).

Also:
- Update DATA_UPLOAD_MAX_MEMORY_SIZE comment to document the scope of
  enforcement and the relationship to MaxSizeJSONParser and nginx
- Remove stray # type: ignore[no-untyped-call] (inconsistent with all
  other env.int() calls in this file)
- Add DJANGO_DATA_UPLOAD_MAX_MEMORY_MB to .envs/.production/.django-example
  with rationale and nginx coupling note
- Add commented example to .envs/.local/.django

Co-Authored-By: Claude <noreply@anthropic.com>
The purpose of this PR is to raise the Django limit, not add DRF-level
enforcement. nginx client_max_body_size is the intended hard cap.
Retain the comment noting that DATA_UPLOAD_MAX_MEMORY_SIZE does not
apply to DRF JSON bodies so the scope is clear to future readers.

Co-Authored-By: Claude <noreply@anthropic.com>
Address Copilot review: the env var name said "MB" but the multiplier
is 1024*1024 (binary MiB). Comments now spell that out so operators
don't misjudge limits when tuning alongside nginx client_max_body_size.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow merged commit 4686340 into main Apr 15, 2026
7 checks passed
@mihow mihow deleted the fix/data-upload-max-env branch April 15, 2026 01:18
