fix(counts): refresh deployment cached counts in pipeline.save_results #1243
Conversation
`save_results()` refreshed `update_calculated_fields_for_events` for the batch but never the parent `Deployment`, leaving `deployment.occurrences_count` and `deployment.taxa_count` stale until something else (a manual `deployment.save()`) ran. Reproduced as the "Station counts for occurrences and taxa are not always getting updated" report.

Adds a per-batch deployment refresh next to the existing event refresh, so it covers every result-write path (sync `MLJob.process_images`, async NATS `process_nats_pipeline_result`, and the Celery batch wrapper) and fires incrementally during long jobs instead of only at the end.

Includes a regression test that exercises `save_results` end-to-end with a real deployment + event + image and asserts the cached counts move off zero. Verified to fail on origin/main and pass with the patch.

Co-Authored-By: Claude <noreply@anthropic.com>
✅ Deploy Preview for antenna-preview canceled.
✅ Deploy Preview for antenna-ssec canceled.
📝 Walkthrough

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Pull request overview
Fixes stale cached counts on Deployment (Stations in the UI) by ensuring save_results() refreshes the parent deployments’ calculated fields after writing ML results, alongside the existing per-event refresh. This addresses the reported issue where deployment.occurrences_count / deployment.taxa_count don’t update after ML processing until a later manual save.
Changes:
- Refresh `Deployment` calculated fields for deployments touched by a `save_results()` batch.
- Add a regression test that reproduces the stale deployment counts and verifies they update after `save_results()`.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `ami/ml/models/pipeline.py` | Refreshes calculated fields on affected deployments after saving detections/classifications/occurrences. |
| `ami/ml/tests.py` | Adds an end-to-end regression test asserting deployment cached counts increase after `save_results()`. |
Actionable comments posted: 1
🧹 Nitpick comments (2)
ami/ml/models/pipeline.py (2)
998-1001: Refresh is correct; note the downstream cost.

`Deployment.update_calculated_fields(save=True)` calls `self.save(update_calculated_fields=False)`, which short-circuits recursion and also skips `update_children()` / `regroup_events` — so this is a minimal, targeted refresh per batch, as intended. Good.

Be aware that `update_calculated_fields` itself re-runs `self.captures.count()`, the `data_source_total_size` aggregate, `get_detections_count()`, and two `occurrences` queries with `apply_default_filters` + a `values("determination_id").distinct().count()`. For large deployments processed in many small batches (NATS/Celery wrapper paths), this will run on every batch. If that becomes a hotspot, a lighter "counts-only" path (or debouncing per deployment within a job) is worth considering — not a blocker here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ami/ml/models/pipeline.py` around lines 998 - 1001, The current loop collects deployment_ids from source_images and calls deployment.update_calculated_fields(save=True) for each Deployment in Deployment.objects.filter(pk__in=deployment_ids), which triggers several costly aggregates and queries on every small batch; to mitigate downstream cost, either add and call a lighter "counts-only" or debounced path on Deployment (e.g., a new method like update_counts_only or update_calculated_fields(counts_only=True)) or implement per-job debouncing/coalescing so each deployment_id is refreshed at most once per job; update references: source_images, deployment_ids, Deployment.objects.filter(...), and deployment.update_calculated_fields(...) when making the change.
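The "debouncing per deployment within a job" idea above can be sketched as follows. This is illustrative only — `refresh_fn` and the job-scoped `refreshed` set are hypothetical stand-ins, not Antenna's actual API:

```python
def refresh_deployments_once(deployment_ids, refreshed, refresh_fn):
    """Refresh each deployment at most once per job.

    `refreshed` is a set owned by the job and shared across batches;
    `refresh_fn` stands in for Deployment.update_calculated_fields.
    Returns the ids actually refreshed in this batch.
    """
    pending = set(deployment_ids) - refreshed
    for dep_id in sorted(pending):
        refresh_fn(dep_id)      # the expensive recount runs here
        refreshed.add(dep_id)
    return pending
```

Each batch would pass its deployment ids plus the job-scoped set, so a deployment touched by many batches triggers only one expensive recount per job.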
998-1000: Consider isolating per-deployment refresh failures.

If any single deployment's `update_calculated_fields(save=True)` raises (DB hiccup, stale FK, signal side-effect, etc.), the entire `save_results` Celery task fails after detections/classifications/occurrences have already been persisted — counts are now stale and the job is marked failed. Since this refresh is a best-effort cache update (the underlying data is already correct), a per-deployment try/except that logs and continues would be more resilient, and keeps parity with how `update_calculated_fields_for_events` above is fire-and-forget.

Proposed change

```diff
 deployment_ids = {img.deployment_id for img in source_images if img.deployment_id}
 for deployment in Deployment.objects.filter(pk__in=deployment_ids):
-    deployment.update_calculated_fields(save=True)
+    try:
+        deployment.update_calculated_fields(save=True)
+    except Exception:
+        job_logger.exception(
+            f"Failed to refresh calculated fields for deployment {deployment.pk}; continuing"
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ami/ml/models/pipeline.py` around lines 998 - 1000, The loop that calls deployment.update_calculated_fields(save=True) inside the save_results Celery task should be made resilient to per-deployment failures: instead of letting any exception abort the whole task, wrap each deployment.update_calculated_fields(save=True) call in a try/except that logs the exception (use the module/task logger via logger.exception or similar) and continues to the next deployment; keep the existing logic that builds deployment_ids ({img.deployment_id for img in source_images}) and the Deployment.objects.filter(...) iterator, but ensure failures are handled per-deployment so a single flaky update_calculated_fields call doesn’t fail the entire save_results task (match the fire-and-forget resiliency used by update_calculated_fields_for_events).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ami/ml/tests.py`:
- Around line 1424-1431: The test uses a naive datetime for event_time which
triggers Django warnings when USE_TZ=True; make event_time timezone-aware (e.g.,
use datetime.datetime(..., tzinfo=datetime.timezone.utc) or wrap with
django.utils.timezone.make_aware) before calling Event.objects.create so
Event.start and Event.end (and any SourceImage.timestamp usages) are saved with
an explicit timezone.
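The fix the comment describes is a one-liner; here is a minimal standalone illustration using only the stdlib (the date values are arbitrary — inside the Django test, `django.utils.timezone.make_aware` would be the equivalent alternative):

```python
import datetime

# Naive datetime: Django emits a warning when this is saved with USE_TZ=True.
naive = datetime.datetime(2022, 1, 1, 22, 0, 0)
assert naive.tzinfo is None

# Aware datetime: attach an explicit UTC offset, as the review comment suggests.
aware = datetime.datetime(2022, 1, 1, 22, 0, 0, tzinfo=datetime.timezone.utc)
assert aware.tzinfo is not None
```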
📒 Files selected for processing (2)

- `ami/ml/models/pipeline.py`
- `ami/ml/tests.py`
@annavik confirmed that occurrence & taxa counts are finally going up during and after processing! I put this off because I thought the answer was much more difficult!


Summary
`save_results()` in `ami/ml/models/pipeline.py` refreshes `update_calculated_fields_for_events` for the batch's events but never the parent `Deployment`. As a result, `deployment.occurrences_count` and `deployment.taxa_count` stay stale until something else (a manual `deployment.save()`) runs. This is the user-reported bug "Station counts for occurrences and taxa are not always getting updated" — Stations in the UI are Deployments, and their cached counts simply don't move after an ML job processes new images.
The patch adds a per-batch deployment refresh next to the existing event refresh:

- It runs once per `save_results()` batch, so it covers every result-write path: sync `MLJob.process_images`, async NATS `process_nats_pipeline_result`, and the Celery batch wrapper in `ami/ml/tasks.py`.
- Counts update incrementally during long jobs instead of only at job-end, since each batch refreshes the deployments it touched.
- Minimal change: no signals, no new tasks.
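The shape of the added refresh can be sketched with stand-in objects (the real code iterates Django querysets; `FakeDeployment` and the dict lookups here are purely illustrative):

```python
class FakeDeployment:
    """Stand-in for the Django model; only tracks refresh calls."""

    def __init__(self, pk):
        self.pk = pk
        self.refresh_calls = 0

    def update_calculated_fields(self, save=True):
        # The real method recomputes occurrences_count / taxa_count.
        self.refresh_calls += 1


def refresh_parent_deployments(source_images, deployments_by_pk):
    # Collect the batch's deployment ids, skipping images with no parent,
    # then refresh each touched deployment exactly once per batch.
    deployment_ids = {
        img["deployment_id"] for img in source_images if img["deployment_id"]
    }
    for pk in deployment_ids:
        deployments_by_pk[pk].update_calculated_fields(save=True)
    return deployment_ids
```

Deduplicating via a set mirrors the patch's intent: many images per batch, but one refresh per parent deployment.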
Test plan
- `TestSaveResultsRefreshesDeploymentCounts` in `ami/ml/tests.py` exercises `save_results` end-to-end with a real Project + Deployment + Event + SourceImage and asserts both `occurrences_count` and `taxa_count` move off zero.
- The test fails on `origin/main` (`AssertionError: 0 not greater than 0`) — proves the bug — and passes with the patch.
- Manual check: run an ML job on a collection and confirm Station/Deployment counts update without needing a manual save.
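The regression test's structure can be mimicked outside Django (a toy sketch — the real test creates actual model instances and calls the real `save_results`; `fake_save_results` here only imitates the write-then-refresh shape):

```python
import unittest


class DeploymentCountsSketch(unittest.TestCase):
    """Mirrors the assertion style of the real regression test."""

    def test_counts_move_off_zero(self):
        deployment = {"occurrences_count": 0, "taxa_count": 0}

        def fake_save_results(dep):
            # Stand-in: persist results, then refresh the parent's counts.
            dep["occurrences_count"] += 1
            dep["taxa_count"] += 1

        fake_save_results(deployment)
        # On origin/main the real equivalents stayed at 0 at this point.
        self.assertGreater(deployment["occurrences_count"], 0)
        self.assertGreater(deployment["taxa_count"], 0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(DeploymentCountsSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```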
Notes
- An earlier draft used a `post_save` signal on `Job` SUCCESS that enqueues a Celery task to walk job → images → deployments. That approach was ~300 lines and would only refresh at job-end. The in-`save_results` fix is ~3 lines, fires per-batch, and reuses existing infrastructure.
- Does not overlap with the related branch (`fix/default-filters-followup-default-taxa-and-cache`), which touches `ami/main/signals.py` / `ami/main/tasks.py` for project default-filter changes — different files, different scope.