fix(BA-5963): report current_revision_id correctly during rolling updates#11494
Merged
HyeockJinKim merged 10 commits intomainfrom May 8, 2026
Merged
fix(BA-5963): report current_revision_id correctly during rolling updates#11494HyeockJinKim merged 10 commits intomainfrom
HyeockJinKim merged 10 commits intomainfrom
Conversation
jopemachine
commented
May 8, 2026
jopemachine
commented
May 8, 2026
jopemachine
commented
May 8, 2026
jopemachine
commented
May 8, 2026
jopemachine
commented
May 8, 2026
jopemachine
commented
May 8, 2026
jopemachine
commented
May 8, 2026
current_revision_id by explicit match, not list[0]
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes incorrect current_revision_id reporting during rolling updates by resolving the current revision spec via explicit ID matching (instead of list ordering), making revision iteration deterministic, and carrying current_revision_id through the data layer so v2 GraphQL/REST adapters no longer recompute it from data.revision.
Changes:
- Resolve the “current” revision spec in
_convert_deployment_info_to_data()by matchinginfo.current_revision_id, and propagatecurrent_revision_idintoModelDeploymentData. - Update v2 deployment adapter to read
current_revision_iddirectly fromModelDeploymentData. - Add deterministic ordering for
EndpointRow.revisionsand add a regression unit test for BA-5963.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/manager/services/deployment/test_deployment_service.py | Adds regression coverage ensuring current revision selection is based on current_revision_id (not list order). |
| src/ai/backend/manager/services/deployment/service.py | Fixes revision resolution logic and propagates current_revision_id into the data layer. |
| src/ai/backend/manager/models/endpoint/row.py | Adds relationship ordering for deterministic revision iteration. |
| src/ai/backend/manager/data/deployment/types.py | Extends ModelDeploymentData with current_revision_id. |
| src/ai/backend/manager/api/adapters/deployment/adapter.py | Uses data.current_revision_id directly for v2 response shaping. |
| changes/11494.fix.md | Adds changelog entry for the fix. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
`_convert_deployment_info_to_data` picked `info.model_revisions[0]` as the current revision. During a rolling update the list contains both current and deploying revisions, and PostgreSQL returned them in undefined order because `EndpointRow.revisions` had no `order_by`, so `current_revision_id` in REST/GraphQL responses could be reported as the deploying revision id — or as null when downstream adapters derived it from `data.revision.id`. Match the spec by `info.current_revision_id` directly, and add `order_by` on the `revisions` relationship for deterministic iteration in any other caller. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stop deriving the API's `current_revision_id` from `data.revision.id`, which fell through to `null` whenever the matching `ModelRevisionSpec` could not be resolved (e.g. dangling `endpoints.current_revision` pointing at a removed `deployment_revisions` row — a real path observed in the field after replica/revision cleanup). Carry the column value through the data layer: `ModelDeploymentData` now has its own `current_revision_id`, populated from `info.current_revision_id` in the service layer, and the GraphQL/REST v2 adapter reads it directly. Identity is now decoupled from the spec, so the API mirrors the DB column even when the spec lookup fails. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The unmatched-revision branch is reachable only via a caller bug (forgot to ``selectinload(EndpointRow.revisions)``) or DB integrity loss (the ``endpoints.current_revision`` pointer is set but the matching ``deployment_revisions`` row is gone). Both states are abnormal, yet the previous ``log.warning`` is easily missed in dashboards/alert rules — the response only surfaces a partial deployment with ``revision: null``, indistinguishable from an intended null. Lift the severity to ``error`` so monitoring catches the regression promptly without changing the graceful-degradation behaviour itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All current readers iterate ``EndpointRow.revisions`` looking for a specific revision id (current, deploying); no caller relies on the sort direction for correctness today. Descending order lets the more-frequently-requested recent revisions match earlier on average and — should anyone reintroduce a ``revisions[0]`` reading (the anti-pattern this PR exists to fix) — points at the latest spec instead of the oldest, narrowing the blast radius if it slips back in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…atch Pin the contract enforced by ``_convert_deployment_info_to_data``: when ``DeploymentInfo.model_revisions`` carries both the current and the deploying revision (typical during a rolling update), the conversion must resolve the current revision by an explicit ``current_revision_id`` match — not by ``model_revisions[0]``. The test uses adversarial ordering (deploying revision at index 0) so the previous list-index implementation would surface ``deploying_revision_id`` under ``current_revision_id`` and trip multiple assertions at once. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- service.py: drop the inline narration above the revision lookup; the same context already lives in the fix commit message. - test_deployment_service.py: shorten the verbose docstring, hoist ``make_revision_spec`` into a pytest fixture instead of a closure, and remove the redundant "adversarial ordering" inline comment (the spec/id wiring expresses the same intent on its own). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- ``ModelDeploymentData.current_revision_id``: drop the inline rationale; the field name on its own already reads as the DB-column identity, and the WHY is captured in the fix commit message rather than next to the field. - ``changes/11494.fix.md``: rewrite as a single user-facing sentence about the actual headline change (correct ``current_revision_id`` reporting), not the implementation detail of "list[0]". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
21cd618 to
29afe47
Compare
Two call sites in ``api/rest/service/handler.py`` (the GET ``ServeInfo`` projection and the create-legacy-deployment response shaping) were reading ``deployment_info.model_revisions[0]`` as the active revision spec. With ``EndpointRow.revisions`` now sorted ``revision_number desc`` for cheaper current-revision lookup, ``[0]`` is the deploying (newer) revision during a rolling update, so those handlers would surface mounts / model_definition_path / runtime_variant_id from the not-yet-active spec. Replace both with a small helper that resolves the active revision by explicit id (``current_revision_id`` first, ``deploying_revision_id`` as the bootstrap fallback before promotion) — the same contract the fixed ``_convert_deployment_info_to_data`` already follows. Caught by Copilot review on PR #11494. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HyeockJinKim
approved these changes
May 8, 2026
Collaborator
HyeockJinKim
left a comment
There was a problem hiding this comment.
Please implement this as a fix for now, but have the actual work done in the repository.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
_convert_deployment_info_to_datapickedinfo.model_revisions[0]as the current revision spec. During a rolling update the list also contains the deploying revision, and PostgreSQL returned them in undefined order, so the API could expose the deploying revision id undercurrent_revision_id(ornullwhen downstream adapters derived the id fromdata.revision.id).info.current_revision_iddirectly inservices/deployment/service.py; addorder_by="DeploymentRevisionRow.revision_number"onEndpointRow.revisionsfor deterministic iteration in any other caller.current_revision_idthrough the data layer (ModelDeploymentData.current_revision_id) and read it from there in the GraphQL/REST v2 adapter, instead of recomputing it fromdata.revision.id. This stops the API from collapsing tonullwhen the matchingModelRevisionSpeccan't be resolved (e.g. danglingendpoints.current_revisionafter a revision row was removed) — the column value now flows straight through.Resolves BA-5963.