fix(BA-5963): report current_revision_id correctly during rolling updates#11494

Merged
HyeockJinKim merged 10 commits into main from
fix/BA-5963-current-revision-id-mismatch
May 8, 2026

Conversation

@jopemachine
Member

@jopemachine jopemachine commented May 6, 2026

Summary

  • _convert_deployment_info_to_data picked info.model_revisions[0] as the current revision spec. During a rolling update the list also contains the deploying revision, and PostgreSQL returned them in undefined order, so the API could expose the deploying revision id under current_revision_id (or null when downstream adapters derived the id from data.revision.id).
  • Match the spec by info.current_revision_id directly in services/deployment/service.py; add order_by="DeploymentRevisionRow.revision_number" on EndpointRow.revisions for deterministic iteration in any other caller.
  • Carry current_revision_id through the data layer (ModelDeploymentData.current_revision_id) and read it from there in the GraphQL/REST v2 adapter, instead of recomputing it from data.revision.id. This stops the API from collapsing to null when the matching ModelRevisionSpec can't be resolved (e.g. dangling endpoints.current_revision after a revision row was removed) — the column value now flows straight through.
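
As a minimal sketch of the lookup change described in the bullets above, resolving the current revision by id instead of by list position might look like the following. The `RevisionSpec` and `DeploymentInfo` dataclasses are simplified, hypothetical stand-ins, not the actual Backend.AI types, and `resolve_current_revision` is an illustrative name.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass(frozen=True)
class RevisionSpec:
    # Hypothetical stand-in for ModelRevisionSpec; only the id matters here.
    id: UUID


@dataclass
class DeploymentInfo:
    # Hypothetical stand-in for the real DeploymentInfo.
    current_revision_id: UUID | None
    model_revisions: list[RevisionSpec] = field(default_factory=list)


def resolve_current_revision(info: DeploymentInfo) -> RevisionSpec | None:
    """Return the spec whose id equals current_revision_id, or None."""
    if info.current_revision_id is None:
        return None
    return next(
        (rev for rev in info.model_revisions if rev.id == info.current_revision_id),
        None,
    )


current, deploying = RevisionSpec(uuid4()), RevisionSpec(uuid4())
# The deploying revision happens to come first, as it may during a rolling update.
info = DeploymentInfo(current.id, [deploying, current])
resolved = resolve_current_revision(info)
```

With this shape, `resolved` is the current spec even though `info.model_revisions[0]` is the deploying one, so the result no longer depends on row order.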

Resolves BA-5963.

@github-actions github-actions Bot added size:S 10~30 LoC comp:manager Related to Manager component size:M 30~100 LoC and removed size:S 10~30 LoC labels May 6, 2026
Comment thread src/ai/backend/manager/models/endpoint/row.py Outdated
Comment thread src/ai/backend/manager/services/deployment/service.py Outdated
@github-actions github-actions Bot added size:L 100~500 LoC and removed size:M 30~100 LoC labels May 8, 2026
Comment thread tests/unit/manager/services/deployment/test_deployment_service.py Outdated
Comment thread tests/unit/manager/services/deployment/test_deployment_service.py
Comment thread tests/unit/manager/services/deployment/test_deployment_service.py Outdated
Comment thread src/ai/backend/manager/data/deployment/types.py Outdated
Comment thread changes/11494.fix.md
@jopemachine jopemachine changed the title from "fix(BA-5963): resolve current_revision_id by explicit match, not list[0]" to "fix(BA-5963): report current_revision_id correctly during rolling updates" May 8, 2026
@jopemachine jopemachine requested a review from a team May 8, 2026 05:18
@jopemachine jopemachine marked this pull request as ready for review May 8, 2026 05:18
Copilot AI review requested due to automatic review settings May 8, 2026 05:18
Contributor

Copilot AI left a comment

Pull request overview

Fixes incorrect current_revision_id reporting during rolling updates by resolving the current revision spec via explicit ID matching (instead of list ordering), making revision iteration deterministic, and carrying current_revision_id through the data layer so v2 GraphQL/REST adapters no longer recompute it from data.revision.

Changes:

  • Resolve the “current” revision spec in _convert_deployment_info_to_data() by matching info.current_revision_id, and propagate current_revision_id into ModelDeploymentData.
  • Update v2 deployment adapter to read current_revision_id directly from ModelDeploymentData.
  • Add deterministic ordering for EndpointRow.revisions and add a regression unit test for BA-5963.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/unit/manager/services/deployment/test_deployment_service.py | Adds regression coverage ensuring current revision selection is based on current_revision_id (not list order). |
| src/ai/backend/manager/services/deployment/service.py | Fixes revision resolution logic and propagates current_revision_id into the data layer. |
| src/ai/backend/manager/models/endpoint/row.py | Adds relationship ordering for deterministic revision iteration. |
| src/ai/backend/manager/data/deployment/types.py | Extends ModelDeploymentData with current_revision_id. |
| src/ai/backend/manager/api/adapters/deployment/adapter.py | Uses data.current_revision_id directly for v2 response shaping. |
| changes/11494.fix.md | Adds changelog entry for the fix. |

Comment thread src/ai/backend/manager/models/endpoint/row.py
jopemachine and others added 8 commits May 8, 2026 14:24
`_convert_deployment_info_to_data` picked `info.model_revisions[0]` as the
current revision. During a rolling update the list contains both current
and deploying revisions, and PostgreSQL returned them in undefined order
because `EndpointRow.revisions` had no `order_by`, so `current_revision_id`
in REST/GraphQL responses could be reported as the deploying revision id —
or as null when downstream adapters derived it from `data.revision.id`.

Match the spec by `info.current_revision_id` directly, and add `order_by`
on the `revisions` relationship for deterministic iteration in any other
caller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stop deriving the API's `current_revision_id` from `data.revision.id`,
which fell through to `null` whenever the matching `ModelRevisionSpec`
could not be resolved (e.g. dangling `endpoints.current_revision`
pointing at a removed `deployment_revisions` row — a real path observed
in the field after replica/revision cleanup).

Carry the column value through the data layer: `ModelDeploymentData`
now has its own `current_revision_id`, populated from
`info.current_revision_id` in the service layer, and the GraphQL/REST v2
adapter reads it directly. Identity is now decoupled from the spec, so
the API mirrors the DB column even when the spec lookup fails.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
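
To illustrate the decoupling described in the commit message above, here is a rough sketch of a data object that carries the column value alongside the optionally resolved spec. `DeploymentData` and `RevisionData` are hypothetical, trimmed-down analogues of `ModelDeploymentData` and its revision payload, and the two accessor functions are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class RevisionData:
    # Hypothetical stand-in for the resolved revision payload.
    id: UUID


@dataclass(frozen=True)
class DeploymentData:
    # Hypothetical analogue of ModelDeploymentData.
    current_revision_id: UUID | None  # mirrors the DB column
    revision: RevisionData | None     # resolved spec; may be missing


def id_derived_from_spec(data: DeploymentData) -> UUID | None:
    # Previous behaviour: collapses to None when the spec is unresolved.
    return data.revision.id if data.revision is not None else None


def id_from_column(data: DeploymentData) -> UUID | None:
    # Behaviour after the change: read the carried column value directly.
    return data.current_revision_id


# A dangling pointer: the id is set, but no matching spec could be resolved.
dangling_id = uuid4()
dangling = DeploymentData(current_revision_id=dangling_id, revision=None)
```

For the `dangling` case, the spec-derived accessor yields `None` while the column-based accessor still returns `dangling_id`, which is the difference the commit describes.
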
The unmatched-revision branch is reachable only via a caller bug
(forgot to ``selectinload(EndpointRow.revisions)``) or DB integrity
loss (the ``endpoints.current_revision`` pointer is set but the
matching ``deployment_revisions`` row is gone). Both states are
abnormal, yet the previous ``log.warning`` is easily missed in
dashboards/alert rules — the response only surfaces a partial
deployment with ``revision: null``, indistinguishable from an intended
null. Lift the severity to ``error`` so monitoring catches the
regression promptly without changing the graceful-degradation
behaviour itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
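
The severity change above could be sketched roughly as follows. The logger name, the `find_revision` helper, and the dict-based lookup are all illustrative assumptions rather than the code in `service.py`; the point is only that the unmatched branch logs at `error` while still returning `None`.

```python
import logging

log = logging.getLogger("deployment_sketch")  # hypothetical logger name


def find_revision(revisions: dict, current_revision_id):
    """Look up the current revision; log at ERROR and return None if missing."""
    if current_revision_id is None:
        return None  # an intended null: nothing abnormal to report
    revision = revisions.get(current_revision_id)
    if revision is None:
        # Abnormal state: revisions were not loaded, or the pointer is dangling.
        log.error(
            "current_revision_id %s has no matching revision",
            current_revision_id,
        )
    return revision
```

The caller still receives `None` in both cases, so the response degrades gracefully; only the log record distinguishes the abnormal case from the intended null.
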
All current readers iterate ``EndpointRow.revisions`` looking for a
specific revision id (current, deploying); no caller relies on the
sort direction for correctness today. Descending order lets the
more-frequently-requested recent revisions match earlier on average
and — should anyone reintroduce a ``revisions[0]`` reading (the
anti-pattern this PR exists to fix) — points at the latest spec
instead of the oldest, narrowing the blast radius if it slips back in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
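
A small, ORM-free sketch of the ordering argument above: sorting a list by `revision_number` in descending order stands in for the relationship's `order_by`, and the `Revision` dataclass is a hypothetical simplification of the real row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    # Hypothetical simplification of a deployment revision row.
    revision_number: int
    role: str


rows = [
    Revision(1, "retired"),
    Revision(3, "deploying"),
    Revision(2, "current"),
]

# Analogue of ordering the relationship by revision_number, descending.
ordered = sorted(rows, key=lambda r: r.revision_number, reverse=True)

# Correct callers still look a revision up explicitly, independent of order.
current_number = 2
current = next(r for r in ordered if r.revision_number == current_number)
```

Here `ordered[0]` is the newest (deploying) revision, so an accidental `[0]` read would pick the latest spec rather than the oldest, while the explicit lookup is unaffected by the sort direction.
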
…atch

Pin the contract enforced by ``_convert_deployment_info_to_data``: when
``DeploymentInfo.model_revisions`` carries both the current and the
deploying revision (typical during a rolling update), the conversion
must resolve the current revision by an explicit
``current_revision_id`` match — not by ``model_revisions[0]``.

The test uses adversarial ordering (deploying revision at index 0) so
the previous list-index implementation would surface
``deploying_revision_id`` under ``current_revision_id`` and trip
multiple assertions at once.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
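
The shape of such a regression test might look like the sketch below. It does not reproduce the actual test in `test_deployment_service.py`; `pick_by_index` and `pick_by_match` are hypothetical reductions of the old and new selection logic to bare ids.

```python
from uuid import uuid4


def pick_by_index(revision_ids, current_id):
    # Old behaviour, kept only for contrast: trust the list position.
    return revision_ids[0] if revision_ids else None


def pick_by_match(revision_ids, current_id):
    # New behaviour: accept the id only if it is explicitly present.
    return current_id if current_id in revision_ids else None


def test_current_revision_resolved_by_explicit_match():
    current_id, deploying_id = uuid4(), uuid4()
    # Adversarial ordering: the deploying revision sits at index 0.
    adversarial = [deploying_id, current_id]
    assert pick_by_match(adversarial, current_id) == current_id
    # The list-index approach would have surfaced the deploying id instead.
    assert pick_by_index(adversarial, current_id) == deploying_id
```

Placing the deploying revision first makes the test fail deterministically against a list-index implementation instead of depending on database row order.
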
- service.py: drop the inline narration above the revision lookup; the
  same context already lives in the fix commit message.
- test_deployment_service.py: shorten the verbose docstring, hoist
  ``make_revision_spec`` into a pytest fixture instead of a closure,
  and remove the redundant "adversarial ordering" inline comment
  (the spec/id wiring expresses the same intent on its own).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- ``ModelDeploymentData.current_revision_id``: drop the inline
  rationale; the field name on its own already reads as the DB-column
  identity, and the WHY is captured in the fix commit message rather
  than next to the field.
- ``changes/11494.fix.md``: rewrite as a single user-facing sentence
  about the actual headline change (correct ``current_revision_id``
  reporting), not the implementation detail of "list[0]".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jopemachine jopemachine force-pushed the fix/BA-5963-current-revision-id-mismatch branch from 21cd618 to 29afe47 Compare May 8, 2026 05:25
Two call sites in ``api/rest/service/handler.py`` (the GET ``ServeInfo``
projection and the create-legacy-deployment response shaping) were
reading ``deployment_info.model_revisions[0]`` as the active revision
spec. With ``EndpointRow.revisions`` now sorted ``revision_number desc``
for cheaper current-revision lookup, ``[0]`` is the deploying (newer)
revision during a rolling update, so those handlers would surface
mounts / model_definition_path / runtime_variant_id from the
not-yet-active spec.

Replace both with a small helper that resolves the active revision by
explicit id (``current_revision_id`` first, ``deploying_revision_id``
as the bootstrap fallback before promotion) — the same contract the
fixed ``_convert_deployment_info_to_data`` already follows. Caught by
Copilot review on PR #11494.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
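
A helper along the lines described above could be sketched as follows. The dataclasses and the name `resolve_active_revision` are hypothetical, and the sketch assumes the fallback to `deploying_revision_id` applies only while `current_revision_id` is still unset; the real helper in `handler.py` may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass(frozen=True)
class RevisionSpec:
    # Hypothetical stand-in for ModelRevisionSpec.
    id: UUID


@dataclass
class DeploymentInfo:
    # Hypothetical stand-in for the real DeploymentInfo.
    current_revision_id: UUID | None = None
    deploying_revision_id: UUID | None = None
    model_revisions: list[RevisionSpec] = field(default_factory=list)


def resolve_active_revision(info: DeploymentInfo) -> RevisionSpec | None:
    """Resolve the active spec by explicit id, never by list position."""
    # Assumption: use deploying_revision_id only before the first promotion,
    # i.e. while current_revision_id has not been set yet.
    target = (
        info.current_revision_id
        if info.current_revision_id is not None
        else info.deploying_revision_id
    )
    if target is None:
        return None
    return next((rev for rev in info.model_revisions if rev.id == target), None)


current, deploying = RevisionSpec(uuid4()), RevisionSpec(uuid4())
rolling = DeploymentInfo(current.id, deploying.id, [deploying, current])
bootstrap = DeploymentInfo(None, deploying.id, [deploying])
```

In this sketch `resolve_active_revision(rolling)` returns the current spec despite the deploying one being listed first, and `resolve_active_revision(bootstrap)` falls back to the deploying spec.
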
@jopemachine jopemachine requested a review from HyeockJinKim May 8, 2026 05:58
Collaborator

@HyeockJinKim HyeockJinKim left a comment

Let's land this as a fix for now, but the actual work should be done in the repository layer.

@HyeockJinKim HyeockJinKim merged commit 3666b3f into main May 8, 2026
36 checks passed
@HyeockJinKim HyeockJinKim deleted the fix/BA-5963-current-revision-id-mismatch branch May 8, 2026 06:00

Labels

comp:manager Related to Manager component size:L 100~500 LoC

3 participants