feat: switch core StatefulSet to in-place rolling updates by keynslug · Pull Request #1179 · emqx/emqx-operator

keynslug · 2026-04-02T15:07:08Z

Summary

This PR reworks how the Operator manages the set of EMQX core nodes.

- There's now a single managed StatefulSet for core nodes (aka the "core set").
- Rolling updates happen in-place, without migrations across 2 or more separate StatefulSets.
- The core template includes a default PVC; running with ephemeral volumes is strongly discouraged.
- The EMQX CRD is now at version v3alpha1.
- The EMQX CR Status became slimmer: less duplication, fewer status conditions.
- The EMQX CR Spec is slightly more conventional (e.g. minReadySeconds, consistent naming).
- No extra readiness gates: the on-serving gate has been retired.
- Replicants are still updated in a blue-green manner (to be addressed in follow-up PRs).

See individual commits for details.

Core set rolling update

The core set employs the underlying StatefulSet's OnDelete update strategy.
As replicants require a same-version core to function:
- At least 1 core needs to be updated before spinning up a new replicant set.
- At least 1 old-version core node needs to be preserved until old replicants are migrated and scaled down.
- There's now a strict requirement to have more than 1 core in core-replicant clusters.
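
The three constraints above can be read as simple gating predicates. The following is a minimal sketch under stated assumptions, not the operator's actual code: `RolloutState` and all function names are hypothetical and exist only for illustration.

```go
package main

// RolloutState is a hypothetical snapshot of a cluster in the middle of an
// update. None of these names come from the operator's code base.
type RolloutState struct {
	CoreReplicas       int  // desired number of core pods
	UpdatedCores       int  // core pods already running the new revision
	OldReplicantsAlive int  // pods still present in the old ("current") replicant set
	HasReplicants      bool // whether the cluster is a core-replicant cluster
}

// CanStartNewReplicants: a new replicant set needs at least 1 updated core.
func CanStartNewReplicants(s RolloutState) bool {
	return s.UpdatedCores >= 1
}

// CanUpdateNextCore reports whether one more outdated core may be updated.
// The last old-version core is kept until the old replicants are gone.
func CanUpdateNextCore(s RolloutState) bool {
	outdated := s.CoreReplicas - s.UpdatedCores
	if outdated <= 0 {
		return false // nothing left to update
	}
	if outdated == 1 && s.OldReplicantsAlive > 0 {
		return false // old replicants still depend on this core
	}
	return true
}

// ValidTopology: core-replicant clusters must have more than 1 core.
func ValidTopology(s RolloutState) bool {
	return !s.HasReplicants || s.CoreReplicas > 1
}
```

With a single core in a core-replicant cluster, the first two predicates could never both hold during a version upgrade, which is one way to see why more than 1 core is required.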

Important notes

Should be considered WIP for the most part.
There are no backward compatibility measures.
Rebalance CRD is temporarily disabled.

Replace blue-green deployment (hash-suffixed StatefulSet per template change) with a single deterministically-named StatefulSet using OnDelete update strategy. The operator detects outdated pods by comparing controller-revision-hash against StatefulSet.status.updateRevision, and deletes them one at a time (highest ordinal first) after session evacuation. Replicant multi-ReplicaSet pattern is intentionally retained.
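
As an illustration of the detection step described above, here is a minimal sketch, assuming pods are reduced to a name and a label map. The `Pod` type and `NextOutdated` function are hypothetical; only the `controller-revision-hash` label and the StatefulSet `status.updateRevision` field come from the description.

```go
package main

import (
	"strconv"
	"strings"
)

// Pod is a hypothetical, stripped-down stand-in for a Kubernetes pod.
type Pod struct {
	Name   string
	Labels map[string]string
}

// revisionLabel is the label the StatefulSet controller puts on its pods.
const revisionLabel = "controller-revision-hash"

// ordinal extracts the numeric suffix of a StatefulSet pod name,
// e.g. "emqx-core-2" -> 2. It returns -1 if there is no numeric suffix.
func ordinal(name string) int {
	i := strings.LastIndex(name, "-")
	n, err := strconv.Atoi(name[i+1:])
	if err != nil {
		return -1
	}
	return n
}

// NextOutdated returns the outdated pod with the highest ordinal, if any.
// A pod is outdated when its revision label differs from updateRevision.
func NextOutdated(pods []Pod, updateRevision string) (Pod, bool) {
	var best Pod
	found := false
	for _, p := range pods {
		if p.Labels[revisionLabel] == updateRevision {
			continue // already on the target revision
		}
		if !found || ordinal(p.Name) > ordinal(best.Name) {
			best, found = p, true
		}
	}
	return best, found
}
```

In this sketch the caller would delete the returned pod only after session evacuation has completed, then wait for the replacement before asking for the next candidate.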

This commit simplifies the operational model by removing extra readiness gates.
1. No more `on-serving` readiness gate. Instead, the container readiness probe points directly to the "availability check" endpoint of the Evacuation API. Users can still override it, but doing so is strongly discouraged.
2. As a result, `oldestCoreRequester()` now effectively does not consider nodes that are in the process of evacuation.
3. In turn, the direct `forPod(...)` API is allowed to point to non-ready-but-running EMQX nodes.

This commit re-evaluates and significantly simplifies the set of EMQX CR conditions, and makes condition descriptions more informative.
1. Retires the separate `Initialized`, `CoreNodesReady` and `ReplicantNodesReady` conditions.
2. Conditions are evaluated independently; there are no more state machine transitions.
3. Reconcilers no longer consult conditions and use internal APIs instead.

Also stop ignoring errors in the load state reconciler.

This commit ensures that users are not allowed to create non-rolling-updatable EMQX clusters. Since replicants are still updated in a blue-green manner, during updates involving EMQX version upgrade at least 1 older-version core and 1 newer-version core need to be running in a cluster.

This commit adds safeguards around core and replicant rolling updates, to accommodate the EMQX requirement that a replicant connects to a core of exactly the same version:
1. Updated replicant sets are not allowed to spin up until at least 1 core is rolling-updated.
2. The update of the last core is postponed until the "current" replicant set is fully migrated.

This commit improves stability of `dsUpdateReplicaSets` reconciler. It now avoids consulting both EMQX runtime cluster state and DS cluster state (using instead only the latter) as the former can sometimes fail to include nodes that are in the process of starting or stopping, which can cause unwarranted target replica set changes.

This is an important prerequisite for in-place rolling updates: persistent data is now expected to survive pod deletion. Also enforce a suitable PVC retention policy: PVs / PVCs should survive regular pod deletions (as part of rolling updates) but not parent StatefulSet scale-down or deletion.

This commit ensures that the listeners service targets the correct set of pods at all times: cores if there are no replicants, otherwise "update" replicants if at least 1 is ready, otherwise "current" replicants if at least 1 is ready.
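
The selection order can be sketched as a small decision function. This is an illustrative sketch only: the `Target` type, its constants and `ListenerTarget` are hypothetical names. The commit message does not say what happens when replicants exist but none are ready, so the sketch reports that case as "no decision" rather than guessing.

```go
package main

// Target names the pod set a listeners service should select.
// The type and constants are hypothetical, chosen for illustration.
type Target string

const (
	TargetCores             Target = "cores"
	TargetUpdateReplicants  Target = "update-replicants"
	TargetCurrentReplicants Target = "current-replicants"
)

// ListenerTarget applies the selection order from the commit message.
// The second result is false when no rule applies (replicants exist but
// none are ready); the source does not specify that case.
func ListenerTarget(hasReplicants bool, readyUpdate, readyCurrent int) (Target, bool) {
	switch {
	case !hasReplicants:
		return TargetCores, true
	case readyUpdate >= 1:
		return TargetUpdateReplicants, true
	case readyCurrent >= 1:
		return TargetCurrentReplicants, true
	default:
		return "", false
	}
}
```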

This commit ensures that label-only changes are applicable to managed resources: without this, changing core template labels was either ignored or could have caused the controller to get stuck in a retry loop, because selector updates for StatefulSets are in general prohibited.

codecov · 2026-04-07T09:57:17Z

Codecov Report

❌ Patch coverage is 83.58663% with 162 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main-3.x@a2885da). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
internal/controller/sync_core_set.go	79.18%	37 Missing and 14 partials ⚠️
internal/controller/load_state.go	61.53%	19 Missing and 11 partials ⚠️
internal/controller/add_replicant_set.go	84.21%	10 Missing and 2 partials ⚠️
internal/controller/ds_update_replica_sets.go	64.00%	6 Missing and 3 partials ⚠️
internal/controller/rebalance_controller.go	0.00%	8 Missing ⚠️
internal/controller/sync_emqx_config.go	61.11%	7 Missing ⚠️
internal/controller/update_emqx_status.go	96.47%	5 Missing and 1 partial ⚠️
internal/controller/util/pod.go	75.00%	4 Missing and 2 partials ⚠️
internal/controller/add_core_set.go	95.57%	3 Missing and 2 partials ⚠️
internal/emqx/api/evacuation.go	50.00%	4 Missing and 1 partial ⚠️
... and 11 more

Additional details and impacted files

@@             Coverage Diff             @@
##             main-3.x    #1179   +/-   ##
===========================================
  Coverage            ?   74.17%           
===========================================
  Files               ?       48           
  Lines               ?     3667           
  Branches            ?        0           
===========================================
  Hits                ?     2720           
  Misses              ?      801           
  Partials            ?      146

zmstone · 2026-04-07T21:24:31Z

when would it be the good timing to change v3alpha1 to v3?

keynslug · 2026-04-08T09:09:18Z

@zmstone The plan is to switch to v3beta1 once API looks reasonable, future-proof and is ready for release, and to v3 once the rest of Operator v3 features are in and it's not going to change anymore.

keynslug added 30 commits April 2, 2026 15:04

refactor(api): make EMQX status and CRD surface slimmer

f4db418

Remove fields that:
* Are no longer relevant for single-core-set style operation.
* Duplicate existing status fields from other resources.

chore(ds): return early when DS is inactive

117c47d

refactor(api): remove Replicas from EMQX CR status

6db23f0

This field was duplicating spec fields verbatim. This commit drops this field, making status slimmer and enforcing single source of truth.

refactor(api): rename nodeEvacuationsStatus to nodeEvacuations

1dc2345

This commit makes API field naming simpler and clearer.

chore: remove PDB compatibility shims for Kubernetes 1.21

cdfec98

This Kubernetes version has been unsupported for quite some time.

fix: unblock core rolling update if DS replication site

2c9e2d2

This commit ensures that core set rolling update can progress even if an outdated core node is a DS replication site. Since such rolling updates preserve persistent data, this should be safe.

fix: drop core set leave-cluster preStop hook

277a4d4

This commit fixes an issue with in-place rolling updates where core nodes constantly tried to leave and then rejoin the cluster. The preStop hook was needed for blue-green updates but does not make sense for the new approach.

test: rework sync_pods_suite_test

907c322

fix(api): preserve zero-valued status counters in EMQX CR

a876bcc

This makes EMQX CR status a bit more consistent and helps with observability.

fix: require rest of cores are available before sync progress

4fa3752

This commit corrects the "cores are available" criterion for core pod removal safety: all cores except the candidate are now counted, and NumReplicas-1 available cores are considered sufficient.
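
A minimal sketch of this criterion, assuming each core is reduced to a name and a readiness flag. `CorePod` and `SafeToRemove` are hypothetical names, not taken from the operator.

```go
package main

// CorePod is a hypothetical, simplified view of a core pod.
type CorePod struct {
	Name  string
	Ready bool
}

// SafeToRemove reports whether the candidate may be removed: every core
// other than the candidate is counted, and numReplicas-1 ready cores are
// enough. The candidate's own readiness is deliberately ignored.
func SafeToRemove(cores []CorePod, candidate string, numReplicas int) bool {
	available := 0
	for _, c := range cores {
		if c.Name != candidate && c.Ready {
			available++
		}
	}
	return available >= numReplicas-1
}
```

Note that with `numReplicas` equal to 1 the threshold is 0, so in this sketch a single-node core set is never blocked by the check.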

fix: skip evacuation for single-node clusters

84b8671

This commit allows single-node core sets to complete rolling updates successfully, instead of them becoming stuck on node evacuation.

test(e2e): update "EMQX Cluster" scenario

9c8d9a7

test(e2e): update "EMQX Cluster / Botched Rolling Updates" scenario

6d16635

fix: correct core set scale-down behavior

f4162ea

This commit ensures that scale-down picks candidates statically and deterministically.

fix: ensure outdated core pod list is empty if no updates

9a3ad81

fix: force requeue after core/replicant set added or updated

923e881

This should give relevant controllers enough time to update respective resource statuses, to make the rest of the reconcilers chain rely on up-to-date information.

fix: run dsCleanupSites earlier in the reconciler chain

9d0abf0

This commit also improves `dsCleanupSites` reconciler observability.

test(e2e): test both scale-up and scale-down

b3c64ae

test(e2e): enrich diagnostic output on failures

66c90fd

fix: ensure syncCoreSet stops evacuation explicitly on pod update

16918c8

This commit fixes the consequence of introducing volume persistence to core set pods by default: node evacuation state survives pod recreation, and thus needs to be stopped explicitly.

feat: add sync cluster membership reconciler

e17900c

This new reconciler is responsible for force-leaving nodes out of the EMQX cluster view, to keep it consistent with current core and replicant sets.

test(e2e): verify EMQX cluster view during tests

f10e483

test(e2e): update "EMQX Core-Replicant DS-Enabled Cluster" scenario

431b20c

keynslug added 8 commits April 7, 2026 10:39

fix: allow reconcilers to ask for short-timeout requeue

5b7eb01

This ensures that potentially not-entirely-complete reconciliations have a chance to be retried earlier than in 30 seconds, in case EMQX CR is considered Ready.

fix: simplify addService + refactor test suite

9582a67

feat(controller): trigger reconcile on EMQX CR spec changes only

c956d5a

fix: mark EMQX not ready when DS replication is unstable

5e0a504

This ensures there are no special conditions for choosing between shorter and longer requeue timeouts.

fix: stop propagating core template label to PVC templates

967838d

This is a workaround to allow label-only core set updates to complete.

chore(handler): drop dead code

1e91ed5

keynslug force-pushed the feat/EMQX-15033/rolling-upgrade branch from 2685270 to 966507e Compare April 7, 2026 09:38

keynslug force-pushed the feat/EMQX-15033/rolling-upgrade branch from 966507e to 62bd468 Compare April 7, 2026 11:27

keynslug added 6 commits April 7, 2026 13:41

fix: prefer EMQX 6.x to load DS cluster view

ba3f3ac

This commit ensures that DS reconcilers prefer to work with the same cluster view, preferably sourced from 6.x, primarily because EMQX clusters running 6.1.0 and newer can have separate cluster views different from 6.0.x and earlier.

test(e2e): update "EMQX Runtime-enabled DS Replication" scenario

1d43716

feat: promote EMQX CRD to v3alpha1 + turn Rebalance CRD off

e58f4ad

chore: define shared constants in api package

34071b7

This commit eliminates a few linter complaints.

chore: drop dead code

5b2726b

chore(helm): switch to 3.0.0 and drop compat measures

8a234d6

keynslug force-pushed the feat/EMQX-15033/rolling-upgrade branch from 62bd468 to 8a234d6 Compare April 7, 2026 11:44

keynslug marked this pull request as ready for review April 7, 2026 13:47

zmstone approved these changes Apr 8, 2026

keynslug added 5 commits April 8, 2026 13:05

fix: sort outdated pods by ordinal numerically

1fa3208

This commit fixes an uncommon issue where a core set having more than 10 pods was rolling updated in an incorrect order.
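
To show why the ordering matters, here is a minimal sketch of numeric ordinal sorting; `podOrdinal` and `SortByOrdinalDesc` are hypothetical helper names. Comparing pod names as plain strings puts "core-10" below "core-2", so a descending string sort of core-1, core-2, core-9 and core-10 yields core-9, core-2, core-10, core-1.

```go
package main

import (
	"sort"
	"strconv"
	"strings"
)

// podOrdinal parses the numeric suffix of a pod name ("core-10" -> 10).
// It returns -1 when the name has no numeric suffix.
func podOrdinal(name string) int {
	i := strings.LastIndex(name, "-")
	n, err := strconv.Atoi(name[i+1:])
	if err != nil {
		return -1
	}
	return n
}

// SortByOrdinalDesc sorts pod names in place, highest ordinal first,
// comparing ordinals as integers rather than as strings.
func SortByOrdinalDesc(names []string) {
	sort.Slice(names, func(i, j int) bool {
		return podOrdinal(names[i]) > podOrdinal(names[j])
	})
}
```

Sorting the same four names with `SortByOrdinalDesc` gives core-10, core-9, core-2, core-1, which matches the "highest ordinal first" order.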

fix: tolerate undefined "current" replicant set

8b55e6d

chore: leave comment on API conflict handling

ae46b62

fix: avoid deleting nil core pod candidate during scale down

888e597

chore: leave comment about syncCoreSet responsibilities

1f32248

Specifically, this commit mentions how evacuations are managed and what should be user expectations.

zmstone approved these changes Apr 8, 2026

keynslug merged commit 2abd471 into main-3.x Apr 8, 2026
14 checks passed

keynslug deleted the feat/EMQX-15033/rolling-upgrade branch April 8, 2026 12:29

keynslug merged 51 commits into main-3.x from feat/EMQX-15033/rolling-upgrade
