capz: add MAX_PODS knob to override kubelet max pods on Windows nodes#567

Merged: k8s-ci-robot merged 1 commit into kubernetes-sigs:master from rzlink:fix/hyperv-maxpods-20 on Jun 1, 2026
Conversation

@rzlink

@rzlink rzlink commented May 29, 2026

Contributor

What this PR does / why we need it

On Windows-on-Hyper-V worker nodes, every pod that lands on the runhcs-wcow-hypervisor RuntimeClass spins up a Hyper-V UVM that consumes ~500 MiB of host non-paged kernel pool and substantial HNS state. This overhead is invisible to kubelet's pod-level stats (it lives in vmmem.exe/vmwp.exe), so the default --max-pods=110 allows enough pods to exhaust the kernel pool. Once that happens, new pods fail to start with HCN_E_ADDR_INVALID_OR_RESERVED (0x803b002f) and HNS returns 0xe, cascading into MemoryPressure evictions, SchedulerPreemption failures, and Calico HNS endpoint cleanup issues. This is what makes the capz-windows-master-hyperv* testgrid dashboards flake.
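
As a rough back-of-the-envelope check of the figures above, the sketch below multiplies the quoted ~500 MiB per-pod overhead by a pod count. It assumes the overhead scales linearly with the number of UVMs, which is a simplification, and `uvm_overhead_mib` is a hypothetical helper name, not something in this repository.

```shell
# Approximate aggregate UVM overhead in MiB for a given pod count,
# using the ~500 MiB per-pod figure from the description (assumed linear).
uvm_overhead_mib() {
  echo $(( $1 * 500 ))
}

uvm_overhead_mib 110   # default --max-pods: 55000 MiB, roughly 53.7 GiB
uvm_overhead_mib 20    # proposed cap:       10000 MiB, roughly 9.8 GiB
```

Under that assumption, the default pod limit would imply several times the 16 GiB of RAM on a Standard_D4s_v3 worker, while a cap of 20 stays below it.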

This PR caps maxPods at 20 on Windows workers whenever HYPERV=true by templating HYPERV_MAX_PODS into the kubelet config for the KubeadmConfigTemplate of every Windows machine pool, defaulted in run-capz-e2e.sh. Set HYPERV_MAX_PODS="" to opt out.
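
The defaulting behaviour described here (20 when HYPERV=true, with an explicit empty string as the opt-out) could be sketched roughly as follows. This is an illustration only and not the actual contents of run-capz-e2e.sh; the function name `resolve_hyperv_max_pods` is hypothetical.

```shell
# Hypothetical sketch; not the real run-capz-e2e.sh logic.
# Prints the maxPods value to template for Windows workers, or nothing.
resolve_hyperv_max_pods() {
  if [ "${HYPERV:-false}" = "true" ]; then
    # "-" without a colon: the default of 20 applies only when the variable
    # is unset, so an explicit HYPERV_MAX_PODS="" survives as the opt-out.
    printf '%s' "${HYPERV_MAX_PODS-20}"
  fi
}
```

The choice of `${VAR-default}` over `${VAR:-default}` is what would make the opt-out possible: with the colon form, an empty value would be replaced by 20 again.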

Validation

Three back-to-back full provision → e2e → cleanup iterations against the upstream capz-windows-master-hyperv shape (uksouth, AKS mgmt + CAPZ workload, 1× CP + 2× WS2025 Hyper-V Standard_D4s_v3 workers, ginkgo --nodes=4, focus/skip matching the Prow job verbatim):

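| Iter | E2E wall | Tests | Failures | Errors | 0x803b002f cascades |
|------|----------|-------|----------|--------|---------------------|
| 1 | 1h04m | 7600 | 0 | 0 | 0 |
| 2 | 1h24m | 7602 | 0 | 0 | 0 |
| 3 | 1h06m | 7602 | 0 | 0 | 0 |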

For comparison, recent un-patched runs of the same Prow job have hundreds of 0x803b002f markers and 1–8 spec failures per run.
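
One way to count such markers in a collected log is sketched below. The helper name `count_hcn_markers` and the idea of pointing it at a single log file are assumptions for illustration; the PR does not say how the counts above were obtained.

```shell
# Hypothetical helper: count log lines that mention the HCN error code.
# Note that this counts matching lines, not individual occurrences.
count_hcn_markers() {
  # grep -c exits non-zero when nothing matches; "|| true" keeps a zero
  # count from aborting a script that runs under "set -e".
  grep -c -i '0x803b002f' "$1" || true
}
```

Usage would look like `count_hcn_markers path/to/kubelet.log`, where the path is a placeholder for whatever log artifact the run produces.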

Special notes for your reviewer

  • The cap is opt-out (HYPERV_MAX_PODS="" disables injection).
  • Touches every Windows-bearing template (windows-base, windows-ci, windows-pr, gmsa-ci, gmsa-pr, shared-image-gallery-ci).
  • Process-isolated runs (HYPERV=false) are unaffected: the envsubst block expands to empty.
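
The "expands to empty" behaviour in the last bullet can be illustrated with a small sketch. It assumes the injected fragment is a single `maxPods` line in the kubelet configuration; the real templates may place and format the value differently, and `render_max_pods_fragment` is a hypothetical name.

```shell
# Hypothetical renderer; the actual templates may inject the value differently.
# Emits a maxPods line only when a value is supplied, otherwise emits nothing,
# so the surrounding template is left unchanged.
render_max_pods_fragment() {
  if [ -n "${1:-}" ]; then
    printf 'maxPods: %s\n' "$1"
  fi
}

render_max_pods_fragment 20   # prints: maxPods: 20
render_max_pods_fragment ""   # prints nothing
```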

Release note

NONE

@linux-foundation-easycla

linux-foundation-easycla Bot commented May 29, 2026

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: rzlink / name: David Wei (e478e00)

@k8s-ci-robot k8s-ci-robot requested a review from andyzhangx May 29, 2026 16:59
@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label May 29, 2026
@k8s-ci-robot k8s-ci-robot requested a review from claudiubelu May 29, 2026 16:59
@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 29, 2026
@k8s-ci-robot

Contributor

Hi @rzlink. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 29, 2026
@rzlink rzlink force-pushed the fix/hyperv-maxpods-20 branch from ca890e3 to e478e00 Compare May 29, 2026 17:08
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 29, 2026
@zylxjtu

zylxjtu commented May 29, 2026

Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 29, 2026
@rzlink

rzlink commented May 29, 2026

Contributor Author

@zylxjtu Do you think that instead of detecting whether Hyper-V is enabled and setting HYPERV_MAX_PODS, we could just use MAX_PODS and override the value in the CI job? That might be a simpler solution.

@marosset

Contributor

> @zylxjtu Do you think that instead of detecting whether Hyper-V is enabled and setting HYPERV_MAX_PODS, we could just use MAX_PODS and override the value in the CI job? That might be a simpler solution.

+1 to this, let's try and then we can merge this if needed later

@rzlink rzlink force-pushed the fix/hyperv-maxpods-20 branch from e478e00 to a3699b8 Compare May 29, 2026 22:58
@rzlink

rzlink commented May 29, 2026

Contributor Author

Updated per @marosset's review: replaced the Hyper-V auto-default with a generic MAX_PODS env var, default empty. Hyper-V CI jobs can opt in by setting MAX_PODS=20 in their Prow job env — follow-up PR will land in kubernetes/test-infra. PTAL.
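
Because MAX_PODS is now a free-form value supplied by the job environment, a caller might want to sanity-check it before templating. The guard below is a hypothetical sketch and is not part of this PR; it accepts an empty or unset value (no override) or a positive integer.

```shell
# Hypothetical guard, not part of the PR: returns 0 when MAX_PODS is
# acceptable and non-zero otherwise.
validate_max_pods() {
  case "${MAX_PODS:-}" in
    "") return 0 ;;                # empty or unset: no override requested
    *[!0-9]*) return 1 ;;          # any non-digit character is rejected
    *) [ "${MAX_PODS}" -gt 0 ] ;;  # digits only: must be greater than zero
  esac
}
```

A job that opts in with `MAX_PODS=20` would pass this check, while a typo such as `MAX_PODS=2O` would be caught before it reaches the kubelet configuration.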

@rzlink rzlink changed the title from "Cap Windows maxPods to 20 when HYPERV=true" to "capz: add MAX_PODS knob to override kubelet max pods on Windows nodes" May 29, 2026
Replaces the Hyper-V auto-detected HYPERV_MAX_PODS with a generic
MAX_PODS env var. Default is empty (no override); CI jobs that need
a lower cap (e.g. the hyperv-serial-slow Prow jobs, where each UVM
consumes host kernel pool kubelet eviction stats do not see) can set
MAX_PODS in the job env.

Derivation and per-VM-SKU recommendations:
https://github.com/kubernetes-sigs/windows-testing-benchmarks/blob/benchmarks/hyperv-resource-comparison/benchmarks/hyperv-resource-comparison/docs/customer-guidance.md
@rzlink rzlink force-pushed the fix/hyperv-maxpods-20 branch from a3699b8 to b027224 Compare May 29, 2026 23:04
rzlink added a commit to rzlink/test-infra that referenced this pull request May 29, 2026
Each Hyper-V UVM consumes host non-paged kernel pool that kubelet
eviction stats do not see; the default maxPods=110 causes HNS
exhaustion and MemoryPressure cascades on 16 GiB workers, which
shows up as flakes on capz-windows-master-hyperv and
capz-windows-master-hyperv-serial-slow.

Companion to kubernetes-sigs/windows-testing#567 which adds the
generic MAX_PODS knob to run-capz-e2e.sh.
@zylxjtu

zylxjtu commented Jun 1, 2026

Contributor

> @zylxjtu Do you think that instead of detecting whether Hyper-V is enabled and setting HYPERV_MAX_PODS, we could just use MAX_PODS and override the value in the CI job? That might be a simpler solution.

I would still prefer setting this explicitly for Hyper-V; "MAX_PODS" does not seem to be necessarily related to Hyper-V.

@zylxjtu

zylxjtu commented Jun 1, 2026

Contributor

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 1, 2026
@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rzlink, zylxjtu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 1, 2026
@k8s-ci-robot k8s-ci-robot merged commit 7eb26c2 into kubernetes-sigs:master Jun 1, 2026
4 checks passed
@rzlink rzlink deleted the fix/hyperv-maxpods-20 branch June 1, 2026 20:46
rzlink added a commit to rzlink/test-infra that referenced this pull request Jun 3, 2026
…ption specs

Removes 3 entries from the GINKGO_SKIP regex now that MAX_PODS=20
(kubernetes-sigs/windows-testing#567 + test-infra#37141) prevents the
Hyper-V vmmem-overhead-induced MemoryPressure that caused them to be
skipped.

Validated 30/30 PASS on fresh CAPZ Hyper-V (WS2025, 2x D4s_v3,
MAX_PODS=20) — 10 back-to-back rounds, 0 MemoryPressure events.