
capz-windows-master-hyperv-serial-slow: re-enable 3 GC/SchedulerPreemption specs #37174

Merged
k8s-ci-robot merged 1 commit into kubernetes:master from rzlink:davwei/capz-hyperv-serial-slow-reenable-3-gc-sp-specs on Jun 3, 2026

rzlink commented Jun 3, 2026

Contributor

What this PR does

Re-enables 3 previously-skipped Kubernetes e2e specs on the
capz-windows-master-hyperv-serial-slow testgrid dashboard:

  • [sig-api-machinery] Garbage collector should not delete dependents that have both valid owner and owner that's waiting for dependents to be deleted [Serial] [Conformance]
  • [sig-api-machinery] Garbage collector should orphan pods created by rc if delete options say so [Serial] [Conformance]
  • [sig-scheduling] SchedulerPreemption [Serial] validates pod disruption condition is added to the preempted pod [Conformance]

These were originally skipped because they consistently OOM'd Windows worker nodes under Hyper-V isolation due to per-pod vmmem.exe overhead exceeding the kubelet's hard-eviction threshold on 16 GiB D4s_v3 workers.

Why this is safe now

The root cause was excessive Hyper-V pod density: with the default --max-pods=110, the e2e helper estimateMaximumPods(min=10, max=100) would request up to 100 pods per node, and each pod added roughly 500-700 MiB of vmmem host overhead that is invisible to pod-level kubelet stats.

The MAX_PODS=20 cap added in kubernetes-sigs/windows-testing#567 and test-infra#37141
reduces this to at most 20 pods/node, keeping the 3 specs comfortably inside available memory.
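
As a rough check of the arithmetic above, the sketch below multiplies a pod count by the ~500-700 MiB per-pod vmmem figure and compares the result with a 16 GiB worker. It is illustrative only: the function and constant names are made up for this example, and it treats the pod cap as a worst-case bound, not as the number of pods the 3 specs actually create.

```python
# Back-of-the-envelope estimate of host-side vmmem overhead per node.
# All names here are hypothetical; only the numbers come from the PR text.

MIB_PER_GIB = 1024
NODE_MEMORY_GIB = 16  # D4s_v3 worker size quoted in the description


def vmmem_overhead_gib(pods, per_pod_mib):
    """Overhead in GiB if `pods` Hyper-V isolated pods each cost `per_pod_mib`."""
    return pods * per_pod_mib / MIB_PER_GIB


def overhead_range_gib(pods, low_mib=500, high_mib=700):
    """(low, high) overhead in GiB for the ~500-700 MiB per-pod range."""
    return vmmem_overhead_gib(pods, low_mib), vmmem_overhead_gib(pods, high_mib)


for pods in (100, 20):  # old effective ceiling vs. the MAX_PODS=20 cap
    low, high = overhead_range_gib(pods)
    fits = high < NODE_MEMORY_GIB
    print(f"{pods:>3} pods: {low:.1f}-{high:.1f} GiB overhead, "
          f"fits in {NODE_MEMORY_GIB} GiB: {fits}")
```

Under these assumptions, 100 pods imply roughly 49-68 GiB of overhead, several times the node's memory, while 20 pods imply roughly 10-14 GiB, which stays below 16 GiB.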

/sig windows
/area provider/azure
/cc @marosset @zylxjtu

…ption specs

Removes 3 entries from the GINKGO_SKIP regex now that MAX_PODS=20
(kubernetes-sigs/windows-testing#567 + test-infra#37141) prevents the
Hyper-V vmmem-overhead-induced MemoryPressure that caused them to be
skipped.

Validated 30/30 PASS on fresh CAPZ Hyper-V (WS2025, 2x D4s_v3,
MAX_PODS=20) — 10 back-to-back rounds, 0 MemoryPressure events.
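
To illustrate what removing entries from a skip regex does, here is a minimal sketch. It assumes the skip pattern is an alternation that is matched, unanchored, against the full spec name. The real GINKGO_SKIP value is not shown in this PR, so the entries below (including the `[Flaky]` one) are hypothetical, and Python's `re` stands in for the Go regexp engine that Ginkgo uses.

```python
import re

# The three spec names listed in the PR description.
REENABLED_SPECS = [
    "[sig-api-machinery] Garbage collector should not delete dependents that "
    "have both valid owner and owner that's waiting for dependents to be "
    "deleted [Serial] [Conformance]",
    "[sig-api-machinery] Garbage collector should orphan pods created by rc "
    "if delete options say so [Serial] [Conformance]",
    "[sig-scheduling] SchedulerPreemption [Serial] validates pod disruption "
    "condition is added to the preempted pod [Conformance]",
]

# Hypothetical skip entries; the actual job config may look different.
ENTRIES_BEFORE = [
    r"\[Flaky\]",
    r"Garbage collector should not delete dependents that have both valid owner",
    r"Garbage collector should orphan pods created by rc",
    r"validates pod disruption condition is added to the preempted pod",
]
# After the change, the three spec-specific entries are gone.
ENTRIES_AFTER = ENTRIES_BEFORE[:1]


def is_skipped(entries, spec_name):
    """True if any entry of the alternation matches somewhere in spec_name."""
    skip = re.compile("|".join(entries))
    return skip.search(spec_name) is not None


for spec in REENABLED_SPECS:
    print(is_skipped(ENTRIES_BEFORE, spec), is_skipped(ENTRIES_AFTER, spec))
```

With the hypothetical entries above, each of the three specs is skipped before the change and selected after it, while a spec tagged `[Flaky]` would still be skipped.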
@k8s-ci-robot k8s-ci-robot requested review from marosset and zylxjtu June 3, 2026 01:58
@k8s-ci-robot k8s-ci-robot added sig/windows Categorizes an issue or PR as relevant to SIG Windows. area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 3, 2026
@k8s-ci-robot

Contributor

Hi @rzlink. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added area/config Issues or PRs related to code in /config needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. area/jobs sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jun 3, 2026
marosset commented Jun 3, 2026

Contributor

/ok-to-test
/approve
/hold for CI test

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 3, 2026
@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: marosset, rzlink

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 3, 2026
marosset commented Jun 3, 2026

Contributor

/hold cancel
We don't have a CI test for the Hyper-V + serial-slow configuration.
/lgtm

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 3, 2026
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 3, 2026
@k8s-ci-robot k8s-ci-robot merged commit dc9ad0e into kubernetes:master Jun 3, 2026
6 checks passed
@k8s-ci-robot

Contributor

@rzlink: Updated the job-config configmap in namespace default at cluster test-infra-trusted using the following files:

  • key release-master-windows.yaml using file config/jobs/kubernetes-sigs/sig-windows/release-master-windows.yaml

@rzlink rzlink deleted the davwei/capz-hyperv-serial-slow-reenable-3-gc-sp-specs branch June 3, 2026 21:42