
chore: upgrade Kubernetes dependencies and local images to 1.35 #603

Open
yankay wants to merge 1 commit into ai-dynamo:main from yankay:chore/602-upgrade-k8s-1.35

Conversation

@yankay (Contributor) commented May 12, 2026

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Upgrades Grove's Kubernetes dependency baseline from 1.34 to 1.35.

- `k8s.io/*`: `v0.34.3` -> `v0.35.4`
- `sigs.k8s.io/controller-runtime`: `v0.22.4` -> `v0.23.3`
- `kindest/node`: `v1.34.3` -> `v1.35.1`
- `rancher/k3s`: `v1.34.2-k3s1` -> `v1.35.4-k3s1`
- `k8s.io/kubelet` replace: `v0.34.2` -> `v0.35.4`

Generated code refreshed via `make generate` and `make generate-api-docs`.
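In `go.mod` terms, the bump amounts to roughly the following sketch. The versions are taken from the list above; the specific `k8s.io/*` staging modules shown are illustrative assumptions, since the real diff touches many more of them:

```
// Sketch only: representative require lines after the bump.
require (
    k8s.io/api v0.35.4
    k8s.io/apimachinery v0.35.4
    k8s.io/client-go v0.35.4
    sigs.k8s.io/controller-runtime v0.23.3
)

// The kubelet replace target moves to the same baseline.
replace k8s.io/kubelet => k8s.io/kubelet v0.35.4
```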

Which issue(s) this PR fixes:

Fixes #602

Special notes for your reviewer:

Local: go build ./... and go test ./... (with envtest on Kubernetes v1.35.0) pass across all modules.

Related design PR: #605

Does this PR introduce an API change?

The following dependencies are updated:
- `k8s.io/*`: `v0.34.3` -> `v0.35.4`
- `sigs.k8s.io/controller-runtime`: `v0.22.4` -> `v0.23.3`
- `kindest/node`: `v1.34.3` -> `v1.35.1`
- `rancher/k3s`: `v1.34.2-k3s1` -> `v1.35.4-k3s1`

Additional documentation e.g., enhancement proposals, usage docs, etc.:

NONE

Upgrade Grove's Kubernetes dependency baseline from 1.34 to 1.35:

* `k8s.io/*`: `v0.34.3` -> `v0.35.4`
* `sigs.k8s.io/controller-runtime`: `v0.22.4` -> `v0.23.3`
* `kindest/node`: `v1.34.3` -> `v1.35.1`
* `rancher/k3s`: `v1.34.2-k3s1` -> `v1.35.4-k3s1`
* `k8s.io/kubelet` replace target: `v0.34.2` -> `v0.35.4`

Regenerates client and CRD code via `make generate`. The client-go 1.35
generator now wraps informer list/watch calls with
`cache.ToListWatcherWithWatchListSemantics` and adds an
`IsWatchListSemanticsUnSupported` method on the fake clientset.

`go mod tidy` also drops `github.com/ai-dynamo/grove/operator/client`
from the operator module's direct dependencies; it was a stale, unused
entry.

Fixes ai-dynamo#602

Signed-off-by: Kay Yan <kay.yan@daocloud.io>
@copy-pr-bot (Bot) commented May 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yankay (Contributor, Author) commented May 12, 2026

CI status: blocked on KAI-Scheduler upstream

All 7 failing E2E jobs share the same root cause, and it is not the kind/k3d 1.35 image bump itself. The k3d cluster (`rancher/k3s:v1.35.4-k3s1`) comes up fine; the failure happens later, when `make run-e2e` tries to compile the Go e2e test binary:

```
# github.com/kai-scheduler/KAI-scheduler/pkg/apis/scheduling/v2alpha2
.../kai-scheduler@v0.14.0/pkg/apis/scheduling/v2alpha2/podgroup_webhook.go:18:34:
  not enough arguments in call to ctrl.NewWebhookManagedBy
    have (controllerruntime.Manager)
    want (manager.Manager, T)
FAIL  github.com/ai-dynamo/grove/operator/e2e/tests [build failed]
make[1]: *** [Makefile:126: run-e2e] Error 1
```

controller-runtime v0.23 made `ctrl.NewWebhookManagedBy` generic (`mgr`, `obj`). `operator/e2e/tests` transitively imports `github.com/kai-scheduler/KAI-scheduler v0.14.0`, which still uses the v0.22 signature, so the e2e binary no longer compiles once we bump controller-runtime.
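The breakage pattern is the usual one when a function gains a type parameter. A toy sketch with stand-in types (not controller-runtime's real API) shows why every one-argument call site stops compiling:

```go
package main

import "fmt"

// Toy stand-ins; controller-runtime's real types are far richer.
type Manager struct{ Name string }

type WebhookBuilder[T any] struct {
	mgr Manager
	obj T
}

// v0.23-style constructor per the discussion above: the watched object is
// now a required second (generic) argument. A v0.22-style call,
// NewWebhookManagedBy(mgr), fails with "not enough arguments in call".
func NewWebhookManagedBy[T any](mgr Manager, obj T) *WebhookBuilder[T] {
	return &WebhookBuilder[T]{mgr: mgr, obj: obj}
}

type PodGroup struct{ Name string }

func main() {
	b := NewWebhookManagedBy(Manager{Name: "mgr"}, &PodGroup{Name: "pg"})
	fmt.Println(b.obj.Name) // pg
}
```

Because the extra parameter changes the function's arity, the fix has to land in KAI-Scheduler's own source; no downstream flag can paper over it.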

KAI-Scheduler side

The fix is already merged on the main branch of kai-scheduler/KAI-Scheduler.

But no released tag contains it yet — the latest tags v0.14.0 / v0.14.1 / v0.14.2 and the v0.14 release branch are all still on controller-runtime v0.22.3. The change is sitting in CHANGELOG.md under [Unreleased].

Proposed path

Two options, in order of preference:

  1. Wait for the next KAI-Scheduler minor release (presumably v0.15.0) and bump to that here. There is no public ETA yet — happy to ping upstream to ask.
  2. Temporarily pin to kai-scheduler/KAI-Scheduler main via go get github.com/kai-scheduler/KAI-scheduler@1b591f419a01 so this PR is not held up, with a follow-up to swap back to a proper semver once v0.15.0 is cut.
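For option 2, the temporary pin would land in `go.mod` roughly as sketched below. The pseudo-version shown is a placeholder, not a real resolved version; `go get ...@1b591f419a01` computes the actual one from the commit's date:

```
// Sketch only: go get rewrites the require line to a pseudo-version of
// the form vX.Y.Z-0.<commit-timestamp>-<commit-hash>, e.g.:
require github.com/kai-scheduler/KAI-scheduler v0.14.1-0.<commit-timestamp>-1b591f419a01
```

The follow-up swap back to `v0.15.0` would then be a one-line version change with no replace directives to unwind.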

Marking the PR as pending upstream while we decide. Suggestions welcome.

@danbar2 (Contributor) commented May 12, 2026


Thanks for raising this PR!
Option 1 is the correct one; we should wait for KAI to upgrade as well.
cc: @enoodle
