
docs(grep): add GREP-531 Workload API gang scheduling for default-scheduler #1

Closed

yankay wants to merge 1 commit into main from grep/531-kube-scheduler-workload-gang

docs(grep): add GREP-531 Workload API gang scheduling for default-scheduler#1
yankay wants to merge 1 commit into
mainfrom
grep/531-kube-scheduler-workload-gang

Conversation

@yankay (Owner) commented May 12, 2026

What type of PR is this?

/kind documentation
/kind feature

What this PR does / why we need it:

Adds GREP-531: Workload API Gang Scheduling for default-scheduler Backend under docs/proposals/531-kube-scheduler-workload-gang/.

The proposal lands upstream scheduling.k8s.io Workload / PodGroup gang admission inside Grove's existing default-scheduler backend (introduced in GREP-375). Highlights:

  • Opt-in only via the existing KubeSchedulerConfig.GangScheduling field. No new backend, no framework change, no new user-facing API.
  • API version strategy: target v1alpha2 semantics (KEP-5832, aligning with LWS KEP-666); v1alpha1 only as a temporary fallback.
  • Forward-compatible with hierarchical gang (KEP-6012 CompositePodGroup) — Grove's PCS → PCSG → role hierarchy maps onto a native upstream hierarchical gang tree, the primary differentiator from LWS.
  • Escape hatch for external owners (Kueue, cross-PCS gang controllers) to take over Workload lifecycle without new API surface.
  • API discovery via cached RESTMapper so a missing upstream API yields a clear admission error instead of silent degradation (see the sketch after this list).
  • Satisfies the Beta criterion already recorded in GREP-375 ("Kube backend support for advanced community features (e.g. Workload API for gang scheduling) as they become available in Kubernetes.").
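
To illustrate the discovery bullet above, a minimal Go sketch, assuming a cached RESTMapper (e.g. from controller-runtime); `workloadAPIAvailable` and the error wording are illustrative, not from the proposal, and the group/kind targets the upstream alpha API:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// workloadAPIAvailable asks the (cached) RESTMapper whether the upstream
// Workload API is served, so gang admission can fail with an actionable
// error instead of degrading silently.
func workloadAPIAvailable(mapper meta.RESTMapper) error {
	gk := schema.GroupKind{Group: "scheduling.k8s.io", Kind: "Workload"}
	if _, err := mapper.RESTMapping(gk); err != nil {
		if meta.IsNoMatchError(err) {
			return fmt.Errorf("gang scheduling requested but %s is not served; enable the upstream feature gates", gk)
		}
		return err // transient discovery failure; callers may retry
	}
	return nil
}
```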

This is a docs/-only change (one new file). Implementation will follow in a separate PR; existing draft #532 predates this GREP and will be revisited against the design here.

Which issue(s) this PR fixes:

Refs ai-dynamo#531
Umbrella: ai-dynamo#395

Special notes for your reviewer:

Does this PR introduce an API change?

```release-note
NONE
```

Additional documentation e.g., enhancement proposals, usage docs, etc.:

```docs
docs/proposals/531-kube-scheduler-workload-gang/README.md
```

@yankay closed this May 12, 2026
@yankay reopened this May 12, 2026

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 62ed9d1a79


Comment on lines +46 to +47
workloads. Upstream `scheduling.k8s.io` (alpha in 1.34+) closes that
gap, and Grove's `PCS → PCSG → role` hierarchy is a natural consumer

P2: Correct the Kubernetes version prerequisite

This states the upstream Workload/Gang Scheduling APIs are available in Kubernetes 1.34+, but the current Kubernetes docs list both the Workload API and Gang Scheduling as Kubernetes v1.35 [alpha]. Since Grove's local e2e defaults are still 1.34.x, this would send implementers or users toward a cluster version where the API cannot be enabled or discovered.


Comment on lines +229 to +230
The upstream `WorkloadScheduling` / `GenericWorkload` feature gate on
`kube-apiserver` and `kube-scheduler` is an install-time prerequisite

P2: Require the actual gang scheduling feature gates

For clusters following these prerequisites, WorkloadScheduling is not a Kubernetes feature gate, and enabling only GenericWorkload is not enough for all-or-nothing admission. The official feature gate reference lists GenericWorkload and GangScheduling, and the gang scheduling docs describe the GangScheduling scheduler plugin as the component that enforces the policy, so the user docs/implementation plan should name GangScheduling instead of WorkloadScheduling and include it for the scheduler.
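
Concretely, the corrected install-time prerequisite would look like the following (gate names taken from this comment; availability depends on the cluster version):

```
kube-apiserver:  --feature-gates=GenericWorkload=true
kube-scheduler:  --feature-gates=GenericWorkload=true,GangScheduling=true
```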



@gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces support for multiple ClusterTopology resources and refines the topology-aware scheduling model. Key changes include moving ClusterTopology validation to a new validating webhook, implementing a dedicated reconciler for topology drift detection, and refactoring MNNVL group resolution to follow a hierarchical pattern (clique -> scaling group -> set). The deletion logic for PodCliqueSet and its children has been optimized to leverage Kubernetes garbage collection, and performance improvements were made to the controller's lookup logic using sets and maps. The feedback below suggests improving context propagation in validation handlers and flags a potential performance bottleneck in the PodCliqueSet watch registration logic.

Individual review comments could not be created; the feedback is inlined below.

operator/internal/webhook/admission/pcs/validation/podcliqueset.go (694)

medium

Update this method to accept a context.Context parameter. This allows the internal call to validateTopologyConstraintsOnCreate to use the actual request context instead of context.Background().

```go
func (v *pcsValidator) validateTopologyConstraintsUpdate(ctx context.Context, oldPCS *grovecorev1alpha1.PodCliqueSet) field.ErrorList {
```

operator/internal/webhook/admission/pcs/validation/podcliqueset.go (706)

medium

Pass the received ctx instead of context.Background() to ensure consistent context propagation during the legacy repair validation path.

```go
return v.validateTopologyConstraintsOnCreate(ctx)
```
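
Taken together, the two suggestions thread the request context through the update path. A minimal sketch, assuming the signatures quoted above; `legacyRepairNeeded` is a hypothetical stand-in for the surrounding guard logic:

```go
func (v *pcsValidator) validateTopologyConstraintsUpdate(ctx context.Context, oldPCS *grovecorev1alpha1.PodCliqueSet) field.ErrorList {
	// Legacy repair path: re-run the create-time validation with the
	// request ctx so deadlines and cancellation propagate, rather than
	// being dropped via context.Background().
	if legacyRepairNeeded(oldPCS) { // hypothetical guard
		return v.validateTopologyConstraintsOnCreate(ctx)
	}
	return nil
}
```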

operator/internal/controller/podcliqueset/register.go (105)

medium

This List operation fetches all PodCliqueSet resources in the cluster whenever any ClusterTopology is updated. While ClusterTopology changes are infrequent, this could become a performance bottleneck in clusters with a very large number of PodCliqueSet objects. Consider using a field selector or an index if the number of workloads scales significantly.
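
A controller-runtime field index would avoid the cluster-wide List. A sketch under the assumption that PodCliqueSet carries a hypothetical `spec.clusterTopologyName` reference:

```go
// At manager setup: index PodCliqueSets by the ClusterTopology they reference.
if err := mgr.GetFieldIndexer().IndexField(ctx, &grovecorev1alpha1.PodCliqueSet{},
	"spec.clusterTopologyName", func(obj client.Object) []string {
		pcs := obj.(*grovecorev1alpha1.PodCliqueSet)
		return []string{pcs.Spec.ClusterTopologyName} // hypothetical field
	}); err != nil {
	return err
}

// In the watch handler: list only the PodCliqueSets that reference the
// changed ClusterTopology instead of every PodCliqueSet in the cluster.
var affected grovecorev1alpha1.PodCliqueSetList
if err := c.List(ctx, &affected,
	client.MatchingFields{"spec.clusterTopologyName": topology.Name}); err != nil {
	return err
}
```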

…eduler backend

Introduces GREP-531: Workload API Gang Scheduling for the default-scheduler
backend. Key design decisions:

- Phase 1 (Kubernetes 1.35, v1alpha1): flat Workload/PodGroup mapping,
  Pod.Spec.WorkloadRef membership, immutable gang shape with recreate-workload
  semantics.
- Phase 2 (Kubernetes 1.36+, conditional on KEP-5832): decoupled standalone
  PodGroup, Pod.Spec.SchedulingGroup membership.
- Phase 3 (Kubernetes 1.37+): CompositePodGroup hierarchical gang, TAS.

Also renames the directory from 531-kube-scheduler-workload-gang to
531-default-scheduler-workload-gang to match the backend name.

Relates to ai-dynamo#531, ai-dynamo#395

Signed-off-by: Kay Yan <kay.yan@daocloud.io>
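
To make the Phase 1 flat mapping concrete, a sketch using unstructured objects (field names such as `podGroups` and `minCount` paraphrase the upstream alpha API and should be verified against the `scheduling.k8s.io` version actually served; `flatWorkload` is an illustrative name):

```go
package workload

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// flatWorkload builds a Phase 1 style Workload: a single gang-scoped
// PodGroup covering the whole PodCliqueSet. Unstructured avoids pinning
// to typed clients while the upstream API is still alpha.
func flatWorkload(pcsName, namespace string, gangSize int64) (*unstructured.Unstructured, error) {
	w := &unstructured.Unstructured{}
	w.SetAPIVersion("scheduling.k8s.io/v1alpha1")
	w.SetKind("Workload")
	w.SetNamespace(namespace)
	w.SetName(pcsName)
	err := unstructured.SetNestedSlice(w.Object, []interface{}{
		map[string]interface{}{
			"name": "gang", // one flat group in Phase 1
			"policy": map[string]interface{}{
				// all-or-nothing: schedulable only if minCount pods fit together
				"gang": map[string]interface{}{"minCount": gangSize},
			},
		},
	}, "spec", "podGroups")
	return w, err
}
```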
@yankay force-pushed the grep/531-kube-scheduler-workload-gang branch from 4f33495 to 5a8035a on May 12, 2026 08:54
@yankay closed this May 12, 2026