docs(grep): add GREP-531 Workload API gang scheduling for default-scheduler #1
yankay wants to merge 1 commit into
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 62ed9d1a79
> workloads. Upstream `scheduling.k8s.io` (alpha in 1.34+) closes that
> gap, and Grove's `PCS → PCSG → role` hierarchy is a natural consumer
Correct the Kubernetes version prerequisite
This states the upstream Workload/Gang Scheduling APIs are available in Kubernetes 1.34+, but the current Kubernetes docs list both the Workload API and Gang Scheduling as Kubernetes v1.35 [alpha]. Since Grove's local e2e defaults are still 1.34.x, this would send implementers or users toward a cluster version where the API cannot be enabled or discovered.
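One way to make that prerequisite checkable is sketched below: probe the cluster for the `scheduling.k8s.io/v1alpha1` group before enabling the gang-scheduling path, so users on clusters that cannot serve the Workload API get a clear signal up front. This code is not part of the PR, and the group/version string and kubeconfig location are assumptions.

```go
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load a rest.Config from the default kubeconfig location (assumption for this sketch).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// Returns an error when the group/version is not served, e.g. on clusters
	// older than v1.35 or with the relevant feature gates disabled.
	if _, err := dc.ServerResourcesForGroupVersion("scheduling.k8s.io/v1alpha1"); err != nil {
		fmt.Println("Workload API not available:", err)
		return
	}
	fmt.Println("scheduling.k8s.io/v1alpha1 is served; gang scheduling can be enabled")
}
```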
> The upstream `WorkloadScheduling` / `GenericWorkload` feature gate on
> `kube-apiserver` and `kube-scheduler` is an install-time prerequisite
Require the actual gang scheduling feature gates
For clusters following these prerequisites, `WorkloadScheduling` is not a Kubernetes feature gate, and enabling only `GenericWorkload` is not enough for all-or-nothing admission. The official feature-gate reference lists `GenericWorkload` and `GangScheduling`, and the gang scheduling docs describe the `GangScheduling` scheduler plugin as the component that enforces the policy. The user docs and implementation plan should therefore name `GangScheduling` instead of `WorkloadScheduling` and include it for the scheduler.
Code Review
This pull request introduces support for multiple ClusterTopology resources and refines the topology-aware scheduling model. Key changes include moving ClusterTopology validation to a new validating webhook, implementing a dedicated reconciler for topology drift detection, and refactoring MNNVL group resolution to follow a hierarchical pattern (clique -> scaling group -> set). The deletion logic for PodCliqueSet and its children has been optimized to leverage Kubernetes garbage collection, and performance improvements were made to the controller's lookup logic using sets and maps. The feedback suggests improving context propagation in validation handlers and addressing a potential performance bottleneck in the PodCliqueSet watch registration logic.
I am having trouble creating individual review comments; my feedback is listed below.
operator/internal/webhook/admission/pcs/validation/podcliqueset.go (694)
Update this method to accept a context.Context parameter. This allows the internal call to validateTopologyConstraintsOnCreate to use the actual request context instead of context.Background().
func (v *pcsValidator) validateTopologyConstraintsUpdate(ctx context.Context, oldPCS *grovecorev1alpha1.PodCliqueSet) field.ErrorList {
operator/internal/webhook/admission/pcs/validation/podcliqueset.go (706)
Pass the received ctx instead of context.Background() to ensure consistent context propagation during the legacy repair validation path.
return v.validateTopologyConstraintsOnCreate(ctx)
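Combining the two suggestions, a fragment-level sketch of the corrected update path; the method body and surrounding types are assumed, and only the signature and the final return line come from the comments above.

```go
// Sketch only: thread the admission request's context through the update path.
func (v *pcsValidator) validateTopologyConstraintsUpdate(ctx context.Context, oldPCS *grovecorev1alpha1.PodCliqueSet) field.ErrorList {
	// ... compare old and new topology constraints (body elided) ...

	// Legacy repair path: reuse the caller's ctx rather than a detached context.Background().
	return v.validateTopologyConstraintsOnCreate(ctx)
}
```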
operator/internal/controller/podcliqueset/register.go (105)
This List operation fetches all PodCliqueSet resources in the cluster whenever any ClusterTopology is updated. While ClusterTopology changes are infrequent, this could become a performance bottleneck in clusters with a very large number of PodCliqueSet objects. Consider using a field selector or an index if the number of workloads scales significantly.
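If that list ever becomes a hot path, a controller-runtime field index is one option. The sketch below is illustrative only: the index key, the `ClusterTopologyName` field, the Grove API import path, and the helper names are assumptions, not code from this PR.

```go
package podcliqueset

import (
	"context"

	grovecorev1alpha1 "github.com/ai-dynamo/grove/operator/api/core/v1alpha1" // import path assumed

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// Hypothetical index key over a hypothetical spec field naming the referenced ClusterTopology.
const topologyNameIndex = "spec.clusterTopologyName"

// setupTopologyIndex registers the field index with the manager's cache.
func setupTopologyIndex(ctx context.Context, mgr manager.Manager) error {
	return mgr.GetFieldIndexer().IndexField(ctx, &grovecorev1alpha1.PodCliqueSet{}, topologyNameIndex,
		func(obj client.Object) []string {
			pcs := obj.(*grovecorev1alpha1.PodCliqueSet)
			return []string{pcs.Spec.ClusterTopologyName} // hypothetical field
		})
}

// podCliqueSetsForTopology lists only the PodCliqueSets referencing the changed
// topology, instead of every PodCliqueSet in the cluster on each ClusterTopology event.
func podCliqueSetsForTopology(ctx context.Context, c client.Client, topologyName string) ([]grovecorev1alpha1.PodCliqueSet, error) {
	var list grovecorev1alpha1.PodCliqueSetList
	if err := c.List(ctx, &list, client.MatchingFields{topologyNameIndex: topologyName}); err != nil {
		return nil, err
	}
	return list.Items, nil
}
```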
…eduler backend

Introduces GREP-531: Workload API Gang Scheduling for the default-scheduler backend.

Key design decisions:
- Phase 1 (Kubernetes 1.35, v1alpha1): flat Workload/PodGroup mapping, Pod.Spec.WorkloadRef membership, immutable gang shape with recreate-workload semantics.
- Phase 2 (Kubernetes 1.36+, conditional on KEP-5832): decoupled standalone PodGroup, Pod.Spec.SchedulingGroup membership.
- Phase 3 (Kubernetes 1.37+): CompositePodGroup hierarchical gang, TAS.

Also renames the directory from 531-kube-scheduler-workload-gang to 531-default-scheduler-workload-gang to match the backend name.

Relates to ai-dynamo#531, ai-dynamo#395

Signed-off-by: Kay Yan <kay.yan@daocloud.io>
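As a rough illustration of the Phase 1 "flat" mapping described above, the toy sketch below collapses all roles of a PodCliqueSet into a single gang with one minimum count; every type and name in it is invented for illustration and is neither the upstream Workload API nor Grove code.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real objects.
type role struct {
	name     string
	replicas int
}

type flatPodGroup struct {
	name     string
	minCount int // Phase 1: the gang shape is immutable; changing it recreates the Workload
}

// flattenGang collapses the PCS -> PCSG -> role hierarchy into one PodGroup,
// mirroring the "flat Workload/PodGroup mapping" of Phase 1.
func flattenGang(pcsName string, roles []role) flatPodGroup {
	total := 0
	for _, r := range roles {
		total += r.replicas
	}
	return flatPodGroup{name: pcsName, minCount: total}
}

func main() {
	pg := flattenGang("inference-pcs", []role{{"prefill", 4}, {"decode", 8}})
	fmt.Printf("%s: gang of %d pods must schedule together\n", pg.name, pg.minCount)
}
```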
Force-pushed from 4f33495 to 5a8035a.
What type of PR is this?
/kind documentation
/kind feature
What this PR does / why we need it:
Adds GREP-531: Workload API Gang Scheduling for default-scheduler Backend under
`docs/proposals/531-kube-scheduler-workload-gang/`. The proposal lands upstream
`scheduling.k8s.io` Workload / PodGroup gang admission inside Grove's existing
`default-scheduler` backend (introduced in GREP-375). Highlights:

- `KubeSchedulerConfig.GangScheduling` field (sketched below): no new backend, no framework change, no new user-facing API.
- `PCS → PCSG → role` hierarchy maps onto a native upstream hierarchical gang tree, the primary differentiator from LWS.
- `Workload` lifecycle without new API surface.
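For illustration only, a hypothetical sketch of how the opt-in field above might be shaped; the type and field names (`GangSchedulingConfig`, `Enabled`, the `json` tags) are assumptions drawn from this description, not the API defined in the GREP.

```go
// Hypothetical API sketch only; the real shape is defined in the GREP document.
package v1alpha1

// KubeSchedulerConfig configures Grove's existing default-scheduler backend.
type KubeSchedulerConfig struct {
	// GangScheduling, when non-nil, opts the backend into upstream
	// scheduling.k8s.io Workload/PodGroup gang admission. Nil keeps today's
	// behaviour: no new backend, no framework change, no new user-facing API.
	GangScheduling *GangSchedulingConfig `json:"gangScheduling,omitempty"`
}

// GangSchedulingConfig carries the gang-scheduling opt-in settings.
type GangSchedulingConfig struct {
	// Enabled toggles creation of a Workload per PodCliqueSet.
	Enabled bool `json:"enabled"`
}
```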
This is a `docs/`-only change (one new file). Implementation will follow in a separate PR; existing draft #532 predates this GREP and will be revisited against the design here.

Which issue(s) this PR fixes:
Refs ai-dynamo#531
Umbrella: ai-dynamo#395
Special notes for your reviewer:
- ai-dynamo/grove `docs/proposals/<id>/README.md` paths, since those directories do not yet exist on `main`.
- `make update-toc` and verified by `make verify-toc`.

Does this PR introduce an API change?
```release-note
NONE
```
Additional documentation e.g., enhancement proposals, usage docs, etc.:
```docs
docs/proposals/531-kube-scheduler-workload-gang/README.md
```