
Add GREP-377 for Volcano topology-aware scheduling #590

Open: xianlubird wants to merge 1 commit into ai-dynamo:main from xianlubird:feature/volcano-topology-backend-new


@xianlubird xianlubird commented May 8, 2026

Summary

  • Proposes Volcano topology-aware scheduling support for Grove through Volcano HyperNode and PodGroup networkTopology
  • Documents how Grove ClusterTopology is materialized as a Volcano HyperNode tree
  • Defines PodGang to Volcano PodGroup translation with direct subGroupPolicy usage
  • Adds future test expectations for HyperNode sync, drift detection, and topology translation
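
As a rough illustration of the HyperNode materialization described above, here is a minimal sketch that groups nodes by the values of their topology labels and emits one entry per distinct value path. It is not the GREP's implementation: the function name `build_hypernode_tree`, the dict shape, the `zone`/`rack` label keys, and the dash-joined naming scheme are all hypothetical, and it assumes tier 1 is the narrowest level with tiers increasing toward the root.

```python
def build_hypernode_tree(nodes, level_keys):
    """Group nodes into a value-level tree (illustrative only).

    nodes:      {node_name: {label_key: label_value}}
    level_keys: topology label keys ordered broadest to narrowest.
    Returns a list of plain dicts sorted by (tier, name).
    """
    n_levels = len(level_keys)
    tree = {}
    for node_name, labels in sorted(nodes.items()):
        # Assumption: nodes missing any topology label are left out of the tree.
        if any(key not in labels for key in level_keys):
            continue
        path = [labels[key] for key in level_keys]
        for depth in range(n_levels, 0, -1):
            name = "-".join(path[:depth])   # hypothetical naming scheme
            tier = n_levels - depth + 1     # assumed: narrowest level is tier 1
            entry = tree.setdefault(name, {
                "name": name,
                "tier": tier,
                "member_type": "Node" if tier == 1 else "HyperNode",
                "members": set(),
            })
            # Leaf entries hold nodes; higher tiers hold the entry one level below.
            child = node_name if depth == n_levels else "-".join(path[:depth + 1])
            entry["members"].add(child)
    result = [dict(e, members=sorted(e["members"])) for e in tree.values()]
    return sorted(result, key=lambda e: (e["tier"], e["name"]))


if __name__ == "__main__":
    demo_nodes = {
        "n1": {"zone": "z1", "rack": "r1"},
        "n2": {"zone": "z1", "rack": "r1"},
        "n3": {"zone": "z1", "rack": "r2"},
    }
    for entry in build_hypernode_tree(demo_nodes, ["zone", "rack"]):
        print(entry)
```

Because the output is deterministic for a given set of node labels, a drift check could in principle compare it with the objects currently in the cluster, though how the GREP detects drift is not specified here.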

Key design decisions

  • Implement TopologyAwareSchedBackend in the Volcano backend
  • Use Volcano HyperNode as the scheduler backend topology resource
  • Translate Grove topology constraints to Volcano highestTierAllowed
  • Generate a value-level HyperNode tree from Node topology labels
  • Keep Grove PodGang as the portable scheduler API and translate to Volcano PodGroup

Fixes #571

Signed-off-by: xianlubird <xianlubird@gmail.com>

copy-pr-bot Bot commented May 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@xianlubird xianlubird changed the title docs: add GREP-377 for Volcano topology-aware scheduling Add GREP-377 for Volcano topology-aware scheduling May 8, 2026

Development

Successfully merging this pull request may close these issues.

Add volcano scheduler support as a backend in Grove
