
Retry KWOK Stage CRD wait #612

Merged
shayasoolin merged 1 commit into ai-dynamo:main from shayasoolin:fix-kwok-stage-crd-wait
May 13, 2026

@shayasoolin
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

The scale-test CI hit a transient kubectl wait failure while waiting for the KWOK Stage CRD:

.status.conditions accessor error: <nil> is of the type <nil>, expected []interface{}

This PR keeps using kubectl wait, but runs it in short retry loops bounded by the existing overall timeout. A single transient failure therefore costs one short attempt, and the CRD gets time to finish populating its status and discovery information before setup gives up.
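A minimal sketch of the retry pattern described above, written in Python for illustration only; it is not the code merged in this PR. The helper names (`kubectl_wait_once`, `wait_with_retries`), the CRD name `stages.kwok.x-k8s.io`, the `Established` condition, and the 10 s per-attempt and 2 s pause values are all assumptions chosen for the example.

```python
import subprocess
import time
from typing import Callable

# Assumption: the Stage CRD name and the Established condition are
# illustrative; the PR does not spell out the exact kubectl arguments.
CRD_NAME = "crd/stages.kwok.x-k8s.io"


def kubectl_wait_once(attempt_timeout_s: int) -> bool:
    """Hypothetical helper: run one short `kubectl wait`, True on exit code 0."""
    cmd = [
        "kubectl", "wait",
        "--for=condition=Established",
        CRD_NAME,
        f"--timeout={attempt_timeout_s}s",
    ]
    return subprocess.run(cmd, capture_output=True).returncode == 0


def wait_with_retries(
    attempt: Callable[[int], bool],
    total_timeout_s: float,
    attempt_timeout_s: int = 10,
    pause_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical helper: retry short attempts until one succeeds or the
    overall deadline passes. A transient error (e.g. .status.conditions not
    yet populated) only fails one attempt instead of the whole setup."""
    deadline = clock() + total_timeout_s
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Cap each attempt so it cannot run far past the overall deadline.
        per_try = max(1, min(attempt_timeout_s, int(remaining)))
        if attempt(per_try):
            return True
        sleep(pause_s)  # brief pause before the next short wait
```

Under these assumptions, a setup step would call `wait_with_retries(kubectl_wait_once, total_timeout_s=300)` and fail only if no attempt succeeds within the overall budget. Injecting `clock` and `sleep` keeps the loop testable without a cluster.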

Which issue(s) this PR fixes:

Fixes #550

Special notes for your reviewer:

This is a follow-up to #611. The previous PR made stage-fast.yaml mandatory; this one makes the CRD readiness wait tolerate the short window where the CRD exists but .status.conditions is not populated yet.

Does this PR introduce an API change?

NONE

Additional documentation e.g., enhancement proposals, usage docs, etc.:

NONE

copy-pr-bot Bot commented May 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

shayasoolin merged commit 27ca34c into ai-dynamo:main on May 13, 2026
27 of 29 checks passed

Development

Successfully merging this pull request may close these issues.

Scheduled CI: scale-test trend tracking and comparison

3 participants