Enhancement Description
For high-density hardware (multi-tenant SmartNICs, MIG GPUs, and FPGAs), total capacity is often not a static property but a variable determined by the context of the first pod scheduled onto the device (e.g., a specific subnet or partition profile). KEP-5075 handles the accounting, but node-side ResourceSlice updates currently suffer from an "Informer Consistency Gap" during high-concurrency bursts: the device can be physically over-subscribed before the scheduler's cache reconciles with the driver.
Adding a scheduler-decodable generic schema under OpaqueDeviceConfiguration would address this. By embedding a synchronous CapacityHint in the claim, the scheduler can perform capacity accounting itself, bypassing the informer lag without requiring core API changes.
/assign @ashvindeodhar
/cc @johnbelamaric @pohly @sunya-ch
- One-line enhancement description (can be used as a release note): Enable transactional capacity updates in DRA via standardized opaque hints to resolve context-dependent hardware capacity management and informer consistency gaps.
- Kubernetes Enhancement Proposal:
- Discussion Link: TBD
- PRs by stage and milestone:
Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.