registry: real control-plane resolver (k8s Service lookup by resource-id label) #28
Open
hhuuggoo wants to merge 1 commit into
The cached/chain resolver strategies had their cache machinery built but their
LookupFunc was a STUB (conventionLookup) that fell back to a naming-template
guess. That guess can't work for Saturn inference: the model's Service name is
pd-{identity5}-{name}-{id}, which embeds the owning group and endpoint name —
neither of which phoebe receives. So static/convention were walking-skeleton
modes; the real multi-model design (cached/chain) was never wired.
Implement the real LookupFunc (internal/registry/k8s.go): resolve a deployment
id to its model Service by the saturncloud.io/resource-id label, read off the
in-cluster Kubernetes API. That label's value IS X-Saturn-Resource-Id (Saturn
stamps it on every inference Service via basic_resource_labels), so it's an
exact join key — no name reconstruction, no new Atlas API, no new headers. It's
self-correcting: the Service NAME template can change Atlas-side without breaking
phoebe.
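A minimal sketch of how the label join could be expressed as a selector string. `buildSelector` and the validation regexp are hypothetical, not taken from the PR, and the unprefixed `service-type` key is copied from this commit message as written (the real key may carry a prefix). Whether the PR validates the id this way is not stated; rejecting anything that is not a plain label value is one defensive option against a crafted header value.

```go
package main

import (
	"fmt"
	"regexp"
)

// Label keys as they appear in this PR. The unprefixed "service-type" key is
// an assumption copied from the commit message.
const (
	resourceIDLabel  = "saturncloud.io/resource-id"
	serviceTypeLabel = "service-type"
)

// labelValueRE matches a non-empty Kubernetes label value: at most 63
// characters, alphanumeric at both ends, with '-', '_' and '.' allowed inside.
var labelValueRE = regexp.MustCompile(`^[A-Za-z0-9]([A-Za-z0-9_.-]{0,61}[A-Za-z0-9])?$`)

// buildSelector (hypothetical name) turns the resource id from the request
// into a label selector. Ids that are not plain label values are rejected so
// that input like "abc,service-type=other" cannot add or change clauses.
func buildSelector(resourceID string) (string, error) {
	if !labelValueRE.MatchString(resourceID) {
		return "", fmt.Errorf("invalid resource id %q", resourceID)
	}
	return fmt.Sprintf("%s=%s,%s=internal", resourceIDLabel, resourceID, serviceTypeLabel), nil
}

func main() {
	sel, err := buildSelector("abc123")
	fmt.Println(sel, err)

	_, err = buildSelector("abc,service-type=other")
	fmt.Println(err)
}
```

With client-go, a string like this would typically be passed as `metav1.ListOptions{LabelSelector: sel}` when listing Services in the configured namespace.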
- Select `resource-id=<id>,service-type=internal` — the service-type clause is
load-bearing: a deployment's ssh Service shares the resource-id label, so
without it the match is ambiguous. (Tested.)
- Port: prefer the port named "8000" (Route.port_name == str(container_port), and
the vLLM serve port is 8000 == Deployment.proxy_port); single-port fallback;
error rather than silently pick a wrong port on a multi-port Service. (Tested.)
- 0 matches → ErrNotFound (CachedResolver negative-caches, short TTL, so a new
model is reachable fast). API error / ambiguous / no-served-port → transient
error (NOT cached, retried). (Tested.)
- Wire k8sLookup into buildResolver for cached/chain (replacing conventionLookup).
chain keeps convention as a fallback ONLY for k8s-API-unreachable.
- New config registry.k8sNamespace (Saturn: "main-namespace"); required for
cached/chain. Fixed the misleading placeholder convention comment in
settings.example.yaml so nobody flips strategy:convention and gets silent 404s.
- Adds client-go (in-cluster client). Unit-tested with the fake clientset — no
cluster needed; 8 named tests covering the invariants + negatives + attacks.
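The port rule in the list above can be sketched as follows. This is an illustration under assumptions, not the code in internal/registry/k8s.go: `servicePort` stands in for the name and port fields of client-go's `corev1.ServicePort` so the sketch compiles without external modules, and `pickPort` and `errNoServedPort` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// servicePort is a stand-in for the Name and Port fields of corev1.ServicePort.
type servicePort struct {
	Name string
	Port int32
}

// errNoServedPort is a hypothetical error. Per the commit message, this case
// is transient: it is not cached and the lookup is retried.
var errNoServedPort = errors.New("no served port found on Service")

// servedPortName is the port name the commit message says to prefer.
const servedPortName = "8000"

// pickPort prefers the port named "8000", falls back to the only port of a
// single-port Service, and otherwise returns an error instead of guessing.
func pickPort(ports []servicePort) (int32, error) {
	for _, p := range ports {
		if p.Name == servedPortName {
			return p.Port, nil
		}
	}
	if len(ports) == 1 {
		return ports[0].Port, nil
	}
	return 0, errNoServedPort
}

func main() {
	named := []servicePort{{Name: "metrics", Port: 9090}, {Name: "8000", Port: 8000}}
	fmt.Println(pickPort(named))

	ambiguous := []servicePort{{Name: "a", Port: 1}, {Name: "b", Port: 2}}
	fmt.Println(pickPort(ambiguous))
}
```

Returning an error on a multi-port Service with no port named "8000" trades availability for correctness: the request fails visibly rather than being routed to an arbitrary port.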
ROUTING CONTRACT (one-way door, for Hugo's review): phoebe now depends on two
Atlas-owned k8s conventions for inference routing — (1) inference Services carry
saturncloud.io/resource-id + service-type=internal; (2) the served port is 8000.
Both hold today; this promotes them to a documented contract so an Atlas refactor
can't silently break routing. No Atlas change required to ship this.
Deploy note: the interceptor pod needs an RBAC Role granting get/list on services
in main-namespace (a saturn-k8s chart change, separate PR).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What
The cached/chain resolver strategies had their cache built, but their LookupFunc was a stub that fell back to a naming-template guess. That guess can't work for Saturn inference: the model's Service name is pd-{identity5}-{name}-{id}, which embeds the owning group and endpoint name, neither of which phoebe receives. So static/convention were walking-skeleton modes; the real multi-model design was never wired.
This implements the real LookupFunc: resolve a deployment id to its model Service by the saturncloud.io/resource-id label, read off the in-cluster Kubernetes API. That label's value is X-Saturn-Resource-Id (Saturn stamps it on every inference Service via basic_resource_labels), so it's an exact join key: no name reconstruction, no new Atlas API, no new headers, and self-correcting if Atlas changes the Service name template.
Details
- Select `resource-id=<id>,service-type=internal`. The service-type clause is load-bearing: a deployment's ssh Service shares the resource-id label, so without it the match is ambiguous.
- Port: prefer the port named "8000" (vLLM serve port == Deployment.proxy_port); single-port fallback; error rather than silently pick the wrong port on a multi-port Service.
- 0 matches → ErrNotFound (negative-cached, short TTL → new models reachable fast). API error / ambiguous / no-served-port → transient error (not cached, retried).
- Wire k8sLookup into buildResolver for cached/chain; chain keeps convention only as a k8s-API-unreachable fallback.
- New registry.k8sNamespace config (Saturn: main-namespace). Fixed the misleading placeholder convention comment so nobody flips strategy: convention and gets silent 404s.
Contracts
phoebe now depends on two Atlas-owned k8s conventions for inference routing: (1) inference Services carry saturncloud.io/resource-id + service-type=internal; (2) the served port is 8000. Both hold today (verified in pdc); this promotes them to a documented routing contract so an Atlas refactor can't silently break routing. No Atlas change required to ship.
Follow-up (separate saturn-k8s PR): the interceptor pod needs an RBAC Role granting get/list on services in main-namespace.
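The not-found versus transient split described under Details can be sketched as a small cache wrapper. This is a simplified illustration, not phoebe's CachedResolver: the types, the `errTransient` value, and both TTLs are assumptions, and the sketch does no locking, so it is only safe from a single goroutine.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Stand-ins for the sentinel and function type named in the PR; the real
// signatures are not shown there and may differ.
var (
	ErrNotFound  = errors.New("service not found")
	errTransient = errors.New("k8s API unavailable") // hypothetical transient error
)

type lookupFunc func(id string) (addr string, err error)

type entry struct {
	addr    string
	err     error
	expires time.Time
}

// cachedResolver caches hits and ErrNotFound results, each with its own TTL.
type cachedResolver struct {
	lookup lookupFunc
	posTTL time.Duration
	negTTL time.Duration
	cache  map[string]entry
}

func newCachedResolver(lookup lookupFunc) *cachedResolver {
	return &cachedResolver{
		lookup: lookup,
		posTTL: 5 * time.Minute, // assumed value
		negTTL: 5 * time.Second, // assumed value: short, so a new model is reachable fast
		cache:  map[string]entry{},
	}
}

// Resolve serves unexpired cache entries and otherwise calls the lookup.
func (r *cachedResolver) Resolve(id string) (string, error) {
	if e, ok := r.cache[id]; ok && time.Now().Before(e.expires) {
		return e.addr, e.err
	}
	addr, err := r.lookup(id)
	switch {
	case err == nil:
		r.cache[id] = entry{addr: addr, expires: time.Now().Add(r.posTTL)}
	case errors.Is(err, ErrNotFound):
		r.cache[id] = entry{err: err, expires: time.Now().Add(r.negTTL)}
	}
	// Any other error is treated as transient: nothing is cached, so the
	// next call retries the lookup.
	return addr, err
}

func main() {
	calls := 0
	r := newCachedResolver(func(id string) (string, error) {
		calls++
		return "", ErrNotFound
	})
	_, _ = r.Resolve("missing")
	_, _ = r.Resolve("missing")
	fmt.Println("lookups for a missing id:", calls)
}
```

Keeping transient errors out of the cache means an API blip cannot pin a model as unreachable for the length of a TTL; the cost is that every request during an outage hits the lookup again.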