docs: implementation plan for hypershift CI WIF migration#39
docs: implementation plan for hypershift CI WIF migration#39cristianoveiga wants to merge 6 commits into
Conversation
Covers the migration of hypershift-ci service account authentication from a static JSON key to Workload Identity Federation using OpenShift CI build cluster OIDC issuers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cristianoveiga The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Summary by CodeRabbit
WalkthroughAdds a new implementation plan to migrate hypershift-ci CI auth from static JSON keys to Workload Identity Federation (WIF) and updates the e2e-tests implementation doc to mark static credentials as superseded and link the new plan. Changeshypershift CI WIF migration
Sequence Diagram(s)sequenceDiagram
participant CI as CI Job (build cluster)
participant K8s as Kubernetes API
participant STS as GCP STS (sts.googleapis.com)
participant IAM as GCP IAM / hypershift-ci SA
participant Vault as Vault / Secret Store
CI->>K8s: Read projected SA token (aud = issuer URL)
K8s-->>CI: Projected token (JWT)
CI->>STS: Exchange JWT for GCP token (STS) with provider attribute_condition
STS-->>CI: Short-lived GCP access token
CI->>IAM: Impersonate `hypershift-ci` using access token (gcloud)
IAM-->>CI: Impersonated credentials for GCP APIs
CI->>Vault: Delete static service account key (post-validation)
Vault-->>CI: Confirm removal
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
/cc: @patjlm |
|
LGTM |
- Restructure plan as two Jira cards (WIF infrastructure + CI script migration) - Add security scoping with namespace + SA name attribute condition - Add pool/provider architecture decision (single pool, multiple providers) - Resolve build cluster OIDC availability via arm64 capability pinning - Mark Story 11 and CI Authentication decision as superseded in e2e tests plan Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@implementation-plans/hypershift-ci-wif-migration.md`:
- Line 95: The CEL condition currently hardcodes service account names in the
default expression
("assertion['kubernetes.io']['namespace'].startsWith('ci-op-') &&
assertion['kubernetes.io']['serviceaccount']['name'] in ['e2e-gke',
'e2e-v2-gke']"), which rejects periodic/current job SAs like e2e-gke-periodic;
update the CEL to broaden the allowlist by matching SA name patterns instead of
exact names (e.g., use a startsWith/regex match against
assertion['kubernetes.io']['serviceaccount']['name'] to accept variants such as
e2e-gke*, e2e-v2-gke*, or include the specific periodic SA names referenced
elsewhere) so periodic jobs will pass WIF auth.
- Line 105: The principalSet URI in the example for
google_service_account_iam_member.wif_workload_identity_user uses the wrong pool
ID format; update the principalSet://... portion to reference the actual pool ID
"openshift-ci" (with a hyphen) instead of "openshift_ci" (underscore) so the IAM
binding points to the defined pool (pool ID openshift-ci) and will succeed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 44dd09ba-4306-4e6e-b1ea-2fd0a3e6c9cb
📒 Files selected for processing (2)
implementation-plans/hypershift-ci-wif-migration.mdimplementation-plans/hypershift-repo-gcp-hcp-e2e-tests-implementation.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| variable "wif_providers" { | ||
| description = "Map of WIF OIDC providers, keyed by provider ID" | ||
| type = map(object({ | ||
| issuer_uri = string | ||
| })) | ||
| # Populate with all build clusters that have the arm64 capability | ||
| default = { | ||
| build01 = { issuer_uri = "https://build01-oidc.s3.us-east-1.amazonaws.com" } | ||
| # ... add all arm64-capable clusters | ||
| } | ||
| } | ||
| ``` |
There was a problem hiding this comment.
i'd rather read the issuer url from a yaml file in eg terraform/metadata/prow-build-clusters.yaml. Eg
clusters:
- name: build01
issuer_uri: https://build01-oidc.s3.us-east-1.amazonaws.com
api_url: https://api.build01.ci.devcluster.openshift.com:6443
- ...Instead of principalSet://.../openshift-ci/* (all pool identities), use principalSet://.../attribute.service_account/<sa-name> to restrict impersonation to specific CI test identities (e2e-gke, e2e-v2-gke). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
A question about the custom audience approach in the implementation plan. The plan mentions using Looking at ci-operator's RBAC setup (
I don't see
If option 1 is acceptable (default audience, relying on CEL condition + SA name matching as the security boundary), it's the simplest path. The CI folder isolation is the ultimate backstop either way. But worth validating the RBAC assumption before deciding. |
|
Update: Verified on build01 that CI step pods cannot call However, Kubernetes supports projected volumes with custom audiences directly in the pod spec ( Opened openshift/ci-tools#5093 to add a service_account_tokens:
- audience: gcp-hcp-ci-wif
mount_path: /var/run/secrets/wif
expiration_seconds: 3600If that lands, the custom audience approach in this implementation plan would work as designed. |
|
/hold |
Custom audience requires ci-tools#5093 which has no DPTP review timeline. The core security model (CEL condition + principalSet scoping + CI folder isolation) is sufficient without it. Use default audience (issuer URL) for initial implementation; custom audience can be added later as a defense-in-depth enhancement without infrastructure changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@implementation-plans/hypershift-ci-wif-migration.md`:
- Around line 102-103: The plan incorrectly claims `allowed_audiences` defaults
to the issuer URL; update the text around `allowed_audiences` to state the
actual default is the provider's canonical resource name
(https://iam.googleapis.com/.../workloadIdentityPools/.../providers/...) and
either instruct to explicitly set `allowed_audiences` to the issuer URL or call
out that the STS token `aud` must match the provider canonical resource
name—adjust the bullet about `allowed_audiences` and the guidance so it
accurately describes the default and the required configuration to avoid token
exchange failures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 629f310c-ee4a-4d1e-87a5-ec491b202d94
📒 Files selected for processing (1)
implementation-plans/hypershift-ci-wif-migration.md
GCP's default allowed_audiences is the provider's canonical resource name, not the issuer URL. The provider must explicitly set allowed_audiences to the issuer_uri so the pod's default projected SA token is accepted. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
implementation-plans/hypershift-ci-wif-migration.md (1)
102-102:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winTighten wording for clarity at Line 102.
The second sentence is a fragment (“Can be tightened…”). Make it a complete sentence to avoid ambiguity in implementation docs.
✍️ Suggested doc tweak
-- `allowed_audiences` explicitly set to the provider's `issuer_uri` so that the pod's default projected SA token (whose `aud` = issuer URL) is accepted. GCP's default audience is the provider's canonical resource name, not the issuer URL, so this must be set explicitly. Can be tightened to a custom audience string when [ci-tools#5093](https://github.com/openshift/ci-tools/pull/5093) lands +- `allowed_audiences` explicitly set to the provider's `issuer_uri` so that the pod's default projected SA token (whose `aud` = issuer URL) is accepted. GCP's default audience is the provider's canonical resource name, not the issuer URL, so this must be set explicitly. This can be tightened to a custom audience string when [ci-tools#5093](https://github.com/openshift/ci-tools/pull/5093) lands.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@implementation-plans/hypershift-ci-wif-migration.md` at line 102, The second sentence in the bullet about `allowed_audiences` is a sentence fragment; change it to a complete sentence by prefixing with a subject and verb (for example: "This can be tightened to a custom audience string when ci-tools#5093 lands."), so the bullet reads clearly: explain why `allowed_audiences` must be set and then state the tightening option as a full sentence referencing the PR number `ci-tools#5093`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@implementation-plans/hypershift-ci-wif-migration.md`:
- Line 102: The second sentence in the bullet about `allowed_audiences` is a
sentence fragment; change it to a complete sentence by prefixing with a subject
and verb (for example: "This can be tightened to a custom audience string when
ci-tools#5093 lands."), so the bullet reads clearly: explain why
`allowed_audiences` must be set and then state the tightening option as a full
sentence referencing the PR number `ci-tools#5093`.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 7baaf01b-d94d-4063-a72e-98acaa4eb7fd
📒 Files selected for processing (1)
implementation-plans/hypershift-ci-wif-migration.md
Summary
hypershift-ciSA authentication from static JSON key to Workload Identity Federation (WIF)gcp-hcp-infra(pool, OIDC providers, IAM bindings)openshift/releaseand revoke static keyarm64capability (all have public OIDC issuers)Test plan
arm64-capable build clusters🤖 Generated with Claude Code