
Fix issue DPTP-4756: add STS hub-account role chaining for AWS cluster profiles #5094

Open
bear-redhat wants to merge 1 commit into openshift:main from bear-redhat:issue/DPTP-4756

Conversation

Contributor

@bear-redhat bear-redhat commented Apr 9, 2026

When a cluster profile has hub_role_arn and target_role_arn configured, ci-operator now injects an AWS config file with source_profile chaining and a projected ServiceAccount token into step pods. The AWS SDK performs a two-hop STS exchange (SA token -> hub role via AssumeRoleWithWebIdentity, then hub -> target via AssumeRole) with automatic credential refresh, eliminating the need for static .awscred IAM user keys.
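The resulting AWS config file follows the SDK's standard source_profile chaining layout. A sketch (the role ARNs are placeholders):

```ini
# The hub profile exchanges the projected ServiceAccount token for hub-role
# credentials via AssumeRoleWithWebIdentity.
[profile hub]
web_identity_token_file = /var/run/secrets/aws/sts-token/token
role_arn = arn:aws:iam::111111111111:role/hub-role

# The default profile then performs the second hop (AssumeRole into the
# target role), sourcing its credentials from the hub profile.
[default]
role_arn = arn:aws:iam::222222222222:role/target-role
source_profile = hub
```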

Summary by CodeRabbit

  • New Features

    • Added AWS Security Token Service (STS) support for cluster profiles, enabling dynamic role ARN configuration and automatic service-account token injection into test pods.
  • Tests

    • Added test coverage for STS configuration creation and pod generation with STS volumes.
  • Refactor

    • Updated filesystem path construction methods for container runtime paths.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: automatic mode


coderabbitai Bot commented Apr 9, 2026

Warning

Rate limit exceeded

@bear-redhat has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 52 minutes and 52 seconds before requesting another review.

To continue reviewing without waiting, purchase usage credits in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 9ddd8e4f-7f57-4281-a352-4bcd16c1c9d7

📥 Commits

Reviewing files that changed from the base of the PR and between 19fd3b1 and c0b53c5.

📒 Files selected for processing (11)
  • pkg/api/constant.go
  • pkg/api/types.go
  • pkg/steps/lease.go
  • pkg/steps/lease_test.go
  • pkg/steps/multi_stage/gen.go
  • pkg/steps/multi_stage/gen_test.go
  • pkg/steps/multi_stage/init.go
  • pkg/steps/multi_stage/init_test.go
  • pkg/steps/multi_stage/multi_stage.go
  • pkg/steps/multi_stage/testdata/zz_fixture_TestGeneratePods_STS_volumes.yaml
  • pkg/steps/pod.go
📝 Walkthrough


These changes add AWS STS (Security Token Service) support to the OpenShift CI system. New API constants and struct fields define STS role ARNs, lease steps retrieve and expose these parameters, pod generation conditionally injects STS volumes and environment variables, and a ConfigMap is created to hold rendered STS configuration. Additionally, path-joining logic is refactored from filepath.Join to path.Join for in-container paths.

Changes

Cohort / File(s) Summary
API Constants & Types
pkg/api/constant.go, pkg/api/types.go
Added two new STS parameter constants (STSHubRoleARNParam, STSTargetRoleARNParam) and two corresponding optional string fields to ClusterProfileDetails struct with YAML/JSON tags.
Lease Step Extensions
pkg/steps/lease.go, pkg/steps/lease_test.go
Extended leaseStep to store STS role ARNs and expose them via Provides() method; updated test cases to verify parameter resolution including new STS scenario with explicit hub/target role ARN values.
Pod Generation & STS Injection
pkg/steps/multi_stage/gen.go, pkg/steps/multi_stage/gen_test.go
Modified addProfile to accept and conditionally inject STS volumes, service-account token projections, and environment variables when both hub and target role ARNs are present; added test case to validate STS volume injection in generated pods; replaced filepath.Join with path.Join for in-container paths.
STS ConfigMap Creation
pkg/steps/multi_stage/init.go, pkg/steps/multi_stage/init_test.go
Implemented createSTSConfigMap() method to generate INI-formatted STS configuration (with hub and [default] profiles) and create immutable Kubernetes ConfigMap; added comprehensive unit tests validating config content and immutability.
Multi-stage Test Step Core
pkg/steps/multi_stage/multi_stage.go
Added STS role ARN fields to multiStageTestStep, implemented retrieveSTSRoleARNParams() to fetch parameters from provider, and updated run() to retrieve and conditionally create STS ConfigMap.
Path Handling Refactoring
pkg/steps/pod.go
Replaced filepath.Join with path.Join when constructing container mount paths and environment variable values for secret volume references.
Test Fixture
pkg/steps/multi_stage/testdata/zz_fixture_TestGeneratePods_STS_volumes.yaml
Added new OpenShift CI Pod manifest fixture demonstrating STS-enabled multi-stage test execution with projected service-account token volumes, STS config mounts, and AWS credential environment variables.
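The INI rendering described in the walkthrough can be sketched in Go (a minimal illustration; renderSTSConfig is a hypothetical helper name and the role ARNs are placeholders):

```go
package main

import "fmt"

// renderSTSConfig builds the INI-formatted AWS config that chains the roles:
// the hub profile exchanges the projected ServiceAccount token for hub-role
// credentials via AssumeRoleWithWebIdentity, and the default profile assumes
// the target role using the hub profile as its credential source.
func renderSTSConfig(hubRoleARN, targetRoleARN string) string {
	return fmt.Sprintf(
		"[profile hub]\nweb_identity_token_file = /var/run/secrets/aws/sts-token/token\nrole_arn = %s\n\n[default]\nrole_arn = %s\nsource_profile = hub\n",
		hubRoleARN, targetRoleARN)
}

func main() {
	fmt.Print(renderSTSConfig(
		"arn:aws:iam::111111111111:role/hub-role",
		"arn:aws:iam::222222222222:role/target-role"))
}
```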

Sequence Diagram(s)

sequenceDiagram
    participant LeaseStep as Lease Step
    participant ParamProvider as Parameter Provider
    participant MultiStageStep as Multi-Stage Test Step
    participant PodGenerator as Pod Generator
    participant STSConfigMap as STS ConfigMap Creator
    participant Kubernetes as Kubernetes API
    
    LeaseStep->>ParamProvider: Provides STS hub/target role ARNs
    ParamProvider-->>LeaseStep: Return ARN values
    
    MultiStageStep->>MultiStageStep: retrieveSTSRoleARNParams()<br/>(fetch from parameters)
    MultiStageStep->>MultiStageStep: Store stsHubRoleARN,<br/>stsTargetRoleARN
    
    MultiStageStep->>PodGenerator: generatePods(with STS params)
    PodGenerator->>PodGenerator: addProfile(hubRoleARN,<br/>targetRoleARN)
    PodGenerator->>PodGenerator: Inject STS volumes & env vars<br/>(AWS_CONFIG_FILE,<br/>AWS_SDK_LOAD_CONFIG)
    
    MultiStageStep->>STSConfigMap: createSTSConfigMap(ctx)
    STSConfigMap->>STSConfigMap: Render INI config<br/>[hub] & [default] profiles
    STSConfigMap->>Kubernetes: Create immutable ConfigMap<br/>with STS config
    Kubernetes-->>STSConfigMap: ConfigMap created
    
    PodGenerator->>Kubernetes: Mount STS config volume<br/>& service-account token
    Kubernetes-->>PodGenerator: Pod with STS ready

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 9 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage ⚠️ Warning — docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (9 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main objective: adding STS hub-account role chaining for AWS cluster profiles to resolve DPTP-4756.
Stable And Deterministic Test Names ✅ Passed Test files contain hardcoded, stable test scenario names without dynamic content or timestamps.
Test Structure And Quality ✅ Passed Tests demonstrate single responsibility with focused scenarios, proper setup/cleanup with fake clients, meaningful assertion messages for debugging, and consistency with established codebase patterns.
Microshift Test Compatibility ✅ Passed This PR does not add any Ginkgo e2e tests. All modifications are unit tests using Go's standard testing package without Ginkgo patterns.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR modifies internal ci-tools components with standard Go unit tests, not Ginkgo e2e tests for user-facing functionality. SNO compatibility check does not apply.
Topology-Aware Scheduling Compatibility ✅ Passed PR introduces AWS STS support through volume projection and environment variables only, with no topology-incompatible scheduling constraints added.
Ote Binary Stdout Contract ✅ Passed Pull request modifications do not contain stdout writes in process-level code that would violate the OTE Binary Stdout Contract.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed This PR contains only standard Go unit tests using the testing package, not Ginkgo e2e tests, making the IPv6 and disconnected network compatibility check not applicable.


Contributor

openshift-ci Bot commented Apr 9, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bear-redhat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Apr 9, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
pkg/steps/multi_stage/gen_test.go (3)

859-867: TestAddProfileWithSTS is under-asserted for STS wiring.

This test only checks token volume existence. If addProfile stops wiring config volume or STS env vars, this still passes. Please assert config volume and AWS_CONFIG_FILE/AWS_SDK_LOAD_CONFIG too.

Suggested assertion expansion
 	hasStsTokenVol := false
+	hasStsConfigVol := false
 	for _, v := range pod.Spec.Volumes {
 		if v.Name == stsTokenVolumeName {
 			hasStsTokenVol = true
 		}
+		if v.Name == stsConfigVolumeName {
+			hasStsConfigVol = true
+		}
 	}
 	if !hasStsTokenVol {
 		t.Error("expected STS volumes when hub and target role ARNs are set")
 	}
+	if !hasStsConfigVol {
+		t.Error("expected STS config volume when hub and target role ARNs are set")
+	}
+
+	hasConfigFile := false
+	hasLoadConfig := false
+	for _, e := range pod.Spec.Containers[0].Env {
+		if e.Name == "AWS_CONFIG_FILE" {
+			hasConfigFile = true
+		}
+		if e.Name == "AWS_SDK_LOAD_CONFIG" {
+			hasLoadConfig = true
+		}
+	}
+	if !hasConfigFile || !hasLoadConfig {
+		t.Error("expected STS env vars when hub and target role ARNs are set")
+	}

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/steps/multi_stage/gen_test.go` around lines 859 - 867,
TestAddProfileWithSTS currently only checks for the presence of
stsTokenVolumeName; extend the assertions to also verify that the config volume
is present on pod.Spec.Volumes, that the container env includes AWS_CONFIG_FILE
set to the expected path, and that AWS_SDK_LOAD_CONFIG is set/true. Locate the
test function TestAddProfileWithSTS and the call to addProfile, then add checks
similar to the stsTokenVolumeName loop for a config volume name (the same name
used by addProfile), and iterate pod.Spec.Containers[*].Env to assert env var
names AWS_CONFIG_FILE and AWS_SDK_LOAD_CONFIG have the correct values.

769-777: Guard projected source and expiration checks to avoid panic-driven failures.

Line 772 and Line 776 can panic if the projection source list is empty or ExpirationSeconds is nil, which makes regressions harder to diagnose from test output.

Proposed test hardening
-			src := v.VolumeSource.Projected.Sources[0].ServiceAccountToken
+			if len(v.VolumeSource.Projected.Sources) == 0 || v.VolumeSource.Projected.Sources[0].ServiceAccountToken == nil {
+				t.Fatal("expected serviceAccountToken projection source")
+			}
+			src := v.VolumeSource.Projected.Sources[0].ServiceAccountToken
 			if src.Audience != "sts.amazonaws.com" {
 				t.Errorf("expected audience sts.amazonaws.com, got %s", src.Audience)
 			}
-			if *src.ExpirationSeconds != 86400 {
+			if src.ExpirationSeconds == nil {
+				t.Fatal("expected token expiration seconds to be set")
+			}
+			if *src.ExpirationSeconds != 86400 {
 				t.Errorf("expected 86400 expiration, got %d", *src.ExpirationSeconds)
 			}

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/steps/multi_stage/gen_test.go` around lines 769 - 777, The test
dereferences v.VolumeSource.Projected.Sources[0] and *src.ExpirationSeconds
without safety checks, which can panic if Sources is empty or ExpirationSeconds
is nil; update the checks in the test (around the VolumeSource.Projected,
Sources, ServiceAccountToken and ExpirationSeconds usages) to first assert
projected != nil, len(projected.Sources) > 0, that
projected.Sources[0].ServiceAccountToken is not nil, and that
ServiceAccountToken.ExpirationSeconds is not nil before dereferencing, and use
t.Fatalf/t.Errorf with clear messages when those preconditions fail so tests
surface readable failures instead of panics.

870-892: Add one-ARN boundary cases to lock role-chaining behavior.

The negative coverage only tests both ARNs empty. Add hub-only and target-only cases so the “both required” condition is explicitly protected against regressions.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/steps/multi_stage/gen_test.go` around lines 870 - 892, Add two unit tests
mirroring TestAddProfileWithoutSTS to cover one-ARN boundary cases: create
TestAddProfileWithHubOnly and TestAddProfileWithTargetOnly that call addProfile
with only the hub ARN set (non-empty) and only the target ARN set (non-empty)
respectively, using the same pod template as TestAddProfileWithoutSTS, and
assert that no STS volumes (stsTokenVolumeName, stsConfigVolumeName) are added
and no STS env vars (AWS_CONFIG_FILE, AWS_SDK_LOAD_CONFIG) are set on the
container; keep assertions identical to the existing test to ensure the "both
ARNs required" behavior is protected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/steps/multi_stage/gen.go`:
- Around line 244-245: The bug is that STS-enabled steps with no_kubeconfig
still leave the pod ServiceAccountName empty, causing the
ServiceAccountTokenProjection created by addSTSVolumes (called from addProfile)
to mint a token from the default SA instead of the test SA that setupRBAC
creates; update the logic in addProfile (and the same pattern at the other
occurrence noted) to set pod.Spec.ServiceAccountName = s.name whenever both
s.stsHubRoleARN and s.stsTargetRoleARN are non-empty (i.e., STS enabled),
regardless of no_kubeconfig, so the ServiceAccountTokenProjection uses the test
SA configured by setupRBAC.

In `@pkg/steps/multi_stage/multi_stage.go`:
- Around line 245-250: The code reads STS params without guarding s.params,
which can panic if nil; update the run logic to check s.params != nil before
calling s.params.Get for api.STSHubRoleARNParam and api.STSTargetRoleARNParam,
and only set s.stsHubRoleARN and s.stsTargetRoleARN when the guard passes and
Get returns a non-empty value (preserve existing nil/error handling). Locate the
block using s.params.Get(...) in the multi_stage step (symbols: s.params, Get,
api.STSHubRoleARNParam, api.STSTargetRoleARNParam, s.stsHubRoleARN,
s.stsTargetRoleARN) and wrap those calls with a simple nil-check so no Get is
invoked on a nil params.

---

Nitpick comments:
In `@pkg/steps/multi_stage/gen_test.go`:
- Around line 859-867: TestAddProfileWithSTS currently only checks for the
presence of stsTokenVolumeName; extend the assertions to also verify that the
config volume is present on pod.Spec.Volumes, that the container env includes
AWS_CONFIG_FILE set to the expected path, and that AWS_SDK_LOAD_CONFIG is
set/true. Locate the test function TestAddProfileWithSTS and the call to
addProfile, then add checks similar to the stsTokenVolumeName loop for a config
volume name (the same name used by addProfile), and iterate
pod.Spec.Containers[*].Env to assert env var names AWS_CONFIG_FILE and
AWS_SDK_LOAD_CONFIG have the correct values.
- Around line 769-777: The test dereferences v.VolumeSource.Projected.Sources[0]
and *src.ExpirationSeconds without safety checks, which can panic if Sources is
empty or ExpirationSeconds is nil; update the checks in the test (around the
VolumeSource.Projected, Sources, ServiceAccountToken and ExpirationSeconds
usages) to first assert projected != nil, len(projected.Sources) > 0, that
projected.Sources[0].ServiceAccountToken is not nil, and that
ServiceAccountToken.ExpirationSeconds is not nil before dereferencing, and use
t.Fatalf/t.Errorf with clear messages when those preconditions fail so tests
surface readable failures instead of panics.
- Around line 870-892: Add two unit tests mirroring TestAddProfileWithoutSTS to
cover one-ARN boundary cases: create TestAddProfileWithHubOnly and
TestAddProfileWithTargetOnly that call addProfile with only the hub ARN set
(non-empty) and only the target ARN set (non-empty) respectively, using the same
pod template as TestAddProfileWithoutSTS, and assert that no STS volumes
(stsTokenVolumeName, stsConfigVolumeName) are added and no STS env vars
(AWS_CONFIG_FILE, AWS_SDK_LOAD_CONFIG) are set on the container; keep assertions
identical to the existing test to ensure the "both ARNs required" behavior is
protected.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a73b888d-14ec-48a2-b84f-977b19b2347d

📥 Commits

Reviewing files that changed from the base of the PR and between e55e4bb and aa6af91.

📒 Files selected for processing (9)
  • pkg/api/constant.go
  • pkg/api/types.go
  • pkg/steps/lease.go
  • pkg/steps/lease_test.go
  • pkg/steps/multi_stage/gen.go
  • pkg/steps/multi_stage/gen_test.go
  • pkg/steps/multi_stage/init.go
  • pkg/steps/multi_stage/init_test.go
  • pkg/steps/multi_stage/multi_stage.go

Comment on lines +244 to 245
addProfile(profileSecret, s.profile, s.stsHubRoleARN, s.stsTargetRoleARN, pod)
}

@coderabbitai coderabbitai Bot Apr 9, 2026


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the file structure and find the key sections
fd pkg/steps/multi_stage/gen.go --type f



🏁 Script executed:

# Read the relevant sections of gen.go
cat -n pkg/steps/multi_stage/gen.go | sed -n '28,31p; 155,162p; 240,250p; 510,520p; 520,570p'



🏁 Script executed:

# Also check the init.go file for setupRBAC
cat -n pkg/steps/multi_stage/init.go | grep -A 20 "func.*setupRBAC"



🏁 Script executed:

# Check the addProfile function signature
ast-grep --pattern 'func addProfile($_, $_, $_, $_, $_) {
  $$$
}'



🏁 Script executed:

# Check where needsKubeConfig is determined
rg "isKubeconfigNeeded" pkg/steps/multi_stage/gen.go -A 5 -B 2



🏁 Script executed:

# Check the generatePods function to understand the full flow
sed -n '100,250p' pkg/steps/multi_stage/gen.go | cat -n



🏁 Script executed:

# Check if there are other places where ServiceAccountName is modified
rg "ServiceAccountName" pkg/steps/multi_stage/gen.go -B 2 -A 2



🏁 Script executed:

# Understand the relationship between addProfile and STS
sed -n '490,530p' pkg/steps/multi_stage/gen.go | cat -n



🏁 Script executed:

# Check if a no_kubeconfig step can have s.profile set (which enables STS if ARNs are present)
rg "s\.profile|no_kubeconfig" pkg/steps/multi_stage/ -B 2 -A 2 | head -100



🏁 Script executed:

# Check the setupRBAC function to confirm it creates SA named s.name
sed -n '251,280p' pkg/steps/multi_stage/init.go | cat -n



STS-enabled pods can run under the wrong ServiceAccount for no_kubeconfig steps.

When a step has no_kubeconfig enabled, the pod's ServiceAccountName is set to empty string (defaults to the default SA), with automount disabled. However, if the step also has a cluster profile with STS enabled (both stsHubRoleARN and stsTargetRoleARN non-empty), the addProfile function calls addSTSVolumes, which creates a ServiceAccountTokenProjection for STS. This projection will mint a token from the default SA, not the test SA (s.name) that setupRBAC creates and configures for AWS trust. This causes an IAM trust mismatch and breaks STS authentication.

The fix is to select the test SA whenever STS is enabled, not only when kubeconfig is needed:

Proposed fix
-		needsKubeConfig := isKubeconfigNeeded(&step, genPodOpts)
-		if needsKubeConfig {
+		needsKubeConfig := isKubeconfigNeeded(&step, genPodOpts)
+		needsSTS := s.stsHubRoleARN != "" && s.stsTargetRoleARN != ""
+		if needsKubeConfig || needsSTS {
 			pod.Spec.ServiceAccountName = s.name
 		} else {
 			pod.Spec.ServiceAccountName = ""
 			no := false
 			pod.Spec.AutomountServiceAccountToken = &no
 		}

Also applies to: 515-517

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/steps/multi_stage/gen.go` around lines 244 - 245, The bug is that
STS-enabled steps with no_kubeconfig still leave the pod ServiceAccountName
empty, causing the ServiceAccountTokenProjection created by addSTSVolumes
(called from addProfile) to mint a token from the default SA instead of the test
SA that setupRBAC creates; update the logic in addProfile (and the same pattern
at the other occurrence noted) to set pod.Spec.ServiceAccountName = s.name
whenever both s.stsHubRoleARN and s.stsTargetRoleARN are non-empty (i.e., STS
enabled), regardless of no_kubeconfig, so the ServiceAccountTokenProjection uses
the test SA configured by setupRBAC.

Contributor


I guess this concern is valid.


@danilo-gemoli, agreed — let me plan the fix.

Fix: use test ServiceAccount for STS-enabled pods regardless of no_kubeconfig

  • In generatePods, introduce a needsSTS guard alongside needsKubeConfig
  • When either is true, keep pod.Spec.ServiceAccountName = s.name and leave automount alone
  • This ensures the ServiceAccountTokenProjection added by addSTSVolumes mints a token from the test SA that setupRBAC has wired to the IAM trust policy


Comment thread pkg/steps/multi_stage/multi_stage.go Outdated
Comment thread pkg/api/types.go
LeaseType string `yaml:"lease_type,omitempty" json:"lease_type,omitempty"`
Secret string `yaml:"secret,omitempty" json:"secret,omitempty"`
ConfigMap string `yaml:"config_map,omitempty" json:"config_map,omitempty"`
HubRoleARN string `yaml:"hub_role_arn,omitempty" json:"hub_role_arn,omitempty"`

Comment thread pkg/steps/multi_stage/multi_stage.go Outdated
Comment thread pkg/steps/multi_stage/gen.go Outdated
},
})

stsConfigCMName := pod.Labels[MultiStageTestLabel] + "-sts-config"
Contributor


Maybe we should reuse stsConfigMapName here.

vpnContainerName = "vpn-client"
leaseProxyScriptsMountPath = "/opt/scripts/lease-proxy"
stsTokenVolumeName = "aws-sts-token"
stsTokenMountPath = "/var/run/secrets/aws/sts-token"
Contributor


Should we pass the value of stsTokenMountPath down to the test pod as an env variable?

Contributor Author


this is consumed by the file generation, not by the tests:

	configContent := fmt.Sprintf("[profile hub]\nweb_identity_token_file = /var/run/secrets/aws/sts-token/token\nrole_arn = %s\n\n[default]\nrole_arn = %s\nsource_profile = hub\n", s.stsHubRoleARN, s.stsTargetRoleARN)
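For context, a sketch of the projected token volume that supplies that token file in the generated pod spec (assembled from the constants and test expectations quoted in this review, not the exact fixture):

```yaml
volumes:
- name: aws-sts-token            # stsTokenVolumeName
  projected:
    sources:
    - serviceAccountToken:
        audience: sts.amazonaws.com
        expirationSeconds: 86400
        path: token              # mounted at /var/run/secrets/aws/sts-token/token
```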


Comment thread pkg/steps/lease_test.go Outdated
}
}

func TestProvidesSTS(t *testing.T) {
Contributor


Can you try reusing the test TestAcquireLeases rather than creating a new one?

Comment thread pkg/steps/multi_stage/gen_test.go Outdated
}
}

func TestAddSTSVolumes(t *testing.T) {
Contributor


I believe that these three tests added here might be redundant with TestGeneratePods. Consider using TestGeneratePods if possible.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
pkg/steps/multi_stage/gen.go (1)

155-162: ⚠️ Potential issue | 🟠 Major

STS should not be blocked by no_kubeconfig.

Projected service-account tokens only need the pod to run as s.name; they do not require the shared kubeconfig secret. With the current logic, STS-enabled steps that set no_kubeconfig are rejected entirely, and ServiceAccountName is still only selected from needsKubeConfig, so the new STS path cannot be used for those steps.

Proposed fix
 		pod.Labels[MultiStageTestLabel] = s.name
 		needsKubeConfig := isKubeconfigNeeded(&step, genPodOpts)
-		if needsKubeConfig {
+		needsSTS := s.stsHubRoleARN != "" && s.stsTargetRoleARN != ""
+		if needsKubeConfig || needsSTS {
 			pod.Spec.ServiceAccountName = s.name
 		} else {
 			pod.Spec.ServiceAccountName = ""
 			no := false
 			pod.Spec.AutomountServiceAccountToken = &no
@@
-		if s.profile != "" {
-			if !needsKubeConfig && s.stsHubRoleARN != "" && s.stsTargetRoleARN != "" {
-				errs = append(errs, fmt.Errorf("step %s sets no_kubeconfig but the test has STS enabled (hub_role_arn=%s, target_role_arn=%s); STS requires kubeconfig", step.As, s.stsHubRoleARN, s.stsTargetRoleARN))
-				continue
-			}
+		if s.profile != "" {
 			profileSecret, err := s.profileSecretName()
 			if err != nil {
 				return nil, nil, fmt.Errorf("get profile secret name: %w", err)
 			}
 			addProfile(profileSecret, s.profile, s.stsHubRoleARN, s.stsTargetRoleARN, pod)

Also applies to: 240-242

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/steps/multi_stage/gen.go` around lines 155 - 162, The current logic only
assigns pod.Spec.ServiceAccountName = s.name when isKubeconfigNeeded(&step,
genPodOpts) is true, which blocks the STS/projected service-account-token path
when no_kubeconfig is set; change the branch so ServiceAccountName is set when
either kubeconfig is needed OR a projected service-account token (STS) is
required for the step. Concretely, compute needsKubeConfig :=
isKubeconfigNeeded(&step, genPodOpts) and compute a separate predicate (e.g.,
needsProjectedSA or isSTSRequired) that detects the STS/projected-token case for
this step, then set pod.Spec.ServiceAccountName = s.name if needsKubeConfig ||
needsProjectedSA; otherwise clear ServiceAccountName and set
AutomountServiceAccountToken = &no as before; update the same logic used at the
other occurrence (lines ~240-242) for consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@pkg/steps/multi_stage/testdata/zz_fixture_TestGeneratePods_enable_nested_podman.yaml`:
- Around line 28-29: In generatePods(), replace any use of filepath.Join that
assembles container-internal paths (e.g., the calls building entrypoint paths
and kubeconfig mounts) with path.Join (or manual "/" concatenation) so the join
always uses forward slashes; specifically locate the filepath.Join usages inside
the generatePods function and change them to path.Join, ensure you import "path"
instead of using OS-dependent "path/filepath", and run unit tests to confirm
fixtures no longer produce backslash-separated paths like
"\tmp\entrypoint-wrapper" or "\tools\entrypoint".
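The distinction the prompt relies on can be shown in a few lines; this is a minimal sketch (the `containerPath` helper name is illustrative, not from the PR), demonstrating why `path.Join` is the right call for container-internal paths:

```go
package main

import (
	"fmt"
	"path" // always joins with "/", regardless of the host OS
)

// containerPath joins segments for paths inside Linux containers.
// Using path.Join (not filepath.Join) keeps forward slashes even when
// the generator itself runs on Windows, which is how backslash paths
// like "\tmp\entrypoint-wrapper" can leak into fixtures.
func containerPath(elem ...string) string {
	return path.Join(elem...)
}

func main() {
	fmt.Println(containerPath("/tmp", "entrypoint-wrapper", "entrypoint-wrapper"))
	fmt.Println(containerPath("/tools", "entrypoint"))
}
```

On any platform this prints `/tmp/entrypoint-wrapper/entrypoint-wrapper` and `/tools/entrypoint`, whereas `filepath.Join` would use `\` separators on Windows.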

In
`@pkg/steps/multi_stage/testdata/zz_fixture_TestGeneratePods_generate_pods.yaml`:
- Around line 25-27: The fixture's expected mount and command paths use Windows
backslashes (e.g. "\tools\entrypoint",
"\tmp\entrypoint-wrapper\entrypoint-wrapper", "\var\run\secrets\...") but the
pods actually mount Linux-style paths (/tools, /tmp/entrypoint-wrapper,
/var/run/secrets/...), so update the YAML in
zz_fixture_TestGeneratePods_generate_pods.yaml to normalize those expected paths
to Linux-style forward-slash paths wherever they appear (including the repeated
pod entries later in the file referenced in the comment), ensuring the "command"
entries and any "volumeMounts"/path strings match the actual mounted paths
consistently.

In
`@pkg/steps/multi_stage/testdata/zz_fixture_TestGeneratePods_lease_proxy_server_available.yaml`:
- Around line 28-29: The fixture uses Windows-style backslash paths (e.g.
"\tmp\entrypoint-wrapper\entrypoint-wrapper", "\tools\entrypoint",
"\var\run\secrets...") which diverge from the Linux pod spec; update all such
path strings in
pkg/steps/multi_stage/testdata/zz_fixture_TestGeneratePods_lease_proxy_server_available.yaml
to use POSIX forward slashes (e.g. "/tmp/entrypoint-wrapper/entrypoint-wrapper",
"/tools/entrypoint", "/var/run/secrets/..."), and apply the same replacements
for the other occurrences noted in the file (the args/env JSON sections and the
repeated path lists) so the snapshot matches the pod layout used by
GeneratePods.

---

Duplicate comments:
In `@pkg/steps/multi_stage/gen.go`:
- Around line 155-162: The current logic only assigns
pod.Spec.ServiceAccountName = s.name when isKubeconfigNeeded(&step, genPodOpts)
is true, which blocks the STS/projected service-account-token path when
no_kubeconfig is set; change the branch so ServiceAccountName is set when either
kubeconfig is needed OR a projected service-account token (STS) is required for
the step. Concretely, compute needsKubeConfig := isKubeconfigNeeded(&step,
genPodOpts) and compute a separate predicate (e.g., needsProjectedSA or
isSTSRequired) that detects the STS/projected-token case for this step, then set
pod.Spec.ServiceAccountName = s.name if needsKubeConfig || needsProjectedSA;
otherwise clear ServiceAccountName and set AutomountServiceAccountToken = &no as
before; update the same logic used at the other occurrence (lines ~240-242) for
consistency.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 313e7c1c-5aaa-46c9-87f1-32190094efb4

📥 Commits

Reviewing files that changed from the base of the PR and between aa6af91 and c5d13da.

📒 Files selected for processing (13)
  • pkg/api/constant.go
  • pkg/api/types.go
  • pkg/steps/lease.go
  • pkg/steps/lease_test.go
  • pkg/steps/multi_stage/gen.go
  • pkg/steps/multi_stage/gen_test.go
  • pkg/steps/multi_stage/init.go
  • pkg/steps/multi_stage/init_test.go
  • pkg/steps/multi_stage/multi_stage.go
  • pkg/steps/multi_stage/testdata/zz_fixture_TestGeneratePods_STS_volumes.yaml
  • pkg/steps/multi_stage/testdata/zz_fixture_TestGeneratePods_enable_nested_podman.yaml
  • pkg/steps/multi_stage/testdata/zz_fixture_TestGeneratePods_generate_pods.yaml
  • pkg/steps/multi_stage/testdata/zz_fixture_TestGeneratePods_lease_proxy_server_available.yaml
✅ Files skipped from review due to trivial changes (2)
  • pkg/steps/multi_stage/multi_stage.go
  • pkg/steps/multi_stage/testdata/zz_fixture_TestGeneratePods_STS_volumes.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/api/constant.go
  • pkg/steps/lease.go
  • pkg/steps/multi_stage/gen_test.go

Comment on lines +25 to +27
```yaml
- \tools\entrypoint
command:
- /tmp/entrypoint-wrapper/entrypoint-wrapper
- \tmp\entrypoint-wrapper\entrypoint-wrapper
```

⚠️ Potential issue | 🟠 Major

These expected paths no longer match the mounted Linux paths.

The updated snapshot now expects \tools\entrypoint, \tmp\entrypoint-wrapper\..., and \var\run\secrets\..., while the same pod still mounts /tools, /tmp/entrypoint-wrapper, and /var/run/secrets/.... That makes the fixture internally inconsistent, and the same mismatch repeats in the later pod entries in this file.

Also applies to: 66-67, 85-89, 125-147

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@pkg/steps/multi_stage/testdata/zz_fixture_TestGeneratePods_generate_pods.yaml`
around lines 25 - 27, The fixture's expected mount and command paths use Windows
backslashes (e.g. "\tools\entrypoint",
"\tmp\entrypoint-wrapper\entrypoint-wrapper", "\var\run\secrets\...") but the
pods actually mount Linux-style paths (/tools, /tmp/entrypoint-wrapper,
/var/run/secrets/...), so update the YAML in
zz_fixture_TestGeneratePods_generate_pods.yaml to normalize those expected paths
to Linux-style forward-slash paths wherever they appear (including the repeated
pod entries later in the file referenced in the comment), ensuring the "command"
entries and any "volumeMounts"/path strings match the actual mounted paths
consistently.

Comment on lines +28 to +29
```yaml
- \tmp\entrypoint-wrapper\entrypoint-wrapper
- \tools\entrypoint
```

⚠️ Potential issue | 🟠 Major

Backslash paths make this snapshot inconsistent with the pod spec.

This fixture still mounts /tools, /tmp/entrypoint-wrapper, /logs, and /var/run/secrets/..., but the expected args/env JSON now point at \tools\..., \tmp\..., and \var\run\.... The snapshot no longer matches the Linux pod layout it is asserting and is likely to fail in CI.

Also applies to: 68-69, 81-85, 123-145

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@pkg/steps/multi_stage/testdata/zz_fixture_TestGeneratePods_lease_proxy_server_available.yaml`
around lines 28 - 29, The fixture uses Windows-style backslash paths (e.g.
"\tmp\entrypoint-wrapper\entrypoint-wrapper", "\tools\entrypoint",
"\var\run\secrets...") which diverge from the Linux pod spec; update all such
path strings in
pkg/steps/multi_stage/testdata/zz_fixture_TestGeneratePods_lease_proxy_server_available.yaml
to use POSIX forward slashes (e.g. "/tmp/entrypoint-wrapper/entrypoint-wrapper",
"/tools/entrypoint", "/var/run/secrets/..."), and apply the same replacements
for the other occurrences noted in the file (the args/env JSON sections and the
repeated path lists) so the snapshot matches the pod layout used by
GeneratePods.


@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
pkg/steps/multi_stage/gen.go (1)

240-242: ⚠️ Potential issue | 🟠 Major

Don't hard-fail no_kubeconfig for STS-enabled steps.

These lines turn a valid STS-only step into an error. The new flow uses the projected ServiceAccount token plus AWS_CONFIG_FILE; it does not require the kubeconfig secret. The fix is to keep no_kubeconfig behavior, but still run the pod as s.name whenever both STS role ARNs are set so the token projection uses the test ServiceAccount.

Suggested direction
```diff
 		pod.Labels[MultiStageTestLabel] = s.name
 		needsKubeConfig := isKubeconfigNeeded(&step, genPodOpts)
-		if needsKubeConfig {
+		needsSTS := s.stsHubRoleARN != "" && s.stsTargetRoleARN != ""
+		if needsKubeConfig || needsSTS {
 			pod.Spec.ServiceAccountName = s.name
 		} else {
 			pod.Spec.ServiceAccountName = ""
 			no := false
 			pod.Spec.AutomountServiceAccountToken = &no
@@
-		if s.profile != "" {
-			if !needsKubeConfig && s.stsHubRoleARN != "" && s.stsTargetRoleARN != "" {
-				errs = append(errs, fmt.Errorf("step %s sets no_kubeconfig but the test has STS enabled (hub_role_arn=%s, target_role_arn=%s); STS requires kubeconfig", step.As, s.stsHubRoleARN, s.stsTargetRoleARN))
-				continue
-			}
+		if s.profile != "" {
 			profileSecret, err := s.profileSecretName()
 			if err != nil {
 				return nil, nil, fmt.Errorf("get profile secret name: %w", err)
 			}
 			addProfile(profileSecret, s.profile, s.stsHubRoleARN, s.stsTargetRoleARN, pod)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/steps/multi_stage/gen.go` around lines 240 - 242, The current code
hard-fails when !needsKubeConfig and both s.stsHubRoleARN and s.stsTargetRoleARN
are set; change this so we do not append an error for STS-only steps but keep
the no_kubeconfig behavior: detect the case where !needsKubeConfig &&
s.stsHubRoleARN != "" && s.stsTargetRoleARN != "" and instead ensure the pod is
configured to run as the test ServiceAccount (use s.name) so the projected
ServiceAccount token + AWS_CONFIG_FILE flow works; leave the original error for
other cases where STS is not fully configured.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/steps/multi_stage/gen.go`:
- Around line 240-242: The current code hard-fails when !needsKubeConfig and
both s.stsHubRoleARN and s.stsTargetRoleARN are set; change this so we do not
append an error for STS-only steps but keep the no_kubeconfig behavior: detect
the case where !needsKubeConfig && s.stsHubRoleARN != "" && s.stsTargetRoleARN
!= "" and instead ensure the pod is configured to run as the test ServiceAccount
(use s.name) so the projected ServiceAccount token + AWS_CONFIG_FILE flow works;
leave the original error for other cases where STS is not fully configured.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 836d4c23-c52b-465c-8f90-874c5b0c1b3d

📥 Commits

Reviewing files that changed from the base of the PR and between c5d13da and 19fd3b1.

📒 Files selected for processing (11)
  • pkg/api/constant.go
  • pkg/api/types.go
  • pkg/steps/lease.go
  • pkg/steps/lease_test.go
  • pkg/steps/multi_stage/gen.go
  • pkg/steps/multi_stage/gen_test.go
  • pkg/steps/multi_stage/init.go
  • pkg/steps/multi_stage/init_test.go
  • pkg/steps/multi_stage/multi_stage.go
  • pkg/steps/multi_stage/testdata/zz_fixture_TestGeneratePods_STS_volumes.yaml
  • pkg/steps/pod.go
✅ Files skipped from review due to trivial changes (3)
  • pkg/api/constant.go
  • pkg/steps/multi_stage/init_test.go
  • pkg/steps/multi_stage/testdata/zz_fixture_TestGeneratePods_STS_volumes.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/steps/multi_stage/gen_test.go
  • pkg/api/types.go

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 6, 2026
When a cluster profile has `hub_role_arn` and `target_role_arn` configured,
ci-operator now injects an AWS config file with source_profile chaining and
a projected ServiceAccount token into step pods. The AWS SDK performs a
two-hop STS exchange (SA token -> hub role via AssumeRoleWithWebIdentity,
then hub -> target via AssumeRole) with automatic credential refresh,
eliminating the need for static .awscred IAM user keys.
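The chained config described above can be illustrated as a plain AWS shared-config file; the profile names, account IDs, and token path below are hypothetical, and only the `web_identity_token_file`/`source_profile` structure follows the description:

```ini
# Hypothetical AWS_CONFIG_FILE illustrating the two-hop chain.
# First hop: SA token -> hub role via AssumeRoleWithWebIdentity.
[profile hub]
role_arn = arn:aws:iam::111111111111:role/hub-role
web_identity_token_file = /var/run/secrets/sts-token/token

# Second hop: hub -> target via AssumeRole, chained via source_profile.
[profile target]
role_arn = arn:aws:iam::222222222222:role/target-role
source_profile = hub
```

With `AWS_CONFIG_FILE` pointing at such a file and the target profile selected, the AWS SDK resolves the full chain and refreshes both sets of temporary credentials automatically.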
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 6, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 6, 2026

@bear-redhat: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/checkconfig | c0b53c5 | link | true | `/test checkconfig` |
| ci/prow/breaking-changes | c0b53c5 | link | false | `/test breaking-changes` |

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.


2 participants