multi_stage: default VPN entrypoint wait timeout to 5m#5157
multi_stage: default VPN entrypoint wait timeout to 5m#5157ibm-adarsh wants to merge 1 commit intoopenshift:mainfrom
Conversation
When VPN is enabled, always pass --wait-for-file /tmp/vpn/up and --wait-timeout to the entrypoint wrapper. If vpn.yaml omits wait_timeout, use 5m0s instead of omitting the flags (which led to no meaningful wait). Explicit wait_timeout in the cluster profile still overrides.
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: ibm-adarsh The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
📝 WalkthroughWalkthroughThe PR introduces a default VPN wait timeout constant of 5m0s and modifies the secret wrapper logic to always apply wait-timeout configuration when VPN is enabled, using the default unless explicitly overridden. ChangesVPN Wait Timeout Default
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes 🚥 Pre-merge checks | ✅ 12 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (12 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Hi @ibm-adarsh. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Tip We noticed you've done this a few times! Consider joining the org to skip this step and gain Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/steps/multi_stage/gen.go (1)
310-317: ⚡ Quick winAdd test coverage for the new default-timeout path.
The behavioral change — always passing
--wait-timeoutwith a default of5m0swhen VPN is enabled — isn't covered by any test in this PR. The previous conditional path (WaitTimeout != nil) presumably had some coverage. Verify both the default and the override cases are exercised inaddSecretWrappertests.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/steps/multi_stage/gen.go` around lines 310 - 317, Add tests in the addSecretWrapper test suite to cover the new default-timeout path when VPN is enabled: create one test case where vpnConf is set but vpnConf.WaitTimeout == nil and assert that container.Args contains "--wait-timeout" with the value of defaultVPNWaitTimeout (5m0s), and another test case where vpnConf.WaitTimeout is non-nil to assert the override value is passed; locate the behavior around the container.Args construction in gen.go (referencing vpnConf, WaitTimeout, defaultVPNWaitTimeout, and addSecretWrapper) and update or add assertions to verify both default and explicit timeout values are present.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/steps/multi_stage/gen.go`:
- Around line 310-317: Add tests in the addSecretWrapper test suite to cover the
new default-timeout path when VPN is enabled: create one test case where vpnConf
is set but vpnConf.WaitTimeout == nil and assert that container.Args contains
"--wait-timeout" with the value of defaultVPNWaitTimeout (5m0s), and another
test case where vpnConf.WaitTimeout is non-nil to assert the override value is
passed; locate the behavior around the container.Args construction in gen.go
(referencing vpnConf, WaitTimeout, defaultVPNWaitTimeout, and addSecretWrapper)
and update or add assertions to verify both default and explicit timeout values
are present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 2fe1442d-21f3-4c76-a082-1678ad89d0c7
📒 Files selected for processing (1)
pkg/steps/multi_stage/gen.go
|
Closing for now, issue was mainly because of |
Problem
IBM Z CI jobs that run behind the cluster-profile VPN (for example libvirt / libvirt-s390x-vpn–style workflows) were failing early in the test step. The wrapped command logs showed the entrypoint-wrapper giving up while waiting for the VPN readiness file, e.g. a warning like timeout after waiting for file /tmp/vpn/up. The main container then started (or behaved as if the wait had ended) before the VPN client had finished bringing the tunnel up, which breaks anything that needs intranet/DNS reachability from the first second.
What this change does
Whenever VPN is enabled for a multi-stage step, we always pass --wait-for-file /tmp/vpn/up and a --wait-timeout. If vpn.yaml does not define wait_timeout, we default to 5 minutes so IBM Z (and similar) jobs have time for the VPN sidecar to create /tmp/vpn/up. An explicit wait_timeout in the cluster profile still overrides this default.
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-multiarch-main-nightly-4.22-ocp-fips-ovn-remote-libvirt-multi-z-z/2051658376382779392
Summary by CodeRabbit