Include --ci.runId in test ProjectKey to prevent cross-suite collisions#3555

Open
naveenku-jfrog wants to merge 13 commits into master from fix/projectkey-runid-isolation
@naveenku-jfrog

Collaborator

The test-only ProjectKey was suffixed with only the last 7 digits of the Unix timestamp, so two test runs starting within the same second on a shared JPD produced byte-identical keys (e.g. "prj1777802"). Because createTestProject() calls deleteProjectIfExists(tests.ProjectKey) unconditionally before creating the project, one suite would silently delete another concurrent suite's project (and every release bundle inside it). This showed up as flaky "not found" failures in lifecycle and artifactory project tests on shared JFrog instances.

Splice the sanitized --ci.runId into the key so concurrent runs get isolated keys (e.g. "prjlinux-lifecycle-1777802"), while still respecting the 2-32 lowercase-alphanumeric-hyphen project-key format and the "starts with a letter" rule. A new SanitizedCiRunId() helper exposes the runId in a charset-safe form for any future callers that need the same.

No behavior change when --ci.runId is unset (local single-suite runs and upstream workflows that bootstrap a fresh JPD per job).
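
The key composition described above can be sketched as follows. This is a minimal illustration, not the PR's code: `buildProjectKey`, its parameters, and the policy of truncating the runId (rather than the timestamp) to stay within 32 characters are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// maxProjectKeyLen is the upper bound of the 2-32 character project-key rule.
const maxProjectKeyLen = 32

// buildProjectKey is a hypothetical helper (not the PR's code). It joins the
// "prj" prefix, an already-sanitized runId, and a timestamp suffix.
func buildProjectKey(sanitizedRunID, tsSuffix string) string {
	// Assumed truncation policy: shorten the runId, never the timestamp.
	room := maxProjectKeyLen - len("prj") - len("-") - len(tsSuffix)
	if room < 0 {
		room = 0
	}
	if len(sanitizedRunID) > room {
		sanitizedRunID = sanitizedRunID[:room]
	}
	// Truncation may leave a dangling hyphen; drop it.
	sanitizedRunID = strings.Trim(sanitizedRunID, "-")
	if sanitizedRunID == "" {
		// Unset runId: same key shape as before the change.
		return "prj" + tsSuffix
	}
	// The fixed "prj" prefix keeps the "starts with a letter" rule satisfied.
	return "prj" + sanitizedRunID + "-" + tsSuffix
}

func main() {
	fmt.Println(buildProjectKey("linux-lifecycle", "1777802"))
	fmt.Println(buildProjectKey("linux-artifactory", "1777802"))
	fmt.Println(buildProjectKey("", "1777802"))
}
```

With the same timestamp suffix, two suites with different runIds get different keys, while an empty runId yields the legacy "prj1777802" form.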

  • All tests have passed. If this feature is not already covered by the tests, new tests have been added.
  • The pull request is targeting the master branch.
  • The code has been validated to compile successfully by running go vet ./....
  • The code has been formatted properly using go fmt ./....

Intermittent "Project not found" failures occur when lifecycle and
artifactory test suites run concurrently against the same JFrog Platform
instance. createTestProject() calls deleteProjectIfExists(tests.ProjectKey)
unconditionally before creating the project, so if two suites resolve to
the same ProjectKey one will silently delete the other's project (and every
release bundle inside it).

utils/tests/utils.go:
- Add SanitizedCiRunId() helper that converts --ci.runId to a valid
  project-key string (lowercase, non-alphanumeric chars replaced with
  hyphens, leading/trailing hyphens trimmed).
- Splice the sanitized runId into the test ProjectKey so concurrent suites
  get distinct keys (e.g. "prjlinux-lifecycle-1781784930" vs
  "prjlinux-artifactory-1781784935"). Previously both suites resolved to
  the same 7-digit-suffixed key.
- Use the full 10-digit Unix timestamp instead of the last 7 digits,
  giving 1000x more distinct values, so the suffix no longer repeats
  every 10^7 seconds (about 116 days). Runs that start in the same
  second are still distinguished only by their runId.
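
The sanitizing rules listed above (lowercase, non-alphanumeric characters replaced with hyphens, leading/trailing hyphens trimmed) could look roughly like this. It is a sketch under assumptions: `sanitizeRunID` is a stand-in name, and the real SanitizedCiRunId() reads the --ci.runId flag rather than taking a parameter.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nonAlnum matches any single character outside a-z and 0-9.
var nonAlnum = regexp.MustCompile(`[^a-z0-9]`)

// sanitizeRunID is a hypothetical stand-in for SanitizedCiRunId().
// It takes the raw runId as an argument so the rules can be shown in isolation.
func sanitizeRunID(runID string) string {
	s := strings.ToLower(runID)           // project keys are lowercase
	s = nonAlnum.ReplaceAllString(s, "-") // each disallowed char becomes a hyphen
	return strings.Trim(s, "-")           // no leading or trailing hyphens
}

func main() {
	fmt.Println(sanitizeRunID("Linux_Lifecycle"))
	fmt.Println(sanitizeRunID("--macOS.Artifactory--"))
}
```

An empty runId stays empty, which lets the caller fall back to the key format used when --ci.runId is unset.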

Co-authored-by: Cursor <cursoragent@cursor.com>
…e using it

After createTestProject() succeeds, each Artifactory node behind the load
balancer may still have a stale in-memory cache of Access project data.
This causes intermittent 400 "Project not found" on build-publish and
release-bundle creation when a request is routed to a node whose cache
has not yet refreshed.

Add waitForProjectInArtifactory(), which polls GET /api/repositories/<key>-build-info
(the repo Artifactory auto-creates per project) and requires 5 consecutive
200 responses before proceeding. Requiring consecutive successes, not just
one, makes it likely that every node in a round-robin pool has warmed its
cache. This targets the intermittent failures seen in TestReleaseBundlesSearchVersions,
TestReleaseBundleCreationFromMultiBundlesUsingCommandFlagWithProject and
TestCreateBundleWithoutSpecAndWithProject.
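
A minimal sketch of the consecutive-success polling idea follows. The function name, its signature, and the injected `probe` callback (standing in for the GET request) are assumptions for illustration only.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForConsecutiveOK is a hypothetical helper. It calls probe until it has
// returned 200 `need` times in a row, or until maxAttempts probes were made.
func waitForConsecutiveOK(probe func() int, need, maxAttempts int, interval time.Duration) error {
	streak := 0
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if probe() == 200 {
			streak++
			if streak >= need {
				return nil
			}
		} else {
			// Any non-200 answer (e.g. from a node with a stale cache)
			// resets the streak.
			streak = 0
		}
		if attempt < maxAttempts {
			time.Sleep(interval)
		}
	}
	return errors.New("project not consistently visible")
}

func main() {
	// Simulated responses: one stale node answers 400 on the second call.
	responses := []int{200, 400, 200, 200, 200}
	i := 0
	probe := func() int { s := responses[i]; i++; return s }
	err := waitForConsecutiveOK(probe, 3, 10, 0)
	fmt.Println(err, i)
}
```

Resetting the streak on any failure is what distinguishes this from a plain "poll until first 200" loop.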

Co-authored-by: Cursor <cursoragent@cursor.com>
…y sync

The correct API to check project existence is GET /access/api/v1/projects/<key>.
Poll until Access returns 200, then sleep 5s for Artifactory's internal project
cache to sync before build-publish or release-bundle calls that scope to the project.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…ests

Instead of scattering waitForProjectInArtifactory at each call site,
call it inside createTestProject so every caller gets the 30s sync
wait for free. Also add the wait to the two artifactory inline tests
(TestArtifactoryDownloadByBuildUsingSimpleDownloadWithProject and
TestArtifactoryDownloadWithEnvProject) which inline their own project
creation and had the same race against Artifactory cache sync.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace unreliable fixed sleep with targeted retry logic. After a project
is created via the Access API, Artifactory and Lifecycle services take
35-40s (sometimes longer on HA nodes) to sync their internal project cache.

retryOnProjectNotFound() wraps any project-scoped operation and retries
up to 12 times with 5s intervals (max 60s) when the response contains
'not found' or 'project key' — the exact errors from both Artifactory's
build-publish API and Lifecycle's release-bundle creation API.
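
The retry wrapper described above might be shaped like this. The name retryOnProjectNotFound comes from the commit message, but the signature, the case-insensitive matching, and the `failNTimes` demo stub are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// isProjectNotFound reports whether err looks like the propagation-delay
// error, using the two substrings named in the commit message.
func isProjectNotFound(err error) bool {
	msg := strings.ToLower(err.Error())
	return strings.Contains(msg, "not found") || strings.Contains(msg, "project key")
}

// retryOnProjectNotFound runs op up to maxAttempts times, sleeping interval
// between attempts, and retries only on project-not-found style errors.
func retryOnProjectNotFound(op func() error, maxAttempts int, interval time.Duration) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = op()
		if err == nil || !isProjectNotFound(err) {
			return err // success, or an unrelated error that retrying won't fix
		}
		if attempt < maxAttempts {
			time.Sleep(interval)
		}
	}
	return fmt.Errorf("project still not visible after %d attempts: %w", maxAttempts, err)
}

// failNTimes is a demo stub: it fails n times with msg, then succeeds.
func failNTimes(n int, msg string) func() error {
	return func() error {
		if n > 0 {
			n--
			return errors.New(msg)
		}
		return nil
	}
}

func main() {
	// 12 attempts at 5s apart would match the commit message; 0 keeps the demo fast.
	err := retryOnProjectNotFound(failNTimes(2, "400 Project not found"), 12, 0)
	fmt.Println(err)
}
```

Returning unrelated errors immediately keeps genuine failures (bad credentials, malformed specs) from being hidden behind a minute of retries.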

Applied to:
- uploadBuildWithArtifactsAndProject: retries jf rt build-publish
- uploadBuildWithDepsAndProject: retries jf rt build-publish
- createRbWithFlags: retries jf rbc

waitForProjectInArtifactory is kept but simplified to only confirm
project creation in Access (no more fixed sleep).

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace the unreliable sleep-based wait with clean retry logic:
- retryOnProjectNotFound: 5 attempts, 30s between each (2.5min max)
- Removed waitForProjectInArtifactory and all call sites
- Artifactory build-publish retry now also applied directly in the
  two artifactory project tests

Co-authored-by: Cursor <cursoragent@cursor.com>
… poisoning

On gotestsum re-runs, the ProjectKey is the same as the initial run.
createTestProject was calling deleteProjectIfExists before creating, which:
1. Deleted the project left by the failed previous attempt
2. Immediately recreated it with the SAME key

This delete+recreate cycle poisons Artifactory's internal project cache
with a 'not found' entry for that key. Some HA nodes take 160+ seconds
(entire retry budget) to invalidate this negative cache entry, causing
all project-scoped ops to fail indefinitely.

Fix: remove the upfront deleteProjectIfExists. Project keys are unique
per suite run (full Unix timestamp), so deletion is only needed at
cleanup. If the project already exists on a re-run, reuse it silently
('already exists' is treated as success).
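
A sketch of the create-without-delete behaviour is shown below. `ensureProject`, the injected `create` callback, and the exact "already exists" substring check are assumptions; the real error text returned by the Access API may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ensureProject is a hypothetical helper. It creates the project without a
// preceding delete and treats an "already exists" error as success, so a
// re-run reuses the project left by the previous attempt.
func ensureProject(create func(key string) error, key string) error {
	err := create(key)
	if err == nil {
		return nil
	}
	if strings.Contains(strings.ToLower(err.Error()), "already exists") {
		return nil // re-run: reuse the existing project
	}
	return fmt.Errorf("creating project %q: %w", key, err)
}

// fakeCreate is a demo stub: it fails with msg, or succeeds if msg is empty.
func fakeCreate(msg string) func(string) error {
	return func(string) error {
		if msg == "" {
			return nil
		}
		return errors.New(msg)
	}
}

func main() {
	fmt.Println(ensureProject(fakeCreate("project already exists"), "prjdemo-1781784930"))
	fmt.Println(ensureProject(fakeCreate("500 internal error"), "prjdemo-1781784930"))
}
```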

Co-authored-by: Cursor <cursoragent@cursor.com>
… creation

The Lifecycle service has its own project cache, separate from Artifactory's,
which can take 5+ minutes to warm on HA nodes running draft Artifactory builds.
Retrying the full 'jfrog rbc' command exhausted the 2.5-min retry budget long
before LC was ready.

waitForLifecycleProjectVisibility() polls a cheap GET endpoint:
  lifecycle/api/v2/release_bundle/records/non-existing-rb?project=<key>
  - 400 Bad Request = LC doesn't know the project yet (keep waiting)
  - 404 Not Found   = LC knows the project (proceed)

Polls every 15s with a 15-minute timeout. Called inside uploadBuildsWithProject
so all callers (3 project tests) wait for LC readiness before any rbc attempt.
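
The status-code handshake above can be sketched as a deadline-bounded poll. The function below is illustrative only: its name, the injected `probe` callback (standing in for the GET request), and the choice to fail fast on any status other than 400 or 404 are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// waitForLifecycleProject is a hypothetical helper. It polls probe until it
// returns 404 (Lifecycle knows the project, the bundle just doesn't exist),
// keeps waiting on 400 (project unknown), and gives up after timeout.
func waitForLifecycleProject(probe func() int, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		switch status := probe(); status {
		case http.StatusNotFound:
			return nil // project is visible to Lifecycle
		case http.StatusBadRequest:
			// project not yet known: keep waiting
		default:
			// Assumption: anything else is unexpected and not worth waiting on.
			return fmt.Errorf("unexpected status %d", status)
		}
		if time.Now().Add(interval).After(deadline) {
			return errors.New("timed out waiting for Lifecycle to see the project")
		}
		time.Sleep(interval)
	}
}

func main() {
	responses := []int{400, 400, 404}
	i := 0
	probe := func() int { s := responses[i]; i++; return s }
	// The commit message uses 15s and 15 minutes; tiny values keep the demo fast.
	err := waitForLifecycleProject(probe, time.Millisecond, time.Second)
	fmt.Println(err, i)
}
```

Polling a cheap read endpoint keeps the expensive 'jfrog rbc' command out of the wait loop, so its own retry budget is spent only after the project is visible.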

Co-authored-by: Cursor <cursoragent@cursor.com>
Observed Artifactory 7.158.0 draft cache propagation taking 120-150s on
some instances. 5 retries x 30s = 2.5 min was not enough. 10 retries x
30s = 5 min covers the observed worst-case propagation delay.

Co-authored-by: Cursor <cursoragent@cursor.com>
naveenku-jfrog and others added 2 commits June 19, 2026 20:13
…oject

The same Artifactory project cache propagation delay that affected the lifecycle
and artifactory tests also hits pnpm's TestPnpmInstallAndPublishWithProject:
- build-publish (bp) with --project fails with 400 Project not found
- panic at line 984 follows because publishedBuildInfo is nil

Changes:
- Wrap 'jfrog rt bp --project' with retryOnProjectNotFound (10x 30s)
- Drop DeleteProject before CreateProject to avoid cache poisoning on
  re-runs (same fix applied to transfer_test.go earlier)

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>