
feat: add CSE timing regression tests for all Linux VHDs (Ubuntu 22.04/24.04, Azure Linux V3) #8284

Open
djsly wants to merge 10 commits into main from fix/cse-apt-lock-contention-and-perf-tests

Conversation

@djsly
Collaborator

@djsly djsly commented Apr 11, 2026

Summary

Adds CSE (Custom Script Extension) timing regression tests for all supported Linux VHD builds, with thresholds derived from production telemetry (GuestAgentGenericLogs, FA database on azcore cluster).

Each CSE task runs as a t.Run() sub-test, so ADO pipeline analytics can individually track pass/fail rates over time — no pipeline changes needed (existing gotestsum → JUnit → PublishTestResults@2 flow picks them up automatically).

Test Scenarios (6 total)

| Test | VHD | Install Path | Threshold Basis |
| --- | --- | --- | --- |
| Test_Ubuntu2204_CSE_CachedPerformance | Ubuntu 22.04 Gen2 | PMC deb (cached) | prod p95 (~35K samples) |
| Test_Ubuntu2204_CSE_FullInstallPerformance | Ubuntu 22.04 Gen2 | Full install | prod p99 (~35K samples) |
| Test_Ubuntu2404_CSE_CachedPerformance | Ubuntu 24.04 Gen2 | PMC deb (cached) | prod p95 (~500 samples) |
| Test_Ubuntu2404_CSE_FullInstallPerformance | Ubuntu 24.04 Gen2 | Full install | prod p99 (~500 samples) |
| Test_AzureLinuxV3_CSE_CachedPerformance | Azure Linux V3 Gen2 | RPM (cached) | prod p95 (~1K samples) |
| Test_AzureLinuxV3_CSE_FullInstallPerformance | Azure Linux V3 Gen2 | Full install | prod p99 (~1K samples) |

How It Works

  1. CSE timing extraction: Parses /var/log/azure/Microsoft.Azure.Extensions.CustomScript/*/events/*.json files from the VM to extract per-task start/end timestamps
  2. Per-task sub-tests: Each task threshold becomes a t.Run() sub-test (e.g., aptmarkWALinuxAgent, installDebPackageFromFile) — individually tracked in ADO Test Analytics
  3. Dynamic catch-all: Any CSE task taking >1s without a specific threshold gets a DefaultTaskThreshold check (45s cached / 60s full install) — automatically discovers new slow tasks
  4. OS-specific thresholds: Each OS has its own threshold set reflecting different package managers and latency profiles

Threshold Methodology

All thresholds are derived from real production data queried from the GuestAgentGenericLogs Kusto table:

  • Cached path thresholds → set at p95 (catches regressions while tolerating normal infra variance)
  • Full install thresholds → set at p99 (more generous since full install is rarer and more variable)
  • Task name normalization: extract("^Completed: ([a-zA-Z_]+)", 1, Context1) strips parameters from task names
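For readers more familiar with Go than Kusto, the normalization pattern above behaves like the following sketch. `normalizeTaskName` is a hypothetical helper written for illustration; it applies the same regular expression to a single message string.

```go
package main

import "regexp"

// completedTask is the same pattern used in the Kusto extract() call.
var completedTask = regexp.MustCompile(`^Completed: ([a-zA-Z_]+)`)

// normalizeTaskName returns the bare task name from a "Completed: ..." message,
// or an empty string when the message does not match.
func normalizeTaskName(message string) string {
	m := completedTask.FindStringSubmatch(message)
	if m == nil {
		return ""
	}
	return m[1]
}
```

Note that the character class contains only letters and underscores, so a task name containing a digit would be cut off at that digit.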

Key OS Differences Discovered

| Characteristic | Ubuntu 22.04 | Ubuntu 24.04 | Azure Linux V3 |
| --- | --- | --- | --- |
| Package manager | apt/deb | apt/deb | RPM/dnf |
| aptmarkWALinuxAgent | Bimodal: p50=0.49s, p99=58s (apt lock) | p50=4.45s, p99=15s (less contention) | N/A (no apt) |
| installDebPackageFromFile | p50=3.88s, p95=21.55s | p50=4.92s, p95=23.74s | N/A (no deb) |
| installKubeletKubectlFromPkg | p50=14.68s | p50=21.39s | p50=29.03s |
| configureKubeletAndKubectl | p50=6.56s | p50=21.65s | p50=4.56s |
| Unique tasks | | | ensureNoDupOnPromiscuBridge, enableLocalDNS |

What Regressions This Catches

  • Apt lock contention (the original motivation): task reordering that causes aptmarkWALinuxAgent to block on dpkg lock
  • Package install slowdowns: new package versions or mirror issues increasing installDebPackageFromFile / installKubeletKubectlFromPkg latency
  • CSE task ordering regressions: any reordering that serializes previously parallel tasks
  • New slow tasks: dynamic catch-all auto-detects any previously-fast task that starts taking >1s

Files Changed

  • e2e/scenario_cse_perf_test.go — Test scenarios and OS-specific threshold definitions
  • e2e/cse_timing.go — CSE timing extraction, validation logic, DefaultTaskThreshold catch-all

Contributor

Copilot AI left a comment


Pull request overview

This PR addresses a CSE bootstrap latency regression on Ubuntu golden-image VHDs by avoiding first-boot apt initialization/lock contention, and adds E2E timing checks to detect future regressions.

Changes:

  • Switch cached-package installation and walinuxagent hold/unhold operations to dpkg-based fast paths (with apt fallbacks).
  • Export FULL_INSTALL_REQUIRED so it’s available in subshell contexts.
  • Add E2E CSE timing extraction + two Ubuntu 22.04 performance scenarios with thresholds.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| parts/linux/cloud-init/artifacts/ubuntu/cse_helpers_ubuntu.sh | Use dpkg --set-selections for walinuxagent hold/unhold and dpkg -i for cached .deb installs; add wait_for_dpkg_lock. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Export FULL_INSTALL_REQUIRED after determining install mode. |
| e2e/cse_timing.go | New helper to extract CSE task durations from Guest Agent event JSON and validate against thresholds. |
| e2e/scenario_cse_perf_test.go | New Ubuntu 22.04 cached/full-install performance scenarios enforcing timing thresholds. |
| .github/workflows/pr-lint.yaml | Bump actions/github-script from v8 to v9. |

@djsly djsly force-pushed the fix/cse-apt-lock-contention-and-perf-tests branch from a7b9d57 to 6f86252 Compare April 11, 2026 23:40
@awesomenix awesomenix force-pushed the fix/cse-apt-lock-contention-and-perf-tests branch from 6f86252 to 0a1d334 Compare April 12, 2026 03:33
Copilot AI review requested due to automatic review settings April 12, 2026 03:33
@awesomenix awesomenix changed the title fix: use dpkg instead of apt for cached packages to eliminate CSE lock contention feat: add cse timings test to catch regression with cached and full package installation Apr 12, 2026
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

@djsly
Collaborator Author

djsly commented Apr 13, 2026

Multi-OS CSE Performance Tests Added 🎯

Expanded CSE regression tests from Ubuntu 22.04 only → 3 OS variants with 6 test scenarios total:

| Test | VHD | Threshold Source |
| --- | --- | --- |
| Test_Ubuntu2204_CSE_CachedPerformance | Ubuntu 22.04 Gen2 | prod p95 (~35K samples) |
| Test_Ubuntu2204_CSE_FullInstallPerformance | Ubuntu 22.04 Gen2 | prod p99 (~35K samples) |
| Test_Ubuntu2404_CSE_CachedPerformance | Ubuntu 24.04 Gen2 | prod p95 (~500 samples) |
| Test_Ubuntu2404_CSE_FullInstallPerformance | Ubuntu 24.04 Gen2 | prod p99 (~500 samples) |
| Test_AzureLinuxV3_CSE_CachedPerformance | Azure Linux V3 Gen2 | prod p95 (~1K samples) |
| Test_AzureLinuxV3_CSE_FullInstallPerformance | Azure Linux V3 Gen2 | prod p99 (~1K samples) |

Key OS Differences Discovered

Ubuntu 24.04 vs 22.04:

  • aptmarkWALinuxAgent: Much less bimodal (p50=4.45s, p99=15s) vs 22.04's bimodal (p50=0.49s, p99=58s)
  • Overall similar task profiles

Azure Linux V3 vs Ubuntu:

  • No apt/deb tasks — uses RPM packages, no aptmarkWALinuxAgent or installDebPackageFromFile
  • configureKubeletAndKubectl: p50=4.56s (vs Ubuntu 22.04 p50=6.56s)
  • installKubeletKubectlFromPkg: p50=29.03s (higher than Ubuntu's 14.68s)
  • Has unique tasks: ensureNoDupOnPromiscuBridge, enableLocalDNS

All thresholds from prod telemetry: GuestAgentGenericLogs table, FA database on azcore cluster.

@djsly djsly changed the title feat: add cse timings test to catch regression with cached and full package installation feat: add CSE timing regression tests for all Linux VHDs (Ubuntu 22.04/24.04, Azure Linux V3) Apr 13, 2026
Copilot AI review requested due to automatic review settings April 13, 2026 20:31
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

djsly and others added 8 commits April 14, 2026 16:00
…k contention

Replace apt-get/apt-mark with dpkg equivalents for cached (golden image) VHDs:

- installDebPackageFromFile: use 'dpkg -i' instead of 'apt-get -f install' when
  FULL_INSTALL_REQUIRED=false. The first apt-get call after VHD boot triggers
  expensive apt initialization (~30-50s) and holds dpkg locks. dpkg -i bypasses
  apt entirely and completes in <100ms for pre-cached packages.

- aptmarkWALinuxAgent: use 'dpkg --set-selections' instead of 'apt-mark hold/unhold'.
  apt-mark internally initializes the apt cache even for simple hold operations,
  causing 10-30s delays on first invocation.

Both functions include fallback to the original apt-based commands if dpkg fails.

Add CSE timing regression detection framework:
- e2e/cse_timing.go: Extracts CSE task timings from VM event JSON files
- e2e/scenario_cse_perf_test.go: Two perf scenarios (cached + full install paths)

E2E test results with fix (Ubuntu 22.04 golden image):
- aptmarkWALinuxAgent: 21.7s → 0.06s (360x faster)
- kubelet deb install: 47.6s → 0.04s (1190x faster)
- Total CSE time: 19.65s (down from 47s+ in production)

Root cause: PR #8105 removed installContainerRuntime's natural ~10s delay that
masked the first-apt-call initialization cost. This fix addresses the underlying
issue instead of re-introducing the delay.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix nil map panic in Test_Ubuntu2204_CSE_FullInstallPerformance by
  initializing vmss.Tags before setting SkipBinaryCleanup tag
- Log parse errors in ExtractCSETimings instead of silently skipping,
  with warning counts for visibility into format drift
- Fail test when no CSE tasks were parsed (empty report would silently
  pass all thresholds, making regression detection ineffective)
- Fail test when AKS.CSE.cse_start task is missing (critical for total
  CSE duration validation)
- Return error from ExtractCSETimings when zero tasks are parsed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
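The nil map fix mentioned in the commit above follows a common Go pattern, sketched here with hypothetical names (`vmssModel` and `setTag` are stand-ins, not the real types in the E2E framework). Writing to a nil map panics in Go, so the map has to be initialized before the first assignment.

```go
package main

// vmssModel is a hypothetical stand-in for a VMSS model with optional tags.
type vmssModel struct {
	Tags map[string]string
}

// setTag initializes Tags on first use, then stores the tag.
// Without the nil check, assigning into a nil map would panic.
func setTag(v *vmssModel, key, value string) {
	if v.Tags == nil {
		v.Tags = make(map[string]string)
	}
	v.Tags[key] = value
}
```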
…esholds

Refactor ValidateCSETimings to run each threshold check as a named
sub-test (t.Run). Since the E2E pipeline already uses gotestsum to
generate JUnit XML published via PublishTestResults@2, ADO Test
Analytics will now track each CSE task individually:

  Test_Ubuntu2204_CSE_CachedPerformance/TotalCSEDuration
  Test_Ubuntu2204_CSE_CachedPerformance/Task_installDebPackageFromFile
  Test_Ubuntu2204_CSE_CachedPerformance/Task_aptmarkWALinuxAgent
  ...

This enables per-task pass/fail trends and regression detection in
ADO dashboards without any pipeline changes.

Also tighten thresholds to ~2-3x observed healthy values:
- Cached: TotalCSE 60s→35s, installDeb 25s→10s, aptmark 10s→5s
- Full:   TotalCSE 120s→90s, installDeps 90s→60s

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…p99)

Thresholds derived from GuestAgentGenericLogs table (FA/azcore cluster),
~35K samples per task over 30 minutes on Ubuntu 22.04:

Cached path (set at ~p95):
- installDebPackageFromFile:  22s (prod p95=21.55s)
- aptmarkWALinuxAgent:        24s (prod p90=23.32s, bimodal)
- configureKubeletAndKubectl: 27s (prod p95=26.06s)
- ensureContainerd:            3s (prod p95=1.99s)
- ensureKubelet:              10s (prod p95=6.20s)

Full install path (set at ~p99):
- installDebPackageFromFile:  45s (prod p99=42.88s)
- aptmarkWALinuxAgent:        60s (prod p99=58.07s)
- configureKubeletAndKubectl: 45s (prod p99=44.39s)
- extractKubeBinaries:        16s (prod p99=15.21s)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…esholds

Add DefaultTaskThreshold to CSETimingThresholds — any task exceeding this
duration that has no specific threshold automatically becomes a sub-test
in ADO Pipeline Analytics. This ensures newly added CSE tasks are tracked
without code changes.

Expanded specific thresholds from 5 → 18 tasks (cached) and 8 → 21 (full
install), covering all high-frequency operations observed in production:
- Kubelet install variants (FromPkg, FromURL, extractKubeBinaries)
- Credential provider variants (FromUrl, FromPkg, FromPMC, download)
- Node config (configureNodeExporter, retrycmd_nslookup, ensureSnapshotUpdate)
- Container runtime (installContainerRuntime, installStandaloneContainerd)

All thresholds from Ubuntu 22.04 prod telemetry (GuestAgentGenericLogs).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add OS-specific threshold sets derived from production telemetry
(GuestAgentGenericLogs, FA/azcore cluster):

- Ubuntu 24.04: similar to 22.04 but less apt lock contention
  (aptmarkWALinuxAgent p50=4.45s vs 0.49s bimodal on 22.04)
- Azure Linux V3: RPM-based, no apt/deb tasks. Lower baseline than Ubuntu for
  configureKubeletAndKubectl (p50=4.56s) but higher for installKubeletKubectlFromPkg
  (p50=29.03s). Includes ensureNoDupOnPromiscuBridge and enableLocalDNS.

Each OS gets cached + full install test scenarios (6 total now):
- Test_Ubuntu2204_CSE_CachedPerformance / FullInstallPerformance
- Test_Ubuntu2404_CSE_CachedPerformance / FullInstallPerformance
- Test_AzureLinuxV3_CSE_CachedPerformance / FullInstallPerformance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove unused ctx param from LogReport (fixes golangci-lint unparam)
- Use Fatalf for missing cse_start (prevents 0-duration silently passing)
- Add newline delimiter between JSON files in find -exec to prevent concatenation
- Make subtest names unique by including task short name to avoid collisions
- Assert all configured threshold suffixes matched at least one task
- Fix DefaultTaskThreshold comment to match actual 45s value

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The E2E framework wraps *testing.T in toolkit.testLogger, so the
direct type assertion s.T.(*testing.T) fails. Add UnwrapTestingT()
helper to recursively extract the underlying *testing.T.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 14, 2026 20:03
@djsly djsly force-pushed the fix/cse-apt-lock-contention-and-perf-tests branch from f88e68a to ae65394 Compare April 14, 2026 20:03
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

… comments

- Bump installStandaloneContainerd threshold from 2s to 5s across all threshold sets
  (E2E saw 3.27s — prod p99 is only 0.46s but E2E infra variance is higher)
- Fix map iteration nondeterminism: sort suffixes by length descending for
  deterministic longest-match-first behavior
- Promote unmatched threshold suffix warning to Errorf so missing tasks fail the test
- Remove no-op empty BootstrapConfigMutator closures and tag-only VMConfigMutator

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add detailed comments explaining OperationId is a timestamp (not a GUID),
  referencing logs_to_events() in cse_helpers.sh
- Add comment on cseEventsDir explaining path matches EVENTS_LOGGING_DIR
  in both cse_helpers.sh and cse_start.sh (not per-handler subdirectories)
- Switch from line-based JSON parsing to json.Decoder for robustness against
  multi-line or pretty-printed JSON files
- Threshold suffixes already sorted by length (longest first) for deterministic
  matching — addressed in prior commit
- Unmatched suffixes already use Errorf to fail the test — addressed in prior commit

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 15, 2026 01:44
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.


3 participants