feat: add CSE timing regression tests for all Linux VHDs (Ubuntu 22.04/24.04, Azure Linux V3) by djsly · Pull Request #8284 · Azure/AgentBaker

djsly · 2026-04-11T22:38:38Z

Summary

Adds CSE (Custom Script Extension) timing regression tests for all supported Linux VHD builds, with thresholds derived from production telemetry (GuestAgentGenericLogs, FA database on azcore cluster).

Each CSE task runs as a t.Run() sub-test, so ADO pipeline analytics can individually track pass/fail rates over time — no pipeline changes needed (existing gotestsum → JUnit → PublishTestResults@2 flow picks them up automatically).

Test Scenarios (6 total)

| Test | VHD | Install Path | Threshold Basis |
| --- | --- | --- | --- |
| `Test_Ubuntu2204_CSE_CachedPerformance` | Ubuntu 22.04 Gen2 | PMC deb (cached) | prod p95 (~35K samples) |
| `Test_Ubuntu2204_CSE_FullInstallPerformance` | Ubuntu 22.04 Gen2 | Full install | prod p99 (~35K samples) |
| `Test_Ubuntu2404_CSE_CachedPerformance` | Ubuntu 24.04 Gen2 | PMC deb (cached) | prod p95 (~500 samples) |
| `Test_Ubuntu2404_CSE_FullInstallPerformance` | Ubuntu 24.04 Gen2 | Full install | prod p99 (~500 samples) |
| `Test_AzureLinuxV3_CSE_CachedPerformance` | Azure Linux V3 Gen2 | RPM (cached) | prod p95 (~1K samples) |
| `Test_AzureLinuxV3_CSE_FullInstallPerformance` | Azure Linux V3 Gen2 | Full install | prod p99 (~1K samples) |

How It Works

- CSE timing extraction: parses `/var/log/azure/Microsoft.Azure.Extensions.CustomScript/*/events/*.json` files from the VM to extract per-task start/end timestamps
- Per-task sub-tests: each task threshold becomes a `t.Run()` sub-test (e.g., `aptmarkWALinuxAgent`, `installDebPackageFromFile`), individually tracked in ADO Test Analytics
- Dynamic catch-all: any CSE task taking >1s without a specific threshold gets a `DefaultTaskThreshold` check (45s cached / 60s full install), so new slow tasks are discovered automatically
- OS-specific thresholds: each OS has its own threshold set reflecting different package managers and latency profiles

Threshold Methodology

All thresholds are derived from real production data queried from the GuestAgentGenericLogs Kusto table:

- Cached path thresholds → set at p95 (catches regressions while tolerating normal infra variance)
- Full install thresholds → set at p99 (more generous since full install is rarer and more variable)
- Task name normalization → `extract("^Completed: ([a-zA-Z_]+)", 1, Context1)` strips parameters from task names

Key OS Differences Discovered

| Characteristic | Ubuntu 22.04 | Ubuntu 24.04 | Azure Linux V3 |
| --- | --- | --- | --- |
| Package manager | apt/deb | apt/deb | RPM/dnf |
| `aptmarkWALinuxAgent` | Bimodal: p50=0.49s, p99=58s (apt lock) | p50=4.45s, p99=15s (less contention) | N/A (no apt) |
| `installDebPackageFromFile` | p50=3.88s, p95=21.55s | p50=4.92s, p95=23.74s | N/A (no deb) |
| `installKubeletKubectlFromPkg` | p50=14.68s | p50=21.39s | p50=29.03s |
| `configureKubeletAndKubectl` | p50=6.56s | p50=21.65s | p50=4.56s |
| Unique tasks | none | none | `ensureNoDupOnPromiscuBridge`, `enableLocalDNS` |

What Regressions This Catches

- Apt lock contention (the original motivation): task reordering that causes `aptmarkWALinuxAgent` to block on the dpkg lock
- Package install slowdowns: new package versions or mirror issues increasing `installDebPackageFromFile` / `installKubeletKubectlFromPkg` latency
- CSE task ordering regressions: any reordering that serializes previously parallel tasks
- New slow tasks: the dynamic catch-all picks up any task without a specific threshold once it takes >1s and checks it against `DefaultTaskThreshold`

Files Changed

- `e2e/scenario_cse_perf_test.go`: test scenarios and OS-specific threshold definitions
- `e2e/cse_timing.go`: CSE timing extraction, validation logic, `DefaultTaskThreshold` catch-all

Copilot

Pull request overview

This PR addresses a CSE bootstrap latency regression on Ubuntu golden-image VHDs by avoiding first-boot apt initialization/lock contention, and adds E2E timing checks to detect future regressions.

Changes:

- Switch cached-package installation and walinuxagent hold/unhold operations to dpkg-based fast paths (with apt fallbacks).
- Export `FULL_INSTALL_REQUIRED` so it’s available in subshell contexts.
- Add E2E CSE timing extraction + two Ubuntu 22.04 performance scenarios with thresholds.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| parts/linux/cloud-init/artifacts/ubuntu/cse_helpers_ubuntu.sh | Use `dpkg --set-selections` for walinuxagent hold/unhold and `dpkg -i` for cached `.deb` installs; add `wait_for_dpkg_lock`. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Export `FULL_INSTALL_REQUIRED` after determining install mode. |
| e2e/cse_timing.go | New helper to extract CSE task durations from Guest Agent event JSON and validate against thresholds. |
| e2e/scenario_cse_perf_test.go | New Ubuntu 22.04 cached/full-install performance scenarios enforcing timing thresholds. |
| .github/workflows/pr-lint.yaml | Bump `actions/github-script` from v8 to v9. |

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

djsly · 2026-04-13T17:48:42Z

Multi-OS CSE Performance Tests Added 🎯

Expanded CSE regression tests from Ubuntu 22.04 only → 3 OS variants with 6 test scenarios total:

| Test | VHD | Threshold Source |
| --- | --- | --- |
| `Test_Ubuntu2204_CSE_CachedPerformance` | Ubuntu 22.04 Gen2 | prod p95 (~35K samples) |
| `Test_Ubuntu2204_CSE_FullInstallPerformance` | Ubuntu 22.04 Gen2 | prod p99 (~35K samples) |
| `Test_Ubuntu2404_CSE_CachedPerformance` | Ubuntu 24.04 Gen2 | prod p95 (~500 samples) |
| `Test_Ubuntu2404_CSE_FullInstallPerformance` | Ubuntu 24.04 Gen2 | prod p99 (~500 samples) |
| `Test_AzureLinuxV3_CSE_CachedPerformance` | Azure Linux V3 Gen2 | prod p95 (~1K samples) |
| `Test_AzureLinuxV3_CSE_FullInstallPerformance` | Azure Linux V3 Gen2 | prod p99 (~1K samples) |

Key OS Differences Discovered

Ubuntu 24.04 vs 22.04:

- `aptmarkWALinuxAgent`: much less bimodal (p50=4.45s, p99=15s) vs 22.04's bimodal distribution (p50=0.49s, p99=58s)
- Overall similar task profiles

Azure Linux V3 vs Ubuntu:

- No apt/deb tasks: uses RPM packages, so there is no `aptmarkWALinuxAgent` or `installDebPackageFromFile`
- `configureKubeletAndKubectl`: p50=4.56s (vs Ubuntu 22.04 p50=6.56s)
- `installKubeletKubectlFromPkg`: p50=29.03s (higher than Ubuntu 22.04's 14.68s)
- Has unique tasks: `ensureNoDupOnPromiscuBridge`, `enableLocalDNS`

All thresholds from prod telemetry: GuestAgentGenericLogs table, FA database on azcore cluster.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

…k contention

Replace apt-get/apt-mark with dpkg equivalents for cached (golden image) VHDs:

- installDebPackageFromFile: use 'dpkg -i' instead of 'apt-get -f install' when FULL_INSTALL_REQUIRED=false. The first apt-get call after VHD boot triggers expensive apt initialization (~30-50s) and holds dpkg locks. dpkg -i bypasses apt entirely and completes in <100ms for pre-cached packages.
- aptmarkWALinuxAgent: use 'dpkg --set-selections' instead of 'apt-mark hold/unhold'. apt-mark internally initializes the apt cache even for simple hold operations, causing 10-30s delays on first invocation.

Both functions include fallback to the original apt-based commands if dpkg fails.

Add CSE timing regression detection framework:

- e2e/cse_timing.go: Extracts CSE task timings from VM event JSON files
- e2e/scenario_cse_perf_test.go: Two perf scenarios (cached + full install paths)

E2E test results with fix (Ubuntu 22.04 golden image):

- aptmarkWALinuxAgent: 21.7s → 0.06s (360x faster)
- kubelet deb install: 47.6s → 0.04s (1190x faster)
- Total CSE time: 19.65s (down from 47s+ in production)

Root cause: PR #8105 removed installContainerRuntime's natural ~10s delay that masked the first-apt-call initialization cost. This fix addresses the underlying issue instead of re-introducing the delay.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Fix nil map panic in Test_Ubuntu2204_CSE_FullInstallPerformance by initializing vmss.Tags before setting SkipBinaryCleanup tag
- Log parse errors in ExtractCSETimings instead of silently skipping, with warning counts for visibility into format drift
- Fail test when no CSE tasks were parsed (empty report would silently pass all thresholds, making regression detection ineffective)
- Fail test when AKS.CSE.cse_start task is missing (critical for total CSE duration validation)
- Return error from ExtractCSETimings when zero tasks are parsed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…esholds

Refactor ValidateCSETimings to run each threshold check as a named sub-test (t.Run). Since the E2E pipeline already uses gotestsum to generate JUnit XML published via PublishTestResults@2, ADO Test Analytics will now track each CSE task individually:

  Test_Ubuntu2204_CSE_CachedPerformance/TotalCSEDuration
  Test_Ubuntu2204_CSE_CachedPerformance/Task_installDebPackageFromFile
  Test_Ubuntu2204_CSE_CachedPerformance/Task_aptmarkWALinuxAgent
  ...

This enables per-task pass/fail trends and regression detection in ADO dashboards without any pipeline changes.

Also tighten thresholds to ~2-3x observed healthy values:

- Cached: TotalCSE 60s→35s, installDeb 25s→10s, aptmark 10s→5s
- Full: TotalCSE 120s→90s, installDeps 90s→60s

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…p99)

Thresholds derived from GuestAgentGenericLogs table (FA/azcore cluster), ~35K samples per task over 30 minutes on Ubuntu 22.04:

Cached path (set at ~p95):

- installDebPackageFromFile: 22s (prod p95=21.55s)
- aptmarkWALinuxAgent: 24s (prod p90=23.32s, bimodal)
- configureKubeletAndKubectl: 27s (prod p95=26.06s)
- ensureContainerd: 3s (prod p95=1.99s)
- ensureKubelet: 10s (prod p95=6.20s)

Full install path (set at ~p99):

- installDebPackageFromFile: 45s (prod p99=42.88s)
- aptmarkWALinuxAgent: 60s (prod p99=58.07s)
- configureKubeletAndKubectl: 45s (prod p99=44.39s)
- extractKubeBinaries: 16s (prod p99=15.21s)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…esholds

Add DefaultTaskThreshold to CSETimingThresholds: any task exceeding this duration that has no specific threshold automatically becomes a sub-test in ADO Pipeline Analytics. This ensures newly added CSE tasks are tracked without code changes.

Expanded specific thresholds from 5 → 18 tasks (cached) and 8 → 21 (full install), covering all high-frequency operations observed in production:

- Kubelet install variants (FromPkg, FromURL, extractKubeBinaries)
- Credential provider variants (FromUrl, FromPkg, FromPMC, download)
- Node config (configureNodeExporter, retrycmd_nslookup, ensureSnapshotUpdate)
- Container runtime (installContainerRuntime, installStandaloneContainerd)

All thresholds from Ubuntu 22.04 prod telemetry (GuestAgentGenericLogs).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add OS-specific threshold sets derived from production telemetry (GuestAgentGenericLogs, FA/azcore cluster):

- Ubuntu 24.04: similar to 22.04 but less apt lock contention (aptmarkWALinuxAgent p50=4.45s vs 0.49s bimodal on 22.04)
- Azure Linux V3: RPM-based, no apt/deb tasks. Higher baseline for configureKubeletAndKubectl (p50=4.56s) and installKubeletKubectlFromPkg (p50=29.03s). Includes ensureNoDupOnPromiscuBridge and enableLocalDNS.

Each OS gets cached + full install test scenarios (6 total now):

- Test_Ubuntu2204_CSE_CachedPerformance / FullInstallPerformance
- Test_Ubuntu2404_CSE_CachedPerformance / FullInstallPerformance
- Test_AzureLinuxV3_CSE_CachedPerformance / FullInstallPerformance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Remove unused ctx param from LogReport (fixes golangci-lint unparam)
- Use Fatalf for missing cse_start (prevents 0-duration silently passing)
- Add newline delimiter between JSON files in find -exec to prevent concatenation
- Make subtest names unique by including task short name to avoid collisions
- Assert all configured threshold suffixes matched at least one task
- Fix DefaultTaskThreshold comment to match actual 45s value

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The E2E framework wraps *testing.T in toolkit.testLogger, so the direct type assertion s.T.(*testing.T) fails. Add UnwrapTestingT() helper to recursively extract the underlying *testing.T.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

… comments

- Bump installStandaloneContainerd threshold from 2s to 5s across all threshold sets (E2E saw 3.27s; prod p99 is only 0.46s but E2E infra variance is higher)
- Fix map iteration nondeterminism: sort suffixes by length descending for deterministic longest-match-first behavior
- Promote unmatched threshold suffix warning to Errorf so missing tasks fail the test
- Remove no-op empty BootstrapConfigMutator closures and tag-only VMConfigMutator

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add detailed comments explaining OperationId is a timestamp (not a GUID), referencing logs_to_events() in cse_helpers.sh
- Add comment on cseEventsDir explaining path matches EVENTS_LOGGING_DIR in both cse_helpers.sh and cse_start.sh (not per-handler subdirectories)
- Switch from line-based JSON parsing to json.Decoder for robustness against multi-line or pretty-printed JSON files
- Threshold suffixes already sorted by length (longest first) for deterministic matching (addressed in prior commit)
- Unmatched suffixes already use Errorf to fail the test (addressed in prior commit)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Copilot AI review requested due to automatic review settings April 11, 2026 22:38

djsly requested review from AbelHu, Devinwong, SriHarsha001, awesomenix, calvin197, cameronmeissner, ganeshkumarashok, junjiezhang1997, lilypan26, mxj220, pdamianov-dev, phealy, r2k1, sulixu, surajssd, timmy-wright and zachary-bailey as code owners April 11, 2026 22:38

djsly temporarily deployed to test April 11, 2026 22:38 — with GitHub Actions Inactive

Copilot started reviewing on behalf of djsly April 11, 2026 22:39 View session

Copilot AI reviewed Apr 11, 2026

djsly force-pushed the fix/cse-apt-lock-contention-and-perf-tests branch from a7b9d57 to 6f86252 Compare April 11, 2026 23:40

djsly temporarily deployed to test April 11, 2026 23:40 — with GitHub Actions Inactive

djsly commented Apr 12, 2026

awesomenix force-pushed the fix/cse-apt-lock-contention-and-perf-tests branch from 6f86252 to 0a1d334 Compare April 12, 2026 03:33

Copilot AI review requested due to automatic review settings April 12, 2026 03:33

awesomenix temporarily deployed to test April 12, 2026 03:33 — with GitHub Actions Inactive

awesomenix changed the title ~~fix: use dpkg instead of apt for cached packages to eliminate CSE lock contention~~ feat: add cse timings test to catch regression with cached and full package installation Apr 12, 2026

Copilot started reviewing on behalf of awesomenix April 12, 2026 03:33 View session

awesomenix approved these changes Apr 12, 2026

djsly temporarily deployed to test April 13, 2026 16:55 — with GitHub Actions Inactive

Copilot AI reviewed Apr 13, 2026

Copilot started reviewing on behalf of djsly April 13, 2026 17:11 View session

djsly temporarily deployed to test April 13, 2026 17:48 — with GitHub Actions Inactive

djsly changed the title ~~feat: add cse timings test to catch regression with cached and full package installation~~ feat: add CSE timing regression tests for all Linux VHDs (Ubuntu 22.04/24.04, Azure Linux V3) Apr 13, 2026

Copilot AI review requested due to automatic review settings April 13, 2026 20:31

djsly temporarily deployed to test April 13, 2026 20:31 — with GitHub Actions Inactive

Copilot AI reviewed Apr 13, 2026

djsly temporarily deployed to test April 13, 2026 21:07 — with GitHub Actions Inactive

djsly and others added 8 commits April 14, 2026 16:00

Copilot AI review requested due to automatic review settings April 14, 2026 20:03

djsly force-pushed the fix/cse-apt-lock-contention-and-perf-tests branch from f88e68a to ae65394 Compare April 14, 2026 20:03

djsly temporarily deployed to test April 14, 2026 20:03 — with GitHub Actions Inactive

Copilot AI reviewed Apr 14, 2026

awesomenix approved these changes Apr 14, 2026

djsly temporarily deployed to test April 15, 2026 01:41 — with GitHub Actions Inactive

Copilot AI review requested due to automatic review settings April 15, 2026 01:44

djsly temporarily deployed to test April 15, 2026 01:44 — with GitHub Actions Inactive

Copilot AI reviewed Apr 15, 2026

Copilot started reviewing on behalf of djsly April 15, 2026 03:50 View session

3 participants