feat: add CSE timing regression tests for all Linux VHDs (Ubuntu 22.04/24.04, Azure Linux V3)#8284
Open
Pull request overview
This PR addresses a CSE bootstrap latency regression on Ubuntu golden-image VHDs by avoiding first-boot apt initialization/lock contention, and adds E2E timing checks to detect future regressions.
Changes:
- Switch cached-package installation and walinuxagent hold/unhold operations to dpkg-based fast paths (with apt fallbacks).
- Export `FULL_INSTALL_REQUIRED` so it's available in subshell contexts.
- Add E2E CSE timing extraction + two Ubuntu 22.04 performance scenarios with thresholds.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| parts/linux/cloud-init/artifacts/ubuntu/cse_helpers_ubuntu.sh | Use dpkg --set-selections for walinuxagent hold/unhold and dpkg -i for cached .deb installs; add wait_for_dpkg_lock. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Export FULL_INSTALL_REQUIRED after determining install mode. |
| e2e/cse_timing.go | New helper to extract CSE task durations from Guest Agent event JSON and validate against thresholds. |
| e2e/scenario_cse_perf_test.go | New Ubuntu 22.04 cached/full-install performance scenarios enforcing timing thresholds. |
| .github/workflows/pr-lint.yaml | Bump actions/github-script from v8 to v9. |
a7b9d57 to
6f86252
Compare
djsly
commented
Apr 12, 2026
6f86252 to
0a1d334
Compare
awesomenix
approved these changes
Apr 12, 2026
Collaborator
Author
Multi-OS CSE Performance Tests Added 🎯
Expanded CSE regression tests from Ubuntu 22.04 only → 3 OS variants with 6 test scenarios total:
Key OS Differences Discovered
Ubuntu 24.04 vs 22.04:
Azure Linux V3 vs Ubuntu:
All thresholds from prod telemetry:
…k contention

Replace apt-get/apt-mark with dpkg equivalents for cached (golden image) VHDs:
- installDebPackageFromFile: use 'dpkg -i' instead of 'apt-get -f install' when FULL_INSTALL_REQUIRED=false. The first apt-get call after VHD boot triggers expensive apt initialization (~30-50s) and holds dpkg locks. dpkg -i bypasses apt entirely and completes in <100ms for pre-cached packages.
- aptmarkWALinuxAgent: use 'dpkg --set-selections' instead of 'apt-mark hold/unhold'. apt-mark internally initializes the apt cache even for simple hold operations, causing 10-30s delays on first invocation.

Both functions include fallback to the original apt-based commands if dpkg fails.

Add CSE timing regression detection framework:
- e2e/cse_timing.go: Extracts CSE task timings from VM event JSON files
- e2e/scenario_cse_perf_test.go: Two perf scenarios (cached + full install paths)

E2E test results with fix (Ubuntu 22.04 golden image):
- aptmarkWALinuxAgent: 21.7s → 0.06s (360x faster)
- kubelet deb install: 47.6s → 0.04s (1190x faster)
- Total CSE time: 19.65s (down from 47s+ in production)

Root cause: PR #8105 removed installContainerRuntime's natural ~10s delay that masked the first-apt-call initialization cost. This fix addresses the underlying issue instead of re-introducing the delay.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix nil map panic in Test_Ubuntu2204_CSE_FullInstallPerformance by initializing vmss.Tags before setting SkipBinaryCleanup tag
- Log parse errors in ExtractCSETimings instead of silently skipping, with warning counts for visibility into format drift
- Fail test when no CSE tasks were parsed (empty report would silently pass all thresholds, making regression detection ineffective)
- Fail test when AKS.CSE.cse_start task is missing (critical for total CSE duration validation)
- Return error from ExtractCSETimings when zero tasks are parsed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…esholds

Refactor ValidateCSETimings to run each threshold check as a named sub-test (t.Run). Since the E2E pipeline already uses gotestsum to generate JUnit XML published via PublishTestResults@2, ADO Test Analytics will now track each CSE task individually:

Test_Ubuntu2204_CSE_CachedPerformance/TotalCSEDuration
Test_Ubuntu2204_CSE_CachedPerformance/Task_installDebPackageFromFile
Test_Ubuntu2204_CSE_CachedPerformance/Task_aptmarkWALinuxAgent
...

This enables per-task pass/fail trends and regression detection in ADO dashboards without any pipeline changes.

Also tighten thresholds to ~2-3x observed healthy values:
- Cached: TotalCSE 60s→35s, installDeb 25s→10s, aptmark 10s→5s
- Full: TotalCSE 120s→90s, installDeps 90s→60s

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…p99)

Thresholds derived from GuestAgentGenericLogs table (FA/azcore cluster), ~35K samples per task over 30 minutes on Ubuntu 22.04:

Cached path (set at ~p95):
- installDebPackageFromFile: 22s (prod p95=21.55s)
- aptmarkWALinuxAgent: 24s (prod p90=23.32s, bimodal)
- configureKubeletAndKubectl: 27s (prod p95=26.06s)
- ensureContainerd: 3s (prod p95=1.99s)
- ensureKubelet: 10s (prod p95=6.20s)

Full install path (set at ~p99):
- installDebPackageFromFile: 45s (prod p99=42.88s)
- aptmarkWALinuxAgent: 60s (prod p99=58.07s)
- configureKubeletAndKubectl: 45s (prod p99=44.39s)
- extractKubeBinaries: 16s (prod p99=15.21s)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…esholds

Add DefaultTaskThreshold to CSETimingThresholds: any task exceeding this duration that has no specific threshold automatically becomes a sub-test in ADO Pipeline Analytics. This ensures newly added CSE tasks are tracked without code changes.

Expanded specific thresholds from 5 → 18 tasks (cached) and 8 → 21 (full install), covering all high-frequency operations observed in production:
- Kubelet install variants (FromPkg, FromURL, extractKubeBinaries)
- Credential provider variants (FromUrl, FromPkg, FromPMC, download)
- Node config (configureNodeExporter, retrycmd_nslookup, ensureSnapshotUpdate)
- Container runtime (installContainerRuntime, installStandaloneContainerd)

All thresholds from Ubuntu 22.04 prod telemetry (GuestAgentGenericLogs).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
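The catch-all behaviour described in this commit could be sketched roughly as follows. This is only an illustration, not the code in `e2e/cse_timing.go`: the `Thresholds` and `Check` types, the `Plan` method, and the sample task names and durations are all hypothetical, and exact-name lookup is used here for brevity.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Thresholds is a hypothetical stand-in for CSETimingThresholds.
type Thresholds struct {
	Specific map[string]time.Duration // per-task limits, keyed by task name
	Default  time.Duration            // catch-all limit for tasks with no specific entry
}

// Check describes one candidate sub-test.
type Check struct {
	Task     string
	Took     time.Duration
	Limit    time.Duration
	CatchAll bool // true when the check exists only because the default was exceeded
}

// Plan returns one Check per task that has a specific threshold, plus one
// Check for every other task whose duration exceeded the default threshold.
func (th Thresholds) Plan(timings map[string]time.Duration) []Check {
	names := make([]string, 0, len(timings))
	for name := range timings {
		names = append(names, name)
	}
	sort.Strings(names) // stable order regardless of map iteration

	var checks []Check
	for _, name := range names {
		took := timings[name]
		if limit, ok := th.Specific[name]; ok {
			checks = append(checks, Check{Task: name, Took: took, Limit: limit})
		} else if took > th.Default {
			checks = append(checks, Check{Task: name, Took: took, Limit: th.Default, CatchAll: true})
		}
	}
	return checks
}

// demoPlan builds a plan from made-up timings (illustrative values only).
func demoPlan() []Check {
	th := Thresholds{
		Specific: map[string]time.Duration{"installDebPackageFromFile": 22 * time.Second},
		Default:  45 * time.Second,
	}
	timings := map[string]time.Duration{
		"installDebPackageFromFile": 40 * time.Millisecond,
		"someNewTask":               50 * time.Second, // hypothetical new, slow task
		"quickUntrackedTask":        2 * time.Second,  // below default, so not reported
	}
	return th.Plan(timings)
}

func main() {
	for _, c := range demoPlan() {
		status := "PASS"
		if c.Took > c.Limit {
			status = "FAIL"
		}
		fmt.Printf("%s %s took=%v limit=%v catchAll=%t\n", status, c.Task, c.Took, c.Limit, c.CatchAll)
	}
}
```

In a real test, each returned `Check` would presumably be wrapped in its own `t.Run` call so that it shows up as a separate sub-test.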
Add OS-specific threshold sets derived from production telemetry (GuestAgentGenericLogs, FA/azcore cluster):
- Ubuntu 24.04: similar to 22.04 but less apt lock contention (aptmarkWALinuxAgent p50=4.45s vs 0.49s bimodal on 22.04)
- Azure Linux V3: RPM-based, no apt/deb tasks. Higher baseline for configureKubeletAndKubectl (p50=4.56s) and installKubeletKubectlFromPkg (p50=29.03s). Includes ensureNoDupOnPromiscuBridge and enableLocalDNS.

Each OS gets cached + full install test scenarios (6 total now):
- Test_Ubuntu2204_CSE_CachedPerformance / FullInstallPerformance
- Test_Ubuntu2404_CSE_CachedPerformance / FullInstallPerformance
- Test_AzureLinuxV3_CSE_CachedPerformance / FullInstallPerformance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove unused ctx param from LogReport (fixes golangci-lint unparam)
- Use Fatalf for missing cse_start (prevents 0-duration silently passing)
- Add newline delimiter between JSON files in find -exec to prevent concatenation
- Make subtest names unique by including task short name to avoid collisions
- Assert all configured threshold suffixes matched at least one task
- Fix DefaultTaskThreshold comment to match actual 45s value

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The E2E framework wraps *testing.T in toolkit.testLogger, so the direct type assertion s.T.(*testing.T) fails. Add UnwrapTestingT() helper to recursively extract the underlying *testing.T.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
f88e68a to
ae65394
Compare
awesomenix
approved these changes
Apr 14, 2026
… comments

- Bump installStandaloneContainerd threshold from 2s to 5s across all threshold sets (E2E saw 3.27s; prod p99 is only 0.46s but E2E infra variance is higher)
- Fix map iteration nondeterminism: sort suffixes by length descending for deterministic longest-match-first behavior
- Promote unmatched threshold suffix warning to Errorf so missing tasks fail the test
- Remove no-op empty BootstrapConfigMutator closures and tag-only VMConfigMutator

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
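A minimal sketch of the longest-match-first idea from this commit, assuming thresholds are keyed by task-name suffix. The function names, the `AKS.CSE.` prefix on the sample task names, and the threshold values are illustrative assumptions, not the PR's actual code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// sortedSuffixes returns the threshold keys ordered longest first, with a
// lexicographic tie-break, so matching never depends on map iteration order.
func sortedSuffixes(thresholds map[string]time.Duration) []string {
	suffixes := make([]string, 0, len(thresholds))
	for s := range thresholds {
		suffixes = append(suffixes, s)
	}
	sort.Slice(suffixes, func(i, j int) bool {
		if len(suffixes[i]) != len(suffixes[j]) {
			return len(suffixes[i]) > len(suffixes[j])
		}
		return suffixes[i] < suffixes[j]
	})
	return suffixes
}

// matchThreshold returns the most specific (longest) suffix that matches task.
func matchThreshold(task string, thresholds map[string]time.Duration) (string, time.Duration, bool) {
	for _, s := range sortedSuffixes(thresholds) {
		if strings.HasSuffix(task, s) {
			return s, thresholds[s], true
		}
	}
	return "", 0, false
}

// demoThresholds holds made-up values; one key is a suffix of the other.
func demoThresholds() map[string]time.Duration {
	return map[string]time.Duration{
		"FromPkg":                      60 * time.Second,
		"installKubeletKubectlFromPkg": 30 * time.Second,
	}
}

func main() {
	for _, task := range []string{
		"AKS.CSE.installKubeletKubectlFromPkg",
		"AKS.CSE.installCredentialProviderFromPkg",
		"AKS.CSE.ensureKubelet",
	} {
		suffix, limit, ok := matchThreshold(task, demoThresholds())
		fmt.Printf("%s -> suffix=%q limit=%v matched=%t\n", task, suffix, limit, ok)
	}
}
```

Without the sort, a task matching both keys could pick up either limit depending on Go's randomized map iteration, which is the nondeterminism the commit describes.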
- Add detailed comments explaining OperationId is a timestamp (not a GUID), referencing logs_to_events() in cse_helpers.sh
- Add comment on cseEventsDir explaining path matches EVENTS_LOGGING_DIR in both cse_helpers.sh and cse_start.sh (not per-handler subdirectories)
- Switch from line-based JSON parsing to json.Decoder for robustness against multi-line or pretty-printed JSON files
- Threshold suffixes already sorted by length (longest first) for deterministic matching (addressed in prior commit)
- Unmatched suffixes already use Errorf to fail the test (addressed in prior commit)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
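To illustrate why `json.Decoder` is more robust than line-based parsing here, the sketch below decodes a stream that mixes a single-line and a pretty-printed event. The field names and the timestamp layout are assumptions: the commit only states that `OperationId` holds a timestamp, so treating `Timestamp` as the start and `OperationId` as the end is a guess made for the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
	"time"
)

// event mirrors a subset of the event JSON; field names are assumed.
type event struct {
	TaskName    string `json:"TaskName"`
	Timestamp   string `json:"Timestamp"`   // assumed: task start time
	OperationId string `json:"OperationId"` // assumed: task end time (a timestamp, not a GUID)
}

// layout is an assumed timestamp format for this sketch.
const layout = "2006-01-02 15:04:05.000"

// parseDurations decodes back-to-back JSON objects from r, whether each one
// sits on a single line or is spread over several, and returns task durations.
func parseDurations(r io.Reader) (map[string]time.Duration, error) {
	dec := json.NewDecoder(r)
	out := map[string]time.Duration{}
	for {
		var ev event
		err := dec.Decode(&ev)
		if errors.Is(err, io.EOF) {
			break // clean end of stream
		}
		if err != nil {
			return nil, fmt.Errorf("decode event: %w", err)
		}
		start, err := time.Parse(layout, ev.Timestamp)
		if err != nil {
			return nil, fmt.Errorf("parse start of %s: %w", ev.TaskName, err)
		}
		end, err := time.Parse(layout, ev.OperationId)
		if err != nil {
			return nil, fmt.Errorf("parse end of %s: %w", ev.TaskName, err)
		}
		out[ev.TaskName] = end.Sub(start)
	}
	if len(out) == 0 {
		return nil, errors.New("no CSE tasks parsed")
	}
	return out, nil
}

// sample contains one compact and one pretty-printed event (made-up data).
const sample = `{"TaskName":"AKS.CSE.aptmarkWALinuxAgent","Timestamp":"2026-04-12 10:00:00.000","OperationId":"2026-04-12 10:00:00.060"}
{
  "TaskName": "AKS.CSE.installDebPackageFromFile",
  "Timestamp": "2026-04-12 10:00:01.000",
  "OperationId": "2026-04-12 10:00:01.040"
}`

func demoDurations() (map[string]time.Duration, error) {
	return parseDurations(strings.NewReader(sample))
}

func main() {
	durs, err := demoDurations()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for task, d := range durs {
		fmt.Printf("%s took %v\n", task, d)
	}
}
```

For simplicity this sketch stops at the first malformed event, whereas the commits above describe logging parse errors and continuing.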
Summary
Adds CSE (Custom Script Extension) timing regression tests for all supported Linux VHD builds, with thresholds derived from production telemetry (`GuestAgentGenericLogs`, FA database on azcore cluster).
Each CSE task runs as a `t.Run()` sub-test, so ADO pipeline analytics can individually track pass/fail rates over time. No pipeline changes are needed: the existing gotestsum → JUnit → `PublishTestResults@2` flow picks them up automatically.

Test Scenarios (6 total)
- `Test_Ubuntu2204_CSE_CachedPerformance`
- `Test_Ubuntu2204_CSE_FullInstallPerformance`
- `Test_Ubuntu2404_CSE_CachedPerformance`
- `Test_Ubuntu2404_CSE_FullInstallPerformance`
- `Test_AzureLinuxV3_CSE_CachedPerformance`
- `Test_AzureLinuxV3_CSE_FullInstallPerformance`

How It Works
- `/var/log/azure/Microsoft.Azure.Extensions.CustomScript/*/events/*.json` files are read from the VM to extract per-task start/end timestamps
- Each task with a specific threshold runs as a `t.Run()` sub-test (e.g., `aptmarkWALinuxAgent`, `installDebPackageFromFile`), individually tracked in ADO Test Analytics
- Remaining tasks get a `DefaultTaskThreshold` check (45s cached / 60s full install), which automatically discovers new slow tasks

Threshold Methodology
All thresholds are derived from real production data queried from the `GuestAgentGenericLogs` Kusto table:
- `extract("^Completed: ([a-zA-Z_]+)", 1, Context1)` strips parameters from task names

Key OS Differences Discovered
Tasks compared across OSes:
- `aptmarkWALinuxAgent`
- `installDebPackageFromFile`
- `installKubeletKubectlFromPkg`
- `configureKubeletAndKubectl`
- `ensureNoDupOnPromiscuBridge`, `enableLocalDNS`

What Regressions This Catches
- `aptmarkWALinuxAgent` blocking on the dpkg lock
- Regressions in `installDebPackageFromFile` / `installKubeletKubectlFromPkg` latency

Files Changed
- `e2e/scenario_cse_perf_test.go`: Test scenarios and OS-specific threshold definitions
- `e2e/cse_timing.go`: CSE timing extraction, validation logic, `DefaultTaskThreshold` catch-all
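For readers more familiar with Go than Kusto, the task-name extraction quoted under Threshold Methodology can be mirrored with the same regular expression. This is only an illustration of what the pattern keeps and drops; the sample `Context1` strings and the `taskName` helper are made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// completedRe is the same pattern used in the Kusto extract() call.
var completedRe = regexp.MustCompile(`^Completed: ([a-zA-Z_]+)`)

// taskName returns the bare task name from a "Completed: ..." message,
// dropping any parameters that follow it.
func taskName(context string) (string, bool) {
	m := completedRe.FindStringSubmatch(context)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Hypothetical messages; real Context1 values may look different.
	for _, msg := range []string{
		"Completed: installDebPackageFromFile /opt/example/kubelet.deb",
		"Completed: retrycmd_nslookup 1 15 example.invalid",
		"Started: aptmarkWALinuxAgent hold",
	} {
		name, ok := taskName(msg)
		fmt.Printf("%q -> %q (matched=%t)\n", msg, name, ok)
	}
}
```

Note that the character class excludes digits, so a task name containing a digit would be cut off at that point.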