Add Azure Kubernetes Service (AKS) hosting support #16088
Draft
mitchdenny wants to merge 35 commits into main from
Conversation
Create core implementation files for AKS hosting support:

- AzureKubernetesEnvironmentResource: resource class with BicepOutputReference properties
- AzureKubernetesEnvironmentExtensions: AddAzureKubernetesEnvironment and configuration methods
- AzureKubernetesInfrastructure: eventing subscriber for compute resource processing
- AksNodePoolConfig, AksSkuTier, AksNetworkProfile: supporting types
- Project file with dependencies on Hosting.Azure, Hosting.Kubernetes, etc.
- Add project to the Aspire.slnx solution
- Add InternalsVisibleTo in Aspire.Hosting.Kubernetes for internal API access

Note: the Azure.Provisioning.ContainerService package is not yet available in internal NuGet feeds, so ConfigureAksInfrastructure uses placeholder outputs. When the package becomes available, replace them with a typed ContainerServiceManagedCluster.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
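Based on the API names listed in this commit and the PR description, AppHost usage would look roughly like the sketch below. This is an assumption-laden sketch, not the final API: the package is still in flight, and the exact signatures (and the `AksSkuTier.Standard` value) may differ.

```csharp
// Hypothetical AppHost usage sketch — type and method names come from this PR;
// exact signatures and enum values are assumed, not final.
var builder = DistributedApplication.CreateBuilder(args);

// Provisions an AKS cluster and wires up the inner Kubernetes/Helm environment.
var aks = builder.AddAzureKubernetesEnvironment("aks-env")
    .WithVersion("1.30")
    .WithSkuTier(AksSkuTier.Standard);

// Projects targeting this environment are published as Helm charts to the cluster.
builder.AddProject<Projects.MyApi>("api")
    .WithComputeEnvironment(aks);

builder.Build().Run();
```

This mirrors the existing `AddAzureContainerAppEnvironment()` pattern that the PR description calls out as the model.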
Contributor
🚀 Dogfood this PR with:

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 16088

Or:

iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 16088"
- Bicep snapshot verification tests
- Configuration extension tests (version, SKU, node pools, private cluster)
- Monitoring integration tests (Container Insights, Log Analytics)
- Argument validation tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- AzureKubernetesEnvironmentResource now implements IAzureDelegatedSubnetResource and IAzureNspAssociationTarget for VNet and network perimeter integration
- WithWorkloadIdentity() on the AKS environment enables OIDC and workload identity
- WithAzureWorkloadIdentity<T>() on compute resources for federated credential setup, with auto-create identity support
- AksWorkloadIdentityAnnotation for ServiceAccount YAML generation
- AsExisting() works automatically via the AzureProvisioningResource base class
- Additional unit tests for all new functionality

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Changed WithNodePool to AddNodePool, returning IResourceBuilder<AksNodePoolResource>
- AksNodePoolResource is a child resource (IResourceWithParent) of the AKS environment
- WithNodePoolAffinity<T> extension lets compute resources target specific node pools
- AksNodePoolAffinityAnnotation carries scheduling info for the Helm chart nodeSelector
- Made AksNodePoolConfig and AksNodePoolMode public (exposed via AksNodePoolResource)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When no user node pool is explicitly added via AddNodePool(), the AzureKubernetesInfrastructure subscriber creates a default 'workload' user pool (Standard_D4s_v5, 1-10 nodes) during BeforeStartEvent. Compute resources without explicit WithNodePoolAffinity() are automatically assigned to the first available user pool (either explicitly created or the auto-generated default). This ensures workloads are never scheduled on the system pool, which should only run system pods (kube-system, etc.). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
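The node-pool flow described in these commits can be sketched as follows. `AddNodePool` and `WithNodePoolAffinity` are the names this PR introduces; the parameter shapes here are assumptions for illustration.

```csharp
// Sketch only — builder signatures are assumed, not final.
var aks = builder.AddAzureKubernetesEnvironment("aks-env");

// Explicit user pool. Without any AddNodePool() call, a default 'workload'
// user pool (Standard_D4s_v5, autoscaling 1-10 nodes) is created during
// BeforeStartEvent so workloads never land on the system pool.
var gpuPool = aks.AddNodePool("gpupool");

// Pins a compute resource to that pool; the affinity annotation flows into
// the generated Helm chart's nodeSelector.
builder.AddProject<Projects.Trainer>("trainer")
    .WithComputeEnvironment(aks)
    .WithNodePoolAffinity(gpuPool);
```

Resources without an explicit affinity are assigned to the first available user pool, whether explicitly created or auto-generated.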
…tion

Two issues with aspire publish:

1. No Helm chart output: The inner KubernetesEnvironmentResource was stored as a property but never added to the application model. KubernetesInfrastructure looks for KubernetesEnvironmentResource instances in the model to generate Helm charts. Fix: add the inner K8s environment to the model (excluded from the manifest) with the default Helm engine.
2. Duplicate DeploymentTargetAnnotation: AzureKubernetesInfrastructure was adding its own DeploymentTargetAnnotation, conflicting with the one that KubernetesInfrastructure adds (which points to the correct KubernetesResource deployment target with Helm chart data). Fix: remove the duplicate annotation from our subscriber — KubernetesInfrastructure handles it.

Also made EnsureDefaultHelmEngine internal (was private) so the AKS package can call it to set up the Helm deployment engine on the inner K8s environment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Override GetBicepTemplateString and GetBicepTemplateFile to generate proper AKS ManagedCluster Bicep directly, bypassing the Azure.Provisioning SDK infrastructure (which requires the unavailable Azure.Provisioning.ContainerService package). The generated Bicep includes:

- Microsoft.ContainerService/managedClusters resource with SystemAssigned identity
- Configurable SKU tier, Kubernetes version, and DNS prefix
- Agent pool profiles with autoscaling from the NodePools config
- OIDC issuer profile and workload identity security profile
- Optional private cluster API server access profile
- Optional network profile (Azure CNI)
- All outputs: id, name, clusterFqdn, oidcIssuerUrl, kubeletIdentityObjectId, nodeResourceGroup

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
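As a rough illustration (not the actual generated output), a hand-written `Microsoft.ContainerService/managedClusters` resource with the features listed above has this shape; parameter names and pool values here are examples:

```bicep
// Illustrative sketch — property names follow the managedClusters schema;
// concrete values and parameter wiring are examples, not the PR's output.
resource aks 'Microsoft.ContainerService/managedClusters@2024-02-01' = {
  name: name
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  sku: {
    name: 'Base'
    tier: skuTier // e.g. 'Free' or 'Standard'
  }
  properties: {
    kubernetesVersion: kubernetesVersion
    dnsPrefix: dnsPrefix
    agentPoolProfiles: [
      {
        name: 'system'
        mode: 'System'
        vmSize: 'Standard_D4s_v5'
        enableAutoScaling: true
        minCount: 1
        maxCount: 3
      }
    ]
    oidcIssuerProfile: {
      enabled: true
    }
    securityProfile: {
      workloadIdentity: {
        enabled: true
      }
    }
  }
}

output oidcIssuerUrl string = aks.properties.oidcIssuerProfile.issuerURL
```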
Container Registry:

- Auto-create a default Azure Container Registry when AddAzureKubernetesEnvironment is called (same pattern as Container Apps)
- WithContainerRegistry() extension to use an explicit ACR, replacing the default
- FlowContainerRegistry() in AzureKubernetesInfrastructure propagates the registry to the inner KubernetesEnvironmentResource via ContainerRegistryReferenceAnnotation so KubernetesInfrastructure can discover it for image push/pull

Localhive fix:

- Added SuppressFinalPackageVersion to the csproj (required for new packages in Arcade)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Packages with SuppressFinalPackageVersion=true (like Aspire.Hosting.Kubernetes and Aspire.Hosting.Azure.Kubernetes) are placed in the NonShipping output directory by the Arcade SDK. The localhive script was only looking in the Shipping directory, causing these packages to be missing from the hive.

Changes:

- Added Get-AllPackagePaths, which returns both the Shipping and NonShipping dirs
- Package collection now scans all available package directories
- When packages span multiple directories, copy mode is used automatically (can't symlink to two dirs)
- The single-dir case still uses a symlink/junction for performance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as a retry-safe transient failure in the CI run attempt.
davidfowl reviewed Apr 12, 2026
/// <param name="builder">The resource builder.</param>
/// <param name="version">The Kubernetes version (e.g., "1.30").</param>
/// <returns>A reference to the <see cref="IResourceBuilder{AzureKubernetesEnvironmentResource}"/> for chaining.</returns>
[AspireExportIgnore(Reason = "AKS hosting is not yet supported in ATS")]
davidfowl reviewed Apr 12, 2026
Aspire's `Aspire.Hosting.Kubernetes` package currently supports end-to-end deployment to any conformant Kubernetes cluster (including AKS) via Helm charts. However, the support is **generic Kubernetes** — it has no awareness of Azure-specific capabilities. Users who want to deploy to AKS must manually provision the cluster, configure workload identity, set up monitoring, and manage networking outside of Aspire.

The goal is to create a first-class AKS experience in Aspire that supports:
Contributor
- ACR support
- Workload identity support also has to integrate with the AzureResourcePreparer
Internal methods from Aspire.Hosting.Kubernetes (AddKubernetesInfrastructureCore, EnsureDefaultHelmEngine, KubernetesInfrastructure, HelmDeploymentEngine) are not accessible at runtime across NuGet package boundaries, even with InternalsVisibleTo set. The InternalsVisibleTo attribute only works at compile time with project references, not with signed NuGet packages.

Fix: call the public AddKubernetesEnvironment() API instead. This handles all the internal setup (registering the KubernetesInfrastructure subscriber, creating the resource, setting up the Helm engine) through a single public entry point.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The localhive.ps1 modifications were unnecessary - packages with SuppressFinalPackageVersion go to Shipping, not NonShipping. The package discovery issue was caused by running localhive from the wrong worktree, not a script problem. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…placeholders

The ConfigureAksInfrastructure callback was still adding ProvisioningOutput objects with no values, even though GetBicepTemplateString/GetBicepTemplateFile now generate the Bicep directly. While our overrides prevent these from being used for Bicep generation, the stale outputs could confuse the AzureResourcePreparer's parameter analysis.

Emptied the callback body since all Bicep generation is handled by the resource's overrides. Also removed the unused Azure.Provisioning using directive.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two changes to ensure Helm/kubectl target the AKS cluster instead of the user's default kubectl context:

1. KubernetesEnvironmentResource.KubeConfigPath (Aspire.Hosting.Kubernetes): New public property. When set, HelmDeploymentEngine passes --kubeconfig to all helm and kubectl commands. This is non-breaking — null means use the default behavior.
2. AzureKubernetesInfrastructure get-credentials step (Aspire.Hosting.Azure.Kubernetes): Adds a pipeline step that runs after AKS Bicep provisioning and before Helm prepare. It calls 'az aks get-credentials --file <isolated-path>' to write credentials to a temp kubeconfig file, then sets KubeConfigPath on the inner KubernetesEnvironmentResource.

This ensures Helm deploys to the provisioned AKS cluster without mutating ~/.kube/config.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
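Conceptually, the get-credentials step runs commands like the following (resource group, cluster name, and the kubeconfig path are placeholders; the actual step uses values resolved at deploy time):

```shell
# Write credentials to an isolated kubeconfig instead of ~/.kube/config.
az aks get-credentials \
  --resource-group my-rg \
  --name my-aks-cluster \
  --file /tmp/aspire-aks-kubeconfig

# Helm/kubectl invocations then target that file explicitly via --kubeconfig.
helm upgrade --install mychart ./chart --kubeconfig /tmp/aspire-aks-kubeconfig
kubectl get pods --kubeconfig /tmp/aspire-aks-kubeconfig
```

Isolating the kubeconfig keeps the user's default kubectl context untouched across deployments.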
…meterResource

The resourceGroupName is a ParameterResource that requires the configuration key 'Parameters:resourceGroupName' to be set. During deploy, this isn't available as a raw parameter value — it's resolved by the Azure provisioning context and stored in the 'Azure:ResourceGroup' configuration key.

Changed GetAksCredentialsAsync to read from IConfiguration['Azure:ResourceGroup'], which is populated by the Azure provisioner during context creation, before our get-credentials step runs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…luster name
BicepOutputReference.GetValueAsync() triggers parameter resolution on
the AzureProvisioningResource, which tries to resolve the 'location'
parameter that depends on 'resourceGroup().location'. In a fresh
environment without Parameters:resourceGroupName configured, this fails.
Since we set the cluster name directly in the Bicep template (name: '{Name}'),
we can just use environment.Name as the cluster name. This avoids the
parameter resolution chain entirely.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The aks-get-credentials step was depending on provision-{name} (individual
AKS resource step) but the Azure:ResourceGroup config key is set by the
create-provisioning-context step. In a fresh environment, the step ordering
wasn't guaranteed to have the config available.
Changed to depend on the provision-azure-bicep-resources aggregation step
which gates on ALL provisioning completing, ensuring both the provisioning
context (with resource group) and the AKS cluster are ready.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…uring event

The ContainerRegistryReferenceAnnotation was being added to the inner KubernetesEnvironmentResource during BeforeStartEvent via FlowContainerRegistry. But KubernetesInfrastructure also runs during BeforeStartEvent and reads the registry annotation — if it ran first, it wouldn't see the annotation, resulting in no push steps being created and images never getting pushed.

Fix: add the ContainerRegistryReferenceAnnotation to the inner K8s environment immediately in AddAzureKubernetesEnvironment and WithContainerRegistry, at resource creation time before any events fire. This guarantees KubernetesInfrastructure always sees the registry regardless of subscriber execution order.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Push steps call registry.Endpoint.GetValueAsync(), which awaits the BicepOutputReference for loginServer. If the ACR hasn't been provisioned yet, this blocks indefinitely — the push step just hangs after push-prereq. Push steps depend on build + push-prereq, but neither of those depends on the ACR's provision step.

Added a PipelineConfigurationAnnotation on the inner K8s environment that makes all compute resource push steps depend on the ACR's provision step.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…step

Changed from depending on the individual ACR provision step (which required a resource-to-step lookup that may not resolve correctly) to depending on the provision-azure-bicep-resources aggregation step by name. This is simpler and ensures ALL Azure provisioning (including ACR output population) completes before any image push begins.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previous attempts tried to wire dependencies via GetSteps(resource, tag), which uses the StepToResourceMap. This approach failed because the push steps are keyed to the compute resources, not the K8s environment.

New approach: find the push-prereq step by name in the Steps collection and directly call DependsOn(provision-azure-bicep-resources). Since all push steps already depend on push-prereq, this ensures the entire push chain waits for Azure provisioning to complete.

This mirrors how ACA works: ACA doesn't need this because it implements IContainerRegistry directly on the environment resource, so the endpoint values are resolved differently. For AKS, the ACR is a separate Bicep resource whose outputs need to be populated before push can proceed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Temporary console output to debug why push steps hang after push-prereq. Logs whether the PipelineConfigurationAnnotation runs, whether push-prereq is found, how many push steps exist, and their dependencies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Diagnostics revealed push steps had EMPTY DependsOnSteps lists. The
standard wiring from ProjectResource's PipelineConfigurationAnnotation
(pushSteps.DependsOn(buildSteps, push-prereq)) wasn't working because
context.GetSteps(resource, tag) returned empty — the resource lookup
via ResourceNameComparer didn't match when K8s deployment targets are
involved.
Fix: directly find push steps by tag in the Steps collection and
explicitly wire dependencies on:
- provision-azure-bicep-resources (ACR must be provisioned for endpoint)
- push-prereq (ACR login must complete)
- build-{resourceName} (container image must be built)
This ensures the correct execution order:
provision → push-prereq → build → push → helm-deploy
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Azure provisioning context internals (ProvisioningContextTask, AzureProvisionerOptions) are all internal to Aspire.Hosting.Azure and inaccessible from our package. IConfiguration['Azure:ResourceGroup'] is also not reliably set when our step runs because the deployment state manager writes to a different configuration scope.

New approach: query Azure directly with 'az aks list --query' to find the cluster's resource group. This is guaranteed to work after provisioning completes, regardless of internal configuration state. The az CLI is already available (validated by the validate-azure-login step).

Also wires push step dependencies directly by finding steps by tag in the Steps collection, fixing the issue where push steps had empty DependsOnSteps lists in K8s compute environments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… group

- Quote --resource-group and --name values to handle special characters
- Strip line endings from az aks list output to prevent argument parsing issues
- Add logging of cluster name and resource group values for debugging

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The JMESPath query in 'az aks list --query [?name==...].resourceGroup' had quote-escaping issues on Windows when passed via ProcessStartInfo: the quotes in the JMESPath expression were being mangled by cmd.exe, producing truncated/malformed resource group names.

Switched to 'az resource list --resource-type ... --name ... --query [0].resourceGroup', which uses --name as a proper CLI argument (no embedded quotes in JMESPath), and the simpler [0].resourceGroup query has no quote-escaping issues.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
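For reference, the two query shapes look roughly like this (cluster name is a placeholder):

```shell
# Old approach — JMESPath expression with embedded quotes, fragile when the
# argument passes through cmd.exe on Windows:
az aks list --query "[?name=='my-cluster'].resourceGroup" -o tsv

# New approach — the name is a plain CLI argument, and the [0].resourceGroup
# query contains no quotes that need escaping:
az resource list \
  --resource-type Microsoft.ContainerService/managedClusters \
  --name my-cluster \
  --query "[0].resourceGroup" -o tsv
```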
Two fixes:

1. Resource group resolution: Read Azure:ResourceGroup from IConfiguration, which is populated from the deployment state JSON file by the deploy-prereq step. This correctly scopes to the current deployment's resource group, avoiding the issue where az resource list returned the wrong cluster when multiple clusters share the same name across resource groups.
2. ACR attach: After fetching AKS credentials, run 'az aks update --attach-acr' to grant the kubelet managed identity the AcrPull role on the ACR. Without this, pods get ImagePullBackOff (401 Unauthorized) when pulling from the ACR. The attach is idempotent and won't fail if already attached.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Follows the ACA pattern: the AKS Bicep module now accepts an acrName parameter, references the ACR as an existing resource, and creates an AcrPull role assignment for the kubelet managed identity. This is handled during Bicep provisioning rather than as a runtime CLI step.

The acrName parameter is wired from the ACR resource's NameOutputReference via the Parameters dictionary on the AzureBicepResource, which the Azure publishing context automatically resolves in main.bicep. Removed the 'az aks update --attach-acr' CLI approach.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Shows cluster name, resource group, and the az aks get-credentials command in the pipeline summary so users can easily connect to the cluster after deployment. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The guid() function for the role assignment name used env.properties.identityProfile.kubeletidentity.objectId which is a runtime property (only known after AKS provisioning). Bicep requires the 'name' property to be calculable at deployment start. Fixed by using env.id (compile-time deterministic) instead of the runtime objectId in the guid() call. The principalId property still uses the runtime objectId since that's evaluated during deployment. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
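In Bicep terms, the fix looks roughly like the sketch below. Identifier names (`env`, `acr`) are illustrative, not the PR's actual symbols; the AcrPull role definition ID is the standard built-in role GUID.

```bicep
// The role assignment name must be computable at deployment start, so guid()
// is seeded with compile-time-deterministic values (env.id), not runtime
// properties like the kubelet identity objectId.
resource acrPull 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(env.id, acr.id, 'acrpull')
  scope: acr
  properties: {
    // AcrPull built-in role
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '7f951dda-4ed3-4680-a7ca-43fe172d538d')
    // A runtime property is fine here — principalId is evaluated during deployment.
    principalId: env.properties.identityProfile.kubeletidentity.objectId
    principalType: 'ServicePrincipal'
  }
}
```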
On first deploy to a new environment, the deployment state JSON file doesn't exist yet. IConfiguration is loaded at app startup and doesn't see the state file that's written later by create-provisioning-context. Fix: try IConfiguration first (works on re-deploys where the state file exists), then fall back to 'az resource list' to query the resource group directly from Azure (works on first deploy since provisioning has completed by this point). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three fixes for the multi-environment scenario:
1. WithContainerRegistry now updates resource.Parameters['acrName'] to
reference the explicit registry's NameOutputReference, fixing the
'key not present in dictionary' error when the Bicep publisher tries
to resolve the removed default ACR.
2. Added ParentComputeEnvironment property to KubernetesEnvironmentResource.
When set, KubernetesInfrastructure matches resources whose compute env
is either the K8s env itself OR its parent (the AKS resource). This
allows WithComputeEnvironment(aksEnv) to work correctly — the user
targets the AKS resource, and the inner K8s env picks it up.
3. AddAzureKubernetesEnvironment sets ParentComputeEnvironment on the
inner K8s environment, completing the parent-child relationship.
Example AppHost code that now works:

var registry = builder.AddAzureContainerRegistry("registry");
var enva = builder.AddAzureKubernetesEnvironment("enva")
    .WithContainerRegistry(registry);
var envb = builder.AddAzureKubernetesEnvironment("envb")
    .WithContainerRegistry(registry);
builder.AddProject<MyApi>("api").WithComputeEnvironment(enva);
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Verifies that WithComputeEnvironment correctly routes resources to their targeted AKS/K8s environments, including the ParentComputeEnvironment matching logic. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The publish context and step factory used GetDeploymentTargetAnnotation(environment), where environment is the inner K8s env. But WithComputeEnvironment(aksEnv) sets the compute env to the AKS resource, and KubernetesInfrastructure now sets DeploymentTargetAnnotation.ComputeEnvironment to match the resource's actual compute env (the AKS resource, not the inner K8s env).

Updated all GetDeploymentTargetAnnotation calls to use ParentComputeEnvironment when available, so the lookup matches correctly. Also fixed KubernetesInfrastructure to set ComputeEnvironment on the DeploymentTargetAnnotation to the resource's actual compute env rather than always using the inner K8s env.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
BicepOutputReference.ValueExpression uses single braces ({storage.outputs.blobEndpoint})
but ResolveUnknownValue only stripped double braces ({{ }}) via HelmExtensions delimiters.
Single braces passed through to the Helm template, causing a parse error.
Fix: also strip single { and } characters when sanitizing the values key.
Fixes #16114
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
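A minimal C# sketch of the sanitizing fix described above (the real code lives in the Helm values handling and also deals with delimiter constants from HelmExtensions; this helper name and shape are illustrative only):

```csharp
// Illustrative only — strips both delimiter forms when sanitizing a values key.
static string StripBraceDelimiters(string expression)
{
    return expression
        .Replace("{{", string.Empty)  // existing: Helm-style double-brace delimiters
        .Replace("}}", string.Empty)
        .Replace("{", string.Empty)   // fix: single braces from BicepOutputReference.ValueExpression
        .Replace("}", string.Empty);  //      e.g. "{storage.outputs.blobEndpoint}"
}
```

Without the single-brace handling, `{storage.outputs.blobEndpoint}` passed through to the Helm template and caused a parse error (#16114).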
Contributor
🎬 CLI E2E Test Recordings — 68 recordings uploaded. View recordings
📹 Recordings uploaded automatically from CI run #24345283063
Description
WIP — Adds first-class Azure Kubernetes Service (AKS) support to Aspire via a new `Aspire.Hosting.Azure.Kubernetes` package.

Motivation
Aspire's `Aspire.Hosting.Kubernetes` package supports end-to-end deployment to any conformant Kubernetes cluster via Helm charts, but it has no awareness of Azure-specific capabilities. Users who deploy to AKS must manually provision the cluster, configure workload identity, set up monitoring, and manage networking outside of Aspire.

What's here so far (Phase 1)
- `Aspire.Hosting.Azure.Kubernetes` package with dependencies on `Aspire.Hosting.Kubernetes` and `Aspire.Hosting.Azure`
- `AzureKubernetesEnvironmentResource` — unified resource that extends `AzureProvisioningResource` and implements `IAzureComputeEnvironmentResource`, internally wrapping a `KubernetesEnvironmentResource` for Helm deployment
- `AddAzureKubernetesEnvironment()` entry point (mirrors the `AddAzureContainerAppEnvironment()` pattern)
- `WithVersion`, `WithSkuTier`, `WithNodePool`, `AsPrivateCluster`, `WithContainerInsights`, `WithAzureLogAnalyticsWorkspace`
- `AzureKubernetesInfrastructure` eventing subscriber
- `docs/specs/aks-support.md`

What's planned next
- VNet integration (`WithDelegatedSubnet`)
- Typed provisioning (pending `Azure.Provisioning.ContainerService` package availability in internal feeds)

Validation
- `dotnet build /p:SkipNativeBuild=true`
- `Aspire.Hosting.Azure.AppContainers`

Fixes # (issue)
Checklist
- `<remarks />` and `<code />` elements on your triple slash comments?
- aspire.dev
- issue: