diff --git a/content/docs/2.20/scalers/azure-cosmosdb.md b/content/docs/2.20/scalers/azure-cosmosdb.md new file mode 100644 index 000000000..e14f0a1b1 --- /dev/null +++ b/content/docs/2.20/scalers/azure-cosmosdb.md @@ -0,0 +1,174 @@ ++++ +title = "Azure Cosmos DB Change Feed" +availability = "v2.20+" +maintainer = "Microsoft" +category = "Messaging" +description = "Scale applications based on Azure Cosmos DB change feed processor lag." +go_file = "azure_cosmosdb_scaler" ++++ + +### Trigger Specification + +This specification describes the `azure-cosmosdb` trigger for Azure Cosmos DB Change Feed. It estimates the lag of a change feed processor by comparing the current position of the change feed with the processor's checkpoint (stored in a lease container), and scales based on the total estimated lag across all partitions. + +```yaml +triggers: +- type: azure-cosmosdb + metadata: + databaseId: mydb + containerId: mycontainer + leaseDatabaseId: mydb + leaseContainerId: leases + processorName: myprocessor + connectionFromEnv: COSMOS_CONNECTION + changeFeedLagThreshold: '100' + activationChangeFeedLagThreshold: '0' +``` + +**Parameter list:** + +- `databaseId` - ID of the Cosmos DB database containing the monitored container. +- `containerId` - ID of the monitored container (the data container). +- `leaseDatabaseId` - ID of the Cosmos DB database containing the lease container. +- `leaseContainerId` - ID of the lease container used by the change feed processor. +- `processorName` - Name of the change feed processor. Used to filter lease documents by matching the document ID prefix, ensuring accurate lag estimation when multiple processors share the same lease container. +- `changeFeedLagThreshold` - Target value for the total estimated change feed lag per replica. The scaler sums the estimated lag across all partitions and the HPA uses the formula `replicas = ceil(totalLag / changeFeedLagThreshold)`, capped at the number of partitions. (Default: `100`, Optional) +- `activationChangeFeedLagThreshold` - Minimum total lag to activate the scaler (scale from zero). Learn more about activation [here](./../concepts/scaling-deployments.md#activating-and-scaling-thresholds). (Default: `0`, Optional) +- `connection` - Connection string for the Cosmos DB account containing the monitored container. (Optional, see authentication) +- `leaseConnection` - Connection string for the Cosmos DB account containing the lease container. If not specified, defaults to `connection`. (Optional) +- `endpoint` - Account endpoint of the Cosmos DB account (for workload identity authentication). (Optional, see authentication) +- `leaseEndpoint` - Account endpoint of the Cosmos DB account containing the lease container. If not specified, defaults to `endpoint`. (Optional) +- `cosmosDBKey` - Account key for the Cosmos DB account. Required when using `endpoint` without workload identity. (Optional) +- `leaseCosmosDBKey` - Account key for the Cosmos DB account containing the lease container. If not specified, defaults to `cosmosDBKey`. (Optional) + +> 💡 **Note:** The scaler supports lease documents written by both the .NET SDK and Java SDK change feed processors, including both PK-range-based (version 0) and EPK-range-based (version 1) lease formats. + +### Authentication Parameters + +You can authenticate by using connection string authentication or pod identity. + +**Connection String Authentication:** + +- `connection` - Connection string for the Cosmos DB account containing the monitored container. Format: `AccountEndpoint=https://.documents.azure.com:443/;AccountKey=`. +- `leaseConnection` - Connection string for the Cosmos DB account containing the lease container. Defaults to `connection` if not specified. + +Alternatively, provide `endpoint` + `cosmosDBKey`: + +- `endpoint` - Cosmos DB account endpoint (e.g., `https://myaccount.documents.azure.com:443/`). +- `cosmosDBKey` - Cosmos DB account key. + +**Pod identity based authentication:** + +[Azure AD Workload Identity](https://azure.github.io/azure-workload-identity/docs/) provider can be used. + +When using workload identity, provide `endpoint` (and optionally `leaseEndpoint`) instead of connection strings. The scaler will acquire a bearer token using the workload identity credential chain. + +> 💡 The identity used must have appropriate permissions to read from both the monitored container's change feed and the lease container. The built-in `Cosmos DB Account Reader` role or a custom role with `Microsoft.DocumentDB/databaseAccounts/readMetadata` and data-plane read access is required. + +### How It Works + +The scaler estimates change feed processor lag using the same algorithm as the .NET SDK's `ChangeFeedEstimator` and Java SDK's `IncrementalChangeFeedProcessorImpl`: + +1. Queries the lease container for lease documents matching the `processorName` prefix +2. For each lease (partition), reads the change feed with `maxItemCount=1` starting from the lease's continuation token +3. Compares the session token LSN (latest sequence number) with the first returned item's `_lsn` +4. Calculates lag as `sessionLSN - firstItemLSN + 1` +5. Sums the total lag across all partitions as the scaling metric, capped at `partitionCount * changeFeedLagThreshold` to prevent over-scaling + +Reading the change feed is a **non-destructive** operation — it does not affect the change feed processor's checkpoints or consume any data. + +If a partition split (HTTP 410 Gone) is detected, the scaler automatically retries once with fresh lease data. + +### Error Handling + +If the scaler cannot reach Cosmos DB (e.g., invalid credentials, network issues, or service unavailability): + +- **With prior successful polls:** The scaler caches the last known partition count and reports `partitionCount * changeFeedLagThreshold` as the metric, scaling to max replicas while remaining active. +- **Without prior successful polls** (e.g., fresh operator restart with bad credentials): The scaler returns an error to KEDA, which keeps the current replica count unchanged. Configure [`fallback`](../reference/scaledobject-spec/#fallback) on the ScaledObject for explicit failure behavior. + +> 💡 **Tip:** Configure `fallback` on the ScaledObject to control replica count during sustained failures when no cached partition count is available. + +### Example + +**Using connection string authentication:** + +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: cosmos-secrets + namespace: default +data: + connection: +--- +apiVersion: keda.sh/v1alpha1 +kind: TriggerAuthentication +metadata: + name: cosmos-trigger-auth + namespace: default +spec: + secretTargetRef: + - parameter: connection + name: cosmos-secrets + key: connection +--- +apiVersion: keda.sh/v1alpha1 +kind: ScaledObject +metadata: + name: cosmos-scaledobject + namespace: default +spec: + scaleTargetRef: + name: my-change-feed-processor + pollingInterval: 10 + minReplicaCount: 0 + maxReplicaCount: 8 + cooldownPeriod: 30 + triggers: + - type: azure-cosmosdb + metadata: + # Required + databaseId: mydb + containerId: mycontainer + leaseDatabaseId: mydb + leaseContainerId: leases + processorName: myprocessor + # Optional + changeFeedLagThreshold: "100" # default 100 + activationChangeFeedLagThreshold: "0" # default 0 + authenticationRef: + name: cosmos-trigger-auth +``` + +**Using Azure Workload Identity:** + +```yaml +apiVersion: keda.sh/v1alpha1 +kind: TriggerAuthentication +metadata: + name: cosmos-workload-auth + namespace: default +spec: + podIdentity: + provider: azure-workload +--- +apiVersion: keda.sh/v1alpha1 +kind: ScaledObject +metadata: + name: cosmos-scaledobject + namespace: default +spec: + scaleTargetRef: + name: my-change-feed-processor + triggers: + - type: azure-cosmosdb + metadata: + endpoint: https://myaccount.documents.azure.com:443/ + databaseId: mydb + containerId: mycontainer + leaseDatabaseId: mydb + leaseContainerId: leases + processorName: myprocessor + authenticationRef: + name: cosmos-workload-auth +```