174 changes: 174 additions & 0 deletions content/docs/2.20/scalers/azure-cosmosdb.md
@@ -0,0 +1,174 @@
+++
title = "Azure Cosmos DB Change Feed"
availability = "v2.20+"
maintainer = "Microsoft"
category = "Messaging"
description = "Scale applications based on Azure Cosmos DB change feed processor lag."
go_file = "azure_cosmosdb_scaler"
+++

### Trigger Specification

This specification describes the `azure-cosmosdb` trigger for Azure Cosmos DB Change Feed. It estimates the lag of a change feed processor by comparing the current position of the change feed with the processor's checkpoint (stored in a lease container), and scales based on the total estimated lag across all partitions.

```yaml
triggers:
- type: azure-cosmosdb
  metadata:
    databaseId: mydb
    containerId: mycontainer
    leaseDatabaseId: mydb
    leaseContainerId: leases
    processorName: myprocessor
    connectionFromEnv: COSMOS_CONNECTION
    changeFeedLagThreshold: '100'
    activationChangeFeedLagThreshold: '0'
```

**Parameter list:**

- `databaseId` - ID of the Cosmos DB database containing the monitored container.
- `containerId` - ID of the monitored container (the data container).
- `leaseDatabaseId` - ID of the Cosmos DB database containing the lease container.
- `leaseContainerId` - ID of the lease container used by the change feed processor.
- `processorName` - Name of the change feed processor. Used to filter lease documents by matching the document ID prefix, ensuring accurate lag estimation when multiple processors share the same lease container.
- `changeFeedLagThreshold` - Target value for the total estimated change feed lag per replica. The scaler sums the estimated lag across all partitions and the HPA uses the formula `replicas = ceil(totalLag / changeFeedLagThreshold)`, capped at the number of partitions. (Default: `100`, Optional)
- `activationChangeFeedLagThreshold` - Minimum total lag to activate the scaler (scale from zero). Learn more about activation [here](./../concepts/scaling-deployments.md#activating-and-scaling-thresholds). (Default: `0`, Optional)
- `connection` - Connection string for the Cosmos DB account containing the monitored container. (Optional, see authentication)
- `leaseConnection` - Connection string for the Cosmos DB account containing the lease container. If not specified, defaults to `connection`. (Optional)
- `endpoint` - Account endpoint of the Cosmos DB account (for workload identity authentication). (Optional, see authentication)
- `leaseEndpoint` - Account endpoint of the Cosmos DB account containing the lease container. If not specified, defaults to `endpoint`. (Optional)
- `cosmosDBKey` - Account key for the Cosmos DB account. Required when using `endpoint` without workload identity. (Optional)
- `leaseCosmosDBKey` - Account key for the Cosmos DB account containing the lease container. If not specified, defaults to `cosmosDBKey`. (Optional)
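
The replica arithmetic described for `changeFeedLagThreshold` can be sketched as below. This is a minimal illustration rather than the scaler's or the HPA's actual code: the function name `desired_replicas` is hypothetical, and the sketch ignores the ScaledObject's `minReplicaCount` and `maxReplicaCount` bounds.

```python
import math


def desired_replicas(total_lag: int, threshold: int, partition_count: int) -> int:
    """Hypothetical helper: ceil(total_lag / threshold), capped at partition_count."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # Capping the metric at partition_count * threshold is what limits the
    # result to at most one replica per partition.
    capped_lag = min(total_lag, partition_count * threshold)
    return math.ceil(capped_lag / threshold)
```

For example, with the default threshold of `100` and 8 partitions, a total lag of 250 yields 3 replicas, while a total lag of 5000 is capped at 8.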

> 💡 **Note:** The scaler supports lease documents written by both the .NET SDK and Java SDK change feed processors, including both PK-range-based (version 0) and EPK-range-based (version 1) lease formats.

### Authentication Parameters

You can authenticate with a connection string, with an account endpoint plus account key, or with pod identity.

**Connection String Authentication:**

- `connection` - Connection string for the Cosmos DB account containing the monitored container. Format: `AccountEndpoint=https://<account>.documents.azure.com:443/;AccountKey=<key>`.
- `leaseConnection` - Connection string for the Cosmos DB account containing the lease container. Defaults to `connection` if not specified.
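
To illustrate the connection string format, the sketch below splits it into its `AccountEndpoint` and `AccountKey` parts. The helper name `parse_connection_string` is hypothetical and this is not how the scaler itself parses the value; it only shows the shape of the string.

```python
def parse_connection_string(conn: str) -> dict:
    """Hypothetical helper: split 'Key=Value;Key=Value' pairs into a dict."""
    parts = {}
    for segment in conn.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing ';'
        # Split on the first '=' only: base64 account keys often end in '='.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key.strip()] = value.strip()
    return parts
```

Splitting on the first `=` only matters because the key portion is base64 text and may itself contain `=` padding.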

Alternatively, provide `endpoint` + `cosmosDBKey`:

- `endpoint` - Cosmos DB account endpoint (e.g., `https://myaccount.documents.azure.com:443/`).
- `cosmosDBKey` - Cosmos DB account key.

**Pod identity based authentication:**

The [Azure AD Workload Identity](https://azure.github.io/azure-workload-identity/docs/) provider can be used.

When using workload identity, provide `endpoint` (and optionally `leaseEndpoint`) instead of connection strings. The scaler will acquire a bearer token using the workload identity credential chain.

> 💡 The identity used must have appropriate permissions to read from both the monitored container's change feed and the lease container. The built-in `Cosmos DB Account Reader` role or a custom role with `Microsoft.DocumentDB/databaseAccounts/readMetadata` and data-plane read access is required.

### How It Works

The scaler estimates change feed processor lag using the same algorithm as the .NET SDK's `ChangeFeedEstimator` and the Java SDK's `IncrementalChangeFeedProcessorImpl`:

1. Queries the lease container for lease documents matching the `processorName` prefix
2. For each lease (partition), reads the change feed with `maxItemCount=1` starting from the lease's continuation token
3. Compares the session token LSN (logical sequence number) with the first returned item's `_lsn`
4. Calculates lag as `sessionLSN - firstItemLSN + 1`
5. Sums the total lag across all partitions as the scaling metric, capped at `partitionCount * changeFeedLagThreshold` to prevent over-scaling
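
The arithmetic in steps 3 to 5 can be sketched as follows. This is an illustration under stated assumptions, not the scaler's implementation: `PartitionReading`, `partition_lag`, and `total_lag` are hypothetical names, and treating an empty read as zero lag and clamping negative values to zero are assumptions that the steps above do not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartitionReading:
    """Hypothetical result of one change feed read with maxItemCount=1."""
    session_lsn: int               # LSN taken from the response's session token
    first_item_lsn: Optional[int]  # `_lsn` of the first returned item, None if no item


def partition_lag(reading: PartitionReading) -> int:
    # Assumption: no returned item means the processor has caught up.
    if reading.first_item_lsn is None:
        return 0
    lag = reading.session_lsn - reading.first_item_lsn + 1
    # Assumption: never report a negative lag.
    return max(lag, 0)


def total_lag(readings: List[PartitionReading], threshold: int) -> int:
    # Sum per-partition lag, then cap at partitionCount * threshold.
    raw = sum(partition_lag(r) for r in readings)
    return min(raw, len(readings) * threshold)
```

With three partitions reporting lags of 20, 0, and 1, the metric is 21 for a threshold of `100`; with a threshold of `5` the same readings are capped at 15.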

Reading the change feed is a **non-destructive** operation: it does not affect the change feed processor's checkpoints or consume any data.

If a partition split (HTTP 410 Gone) is detected, the scaler automatically retries once with fresh lease data.

### Error Handling

If the scaler cannot reach Cosmos DB (e.g., invalid credentials, network issues, or service unavailability):

- **With prior successful polls:** The scaler caches the last known partition count and reports `partitionCount * changeFeedLagThreshold` as the metric. This scales the workload toward one replica per partition (bounded by `maxReplicaCount`) and keeps the scaler active.
- **Without prior successful polls** (e.g., fresh operator restart with bad credentials): The scaler returns an error to KEDA, which keeps the current replica count unchanged. Configure [`fallback`](../reference/scaledobject-spec/#fallback) on the ScaledObject for explicit failure behavior.
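
The two failure paths above can be sketched as a single decision. The names `metric_on_failure` and `ScalerUnavailable` are hypothetical, and the sketch only mirrors the behavior described here rather than the scaler's actual code.

```python
from typing import Optional


class ScalerUnavailable(Exception):
    """Hypothetical error surfaced to KEDA when no cached data exists."""


def metric_on_failure(cached_partition_count: Optional[int], threshold: int) -> int:
    # No earlier successful poll: there is nothing to fall back on, so the
    # error is returned and KEDA leaves the replica count unchanged.
    if cached_partition_count is None:
        raise ScalerUnavailable("Cosmos DB unreachable and no cached partition count")
    # Earlier successful poll: report the capped maximum metric.
    return cached_partition_count * threshold
```

With 4 cached partitions and a threshold of `100`, the reported metric is 400, which corresponds to 4 replicas under the `ceil(totalLag / changeFeedLagThreshold)` formula.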

> 💡 **Tip:** Configure `fallback` on the ScaledObject to control replica count during sustained failures when no cached partition count is available.

### Example

**Using connection string authentication:**

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cosmos-secrets
  namespace: default
data:
  connection: <base64-encoded-connection-string>
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: cosmos-trigger-auth
  namespace: default
spec:
  secretTargetRef:
  - parameter: connection
    name: cosmos-secrets
    key: connection
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cosmos-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: my-change-feed-processor
  pollingInterval: 10
  minReplicaCount: 0
  maxReplicaCount: 8
  cooldownPeriod: 30
  triggers:
  - type: azure-cosmosdb
    metadata:
      # Required
      databaseId: mydb
      containerId: mycontainer
      leaseDatabaseId: mydb
      leaseContainerId: leases
      processorName: myprocessor
      # Optional
      changeFeedLagThreshold: "100" # default 100
      activationChangeFeedLagThreshold: "0" # default 0
    authenticationRef:
      name: cosmos-trigger-auth
```

**Using Azure Workload Identity:**

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: cosmos-workload-auth
  namespace: default
spec:
  podIdentity:
    provider: azure-workload
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cosmos-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: my-change-feed-processor
  triggers:
  - type: azure-cosmosdb
    metadata:
      endpoint: https://myaccount.documents.azure.com:443/
      databaseId: mydb
      containerId: mycontainer
      leaseDatabaseId: mydb
      leaseContainerId: leases
      processorName: myprocessor
    authenticationRef:
      name: cosmos-workload-auth
```