diff --git a/docs/cloud/connectivity/aws-connectivity.mdx b/docs/cloud/connectivity/aws-connectivity.mdx
index 6f58e98cb6..380b0c1b03 100644
--- a/docs/cloud/connectivity/aws-connectivity.mdx
+++ b/docs/cloud/connectivity/aws-connectivity.mdx
@@ -108,14 +108,19 @@ This approach is **optional**; Temporal Cloud works without it. It simply stream
 
 ### Choose the override domain and endpoint
 
-| Temporal Cloud setup | Use this PHZ domain | Example |
-| ----------------------------------------- | ---------------------------------- | ----------------------------------------------- |
-| Single-region namespace with mTLS auth | `.tmprl.cloud` | `payments.abcde.tmprl.cloud` ↔ `vpce-...` |
-| Single-region namespace with API-key auth | `.api.temporal.io` | `us-east-1.aws.api.temporal.io` ↔ `vpce-...` |
-| Multi-region namespace | `region.tmprl.cloud` | `aws-us-east-1.region.tmprl.cloud` ↔ `vpce-...` |
+| Endpoint type | PHZ domain format | Example |
+| ------------------ | ---------------------------------- | -------------------------------------- |
+| Namespace endpoint | `.tmprl.cloud` | `payments.abcde.tmprl.cloud` |
+| Regional endpoint | `-.region.tmprl.cloud` | `aws-ap-northeast-2.region.tmprl.cloud` |
 
 ### Step-by-step instructions
 
+:::warning Order matters
+
+Once a Route 53 private hosted zone (PHZ) is associated with a VPC, it answers every query for its domain from inside that VPC. If the zone has no matching record, resolution fails (NXDOMAIN) instead of falling back to public DNS. If you create an empty PHZ for `.tmprl.cloud` and associate it with a VPC where Workers are running, **all Worker traffic to Temporal Cloud in that VPC stops** until you add the CNAME record. Follow the steps below in order to avoid this.
+
+:::
+
 #### 1. Collect your PrivateLink endpoint DNS name
 
 ```bash
@@ -128,15 +133,15 @@ aws ec2 describe-vpc-endpoints \
 # vpce-0123456789abcdef-abc.us-east-1.vpce.amazonaws.com
 ```
 
-Save the **`vpce-*.amazonaws.com`** value -- you will target it in the CNAME record.
+Save the **`vpce-*.amazonaws.com`** value. You will target it in the CNAME record.
 
-#### 2. Create a Route 53 Private Hosted Zone
+#### 2. Create a Route 53 Private Hosted Zone (do not yet attach Worker VPCs)
 
-1. Open _Route 53 → Hosted zones → Create hosted zone_.
-2. Enter the domain chosen from the table above, e.g., `payments.abcde.tmprl.cloud`.
-3. Type: _Private hosted zone for Temporal Cloud_.
-4. Associate the hosted zone with every VPC that contains Temporal Workers and/or SDK clients.
-5. Create hosted zone.
+- Open _Route 53 → Hosted zones → Create hosted zone_.
+- Enter the domain chosen from the table above, e.g., `payments.abcde.tmprl.cloud`.
+- Type: _Private hosted zone for Temporal Cloud_.
+- Associate the zone with a single VPC that contains no Temporal Workers or SDK clients, such as a non-production test VPC. Route 53 requires at least one VPC association when you create a private hosted zone; you add your Worker VPCs in step 4, after the record exists.
+- Create the hosted zone.
 
 #### 3. Add a CNAME record
 
@@ -144,12 +149,22 @@ Inside the new PHZ:
 
 | Field | Value |
 | --------------- | ------------------------------------------------------------------------------------- |
-| **Record name** | the namespace endpoint (e.g., `payments.abcde.tmprl.cloud`). |
+| **Record name** | The Namespace Endpoint (e.g., `payments.abcde.tmprl.cloud`). |
 | **Record type** | `CNAME` |
 | **Value** | Your VPC Endpoint DNS name (`vpce-0123456789abcdef-abc.us-east-1.vpce.amazonaws.com`) |
-| **TTL** | 60s is typical; 15s for MRN namespaces; adjust as needed. |
+| **TTL** | 60s is typical; 15s for Namespaces with High Availability (to minimize recovery time after failover). |
+
+#### 4. Associate the PHZ with your Worker VPCs and verify
 
-#### 4. Verify DNS resolution from inside the VPC
+Now that the record exists, associate the PHZ with every VPC that contains Temporal Workers or SDK clients (Route 53 → your zone → _Edit settings_ → _Add VPC_).
+
+:::tip Test with a non-production VPC first
+
+Attach the PHZ to a non-production VPC first, validate end-to-end resolution and connectivity from a host in that VPC, and only then attach production Worker VPCs. This catches misconfigured records before they affect production traffic.
+
+:::
+
+Verify DNS resolution from inside one of the associated VPCs:
 
 ```bash
 dig payments.abcde.tmprl.cloud
@@ -171,62 +186,11 @@ clientOptions := client.Options{
 
 The DNS resolver inside your VPC returns the private endpoint, while TLS still validates the original hostname—simplifying both code and certificate management.
 
-## Configure Private DNS for Multi-Region Namespaces
-
-:::tip Namespaces with High Availability features and AWS PrivateLink
-
-Proper networking configuration is required for failover to be transparent to clients and workers when using PrivateLink.
-This page describes how to configure routing for Namespaces with High Availability features on AWS PrivateLink.
-
-:::
-
-To use AWS PrivateLink with High Availability features, you may need to:
-
-- Override the regional DNS zone.
-- Ensure network connectivity between the two regions.
-
-This page provides the details you need to set this up.
+## Configure private DNS for Namespaces with High Availability
 
-### Customer side solutions
+For Namespaces with [High Availability features](/cloud/high-availability), you need to override DNS for `region.tmprl.cloud` so each region resolves to your local VPC Endpoint, and you need to ensure Workers can reach whichever region is active. Failover is transparent to clients only when both are set up correctly.
 
-When using PrivateLink, you connect to Temporal Cloud through a VPC Endpoint, which uses addresses local to your network.
-Temporal treats each `region.` as a separate zone.
-This setup allows you to override the default zone, ensuring that traffic is routed internally for the regions you’re using.
-
-A Namespace's active region is reflected in the target of a CNAME record.
-For example, if the active region of a Namespace is AWS us-west-2, the DNS configuration would look like this:
-
-| Record name | Record type | Value |
-| ----------------------------------- | ----------- | -------------------------------- |
-| ha-namespace.account-id.tmprl.cloud | CNAME | aws-us-west-2.region.tmprl.cloud |
-
-After a failover, the CNAME record will be updated to point to the failover region, for example:
-
-| Record name | Record type | Value |
-| ----------------------------------- | ----------- | -------------------------------- |
-| ha-namespace.account-id.tmprl.cloud | CNAME | aws-us-east-1.region.tmprl.cloud |
-
-The Temporal domain did not change, but the CNAME updated from us-west-2 to us-east-1.
-
-
-
-### Setting up the DNS override
-
-To set up the DNS override, configure specific regions to target the internal VPC Endpoint IP addresses.
-For example, you might set aws-us-west-1.region.tmprl.cloud to target 192.168.1.2.
-In AWS, this can be done using a Route 53 private hosted zone for `region.tmprl.cloud`.
-Link that private zone to the VPCs you use for Workers.
-
-When your Workers connect to the Namespace, they first resolve the `..` record.
-This points to `.region.tmprl.cloud`, which then resolves to your internal IP addresses.
-
-Consider how you’ll configure Workers for this setup.
-You can either have Workers run in both regions continuously or establish connectivity between regions using Transit Gateway or VPC Peering.
-This way, Workers can access the newly activated region once failover occurs.
+For complete guidance, including single-cloud (AWS-only) HA, multi-cloud HA (AWS PrivateLink + GCP Private Service Connect), and a recommended failover-testing plan, see [Connectivity for High Availability](/cloud/high-availability/ha-connectivity).
 
 ## Direct VPCE targeting without per-Namespace DNS {#direct-vpce}
 
@@ -248,6 +212,22 @@ For HA Namespaces, use [private DNS](#configuring-private-dns-for-aws-privatelin
 
 :::
 
+## Adding PrivateLink from additional AWS accounts
+
+A common pattern is to have separate AWS accounts for different lines of business, environments (staging, production), or compliance scopes (PCI vs. non-PCI), each with its own VPC and Workers connecting to the same Temporal Cloud account.
+
+You can create as many AWS PrivateLink VPC endpoints as you need to the same Temporal Cloud regional service. There is nothing to register, approve, or open a ticket for on the Temporal side.
+
+For each additional AWS account or VPC:
+
+1. In that account, create the AWS PrivateLink VPC endpoint targeting the regional service name from the [regions table](#available-aws-regions-privatelink-endpoints-and-dns-record-overrides), following the same [creation steps](#creating-an-aws-privatelink-connection) as above.
+2. Configure DNS in that VPC. You have two options:
+   - Create a Route 53 Private Hosted Zone in that account scoped to the appropriate VPC(s), following the [private DNS steps](#configuring-private-dns-for-aws-privatelink) above. Each VPC's PHZ should point at the VPC Endpoint local to that VPC.
+   - Or, use [direct VPCE targeting](#direct-vpce) (single-region Namespaces only).
+3. **Optional:** To enforce private-only access for a Namespace, add a Connectivity Rule for each VPC endpoint and attach all of them (plus a public rule, if needed) to the Namespace. See [Connectivity Rules](/cloud/connectivity#connectivity-rules).
+
+There is no upper limit on the number of VPC endpoints you can connect from your side to a regional PrivateLink service. The default per-account limit on private Connectivity Rules is 50; [contact support](/cloud/support#support-ticket) if you need to raise it.
+
 ## Available AWS regions, PrivateLink endpoints, and DNS record overrides
 
 The following table lists the available Temporal regions, PrivateLink endpoints, and regional endpoints used for DNS record overrides:
diff --git a/docs/cloud/connectivity/gcp-connectivity.mdx b/docs/cloud/connectivity/gcp-connectivity.mdx
index fb6f4f9d01..fae379994c 100644
--- a/docs/cloud/connectivity/gcp-connectivity.mdx
+++ b/docs/cloud/connectivity/gcp-connectivity.mdx
@@ -73,21 +73,25 @@ Individual Namespaces do not use separate services.
    - For **IP address**, click the dropdown and select **Create IP address** to create an internal IP from your subnet dedicated to the endpoint. Select this IP.
    - Check **Enable global access** if you intend to connect the endpoint to virtual machines outside of the selected region. We recommend regional connectivity instead of global access, as it can be better in terms of latency for your workers. _**Note:** this requires the network routing mode to be set to **GLOBAL**._
 
-5. Click the **Add endpoint** button at the bottom of the screen.
+5. Click the **Add endpoint** button at the bottom of the screen. The endpoint appears with status **Pending**. This is expected; the next step changes it to **Accepted**.
 
-6. [Create a Temporal Cloud Connectivity Rule](/cloud/connectivity#creating-a-connectivity-rule) using the Connection ID of the newly created endpoint and the corresponding GCP Project.
+6. [Create a Temporal Cloud Connectivity Rule](/cloud/connectivity#creating-a-connectivity-rule) using the **Connection ID** of the newly created endpoint and the corresponding GCP project. The Connection ID is a numeric string (such as `1234567890123456789`) shown on the endpoint's detail page in the Google Cloud console.
 
-7. Once the status is "Accepted", the GCP Private Service Connect endpoint is ready for use.
+7. Once the status changes from **Pending** to **Accepted**, the GCP Private Service Connect endpoint is ready for use.
 
-:::tip Connectivity Rule required
+:::warning PSC stays "Pending" until you create a Connectivity Rule
 
-
-If your Private Service Connect connection status is not becoming "Active", verify that you have [created a Connectivity Rule](/cloud/connectivity#creating-a-connectivity-rule).
-Connectivity Rules are mandatory for GCP Private Service Connect connections.
-The connection will not become active without one.
+For GCP Private Service Connect, the Connectivity Rule is what tells Temporal Cloud to accept your PSC connection. Until you [create a Connectivity Rule](/cloud/connectivity#creating-a-connectivity-rule) for the connection, the endpoint remains in **Pending**. There is no separate producer-side approval step: creating the Connectivity Rule is the approval.
+
+If your endpoint is stuck in **Pending**, check for these causes, starting with the most common:
+
+- No Connectivity Rule exists for the connection ID.
+- The Connectivity Rule was created with the wrong `connection-id`, `region`, or `gcp-project-id`.
+- The endpoint is in a region that is not a [supported Temporal Cloud region](/cloud/regions).
 
 :::
 
-- Take note of the **IP address** that has been assigned to your endpoint, as it will be used to connect to Temporal Cloud.
+- Take note of the **IP address** assigned to your endpoint. You will use it to connect to Temporal Cloud.
 
 :::caution
 You still need to set up private DNS or override client configuration for your clients to actually use the new Private Service Connect connection to connect to Temporal Cloud.
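
If private DNS is not an option, the two client overrides described under "Update DNS or clients to use private connectivity" in `docs/cloud/connectivity/index.mdx` (the server address and the TLS server name) can be derived from a handful of inputs. The following POSIX shell sketch is hypothetical: the function names are illustrative and are not part of `tcld` or any Temporal SDK, and the hostname formats (`NAMESPACE.ACCOUNT.tmprl.cloud`, `REGION.CLOUD.api.temporal.io`, `CLOUD-REGION.region.tmprl.cloud`) and port `7233` are assumptions inferred from the examples on these pages.

```bash
# Hypothetical helpers (not part of tcld or any Temporal SDK) that derive the two
# client overrides used when private DNS is not available. The hostname formats
# are assumptions inferred from the examples in these docs.

# tls_server_name AUTH TOPOLOGY NAMESPACE ACCOUNT CLOUD REGION
#   AUTH:     mtls | api-key
#   TOPOLOGY: single | ha   (for ha, CLOUD and REGION identify the currently active replica)
tls_server_name() {
  auth=$1 topology=$2 namespace=$3 account=$4 cloud=$5 region=$6
  case "$topology:$auth" in
    single:mtls)
      echo "${namespace}.${account}.tmprl.cloud" ;;    # Namespace Endpoint
    single:api-key)
      echo "${region}.${cloud}.api.temporal.io" ;;     # regional API endpoint
    ha:mtls | ha:api-key)
      echo "${cloud}-${region}.region.tmprl.cloud" ;;  # active region endpoint
    *)
      echo "unsupported auth/topology: $auth/$topology" >&2
      return 1 ;;
  esac
}

# host_port ENDPOINT
#   ENDPOINT: the VPC endpoint DNS name (AWS PrivateLink) or the PSC endpoint IP (GCP).
host_port() {
  echo "$1:7233"
}

# Illustrative values: an API-key client reaching a GCP Namespace over PSC.
echo "HostPort:       $(host_port 10.0.0.5)"
echo "TLS ServerName: $(tls_server_name api-key single payments abcde gcp us-central1)"
```

With these illustrative values the script prints `10.0.0.5:7233` and `us-central1.gcp.api.temporal.io`, which you would pass to your SDK's server address and TLS server name settings. For a Namespace with High Availability features, the active region changes on failover, so a server name hard-coded this way would likely need to change after each failover; private DNS avoids that.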
diff --git a/docs/cloud/connectivity/index.mdx b/docs/cloud/connectivity/index.mdx
index 78ae0fd3e5..df17e245c8 100644
--- a/docs/cloud/connectivity/index.mdx
+++ b/docs/cloud/connectivity/index.mdx
@@ -23,7 +23,7 @@ import { LANGUAGE_TAB_GROUP, getLanguageLabel } from '@site/src/constants/langua
 
 ## Private network connectivity for namespaces
 
-Temporal Cloud supports private connectivity to namespaces via AWS PrivateLink or GCP Private Services Connect in addition to the default internet endpoints.
+Temporal Cloud supports private connectivity to Namespaces via AWS PrivateLink or GCP Private Service Connect, in addition to the default public internet endpoints.
 
 Namespace access is always securely authenticated via [API keys](/cloud/api-keys#overview) or [mTLS](/cloud/certificates), regardless of how you choose to connect.
 
@@ -31,13 +31,13 @@ For information about IP address stability and allowlisting, see [IP addresses](
 
 ### Required steps
 
-To use private connectivity with Temporal Cloud:
+Setting up private connectivity is a three-step process. Keep in mind that **private connectivity** (the network path) and **Connectivity Rules** (Temporal's enforcement layer) are related but separate concepts:
 
-1. Set up the private connection from your VPC to the region where your Temporal namespace is located.
-1. Update your private DNS and/or worker configuration to use the private connection.
-1. (Required to complete Google PSC setup, optional if using AWS PrivateLink): create a connectivity rule for the private connection and attach it to the target namespace(s). This will block all access to the namespace that is not over the private connection, but you can also add a public rule to also allow internet connectivity.
+1. **Set up the private connection** from your VPC to the region where your Temporal Namespace is located.
+1. **Update your private DNS and/or client configuration** to actually use the private connection. Creating a private connection does not change how your Namespace Endpoint or Regional Endpoint resolves; clients keep resolving the public addresses until you complete this step.
+1. **(GCP PSC: required. AWS PrivateLink: optional.) Create a Connectivity Rule** for the private connection and attach it to the target Namespace(s). This blocks all access to the Namespace that does not match one of its attached rules. You can combine private and public rules if you also want to allow internet connectivity.
 
-For steps 1 and 2, follow our guides for the target namespace's cloud provider:
+For steps 1 and 2, follow the guide for your Namespace's cloud provider:
 
 - [AWS PrivateLink](/cloud/connectivity/aws-connectivity) creation and private DNS setup
 - [Google Cloud Private Service Connect](/cloud/connectivity/gcp-connectivity) creation and private DNS setup
@@ -47,22 +47,16 @@ After creating a private connection, you must set up private DNS or update the c
 
 We recommend using private DNS.
 
-Without this step, your clients may connect to the namespace over the internet if they were previously using public connectivity, or they will not be able to connect at all.
+Without this step, your clients may connect to the Namespace over the internet if they were previously using public connectivity, or they will not be able to connect at all.
 
 If that's not an option for you, refer to [our guide for updating the server and TLS settings on your clients](/cloud/connectivity#update-dns-or-clients-to-use-private-connectivity).
 
 :::
 
-For step 3, keep reading for details on [connectivity rules](/cloud/connectivity#connectivity-rules).
+For step 3, keep reading for details on [Connectivity Rules](/cloud/connectivity#connectivity-rules).
 
 ## Connectivity rules
 
-:::tip Support, stability, and dependency info
-
-Connectivity rules are currently in [public preview](/evaluate/development-production-features/release-stages#public-preview).
-
-:::
-
 :::info Web UI Connectivity
 
 The Temporal Cloud Web UI is not currently subject to connectivity rule enforcement.
@@ -72,28 +66,41 @@ Even if a namespace is configured with private connectivity rules, the Web UI fo
 
 ### Definition
 
-Connectivity rules are Temporal Cloud's mechanism for limiting the network access paths that can be used to access a namespace.
+Connectivity Rules are Temporal Cloud's mechanism for restricting the network paths that can reach a Namespace. They are enforced by Temporal Cloud and do not create or modify the underlying network connection.
+
+By default, a Namespace has zero Connectivity Rules and is reachable over (1) the public internet and (2) any private connections you've already configured to the region containing the Namespace. Namespace access is always securely authenticated via [API keys](/cloud/api-keys#overview) or [mTLS](/cloud/certificates), regardless of Connectivity Rules.
+
+When you attach one or more Connectivity Rules to a Namespace, Temporal Cloud immediately blocks any traffic that does not match a rule on that Namespace. A Namespace can have multiple Connectivity Rules, and you can mix public and private rules.
+
+Each Connectivity Rule specifies either generic public (internet) access or a specific private connection.
 
-By default, a namespace has zero connectivity rules, and is accessible from 1. the public internet and 2. all private connections you've configured to the region containing the namespace. Namespace access is always securely authenticated via [API keys](/cloud/api-keys#overview) or [mTLS](/cloud/certificates), regardless of connectivity rules.
+#### When you need a Connectivity Rule
 
-When you attach one or more connectivity rules to a namespace, Temporal Cloud will immediately block all traffic that does not have a corresponding connectivity rule from accessing the namespace. One namespace can have multiple connectivity rules, and may mix both public and private rules.
+| Provider | Connectivity Rule for private access | Why |
+| -------- | ------------------------------------ | --- |
+| AWS PrivateLink | **Optional.** Add one only if you want to enforce private-only access (block internet traffic to that Namespace). | AWS PrivateLink connections become usable as soon as the VPC endpoint is `Available`. Adding a Connectivity Rule restricts access; it does not establish it. |
+| GCP Private Service Connect | **Required.** The PSC endpoint stays in `Pending` until a matching Connectivity Rule is created. | The Connectivity Rule is what tells Temporal Cloud to accept the PSC connection. |
 
-Each connectivity rule specifies either generic public (i.e. internet) access or a specific private connection.
+A public Connectivity Rule takes no parameters.
 
-A public connectivity rule takes no parameters.
+An AWS PrivateLink (PL) private Connectivity Rule requires:
 
-An AWS PrivateLink (PL) private connectivity rule requires the following parameters:
+- `connection-id`: The **VPC endpoint identifier** of the PL connection: the `vpce-…` value from your AWS account, *not* the endpoint service name or DNS name (ex: `vpce-00939a7ed9EXAMPLE`).
+- `region`: The region of the PL connection, prefixed with `aws-` (ex: `aws-us-east-1`). Must be the same region as the Namespace. Refer to the [Temporal Cloud region list](/cloud/regions) for supported regions.
 
-- `connection-id`: The VPC endpoint ID of the PL connection (ex: `vpce-00939a7ed9EXAMPLE`)
-- `region`: The region of the PL connection, prefixed with aws (ex: `aws-us-east-1`). Must be the same region as the namespace. Refer to the [Temporal Cloud region list](/cloud/regions) for supported regions.
+A GCP Private Service Connect (PSC) private Connectivity Rule requires:
 
-A GCP Private Service Connect (PSC) private connectivity rule requires the following parameters:
+- `connection-id`: The **PSC connection identifier** of the endpoint (ex: `1234567890123456789`). Find it on the endpoint's detail page in the Google Cloud console.
+- `region`: The region of the PSC connection, prefixed with `gcp-` (ex: `gcp-us-east1`). Must be the same region as the Namespace. Refer to the [Temporal Cloud region list](/cloud/regions) for supported regions.
+- `gcp-project-id`: The identifier of the GCP project where you created the PSC connection (ex: `my-example-project-123`).
 
-- `connection-id`: The ID of the PSC connection (ex: `1234567890123456789`)
-- `region`: The region of the PSC connection, prefixed with gcp (ex: `gcp-us-east1`). Must be the same region as the namespace. Refer to the [Temporal Cloud region list](/cloud/regions) for supported regions.
-- `gcp-project-id`: The ID of the GCP project where you created the PSC connection (ex: `my-example-project-123`)
+Connectivity Rules can be created and managed with [tcld](https://docs.temporal.io/cloud/tcld/), [Terraform](https://github.com/temporalio/terraform-provider-temporalcloud/), the Web UI (under **Connectivity** in your account settings), or the [Cloud Ops API](/ops).
 
-Connectivity rules can be created and managed with [tcld](https://docs.temporal.io/cloud/tcld/), [Terraform](https://github.com/temporalio/terraform-provider-temporalcloud/), or the [Cloud Ops API](/ops)
+:::tip Connectivity Rules give Temporal visibility into your private connections
+
+Without a Connectivity Rule, Temporal Cloud has no record that your PrivateLink or PSC endpoint exists. If you open a support ticket about a private-connectivity issue, having a Connectivity Rule attached to the affected Namespace lets us correlate the connection on our side and is the fastest path to debugging.
+
+:::
 
 ### Permissions and limits
 
@@ -200,15 +207,25 @@ tcld connectivity-rule list -n "my-namespace.abc123"
 
 ## Update DNS or clients to use private connectivity
 
-We strongly recommend using private DNS instead of updating client server and TLS settings: 
+We strongly recommend using private DNS instead of updating client server and TLS settings:
 
-- [How to set up private DNS in AWS](/cloud/connectivity/aws-connectivity#configuring-private-dns-for-aws-privatelink) 
+- [How to set up private DNS in AWS](/cloud/connectivity/aws-connectivity#configuring-private-dns-for-aws-privatelink)
 - [How to set up private DNS in GCP](/cloud/connectivity/gcp-connectivity#configuring-private-dns-for-gcp-private-service-connect)
 
 If you are unable to configure private DNS, you must update two settings in your Temporal clients:
 
-1. Set the endpoint server address to the PrivateLink or Private Services Connect endpoint (e.g. `vpce-0123456789abcdef-abc.us-east-1.vpce.amazonaws.com:7233` or `:7233`)
-2. Set TLS configuration to override the TLS server name (e.g., my-namespace.my-account.tmprl.cloud)
+1. Set the endpoint server address to the PrivateLink or Private Service Connect endpoint (e.g. `vpce-0123456789abcdef-abc.us-east-1.vpce.amazonaws.com:7233` or `:7233`).
+2. Set TLS configuration to override the TLS server name, using the hostname from the following table.
+
+The correct TLS server name depends on your authentication method and on whether the Namespace uses High Availability features:
+
+| Authentication | TLS server name to use |
+| -------------- | ---------------------- |
+| mTLS (single-region Namespace) | The Namespace Endpoint, e.g. `my-namespace.my-account.tmprl.cloud` |
+| API key (single-region Namespace) | The regional API endpoint, e.g. `us-east-1.aws.api.temporal.io` or `us-central1.gcp.api.temporal.io` |
+| Namespace with High Availability features (mTLS or API key) | The active region endpoint, e.g. `aws-us-east-1.region.tmprl.cloud` |
+
+If you authenticate with an API key over PrivateLink or PSC and use the wrong server name, the TLS handshake fails, often with errors such as `connection reset by peer`, even though `nc` reports the port as open.
 
 Updating these settings depends on the client you're using.
 
diff --git a/docs/cloud/high-availability/ha-connectivity.mdx b/docs/cloud/high-availability/ha-connectivity.mdx
index 1646a86ea2..3d12931827 100644
--- a/docs/cloud/high-availability/ha-connectivity.mdx
+++ b/docs/cloud/high-availability/ha-connectivity.mdx
@@ -8,38 +8,53 @@ description: How to use private network connectivity with Temporal Cloud HA feat
 
 import { CaptionedImage, JsonTable } from '@site/src/components';
 
-:::tip Namespaces with High Availability features and AWS PrivateLink
+:::tip Namespaces with High Availability features and private connectivity
 
-Proper networking configuration is required for failover to be transparent to clients and workers when using PrivateLink.
-This page describes how to configure routing for Namespaces with High Availability features on AWS PrivateLink.
+Proper networking configuration is required for failover to be transparent to clients and Workers when using AWS PrivateLink or GCP Private Service Connect.
+
+This page covers single-cloud HA (both replicas on AWS, or both on GCP) and multi-cloud HA (one replica on AWS, one on GCP).
 
 :::
 
-To use AWS PrivateLink with High Availability features, you may need to:
+These instructions assume you already have the private connections in place. If not, follow the [AWS PrivateLink](/cloud/connectivity/aws-connectivity) or [GCP Private Service Connect](/cloud/connectivity/gcp-connectivity) creation guides first.
+
+## How HA + private connectivity works
+
+A Namespace with High Availability features has two replicas, a primary and a secondary, in different regions or on different cloud providers. At any moment, one is **active** and the other is **passive**. On failover, Temporal Cloud changes which replica is active.
+
+Temporal Cloud expresses the active replica through DNS:
+
+- The Namespace DNS record (`..tmprl.cloud`) is a CNAME.
+- It points to the active region's regional record (`-.region.tmprl.cloud`).
+- On failover, Temporal Cloud rewrites the CNAME target.
+
+Namespace DNS records have a 15-second TTL. Clients should converge to the new region within roughly 30 seconds (about twice the TTL) once their resolver cache expires.
+
+For private connectivity, your job is to make sure that:
 
-- Override the regional DNS zone.
-- Ensure network connectivity between the two regions.
+1. Both regions resolve to the correct private endpoint inside your network rather than to public internet addresses.
+2. Your Workers have a network path to whichever region becomes active.
 
-These instructions assume you already have the PrivateLink connections in place. If not, follow our [guide for creating AWS PrivateLink connections and configuring private DNS](/cloud/connectivity/aws-connectivity).
+## Single-cloud HA on AWS PrivateLink
 
-## Customer side solutions
+This is the most common setup: both replicas live in AWS regions, and Workers connect via AWS PrivateLink.
 
 When using PrivateLink, you connect to Temporal Cloud through a VPC Endpoint, which uses addresses local to your network.
-Temporal treats each `region.` as a separate zone.
-This setup allows you to override the default zone, ensuring that traffic is routed internally for the regions you’re using.
+Temporal Cloud publishes a separate regional record for each region under `region.tmprl.cloud`, so you override resolution per region.
 
-A Namespace's active region is reflected in the target of a CNAME record.
-For example, if the active region of a Namespace is AWS us-west-2, the DNS configuration would look like this:
+Before failover, with the active region being `aws-us-west-2`:
 
-| ha-namespace.account-id.tmprl.cloud | CNAME | aws-us-west-2.region.tmprl.cloud |
-| ----------------------------------- | ----- | -------------------------------- |
+| Record name | Record type | Value |
+| ----------------------------------- | ----------- | -------------------------------- |
+| ha-namespace.account-id.tmprl.cloud | CNAME | aws-us-west-2.region.tmprl.cloud |
 
-After a failover, the CNAME record will be updated to point to the failover region, for example:
+After a failover to `aws-us-east-1`, Temporal Cloud rewrites the CNAME:
 
-| ha-namespace.account-id.tmprl.cloud | CNAME | aws-us-east-1.region.tmprl.cloud |
-| ----------------------------------- | ----- | -------------------------------- |
+| Record name | Record type | Value |
+| ----------------------------------- | ----------- | -------------------------------- |
+| ha-namespace.account-id.tmprl.cloud | CNAME | aws-us-east-1.region.tmprl.cloud |
 
-The Temporal domain did not change, but the CNAME updated from us-west-2 to us-east-1.
+The Temporal-managed CNAME changed from us-west-2 to us-east-1; your private DNS does not need to change.
 
-## Setting up the DNS override
+### Setting up the DNS override (AWS)
 
-:::caution
+In AWS, use a Route 53 private hosted zone for `region.tmprl.cloud` to override resolution per region:
+
+| Record name | Record type | Value (your VPC Endpoint DNS) |
+| ------------------------------------ | ----------- | ------------------------------------------------------------ |
+| `aws-us-west-2.region.tmprl.cloud` | CNAME | `vpce-...-us-west-2.vpce.amazonaws.com` |
+| `aws-us-east-1.region.tmprl.cloud` | CNAME | `vpce-...-us-east-1.vpce.amazonaws.com` |
+
+Link the private zone to every VPC where Workers run.
+
+When your Workers connect to the Namespace, they first resolve `..tmprl.cloud`, which CNAMEs to `.region.tmprl.cloud`, which then resolves to your local VPC Endpoint.
+
+You also need to decide how Workers reach whichever region becomes active. Either:
+
+- Run Workers in **both** regions continuously (recommended), or
+- Establish cross-region connectivity (Transit Gateway, VPC Peering) so Workers in one region can reach the VPC Endpoint in the other.
+
+## Single-cloud HA on GCP Private Service Connect
 
-Private connectivity is not yet offered for GCP Multi-region Namespaces.
+For GCP-only HA, the same model applies, but use a Cloud DNS private zone for `region.tmprl.cloud` and point each `gcp-.region.tmprl.cloud` record at the local PSC endpoint IP address.
+
+| Record name | Record type | Value (your PSC endpoint IP) |
+| ---------------------------------------- | ----------- | ----------------------------------- |
+| `gcp-us-central1.region.tmprl.cloud` | A | `10.x.x.x` (PSC endpoint IP) |
+| `gcp-us-east1.region.tmprl.cloud` | A | `10.x.x.x` (PSC endpoint IP) |
+
+A Connectivity Rule is required for each PSC connection. See [GCP PSC setup](/cloud/connectivity/gcp-connectivity) and [Connectivity Rules](/cloud/connectivity#connectivity-rules).
+
+## Multi-cloud HA (AWS PrivateLink + GCP Private Service Connect)
+
+If your replicas span clouds (for example, AWS `us-east-1` as the active replica and GCP `us-east4` as the passive one), your Workers need a way to reach the active replica regardless of which cloud it's in. The Temporal-managed CNAME rewrites work the same way; the harder problems are on the client side.
+
+Plan for these three things:
+
+1. **DNS overrides for both clouds.** Your private DNS for `region.tmprl.cloud` needs entries for both the AWS region (CNAME → AWS VPCE) and the GCP region (A → PSC IP). This typically means a Route 53 private hosted zone in your AWS Worker VPCs *and* a Cloud DNS private zone in your GCP Worker network, both for the same `region.tmprl.cloud` parent. A private zone answers for every name under `region.tmprl.cloud` in the networks it is attached to, so a region with no record in that zone fails to resolve there instead of falling back to public DNS. Give each zone a record for every region that Workers in that network must be able to reach.
+2. **Worker reachability across clouds.** Your AWS-resident Workers must be able to reach the GCP PSC endpoint when GCP is active, and vice versa. Options include:
+   - Run Workers in both clouds (preferred: simplest, lowest latency, and matches the failover model).
+   - Establish cross-cloud connectivity (e.g., AWS Transit Gateway + GCP Cloud Interconnect, or a third-party transit) so Workers in one cloud can resolve and reach the other cloud's private endpoint.
+3. **Connectivity Rules in both regions.** GCP PSC requires a Connectivity Rule. AWS PrivateLink does not, but a Namespace with any attached Connectivity Rule blocks traffic that matches none of its rules. If you attach the GCP rule to the Namespace, also attach a rule for the AWS PrivateLink connection (and a public rule, if you still need internet access) so the Namespace stays reachable from both clouds.
+
+:::caution Alpine/musl + GCP PSC: missing AAAA records can break Workers
+
+The private DNS records for GCP Private Service Connect endpoints are A (IPv4) records only; there is no AAAA (IPv6) record. Most Linux distributions handle a missing AAAA record gracefully, but **Alpine Linux's musl resolver returns a SERVFAIL** when AAAA is missing, which can cause Temporal SDK clients to fail name resolution after a failover from AWS to GCP.
+
+If you run Workers on Alpine and use multi-cloud HA, either:
+
+- Switch the Worker base image to a glibc-based distribution (Debian, Ubuntu, distroless), or
+- Configure your application or runtime so it does not depend on musl's AAAA handling (e.g., force Go's pure-Go resolver with `GODEBUG=netdns=go`, or prefer IPv4 in the Java/Node/Python runtimes you use).
 
 :::
 
-To set up the DNS override, configure specific regions to target the internal VPC Endpoint IP addresses.
-For example, you might set aws-us-west-1.region.tmprl.cloud to target 192.168.1.2.
-In AWS, this can be done using a Route 53 private hosted zone for `region.tmprl.cloud`.
-Link that private zone to the VPCs you use for Workers. +## Test failover before you depend on it + +High Availability features exist to make failover work, yet DNS overrides, cross-region or cross-cloud reachability, and Connectivity Rule coverage are exactly the kinds of configuration that look correct on paper and then break during a failover. Test failover in a non-production Namespace first. -When your Workers connect to the Namespace, they first resolve the `..` record. -This points to `.region.tmprl.cloud`, which then resolves to your internal IP addresses. +A reasonable validation plan: -Consider how you’ll configure Workers for this setup. -You can either have Workers run in both regions continuously or establish connectivity between regions using Transit Gateway or VPC Peering. -This way, Workers can access the newly activated region once failover occurs. +1. Set up the HA Namespace and the private connectivity for both regions, including all DNS overrides. +2. Run Workers continuously in **both** regions (or arrange cross-region connectivity). +3. Trigger a manual failover from the Web UI or `tcld` and verify: + - DNS for `..tmprl.cloud` resolves to the new region within ~30 seconds. + - Workers in both regions are picking up tasks. + - SDK clients connect successfully (no `Name resolution failed`, `connection reset by peer`, or `context deadline exceeded` errors). +4. Trigger a failback to the original region and verify the same checks. +5. For multi-cloud HA, repeat with each cloud as the active replica, and run the test from the same base images (for example, Alpine or distroless) that you use in production. + +If a real failover exposes a configuration gap you did not test, recovery typically requires client-side changes (DNS records, network routing, or Worker images) that are hard to make under pressure.
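To check the DNS part of step 3 from a host inside a Worker VPC, a small shell sketch like the following can poll the Namespace endpoint and report when its CNAME target moves to a different regional record. It assumes Bash and `dig` are available on that host; the function names (`cname_target`, `region_of`, `watch_failover`) and the example hostname `payments.abcde.tmprl.cloud` are hypothetical placeholders. `dig` asks the VPC resolver directly, so it does not reflect any additional DNS caching inside your SDK's language runtime.

```bash
# Sketch: watch a Namespace endpoint during a manual failover test and report
# when its CNAME target moves to a different regional record.
# Assumes bash and `dig` on a host inside a Worker VPC; function names are hypothetical.

# Print the first CNAME target of a hostname, without the trailing dot.
cname_target() {
  dig +short "$1" CNAME | head -n 1 | sed 's/\.$//'
}

# Print the region prefix (e.g. aws-us-east-1) of a *.region.tmprl.cloud name.
region_of() {
  case "$1" in
    *.region.tmprl.cloud) printf '%s\n' "${1%.region.tmprl.cloud}" ;;
    *) return 1 ;;
  esac
}

# Poll every INTERVAL seconds (default 5) until the CNAME target changes or
# TIMEOUT seconds (default 300) pass. Prints the elapsed time since the call.
watch_failover() {
  local host="$1" interval="${2:-5}" timeout="${3:-300}"
  local start initial current elapsed region
  start=$(date +%s)
  initial=$(cname_target "$host")
  echo "start: $host -> ${initial:-(no CNAME)}"
  while :; do
    elapsed=$(( $(date +%s) - start ))
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "no change after ${elapsed}s: $host -> ${initial:-(no CNAME)}" >&2
      return 1
    fi
    sleep "$interval"
    current=$(cname_target "$host")
    if [ -n "$current" ] && [ "$current" != "$initial" ]; then
      region=$(region_of "$current") || region="unknown region"
      echo "switched after $(( $(date +%s) - start ))s: $host -> $current ($region)"
      return 0
    fi
  done
}
```

For example, source the functions, run `watch_failover payments.abcde.tmprl.cloud 5 120`, and trigger the failover right away. The timer starts when the function is called, so the reported time also includes however long it took you to trigger the failover; treat it as an upper bound on DNS convergence as seen from that host.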
## Available regions, PrivateLink endpoints, and DNS record overrides @@ -75,16 +139,16 @@ The `sa-east-1` region is not yet available for use with Multi-region Namespaces ::: -The following table lists the available Temporal regions, PrivateLink endpoints, and DNS record overrides: +The following tables list the available Temporal regions and the DNS record overrides used for HA + private connectivity: + +### AWS regions and PrivateLink endpoints +### GCP regions and Private Service Connect endpoints + + -When using a Namespace with High Availability features, the Namespace's DNS record `..` points to a regional DNS record in the format `.region.`. -Here, `` is the currently active region for your Namespace. +When using a Namespace with High Availability features, the Namespace's DNS record `..tmprl.cloud` points to a regional DNS record in the format `-.region.tmprl.cloud`, where `-` is the currently active region for your Namespace. -During failover, Temporal Cloud changes the target of the Namespace DNS record from one region to another. -Namespace DNS records are configured with a 15 second TTL. -Any DNS cache should re-resolve the record within this time. -As a rule of thumb, receiving an updated DNS record takes about twice (2x) the TTL. -Clients should converge to the newly targeted region within, at most, a 30-second delay. +During failover, Temporal Cloud changes the target of the Namespace DNS record from one region to another. Namespace DNS records are configured with a 15-second TTL. Any DNS cache should re-resolve the record within this time. As a rule of thumb, receiving an updated DNS record takes about twice (2x) the TTL — clients should converge to the newly targeted region within, at most, a 30-second delay, assuming their resolver and language runtime honor the TTL.
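A quick way to check the 15-second TTL and the 2x rule of thumb (2 × 15 s = 30 s) from a host inside a Worker VPC is to read the TTL that the VPC resolver reports for the Namespace's CNAME record. The sketch below assumes Bash and `dig`; the function names (`cname_ttl`, `convergence_bound`, `check_namespace_ttl`) and the example hostname `payments.abcde.tmprl.cloud` are hypothetical. Because a caching resolver reports the *remaining* TTL, any value from 0 up to 15 is normal; a larger value may indicate that something between the host and Temporal Cloud is extending the TTL.

```bash
# Sketch: read the TTL the local resolver reports for a Namespace endpoint's
# CNAME record and compare it with the documented 15-second Namespace TTL.
# Assumes bash and `dig`; function names are hypothetical.

# Print the TTL (seconds) of the first CNAME answer for a hostname.
cname_ttl() {
  dig +noall +answer "$1" CNAME | awk '$4 == "CNAME" { print $2; exit }'
}

# Rule of thumb from the text: clients converge within about 2x the TTL.
convergence_bound() {
  echo $(( 2 * $1 ))
}

# Report the observed TTL and return non-zero if it exceeds the expected maximum.
check_namespace_ttl() {
  local host="$1" expected="${2:-15}" ttl
  ttl=$(cname_ttl "$host")
  if [ -z "$ttl" ]; then
    echo "no CNAME answer for $host" >&2
    return 2
  fi
  echo "$host: TTL ${ttl}s (expected <= ${expected}s, convergence bound ~$(convergence_bound "$expected")s)"
  # A caching resolver reports the remaining TTL, so anything up to $expected is normal.
  [ "$ttl" -le "$expected" ]
}
```

For example, `check_namespace_ttl payments.abcde.tmprl.cloud` exits with status 0 when the reported TTL is at most 15 seconds, 1 when it is higher, and 2 when no CNAME answer comes back.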