1 change: 1 addition & 0 deletions platform-cloud/cloud-sidebar.json
@@ -69,6 +69,7 @@
"compute-envs/seqera-compute",
"compute-envs/aws-batch",
"compute-envs/aws-cloud",
"compute-envs/intelligent-compute",
"compute-envs/azure-batch",
"compute-envs/azure-cloud",
"compute-envs/google-cloud-batch",
9 changes: 9 additions & 0 deletions platform-cloud/docs/compute-envs/google-cloud-batch.md
@@ -112,6 +112,15 @@ Setting up WIF requires the following steps in the GCP Console:
tityPools/{POOL}/providers/{PROVIDER}`. If you specify a custom value, it must match exactly what you enter in the Token audience field when creating the Google WIF credential in Seqera.
4. Define an attribute mapping and condition. At a minimum set `google.subject=assertion.sub`. This maps the subject claim from Seqera's JWT to GCP's identity space. For more information see [here](https://docs.cloud.google.com/iam/docs/workload-identity-federation-with-other-providers#mappings-and-conditions). You may see a pop-up asking to configure your application and provide an OIDC ID token path. This pop-up can be dismissed.
5. Grant `roles/iam.workloadIdentityUser` on the service account that WIF will impersonate to the Workload Identity Pool principal. This can be set for all pool identities or for a specific workspace. If you have not yet created a service account, do so following the guidelines above.
6. If you use the same WIF credential for Data Explorer, grant `roles/iam.serviceAccountTokenCreator` on the service account to itself:

```bash
gcloud iam service-accounts add-iam-policy-binding SA_EMAIL \
--member="serviceAccount:SA_EMAIL" \
--role="roles/iam.serviceAccountTokenCreator"
```

Replace `SA_EMAIL` with the service account email. Without this role, viewing or downloading file contents in Data Explorer fails with a signing error. Running pipelines is not affected.

After setting up WIF in the GCP Console, you need the following information to create a credential in Seqera Platform:

142 changes: 142 additions & 0 deletions platform-cloud/docs/compute-envs/intelligent-compute.mdx
@@ -0,0 +1,142 @@
---
title: "Intelligent Compute"
description: "Set up Seqera Intelligent Compute on an AWS Cloud compute environment"
date created: "2026-06-12"
last updated: "2026-06-12"
tags: [intelligent compute, aws, ecs, compute environments]
toc_min_heading_level: 2
toc_max_heading_level: 4
---

import CodeBlock from '@theme/CodeBlock';
import AwsCloudIntelligentComputePolicy from './_policies/aws-cloud-intelligent-compute-policy.json?raw';

:::info[Preview]
Seqera Intelligent Compute is in preview and must be enabled for your organization by Seqera before you can use it. Contact your account manager to request access.
:::

:::caution
Intelligent Compute may assign different CPU and memory values to tasks than those specified in your pipeline's `process` directives. The scheduler selects the most cost-effective instance that meets each task's resource request rather than provisioning exactly what the directive specifies.
:::

Intelligent Compute is supported on **AWS Cloud compute environments only**.

## What is Intelligent Compute

Intelligent Compute is a scheduling service that runs Nextflow pipelines on a Seqera-managed Amazon ECS cluster. It allocates compute resources based on what each task actually needs rather than what the pipeline requests, reducing cost and improving utilization across a run.

Unlike the standard AWS Cloud compute environment, which runs a pipeline on a single EC2 instance with a local executor, Intelligent Compute provisions and manages multi-node clusters. This allows pipelines to scale beyond a single instance while preserving fast startup times.

When Intelligent Compute is enabled on an AWS Cloud compute environment, Seqera provisions and manages the following resources in your AWS account on first use:

- An Amazon ECS cluster per compute environment configuration
- ECS capacity providers (Managed Instances or Auto Scaling Groups)
- ECS task definitions per container image and resource shape
- IAM roles for ECS task execution, EC2 instance profiles, and infrastructure management
- CloudWatch log groups under `/seqera/sched`

All managed resources use the `seqera-sched-` prefix. Seqera creates them on first use and removes them automatically when no longer needed.

## IAM permissions

In addition to the [standard AWS Cloud IAM permissions](./aws-cloud#required-platform-iam-permissions), Intelligent Compute requires an additional policy attached to the same IAM user or role that Seqera uses to access your AWS account.

<CodeBlock language="json">{AwsCloudIntelligentComputePolicy}</CodeBlock>

[Download aws-cloud-intelligent-compute-policy.json](./_policies/aws-cloud-intelligent-compute-policy.json)

### What each permission group does

| Group | Purpose |
|-------|---------|
| `ECSScopedOperations` | Create, delete, describe, and tag ECS clusters, capacity providers, and tasks. Scoped to `seqera-sched-*` resources. |
| `ECSUnscopedOperations` | Register, deregister, list, and describe ECS task definitions. ECS task definition APIs do not support resource-level permissions. |
| `IAMRoleManagement` | Create, update, and delete IAM roles and instance profiles scoped to `seqera-sched-*`. Seqera creates four role types on first use: execution role, infrastructure role, per-cluster instance role, and per-cluster task role. |
| `PassRoleToECS` | Pass `seqera-sched-*` and `TowerForge-*` roles to ECS, ECS tasks, and EC2. Required to attach roles to ECS infrastructure and task definitions. |
| `ServiceLinkedRoles` | Create service-linked roles for ECS, autoscaling, and Spot. Required only if these roles do not already exist in your account. |
| `CloudWatchLogs` | Create and manage log groups under `/seqera/sched`, and read log events. Task stdout and stderr are written to CloudWatch. |
| `EC2NetworkDiscovery` | Describe VPCs, subnets, security groups, and route tables. Create security groups and VPC endpoints. Used for VPC auto-discovery and network setup. |
| `ECRAccess` | Authorize ECR and pull container images. ECS tasks pull images from ECR. |
| `S3Access` | Read objects and list buckets. Used to read Fusion trace files and pipeline work directory content. |
| `ASGEC2Operations` | Describe instance types and create or delete EC2 launch templates. Required only for Auto Scaling Group-backed clusters. |
| `ASGManagement` | Create, update, and delete Auto Scaling Groups scoped to `seqera-sched-*`. Required only for Auto Scaling Group-backed clusters. |
| `ASGDescribe` | Describe Auto Scaling Groups. Required only for Auto Scaling Group-backed clusters. |
| `SSMECSOptimizedAmi` | Read the ECS-optimized AMI ID from SSM Parameter Store. Used to look up the latest Amazon Linux 2023 ECS-optimized AMI. |
| `CostExplorer` | Query `ce:GetCostAndUsage`. Used to display cost predictions at pipeline launch. If this permission is absent, cost predictions do not appear. No error is surfaced to users. |

**Conditional statements:**
- `ASGEC2Operations`, `ASGManagement`, and `ASGDescribe` are required only if Auto Scaling Group-backed clusters are enabled. You can omit them for Managed Instances deployments.
- `ServiceLinkedRoles` is required only if the listed service-linked roles do not already exist in your AWS account.
- `CostExplorer` is required only if you want cost predictions shown at pipeline launch.
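
As a rough illustration of the conditional statements above, the following Python sketch drops the optional statements from a policy document before it is pasted into IAM. It assumes the group names in the table are the `Sid` values of the policy statements. `prune_policy` and its parameters are hypothetical names, and the three-statement document is a stand-in, not the real policy.

```python
import json

# Group names from the table above. Treating them as the policy's Sid
# values is an assumption; check the downloaded JSON before relying on it.
ASG_SIDS = {"ASGEC2Operations", "ASGManagement", "ASGDescribe"}
SERVICE_LINKED_SIDS = {"ServiceLinkedRoles"}
COST_SIDS = {"CostExplorer"}


def prune_policy(policy, use_asg=True, create_service_linked_roles=True,
                 cost_predictions=True):
    """Return a copy of an IAM policy document without the optional statements."""
    dropped = set()
    if not use_asg:
        dropped |= ASG_SIDS            # Managed Instances deployments
    if not create_service_linked_roles:
        dropped |= SERVICE_LINKED_SIDS  # roles already exist in the account
    if not cost_predictions:
        dropped |= COST_SIDS            # no cost predictions at launch
    statements = [s for s in policy.get("Statement", [])
                  if s.get("Sid") not in dropped]
    # Build a new dict so the input document is left untouched.
    return {**policy, "Statement": statements}


if __name__ == "__main__":
    # Stand-in document for illustration only; the real policy is larger.
    example = {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "ECRAccess", "Effect": "Allow",
             "Action": ["ecr:GetAuthorizationToken"], "Resource": "*"},
            {"Sid": "ASGDescribe", "Effect": "Allow",
             "Action": ["autoscaling:DescribeAutoScalingGroups"], "Resource": "*"},
            {"Sid": "CostExplorer", "Effect": "Allow",
             "Action": ["ce:GetCostAndUsage"], "Resource": "*"},
        ],
    }
    print(json.dumps(prune_policy(example, use_asg=False), indent=2))
```

Filtering by `Sid` rather than by action keeps the sketch independent of the exact actions each statement lists.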

### Create the additional IAM policy

1. Open the [AWS IAM console](https://console.aws.amazon.com/iam).
1. Select **Policies** under **Access management**, then select **Create policy**.
1. Select the **JSON** tab and paste the Intelligent Compute policy.
1. Select **Next**, enter a name (for example, `SeqeraIntelligentComputePolicy`), then select **Create policy**.
1. Attach the policy to the same IAM user or role that Seqera uses for your AWS Cloud compute environment.

## Set up an AWS Cloud compute environment with Intelligent Compute

Confirm with your account manager that Intelligent Compute is enabled for your organization before proceeding.

1. In your Seqera workspace, select **Compute Environments**, then **Add compute environment**.
1. Enter a name and select **AWS Cloud** as the platform.
1. Select your AWS credentials. The credential must have both the standard AWS Cloud permissions and the Intelligent Compute permissions attached.
1. Select the **Region** where the ECS cluster will be provisioned.
1. Enter a **Work directory** (S3 URI, for example `s3://my-bucket/work`).
1. Under **Intelligent Compute**, enable the **Seqera Intelligent Compute** toggle.
1. Configure the [Intelligent Compute options](#configuration-options) below as needed.
1. Select **Add**.

Seqera validates credentials and configuration on save. On first use, it provisions the required IAM roles and ECS cluster in your account. Provisioning is automatic and does not require additional steps.

## Resource metrics

The **Metrics** tab for a workflow run on Intelligent Compute shows three resource values for CPU and memory: **Requested**, **Allocated**, and **Used**.

| Metric | Source | What it represents |
|--------|--------|-------------------|
| **Requested** | Pipeline `process` directives | The CPU and memory your pipeline asked for, as written in your `process` directives (for example, `cpus = 4`, `memory = 8 GB`). |
| **Allocated** | Scheduler decision | The CPU and memory the scheduler actually assigned to the task container. Intelligent Compute may assign values different from what was requested; it selects the most cost-effective instance shape that satisfies the task's requirements. |
| **Used** | Nextflow trace data | The CPU and memory the task actually consumed, measured from Nextflow's trace metrics (`pcpu` × `realtime` for CPU, `peakRss` for memory). Requires Fusion to be enabled. Absent for tasks that did not produce trace data. |

**How to read the numbers:**

- If **Requested** is much higher than **Allocated**, the scheduler found a more efficient instance shape than your directives implied.
- If **Allocated** is much higher than **Used**, the task ran with significant idle headroom. You may be able to reduce your process resource directives on future runs to lower cost.
- If **Used** is close to **Allocated**, resource utilization is near-optimal for that task.
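
The reading rules above can be sketched as a small Python helper. This is only an illustration: the function name, the returned labels, and both thresholds (`ratio` for "much higher", `near` for "close to") are assumptions, since the text does not define numeric cut-offs.

```python
def read_metrics(requested, allocated, used=None, ratio=2.0, near=0.8):
    """Label one task's CPU or memory numbers (all in the same unit).

    ratio and near are hypothetical thresholds, not values used by Seqera.
    """
    notes = []
    if requested >= ratio * allocated:
        # Scheduler found a smaller shape than the directive implied.
        notes.append("scheduler_downsized")
    if used is None:
        # Used is absent when the task produced no trace data.
        notes.append("no_usage_data")
    elif allocated >= ratio * used:
        # Significant idle headroom; directives could probably be lowered.
        notes.append("idle_headroom")
    elif used >= near * allocated:
        notes.append("near_optimal")
    return notes


if __name__ == "__main__":
    # 8 GB requested, 4 GB allocated, 3.5 GB peak usage.
    print(read_metrics(requested=8, allocated=4, used=3.5))
```

A task can carry more than one label, for example when the scheduler allocated less than requested and the task still used most of the allocation.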

## Configuration options

| Option | Values | Default | Description |
|--------|--------|---------|-------------|
| **Seqera Intelligent Compute** | Enabled / Disabled | Disabled | Enables the Intelligent Compute scheduler for this compute environment. This option only appears if Intelligent Compute is enabled for your organization. |
| **Provisioning model** | `spotFirst`, `spot`, `ondemand` | `spotFirst` | Instance procurement strategy. `spotFirst` uses Spot instances and falls back to On-Demand if Spot capacity is unavailable. `spot` uses Spot instances only. `ondemand` uses On-Demand instances only. |
| **Instance types** | Comma-separated EC2 instance type identifiers (for example, `m5.xlarge, c5.2xlarge`) | Empty | Restricts which instance types the scheduler can select. When empty, the scheduler selects the most cost-effective type for each task automatically. Specifying types here overrides automatic selection. |
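
To make the option semantics concrete, here is a minimal Python sketch of how the **Provisioning model** and **Instance types** values in the table could be interpreted. The helper names and the `"spot"` / `"on-demand"` labels are hypothetical; this models the documented behavior only and is not Seqera's implementation.

```python
# Capacity types tried, in order, for each documented provisioning model.
CAPACITY_ORDER = {
    "spotFirst": ["spot", "on-demand"],  # Spot, then On-Demand fallback
    "spot": ["spot"],                    # Spot only
    "ondemand": ["on-demand"],           # On-Demand only
}


def capacity_order(model="spotFirst"):
    """Return the capacity types for a provisioning model (default spotFirst)."""
    if model not in CAPACITY_ORDER:
        raise ValueError(f"unknown provisioning model: {model!r}")
    return list(CAPACITY_ORDER[model])


def parse_instance_types(value):
    """Split the comma-separated Instance types field.

    An empty result stands for automatic selection by the scheduler.
    """
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    print(capacity_order("spotFirst"))
    print(parse_instance_types("m5.xlarge, c5.2xlarge"))
```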

## Task and run statuses

### Task statuses

| Status | Description |
|--------|-------------|
| `SUBMITTED` | Task is queued or submitted to the compute backend. |
| `RUNNING` | Task is actively executing on a compute instance. |
| `SUCCEEDED` | Task completed with exit code 0. |
| `FAILED` | Task failed. This covers non-retriable execution failures (non-zero exit code, container startup errors) and spot quota exhaustion after retries are exhausted. |


The "spot quota exhaustion after retries are exhausted" clause is misleading and should be dropped. In the code the spot-quota path (SPOT_QUOTA) is non-terminal: it always retries on the same cluster after a cooldown and never exhausts into FAILED. It may briefly surface as FAILED between attempts, but that's a transient per-attempt blip, not a terminal outcome.

The rest is accurate. Suggested:

Task failed. This covers non-retriable execution failures such as a non-zero exit code or container startup errors.

| `CANCELLED` | Task was cancelled by the user. |
| `PREEMPTED` | The Spot instance running this task was reclaimed by AWS. The scheduler retries the task automatically. If the retry limit is reached, the task transitions to `FAILED`. |


PREEMPTED description is inaccurate. Verified against the scheduler code (TaskStatus.isTerminal() + the ECS/VM retry-exhaustion paths): PREEMPTED is itself a terminal status; it does not transition to FAILED.

Retries happen internally as attempts while the task stays running; when spot attempts (and the one-shot on-demand fallback for spotFirst) are all exhausted, the terminal status stays PREEMPTED (error cause SPOT_INTERRUPTION_EXHAUSTED). The ECS code documents this contract explicitly.

Suggested wording:

The Spot instance running this task was reclaimed by AWS. The scheduler retries the task automatically on Spot, then falls back to On-Demand for spotFirst. If all retries are exhausted, the task ends as PREEMPTED.

| `UNSCHEDULABLE` | No instance type could satisfy the task's placement constraints. This occurs when the requested resources exceed what any available instance type can provide, or when specified instance types are unavailable in the region. Check your **Instance types** configuration and the `cpus` and `memory` directives in the failing process. |
| `UNKNOWN` | Task status could not be determined, typically due to a transient backend failure. |

### Run statuses

| Status | Description |
|--------|-------------|
| `ACTIVE` | The run is in progress. |
| `TERMINATING` | The run is shutting down: final tasks are completing or being cancelled. |
| `TERMINATED` | The run ended normally. |
| `FAILED` | The run failed. |
| `DANGLING` | The Nextflow process stopped sending heartbeats. This typically means the launcher process crashed or lost connectivity. Tasks already dispatched to ECS may still be running. Check CloudWatch logs under `/seqera/sched` for details. |

@jonmarti jonmarti Jun 12, 2026


Re: "Are these run statuses accurate?" + "how long should a user wait?" I verified the 5 run statuses against the scheduler code (RunStatus.java); all match: ACTIVE, TERMINATING, TERMINATED, FAILED, DANGLING. ✅

DANGLING needs more prescriptive wording though. From the code:

  • A run is marked DANGLING only after 1 hour without a heartbeat (sched.cron.cleanup.session.stale-timeout: 1h). So the 1h grace is already built in; it's not a transient blip.
  • DANGLING is recoverable: if the launcher reconnects, the run automatically returns to ACTIVE. The next request from the launcher triggers it.
  • It never auto-transitions to FAILED; it stays DANGLING until it either recovers or the record is purged at 90-day retention. So DANGLING itself is the signal to act; there's no further state to wait for.

Suggested wording:

A run is marked DANGLING after 1 hour without a heartbeat from the Nextflow launcher, typically because the launcher process crashed or lost connectivity. If the launcher reconnects, the run automatically returns to ACTIVE. If it stays DANGLING, the launcher has not recovered: inspect the Nextflow launcher logs and re-launch the pipeline (the run will not resume on its own). Tasks already dispatched to ECS may still be running; check CloudWatch logs under /seqera/sched for details.
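
For readers who want the DANGLING behavior spelled out, here is a minimal Python sketch of the transitions described in this comment. It is a model under stated assumptions, not the scheduler's code: the function names are hypothetical, only the ACTIVE and DANGLING transitions are modeled, and treating exactly one hour as stale is an assumption.

```python
from datetime import datetime, timedelta

# 1h stale timeout, as quoted from the scheduler configuration above.
STALE_TIMEOUT = timedelta(hours=1)


def on_cleanup_tick(status, last_heartbeat, now):
    """Periodic check: an ACTIVE run with a stale heartbeat becomes DANGLING."""
    if status == "ACTIVE" and now - last_heartbeat >= STALE_TIMEOUT:
        return "DANGLING"
    # DANGLING never auto-transitions to FAILED; other statuses are untouched.
    return status


def on_launcher_request(status):
    """Any new request from the launcher recovers a DANGLING run."""
    return "ACTIVE" if status == "DANGLING" else status


if __name__ == "__main__":
    last_seen = datetime(2026, 6, 12, 12, 0)
    status = on_cleanup_tick("ACTIVE", last_seen, last_seen + timedelta(minutes=90))
    print(status)
    print(on_launcher_request(status))
```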

9 changes: 9 additions & 0 deletions platform-enterprise_docs/compute-envs/google-cloud-batch.md
@@ -107,6 +107,15 @@ Workload Identity Federation (WIF) is the recommended authentication method for
3. Set the Allowed audiences. If left empty, GCP derives a default audience from the provider resource path in the format `//iam.googleapis.com/projects/{PROJECT}/locations/global/workloadIdentityPools/{POOL}/providers/{PROVIDER}`. If you specify a custom value, it must match exactly what you enter in the Token audience field when creating the Google WIF credential in Seqera.
4. Define an attribute mapping and condition. At a minimum set `google.subject=assertion.sub`. This maps the subject claim from Seqera's JWT to GCP's identity space. For more information see [here](https://docs.cloud.google.com/iam/docs/workload-identity-federation-with-other-providers#mappings-and-conditions).
5. Grant `roles/iam.workloadIdentityUser` on the service account created above to the Workload Identity Pool principal. This can be set for all pool identities or for a specific workspace.
6. If you use the same WIF credential for Data Explorer, grant `roles/iam.serviceAccountTokenCreator` on the service account to itself:

```bash
gcloud iam service-accounts add-iam-policy-binding SA_EMAIL \
--member="serviceAccount:SA_EMAIL" \
--role="roles/iam.serviceAccountTokenCreator"
```

Replace `SA_EMAIL` with the service account email. Without this role, viewing or downloading file contents in Data Explorer fails with a signing error. Running pipelines is not affected.

WIF requires an OIDC signing key and a configured Seqera Platform OIDC provider. See [Cryptographic options](https://docs.seqera.io/platform-enterprise/enterprise/configuration/overview#cryptographic-options).
