docs: Intelligent Compute page for AWS Cloud (preview) #1565
base: master
Changes from all commits
a6876d1
26cd6d2
d9cdab0
9c443b4
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,142 @@ | ||
| --- | ||
| title: "Intelligent Compute" | ||
| description: "Set up Seqera Intelligent Compute on an AWS Cloud compute environment" | ||
| date created: "2026-06-12" | ||
| last updated: "2026-06-12" | ||
| tags: [intelligent compute, aws, ecs, compute environments] | ||
| toc_min_heading_level: 2 | ||
| toc_max_heading_level: 4 | ||
| --- | ||
|
|
||
| import CodeBlock from '@theme/CodeBlock'; | ||
| import AwsCloudIntelligentComputePolicy from './_policies/aws-cloud-intelligent-compute-policy.json?raw'; | ||
|
|
||
| :::info[Preview] | ||
| Seqera Intelligent Compute is in preview and must be enabled for your organization by Seqera before you can use it. Contact your account manager to request access. | ||
| ::: | ||
|
|
||
| :::caution | ||
| Intelligent Compute may assign different CPU and memory values to tasks than those specified in your pipeline's `process` directives. The scheduler selects the most cost-effective instance that meets each task's resource request rather than provisioning exactly what the directive specifies. | ||
| ::: | ||
|
|
||
| Intelligent Compute is supported on **AWS Cloud compute environments only**. | ||
|
|
||
| ## What is Intelligent Compute | ||
|
|
||
| Intelligent Compute is a scheduling service that runs Nextflow pipelines on a Seqera-managed Amazon ECS cluster. It allocates compute resources based on what each task actually needs rather than what the pipeline requests, reducing cost and improving utilization across a run. | ||
|
|
||
| Unlike the standard AWS Cloud compute environment, which runs a pipeline on a single EC2 instance with a local executor, Intelligent Compute provisions and manages multi-node clusters. This allows pipelines to scale beyond a single instance while preserving fast startup times. | ||
|
|
||
| When Intelligent Compute is enabled on an AWS Cloud compute environment, Seqera provisions and manages the following resources in your AWS account on first use: | ||
|
|
||
| - An Amazon ECS cluster per compute environment configuration | ||
| - ECS capacity providers (Managed Instances or Auto Scaling Groups) | ||
| - ECS task definitions per container image and resource shape | ||
| - IAM roles for ECS task execution, EC2 instance profiles, and infrastructure management | ||
| - CloudWatch log groups under `/seqera/sched` | ||
|
|
||
| All managed resources use the `seqera-sched-` prefix. Seqera creates them on first use and removes them automatically when no longer needed. | ||
|
|
||
| ## IAM permissions | ||
|
|
||
| In addition to the [standard AWS Cloud IAM permissions](./aws-cloud#required-platform-iam-permissions), Intelligent Compute requires an additional policy attached to the same IAM user or role that Seqera uses to access your AWS account. | ||
|
|
||
| <CodeBlock language="json">{AwsCloudIntelligentComputePolicy}</CodeBlock> | ||
|
|
||
| [Download aws-cloud-intelligent-compute-policy.json](./_policies/aws-cloud-intelligent-compute-policy.json) | ||
|
|
||
| ### What each permission group does | ||
|
|
||
| | Group | Purpose | | ||
| |-------|---------| | ||
| | `ECSScopedOperations` | Create, delete, describe, and tag ECS clusters, capacity providers, and tasks. Scoped to `seqera-sched-*` resources. | | ||
| | `ECSUnscopedOperations` | Register, deregister, list, and describe ECS task definitions. ECS task definition APIs do not support resource-level permissions. | | ||
| | `IAMRoleManagement` | Create, update, and delete IAM roles and instance profiles scoped to `seqera-sched-*`. Seqera creates four role types on first use: execution role, infrastructure role, per-cluster instance role, and per-cluster task role. | | ||
| | `PassRoleToECS` | Pass `seqera-sched-*` and `TowerForge-*` roles to ECS, ECS tasks, and EC2. Required to attach roles to ECS infrastructure and task definitions. | | ||
| | `ServiceLinkedRoles` | Create service-linked roles for ECS, autoscaling, and Spot. Required only if these roles do not already exist in your account. | | ||
| | `CloudWatchLogs` | Create and manage log groups under `/seqera/sched`, and read log events. Task stdout and stderr are written to CloudWatch. | | ||
| | `EC2NetworkDiscovery` | Describe VPCs, subnets, security groups, and route tables. Create security groups and VPC endpoints. Used for VPC auto-discovery and network setup. | | ||
| | `ECRAccess` | Authorize ECR and pull container images. ECS tasks pull images from ECR. | | ||
| | `S3Access` | Read objects and list buckets. Used to read Fusion trace files and pipeline work directory content. | | ||
| | `ASGEC2Operations` | Describe instance types and create or delete EC2 launch templates. Required only for Auto Scaling Group-backed clusters. | | ||
| | `ASGManagement` | Create, update, and delete Auto Scaling Groups scoped to `seqera-sched-*`. Required only for Auto Scaling Group-backed clusters. | | ||
| | `ASGDescribe` | Describe Auto Scaling Groups. Required only for Auto Scaling Group-backed clusters. | | ||
| | `SSMECSOptimizedAmi` | Read the ECS-optimized AMI ID from SSM Parameter Store. Used to look up the latest Amazon Linux 2023 ECS-optimized AMI. | | ||
| | `CostExplorer` | Query `ce:GetCostAndUsage`. Used to display cost predictions at pipeline launch. If this permission is absent, cost predictions do not appear. No error is surfaced to users. | | ||
|
|
||
| **Conditional statements:** | ||
| - `ASGEC2Operations`, `ASGManagement`, and `ASGDescribe` are required only if Auto Scaling Group-backed clusters are enabled. You can omit them for Managed Instances deployments. | ||
| - `ServiceLinkedRoles` is required only if the listed service-linked roles do not already exist in your AWS account. | ||
| - `CostExplorer` is required only if you want cost predictions shown at pipeline launch. | ||
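The conditional statements above can be sketched as a small helper that trims the optional groups from the policy before you paste it into the IAM console. This is a minimal illustration, not part of Seqera's tooling: the function name `trim_policy` and its flags are hypothetical, and it assumes each statement's `Sid` in the policy JSON matches the group names in the table.

```python
import copy

# Assumption: statement "Sid" values match the group names in the table above.
ASG_SIDS = {"ASGEC2Operations", "ASGManagement", "ASGDescribe"}


def trim_policy(policy, use_asg=True, create_service_linked_roles=True,
                cost_predictions=True):
    """Return a copy of the policy without the optional statements you opt out of."""
    drop = set()
    if not use_asg:
        # Managed Instances deployments do not need the ASG groups.
        drop |= ASG_SIDS
    if not create_service_linked_roles:
        # Only safe when the service-linked roles already exist in the account.
        drop.add("ServiceLinkedRoles")
    if not cost_predictions:
        drop.add("CostExplorer")

    trimmed = copy.deepcopy(policy)
    # Statements without a Sid, or with a Sid not in `drop`, are kept unchanged.
    trimmed["Statement"] = [
        s for s in trimmed["Statement"] if s.get("Sid") not in drop
    ]
    return trimmed
```

For a Managed Instances deployment without cost predictions, you would call `trim_policy(policy, use_asg=False, cost_predictions=False)` on the parsed JSON and serialize the result with `json.dumps`.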
|
|
||
| ### Create the additional IAM policy | ||
|
|
||
| 1. Open the [AWS IAM console](https://console.aws.amazon.com/iam). | ||
| 1. Select **Policies** under **Access management**, then select **Create policy**. | ||
| 1. Select the **JSON** tab and paste the Intelligent Compute policy. | ||
| 1. Select **Next**, enter a name (for example, `SeqeraIntelligentComputePolicy`), then select **Create policy**. | ||
| 1. Attach the policy to the same IAM user or role that Seqera uses for your AWS Cloud compute environment. | ||
|
|
||
| ## Set up an AWS Cloud compute environment with Intelligent Compute | ||
|
|
||
| Confirm with your account manager that Intelligent Compute is enabled for your organization before proceeding. | ||
|
|
||
| 1. In your Seqera workspace, select **Compute Environments**, then **Add compute environment**. | ||
| 1. Enter a name and select **AWS Cloud** as the platform. | ||
| 1. Select your AWS credentials. The credential must have both the standard AWS Cloud permissions and the Intelligent Compute permissions attached. | ||
| 1. Select the **Region** where the ECS cluster will be provisioned. | ||
| 1. Enter a **Work directory** (S3 URI, for example `s3://my-bucket/work`). | ||
| 1. Under **Intelligent Compute**, enable the **Seqera Intelligent Compute** toggle. | ||
| 1. Configure the [Intelligent Compute options](#configuration-options) below as needed. | ||
| 1. Select **Add**. | ||
|
|
||
| Seqera validates credentials and configuration on save. On first use, it provisions the required IAM roles and ECS cluster in your account. Provisioning is automatic and does not require additional steps. | ||
|
|
||
| ## Resource metrics | ||
|
|
||
| The **Metrics** tab for a workflow run on Intelligent Compute shows three resource values for CPU and memory: **Requested**, **Allocated**, and **Used**. | ||
|
|
||
| | Metric | Source | What it represents | | ||
| |--------|--------|-------------------| | ||
| | **Requested** | Pipeline `process` directives | The CPU and memory your pipeline asked for, as written in your `process` directives (for example, `cpus = 4`, `memory = 8 GB`). | | ||
| | **Allocated** | Scheduler decision | The CPU and memory the scheduler actually assigned to the task container. Intelligent Compute may assign values different from what was requested: it selects the most cost-effective instance shape that satisfies the task's requirements. | | ||
| | **Used** | Nextflow trace data | The CPU and memory the task actually consumed, measured from Nextflow's trace metrics (`pcpu` × `realtime` for CPU, `peakRss` for memory). Requires Fusion to be enabled. Absent for tasks that did not produce trace data. | | ||
|
|
||
| **How to read the numbers:** | ||
|
|
||
| - If **Requested** is much higher than **Allocated**, the scheduler found a more efficient instance shape than your directives implied. | ||
| - If **Allocated** is much higher than **Used**, the task ran with significant idle headroom. You may be able to reduce your process resource directives on future runs to lower cost. | ||
| - If **Used** is close to **Allocated**, resource utilization is near-optimal for that task. | ||
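The reading rules above can be sketched as a small function that compares the three values for one resource of one task. The thresholds (`high_ratio`, `close_ratio`) and the function name `interpret` are hypothetical and chosen only for illustration; the page does not define what "much higher" or "close" means numerically.

```python
def interpret(requested, allocated, used=None, high_ratio=1.5, close_ratio=0.8):
    """Return human-readable notes for one resource (CPU or memory) of a task.

    All three values must share the same unit (for example cores or GB).
    `high_ratio` and `close_ratio` are illustrative thresholds, not Seqera's.
    """
    notes = []
    if requested > allocated * high_ratio:
        notes.append("scheduler chose a smaller shape than the directives implied")
    if used is None:
        # Used is absent when Fusion is disabled or no trace data was produced.
        notes.append("no usage data available")
        return notes
    if allocated > used * high_ratio:
        notes.append("significant idle headroom; consider lowering directives")
    elif used >= allocated * close_ratio:
        notes.append("utilization is close to the allocation")
    return notes
```

For example, `interpret(requested=8, allocated=4, used=3.6)` reports both that the scheduler chose a smaller shape and that utilization is close to the allocation.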
|
|
||
| ## Configuration options | ||
|
|
||
| | Option | Values | Default | Description | | ||
| |--------|--------|---------|-------------| | ||
| | **Seqera Intelligent Compute** | Enabled / Disabled | Disabled | Enables the Intelligent Compute scheduler for this compute environment. This option only appears if Intelligent Compute is enabled for your organization. | | ||
| | **Provisioning model** | `spotFirst`, `spot`, `ondemand` | `spotFirst` | Instance procurement strategy. `spotFirst` uses Spot instances and falls back to On-Demand if Spot capacity is unavailable. `spot` uses Spot instances only. `ondemand` uses On-Demand instances only. | | ||
| | **Instance types** | Comma-separated EC2 instance type identifiers (for example, `m5.xlarge, c5.2xlarge`) | Empty | Restricts which instance types the scheduler can select. When empty, the scheduler selects the most cost-effective type for each task automatically. Specifying types here overrides automatic selection. | | ||
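The three **Provisioning model** values described in the table can be summarized as a decision function. This is a sketch of the documented behavior only, not the scheduler's implementation; `pick_capacity` and the `spot_available` flag are hypothetical names introduced for illustration.

```python
def pick_capacity(model, spot_available):
    """Return "spot", "ondemand", or None (nothing can be provisioned right now)."""
    if model == "ondemand":
        return "ondemand"
    if model == "spot":
        # Spot only: there is no fallback when Spot capacity is unavailable.
        return "spot" if spot_available else None
    if model == "spotFirst":
        # Prefer Spot, fall back to On-Demand when Spot capacity is unavailable.
        return "spot" if spot_available else "ondemand"
    raise ValueError(f"unknown provisioning model: {model!r}")
```

With the default `spotFirst`, `pick_capacity("spotFirst", spot_available=False)` returns `"ondemand"`, whereas `spot` returns `None` in the same situation.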
|
|
||
| ## Task and run statuses | ||
|
|
||
| ### Task statuses | ||
|
|
||
| | Status | Description | | ||
| |--------|-------------| | ||
| | `SUBMITTED` | Task is queued or submitted to the compute backend. | | ||
| | `RUNNING` | Task is actively executing on a compute instance. | | ||
| | `SUCCEEDED` | Task completed with exit code 0. | | ||
| | `FAILED` | Task failed. This covers non-retriable execution failures (non-zero exit code, container startup errors) and spot quota exhaustion after retries are exhausted. | | ||
| | `CANCELLED` | Task was cancelled by the user. | | ||
| | `PREEMPTED` | The Spot instance running this task was reclaimed by AWS. The scheduler retries the task automatically. If the retry limit is reached, the task transitions to `FAILED`. | | ||
|
Contributor
Retries happen internally as attempts while the task stays running; when spot attempts (and the one-shot on-demand fallback for Suggested wording:
|
||
| | `UNSCHEDULABLE` | No instance type could satisfy the task's placement constraints. This occurs when the requested resources exceed what any available instance type can provide, or when specified instance types are unavailable in the region. Check your **Instance types** configuration and the `cpus` and `memory` directives in the failing process. | | ||
| | `UNKNOWN` | Task status could not be determined, typically due to a transient backend failure. | | ||
|
|
||
| ### Run statuses | ||
|
|
||
| | Status | Description | | ||
| |--------|-------------| | ||
| | `ACTIVE` | The run is in progress. | | ||
| | `TERMINATING` | The run is shutting down: final tasks are completing or being cancelled. | | ||
| | `TERMINATED` | The run ended normally. | | ||
| | `FAILED` | The run failed. | | ||
| | `DANGLING` | The Nextflow process stopped sending heartbeats. This typically means the launcher process crashed or lost connectivity. Tasks already dispatched to ECS may still be running. Check CloudWatch logs under `/seqera/sched` for details. | | ||
|
Contributor
Re: "Are these run statuses accurate?" + "how long should a user wait?" Verified the 5 run statuses against the scheduler code ( DANGLING needs more prescriptive wording though. From the code:
Suggested wording:
|
||
The "spot quota exhaustion after retries are exhausted" clause is misleading and should be dropped. In the code the spot-quota path (`SPOT_QUOTA`) is non-terminal: it always retries on the same cluster after a cooldown and never exhausts into `FAILED`. It may briefly surface as `FAILED` between attempts, but that's a transient per-attempt blip, not a terminal outcome.

The rest is accurate. Suggested: