diff --git a/.github/doc-tags-allowed.txt b/.github/doc-tags-allowed.txt
index b3f773621..a1d040716 100644
--- a/.github/doc-tags-allowed.txt
+++ b/.github/doc-tags-allowed.txt
@@ -141,6 +141,7 @@ image
 input
 installation
 integration
+intelligent compute
 interactive
 jupyter
 k8s
diff --git a/platform-cloud/cloud-sidebar.json b/platform-cloud/cloud-sidebar.json
index 8eef893f4..b2238ce14 100644
--- a/platform-cloud/cloud-sidebar.json
+++ b/platform-cloud/cloud-sidebar.json
@@ -69,6 +69,7 @@
         "compute-envs/seqera-compute",
         "compute-envs/aws-batch",
         "compute-envs/aws-cloud",
+        "compute-envs/intelligent-compute",
         "compute-envs/azure-batch",
         "compute-envs/azure-cloud",
         "compute-envs/google-cloud-batch",
diff --git a/platform-cloud/docs/compute-envs/google-cloud-batch.md b/platform-cloud/docs/compute-envs/google-cloud-batch.md
index 13a2aa078..5a008f10c 100644
--- a/platform-cloud/docs/compute-envs/google-cloud-batch.md
+++ b/platform-cloud/docs/compute-envs/google-cloud-batch.md
@@ -112,6 +112,15 @@ Setting up WIF requires the following steps in the GCP Console:
 tityPools/{POOL}/providers/{PROVIDER}`. If you specify a custom value, it must match exactly what you enter in the Token audience field when creating the Google WIF credential in Seqera.
 4. Define an attribute mapping and condition. At a minimum set `google.subject=assertion.sub`. This maps the subject claim from Seqera's JWT to GCP's identity space. For more information see [here](https://docs.cloud.google.com/iam/docs/workload-identity-federation-with-other-providers#mappings-and-conditions). You may see a pop-up asking to configure your application and provide an OIDC ID token path. This pop-up can be dismissed.
 5. Grant `roles/iam.workloadIdentityUser` on the service account that WIF will impersonate to the Workload Identity Pool principal. This can be set for all pool identities or for a specific workspace. If you have not yet created a service account do so following the guidelines above.
+6. If you use the same WIF credential for Data Explorer, grant `roles/iam.serviceAccountTokenCreator` on the service account to itself:
+
+   ```bash
+   gcloud iam service-accounts add-iam-policy-binding SA_EMAIL \
+     --member="serviceAccount:SA_EMAIL" \
+     --role="roles/iam.serviceAccountTokenCreator"
+   ```
+
+   Replace `SA_EMAIL` with the service account email. Without this role, viewing or downloading file contents in Data Explorer fails with a signing error. Running pipelines is not affected.
 
 After setting up WIF in the GCP Console, you need the following information to create a credential in Seqera Platform:
diff --git a/platform-cloud/docs/compute-envs/intelligent-compute.mdx b/platform-cloud/docs/compute-envs/intelligent-compute.mdx
new file mode 100644
index 000000000..bef759639
--- /dev/null
+++ b/platform-cloud/docs/compute-envs/intelligent-compute.mdx
@@ -0,0 +1,150 @@
+---
+title: "Intelligent Compute"
+description: "Set up Seqera Intelligent Compute on an AWS Cloud compute environment"
+date created: "2026-06-12"
+last updated: "2026-06-12"
+tags: [intelligent compute, aws, ecs, compute environments]
+toc_min_heading_level: 2
+toc_max_heading_level: 4
+---
+
+import CodeBlock from '@theme/CodeBlock';
+import AwsCloudIntelligentComputePolicy from './_policies/aws-cloud-intelligent-compute-policy.json?raw';
+
+:::info[Preview]
+Seqera Intelligent Compute is in preview. Seqera must enable it for your organization before you can use it. Contact your account manager to request access.
+:::
+
+:::caution
+Intelligent Compute may assign different CPU and memory values to tasks than those specified in your pipeline's `process` directives. The scheduler selects the most cost-effective instance that meets each task's resource request rather than provisioning exactly what the directive specifies.
+:::
+
+Intelligent Compute is a scheduling service that runs Nextflow pipelines on a Seqera-managed Amazon ECS cluster. It allocates compute resources based on what each task needs rather than what the pipeline requests, which reduces cost and improves utilization across a run. Intelligent Compute is supported on AWS Cloud compute environments only.
+
+Unlike the standard AWS Cloud compute environment, which runs a pipeline on a single EC2 instance with a local executor, Intelligent Compute provisions and manages multi-node clusters. Pipelines can scale beyond a single instance while keeping startup times short.
+
+When Intelligent Compute is enabled on an AWS Cloud compute environment, Seqera provisions and manages the following resources in your AWS account on first use:
+
+- An Amazon ECS cluster per compute environment configuration
+- ECS capacity providers (Managed Instances or Auto Scaling Groups)
+- ECS task definitions per container image and resource shape
+- IAM roles for ECS task execution, EC2 instance profiles, and infrastructure management
+- CloudWatch log groups under `/seqera/sched`
+
+All managed resources use the `seqera-sched-` prefix. Seqera creates them on first use and removes them automatically when no longer needed.
+
+## IAM permissions
+
+In addition to the [standard AWS Cloud IAM permissions](./aws-cloud#required-platform-iam-permissions), Intelligent Compute requires an additional policy attached to the same IAM user or role that Seqera uses to access your AWS account.
+
+<details>
+  <summary>IAM policy JSON</summary>
+  <CodeBlock language="json">{AwsCloudIntelligentComputePolicy}</CodeBlock>
+</details>
+
+[Download aws-cloud-intelligent-compute-policy.json](./_policies/aws-cloud-intelligent-compute-policy.json)
+
+### Permission groups
+
+| Group | Purpose |
+|-------|---------|
+| `ECSScopedOperations` | Create, delete, describe, and tag ECS clusters, capacity providers, and tasks. Scoped to `seqera-sched-*` resources. |
+| `ECSUnscopedOperations` | Register, deregister, list, and describe ECS task definitions. ECS task definition APIs do not support resource-level permissions. |
+| `IAMRoleManagement` | Create, update, and delete IAM roles and instance profiles scoped to `seqera-sched-*`. Seqera creates four role types on first use: execution role, infrastructure role, per-cluster instance role, and per-cluster task role. |
+| `PassRoleToECS` | Pass `seqera-sched-*` and `TowerForge-*` roles to ECS, ECS tasks, and EC2. Required to attach roles to ECS infrastructure and task definitions. |
+| `ServiceLinkedRoles` | Create service-linked roles for ECS, autoscaling, and Spot. Required only if these roles do not already exist in your account. |
+| `CloudWatchLogs` | Create and manage log groups under `/seqera/sched`, and read log events. Task stdout and stderr are written to CloudWatch. |
+| `EC2NetworkDiscovery` | Describe VPCs, subnets, security groups, and route tables. Create security groups and VPC endpoints. Used for VPC auto-discovery and network setup. |
+| `ECRAccess` | Authorize ECR and pull container images. ECS tasks pull images from ECR. |
+| `S3Access` | Read objects and list buckets. Used to read Fusion trace files and pipeline work directory content. |
+| `ASGEC2Operations` | Describe instance types and create or delete EC2 launch templates. Required only for Auto Scaling Group-backed clusters. |
+| `ASGManagement` | Create, update, and delete Auto Scaling Groups scoped to `seqera-sched-*`. Required only for Auto Scaling Group-backed clusters. |
+| `ASGDescribe` | Describe Auto Scaling Groups. Required only for Auto Scaling Group-backed clusters. |
+| `SSMECSOptimizedAmi` | Read the ECS-optimized AMI ID from SSM Parameter Store. Used to look up the latest Amazon Linux 2023 ECS-optimized AMI. |
+| `CostExplorer` | Query `ce:GetCostAndUsage`. Used to display cost predictions at pipeline launch. If this permission is absent, cost predictions do not appear and Seqera shows no error. |
+
+**Conditional statements:**
+- `ASGEC2Operations`, `ASGManagement`, and `ASGDescribe` are required only if Auto Scaling Group-backed clusters are enabled. You can omit them for Managed Instances deployments.
+- `ServiceLinkedRoles` is required only if the listed service-linked roles do not already exist in your AWS account.
+- `CostExplorer` is required only if you want cost predictions shown at pipeline launch.
+
+### Create the additional IAM policy
+
+1. Open the [AWS IAM console](https://console.aws.amazon.com/iam).
+1. Select **Policies** under **Access management**, then select **Create policy**.
+1. Select the **JSON** tab and paste the Intelligent Compute policy.
+1. Select **Next**, enter a name (for example, `SeqeraIntelligentComputePolicy`), then select **Create policy**.
+1. Attach the policy to the same IAM user or role that Seqera uses for your AWS Cloud compute environment.
+
+## Set up an AWS Cloud compute environment with Intelligent Compute
+
+:::info[**Prerequisites**]
+
+You need the following:
+
+- Intelligent Compute enabled for your organization by Seqera. Contact your account manager to request access.
+- AWS credentials with both the standard AWS Cloud permissions and the Intelligent Compute permissions attached.
+
+:::
+
+1. In your Seqera workspace, select **Compute Environments**, then **Add compute environment**.
+1. Enter a name and select **AWS Cloud** as the platform.
+1. Select your AWS credentials.
+1. Select the **Region** where Seqera provisions the ECS cluster.
+1. Enter a **Work directory** (S3 URI, for example `s3://my-bucket/work`).
+1. Under **Intelligent Compute**, enable the **Seqera Intelligent Compute** toggle.
+1. Configure the [Intelligent Compute options](#configuration-options) as needed.
+1. Select **Add**.
+
+Seqera validates credentials and configuration on save. On first use, it provisions the required IAM roles and ECS cluster in your account. This provisioning is automatic and requires no further steps.
+
+## Resource metrics
+
+The **Metrics** tab for a run on Intelligent Compute shows three resource values for CPU and memory: **Requested**, **Allocated**, and **Used**.
+
+| Metric | Source | What it represents |
+|--------|--------|-------------------|
+| **Requested** | Pipeline `process` directives | The CPU and memory your pipeline asked for, as written in your `process` directives (for example, `cpus 4`, `memory '8 GB'`). |
+| **Allocated** | Scheduler decision | The CPU and memory the scheduler assigned to the task container. Intelligent Compute may assign values different from what was requested. It selects the most cost-effective instance shape that satisfies the task's requirements. |
+| **Used** | Nextflow trace data | The CPU and memory the task actually consumed, measured from Nextflow's trace metrics (`pcpu` × `realtime` for CPU, `peakRss` for memory). Requires Fusion to be enabled. Absent for tasks that did not produce trace data. |
+
+**How to read the numbers:**
+
+- If **Requested** is much higher than **Allocated**, the scheduler found a more efficient instance shape than your directives implied.
+- If **Allocated** is much higher than **Used**, the task ran with idle headroom.
+- If **Used** is close to **Allocated**, resource utilization is near-optimal for that task.
+
+## Configuration options
+
+| Option | Values | Default | Description |
+|--------|--------|---------|-------------|
+| **Seqera Intelligent Compute** | Enabled / Disabled | Disabled | Enables the Intelligent Compute scheduler for this compute environment. This option only appears if Intelligent Compute is enabled for your organization. |
+| **Provisioning model** | `spotFirst`, `spot`, `ondemand` | `spotFirst` | Instance procurement strategy. `spotFirst` uses Spot instances and falls back to On-Demand if Spot capacity is unavailable. `spot` uses Spot instances only. `ondemand` uses On-Demand instances only. |
+| **Instance types** | Comma-separated EC2 instance type identifiers (for example, `m5.xlarge, c5.2xlarge`) | Empty | Restricts which instance types the scheduler can select. When empty, the scheduler selects the most cost-effective type for each task automatically. Specifying types here overrides automatic selection. |
+
+## Task and run statuses
+
+Intelligent Compute reports a status for each task and for the run as a whole. Use these statuses to track progress and diagnose failures.
+
+### Task statuses
+
+| Status | Description |
+|--------|-------------|
+| SUBMITTED | Task is queued or submitted to the compute backend. |
+| RUNNING | Task is actively executing on a compute instance. |
+| SUCCEEDED | Task completed with exit code 0. |
+| FAILED | Task failed. This covers non-retriable execution failures (non-zero exit code, container startup errors) and Spot quota exhaustion after all retries are used. |
+| CANCELLED | Task was cancelled by the user. |
+| PREEMPTED | The Spot instance running this task was reclaimed by AWS. The scheduler retries the task automatically. If the retry limit is reached, the task transitions to `FAILED`. |
+| UNSCHEDULABLE | No instance type could satisfy the task's placement constraints. This occurs when the requested resources exceed what any available instance type can provide, or when specified instance types are unavailable in the region. Check your **Instance types** configuration and the `cpus` and `memory` directives in the failing process. |
+| UNKNOWN | Task status could not be determined, typically due to a transient backend failure. |
+
+### Run statuses
+
+| Status | Description |
+|--------|-------------|
+| ACTIVE | The run is in progress. |
+| TERMINATING | The run is shutting down. Final tasks are completing or being cancelled. |
+| TERMINATED | The run ended normally. |
+| FAILED | The run failed. |
+| DANGLING | The Nextflow process stopped sending heartbeats. This typically means the launcher process crashed or lost connectivity. Tasks already dispatched to ECS may still be running. Check CloudWatch logs under `/seqera/sched` for details. |
diff --git a/platform-enterprise_docs/compute-envs/google-cloud-batch.md b/platform-enterprise_docs/compute-envs/google-cloud-batch.md
index 9f89ce681..f58311f98 100644
--- a/platform-enterprise_docs/compute-envs/google-cloud-batch.md
+++ b/platform-enterprise_docs/compute-envs/google-cloud-batch.md
@@ -107,6 +107,15 @@ Workload Identity Federation (WIF) is the recommended authentication method for
 3. Set the Allowed audiences. If left empty, GCP derives a default audience from the provider resource path in the format `//iam.googleapis.com/projects/{PROJECT}/locations/global/workloadIdentityPools/{POOL}/providers/{PROVIDER}`. If you specify a custom value, it must match exactly what you enter in the Token audience field when creating the Google WIF credential in Seqera.
 4. Define an attribute mapping and condition. At a minimum set `google.subject=assertion.sub`. This maps the subject claim from Seqera's JWT to GCP's identity space. For more information see [here](https://docs.cloud.google.com/iam/docs/workload-identity-federation-with-other-providers#mappings-and-conditions)
 5. Grant `roles/iam.workloadIdentityUser` on the service account created above to the Workload Identity Pool principal. This can be set for all pool identities or for a specific workspace.
+6. If you use the same WIF credential for Data Explorer, grant `roles/iam.serviceAccountTokenCreator` on the service account to itself:
+
+   ```bash
+   gcloud iam service-accounts add-iam-policy-binding SA_EMAIL \
+     --member="serviceAccount:SA_EMAIL" \
+     --role="roles/iam.serviceAccountTokenCreator"
+   ```
+
+   Replace `SA_EMAIL` with the service account email. Without this role, viewing or downloading file contents in Data Explorer fails with a signing error. Running pipelines is not affected.
 
 WIF requires an OIDC signing key and for Seqera Platform's OIDC provider to be configured. See [Cryptographic options](https://docs.seqera.io/platform-enterprise/enterprise/configuration/overview#cryptographic-options).
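
Reviewer sketch (not part of the patch): one way to confirm the self-binding added in step 6 is to inspect the service account's IAM policy for the role. The snippet below assumes the policy was exported as JSON, for example with `gcloud iam service-accounts get-iam-policy SA_EMAIL --format=json`. The function name `has_self_token_creator`, the sample policy, and the sample email are hypothetical, and the check ignores any conditions attached to a binding.

```python
import json

TOKEN_CREATOR_ROLE = "roles/iam.serviceAccountTokenCreator"


def has_self_token_creator(policy: dict, sa_email: str) -> bool:
    """Return True if the policy grants the token creator role to the service account itself."""
    member = f"serviceAccount:{sa_email}"
    for binding in policy.get("bindings", []):
        # A binding pairs one role with the list of members that hold it.
        if binding.get("role") == TOKEN_CREATOR_ROLE and member in binding.get("members", []):
            return True
    return False


# Hypothetical policy, shaped like the JSON that get-iam-policy returns.
sample_policy = json.loads("""
{
  "bindings": [
    {
      "role": "roles/iam.workloadIdentityUser",
      "members": ["principalSet://iam.googleapis.com/projects/123/locations/global/workloadIdentityPools/example-pool/*"]
    },
    {
      "role": "roles/iam.serviceAccountTokenCreator",
      "members": ["serviceAccount:seqera-sa@example-project.iam.gserviceaccount.com"]
    }
  ],
  "version": 1
}
""")

sa = "seqera-sa@example-project.iam.gserviceaccount.com"
print(has_self_token_creator(sample_policy, sa))
```

If the function returns `False` for the real policy, the Data Explorer signing error described in step 6 is the likely symptom, while pipeline runs would be unaffected.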