docs: Intelligent Compute page for AWS Cloud (preview) #1565
MichaelTansiniSeqera wants to merge 4 commits into
Adds a new standalone page covering the Seqera Intelligent Compute preview feature for AWS Cloud compute environments, including IAM permissions, setup steps, and configuration options. Updates the Cloud sidebar to include the new page after aws-cloud. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: MichaelTansiniSeqera <michael.tansini@seqera.io>
- Remove `<details>` collapse from IAM policy, so the policy is always visible and copy-pasteable
- Add Resource metrics section explaining Requested/Allocated/Used and how to interpret the gap between them
- Add Task and run statuses reference table for troubleshooting

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rer on GCP

When WIF credentials are used for Data Explorer, Platform has no embedded private key and must call the GCP IAM `signBlob` API to generate presigned URLs. `roles/iam.serviceAccountTokenCreator` on the SA itself is required for this to succeed. Without it, file viewing and download fail silently with a signing error. Running pipelines is unaffected. Updates both Cloud and Enterprise Google Cloud Batch docs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| `TERMINATING` | The run is shutting down: final tasks are completing or being cancelled. |
| `TERMINATED` | The run ended normally. |
| `FAILED` | The run failed. |
| `DANGLING` | The Nextflow process stopped sending heartbeats. This typically means the launcher process crashed or lost connectivity. Tasks already dispatched to ECS may still be running. Check CloudWatch logs under `/seqera/sched` for details. |
Re: "Are these run statuses accurate?" + "how long should a user wait?" — verified the 5 run statuses against the scheduler code (RunStatus.java); all match: ACTIVE, TERMINATING, TERMINATED, FAILED, DANGLING. ✅
`DANGLING` needs more prescriptive wording though. From the code:
- A run is marked `DANGLING` only after 1 hour without a heartbeat (`sched.cron.cleanup.session.stale-timeout: 1h`). So the 1h grace is already built in; it's not a transient blip.
- `DANGLING` is recoverable: if the launcher reconnects, the run automatically returns to `ACTIVE`. The next request from the launcher triggers it.
- It never auto-transitions to `FAILED`; it stays `DANGLING` until it either recovers or the record is purged at 90-day retention. So `DANGLING` itself is the signal to act; there's no further state to wait for.
Suggested wording:
A run is marked `DANGLING` after 1 hour without a heartbeat from the Nextflow launcher, typically because the launcher process crashed or lost connectivity. If the launcher reconnects, the run automatically returns to `ACTIVE`. If it stays `DANGLING`, the launcher has not recovered: inspect the Nextflow launcher logs and re-launch the pipeline (the run will not resume on its own). Tasks already dispatched to ECS may still be running; check CloudWatch logs under `/seqera/sched` for details.
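
For reviewers who want the lifecycle in one place, here is a minimal Python sketch of the `DANGLING` behaviour described above. It is not the scheduler code: the function names `on_cleanup_tick` and `on_launcher_request` are hypothetical, and it assumes that only `ACTIVE` runs are eligible to become `DANGLING` and that the timeout comparison is inclusive. Only the five status names and the 1h timeout come from this thread.

```python
from datetime import timedelta
from enum import Enum


class RunStatus(Enum):
    ACTIVE = "ACTIVE"
    TERMINATING = "TERMINATING"
    TERMINATED = "TERMINATED"
    FAILED = "FAILED"
    DANGLING = "DANGLING"


# Mirrors sched.cron.cleanup.session.stale-timeout: 1h
STALE_TIMEOUT = timedelta(hours=1)


def on_cleanup_tick(status: RunStatus, since_last_heartbeat: timedelta) -> RunStatus:
    """Hypothetical cleanup pass: flag a run once its heartbeat is stale.

    Assumption: only ACTIVE runs can become DANGLING, and the check is inclusive.
    """
    if status is RunStatus.ACTIVE and since_last_heartbeat >= STALE_TIMEOUT:
        return RunStatus.DANGLING
    # A DANGLING run stays DANGLING here; it never moves to FAILED on its own.
    return status


def on_launcher_request(status: RunStatus) -> RunStatus:
    """Hypothetical handler: any request from the launcher revives a DANGLING run."""
    if status is RunStatus.DANGLING:
        return RunStatus.ACTIVE
    return status
```

The point the sketch makes is that there is no path out of `DANGLING` other than the launcher coming back, which is why the docs should tell users to act rather than wait.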
| `SUCCEEDED` | Task completed with exit code 0. |
| `FAILED` | Task failed. This covers non-retriable execution failures (non-zero exit code, container startup errors) and spot quota exhaustion after retries are exhausted. |
| `CANCELLED` | Task was cancelled by the user. |
| `PREEMPTED` | The Spot instance running this task was reclaimed by AWS. The scheduler retries the task automatically. If the retry limit is reached, the task transitions to `FAILED`. |
PREEMPTED description is inaccurate. Verified against the scheduler code (TaskStatus.isTerminal() + the ECS/VM retry-exhaustion paths): PREEMPTED is itself a terminal status — it does not transition to FAILED.
Retries happen internally as attempts while the task stays running; when spot attempts (and the one-shot on-demand fallback for spotFirst) are all exhausted, the terminal status stays PREEMPTED (error cause SPOT_INTERRUPTION_EXHAUSTED). The ECS code documents this contract explicitly.
Suggested wording:
The Spot instance running this task was reclaimed by AWS. The scheduler retries the task automatically on Spot, then falls back to On-Demand for `spotFirst`. If all retries are exhausted, the task ends as `PREEMPTED`.
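
A rough Python sketch of the retry contract described in this comment, in case it helps when checking the wording. Everything except the names `PREEMPTED`, `SPOT_INTERRUPTION_EXHAUSTED`, and `spotFirst` is an assumption: the function `run_with_spot_retries`, the `attempt` callback, the `max_spot_attempts` parameter, and the convention that an attempt returns `None` when it did not complete are all hypothetical and do not reflect the scheduler's actual API.

```python
from typing import Callable, Optional, Tuple


def run_with_spot_retries(
    attempt: Callable[[str], Optional[str]],
    max_spot_attempts: int,
    provisioning_model: str,
) -> Tuple[str, Optional[str]]:
    """Return (terminal_status, error_cause) for one task.

    `attempt(capacity)` is a hypothetical callback that returns "SUCCEEDED" or
    "FAILED" when the task ran to completion, or None when the attempt was lost
    (for example, the Spot instance was reclaimed).
    """
    # Retries are internal attempts; the task keeps its non-terminal status meanwhile.
    for _ in range(max_spot_attempts):
        outcome = attempt("spot")
        if outcome is not None:
            return outcome, None

    # One-shot On-Demand fallback, only for the spotFirst provisioning model.
    if provisioning_model == "spotFirst":
        outcome = attempt("on-demand")
        if outcome is not None:
            return outcome, None

    # Exhaustion ends as PREEMPTED; it does not become FAILED.
    return "PREEMPTED", "SPOT_INTERRUPTION_EXHAUSTED"
```

Note that the only terminal outcome of exhaustion in this sketch is `PREEMPTED`, which matches the suggested wording.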
| `SUBMITTED` | Task is queued or submitted to the compute backend. |
| `RUNNING` | Task is actively executing on a compute instance. |
| `SUCCEEDED` | Task completed with exit code 0. |
| `FAILED` | Task failed. This covers non-retriable execution failures (non-zero exit code, container startup errors) and spot quota exhaustion after retries are exhausted. |
The "spot quota exhaustion after retries are exhausted" clause is misleading and should be dropped. In the code the spot-quota path (SPOT_QUOTA) is non-terminal — it always retries on the same cluster after a cooldown and never exhausts into FAILED. It may briefly surface as FAILED between attempts, but that's a transient per-attempt blip, not a terminal outcome.
The rest is accurate. Suggested:
Task failed. This covers non-retriable execution failures such as a non-zero exit code or container startup errors.
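
To illustrate the distinction this comment draws, here is a small Python sketch of how a failure cause might map to an outcome. Only `SPOT_QUOTA` and `FAILED` are names from this thread; the function `resolve_failure`, the action label `RETRY_SAME_CLUSTER`, the other cause strings, and the 60-second default cooldown are hypothetical placeholders, not values taken from the scheduler.

```python
from typing import Optional, Tuple


def resolve_failure(cause: str, cooldown_seconds: int = 60) -> Tuple[str, Optional[int]]:
    """Map a failure cause to (action, wait_seconds).

    Assumption: every cause other than SPOT_QUOTA is treated as non-retriable.
    """
    if cause == "SPOT_QUOTA":
        # Non-terminal: retry on the same cluster after a cooldown, with no exhaustion.
        return "RETRY_SAME_CLUSTER", cooldown_seconds
    # Non-retriable execution failures (non-zero exit code, container startup error).
    return "FAILED", None
```

Under these assumptions the spot-quota path never produces a terminal `FAILED`, so the table row should list only the non-retriable causes.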
Summary
- `intelligent-compute.mdx` page under `platform-cloud/docs/compute-envs/`
- `provisioningModel`, `machineTypes`)
- `cloud-sidebar.json` to insert the page after `aws-cloud`

Status
WIP — pending editorial review and internal sign-off before publish.
Checklist
🤖 Generated with Claude Code