docs: Intelligent Compute page for AWS Cloud (preview)#1565

Draft
MichaelTansiniSeqera wants to merge 4 commits into master from docs/intelligent-compute-aws-preview

@MichaelTansiniSeqera

Contributor

Summary

  • Adds a new standalone intelligent-compute.mdx page under platform-cloud/docs/compute-envs/
  • Covers: preview admonition, what IC is, IAM permissions (with policy JSON), AWS Cloud CE setup steps, and configuration options (provisioningModel, machineTypes)
  • Updates cloud-sidebar.json to insert the page after aws-cloud

Status

WIP — pending editorial review and internal sign-off before publish.

Checklist

  • Editorial review pass
  • Confirm IAM policy JSON is up to date with latest sched release
  • Confirm sidebar placement is correct
  • Add screenshot(s) once UI is final

🤖 Generated with Claude Code

Adds a new standalone page covering the Seqera Intelligent Compute
preview feature for AWS Cloud compute environments, including IAM
permissions, setup steps, and configuration options. Updates the
Cloud sidebar to include the new page after aws-cloud.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@netlify

netlify Bot commented Jun 12, 2026

Deploy Preview for seqera-docs ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 9c443b4 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/seqera-docs/deploys/6a2c2da1b39745000853d078 |
| 😎 Deploy Preview | https://deploy-preview-1565--seqera-docs.netlify.app |

MichaelTansiniSeqera and others added 3 commits June 12, 2026 16:32
Signed-off-by: MichaelTansiniSeqera <michael.tansini@seqera.io>
- Remove <details> collapse from IAM policy — policy is now always visible and copy-pasteable
- Add Resource metrics section explaining Requested/Allocated/Used and how to interpret the gap between them
- Add Task and run statuses reference table for troubleshooting

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rer on GCP

When WIF credentials are used for Data Explorer, Platform has no embedded
private key and must call the GCP IAM signBlob API to generate presigned
URLs. roles/iam.serviceAccountTokenCreator on the SA itself is required
for this to succeed. Without it, file viewing and download fail silently
with a signing error. Running pipelines is unaffected.

Updates both Cloud and Enterprise Google Cloud Batch docs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| `TERMINATING` | The run is shutting down — final tasks are completing or being cancelled. |
| `TERMINATED` | The run ended normally. |
| `FAILED` | The run failed. |
| `DANGLING` | The Nextflow process stopped sending heartbeats. This typically means the launcher process crashed or lost connectivity. Tasks already dispatched to ECS may still be running. Check CloudWatch logs under `/seqera/sched` for details. |

@jonmarti Jun 12, 2026

Contributor

Re: "Are these run statuses accurate?" + "how long should a user wait?" — verified the 5 run statuses against the scheduler code (RunStatus.java); all match: ACTIVE, TERMINATING, TERMINATED, FAILED, DANGLING. ✅

DANGLING needs more prescriptive wording though. From the code:

  • A run is marked DANGLING only after 1 hour without a heartbeat (sched.cron.cleanup.session.stale-timeout: 1h). So the 1h grace is already built in — it's not a transient blip.
  • DANGLING is recoverable: if the launcher reconnects, the run automatically returns to ACTIVE. The next request from the launcher triggers it.
  • It never auto-transitions to FAILED — it stays DANGLING until it either recovers or the record is purged at 90-day retention. So DANGLING itself is the signal to act; there's no further state to wait for.

Suggested wording:

A run is marked DANGLING after 1 hour without a heartbeat from the Nextflow launcher — typically because the launcher process crashed or lost connectivity. If the launcher reconnects, the run automatically returns to ACTIVE. If it stays DANGLING, the launcher has not recovered: inspect the Nextflow launcher logs and re-launch the pipeline (the run will not resume on its own). Tasks already dispatched to ECS may still be running; check CloudWatch logs under /seqera/sched for details.
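
To make the lifecycle above concrete, here is a minimal Python sketch of the DANGLING transitions as described in this comment. It is not the scheduler's code (that lives in Java, in `RunStatus.java`); the function names `on_cleanup_tick` and `on_launcher_request` are hypothetical, and treating "exactly 1 hour" as already stale is an assumption.

```python
from datetime import datetime, timedelta
from enum import Enum


class RunStatus(Enum):
    # The five run statuses listed in the review comment.
    ACTIVE = "ACTIVE"
    TERMINATING = "TERMINATING"
    TERMINATED = "TERMINATED"
    FAILED = "FAILED"
    DANGLING = "DANGLING"


# Mirrors sched.cron.cleanup.session.stale-timeout: 1h from the comment.
STALE_TIMEOUT = timedelta(hours=1)


def on_cleanup_tick(status: RunStatus, last_heartbeat: datetime, now: datetime) -> RunStatus:
    """Hypothetical periodic check: an ACTIVE run with no heartbeat for the
    stale timeout becomes DANGLING. A DANGLING run is never moved to FAILED."""
    if status is RunStatus.ACTIVE and now - last_heartbeat >= STALE_TIMEOUT:
        return RunStatus.DANGLING
    return status


def on_launcher_request(status: RunStatus) -> RunStatus:
    """Hypothetical handler: the next request from the launcher brings a
    DANGLING run back to ACTIVE; other statuses are left unchanged."""
    if status is RunStatus.DANGLING:
        return RunStatus.ACTIVE
    return status
```

The point the sketch tries to capture is that DANGLING has only one outgoing transition (back to ACTIVE), which is why the docs should tell users to act on it rather than wait.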

| `SUCCEEDED` | Task completed with exit code 0. |
| `FAILED` | Task failed. This covers non-retriable execution failures (non-zero exit code, container startup errors) and spot quota exhaustion after retries are exhausted. |
| `CANCELLED` | Task was cancelled by the user. |
| `PREEMPTED` | The Spot instance running this task was reclaimed by AWS. The scheduler retries the task automatically. If the retry limit is reached, the task transitions to `FAILED`. |

Contributor

PREEMPTED description is inaccurate. Verified against the scheduler code (TaskStatus.isTerminal() + the ECS/VM retry-exhaustion paths): PREEMPTED is itself a terminal status — it does not transition to FAILED.

Retries happen internally as attempts while the task stays running; when spot attempts (and the one-shot on-demand fallback for spotFirst) are all exhausted, the terminal status stays PREEMPTED (error cause SPOT_INTERRUPTION_EXHAUSTED). The ECS code documents this contract explicitly.

Suggested wording:

The Spot instance running this task was reclaimed by AWS. The scheduler retries the task automatically on Spot, then falls back to On-Demand for spotFirst. If all retries are exhausted, the task ends as PREEMPTED.
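
The retry contract described here can be sketched in a few lines of Python. This is an illustration only, not the ECS/VM code path: `run_with_spot_retries`, the `run_attempt` callback, and the `max_spot_attempts` parameter are hypothetical names, and the actual retry limit is not stated in this thread.

```python
from typing import Callable


def run_with_spot_retries(
    run_attempt: Callable[[str], bool],
    max_spot_attempts: int,
    spot_first: bool,
) -> str:
    """Return the terminal task status under the contract described above.

    run_attempt(capacity) is a hypothetical callback that returns True when
    the attempt completes and False when the instance is reclaimed.
    """
    # Retries happen as internal attempts; the task is not FAILED in between.
    for _ in range(max_spot_attempts):
        if run_attempt("spot"):
            return "SUCCEEDED"
    # spotFirst gets a single On-Demand fallback after Spot is exhausted.
    if spot_first and run_attempt("on-demand"):
        return "SUCCEEDED"
    # Exhaustion ends as PREEMPTED (cause SPOT_INTERRUPTION_EXHAUSTED), not FAILED.
    return "PREEMPTED"
```

For example, `run_with_spot_retries(lambda capacity: capacity == "on-demand", 2, True)` models a task that is reclaimed on every Spot attempt and then completes on the On-Demand fallback.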

| `SUBMITTED` | Task is queued or submitted to the compute backend. |
| `RUNNING` | Task is actively executing on a compute instance. |
| `SUCCEEDED` | Task completed with exit code 0. |
| `FAILED` | Task failed. This covers non-retriable execution failures (non-zero exit code, container startup errors) and spot quota exhaustion after retries are exhausted. |

Contributor

The "spot quota exhaustion after retries are exhausted" clause is misleading and should be dropped. In the code the spot-quota path (SPOT_QUOTA) is non-terminal — it always retries on the same cluster after a cooldown and never exhausts into FAILED. It may briefly surface as FAILED between attempts, but that's a transient per-attempt blip, not a terminal outcome.

The rest is accurate. Suggested:

Task failed. This covers non-retriable execution failures such as a non-zero exit code or container startup errors.
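
As a rough illustration of the distinction drawn above, the following Python sketch separates the non-terminal spot-quota path from terminal failures. Only `SPOT_QUOTA` comes from this thread; the function `next_action`, its return values, and the other cause names used in the usage note are hypothetical placeholders, not scheduler identifiers.

```python
# Error causes that, per the comment above, never end the task:
# the scheduler retries on the same cluster after a cooldown.
NON_TERMINAL_CAUSES = {"SPOT_QUOTA"}


def next_action(error_cause: str) -> str:
    """Hypothetical helper: decide what happens after a failed attempt."""
    if error_cause in NON_TERMINAL_CAUSES:
        # The task may briefly surface as FAILED between attempts,
        # but this is not a terminal outcome.
        return "retry_after_cooldown"
    # Non-retriable execution failures end the task as FAILED.
    return "fail"
```

Under these assumptions, `next_action("SPOT_QUOTA")` gives `"retry_after_cooldown"`, while a placeholder cause such as `"NON_ZERO_EXIT"` gives `"fail"`, which matches the suggested wording that FAILED covers only non-retriable execution failures.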
