docs(studios): add S3 versioning guidance for checkpoint storage costs #1447

Open
ejseqera wants to merge 7 commits into master from docs/studios-checkpoint-s3-versioning-guidance

ejseqera (Member) commented May 19, 2026

Summary

  • Studios writes a checkpoint to the same S3 key every five minutes. With S3 versioning enabled on the work bucket, each write creates a new object version rather than an overwrite — up to 96 non-current versions per day per active session — which can cause significant unexpected storage costs.
  • Adds a new `### S3 versioning and checkpoint storage costs` subsection under the existing checkpoint section, explaining the interaction and providing actionable remediation steps.
  • Applied to both platform-cloud and platform-enterprise docs.
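
To make the version count in the first bullet concrete, here is a small arithmetic sketch. It assumes one new object version per checkpoint write, as the summary describes. The session durations (8 and 24 hours) are illustrative assumptions, not figures from the PR. At a five-minute interval, 96 versions corresponds to roughly eight hours of activity, while a session left running for a full day would write 288.

```python
# Back-of-the-envelope estimate of checkpoint object versions.
# The five-minute interval comes from the PR summary; the session
# durations used below are illustrative assumptions.

CHECKPOINT_INTERVAL_MINUTES = 5


def versions_written(active_hours: float,
                     interval_minutes: int = CHECKPOINT_INTERVAL_MINUTES) -> int:
    """Return the number of checkpoint writes in `active_hours`.

    With versioning enabled, each write to the same key adds one object version.
    """
    return int(active_hours * 60 // interval_minutes)


if __name__ == "__main__":
    for hours in (8, 24):
        print(f"{hours} h active -> {versions_written(hours)} versions")
```

Multiplying the result by the typical checkpoint size gives a rough upper bound on the extra storage retained per session per day.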

Test plan

  • Preview renders correctly on both enterprise and cloud docs
  • JSON and shell code blocks render without errors
  • Links and cross-references in surrounding sections are unaffected
  • Pre-commit passes (verified locally)

🤖 Generated with Claude Code

ejseqera added 2 commits May 19, 2026 17:28
Studios writes a checkpoint every five minutes to the same S3 key. When
S3 versioning is enabled on the work bucket, each write creates a new
object version rather than an overwrite, producing up to 96 non-current
versions per day per active session.

Add a new subsection under "Studio session checkpoints" that:
- Explains the versioning interaction and its cost implications
- Recommends an S3 Lifecycle rule (NoncurrentVersionExpiration: 1 day)
  with a ready-to-use JSON policy block and aws s3api CLI command
- Provides a bulk-delete shell command for clearing existing accumulated
  non-current versions
- Clarifies that non-current versions are safe to delete, while the
  current version and checkpoint directories must not be removed

Changes applied to both platform-cloud and platform-enterprise docs.
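
As a rough illustration of the Lifecycle rule described in the commit message, the sketch below builds a configuration document with `NoncurrentVersionExpiration` set to one day. It is not the policy block from the PR itself. The rule ID is a hypothetical name, and the `.studios/checkpoints/` prefix is an assumption that must match where checkpoints actually sit in your bucket (for example, under a work directory path).

```python
import json

# Assumption: checkpoint objects live under this key prefix. If your work
# directory is nested (e.g. "work/.studios/checkpoints/"), adjust accordingly.
CHECKPOINT_PREFIX = ".studios/checkpoints/"


def build_lifecycle_config(prefix: str = CHECKPOINT_PREFIX,
                           noncurrent_days: int = 1) -> dict:
    """Build an S3 lifecycle configuration that expires non-current versions."""
    return {
        "Rules": [
            {
                "ID": "expire-noncurrent-studio-checkpoints",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # Applies only to non-current versions; the current version is kept.
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            }
        ]
    }


if __name__ == "__main__":
    # Save the output as lifecycle.json, then apply it with, for example:
    #   aws s3api put-bucket-lifecycle-configuration \
    #     --bucket <your-work-bucket> \
    #     --lifecycle-configuration file://lifecycle.json
    print(json.dumps(build_lifecycle_config(), indent=2))
```

Note that `put-bucket-lifecycle-configuration` replaces the bucket's existing lifecycle configuration, so any rules already in place need to be merged into the same document before applying it.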
ejseqera requested a review from gwright99 May 19, 2026 21:31
netlify Bot commented May 20, 2026


Deploy Preview for seqera-docs ready!

🔨 Latest commit: dd80089
🔍 Latest deploy log: https://app.netlify.com/projects/seqera-docs/deploys/6a29304f295ae600081ccf28
😎 Deploy Preview: https://deploy-preview-1447--seqera-docs.netlify.app

justinegeffen requested a review from t0randr May 20, 2026 19:01
justinegeffen added the "1. Dev/PM/SME" label (Needs a review by a Dev/PM/SME) May 20, 2026
justinegeffen added the "2. Edu reviews complete" label (Reviews complete. Remove label when confirmed in prod.) May 21, 2026
justinegeffen requested a review from robnewman May 22, 2026 14:25
robnewman (Member) commented May 26, 2026

@ejseqera Isn't this a generic issue across any cloud provider that provides object storage versioning (e.g. Azure, GCP, OCI)?

I don't think we should limit this to just S3.

robnewman (Member) left a comment

This should not be limited to S3.

Checkpoints vary in size depending on libraries installed in your session environment. This can potentially result in many large files stored in the compute environment's pipeline work directory and saved to cloud storage. This storage will incur costs based on the cloud provider. Due to the architecture of Studios, you cannot delete any checkpoint files to save on storage costs. Deleting a Studio session's checkpoints will result in a corrupted Studio session that cannot be started nor recovered.
:::

### S3 versioning and checkpoint storage costs

Member


This is a generic problem across any cloud provider that supports object versioning, which is all of them. Don't limit this to just S3.

3 outdated comment threads on platform-cloud/docs/studios/managing.md
**Recommended mitigation:** Apply an S3 Lifecycle rule to expire non-current object versions on the `.studios/checkpoints/` prefix. A one-day expiry retains the current version while removing intermediate five-minute writes. You can also delete existing accumulated non-current versions manually using your cloud provider's console or CLI.
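
For the manual cleanup mentioned above, a minimal sketch of the selection logic might look like the following. It assumes input entries shaped like the `Versions` list returned by S3 `ListObjectVersions` (`Key`, `VersionId`, `IsLatest`) and groups them into batches of at most 1,000, the per-request limit of `DeleteObjects`. The function name is hypothetical, the code is S3-specific, and delete markers (returned separately by S3) are not handled.

```python
from typing import Iterable


def noncurrent_delete_batches(versions: Iterable[dict],
                              batch_size: int = 1000) -> list[list[dict]]:
    """Group non-current object versions into DeleteObjects-sized batches.

    Entries with IsLatest=True (the current checkpoint) are always skipped.
    """
    targets = [
        {"Key": v["Key"], "VersionId": v["VersionId"]}
        for v in versions
        if not v["IsLatest"]  # never select the current version
    ]
    return [targets[i:i + batch_size] for i in range(0, len(targets), batch_size)]


# Possible usage with boto3 (assumed to be installed and configured):
#   s3 = boto3.client("s3")
#   pages = s3.get_paginator("list_object_versions").paginate(
#       Bucket=bucket, Prefix=prefix)
#   for page in pages:
#       for batch in noncurrent_delete_batches(page.get("Versions", [])):
#           s3.delete_objects(Bucket=bucket,
#                             Delete={"Objects": batch, "Quiet": True})

if __name__ == "__main__":
    sample = [
        {"Key": "ckpt", "VersionId": "v3", "IsLatest": True},
        {"Key": "ckpt", "VersionId": "v2", "IsLatest": False},
        {"Key": "ckpt", "VersionId": "v1", "IsLatest": False},
    ]
    print(noncurrent_delete_batches(sample))
```

Keeping the filter in a pure function makes it easy to dry-run against a listing and inspect exactly which versions would be removed before issuing any delete call.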

:::note
Non-current object versions (intermediate checkpoint writes) are safe to delete. Do **not** delete the current (latest) version of any checkpoint file or the checkpoint directory itself — doing so will corrupt the Studio session and it cannot be recovered.

Member


Won't this impact the "Start as new" functionality?

Checkpoints vary in size depending on libraries installed in your session environment. This can potentially result in many large files stored in the compute environment's pipeline work directory and saved to cloud storage. This storage will incur costs based on the cloud provider. Due to the architecture of Studios, you cannot delete any checkpoint files to save on storage costs. Deleting a Studio session's checkpoints will result in a corrupted Studio session that cannot be started nor recovered.
:::

### S3 versioning and checkpoint storage costs

Member


Same problems as the above Cloud docs.

justinegeffen and others added 4 commits June 10, 2026 11:35
Co-authored-by: Rob Newman <61608+robnewman@users.noreply.github.com>
Signed-off-by: Justine Geffen <justinegeffen@users.noreply.github.com>
Co-authored-by: Rob Newman <61608+robnewman@users.noreply.github.com>
Signed-off-by: Justine Geffen <justinegeffen@users.noreply.github.com>
Co-authored-by: Rob Newman <61608+robnewman@users.noreply.github.com>
Signed-off-by: Justine Geffen <justinegeffen@users.noreply.github.com>
Labels

1. Dev/PM/SME: Needs a review by a Dev/PM/SME
2. Edu reviews complete: Reviews complete. Remove label when confirmed in prod.

3 participants