feat(chart): opt-in Argo Rollouts (blue-green) + pre-deploy migration Job#3653

Open
nicacioliveira wants to merge 1 commit into main from feat/chart-argo-rollouts-bluegreen

@nicacioliveira nicacioliveira commented Jun 2, 2026

Summary

Two opt-in flags in the studio Helm chart so consumers can switch from the default Deployment to an Argo Rollouts Rollout (blue-green or canary), and move DB migrations out of pod startup into a dedicated pre-sync Job.

Both default to off — existing installs keep the exact same Deployment with the on-startup migration path. No selector / label changes, no PVC churn.

What's new

  • `argoRollouts.enabled` — render a `Rollout` (argoproj.io/v1alpha1) instead of `Deployment`. Pod template is shared via the new `chart-deco-studio.podTemplate` helper so the two workload kinds describe an identical pod surface. Supports `blueGreen` (default) and `canary` strategies.

  • `migrationJob.enabled` — render a Job that runs `bun run --cwd=apps/mesh migrate` ONCE before pods start. Carries BOTH `helm.sh/hook: pre-install,pre-upgrade` and `argocd.argoproj.io/hook: PreSync` annotations so it sequences correctly whether installed via `helm upgrade` directly or synced by ArgoCD. The runtime pod command gets `--skip-migrations` appended (the studio CLI already exposes this flag — see `apps/mesh/src/cli.ts`), eliminating the race between N replicas migrating concurrently and giving a clear pre-deploy gate: migration Job fails → release aborted.
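
As a rough illustration of the two flags, a consumer's values override might look like the sketch below. The key paths are taken from this description and the test plan; the exact shape of the `strategy` block in `values.yaml` and the file name are assumptions.

```yaml
# values-override.yaml (hypothetical file name)
argoRollouts:
  enabled: true        # render a Rollout instead of a Deployment
  strategy:
    blueGreen:
      enabled: true    # blue-green is the default strategy in this PR
    canary:
      enabled: false   # turning both on is expected to hit the mutual-exclusion `fail`
migrationJob:
  enabled: true        # run migrations once in a pre-deploy Job
```

Leaving both `enabled` flags at their default of `false` should keep the existing Deployment and the on-startup migration path.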

Files

| File | Change |
| --- | --- |
| `templates/_pod-template.tpl` | new; shared pod template helper + `podCommand` helper that appends `--skip-migrations` when migrationJob is on |
| `templates/deployment.yaml` | wraps in `{{- if not argoRollouts.enabled }}` and references the helper; lifts `spec.template` body out |
| `templates/rollout.yaml` | new; gated on `argoRollouts.enabled`, blueGreen by default; mutual-exclusion `fail` for both-on configs |
| `templates/migration-job.yaml` | new; gated on `migrationJob.enabled`. Sync-wave -1, dual helm + ArgoCD hooks, bounded backoffLimit/activeDeadlineSeconds/TTL |
| `templates/service-preview.yaml` | new; preview Service for blue-green; Argo Rollouts manages its selector |
| `values.yaml` | adds `argoRollouts` and `migrationJob` blocks; both off by default |
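
To make the migration Job row concrete, a rendered manifest might look roughly like the sketch below. The hook annotations and the migrate command come from this PR; the resource name, image reference, and the numeric bounds are placeholders, since the description does not state them.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: deco-studio-migrate                 # hypothetical name
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-1"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
    "argocd.argoproj.io/hook": PreSync
    "argocd.argoproj.io/sync-wave": "-1"
spec:
  backoffLimit: 2                           # illustrative bound, not from the PR
  activeDeadlineSeconds: 600                # illustrative bound, not from the PR
  ttlSecondsAfterFinished: 3600             # illustrative bound, not from the PR
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: example.invalid/studio:tag # placeholder image reference
          command: ["bun", "run", "--cwd=apps/mesh", "migrate"]
```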

Why opt-in

The chart is open-source and not every consumer has the argo-rollouts controller installed. Defaulting to Deployment keeps zero requirements on the consumer's cluster. Internal CD (deco-apps-cd) flips both flags on for the deco-studio / deco-studio-stg releases — that's a separate change.

Important: migration discipline note

Blue-green amplifies the schema/code overlap window. The Job moves migrations to a single execution point BEFORE the new ReplicaSet probes, but it does NOT make destructive DDL safe — the old (blue) ReplicaSet still serves traffic during the overlap window with the migrated schema. Destructive changes (DROP/RENAME/type changes) still require expand-contract discipline at the migration code level. This is independent of the chart and is being handled team-side as a code/review practice.

Test plan

  • `helm lint deploy/helm/studio` passes
  • Default render (no flags) produces the same workload surface as before:
    • Deployment with command `bun run deco --no-local-mode`
    • No Rollout, no preview Service, no migration Job
  • `helm template ... --set argoRollouts.enabled=true --set migrationJob.enabled=true` produces:
    • Rollout with `blueGreen` strategy, `activeService: deco-studio`, `previewService: deco-studio-preview`, `scaleDownDelaySeconds: 30`, `autoPromotionEnabled: false`
    • Preview Service
    • Migration Job with PreSync hooks and `bun run --cwd=apps/mesh migrate` command
    • Pod command has `--skip-migrations` appended
  • Canary mode renders (`--set argoRollouts.strategy.blueGreen.enabled=false --set argoRollouts.strategy.canary.enabled=true`)
  • Reviewer: install into a test cluster, verify the Rollout transitions correctly through preview → promotion
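
For reference while checking the second render, the strategy portion of the Rollout would be expected to look something like this fragment. Field values are the ones listed in the test plan; the `metadata.name` assumes a release named `deco-studio`, and `selector` and `template` are omitted for brevity.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: deco-studio                # assumes the release renders this fullname
spec:
  strategy:
    blueGreen:
      activeService: deco-studio
      previewService: deco-studio-preview
      scaleDownDelaySeconds: 30
      autoPromotionEnabled: false  # promotion is a manual step
```

With `autoPromotionEnabled: false`, the new ReplicaSet should stay behind the preview Service until someone promotes it, for example with the Argo Rollouts kubectl plugin (`kubectl argo rollouts promote deco-studio`).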

Summary by cubic

Adds opt-in Argo Rollouts support (blue-green/canary) and a pre-deploy migration Job to the studio Helm chart. Both are off by default, so existing installs keep the same Deployment and on-startup migration behavior.

  • New Features

    • argoRollouts.enabled: renders a Rollout (argoproj.io/v1alpha1) instead of a Deployment. Blue-green by default with active/preview Services; canary supported; mutual-exclusion check if both are enabled. Shares the exact pod surface via chart-deco-studio.podTemplate.
    • migrationJob.enabled: runs bun run --cwd=apps/mesh migrate once as Helm pre-install/upgrade and ArgoCD PreSync hook. Runtime pod command appends --skip-migrations to avoid N-replica races.
  • Migration

    • Optional: set argoRollouts.enabled=true (requires argo-rollouts controller).
    • For blue-green, also set migrationJob.enabled=true to run migrations before pods start.
    • Use expand–contract discipline for destructive DB changes during rollout windows.

Written for commit b36722a. Summary will update on new commits.

… Job

Adds two opt-in flags to the studio Helm chart so consumers can switch from
the default Deployment to an Argo Rollouts Rollout with blue-green strategy,
and move DB migrations out of pod startup into a dedicated pre-sync Job.

Both default to off — existing installs keep the exact same Deployment with
the on-startup migration path. No selector / label changes, no PVC churn.

## What's new

- `argoRollouts.enabled` — when true, render a `Rollout` (argoproj.io/v1alpha1)
  instead of the `Deployment`. Pod template is shared via the new
  `chart-deco-studio.podTemplate` helper so the two workload kinds describe an
  identical pod surface. Supports `blueGreen` (default) and `canary` strategies.

- `migrationJob.enabled` — when true, render a Job that runs
  `bun run --cwd=apps/mesh migrate` ONCE before pods start. Carries BOTH
  `helm.sh/hook: pre-install,pre-upgrade` and
  `argocd.argoproj.io/hook: PreSync` annotations so it sequences correctly
  whether installed via `helm upgrade` directly or synced by ArgoCD. The
  runtime pod command gets `--skip-migrations` appended (the studio CLI
  already exposes this flag — see `apps/mesh/src/cli.ts`), eliminating the
  race between N replicas migrating concurrently and giving a clear
  pre-deploy gate: migration Job fails → release aborted.

## New / modified files

- `templates/_pod-template.tpl` (new) — shared pod template helper + the
  `podCommand` helper that appends `--skip-migrations` when migrationJob is on.
- `templates/deployment.yaml` — now wraps in `{{- if not argoRollouts.enabled }}`
  and references the helper; lifts the entire `spec.template` body out.
- `templates/rollout.yaml` (new) — gated on `argoRollouts.enabled`, mirrors
  the Deployment via the same helper, picks blueGreen or canary based on
  values. Mutual-exclusion `fail` for both-on configs.
- `templates/migration-job.yaml` (new) — gated on `migrationJob.enabled`.
  Sync-wave -1, dual hooks, bounded backoffLimit/activeDeadlineSeconds/TTL.
- `templates/service-preview.yaml` (new) — rendered only for blue-green;
  Argo Rollouts manages its selector to point at the preview ReplicaSet.
- `values.yaml` — adds `argoRollouts` and `migrationJob` blocks; both off
  by default, defaults preserve current behavior.

## Why opt-in

The chart is open-source and not everyone has the argo-rollouts controller
installed. Defaulting to Deployment keeps zero requirements on the consumer's
cluster. Internal CD (deco-apps-cd) flips both flags on for the deco-studio /
deco-studio-stg releases — that's a separate change.

## Migration discipline note

Blue-green amplifies the schema/code overlap window. The Job moves migrations
to a single execution point BEFORE the new ReplicaSet probes, but it does NOT
make destructive DDL safe — the old (blue) ReplicaSet still serves traffic
during the overlap window with the migrated schema. Destructive changes
(DROP/RENAME/type changes) still require expand-contract discipline at the
migration code level. This is independent of the chart and is being handled
team-side as a code/review practice.

## Verification

- `helm lint deploy/helm/studio` passes
- `helm template deco-studio deploy/helm/studio` (default) renders identical
  workload surface to before — Deployment with `bun run deco --no-local-mode`,
  no Rollout, no preview Service, no migration Job
- `helm template ... --set argoRollouts.enabled=true --set migrationJob.enabled=true`
  renders Rollout with blueGreen, preview Service, migration Job with the
  PreSync hooks, and `--skip-migrations` appended to the pod command

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

5 issues found across 6 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="deploy/helm/studio/templates/rollout.yaml">

<violation number="1" location="deploy/helm/studio/templates/rollout.yaml:69">
P2: Strategy selection silently falls back to blue-green when both strategy flags are false, producing inconsistent Rollout/Service manifests.</violation>
</file>

<file name="deploy/helm/studio/templates/migration-job.yaml">

<violation number="1" location="deploy/helm/studio/templates/migration-job.yaml:33">
P1: The migration hook runs before its ConfigMap/Secret dependencies exist, causing first install/sync failure when migrationJob is enabled.</violation>
</file>

<file name="deploy/helm/studio/templates/_pod-template.tpl">

<violation number="1" location="deploy/helm/studio/templates/_pod-template.tpl:23">
P2: Using a truthy check for `terminationGracePeriodSeconds` drops explicit `0` values, so the configured value may be ignored.</violation>

<violation number="2" location="deploy/helm/studio/templates/_pod-template.tpl:217">
P2: `--skip-migrations` is appended to any custom `image.command`, which can break pods when the command is not the `deco` CLI.</violation>
</file>

<file name="deploy/helm/studio/templates/deployment.yaml">

<violation number="1" location="deploy/helm/studio/templates/deployment.yaml:1">
P1: Enabling Argo Rollouts removes the Deployment, but HPA still targets Deployment, leaving autoscaling with an invalid target.</violation>
</file>

  labels:
    {{- include "chart-deco-studio.labels" . | nindent 4 }}
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade

P1: The migration hook runs before its ConfigMap/Secret dependencies exist, causing first install/sync failure when migrationJob is enabled.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At deploy/helm/studio/templates/migration-job.yaml, line 33:

<comment>The migration hook runs before its ConfigMap/Secret dependencies exist, causing first install/sync failure when migrationJob is enabled.</comment>

<file context>
@@ -0,0 +1,105 @@
+  labels:
+    {{- include "chart-deco-studio.labels" . | nindent 4 }}
+  annotations:
+    "helm.sh/hook": pre-install,pre-upgrade
+    "helm.sh/hook-weight": "-1"
+    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
</file context>
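
One possible way to address this on the Helm side, sketched under assumptions: because `pre-install` hooks run before regular chart resources are created, anything the Job reads can itself be rendered as a hook with a lower weight so it exists first. The ConfigMap name and data key below are hypothetical, and ArgoCD syncs would need an equivalent ordering.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "chart-deco-studio.fullname" . }}-migration-env  # hypothetical name
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-5"            # lower than the Job's "-1", so it is created first
    "helm.sh/hook-delete-policy": before-hook-creation
data:
  EXAMPLE_SETTING: "placeholder"           # hypothetical key; real keys depend on the chart
```

A trade-off to weigh: Helm does not manage hook resources as part of the release, so a ConfigMap created this way is not removed by `helm uninstall`.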

@@ -1,3 +1,4 @@
{{- if not (and .Values.argoRollouts .Values.argoRollouts.enabled) }}

P1: Enabling Argo Rollouts removes the Deployment, but HPA still targets Deployment, leaving autoscaling with an invalid target.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At deploy/helm/studio/templates/deployment.yaml, line 1:

<comment>Enabling Argo Rollouts removes the Deployment, but HPA still targets Deployment, leaving autoscaling with an invalid target.</comment>

<file context>
@@ -1,3 +1,4 @@
+{{- if not (and .Values.argoRollouts .Values.argoRollouts.enabled) }}
 apiVersion: apps/v1
 kind: Deployment
</file context>
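
A minimal sketch of one fix, assuming the chart's HPA template builds its `scaleTargetRef` inline (the HPA template is not shown in this PR, so its layout here is a guess). Argo Rollouts exposes a scale subresource, so an HPA can target a `Rollout` directly.

```yaml
# Hypothetical excerpt from the chart's HPA template
spec:
  scaleTargetRef:
    {{- if and .Values.argoRollouts .Values.argoRollouts.enabled }}
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    {{- else }}
    apiVersion: apps/v1
    kind: Deployment
    {{- end }}
    name: {{ include "chart-deco-studio.fullname" . }}
```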

      trafficRouting:
        {{- toYaml . | nindent 8 }}
      {{- end }}
    {{- else }}

P2: Strategy selection silently falls back to blue-green when both strategy flags are false, producing inconsistent Rollout/Service manifests.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At deploy/helm/studio/templates/rollout.yaml, line 69:

<comment>Strategy selection silently falls back to blue-green when both strategy flags are false, producing inconsistent Rollout/Service manifests.</comment>

<file context>
@@ -0,0 +1,92 @@
+      trafficRouting:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+    {{- else }}
+    blueGreen:
+      activeService: {{ default (include "chart-deco-studio.fullname" .) $bg.activeServiceName }}
</file context>
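
One way to close this gap, sketched under assumptions: validate the flags up front so that exactly one strategy must be enabled, rather than letting the `else` branch pick blue-green. The variable names are illustrative, and the snippet assumes `values.yaml` always defines both strategy blocks.

```yaml
{{- /* Hypothetical guard near the top of templates/rollout.yaml */ -}}
{{- $bg := .Values.argoRollouts.strategy.blueGreen -}}
{{- $canary := .Values.argoRollouts.strategy.canary -}}
{{- if and $bg.enabled $canary.enabled }}
{{- fail "argoRollouts.strategy: enable either blueGreen or canary, not both" }}
{{- end }}
{{- if not (or $bg.enabled $canary.enabled) }}
{{- fail "argoRollouts.strategy: enable one of blueGreen or canary" }}
{{- end }}
```

The preview Service template could then gate on `$bg.enabled` alone, so the Rollout and Service manifests stay consistent.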

    {{- toYaml . | nindent 4 }}
    {{- end }}
spec:
  {{- if .Values.terminationGracePeriodSeconds }}

P2: Using a truthy check for terminationGracePeriodSeconds drops explicit 0 values, so the configured value may be ignored.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At deploy/helm/studio/templates/_pod-template.tpl, line 23:

<comment>Using a truthy check for `terminationGracePeriodSeconds` drops explicit `0` values, so the configured value may be ignored.</comment>

<file context>
@@ -0,0 +1,220 @@
+    {{- toYaml . | nindent 4 }}
+    {{- end }}
+spec:
+  {{- if .Values.terminationGracePeriodSeconds }}
+  terminationGracePeriodSeconds: {{ .Values.terminationGracePeriodSeconds }}
+  {{- end }}
</file context>
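
A possible fix, as a sketch: test whether the value is set at all instead of whether it is truthy. Sprig's `kindIs "invalid"` matches an unset (nil) value, so an explicit `0` still renders.

```yaml
spec:
  {{- if not (kindIs "invalid" .Values.terminationGracePeriodSeconds) }}
  terminationGracePeriodSeconds: {{ .Values.terminationGracePeriodSeconds }}
  {{- end }}
```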

{{- define "chart-deco-studio.podCommand" -}}
{{- $cmd := default (list "bun" "run" "deco" "--no-local-mode") .Values.image.command -}}
{{- if and .Values.migrationJob .Values.migrationJob.enabled -}}
{{- $cmd = append $cmd "--skip-migrations" -}}

P2: --skip-migrations is appended to any custom image.command, which can break pods when the command is not the deco CLI.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At deploy/helm/studio/templates/_pod-template.tpl, line 217:

<comment>`--skip-migrations` is appended to any custom `image.command`, which can break pods when the command is not the `deco` CLI.</comment>

<file context>
@@ -0,0 +1,220 @@
+{{- define "chart-deco-studio.podCommand" -}}
+{{- $cmd := default (list "bun" "run" "deco" "--no-local-mode") .Values.image.command -}}
+{{- if and .Values.migrationJob .Values.migrationJob.enabled -}}
+{{- $cmd = append $cmd "--skip-migrations" -}}
+{{- end -}}
+{{- toYaml $cmd -}}
</file context>
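
One way to narrow the behavior, sketched under the assumption that only the chart's default command is known to accept the flag: append `--skip-migrations` only when `image.command` is not overridden, and leave custom commands untouched.

```yaml
{{- define "chart-deco-studio.podCommand" -}}
{{- $cmd := list "bun" "run" "deco" "--no-local-mode" -}}
{{- if .Values.image.command -}}
{{- /* Custom command: pass through unchanged; the user adds the flag if their CLI supports it */ -}}
{{- $cmd = .Values.image.command -}}
{{- else if and .Values.migrationJob .Values.migrationJob.enabled -}}
{{- $cmd = append $cmd "--skip-migrations" -}}
{{- end -}}
{{- toYaml $cmd -}}
{{- end -}}
```

The cost of this design is that a custom command combined with `migrationJob.enabled` would still migrate on startup unless the user passes the flag themselves, so it would be worth documenting in `values.yaml`.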
