
Add Flower alongside live environment Celery workers#1617

Merged
ryanjjung merged 16 commits into main from celery-infra
Apr 15, 2026

Conversation

@ryanjjung
Contributor

@ryanjjung ryanjjung commented Apr 8, 2026

This PR adds Celery workers to our live environments alongside Flower for worker visibility.

What's in the PR

  • Flower added to local dev configuration
  • Added an explicit CONTAINER_ROLE value of "api" to identify the regular backend container, as opposed to Celery, Flower, or anything else.
  • Built new AutoscalingFargateCluster with Celery and Flower running in dev, stage, and prod.
  • Restructured all of our Pulumi task and container definition configuration to be more "DRY". We now have three task definitions (which tend to be large and unwieldy) that are 99% identical, so shared values live in variables that get reused across task/container defs, which are themselves reused in our cluster configs.
  • Allow the Celery and Flower containers in dev and stage to talk to the Redis cluster.
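The CONTAINER_ROLE variable lets one image decide which process to run at startup. A hypothetical sketch of such a dispatch (the role names match this PR, but the module paths and uvicorn invocation are assumptions, not the repo's real entrypoints):

```python
import os

# Hypothetical dispatch for a single container image driven by CONTAINER_ROLE.
# Role names ("api", "worker", "flower") match the PR; the module paths and
# commands below are illustrative assumptions.
COMMANDS = {
    "api": ["uvicorn", "appointment.main:app", "--host", "0.0.0.0"],
    "worker": ["celery", "-A", "appointment.celery_app", "worker"],
    "flower": ["celery", "-A", "appointment.celery_app", "flower"],
}


def command_for_role(role: str) -> list[str]:
    """Map a CONTAINER_ROLE value to the process the container should exec."""
    if role not in COMMANDS:
        raise ValueError(f"Unknown CONTAINER_ROLE: {role!r}")
    return COMMANDS[role]


# Default to the regular backend, as the PR does with "api".
cmd = command_for_role(os.environ.get("CONTAINER_ROLE", "api"))
```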

What's Not in the PR

I have intentionally left the new prod Celery and Flower services scaled down to zero instances, because they would not work if brought online: these containers still need access to the Neon DB PrivateLink security group (a manual step) and to the Redis cluster (which will be codified after the prod security groups have been created).

So there will be a small future PR coming after this is fully deployed.

Relevant Tickets

@ryanjjung ryanjjung self-assigned this Apr 8, 2026
Comment thread docker-compose.yml Outdated
backend:
image: "${{ steps.pulumi-tag-extract.outputs.pulumi_tag }}"
EOF
echo ".apmt_image: &APMT_IMAGE ${{ steps.pulumi-tag-extract.outputs.pulumi_tag }}" > newimage.yaml
Contributor Author

This is simpler now due to the DRYing out of the task definitions.

Comment thread .github/workflows/deploy-production.yml Outdated
'urn:pulumi:prod::appointment::tb:fargate:FargateClusterWithLogging$aws:ecs/taskDefinition:TaskDefinition::appointment-prod-fargate-backend-taskdef' \
pulumi up -y --diff \
--target 'urn:pulumi:stage::appointment::tb:fargate:FargateClusterWithLogging$aws:ecs/taskDefinition:TaskDefinition::appointment-stage-fargate-backend-taskdef' \
--target 'urn:pulumi:stage::appointment::tb:fargate:AutoscalingFargateCluster::appointment-stage-afc-appointment' \
Contributor Author

This makes sure that images get deployed to both clusters when we do a release.

Comment thread pulumi/config.prod.yaml
### Special variables used throughout this file

# Update this value to update all containers based on the thunderbird/appointment image
.apmt_image: &APMT_IMAGE 768512802988.dkr.ecr.eu-central-1.amazonaws.com/thunderbird/appointment:7de6f16bdd309937caa186c8f5a269ea00118e5e
Contributor Author

This is the variable referenced in the workflow changes. This gets used in the task definition below, and that task definition gets used for all three services. So you change this in one place and it goes out to All The Things.
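A YAML anchor like &APMT_IMAGE is resolved at parse time, so every *APMT_IMAGE alias becomes the same string. In Python terms (with the real, much larger task definitions reduced to a stand-in field), the pattern is equivalent to:

```python
# The image URI is the value from config.prod.yaml above; the task-def dicts
# are simplified stand-ins for the real (much larger) definitions.
APMT_IMAGE = (
    "768512802988.dkr.ecr.eu-central-1.amazonaws.com/"
    "thunderbird/appointment:7de6f16bdd309937caa186c8f5a269ea00118e5e"
)

backend_taskdef = {"family": "backend", "image": APMT_IMAGE}
celery_taskdef = {"family": "celery", "image": APMT_IMAGE}
flower_taskdef = {"family": "flower", "image": APMT_IMAGE}

# Changing APMT_IMAGE in one place updates all three task definitions,
# which is what the release workflow relies on.
```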

Comment thread pulumi/config.prod.yaml

tb:cloudwatch:LogDestination:
appointment:
org_name: tb
Contributor Author

This creates a privacy-policy-compliant CloudWatch Log Group called /tb/prod/appointment that these containers will now produce logs in.

Comment thread pulumi/config.prod.yaml
# protocol: tcp
# from_port: 6379
# to_port: 6379
# source_security_group_id:
Contributor Author

This is where I will add the security groups to grant access to Redis later on.

Comment thread pulumi/config.prod.yaml
fargate_task_role_arns:
- arn:aws:iam::768512802988:role/appointment-prod-fargate-backend
- arn:aws:iam::768512802988:role/appointment-prod-afc-appointment-celery
- arn:aws:iam::768512802988:role/appointment-prod-afc-appointment-flower
Contributor Author

These represent new permissions needed for the CI process to deploy to the new cluster and tasks.

Comment thread pulumi/requirements.txt
@@ -1,3 +1,3 @@
tb_pulumi @ git+https://github.com/thunderbird/pulumi.git@v0.0.16
tb_pulumi @ git+https://github.com/thunderbird/pulumi.git@v0.0.18
Contributor Author

@ryanjjung ryanjjung Apr 13, 2026

This contains many fixes to both the LogDestination class and AutoscalingFargateCluster class that we need for these new resources to come out right. Ref: https://github.com/thunderbird/pulumi/blob/main/CHANGELOG.md

Comment thread pulumi/security_groups.py
for rule in backend_cache_sg_ingress_rules:
rule['source_security_group_id'] = container_sgs.get('backend').resources.get('sg').id
if 'source_security_group_id' not in rule and 'cidr_blocks' not in rule:
rule['source_security_group_id'] = container_sgs.get('backend').resources.get('sg').id
Contributor Author

Normally, the way we use these SGs in code is to automatically link up the Appointment backend container to Redis. If we want to link up any other source, we would need to specify that in code or config somewhere. Rather than bog this code down in lots of conditions based on expected strings in the config, I've changed this to allow us to specify a source in config, and to fall back on the Appointment backend container if no more explicit source is defined.
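A self-contained sketch of that fallback, with plain dicts standing in for the config rules and `backend_sg_id` standing in for `container_sgs.get('backend').resources.get('sg').id`:

```python
def resolve_ingress_rules(rules: list[dict], backend_sg_id: str) -> list[dict]:
    """Fill in a source SG only where the config names no explicit source."""
    for rule in rules:
        # An explicit source_security_group_id or cidr_blocks entry in the
        # config wins; otherwise fall back to the backend container's SG.
        if "source_security_group_id" not in rule and "cidr_blocks" not in rule:
            rule["source_security_group_id"] = backend_sg_id
    return rules


rules = [
    # No source given: falls back to the backend SG.
    {"protocol": "tcp", "from_port": 6379, "to_port": 6379},
    # Explicit cidr_blocks: left alone.
    {"protocol": "tcp", "from_port": 6379, "to_port": 6379,
     "cidr_blocks": ["10.0.0.0/8"]},
]
resolved = resolve_ingress_rules(rules, "sg-0123456789abcdef0")  # hypothetical SG id
```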

Comment thread docker-compose.yml
celery-flower:
<<: *backend
ports:
- 5556:5555
Contributor Author

Davi requested this port exposure change so that this does not conflict with a simultaneously running Flower container for a local dev instance of Accounts.

@ryanjjung ryanjjung marked this pull request as ready for review April 13, 2026 18:57
Contributor

@davinotdavid davinotdavid left a comment

Overall LGTM with a few comments / questions / double-checks, but I'd love input from other infra folks as I am not as familiar with the devops intricacies!

Comment thread pulumi/config.stage.yaml Outdated
- *VAR_ZOOM_API_ENABLED
- *VAR_ZOOM_API_NEW_APP
- name: CONTAINER_ROLE
value: celery
Contributor

Same here, shouldn't this be beat / worker? If so, I wonder if we need two sets of these configs, one for each?

Contributor Author

I just fixed "celery" to "worker". As far as I'm aware, the beat worker container doesn't need to be deployed to live environments. It will ultimately be removed in favor of real tasks, right?

Contributor

@davinotdavid davinotdavid Apr 14, 2026

There's going to be a task in another PR that needs to run weekly in production, so perhaps we still need the beat container there too? I'm not sure we need a separate container, though: @Sancus added something called celery-redbeat to Accounts with a single container (ref thunderbird/thunderbird-accounts#696), so maybe that's also an option.

Contributor Author

I had not heard of redbeat, so I went and found its docs. It looks like you just configure your normal Celery container with this and a Redis key to use for its distributed lock. Then you define tasks as part of your Celery config. So it sounds to me like we wouldn't need a separate container to emit these events on a schedule.
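A minimal, hypothetical sketch of what such a redbeat setup might look like, expressed as a plain settings dict so it stays self-contained (the task path, Redis URL, and schedule are illustrative, not taken from this repo):

```python
# Hypothetical Celery/redbeat settings; every name and URL here is an
# illustrative assumption, not the repo's real configuration.
REDBEAT_CELERY_CONFIG = {
    # RedBeat stores its schedule and its distributed lock under this Redis URL.
    "redbeat_redis_url": "redis://localhost:6379/1",
    "beat_schedule": {
        "weekly-job": {
            "task": "appointment.tasks.weekly_job",  # assumed task path
            "schedule": 7 * 24 * 60 * 60,  # every week, expressed in seconds
        },
    },
}
```

With something like this applied to the Celery app config, the existing worker container could run beat embedded (e.g. `celery -A <app> worker -B -S redbeat.RedBeatScheduler`) instead of deploying a separate beat container, with the Redis lock preventing duplicate schedulers.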

Comment thread pulumi/config.prod.yaml Outdated
- *VAR_ZOOM_API_ENABLED
- *VAR_ZOOM_API_NEW_APP
- name: CONTAINER_ROLE
value: celery
Contributor

Same here, just double checking!

Contributor Author

I fixed this in all three files just now.

@ryanjjung ryanjjung requested a review from davinotdavid April 13, 2026 22:26
Contributor

@aatchison aatchison left a comment

Go for it!

@ryanjjung ryanjjung merged commit 2a8e57c into main Apr 15, 2026
8 checks passed
@ryanjjung ryanjjung deleted the celery-infra branch April 15, 2026 14:28
@ryanjjung
Contributor Author

Merged, will keep an eye on the workflows and follow through with the prod work now.
