Add Flower alongside live environment Celery workers#1617
Conversation
Co-authored-by: Davi Nakano <114549747+davinotdavid@users.noreply.github.com>
```
backend:
  image: "${{ steps.pulumi-tag-extract.outputs.pulumi_tag }}"
EOF
echo ".apmt_image: &APMT_IMAGE ${{ steps.pulumi-tag-extract.outputs.pulumi_tag }}" > newimage.yaml
```
This is simpler now due to the DRYing out of the task definitions.
```
'urn:pulumi:prod::appointment::tb:fargate:FargateClusterWithLogging$aws:ecs/taskDefinition:TaskDefinition::appointment-prod-fargate-backend-taskdef' \
pulumi up -y --diff \
  --target 'urn:pulumi:stage::appointment::tb:fargate:FargateClusterWithLogging$aws:ecs/taskDefinition:TaskDefinition::appointment-stage-fargate-backend-taskdef' \
  --target 'urn:pulumi:stage::appointment::tb:fargate:AutoscalingFargateCluster::appointment-stage-afc-appointment' \
```
This makes sure that images get deployed to both clusters when we do a release.
```yaml
### Special variables used throughout this file

# Update this value to update all containers based on the thunderbird/appointment image
.apmt_image: &APMT_IMAGE 768512802988.dkr.ecr.eu-central-1.amazonaws.com/thunderbird/appointment:7de6f16bdd309937caa186c8f5a269ea00118e5e
```
This is the variable referenced in the workflow changes. This gets used in the task definition below, and that task definition gets used for all three services. So you change this in one place and it goes out to All The Things.
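The anchor/alias mechanics can be sketched with PyYAML. The service names below are illustrative stand-ins for the real task definitions, not our actual config:

```python
import yaml  # PyYAML

# Illustrative sketch of the &APMT_IMAGE anchor: every alias (*APMT_IMAGE)
# resolves to the single value defined at .apmt_image, so changing that one
# line updates every container. Service names here are made up for the demo.
doc = """
.apmt_image: &APMT_IMAGE 768512802988.dkr.ecr.eu-central-1.amazonaws.com/thunderbird/appointment:7de6f16bdd309937caa186c8f5a269ea00118e5e
backend:
  image: *APMT_IMAGE
celery-worker:
  image: *APMT_IMAGE
flower:
  image: *APMT_IMAGE
"""
config = yaml.safe_load(doc)

# All three services share the one image tag.
assert config["backend"]["image"] == config[".apmt_image"]
assert config["celery-worker"]["image"] == config["flower"]["image"]
```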
```yaml
tb:cloudwatch:LogDestination:
  appointment:
    org_name: tb
```
This creates a privacy policy compliant CloudWatch Log Group called /tb/prod/appointment that these containers will now produce logs in.
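As a rough illustration of the naming convention (not the actual tb_pulumi implementation), the group name is the org name, stack, and project joined into a path:

```python
# Illustrative helper mimicking the /{org}/{env}/{project} log group naming
# convention described above; the real logic lives in tb_pulumi's
# LogDestination class.
def log_group_name(org: str, env: str, project: str) -> str:
    return f"/{org}/{env}/{project}"

assert log_group_name("tb", "prod", "appointment") == "/tb/prod/appointment"
```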
```yaml
# protocol: tcp
# from_port: 6379
# to_port: 6379
# source_security_group_id:
```
This is where I will add the security groups to grant access to Redis later on.
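For reference, once uncommented, a rule like that would reach the Pulumi code shaped roughly like this dict. The values mirror the commented-out config above; nothing here is deployed yet:

```python
# Hypothetical shape of the eventual Redis ingress rule, mirroring the
# commented-out config lines above. Assumption, not deployed config.
redis_ingress_rule = {
    "protocol": "tcp",
    "from_port": 6379,  # standard Redis port
    "to_port": 6379,
    # "source_security_group_id" left unset: the code's fallback fills in
    # the Appointment backend container's security group.
}

assert "source_security_group_id" not in redis_ingress_rule
```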
```yaml
fargate_task_role_arns:
  - arn:aws:iam::768512802988:role/appointment-prod-fargate-backend
  - arn:aws:iam::768512802988:role/appointment-prod-afc-appointment-celery
  - arn:aws:iam::768512802988:role/appointment-prod-afc-appointment-flower
```
These represent new permissions needed for the CI process to deploy to the new cluster and tasks.
```diff
@@ -1,3 +1,3 @@
-tb_pulumi @ git+https://github.com/thunderbird/pulumi.git@v0.0.16
+tb_pulumi @ git+https://github.com/thunderbird/pulumi.git@v0.0.18
```
This contains many fixes to both the LogDestination class and AutoscalingFargateCluster class that we need for these new resources to come out right. Ref: https://github.com/thunderbird/pulumi/blob/main/CHANGELOG.md
```diff
 for rule in backend_cache_sg_ingress_rules:
-    rule['source_security_group_id'] = container_sgs.get('backend').resources.get('sg').id
+    if 'source_security_group_id' not in rule and 'cidr_blocks' not in rule:
+        rule['source_security_group_id'] = container_sgs.get('backend').resources.get('sg').id
```
Normally, the way we use these SGs in code is to automatically link up the Appointment backend container to Redis. If we want to link up any other source, we would need to specify that in code or config somewhere. Rather than bog this code down in lots of conditions based on expected strings in the config, I've changed this to allow us to specify a source in config, and to fall back on the Appointment backend container if no more explicit source is defined.
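The fallback can be demonstrated in isolation with plain dicts. The security group IDs below are placeholders standing in for the real `container_sgs` lookup:

```python
# Sketch of the fallback: a rule that names its own source (or uses CIDR
# blocks) is left alone; otherwise the backend container's SG is filled in.
# "sg-backend" and "sg-flower" are placeholder IDs for this demo.
BACKEND_SG_ID = "sg-backend"

rules = [
    {"protocol": "tcp", "from_port": 6379, "to_port": 6379},  # no source -> fallback
    {"protocol": "tcp", "from_port": 6379, "to_port": 6379,
     "source_security_group_id": "sg-flower"},                # explicit source kept
    {"protocol": "tcp", "from_port": 6379, "to_port": 6379,
     "cidr_blocks": ["10.0.0.0/16"]},                         # CIDR rule kept
]

for rule in rules:
    if "source_security_group_id" not in rule and "cidr_blocks" not in rule:
        rule["source_security_group_id"] = BACKEND_SG_ID

assert rules[0]["source_security_group_id"] == "sg-backend"
assert rules[1]["source_security_group_id"] == "sg-flower"
assert "source_security_group_id" not in rules[2]
```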
```yaml
celery-flower:
  <<: *backend
  ports:
    - 5556:5555
```
Davi requested this port exposure change so that this does not conflict with a simultaneously running Flower container for a local dev instance of Accounts.
davinotdavid
left a comment
Overall lgtm with a few comments / questions / double checking, but I'd love to have input from other infra folks as I am not as familiar with the devops intricacies!
```yaml
- *VAR_ZOOM_API_ENABLED
- *VAR_ZOOM_API_NEW_APP
- name: CONTAINER_ROLE
  value: celery
```
Same here, shouldn't this be beat / worker? If so, I wonder if we need two sets of these configs, one for each?
I just fixed "celery" to "worker". As far as I'm aware, the beat container doesn't need to be deployed to live environments. That will ultimately be removed in favor of real tasks, right?
There's going to be a task in another PR that needs to run periodically every week in production as well, so perhaps we still need the beat container there too? Not sure if we need a separate container though, as @Sancus added something called celery-redbeat to Accounts with a single container (ref thunderbird/thunderbird-accounts#696), so maybe that's also an option.
I had not heard of redbeat, so I went and found its docs. It looks like you just configure your normal Celery container with this and a Redis key to use for its distributed lock. Then you define tasks as part of your Celery config. So it sounds to me like we wouldn't need a separate container to emit these events on a schedule.
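A minimal sketch of that setup, assuming redbeat's documented settings; the broker URL, task name, and schedule below are placeholders, not our real config:

```python
from celery import Celery
from celery.schedules import crontab

# Sketch only: URLs and task names are placeholders, not deployed config.
app = Celery("appointment", broker="redis://redis.example.internal:6379/0")

# RedBeat keeps the schedule and a distributed lock in Redis, so an ordinary
# Celery worker container can also act as the beat scheduler without a
# dedicated beat container.
app.conf.beat_scheduler = "redbeat.RedBeatScheduler"
app.conf.redbeat_redis_url = "redis://redis.example.internal:6379/1"

app.conf.beat_schedule = {
    "weekly-prod-task": {  # hypothetical name for the task from the other PR
        "task": "appointment.tasks.weekly_task",
        "schedule": crontab(minute=0, hour=3, day_of_week=1),  # Mondays 03:00
    },
}
```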
```yaml
- *VAR_ZOOM_API_ENABLED
- *VAR_ZOOM_API_NEW_APP
- name: CONTAINER_ROLE
  value: celery
```
Same here, just double checking!
I fixed this in all three files just now.
Merged, will keep an eye on the workflows and follow through with the prod work now.
This PR adds Celery workers to our live environments alongside Flower for worker visibility.
What's in the PR
- `CONTAINER_ROLE` variable value "api" to indicate the regular backend as opposed to Celery, Flower, or something else.

What's Not in the PR
I have intentionally left the new prod Celery and Flower services scaled down to zero instances, because they will not work if brought online: we still have to allow these containers access to the Neon DB PrivateLink security group (a manual step) and to the Redis cluster (which will be codified after the prod security groups have been created).
So there will be a small future PR coming after this is fully deployed.
Relevant Tickets