tiproxy: add annotation tiproxy-graceful-shutdown-delete-delay-seconds to remove label before deleting pods #6829

Open
YangKeao wants to merge 1 commit into pingcap:main from YangKeao:feature/annotation-to-extend-graceful-shutdown

Conversation

@YangKeao
Member

Background

I'm designing graceful restarts for TiProxy in cloud environments. The expected process is simply to:

  1. Use a big maxSurge.
  2. Use a big terminationGracePeriodSeconds (e.g. 24h).
  3. Restart.

However, we cannot modify terminationGracePeriodSeconds without restarting the pods. To restart the existing pods gracefully, we need another workaround.

Design

  1. Add a new annotation core.pingcap.com/tiproxy-graceful-shutdown-delete-delay-seconds. With this annotation, deleting a TiProxy object takes two extra steps before the pod is removed: remove the label, then wait for the configured number of seconds.
  2. The annotation is propagated from TiProxyGroup to TiProxy without a rolling restart, so we can modify it for existing resources.

I understand this is not elegant (which is why I didn't use a spec field to describe it), because the operator doesn't know when TiProxy actually has no connections left and could exit earlier. IMO, it would still be better to use a larger terminationGracePeriodSeconds to shut down gracefully.

Usage

Patch an existing TiProxyGroup so that it waits for a while before deleting the pods:

kubectl --context "$CONTEXT" -n "$NS" patch tiproxygroup pg --type merge -p '{
  "spec": {
    "template": {
      "metadata": {
        "annotations": {
          "core.pingcap.com/tiproxy-graceful-shutdown-delete-delay-seconds": "20"
        }
      }
    }
  }
}'

@YangKeao YangKeao requested a review from liubog2008 April 16, 2026 09:24
@ti-chi-bot ti-chi-bot bot requested a review from howardlau1999 April 16, 2026 09:24
@ti-chi-bot
Contributor

ti-chi-bot bot commented Apr 16, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jlerche for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@github-actions github-actions bot added the v2 for operator v2 label Apr 16, 2026
@ti-chi-bot ti-chi-bot bot added the size/XL label Apr 16, 2026
Signed-off-by: Yang Keao <yangkeao@chunibyo.icu>
@YangKeao YangKeao force-pushed the feature/annotation-to-extend-graceful-shutdown branch from 1b9bd2d to de28487 Compare April 16, 2026 09:26
@codecov-commenter

Codecov Report

❌ Patch coverage is 65.38462% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 37.23%. Comparing base (f1c9aea) to head (de28487).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6829      +/-   ##
==========================================
+ Coverage   37.16%   37.23%   +0.06%     
==========================================
  Files         390      391       +1     
  Lines       22360    22410      +50     
==========================================
+ Hits         8310     8344      +34     
- Misses      14050    14066      +16     
Flag Coverage Δ
unittest 37.23% <65.38%> (+0.06%) ⬆️

}
}

res, _ := task.RunTask(ctx, common.TaskInstanceFinalizerDel[scope.TiProxy](state, c, tiproxyFinalizerSubresourceLister))
Member

Add a new task instead of running another task inside this one. If a task does not fail, the next task will run.

return task.NameTaskFunc("FinalizerDel", func(ctx context.Context) task.Result {
tiproxy := state.Object()

pod, err := apicall.GetPod[scope.TiProxy](ctx, c, tiproxy)
Member

Move common.TaskContextPod[scope.TiProxy](state, r.Client) before this task so that there is no need to get the pod again.

})
}

func drainOrDeletePod(ctx context.Context, c client.Client, tiproxy *v1alpha1.TiProxy, pod *corev1.Pod) (time.Duration, error) {
Member

How is TiProxy notified that it is terminating?

Labels

size/XL v2 for operator v2

3 participants