feat(pubsub): expose RetryableErrors for customized retry configuration #5411

suzmue wants to merge 1 commit into googleapis:main
Conversation
This change introduces a new public `retry_policy` module in the `google-cloud-pubsub` crate, exporting the `RetryableErrors` struct. Users can now explicitly reference and decorate this policy to customize retry limits and durations on Publisher instances.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main    #5411   +/-  ##
=======================================
  Coverage   97.72%   97.72%
=======================================
  Files         216      217      +1
  Lines       47239    47239
=======================================
+ Hits        46165    46166      +1
+ Misses       1074     1073      -1
```
```rust
if error.is_io() || error.is_timeout() {
    return RetryResult::Continue(error);
}
```
(I know this code already existed, but)
Why would we retry a timeout error? That is at best weird and at worst a bug.
Nevermind, I think you are right and I am wrong.
> Why would we retry a timeout error?
There is an attempt timeout and an operation timeout. If we hit an attempt timeout, we should keep retrying. If we hit the operation timeout, we should stop.
This code only applies to attempt timeouts.
Operation timeouts are set using `RetryPolicyExt::with_time_limit`. And that has its own logic to override a `RetryResult::Continue` with a `RetryResult::Exhausted`.
google-cloud-rust/src/gax/src/retry_policy.rs, lines 466 to 468 in 52388cc:
```rust
use google_cloud_gax::error::rpc::Code;
return match status.code {
    Code::Aborted
    | Code::Cancelled
    // ...
```
This is suspicious. We don't have request cancelling, so not sure when we would receive a CANCELLED error. The docs seem to indicate they only expect CANCELLED coming from the client. They also say this could lead to a double publish, so I'd remove it.
> "The client can retry the operation, but the operation may have been executed on the previous call."
```rust
return match status.code {
    Code::Aborted
    | Code::Cancelled
    | Code::DeadlineExceeded
    // ...
```
This is suspicious.
The docs say "Typically the client would retry the operation. Keep in mind that this may result in the operation being executed more than once on the server."
I think it is better to avoid a double publish than retry on this error.
Thinking out loud... Might be worth bringing up in our team meeting or the pubsub meeting. You probably understand this better than I do.
> double publish
The more I think about publish retries, the more confused I am. Typically, we do not retry POSTs like the Publish RPC because they are not idempotent.
The application knows whether the message is idempotent, but we don't in general. e.g. if the message is some event like "your build passed", that could be published twice without side effects. But if the message is for some chat, you would not want it published twice.
My guess is that applications probably set a nonce or a request ID in the pubsub message attributes if they want retries + "exactly once" publish. And let their subscribers check for dups. (Aside: then why is exactly-once delivery a subscriber feature when applications need to do this anyway for "exactly-once" publish?? 🫨)
So I think what I convinced myself of is that retrying on DEADLINE_EXCEEDED is no different from retrying on UNAVAILABLE. You could get a double publish in either case.
I am not sure if there is any action we should take. We could add a blurb in the docs saying the publisher retries by default and that applications can use `NeverRetry` to turn off retries? Maybe here:
```rust
// - 429: Resource Exhausted
// - 499: Cancelled Request
// - 5xx: Internal Server Error, Bad Gateway, etc.
if let Some(408 | 429 | 499 | 500..600) = error.http_status_code() {
```
> `500..600`
Can we enumerate the errors we should retry on?
e.g. 501 is "not implemented", 505 is "HTTP version not supported". While Pub/Sub probably doesn't return them, they definitely shouldn't be retried.
@dbolduc Will go through each of your comments too, but this is also based on the retry implementations for Go and Java: https://github.com/googleapis/google-cloud-go/blob/9d47d1ca53dc1a3183352f67ebeb174b1d194846/pubsub/v2/service.go#L51 and https://github.com/googleapis/java-pubsub/blob/880f28a11d7dc2864ae958e59940bb24bb6fa555/google-cloud-pubsub/src/main/java/com/google/cloud/pubsub/v1/StatusUtil.java#L48
dbolduc left a comment:
I agree that we want to expose this exact API.
Fixes #5366