Skip to content

[P1] event-exporter: bounded retries with backoff for Cloud Logging writes #1031

Description

@erain

Problem

writer.Write retries indefinitely with a fixed 10s delay. During API outages or throttling, this can stall throughput and cause backlog growth without a clear signal or drop policy.

Proposed optimization

Introduce bounded retry with exponential backoff + jitter, and expose metrics for retry count and dropped batches. Optionally apply per-status handling (e.g., retry 429/5xx, fail fast on 400). This makes failure behavior predictable and prevents infinite backlog under sustained failure.

Notes / references

Acceptance criteria

  • Retries use exponential backoff with jitter and a max retry time or count.
  • Metrics expose retry attempts and dropped batches.
  • Behavior for 400 vs 429/5xx is explicit and documented.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions