Set Tracing Rate Limiter config via CRD #1596

Open
sergeymatov wants to merge 7 commits into main from pr/smatov/rate-limiter-config

Conversation

@sergeymatov

Contributor

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Sergey Matov <sergey.matov@githedgehog.com>
Copilot AI review requested due to automatic review settings June 15, 2026 08:19
@sergeymatov sergeymatov requested a review from a team as a code owner June 15, 2026 08:19
@sergeymatov sergeymatov requested review from mvachhar and removed request for a team June 15, 2026 08:19
@coderabbitai

coderabbitai Bot commented Jun 15, 2026


Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Introduces runtime-swappable tracing rate limiting via a new AtomicThrottle type in tracectl, backed by concurrency::slot::Slot, enabling live throttle reloads without rebuilding the subscriber. Adds TracingRateLimit to the internal config layer, wires it through K8s conversion and validation, connects it to the management processor and dataplane runtime, and extends fuzz generators to cover the new field.

Changes

Tracing Rate Limit End-to-End

  • AtomicThrottle and TracingControl runtime reload (tracectl/src/control.rs, tracectl/Cargo.toml): Swaps from arc-swap to concurrency::slot::Slot for atomic swapping. Adds Default for TracingRateLimitConfig with PartialEq and Eq derives. Refactors AtomicEnvFilter to use Slot<EnvFilter> for idempotent reloads. Introduces AtomicThrottle as a Slot-backed Filter with configurable token-bucket throttling and idempotent reload logic. Adds a reload_throttle field to TracingControl, wires the swappable throttle through throttled_fmt_layer and init_subscriber from initialization, refactors TracingControl::new to construct and retain the throttle handle, exposes a public reload_rate_limit method, and replaces the prior burst-overflow test with test_rate_limit_reload_phases.
  • TracingRateLimit struct and TracingConfig field (config/src/internal/device/tracecfg.rs): Adds a TracingRateLimit struct with a platform-aware Default (Unix derives from tracectl::TracingRateLimitConfig::default(); non-Unix uses 50/5), adds a rate_limit field to TracingConfig, updates TracingConfig::default() and new() to populate it, and adds a set_rate_limit setter.
  • K8s conversion and validation for rate_limit (config/src/converters/k8s/config/tracecfg.rs): Imports TracingRateLimit, extends TryFrom<&GatewayAgentGatewayLogs> to process the optional rate_limit field when present, validating that burst and replenish_per_second are both present and strictly greater than 0, and adds unit test assertions for the mapped and default cases.
  • Management processor and dataplane runtime wiring (mgmt/src/processor/proc.rs, dataplane/src/runtime.rs): Updates the mgmt processor to import TracingRateLimitConfig and call reload_rate_limit with a config built from TracingConfig fields; changes dataplane init_logging to always pass Some(TracingRateLimitConfig), using map_or_else to fall back to the default when no rate limit is given.
  • Fuzz generator for rate_limit (k8s-intf/src/bolero/logs.rs): Adds the GatewayAgentGatewayLogsRateLimit import and extends TypeGenerator for LegalValue<GatewayAgentGatewayLogs> to optionally populate the rate_limit field with randomly generated burst and replenish_per_second values clamped to at least 1.
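The walkthrough describes AtomicThrottle as a filter with configurable token-bucket throttling driven by two parameters, burst and replenish_per_second. Its internals are not shown in this thread, so the following is only a minimal, std-only sketch of how those two parameters typically interact in a token bucket. RateLimitConfig and TokenBucket are hypothetical names, not the tracectl API, and reading replenish_per_second as "tokens added back per second" is an assumption. Time is passed in explicitly so the behaviour is deterministic.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the rate-limit parameters discussed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RateLimitConfig {
    /// Maximum number of events that may pass back-to-back.
    pub burst: u32,
    /// Assumed meaning: tokens added back to the bucket per second.
    pub replenish_per_second: u32,
}

/// Minimal token bucket: starts full, spends one token per event,
/// and refills continuously, never exceeding `burst` tokens.
pub struct TokenBucket {
    config: RateLimitConfig,
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    pub fn new(config: RateLimitConfig, now: Instant) -> Self {
        Self { config, tokens: f64::from(config.burst), last_refill: now }
    }

    /// Returns true if an event at `now` may pass.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        // Refill in proportion to the time elapsed since the last call.
        let elapsed = now.saturating_duration_since(self.last_refill);
        self.last_refill = now;
        let refill = elapsed.as_secs_f64() * f64::from(self.config.replenish_per_second);
        self.tokens = (self.tokens + refill).min(f64::from(self.config.burst));
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let start = Instant::now();
    let config = RateLimitConfig { burst: 3, replenish_per_second: 1 };
    let mut bucket = TokenBucket::new(config, start);
    // Ten events at the same instant: only `burst` of them pass.
    let passed = (0..10).filter(|_| bucket.try_acquire(start)).count();
    println!("passed in burst: {passed}");
    // One second later, one token has been replenished.
    let later = start + Duration::from_secs(1);
    println!("after 1s: {} then {}", bucket.try_acquire(later), bucket.try_acquire(later));
}
```

With this shape, burst bounds how many events can pass in a spike, while replenish_per_second bounds the sustained rate afterwards; a test that only asserts "at least one event passed" cannot tell a correctly wired burst from an ignored one.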
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 69.70%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The title accurately summarizes the main change: implementing tracing rate limiter configuration via CRDs, which aligns with the changeset's focus on adding rate limit support across multiple files.
  • Description check: ✅ Passed. The description mentions a dependency on another PR, which is related to the changeset's purpose of implementing CRD-based tracing rate limiter configuration.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@sergeymatov sergeymatov added the dont-merge Do not merge this Pull Request label Jun 15, 2026
@sergeymatov

Contributor Author

Adding the dont-merge label because the Fabric-side PR is not yet merged and the commit bumping the Fabric version is missing.

Copilot AI left a comment

Contributor


Pull request overview

This PR adds runtime reconfiguration for tracing rate limiting and wires rate-limit parameters from the GatewayAgent CRD → internal config → mgmt processor → tracectl, allowing the throttle policy to be swapped without rebuilding the tracing subscriber.

Changes:

  • Add an AtomicThrottle (ArcSwap-backed) swappable filter and a public TracingControl::reload_rate_limit() API, plus a reload-path test.
  • Extend internal tracing config and the K8s converter/bolero generator to carry rate-limit parameters from the CRD.
  • Apply the rate-limit config during mgmt config application and adjust dataplane logging initialization.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

  • tracectl/src/control.rs: Adds a swappable throttle (AtomicThrottle), a default rate-limit config, a reload API, and tests.
  • mgmt/src/processor/proc.rs: Applies tracing rate-limit settings when the device tracing config is applied.
  • k8s-intf/src/bolero/logs.rs: Extends the CRD fuzz generator to optionally include rate-limit parameters.
  • dataplane/src/runtime.rs: Changes startup initialization of tracing rate limiting.
  • config/src/internal/device/tracecfg.rs: Adds an internal rate-limit struct and defaulting behavior.
  • config/src/converters/k8s/config/tracecfg.rs: Converts CRD rate-limit fields into internal config and updates tests.

Comment thread tracectl/src/control.rs
Comment thread dataplane/src/runtime.rs
Comment on lines +555 to +558
get_trace_ctl().reload_rate_limit(Some(TracingRateLimitConfig {
    burst: tracing.rate_limit.burst,
    replenish_per_second: tracing.rate_limit.replenish_per_second,
}));

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dataplane/src/runtime.rs`:
- Around line 51-57: The TracingControl::init_with_rate_limit() call in the
runtime initialization always passes Some(...) with a default
TracingRateLimitConfig even when the --tracing-rate-limit flag is omitted, which
violates the documented behavior that tracing should not be rate-limited when
the flag is absent. Replace the map_or_else() call (which provides
TracingRateLimitConfig::default() as the fallback) with a map() call that
transforms the rate limit only when present, allowing None to be passed through
to init_with_rate_limit() when the flag is omitted, thereby preserving the CLI
contract.

In `@tracectl/src/control.rs`:
- Around line 419-422: The throttle reload path uses std::sync::Arc directly
instead of the workspace abstraction. In the self.inner.store() call where
TracingControl::build_rate_limit_layer() is being invoked, replace the fully
qualified std::sync::Arc::new() with Arc::new() to use the workspace's
concurrency::sync abstraction that is already imported.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e25bd886-d57e-40d0-970e-8628e70d7756

📥 Commits

Reviewing files that changed from the base of the PR and between 5407c39 and 7d80a49.

📒 Files selected for processing (6)
  • config/src/converters/k8s/config/tracecfg.rs
  • config/src/internal/device/tracecfg.rs
  • dataplane/src/runtime.rs
  • k8s-intf/src/bolero/logs.rs
  • mgmt/src/processor/proc.rs
  • tracectl/src/control.rs

Comment thread dataplane/src/runtime.rs
Comment on lines +51 to +57
    TracingControl::init_with_rate_limit(Some(args.tracing_rate_limit().map_or_else(
        TracingRateLimitConfig::default,
        |rate_limit| TracingRateLimitConfig {
            burst: rate_limit.burst,
            replenish_per_second: rate_limit.replenish_per_second,
        },
    )));


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the CLI contract when --tracing-rate-limit is omitted.

Lines 51-57 always pass Some(...) by injecting TracingRateLimitConfig::default(), which enables throttling even when the flag is absent. That conflicts with the documented CLI behavior (“If omitted, tracing output is not rate-limited.”).

Proposed fix
-    TracingControl::init_with_rate_limit(Some(args.tracing_rate_limit().map_or_else(
-        TracingRateLimitConfig::default,
-        |rate_limit| TracingRateLimitConfig {
-            burst: rate_limit.burst,
-            replenish_per_second: rate_limit.replenish_per_second,
-        },
-    )));
+    TracingControl::init_with_rate_limit(args.tracing_rate_limit().map(|rate_limit| {
+        TracingRateLimitConfig {
+            burst: rate_limit.burst,
+            replenish_per_second: rate_limit.replenish_per_second,
+        }
+    }));

As per coding guidelines, "Find logic errors in the code under review. If confident that code is incorrect, suggest a fix."


Source: Coding guidelines
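The disagreement above comes down to how the two Option combinators treat None. A minimal sketch, using hypothetical stand-in types (CliRateLimit for the parsed --tracing-rate-limit value, RateLimitConfig for tracectl's TracingRateLimitConfig) and placeholder default values: map_or_else always produces a value, so wrapping it in Some means the result is never None, while map passes None through untouched.

```rust
/// Hypothetical stand-in for the parsed --tracing-rate-limit value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CliRateLimit {
    burst: u32,
    replenish_per_second: u32,
}

/// Hypothetical stand-in for tracectl's TracingRateLimitConfig.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RateLimitConfig {
    burst: u32,
    replenish_per_second: u32,
}

impl Default for RateLimitConfig {
    fn default() -> Self {
        // Placeholder values for illustration only.
        Self { burst: 50, replenish_per_second: 5 }
    }
}

fn convert(r: CliRateLimit) -> RateLimitConfig {
    RateLimitConfig { burst: r.burst, replenish_per_second: r.replenish_per_second }
}

/// Shape of the current code: always `Some`, falling back to the default.
fn always_limited(flag: Option<CliRateLimit>) -> Option<RateLimitConfig> {
    Some(flag.map_or_else(RateLimitConfig::default, convert))
}

/// Shape of the suggested fix: `None` passes through when the flag is omitted.
fn limited_if_requested(flag: Option<CliRateLimit>) -> Option<RateLimitConfig> {
    flag.map(convert)
}

fn main() {
    println!("flag omitted, current shape:   {:?}", always_limited(None));
    println!("flag omitted, suggested shape: {:?}", limited_if_requested(None));
}
```

Both shapes agree whenever the flag is given; they differ only in whether "flag omitted" means "throttled with defaults" or "not throttled", which is the product decision the thread later settles on (a throttler by default).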

Comment thread tracectl/src/control.rs Outdated
@sergeymatov sergeymatov force-pushed the pr/smatov/rate-limiter-config branch from 7d80a49 to 1328731 Compare June 15, 2026 08:47

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tracectl/src/control.rs (1)

1223-1226: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert the configured burst, not just any passed event.

The comment says “at least the burst survives,” but p2 >= 1 would pass if the burst value were ignored or miswired. Assert against BURST so the reload test covers the new config fields.

As per coding guidelines, **/*.{js,ts,tsx,jsx,py,java,go,rs,c,cpp,h,cs,rb,php,scala}: Find logic errors in the code under review.

Proposed test fix
-        assert!(p2 >= 1, "phase 2 throttle swallowed the entire burst");
+        assert!(
+            p2 >= BURST as usize,
+            "phase 2 passed only {p2}/{BURST} burst events"
+        );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tracectl/src/control.rs` around lines 1223 - 1226, The assertion at line 1223
checks if p2 >= 1, but this only verifies that at least one event survived, not
that the configured burst value was actually respected. Replace the assertion
that checks p2 >= 1 with an assertion that checks p2 >= BURST to ensure the test
properly validates that the reloaded burst configuration is actually being
applied and not ignored or miswired. This ensures the reload test covers the new
config fields as intended.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tracectl/src/control.rs`:
- Around line 421-426: The reload method in TracingControl always swaps in a
fresh TracingRateLimitLayer even when the config is unchanged, which resets
rate-limit buckets on every call and weakens throttling when management
repeatedly applies the same config. To fix this, add a check before the store
and callsite::rebuild_interest_cache calls to compare the incoming config with
the current configuration, and only perform the swap and rebuild if the config
has actually changed. This idempotency check should also be applied to any
reload_rate_limit calls in the management processor code that invoke this reload
method with the same config repeatedly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: f459cff9-6369-4582-83bd-17b6e9e0a53f

📥 Commits

Reviewing files that changed from the base of the PR and between 7d80a49 and 1328731.

📒 Files selected for processing (6)
  • config/src/converters/k8s/config/tracecfg.rs
  • config/src/internal/device/tracecfg.rs
  • dataplane/src/runtime.rs
  • k8s-intf/src/bolero/logs.rs
  • mgmt/src/processor/proc.rs
  • tracectl/src/control.rs
🚧 Files skipped from review as they are similar to previous changes (4)
  • dataplane/src/runtime.rs
  • config/src/converters/k8s/config/tracecfg.rs
  • k8s-intf/src/bolero/logs.rs
  • mgmt/src/processor/proc.rs
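The idempotency point raised in this review can be sketched as: compare the incoming config with the installed one and skip the swap when they are equal, so a repeated apply from the management processor does not hand out a fresh burst. SwappableThrottle, State, and their methods are hypothetical; this uses a plain std::sync::Mutex rather than the workspace's concurrency::slot::Slot, and the bucket deliberately has no refill so the example exercises only reload semantics. None is taken to mean "no rate limiting", mirroring the Option<TracingRateLimitConfig> passed to reload_rate_limit.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for TracingRateLimitConfig.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RateLimitConfig {
    pub burst: u32,
    pub replenish_per_second: u32,
}

/// Installed config plus its bucket; no refill, to keep the focus on reloads.
struct State {
    config: RateLimitConfig,
    tokens: u32,
}

impl State {
    fn full(config: RateLimitConfig) -> Self {
        Self { config, tokens: config.burst }
    }
}

/// Hypothetical swappable throttle; `None` means "not rate limited".
pub struct SwappableThrottle {
    state: Mutex<Option<State>>,
}

impl SwappableThrottle {
    pub fn new(config: Option<RateLimitConfig>) -> Self {
        Self { state: Mutex::new(config.map(State::full)) }
    }

    /// Installs `config` and returns true, unless it equals the installed
    /// config, in which case the existing (possibly drained) bucket is kept.
    pub fn reload(&self, config: Option<RateLimitConfig>) -> bool {
        let mut state = self.state.lock().unwrap();
        if state.as_ref().map(|s| s.config) == config {
            return false; // idempotent: no fresh burst
        }
        *state = config.map(State::full);
        true
    }

    /// Spends one token if a throttle is installed; always passes otherwise.
    pub fn allow(&self) -> bool {
        let mut state = self.state.lock().unwrap();
        match state.as_mut() {
            None => true,
            Some(s) if s.tokens > 0 => {
                s.tokens -= 1;
                true
            }
            Some(_) => false,
        }
    }
}

fn main() {
    let cfg = RateLimitConfig { burst: 2, replenish_per_second: 1 };
    let throttle = SwappableThrottle::new(Some(cfg));
    while throttle.allow() {} // drain the bucket
    let swapped = throttle.reload(Some(cfg));
    println!("swapped: {swapped}, allowed after same-config reload: {}", throttle.allow());
}
```

Returning a bool from reload makes the idempotent path easy to assert in a test: drain the bucket, reload the same config, and check that no new events pass.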

Comment thread tracectl/src/control.rs
@sergeymatov sergeymatov force-pushed the pr/smatov/rate-limiter-config branch from 1328731 to 8900e1b Compare June 15, 2026 12:50

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
tracectl/src/control.rs (1)

1213-1220: ⚡ Quick win

Add a regression assertion for same-config reloads.

This test proves live enable/disable works, but it does not exercise the new idempotent path. Reload the same TracingRateLimitConfig after draining the bucket and assert it does not grant a fresh burst.

Proposed test extension
+            let config = TracingRateLimitConfig {
+                burst: BURST,
+                replenish_per_second: 1,
+            };
+
             // Phase 2 — install a small-burst throttle and reload the layer.
-            throttle.reload(Some(TracingRateLimitConfig {
-                burst: BURST,
-                replenish_per_second: 1,
-            }));
+            throttle.reload(Some(config));
             for _ in 0..EMITTED {
                 tracing::info!("{P2_ON}");
             }
+
+            let before_same_config_reload = {
+                let captured = String::from_utf8(buf.lock().unwrap().clone()).unwrap();
+                captured.matches(P2_ON).count()
+            };
+            throttle.reload(Some(config));
+            for _ in 0..EMITTED {
+                tracing::info!("{P2_ON}");
+            }
+            let after_same_config_reload = {
+                let captured = String::from_utf8(buf.lock().unwrap().clone()).unwrap();
+                captured.matches(P2_ON).count()
+            };
+            assert!(
+                after_same_config_reload - before_same_config_reload < BURST as usize,
+                "same-config reload granted a fresh burst"
+            );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tracectl/src/control.rs` around lines 1213 - 1220, After the Phase 2 section
where the throttle is reloaded and the bucket is drained (after the for loop
that emits EMITTED messages), add code to reload the throttle with the same
TracingRateLimitConfig to test the idempotent path. Specifically, call
throttle.reload() again with the identical TracingRateLimitConfig containing the
same BURST and replenish_per_second values, then attempt to emit additional
messages and add an assertion that verifies these new messages are throttled
rather than all being emitted, proving that reloading with an identical config
does not grant a fresh burst.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8e6d2d0c-4a75-4d9b-807e-893dc4282a75

📥 Commits

Reviewing files that changed from the base of the PR and between 1328731 and 8900e1b.

📒 Files selected for processing (6)
  • config/src/converters/k8s/config/tracecfg.rs
  • config/src/internal/device/tracecfg.rs
  • dataplane/src/runtime.rs
  • k8s-intf/src/bolero/logs.rs
  • mgmt/src/processor/proc.rs
  • tracectl/src/control.rs
🚧 Files skipped from review as they are similar to previous changes (5)
  • dataplane/src/runtime.rs
  • k8s-intf/src/bolero/logs.rs
  • config/src/internal/device/tracecfg.rs
  • config/src/converters/k8s/config/tracecfg.rs
  • mgmt/src/processor/proc.rs

Signed-off-by: Sergey Matov <sergey.matov@githedgehog.com>
@sergeymatov sergeymatov force-pushed the pr/smatov/rate-limiter-config branch from 8900e1b to 0eb54e7 Compare June 15, 2026 13:17

@Fredi-raspall Fredi-raspall left a comment

Contributor


Looks mostly good to me @sergeymatov. Thx!

Comment thread tracectl/src/control.rs Outdated
Comment thread config/src/converters/k8s/config/tracecfg.rs
pub tags: OrderMap<String, LevelFilter>,
/// Log rate limiter — always present; defaults to
/// [`TracingRateLimit::default`] when a config omits it.
pub rate_limit: TracingRateLimit,

Contributor


Related to the prior question, I wonder if this should be an Option?

Contributor Author


TracingRateLimit is mandatory; defaults are set via Fabric.

Contributor


You mean mandatory in the API?

Contributor


IMO, ideally, even if the API supplies a default when the field is absent, there would be a way to actually disable the throttling.
If nothing else, that would let us measure the performance penalty with throttling enabled vs. disabled.

Contributor Author


We agreed that the dataplane must have a throttler by default. It may be worth allowing it to be disabled as well, but the current API won't let you do that.
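The shape being discussed (an optional rate_limit block in the CRD, validated so that burst and replenish_per_second are both present and strictly greater than 0, feeding a mandatory internal field that falls back to a default when the block is omitted) can be sketched as follows. CrdRateLimit, InternalRateLimit, RateLimitError, and convert are hypothetical names; the Option<u32> field types are an assumption about the CRD, and the 50/5 default is a placeholder borrowed from the non-Unix default mentioned in the walkthrough.

```rust
/// Hypothetical mirror of the CRD's optional rate-limit block.
#[derive(Debug, Clone, Default)]
pub struct CrdRateLimit {
    pub burst: Option<u32>,
    pub replenish_per_second: Option<u32>,
}

/// Hypothetical mandatory internal config (compare TracingRateLimit in this PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InternalRateLimit {
    pub burst: u32,
    pub replenish_per_second: u32,
}

impl Default for InternalRateLimit {
    fn default() -> Self {
        // Placeholder default for illustration.
        Self { burst: 50, replenish_per_second: 5 }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum RateLimitError {
    Missing(&'static str),
    Zero(&'static str),
}

/// Absent block -> default; present block -> both fields required and > 0.
pub fn convert(crd: Option<&CrdRateLimit>) -> Result<InternalRateLimit, RateLimitError> {
    let Some(crd) = crd else {
        return Ok(InternalRateLimit::default());
    };
    let burst = crd.burst.ok_or(RateLimitError::Missing("burst"))?;
    let replenish_per_second = crd
        .replenish_per_second
        .ok_or(RateLimitError::Missing("replenish_per_second"))?;
    if burst == 0 {
        return Err(RateLimitError::Zero("burst"));
    }
    if replenish_per_second == 0 {
        return Err(RateLimitError::Zero("replenish_per_second"));
    }
    Ok(InternalRateLimit { burst, replenish_per_second })
}

fn main() {
    println!("{:?}", convert(None));
    let zero = CrdRateLimit { burst: Some(0), replenish_per_second: Some(1) };
    println!("{:?}", convert(Some(&zero)));
}
```

Because InternalRateLimit is not an Option and zero values are rejected, this shape has no value that means "throttling disabled", which is the limitation raised above; supporting that would require an API change, for example making the internal field an Option.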

Signed-off-by: Sergey Matov <sergey.matov@githedgehog.com>
Signed-off-by: Sergey Matov <sergey.matov@githedgehog.com>
@sergeymatov sergeymatov removed the dont-merge Do not merge this Pull Request label Jun 18, 2026
Signed-off-by: Sergey Matov <sergey.matov@githedgehog.com>
Signed-off-by: Sergey Matov <sergey.matov@githedgehog.com>
Signed-off-by: Sergey Matov <sergey.matov@githedgehog.com>

3 participants