Set Tracing Rate Limiter config via CRD#1596
Conversation
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Sergey Matov <sergey.matov@githedgehog.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughIntroduces runtime-swappable tracing rate limiting via a new ChangesTracing Rate Limit End-to-End
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
|
Adding |
There was a problem hiding this comment.
Pull request overview
This PR adds runtime reconfiguration for tracing rate limiting and wires rate-limit parameters from the GatewayAgent CRD → internal config → mgmt processor → tracectl, allowing the throttle policy to be swapped without rebuilding the tracing subscriber.
Changes:
- Add an
AtomicThrottle(ArcSwap-backed) swappable filter and a publicTracingControl::reload_rate_limit()API, plus a reload-path test. - Extend internal tracing config and the K8s converter/bolero generator to carry rate-limit parameters from the CRD.
- Apply the rate-limit config during mgmt config application and adjust dataplane logging initialization.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
tracectl/src/control.rs |
Adds swappable throttle (AtomicThrottle), default rate-limit config, reload API, and tests. |
mgmt/src/processor/proc.rs |
Applies tracing rate-limit settings when device tracing config is applied. |
k8s-intf/src/bolero/logs.rs |
Extends CRD fuzz generator to optionally include rate-limit parameters. |
dataplane/src/runtime.rs |
Changes startup initialization of tracing rate limiting. |
config/src/internal/device/tracecfg.rs |
Adds an internal rate-limit struct and defaulting behavior. |
config/src/converters/k8s/config/tracecfg.rs |
Converts CRD rate-limit fields into internal config and updates tests. |
| get_trace_ctl().reload_rate_limit(Some(TracingRateLimitConfig { | ||
| burst: tracing.rate_limit.burst, | ||
| replenish_per_second: tracing.rate_limit.replenish_per_second, | ||
| })); |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dataplane/src/runtime.rs`:
- Around line 51-57: The TracingControl::init_with_rate_limit() call in the
runtime initialization always passes Some(...) with a default
TracingRateLimitConfig even when the --tracing-rate-limit flag is omitted, which
violates the documented behavior that tracing should not be rate-limited when
the flag is absent. Replace the map_or_else() call (which provides
TracingRateLimitConfig::default() as the fallback) with a map() call that
transforms the rate limit only when present, allowing None to be passed through
to init_with_rate_limit() when the flag is omitted, thereby preserving the CLI
contract.
In `@tracectl/src/control.rs`:
- Around line 419-422: The throttle reload path uses std::sync::Arc directly
instead of the workspace abstraction. In the self.inner.store() call where
TracingControl::build_rate_limit_layer() is being invoked, replace the fully
qualified std::sync::Arc::new() with Arc::new() to use the workspace's
concurrency::sync abstraction that is already imported.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: e25bd886-d57e-40d0-970e-8628e70d7756
📒 Files selected for processing (6)
config/src/converters/k8s/config/tracecfg.rsconfig/src/internal/device/tracecfg.rsdataplane/src/runtime.rsk8s-intf/src/bolero/logs.rsmgmt/src/processor/proc.rstracectl/src/control.rs
| TracingControl::init_with_rate_limit(Some(args.tracing_rate_limit().map_or_else( | ||
| TracingRateLimitConfig::default, | ||
| |rate_limit| TracingRateLimitConfig { | ||
| burst: rate_limit.burst, | ||
| replenish_per_second: rate_limit.replenish_per_second, | ||
| } | ||
| })); | ||
| }, | ||
| ))); |
There was a problem hiding this comment.
Preserve the CLI contract when --tracing-rate-limit is omitted.
Lines 51-57 always pass Some(...) by injecting TracingRateLimitConfig::default(), which enables throttling even when the flag is absent. That conflicts with the documented CLI behavior (“If omitted, tracing output is not rate-limited.”).
Proposed fix
- TracingControl::init_with_rate_limit(Some(args.tracing_rate_limit().map_or_else(
- TracingRateLimitConfig::default,
- |rate_limit| TracingRateLimitConfig {
- burst: rate_limit.burst,
- replenish_per_second: rate_limit.replenish_per_second,
- },
- )));
+ TracingControl::init_with_rate_limit(args.tracing_rate_limit().map(|rate_limit| {
+ TracingRateLimitConfig {
+ burst: rate_limit.burst,
+ replenish_per_second: rate_limit.replenish_per_second,
+ }
+ }));As per coding guidelines, "Find logic errors in the code under review. If confident that code is incorrect, suggest a fix."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| TracingControl::init_with_rate_limit(Some(args.tracing_rate_limit().map_or_else( | |
| TracingRateLimitConfig::default, | |
| |rate_limit| TracingRateLimitConfig { | |
| burst: rate_limit.burst, | |
| replenish_per_second: rate_limit.replenish_per_second, | |
| } | |
| })); | |
| }, | |
| ))); | |
| TracingControl::init_with_rate_limit(args.tracing_rate_limit().map(|rate_limit| { | |
| TracingRateLimitConfig { | |
| burst: rate_limit.burst, | |
| replenish_per_second: rate_limit.replenish_per_second, | |
| } | |
| })); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@dataplane/src/runtime.rs` around lines 51 - 57, The
TracingControl::init_with_rate_limit() call in the runtime initialization always
passes Some(...) with a default TracingRateLimitConfig even when the
--tracing-rate-limit flag is omitted, which violates the documented behavior
that tracing should not be rate-limited when the flag is absent. Replace the
map_or_else() call (which provides TracingRateLimitConfig::default() as the
fallback) with a map() call that transforms the rate limit only when present,
allowing None to be passed through to init_with_rate_limit() when the flag is
omitted, thereby preserving the CLI contract.
Source: Coding guidelines
7d80a49 to
1328731
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tracectl/src/control.rs (1)
1223-1226:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAssert the configured burst, not just any passed event.
The comment says “at least the burst survives,” but
p2 >= 1would pass if theburstvalue were ignored or miswired. Assert againstBURSTso the reload test covers the new config fields.As per coding guidelines,
**/*.{js,ts,tsx,jsx,py,java,go,rs,c,cpp,h,cs,rb,php,scala}: Find logic errors in the code under review.Proposed test fix
- assert!(p2 >= 1, "phase 2 throttle swallowed the entire burst"); + assert!( + p2 >= BURST as usize, + "phase 2 passed only {p2}/{BURST} burst events" + );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tracectl/src/control.rs` around lines 1223 - 1226, The assertion at line 1223 checks if p2 >= 1, but this only verifies that at least one event survived, not that the configured burst value was actually respected. Replace the assertion that checks p2 >= 1 with an assertion that checks p2 >= BURST to ensure the test properly validates that the reloaded burst configuration is actually being applied and not ignored or miswired. This ensures the reload test covers the new config fields as intended.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tracectl/src/control.rs`:
- Around line 421-426: The reload method in TracingControl always swaps in a
fresh TracingRateLimitLayer even when the config is unchanged, which resets
rate-limit buckets on every call and weakens throttling when management
repeatedly applies the same config. To fix this, add a check before the store
and callsite::rebuild_interest_cache calls to compare the incoming config with
the current configuration, and only perform the swap and rebuild if the config
has actually changed. This idempotency check should also be applied to any
reload_rate_limit calls in the management processor code that invoke this reload
method with the same config repeatedly.
---
Outside diff comments:
In `@tracectl/src/control.rs`:
- Around line 1223-1226: The assertion at line 1223 checks if p2 >= 1, but this
only verifies that at least one event survived, not that the configured burst
value was actually respected. Replace the assertion that checks p2 >= 1 with an
assertion that checks p2 >= BURST to ensure the test properly validates that the
reloaded burst configuration is actually being applied and not ignored or
miswired. This ensures the reload test covers the new config fields as intended.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: f459cff9-6369-4582-83bd-17b6e9e0a53f
📒 Files selected for processing (6)
config/src/converters/k8s/config/tracecfg.rsconfig/src/internal/device/tracecfg.rsdataplane/src/runtime.rsk8s-intf/src/bolero/logs.rsmgmt/src/processor/proc.rstracectl/src/control.rs
🚧 Files skipped from review as they are similar to previous changes (4)
- dataplane/src/runtime.rs
- config/src/converters/k8s/config/tracecfg.rs
- k8s-intf/src/bolero/logs.rs
- mgmt/src/processor/proc.rs
1328731 to
8900e1b
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tracectl/src/control.rs (1)
1213-1220: ⚡ Quick winAdd a regression assertion for same-config reloads.
This test proves live enable/disable works, but it does not exercise the new idempotent path. Reload the same
TracingRateLimitConfigafter draining the bucket and assert it does not grant a fresh burst.Proposed test extension
+ let config = TracingRateLimitConfig { + burst: BURST, + replenish_per_second: 1, + }; + // Phase 2 — install a small-burst throttle and reload the layer. - throttle.reload(Some(TracingRateLimitConfig { - burst: BURST, - replenish_per_second: 1, - })); + throttle.reload(Some(config)); for _ in 0..EMITTED { tracing::info!("{P2_ON}"); } + + let before_same_config_reload = { + let captured = String::from_utf8(buf.lock().unwrap().clone()).unwrap(); + captured.matches(P2_ON).count() + }; + throttle.reload(Some(config)); + for _ in 0..EMITTED { + tracing::info!("{P2_ON}"); + } + let after_same_config_reload = { + let captured = String::from_utf8(buf.lock().unwrap().clone()).unwrap(); + captured.matches(P2_ON).count() + }; + assert!( + after_same_config_reload - before_same_config_reload < BURST as usize, + "same-config reload granted a fresh burst" + );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tracectl/src/control.rs` around lines 1213 - 1220, After the Phase 2 section where the throttle is reloaded and the bucket is drained (after the for loop that emits EMITTED messages), add code to reload the throttle with the same TracingRateLimitConfig to test the idempotent path. Specifically, call throttle.reload() again with the identical TracingRateLimitConfig containing the same BURST and replenish_per_second values, then attempt to emit additional messages and add an assertion that verifies these new messages are throttled rather than all being emitted, proving that reloading with an identical config does not grant a fresh burst.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tracectl/src/control.rs`:
- Around line 1213-1220: After the Phase 2 section where the throttle is
reloaded and the bucket is drained (after the for loop that emits EMITTED
messages), add code to reload the throttle with the same TracingRateLimitConfig
to test the idempotent path. Specifically, call throttle.reload() again with the
identical TracingRateLimitConfig containing the same BURST and
replenish_per_second values, then attempt to emit additional messages and add an
assertion that verifies these new messages are throttled rather than all being
emitted, proving that reloading with an identical config does not grant a fresh
burst.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 8e6d2d0c-4a75-4d9b-807e-893dc4282a75
📒 Files selected for processing (6)
config/src/converters/k8s/config/tracecfg.rsconfig/src/internal/device/tracecfg.rsdataplane/src/runtime.rsk8s-intf/src/bolero/logs.rsmgmt/src/processor/proc.rstracectl/src/control.rs
🚧 Files skipped from review as they are similar to previous changes (5)
- dataplane/src/runtime.rs
- k8s-intf/src/bolero/logs.rs
- config/src/internal/device/tracecfg.rs
- config/src/converters/k8s/config/tracecfg.rs
- mgmt/src/processor/proc.rs
Signed-off-by: Sergey Matov <sergey.matov@githedgehog.com>
8900e1b to
0eb54e7
Compare
Fredi-raspall
left a comment
There was a problem hiding this comment.
Looks mostly good to me @sergeymatov. Thx !
| pub tags: OrderMap<String, LevelFilter>, | ||
| /// Log rate limiter — always present; defaults to | ||
| /// [`TracingRateLimit::default`] when a config omits it. | ||
| pub rate_limit: TracingRateLimit, |
There was a problem hiding this comment.
Related to the prior question, I wonder if this should be Option?
There was a problem hiding this comment.
TracingRateLimit is mandatory, defaults are set via fabric
There was a problem hiding this comment.
You mean mandatory in the API?
There was a problem hiding this comment.
IMO, ideally, even if defaulted by the API if absent, there would be a way to actually disable the throttling.
If for anything, that would allow us to measure the penalty in performance when enabled / disabled.
There was a problem hiding this comment.
We agreed that Dataplane by default must have throttler, maybe it's worth to disable it as well. Current API won't allow you to do this
Signed-off-by: Sergey Matov <sergey.matov@githedgehog.com>
Signed-off-by: Sergey Matov <sergey.matov@githedgehog.com>
Signed-off-by: Sergey Matov <sergey.matov@githedgehog.com>
Signed-off-by: Sergey Matov <sergey.matov@githedgehog.com>
Signed-off-by: Sergey Matov <sergey.matov@githedgehog.com>
Depends on githedgehog/fabric#1472