Skip to content

fix(rivetkit-core): decrement active actor metrics#5044

Draft
NathanFlurry wants to merge 1 commit into
05-11-fix_envoy-client_dont_abort_drain_after_20sfrom
active-actors/fix-active-count
Draft

fix(rivetkit-core): decrement active actor metrics#5044
NathanFlurry wants to merge 1 commit into
05-11-fix_envoy-client_dont_abort_drain_after_20sfrom
active-actors/fix-active-count

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

@NathanFlurry NathanFlurry commented May 12, 2026

Stack Context

This stack is improving RivetKit runtime metrics and serverless observability.

What?

  • Marks actor metrics stopped when ActorTask terminates, with Drop as an idempotent fallback.
  • Converts self-referential actor context callbacks to weak captures so callbacks do not retain actor contexts.
  • Adds a regression test that verifies ActorTask callback setup does not keep the actor context alive.

Why?

The active actor gauge could appear to only increase because metric decrement was tied to ActorContext drop, and context-owned callbacks captured the same context strongly. Stopping the metric explicitly on task termination and removing the self-retaining callback captures makes rivetkit_actor_active_count reflect task lifetime instead of leaked context lifetime.

Validation

  • cargo check -p rivetkit-core
  • cargo test -p rivetkit-core request_save_hook_does_not_retain_actor_context
  • cargo test -p rivetkit-core actor_active_count_tracks_metric_lifetime

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 12, 2026

Code Review fix rivetkit-core decrement active actor metrics. Good fix for a real leak. The root cause strong self-captures in callbacks preventing ActorContext from ever dropping was subtle, and the three-pronged approach explicit stop at task exit, idempotent AtomicBool guard, weak captures is solid. Overview: The PR addresses two related issues. 1. ActorMetrics was only decremented in Drop, but self-referential strong captures in callbacks kept ActorContext alive indefinitely, so Drop never fired. 2. registry/mod.rs had get_protocol_metadata missing () referenced as a field, not called as a method and was using a hardcoded SHUTDOWN_DRAIN_TIMEOUT instead of the protocol-negotiated stop threshold. Bugs: registry/mod.rs fix is significant - get_protocol_metadata without () means the method was never actually called; stop_threshold would have fallen back to the hardcoded default on every shutdown. The internal_keep_awake weak-upgrade timing: The weak reference is resolved synchronously when the callback fires outside the async move block, so the captured Option inside the pinned future holds a strong reference for the full duration of internal_keep_awake_task. Dual call sites in run(): record_actor_stopped is called in both the early-exit branch and the shutdown-complete branch. The AtomicBool guard makes the subsequent Drop call a no-op. Correct. Design: Silent no-ops in metric callbacks - Five of the six callbacks silently do nothing if the context weak ref is dead. These are metrics/observability callbacks, so the silent no-op is defensible but a brief comment would help readers. ActorRuntime::NotConfigured in internal_keep_awake is the one callback that returns an explicit Err, consistent with fail-by-default guidance. Good. Ordering::AcqRel on swap is correct. Test Coverage: Missing actor_active_count_tracks_metric_lifetime test listed in validation steps but absent from diff. request_save_hook_does_not_retain_actor_context is present. Gauge-level coverage gap: the test does not verify record_actor_stopped decrements rivetkit_actor_active_count. Summary: Core approach is correct. Main items: 1. Add or confirm actor_active_count_tracks_metric_lifetime test. 2. Optional brief comments on silent no-op branches. 3. registry/mod.rs fixes are good - note them in the description.

@MasterPtato MasterPtato force-pushed the 05-11-fix_envoy-client_dont_abort_drain_after_20s branch from b28585d to f2f82a8 Compare May 14, 2026 17:26
@MasterPtato MasterPtato force-pushed the active-actors/fix-active-count branch from 253e406 to 87e5d7d Compare May 14, 2026 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant