fix(rivetkit-core): decrement active actor metrics #5044
Code Review: fix(rivetkit-core): decrement active actor metrics

Good fix for a real leak. The root cause (strong self-captures in callbacks preventing ActorContext from ever dropping) was subtle, and the three-pronged approach (explicit stop at task exit, idempotent AtomicBool guard, weak captures) is solid.

Overview:
The PR addresses two related issues.
1. ActorMetrics was only decremented in Drop, but self-referential strong captures in callbacks kept ActorContext alive indefinitely, so Drop never fired.
2. registry/mod.rs referenced get_protocol_metadata without () (as a field, not called as a method) and was using a hardcoded SHUTDOWN_DRAIN_TIMEOUT instead of the protocol-negotiated stop threshold.

Bugs:
- The registry/mod.rs fix is significant: get_protocol_metadata without () means the method was never actually called, so stop_threshold would have fallen back to the hardcoded default on every shutdown.
- internal_keep_awake weak-upgrade timing: the weak reference is resolved synchronously when the callback fires, outside the async move block, so the captured Option inside the pinned future holds a strong reference for the full duration of internal_keep_awake_task.
- Dual call sites in run(): record_actor_stopped is called in both the early-exit branch and the shutdown-complete branch. The AtomicBool guard makes the subsequent Drop call a no-op. Correct.

Design:
- Silent no-ops in metric callbacks: five of the six callbacks silently do nothing if the context weak ref is dead. These are metrics/observability callbacks, so the silent no-op is defensible, but a brief comment would help readers.
- internal_keep_awake is the one callback that returns an explicit Err (ActorRuntime::NotConfigured), consistent with fail-by-default guidance. Good.
- Ordering::AcqRel on swap is correct.

Test Coverage:
- Missing test: actor_active_count_tracks_metric_lifetime is listed in the validation steps but absent from the diff. request_save_hook_does_not_retain_actor_context is present.
- Gauge-level coverage gap: the test does not verify that record_actor_stopped decrements rivetkit_actor_active_count.

Summary:
Core approach is correct. Main items:
1. Add or confirm the actor_active_count_tracks_metric_lifetime test.
2. Optional: brief comments on the silent no-op branches.
3. The registry/mod.rs fixes are good; note them in the description.
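
For readers following the review, the idempotent guard it describes (explicit stop at task exit, with Drop as a fallback that becomes a no-op) can be sketched roughly as below. This is a minimal illustration, not the PR's code: ActiveGauge and MetricsHandle are hypothetical stand-ins, and only the record_actor_stopped name and the AcqRel swap are taken from the review.

```rust
use std::sync::atomic::{AtomicBool, AtomicI64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the active-actor gauge (not the real metric type).
#[derive(Default)]
struct ActiveGauge(AtomicI64);

impl ActiveGauge {
    fn value(&self) -> i64 {
        self.0.load(Ordering::SeqCst)
    }
}

/// Hypothetical metrics handle; the real ActorMetrics may look different.
struct MetricsHandle {
    gauge: Arc<ActiveGauge>,
    stopped: AtomicBool,
}

impl MetricsHandle {
    /// Increments the gauge when the actor starts.
    fn start(gauge: Arc<ActiveGauge>) -> Self {
        gauge.0.fetch_add(1, Ordering::SeqCst);
        Self { gauge, stopped: AtomicBool::new(false) }
    }

    /// Decrements the gauge at most once, however many times it is called.
    fn record_actor_stopped(&self) {
        // swap returns the previous value, so only the first caller sees `false`.
        if !self.stopped.swap(true, Ordering::AcqRel) {
            self.gauge.0.fetch_sub(1, Ordering::SeqCst);
        }
    }
}

impl Drop for MetricsHandle {
    fn drop(&mut self) {
        // Fallback path: a no-op if the task already stopped the metric.
        self.record_actor_stopped();
    }
}

fn main() {
    let gauge = Arc::new(ActiveGauge::default());
    let handle = MetricsHandle::start(Arc::clone(&gauge));
    handle.record_actor_stopped(); // explicit stop at task exit
    handle.record_actor_stopped(); // repeated call is ignored
    drop(handle); // Drop fallback is ignored as well
    println!("active = {}", gauge.value()); // prints "active = 0"
}
```

Because the guard lives on the handle itself, it does not matter which path (early exit, shutdown complete, or Drop) reaches it first; the gauge is decremented exactly once.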

Stack Context
This stack is improving RivetKit runtime metrics and serverless observability.
What?
- Stop the active actor metric explicitly when ActorTask terminates, with Drop as an idempotent fallback.
- ActorTask callback setup does not keep the actor context alive.
Why?
The active actor gauge could appear to only increase because metric decrement was tied to ActorContext drop, and context-owned callbacks captured the same context strongly. Stopping the metric explicitly on task termination and removing the self-retaining callback captures makes rivetkit_actor_active_count reflect task lifetime instead of leaked context lifetime.
Validation
- cargo check -p rivetkit-core
- cargo test -p rivetkit-core request_save_hook_does_not_retain_actor_context
- cargo test -p rivetkit-core actor_active_count_tracks_metric_lifetime
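
The self-retaining capture described under Why? can be reproduced in isolation. The sketch below is an illustration only: Context, Hook, and the install_* helpers are hypothetical names, not types from rivetkit-core. It shows why a callback stored on a context must capture that context weakly if the context is ever to be dropped.

```rust
use std::sync::{Arc, Mutex, Weak};

type Hook = Box<dyn Fn() -> bool + Send + Sync>;

/// Hypothetical context that owns its own callbacks.
struct Context {
    name: String,
    hooks: Mutex<Vec<Hook>>,
}

impl Context {
    fn new(name: &str) -> Arc<Self> {
        Arc::new(Self { name: name.to_string(), hooks: Mutex::new(Vec::new()) })
    }
}

/// Strong capture: the hook holds an Arc to the context that stores it,
/// forming a reference cycle that keeps the context alive.
fn install_strong_hook(ctx: &Arc<Context>) {
    let captured = Arc::clone(ctx);
    ctx.hooks
        .lock()
        .unwrap()
        .push(Box::new(move || !captured.name.is_empty()));
}

/// Weak capture: the hook upgrades on each call and does nothing
/// if the context has already been dropped.
fn install_weak_hook(ctx: &Arc<Context>) {
    let captured: Weak<Context> = Arc::downgrade(ctx);
    ctx.hooks.lock().unwrap().push(Box::new(move || match captured.upgrade() {
        Some(ctx) => !ctx.name.is_empty(),
        None => false, // context is gone: silent no-op
    }));
}

/// Returns true if the context is freed once the caller's handle is dropped.
fn is_freed_after_drop(install: fn(&Arc<Context>)) -> bool {
    let ctx = Context::new("actor");
    install(&ctx);
    let probe = Arc::downgrade(&ctx);
    drop(ctx);
    probe.upgrade().is_none()
}

fn main() {
    // prints "strong capture freed: false" (the cycle leaks the context)
    println!("strong capture freed: {}", is_freed_after_drop(install_strong_hook));
    // prints "weak capture freed: true"
    println!("weak capture freed: {}", is_freed_after_drop(install_weak_hook));
}
```

With the strong capture, Drop on the context never runs, which is the same shape of leak that left the gauge stuck when decrement depended on Drop alone.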