Skip to content

TODO: ktichen sink counter latency#5051

Draft
NathanFlurry wants to merge 1 commit into
rivetkit-shutdown/use-engine-stop-thresholdfrom
05-12-todo_ktichen_sink_counter_latency
Draft

TODO: ktichen sink counter latency#5051
NathanFlurry wants to merge 1 commit into
rivetkit-shutdown/use-engine-stop-thresholdfrom
05-12-todo_ktichen_sink_counter_latency

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 12, 2026

PR Review: counter-latency Rust Port + Kitchen-Sink Harness Improvements\n\n> Note: This is a DRAFT PR with an unfilled description template. The review covers the code as written.\n\n### Overview\n\nThis PR ports the TypeScript counter-latency benchmarking script to Rust, adds an echo handler to the counter actor for raw-WS RTT measurement, adds an on-sleep-remote-watch remote monitoring script, and makes several kitchen-sink Docker/build improvements. The Rust port adds meaningful capabilities: multi-threaded concurrency, colored histogram output, teed transcript logging, and separate rtt/concurrent/agent-concurrent modes.\n\n---\n\n### Bugs / Missing Files\n\n**tsup.server.config.ts is referenced but not present.\npackage.json adds pnpm exec tsup --config tsup.server.config.ts to the build script, and the Dockerfile CMD now runs dist-server/server.mjs. Neither the tsup config file nor the dist-server/ output appear in the diff, and the file does not exist on disk. The Docker image will produce a broken server entry point.\n\nRIVET_PROTOCOLS constant is declared but unused (ws.rs:11).\nThe gateway protocol header is hardcoded as "rivet, rivet_encoding.json" on line 18 instead of using the constant. The constant is dead code.\n\nactor_id is always None (rtt.rs:78, concurrent.rs throughout).\nBoth RTT and concurrent workers hardcode let actor_id: Option<String> = None; and never populate it. format_actor() always returns an empty string, so actor IDs are never visible in output. This looks like unfinished wire-up.\n\nDuplicate base36 implementation.\nrtt.rs:162 defines fn base36 and concurrent.rs defines an identical fn to_base36. These should be consolidated into a shared module.\n\n---\n\n### CLAUDE.md Violations\n\nMutex<HashMap<...>> is banned project-wide.\n- stats.rs:52: worker_health: Mutex<HashMap<u32, WorkerHealth>> uses std::sync::Mutex<HashMap>. CLAUDE.md explicitly bans both Mutex<HashMap> patterns and std::sync::Mutex in async Rust. Use scc::HashMap instead.\n- concurrent.rs (AgentWorkload): Arc<tokio::sync::Mutex<HashMap<String, Instant>>> for pending_inference_sends. Same policy applies - use scc::HashMap.\n\nstd::sync::Mutex in async context (stats.rs).\nCLAUDE.md says async Rust code defaults to tokio::sync::Mutex; std::sync::Mutex is reserved for forced-sync contexts (Drop, FFI, sync traits). State is used from async tasks. Switch to scc::HashMap to eliminate the mutex entirely, or parking_lot::Mutex if locking must remain sync.\n\n---\n\n### Design Issues\n\nTwo WorkloadCtx objects are created for concurrent mode.\nmain.rs::run() creates a WorkloadCtx and passes it to install_signal_handlers. Then run_concurrent_mode creates a second WorkloadCtx from the same values. They share the same Arc<State> so stopping propagates correctly, but the duplicate allocation is confusing. Pass the ctx into run_concurrent_mode instead of rebuilding it.\n\nConnect errors do not trigger reconnect (concurrent.rs:291-302).\nWhen open_raw_ws fails the worker returns immediately. A transient network blip during ramp-up permanently drops that concurrency slot with only a CONNECT-ERROR log line. Persistent load tests typically backoff and retry.\n\nACTOR_STOPPED_CLOSE_REASON = "hack_force_close" (args.rs:16).**\nThe name signals a workaround but gives no explanation. Rename to reflect the actual semantic or add a comment explaining what protocol behavior requires this exact string.\n\n---\n\n### Minor Issues\n\n- on-sleep-remote-watch.ts: createClient<any> bypasses type-safety. Prefer a typed client or at least a comment explaining why any is necessary.\n- counter.ts echo handler: websocket.readyState !== 1 is a magic number - prefer WebSocket.OPEN.\n- sigterm-sleep-probe.ts broadcast ordering: The new onSleepStarted broadcast to open websockets happens before the DB INSERT. A client receiving the event cannot immediately fetch proof rows. Swap the order or document the sequencing intent.\n- tee.rs:emit: Acquires stdout().lock() then the file mutex - nested lock ordering is worth a comment.\n- Comment style: File-level comments explain what each file does. CLAUDE.md asks comments to explain the why only when non-obvious.\n\n---\n\n### Positive Highlights\n\n- Dockerfile CMD change to node as PID 1 is correct - SIGTERM reaches the RivetKit handler without going through pnpm wrappers.\n- SKIP_WASM_BUILD=1 / SKIP_NAPI_BUILD=1 cleanly skips unnecessary native builds in Docker.\n- wait_for_echo in rtt.rs correctly skips Ping/Pong/Frame messages and surfaces Close frame code+reason in errors.\n- Per-run transcript in /tmp/ via tee.rs is useful for post-mortem analysis of long benchmark runs.\n- NAPI type fix (Promise<number | undefined | null> to Promise<number | null>) is a correct API surface tightening.

@NathanFlurry NathanFlurry force-pushed the 05-12-todo_ktichen_sink_counter_latency branch 2 times, most recently from 52d7da2 to be71196 Compare May 12, 2026 18:23
@NathanFlurry NathanFlurry force-pushed the 05-12-todo_ktichen_sink_counter_latency branch from be71196 to 537fd39 Compare May 12, 2026 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant