…bV1 (#1049) Move the ~70 LOC of byte-identical lifecycle scaffolding (status, wait_until_ready, clean_trailing, zaino_db_handler_sleep, shutdown) and the open_or_create_db helper from db/v0.rs and db/v1.rs into a single pub(super) trait DbLifecycle and a free fn in the parent db.rs. DbV0 and DbV1 now impl DbLifecycle via four field getters; their DbCore::{status, shutdown} delegate to the trait. The duplicated tokio::select! shutdown/abort block and the three-way sleep-or-maintenance body now live in one place. DbBackend::{status, shutdown} in db.rs use fully-qualified DbCore:: paths to disambiguate the two traits now in scope on DbV0/DbV1. #1033 (notify_one race) preserved verbatim on DbLifecycle::shutdown — fixing it is now a single-call-site change. Parent: #862 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… race DbLifecycle::shutdown signals waiters with Notify::notify_one, which wakes at most one task and stores at most one permit. With N > 1 tasks awaiting the same shutdown_notify, N-1 are stranded. Adds a maximally narrow test in db.rs::shutdown that impls DbLifecycle on a minimal FakeDb, spawns N=3 tasks each registering interest via Notified::enable + Barrier, then asserts every waiter completes within 200ms after shutdown returns. As intended, this test uniquely fails among the 226 in-package tests (225 passed, 1 failed, 2 skipped). The fix — switching the signaling primitive to a wake-all mechanism (watch, flag + notify_waiters, CancellationToken) — will flip it green and is tracked in #1033. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
How to fix #1033: I chose tokio_util, "option C" below.
Analysis
B (watch::channel) is the minimum-dependency choice. The pattern is already in use a few hundred […]
C (tokio_util::sync::CancellationToken) is the semantic fit: purpose-built for exactly this scenario.
Replaces Arc<Notify> + notify_one with tokio_util CancellationToken in
DbLifecycle. notify_one wakes at most one waiter and stores at most
one permit, so N>1 background tasks awaiting the same shutdown_notify
were stranded after shutdown. cancel() wakes every current waiter and
persists state for late subscribers; cancelled() handles the
late-poll case without an explicit register-before-wait dance.
Adds tokio-util = "0.7" to the workspace and zaino-state crate.
Trait change: shutdown_notify() -> &Arc<Notify> becomes cancel_token()
-> &CancellationToken. The two consumer sites (DbLifecycle::shutdown
notifier and zaino_db_handler_sleep waiter) and every Self {...} clone
literal across db/v0.rs, db/v1.rs, db/v1/write_core.rs, and
db/v1/compact_block.rs are updated to match.
Regression test from #1049 (db::shutdown::wakes_every_shutdown_waiter)
now passes — flips from the unique failure to a green invariant.
Closes #1033.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
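For reviewers who want to see the two properties the commit message relies on (cancel wakes every current waiter, and the cancelled state persists for late subscribers), here is a minimal std-only sketch. It is an analogy built from Mutex + Condvar, not the tokio_util API and not the code in this PR; ShutdownToken, wait_cancelled, and woken_waiters are hypothetical names introduced only for illustration.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical std-only stand-in for a cancellation token:
/// a persistent flag plus a condition variable.
#[derive(Clone, Default)]
struct ShutdownToken {
    inner: Arc<(Mutex<bool>, Condvar)>,
}

impl ShutdownToken {
    /// Set the flag and wake every thread currently waiting.
    fn cancel(&self) {
        let (flag, cvar) = &*self.inner;
        *flag.lock().unwrap() = true;
        cvar.notify_all();
    }

    /// Block until cancelled or until `timeout` elapses.
    /// Returns true if the token was cancelled.
    fn wait_cancelled(&self, timeout: Duration) -> bool {
        let (flag, cvar) = &*self.inner;
        let guard = flag.lock().unwrap();
        // The predicate is checked before sleeping, so a waiter that
        // arrives after cancel() returns immediately.
        let (guard, _timeout_result) = cvar
            .wait_timeout_while(guard, timeout, |cancelled| !*cancelled)
            .unwrap();
        *guard
    }
}

/// Spawn `n` waiters, cancel, then add one late subscriber.
/// Returns how many waiters observed the cancellation.
fn woken_waiters(n: usize) -> usize {
    let token = ShutdownToken::default();
    let early: Vec<_> = (0..n)
        .map(|_| {
            let t = token.clone();
            thread::spawn(move || t.wait_cancelled(Duration::from_secs(5)))
        })
        .collect();

    token.cancel();

    // Late subscriber: starts waiting only after cancel() has run.
    let late = {
        let t = token.clone();
        thread::spawn(move || t.wait_cancelled(Duration::from_secs(5)))
    };

    early
        .into_iter()
        .chain(std::iter::once(late))
        .map(|handle| handle.join().unwrap())
        .filter(|woke| *woke)
        .count()
}

fn main() {
    // Three early waiters plus one late subscriber should all complete.
    assert_eq!(woken_waiters(3), 4);
}
```

The persistent flag is what removes the need for a register-before-wait step: a waiter checks the state first and only then sleeps, so ordering relative to cancel() does not matter. A one-shot wake-up with a single stored permit has neither property, which is the stranding described in #1033.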
@idky137 I have not carefully evaluated the claims in "C"; I would appreciate your insights on worker-thread management at that layer.
let tmp = tempfile::tempdir().unwrap();
let env = Arc::new(
    lmdb::Environment::new()
        .set_map_size(1 << 20)
@idky137 I don't understand why this is necessary.
utACK modulo one architectural concern for the follow-up discussion.
This refactor looks fine for the current LMDB-backed […]
However, one thing worth clearing up is that […]
Not a blocker for this PR, but before this abstraction spreads further we may want to make a small adjustment so that the generic lifecycle remains storage-agnostic and LMDB-specific cleanup/sync behaviour is provided only by the LMDB-backed implementations (enabling a move to a pure-Rust DB implementation when one becomes available).
Fixes: #1049
Fixes: #1033
Move the ~70 LOC of byte-identical lifecycle scaffolding (status,
wait_until_ready, clean_trailing, zaino_db_handler_sleep, shutdown)
and the open_or_create_db helper from db/v0.rs and db/v1.rs into a
single pub(super) trait DbLifecycle and a free fn in the parent db.rs.
DbV0 and DbV1 now impl DbLifecycle via four field getters; their
DbCore::{status, shutdown} delegate to the trait. The duplicated
tokio::select! shutdown/abort block and the three-way
sleep-or-maintenance body now live in one place.
DbBackend::{status, shutdown} in db.rs use fully-qualified DbCore::
paths to disambiguate the two traits now in scope on DbV0/DbV1.
#1033 (notify_one race) preserved verbatim on DbLifecycle::shutdown —
fixing it is now a single-call-site change.
Parent: #862
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>