Add Merkle storage enum to RepositoryConfig + merkle_store field#526
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughMakes Merkle store/transport acquisition fallible across the codebase, adds a configurable ChangesMerkle store acquisition & backend selection
Sequence Diagram(s)sequenceDiagram
participant Client
participant API as Client API
participant Repo as LocalRepository
participant Transport as MerkleTransport
participant Store as MerkleStore
participant Server
Client->>API: request create/download nodes
API->>Repo: need to pack/unpack nodes
Repo->>Repo: merkle_transport()? (may error)
alt transport OK
Repo->>Transport: pack_nodes / pack_all / unpack
Transport->>Store: read/write node blobs
Transport-->>Repo: packed bytes / unpack result
Repo-->>API: stream bytes (duplex)
API-->>Client: respond
else transport error
Repo-->>API: return transport initialization error
API-->>Client: error response
end
Note right of Server: Server uses same repo.merkle_transport()? path for downloads
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
NOTE: Stacked PR. Must merge #516 before merging. |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/lib/src/api/client/tree.rs (1)
78-85:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winResolve
merkle_store()before starting the transfer.
merkle_store()can now fail locally, but in both paths that failure is discovered insidespawn_blockingafter the upload/download has already been initiated. AMerkleStoreNotInitializedrepo will still open the connection and can send or receive an empty/partial body before the join handle returns the real error. Hoisting the lookup ahead of the duplex/request setup would fail fast and avoid noisy remote-side unpack errors.💡 Suggested shape
- let repo = local_repo.clone(); + let merkle_store = local_repo.merkle_store()?; let pack_handle = tokio::task::spawn_blocking(move || -> Result<(), OxenError> { let mut sync_writer = SyncIoBridge::new(async_writer); - repo.merkle_store()? - .pack_nodes(&nodes, PackOptions::LegacyClientPush, &mut sync_writer)?; + merkle_store.pack_nodes(&nodes, PackOptions::LegacyClientPush, &mut sync_writer)?; Ok(()) });Apply the same hoist in
node_download_request.Also applies to: 367-372
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/api/client/tree.rs` around lines 78 - 85, The code currently calls repo.merkle_store() inside the closure passed to tokio::task::spawn_blocking (used by pack_nodes with SyncIoBridge), which defers repository-initialization errors until after the upload/download has started; instead, call repo.merkle_store() synchronously before creating the AsyncWriter/duplex/request and before spawn_blocking so any MerkleStoreNotInitialized (or other) error fails fast; then pass the obtained merkle_store (or the owned value it needs) into the closure and keep the existing use of pack_nodes/PackOptions::LegacyClientPush and SyncIoBridge; make the same change in node_download_request (the block around lines 367-372) so merkle_store() is resolved prior to initiating network I/O.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/lib/src/config/repository_config.rs`:
- Around line 55-56: The config deserializer fails when older config.toml files
lack merkle_store_kind; add a serde default so missing keys use
MerkleStoreKind::default(). On the RepositoryConfig struct, annotate the
merkle_store_kind field with serde's default (e.g., #[serde(default)] or
#[serde(default = "MerkleStoreKind::default")]) so deserialization in
RepositoryConfig::from_file() falls back to MerkleStoreKind::default(); ensure
MerkleStoreKind implements Default if it doesn't already.
In `@crates/lib/src/core/v_latest/index/commit_merkle_tree.rs`:
- Around line 207-210: The repo currently swallows all errors from
CommitMerkleTreeLatest::read_from_path (in crates/lib/src/repositories/tree.rs
around the 214-220 handling) and returns Ok(None), which hides
initialization/errors like MerkleStoreNotInitialized; change the error handling
so that read_from_path's result is pattern-matched and only map a structured
NotFound variant to Ok(None) while propagating other errors (including
MerkleStoreNotInitialized) as Err(...) back to callers (so callers like
crates/server/src/controllers/tree.rs and get_file_by_path receive the real
error). Apply the same fix for the other read sites you noted (the blocks around
the ranges 232-235, 258-261, 282-285, 314-317, 343-346, 375-378) so only genuine
"not found" is converted to Ok(None) and all other errors are returned.
In `@crates/lib/src/model/repository/local_repository.rs`:
- Around line 127-131: The cached FileBackend captures is_vfs at creation which
can become stale when LocalRepository.set_vfs() flips VFS mode; update the code
so the backend is rebuilt when vfs changes (or remove is_vfs from the stored
backend). Specifically, modify LocalRepository (the place creating the
Box<FileBackend> for MerkleStoreKind::File) to either (a) check current
config.vfs in methods like unpack() and recreate the backend if it differs from
the cached backend's mode, or (b) stop storing is_vfs inside FileBackend and
derive vfs at call time from repository state; ensure set_vfs() invalidates or
replaces the cached store so subsequent calls to unpack() use the correct VFS
behavior.
- Around line 117-125: The code currently reads the on-disk config in the None
branch by calling RepositoryConfig::from_file; instead, when callers (e.g.,
load_merkle_store) expect defaults, avoid hitting disk and use an in-memory
default config. Replace the None branch that sets initialized_config =
RepositoryConfig::from_file(...) with creating a default RepositoryConfig (e.g.,
RepositoryConfig::default() or a RepositoryConfig::from_defaults() helper) and
return a reference to that, so RepositoryConfig::from_file is only used when an
explicit config is provided; update any tests if a helper constructor is added.
Ensure references to symbols: RepositoryConfig, RepositoryConfig::from_file,
initialized_config, load_merkle_store,
LocalRepository::new/new_from_version/from_view/from_remote are handled
accordingly.
---
Outside diff comments:
In `@crates/lib/src/api/client/tree.rs`:
- Around line 78-85: The code currently calls repo.merkle_store() inside the
closure passed to tokio::task::spawn_blocking (used by pack_nodes with
SyncIoBridge), which defers repository-initialization errors until after the
upload/download has started; instead, call repo.merkle_store() synchronously
before creating the AsyncWriter/duplex/request and before spawn_blocking so any
MerkleStoreNotInitialized (or other) error fails fast; then pass the obtained
merkle_store (or the owned value it needs) into the closure and keep the
existing use of pack_nodes/PackOptions::LegacyClientPush and SyncIoBridge; make
the same change in node_download_request (the block around lines 367-372) so
merkle_store() is resolved prior to initiating network I/O.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 09827646-91d7-4980-bd7f-697e8d3dbc03
📒 Files selected for processing (15)
crates/lib/src/api/client/tree.rscrates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rscrates/lib/src/config/repository_config.rscrates/lib/src/core/db/merkle_node/file_backend.rscrates/lib/src/core/v_latest/commits.rscrates/lib/src/core/v_latest/entries.rscrates/lib/src/core/v_latest/index/commit_merkle_tree.rscrates/lib/src/core/v_latest/merge.rscrates/lib/src/error.rscrates/lib/src/model/merkle_tree.rscrates/lib/src/model/merkle_tree/node/merkle_tree_node.rscrates/lib/src/model/repository/local_repository.rscrates/lib/src/repositories/commits/commit_writer.rscrates/lib/src/repositories/tree.rscrates/server/src/controllers/tree.rs
8d7491c to
d70ecc5
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (4)
crates/lib/src/model/repository/local_repository.rs (2)
127-131:⚠️ Potential issue | 🟠 MajorThe cached
FileBackendstill snapshotsvfsonce.
is_vfsis captured here, but the repository now reuses this backend across calls. Ifvfschanges later, the cached store keeps the old mode and subsequent Merkle transport/store operations can take the wrong path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/model/repository/local_repository.rs` around lines 127 - 131, The cached FileBackend captures is_vfs once (constructed via config.vfs.unwrap_or(false)) but the repository reuses that backend across calls, so changes to vfs later are ignored; update the caching logic in local_repository.rs so the cache key includes the vfs mode (e.g., include config.vfs.unwrap_or(false) when deciding to reuse the backend) or avoid reusing a single Arc<FileBackend> by constructing a new FileBackend when the vfs flag differs; ensure the creation site that currently matches MerkleStoreKind::File uses the vfs-aware key or fresh backend rather than the existing cached instance.
117-123:⚠️ Potential issue | 🔴 CriticalDon't hit disk when the caller asked for default Merkle-store settings.
LocalRepository::new,new_from_version,from_view, andfrom_remoteall passNonehere. This branch now eagerly reads.oxen/config.toml, which breaks those constructors before the repo is initialized and contradicts the contract on Line 206.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/model/repository/local_repository.rs` around lines 117 - 123, The code branch that handles config == None must not eagerly read .oxen/config.toml; instead of calling RepositoryConfig::from_file(util::fs::config_filepath(&repo_path)) create a default RepositoryConfig in memory and use that (e.g. set initialized_config = RepositoryConfig::default() or the repo's default merkle-store config) so LocalRepository::new, new_from_version, from_view, and from_remote don't hit disk; only call RepositoryConfig::from_file when the caller provides an explicit file-backed config. Use the existing symbols (initialized_config, config, repo_path, RepositoryConfig::from_file) to locate and replace the file-read branch with a default-initialization branch.crates/lib/src/core/v_latest/index/commit_merkle_tree.rs (1)
207-210:⚠️ Potential issue | 🟠 MajorThese new
?guards are still being flattened intoOk(None)higher up.The existence check now correctly propagates
MerkleStoreNotInitialized, butread_from_path_maybe()and the tree wrappers still coerce any error into “missing node.” That turns initialization failures back into false not-found results for callers.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/core/v_latest/index/commit_merkle_tree.rs` around lines 207 - 210, The existence check now returns errors (e.g., MerkleStoreNotInitialized) but higher-level callers (read_from_path_maybe and the tree wrapper methods) still swallow any error and return Ok(None); change those callers so they no longer coerce all errors into “missing node” — inspect the result of repo.merkle_store()?.exists(hash)? in read_from_path_maybe and in the tree wrapper functions and propagate Err(e) upward for non-not-found errors (or specifically propagate MerkleStoreNotInitialized), only treating an explicit NotFound result as Ok(None); update the error handling paths in read_from_path_maybe and the tree wrapper callers to return the Err from exists/read operations instead of coercing to Ok(None).crates/lib/src/config/repository_config.rs (1)
55-56:⚠️ Potential issue | 🟠 MajorAdd a serde default here for older
config.tomlfiles.This field is now required during deserialization, so repositories created before
merkle_store_kindexisted will fail to load.RepositoryConfig::from_file()should fall back toMerkleStoreKind::default()when the key is absent.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/config/repository_config.rs` around lines 55 - 56, Add a serde default for the merkle_store_kind field so old config.toml files that omit it will deserialize using MerkleStoreKind::default(); specifically, annotate the merkle_store_kind field with #[serde(default)] in the RepositoryConfig struct and ensure MerkleStoreKind implements Default (so Default::default() is used), which will make RepositoryConfig::from_file() fall back to the default when the key is absent.
🧹 Nitpick comments (1)
crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs (1)
79-87: ⚡ Quick winReuse the already acquired store in child lookup.
Line 87 reacquires the store even though
storeis already available from Line 79. Reusing it avoids redundant lookup and keeps the existence check/read on the same handle.♻️ Proposed fix
- repo.merkle_store()?.get_children(hash) + store.get_children(hash)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs` around lines 79 - 87, The code currently calls repo.merkle_store() twice; reuse the previously obtained store variable instead of reacquiring it to avoid redundant lookups and ensure the exists() check and subsequent read use the same handle—replace the final repo.merkle_store()?.get_children(hash) call with store.get_children(hash) and keep the prior exists call using store.exists(hash)? and the same hash variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/lib/src/model/repository/local_repository.rs`:
- Around line 61-64: Add the serde default attribute to the LocalRepository
struct's merkle_store_kind field so older serialized payloads can deserialize
when that field is missing: annotate the merkle_store_kind field with
#[serde(default)] (MerkleStoreKind implements Default) so deserialization falls
back to MerkleStoreKind::File when the field is absent.
---
Duplicate comments:
In `@crates/lib/src/config/repository_config.rs`:
- Around line 55-56: Add a serde default for the merkle_store_kind field so old
config.toml files that omit it will deserialize using
MerkleStoreKind::default(); specifically, annotate the merkle_store_kind field
with #[serde(default)] in the RepositoryConfig struct and ensure MerkleStoreKind
implements Default (so Default::default() is used), which will make
RepositoryConfig::from_file() fall back to the default when the key is absent.
In `@crates/lib/src/core/v_latest/index/commit_merkle_tree.rs`:
- Around line 207-210: The existence check now returns errors (e.g.,
MerkleStoreNotInitialized) but higher-level callers (read_from_path_maybe and
the tree wrapper methods) still swallow any error and return Ok(None); change
those callers so they no longer coerce all errors into “missing node” — inspect
the result of repo.merkle_store()?.exists(hash)? in read_from_path_maybe and in
the tree wrapper functions and propagate Err(e) upward for non-not-found errors
(or specifically propagate MerkleStoreNotInitialized), only treating an explicit
NotFound result as Ok(None); update the error handling paths in
read_from_path_maybe and the tree wrapper callers to return the Err from
exists/read operations instead of coercing to Ok(None).
In `@crates/lib/src/model/repository/local_repository.rs`:
- Around line 127-131: The cached FileBackend captures is_vfs once (constructed
via config.vfs.unwrap_or(false)) but the repository reuses that backend across
calls, so changes to vfs later are ignored; update the caching logic in
local_repository.rs so the cache key includes the vfs mode (e.g., include
config.vfs.unwrap_or(false) when deciding to reuse the backend) or avoid reusing
a single Arc<FileBackend> by constructing a new FileBackend when the vfs flag
differs; ensure the creation site that currently matches MerkleStoreKind::File
uses the vfs-aware key or fresh backend rather than the existing cached
instance.
- Around line 117-123: The code branch that handles config == None must not
eagerly read .oxen/config.toml; instead of calling
RepositoryConfig::from_file(util::fs::config_filepath(&repo_path)) create a
default RepositoryConfig in memory and use that (e.g. set initialized_config =
RepositoryConfig::default() or the repo's default merkle-store config) so
LocalRepository::new, new_from_version, from_view, and from_remote don't hit
disk; only call RepositoryConfig::from_file when the caller provides an explicit
file-backed config. Use the existing symbols (initialized_config, config,
repo_path, RepositoryConfig::from_file) to locate and replace the file-read
branch with a default-initialization branch.
---
Nitpick comments:
In `@crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs`:
- Around line 79-87: The code currently calls repo.merkle_store() twice; reuse
the previously obtained store variable instead of reacquiring it to avoid
redundant lookups and ensure the exists() check and subsequent read use the same
handle—replace the final repo.merkle_store()?.get_children(hash) call with
store.get_children(hash) and keep the prior exists call using
store.exists(hash)? and the same hash variable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1894f6db-3835-4507-8b06-d039917f8ea9
📒 Files selected for processing (15)
crates/lib/src/api/client/tree.rscrates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rscrates/lib/src/config/repository_config.rscrates/lib/src/core/db/merkle_node/file_backend.rscrates/lib/src/core/v_latest/commits.rscrates/lib/src/core/v_latest/entries.rscrates/lib/src/core/v_latest/index/commit_merkle_tree.rscrates/lib/src/core/v_latest/merge.rscrates/lib/src/error.rscrates/lib/src/model/merkle_tree.rscrates/lib/src/model/merkle_tree/node/merkle_tree_node.rscrates/lib/src/model/repository/local_repository.rscrates/lib/src/repositories/commits/commit_writer.rscrates/lib/src/repositories/tree.rscrates/server/src/controllers/tree.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs
- crates/lib/src/error.rs
- crates/lib/src/core/db/merkle_node/file_backend.rs
f050933 to
e927639
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/lib/src/api/client/tree.rs (1)
83-87: ⚡ Quick winFail fast on merkle transport initialization before starting stream/network work.
merkle_transport()?is resolved insidespawn_blocking, so aMerkleStoreNotInitializederror is raised only after request setup/streaming has already begun. Resolve transport first, then move it into the blocking closure to short-circuit earlier.Proposed refactor
pub async fn create_nodes( local_repo: &LocalRepository, @@ ) -> Result<(), OxenError> { + let transport = local_repo.merkle_transport()?; let n = nodes.len(); @@ - let repo = local_repo.clone(); + let transport_for_pack = Arc::clone(&transport); let pack_handle = tokio::task::spawn_blocking(move || -> Result<(), OxenError> { let mut sync_writer = SyncIoBridge::new(async_writer); @@ - repo.merkle_transport()?.pack_nodes( + transport_for_pack.pack_nodes( &nodes, PackOptions::LegacyClientPush, &mut sync_writer, )?; Ok(()) }); @@ async fn node_download_request( local_repo: &LocalRepository, url: impl AsRef<str>, ) -> Result<(), OxenError> { + let transport = local_repo.merkle_transport()?; let url = url.as_ref(); @@ - let repo = local_repo.clone(); + let transport_for_unpack = Arc::clone(&transport); tokio::task::spawn_blocking(move || -> Result<(), OxenError> { @@ - repo.merkle_transport()? - .unpack(&mut sync_reader, UnpackOptions::Overwrite)?; + transport_for_unpack.unpack(&mut sync_reader, UnpackOptions::Overwrite)?; Ok(()) })Also applies to: 373-374
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/api/client/tree.rs` around lines 83 - 87, The code currently calls repo.merkle_transport()? inside the closure spawned with spawn_blocking, which delays raising MerkleStoreNotInitialized until after stream/network work begins; instead resolve and assign the transport before starting the spawn_blocking (e.g., let transport = repo.merkle_transport()?;), then move that transport into the blocking closure and call transport.pack_nodes(&nodes, PackOptions::LegacyClientPush, &mut sync_writer) there; apply the same change to the other occurrence around lines 373-374 so transport initialization happens prior to any streaming/setup.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/lib/src/api/client/tree.rs`:
- Around line 83-87: The code currently calls repo.merkle_transport()? inside
the closure spawned with spawn_blocking, which delays raising
MerkleStoreNotInitialized until after stream/network work begins; instead
resolve and assign the transport before starting the spawn_blocking (e.g., let
transport = repo.merkle_transport()?;), then move that transport into the
blocking closure and call transport.pack_nodes(&nodes,
PackOptions::LegacyClientPush, &mut sync_writer) there; apply the same change to
the other occurrence around lines 373-374 so transport initialization happens
prior to any streaming/setup.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 618af283-95c9-4830-9478-158f8a669083
📒 Files selected for processing (16)
crates/lib/src/api/client/internal_types.rscrates/lib/src/api/client/tree.rscrates/lib/src/api/client/workspaces/files.rscrates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rscrates/lib/src/config/repository_config.rscrates/lib/src/core/db/merkle_node/file_backend.rscrates/lib/src/core/v_latest/commits.rscrates/lib/src/core/v_latest/entries.rscrates/lib/src/core/v_latest/index/commit_merkle_tree.rscrates/lib/src/core/v_latest/merge.rscrates/lib/src/model/merkle_tree.rscrates/lib/src/model/merkle_tree/node/merkle_tree_node.rscrates/lib/src/model/repository/local_repository.rscrates/lib/src/repositories/commits/commit_writer.rscrates/lib/src/repositories/tree.rscrates/server/src/controllers/tree.rs
🚧 Files skipped from review as they are similar to previous changes (7)
- crates/lib/src/core/v_latest/merge.rs
- crates/lib/src/repositories/commits/commit_writer.rs
- crates/lib/src/core/v_latest/entries.rs
- crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs
- crates/lib/src/core/v_latest/commits.rs
- crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs
- crates/lib/src/repositories/tree.rs
8c85936 to
4cbfa4f
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
crates/lib/src/config/repository_config.rs (1)
48-49:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd serde default on
merkle_store_kindto keep oldconfig.tomlreadable.Line [49] adds a required field, but deserialization in
from_filewill fail for older repos where the key is missing unless this field has a serde default.Suggested patch
/// The type of Merkle tree node storage that the repository uses. + #[serde(default)] pub merkle_store_kind: MerkleStoreKind,#!/bin/bash # Verify whether repo fixtures/examples still have config.toml files without merkle_store_kind. # Expected: if any are reported, current deserialization is likely to fail for those files. fd -HI '^config\.toml$' | while read -r f; do if ! rg -n '^\s*merkle_store_kind\s*=' "$f" >/dev/null; then echo "Missing merkle_store_kind: $f" fi done🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/config/repository_config.rs` around lines 48 - 49, The new required field merkle_store_kind in the RepositoryConfig struct will break deserialization for old config.toml files; add a serde default so missing keys get a sensible value during deserialization by annotating the merkle_store_kind field with #[serde(default)] and implement or ensure a Default for MerkleStoreKind (or provide a function via #[serde(default = "default_merkle_store_kind")]) so RepositoryConfig::merkle_store_kind deserializes when the key is absent; update the MerkleStoreKind Default implementation or add a default function and run tests/fixtures to confirm old configs parse.
🧹 Nitpick comments (1)
crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs (1)
79-87: ⚡ Quick winReuse the already acquired store on Line 87.
You acquire
let store = repo.merkle_store()?;on Line 79; callingrepo.merkle_store()?again is unnecessary. Use the existing binding instead.♻️ Suggested change
- repo.merkle_store()?.get_children(hash) + store.get_children(hash)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs` around lines 79 - 87, The code fetches the merkle store once into the local binding store (let store = repo.merkle_store()?), but then calls repo.merkle_store()? again when returning children; change the final call to reuse the existing binding by calling store.get_children(hash) instead of repo.merkle_store()?.get_children(hash) so you avoid an extra repo lookup and keep the earlier error handling consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@crates/lib/src/config/repository_config.rs`:
- Around line 48-49: The new required field merkle_store_kind in the
RepositoryConfig struct will break deserialization for old config.toml files;
add a serde default so missing keys get a sensible value during deserialization
by annotating the merkle_store_kind field with #[serde(default)] and implement
or ensure a Default for MerkleStoreKind (or provide a function via
#[serde(default = "default_merkle_store_kind")]) so
RepositoryConfig::merkle_store_kind deserializes when the key is absent; update
the MerkleStoreKind Default implementation or add a default function and run
tests/fixtures to confirm old configs parse.
---
Nitpick comments:
In `@crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs`:
- Around line 79-87: The code fetches the merkle store once into the local
binding store (let store = repo.merkle_store()?), but then calls
repo.merkle_store()? again when returning children; change the final call to
reuse the existing binding by calling store.get_children(hash) instead of
repo.merkle_store()?.get_children(hash) so you avoid an extra repo lookup and
keep the earlier error handling consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b62122f8-b149-4ad7-965d-11f6c9878101
📒 Files selected for processing (16)
crates/lib/src/api/client/internal_types.rscrates/lib/src/api/client/tree.rscrates/lib/src/api/client/workspaces/files.rscrates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rscrates/lib/src/config/repository_config.rscrates/lib/src/core/db/merkle_node/file_backend.rscrates/lib/src/core/v_latest/commits.rscrates/lib/src/core/v_latest/entries.rscrates/lib/src/core/v_latest/index/commit_merkle_tree.rscrates/lib/src/core/v_latest/merge.rscrates/lib/src/model/merkle_tree.rscrates/lib/src/model/merkle_tree/node/merkle_tree_node.rscrates/lib/src/model/repository/local_repository.rscrates/lib/src/repositories/commits/commit_writer.rscrates/lib/src/repositories/tree.rscrates/server/src/controllers/tree.rs
✅ Files skipped from review due to trivial changes (2)
- crates/lib/src/model/merkle_tree.rs
- crates/lib/src/core/v_latest/index/commit_merkle_tree.rs
🚧 Files skipped from review as they are similar to previous changes (6)
- crates/lib/src/repositories/commits/commit_writer.rs
- crates/lib/src/api/client/tree.rs
- crates/lib/src/api/client/internal_types.rs
- crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs
- crates/lib/src/core/db/merkle_node/file_backend.rs
- crates/lib/src/model/repository/local_repository.rs
There was a problem hiding this comment.
♻️ Duplicate comments (1)
crates/lib/src/config/repository_config.rs (1)
48-49:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMissing serde default breaks backward-compatible config loading
Line 49 introduces a required deserialized field, but without
#[serde(default)]olderconfig.tomlfiles (withoutmerkle_store_kind) will fail infrom_file()at Line 79. That contradicts the PR’s compatibility goal.Suggested fix
/// The type of Merkle tree node storage that the repository uses. + #[serde(default)] pub merkle_store_kind: MerkleStoreKind,#!/bin/bash set -euo pipefail file="crates/lib/src/config/repository_config.rs" echo "Inspect RepositoryConfig field declaration:" nl -ba "$file" | sed -n '44,54p' echo echo "Check for serde(default) on fields in this file:" rg -n 'serde\(default' "$file" || true echo echo "Confirm deserialization path using toml::from_str:" rg -n 'toml::from_str' "$file"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/config/repository_config.rs` around lines 48 - 49, RepositoryConfig's new required field merkle_store_kind breaks deserialization of older configs; update the RepositoryConfig definition to add #[serde(default)] on the merkle_store_kind field and ensure MerkleStoreKind has a stable Default (either impl Default for MerkleStoreKind or provide a function like default_merkle_store_kind and use #[serde(default = "default_merkle_store_kind")]) so toml::from_str in the from_file() path can load configs that omit this field; keep the default value consistent with the repository's intended legacy behavior.
🧹 Nitpick comments (1)
crates/lib/src/model/merkle_tree.rs (1)
36-42: 💤 Low valueLGTM —
Debugsupertrait addition is correct and object-safe.
Debugonly requiresfn fmt(&self, ...), sodyn TransportableMerkleStoreremains object-safe and the vtable is well-formed. The transitivity (Arc<dyn TransportableMerkleStore>: Debug) that unblocksLocalRepository'sderive(Debug)follows correctly. The?Sized + Debugin the blanket impl is the standard Rust pattern for this use case.One note: if
TransportableMerkleStoreis ever re-exported and treated as a stable API, addingDebugas a supertrait is a semver-breaking change for downstream implementors. Sealing the trait (or documenting it as internal) would future-proof this.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/model/merkle_tree.rs` around lines 36 - 42, The new Debug supertrait on TransportableMerkleStore is object-safe but is a semver-breaking change for downstream implementors; to future-proof it, add the standard Rust "sealed" pattern: declare a private module-level trait (e.g., Sealed) and require TransportableMerkleStore: Sealed, then provide the crate-private blanket impl of Sealed for the same set of types (T: MerkleStore + MerkleTransport + ?Sized + Debug) so only internal types can implement TransportableMerkleStore; reference TransportableMerkleStore, MerkleStore, MerkleTransport, and LocalRepository when making this change and ensure the existing public blanket impl impl<T: MerkleStore + MerkleTransport + ?Sized + Debug> TransportableMerkleStore for T {} remains consistent with the sealing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@crates/lib/src/config/repository_config.rs`:
- Around line 48-49: RepositoryConfig's new required field merkle_store_kind
breaks deserialization of older configs; update the RepositoryConfig definition
to add #[serde(default)] on the merkle_store_kind field and ensure
MerkleStoreKind has a stable Default (either impl Default for MerkleStoreKind or
provide a function like default_merkle_store_kind and use #[serde(default =
"default_merkle_store_kind")]) so toml::from_str in the from_file() path can
load configs that omit this field; keep the default value consistent with the
repository's intended legacy behavior.
---
Nitpick comments:
In `@crates/lib/src/model/merkle_tree.rs`:
- Around line 36-42: The new Debug supertrait on TransportableMerkleStore is
object-safe but is a semver-breaking change for downstream implementors; to
future-proof it, add the standard Rust "sealed" pattern: declare a private
module-level trait (e.g., Sealed) and require TransportableMerkleStore: Sealed,
then provide the crate-private blanket impl of Sealed for the same set of types
(T: MerkleStore + MerkleTransport + ?Sized + Debug) so only internal types can
implement TransportableMerkleStore; reference TransportableMerkleStore,
MerkleStore, MerkleTransport, and LocalRepository when making this change and
ensure the existing public blanket impl impl<T: MerkleStore + MerkleTransport +
?Sized + Debug> TransportableMerkleStore for T {} remains consistent with the
sealing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 55c43f61-3256-4e91-bfdb-12ff3df5d727
📒 Files selected for processing (16)
crates/lib/src/api/client/internal_types.rscrates/lib/src/api/client/tree.rscrates/lib/src/api/client/workspaces/files.rscrates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rscrates/lib/src/config/repository_config.rscrates/lib/src/core/db/merkle_node/file_backend.rscrates/lib/src/core/v_latest/commits.rscrates/lib/src/core/v_latest/entries.rscrates/lib/src/core/v_latest/index/commit_merkle_tree.rscrates/lib/src/core/v_latest/merge.rscrates/lib/src/model/merkle_tree.rscrates/lib/src/model/merkle_tree/node/merkle_tree_node.rscrates/lib/src/model/repository/local_repository.rscrates/lib/src/repositories/commits/commit_writer.rscrates/lib/src/repositories/tree.rscrates/server/src/controllers/tree.rs
✅ Files skipped from review due to trivial changes (2)
- crates/lib/src/api/client/internal_types.rs
- crates/lib/src/core/db/merkle_node/file_backend.rs
🚧 Files skipped from review as they are similar to previous changes (9)
- crates/lib/src/core/v_latest/merge.rs
- crates/lib/src/core/v_latest/entries.rs
- crates/lib/src/repositories/commits/commit_writer.rs
- crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs
- crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs
- crates/lib/src/core/v_latest/commits.rs
- crates/lib/src/repositories/tree.rs
- crates/lib/src/core/v_latest/index/commit_merkle_tree.rs
- crates/lib/src/model/repository/local_repository.rs
66a369d to
5c4ad32
Compare
225cb0c to
7699775
Compare
5c4ad32 to
9e10a92
Compare
7699775 to
5507f6b
Compare
9e10a92 to
f57471e
Compare
5507f6b to
b6022bd
Compare
fd61e09 to
64e1254
Compare
b6022bd to
02d9373
Compare
64e1254 to
4886e86
Compare
02d9373 to
4bd68a4
Compare
4886e86 to
3d3b1fd
Compare
4bd68a4 to
851557c
Compare
3d3b1fd to
703036a
Compare
851557c to
60d4651
Compare
703036a to
656fb2a
Compare
9c92dd4 to
d5a2e13
Compare
656fb2a to
01fa5de
Compare
There was a problem hiding this comment.
Meta question: could we maybe get rid of the "vfs" feature? There's a lot of logic plumbed through to support it, but a few things make me wonder:
- It can only be set by manually editing the config toml
- All it really does is unpack the merkle tarball to the system
/tmpand then copy it over, rather than unpacking directly to the destination. It doesn't touch any other write paths, not even version file tarball unpacking. So it's a little hard to imagine it being load-bearing. - In #181 where it first showed up, it's framed as a bug fix, though the actual problem and why this would address it aren't really discussed in the PR. Could it have been an AI assumption that slipped through?
Update: I found a thread with a customer where setting this flag via oxen clone --vfs is claimed to have fixed a problem. But the customer also reported immediately running into another problem. 🤷🏻♂️ Sounds like we'd need to do some testing with NFS to really figure out if this is necessary or not. If it is, then we should probably aute-detect (or auto-fallback) to this behavior. If it isn't, it'd be nice to remove the logic. In any case, it's all stuff to figure out in the future.
d5a2e13 to
be546fc
Compare
01fa5de to
5a32743
Compare
be546fc to
e7956bc
Compare
5a32743 to
d13518d
Compare
0311615 to
6ad4dad
Compare
d13518d to
0250547
Compare
Adds a new `MerkleStoreKind` enum to the `RepositoryConfig`, which dictates the type of physical storage that the `LocalRepository` uses for storing and retrieving Merkle tree nodes. Defaults to the existing custom file format (aka `FileBackend`) so it is backwards compatible with all existing repositories. Makes the `TransportableMerkleStore` a field of the `LocalRepository` so that this is only loaded once at construction time and reused. Also added the `MerkleStoreKind` uses at load time as another field. Since there's `Serialization` & `Deserialization` derives for `LocalRepository`, the field has to be `Option` to provide a default value since it cannot be serialized. This leads to a brittle design where a `LocalRepository` can be made without being correctly initialized, hence the new `MerkleStoreNotInitialized` error variant for `OxenError`. A follow up refactor is necessary to remove these unnecessary serializaton derives and migrate the HTTP responses to only use `*View` structs (i.e. `WorkspaceView`, ParsedResourceView`, etc.) As a result of this initialization problem, `LocalRepository::merkle_store` has been changed to return a `Result.` It returns a `&dyn MerkleStore` as the borrow on the `LocalRepository` means we can return a borrow on the inner state without needing to increment the refcount. Additionally, uses of the `MerkleTransport` trait from the `LocalRepository` now go through a new method: `merkle_transport()`. This borrows the internal `merkle_store` `Arc` and coerces the fat pointer to a `&dyn MerkleTransport`. The original `merkle_store()` method has been changed to do the same thing, returning a `&dyn MerkleStore`. Finally, removes the `set_vfs` method. Merkle store backends need a consistent `vfs` value and allowing mutation to make this go out of sync. Any places that needed to use `set_vfs` have been updated to use a modified constructor that sets the right `vfs` value.
0250547 to
26ecf04
Compare
|
Good question on VFS. It needs to be plumbed through for the LMDB backend because LMDB doesn't support VFS. (So if it's set and a user tries to use that backend, it will error out at creation time). I think the whole feature should be reexamined and we should potentially redesign how we support and work with VFSes. The |
Adds a new
MerkleStoreKindenum to theRepositoryConfig, which dictatesthe type of physical storage that the
LocalRepositoryuses for storingand retrieving Merkle tree nodes. Defaults to the existing custom file format
(aka
FileBackend) so it is backwards compatible with all existing repositories.Makes the
TransportableMerkleStorea field of theLocalRepositoryso thatthis is only loaded once at construction time and reused. Also added the
MerkleStoreKinduses at load time as another field. Since there'sSerialization&Deserializationderives forLocalRepository, the fieldhas to be
Optionto provide a default value since it cannot be serialized.This leads to a brittle design where a
LocalRepositorycan be made withoutbeing correctly initialized, hence the new
MerkleStoreNotInitializederrorvariant for
OxenError. A follow up refactor is necessary to remove theseunnecessary serializaton derives and migrate the HTTP responses to only use
*Viewstructs (i.e.WorkspaceView, ParsedResourceView`, etc.)As a result of this initialization problem,
LocalRepository::merkle_storehas been changed to return a
Result.It returns a&dyn MerkleStoreasthe borrow on the
LocalRepositorymeans we can return a borrow on theinner state without needing to increment the refcount.
Additionally, uses of the
MerkleTransporttrait from theLocalRepositorynow go through a new method:
merkle_transport(). This borrows the internalmerkle_storeArcand coerces the fat pointer to a&dyn MerkleTransport.The original
merkle_store()method has been changed to do the same thing,returning a
&dyn MerkleStore.Finally, removes the
set_vfsmethod. Merkle store backends need a consistentvfsvalue and allowing mutation to make this go out of sync. Any placesthat needed to use
set_vfshave been updated to use a modified constructorthat sets the right
vfsvalue.