Skip to content

Add Merkle storage enum to RepositoryConfig + merkle_store field#526

Merged
malcolmgreaves merged 1 commit into
mainfrom
mg/config_merkle_store
May 15, 2026
Merged

Add Merkle storage enum to RepositoryConfig + merkle_store field#526
malcolmgreaves merged 1 commit into
mainfrom
mg/config_merkle_store

Conversation

@malcolmgreaves
Copy link
Copy Markdown
Collaborator

@malcolmgreaves malcolmgreaves commented May 6, 2026

Adds a new MerkleStoreKind enum to the RepositoryConfig, which dictates
the type of physical storage that the LocalRepository uses for storing
and retrieving Merkle tree nodes. Defaults to the existing custom file format
(aka FileBackend) so it is backwards compatible with all existing repositories.

Makes the TransportableMerkleStore a field of the LocalRepository so that
this is only loaded once at construction time and reused. Also added the
MerkleStoreKind uses at load time as another field. Since there's
Serialization & Deserialization derives for LocalRepository, the field
has to be Option to provide a default value since it cannot be serialized.
This leads to a brittle design where a LocalRepository can be made without
being correctly initialized, hence the new MerkleStoreNotInitialized error
variant for OxenError. A follow up refactor is necessary to remove these
unnecessary serializaton derives and migrate the HTTP responses to only use
*View structs (i.e. WorkspaceView, ParsedResourceView`, etc.)

As a result of this initialization problem, LocalRepository::merkle_store
has been changed to return a Result. It returns a &dyn MerkleStore as
the borrow on the LocalRepository means we can return a borrow on the
inner state without needing to increment the refcount.

Additionally, uses of the MerkleTransport trait from the LocalRepository
now go through a new method: merkle_transport(). This borrows the internal
merkle_store Arc and coerces the fat pointer to a &dyn MerkleTransport.
The original merkle_store() method has been changed to do the same thing,
returning a &dyn MerkleStore.

Finally, removes the set_vfs method. Merkle store backends need a consistent
vfs value and allowing mutation to make this go out of sync. Any places
that needed to use set_vfs have been updated to use a modified constructor
that sets the right vfs value.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Makes Merkle store/transport acquisition fallible across the codebase, adds a configurable MerkleStoreKind persisted in RepositoryConfig, refactors LocalRepository to initialize and cache the chosen backend, switches packing/unpacking to use the transport, and updates call sites and tests to propagate initialization/transport errors.

Changes

Merkle store acquisition & backend selection

Layer / File(s) Summary
Data Shape & Config
crates/lib/src/config/repository_config.rs
Adds public MerkleStoreKind enum (serde lowercase, File variant, Default) and utoipa::ToSchema.
Trait Bound
crates/lib/src/model/merkle_tree.rs
TransportableMerkleStore now requires Debug in addition to MerkleStore and MerkleTransport.
Repository Model
crates/lib/src/model/repository/local_repository.rs
LocalRepository gains merkle_store_kind and cached merkle_store; constructors call load_merkle_store to initialize backend choice. save() persists merkle_store_kind.
Accessors / API
crates/lib/src/model/repository/local_repository.rs, crates/lib/src/api/client/internal_types.rs
merkle_store(&self) and merkle_transport(&self) now return Result<&dyn ..., OxenError> (fallible borrowed access). LocalOrBase::Local changed to Box<LocalRepository>.
Core Call Sites
crates/lib/src/core/v_latest/*, crates/lib/src/repositories/*, crates/lib/src/model/merkle_tree/node/*
Many call sites change from infallible repo.merkle_store() to fallible repo.merkle_store()?, propagating initialization errors before sessions/operations (commits, merges, entries, index existence checks, node reads). create_initial_commit now also creates a per-commit dir_hash_db RocksDB and writes root dir hash.
Transport Usage & Client/Server Wiring
crates/lib/src/api/client/tree.rs, crates/server/src/controllers/tree.rs, crates/lib/src/repositories/tree.rs
Packing/unpacking now use repo.merkle_transport()? for pack_nodes, pack_all, and unpack, so transport acquisition errors propagate before packing/unpacking work.
Migrations & Tests
crates/lib/src/command/migrate/*.rs, crates/lib/src/core/db/merkle_node/file_backend.rs
Migration and many tests updated to acquire merkle store/transport fallibly (?) and to use transports for pack/unpack verification.
Workspace API
crates/lib/src/api/client/workspaces/files.rs
add_with_opts now constructs LocalOrBase::Local(Box::new(...)) when mapping expanded local paths.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as Client API
    participant Repo as LocalRepository
    participant Transport as MerkleTransport
    participant Store as MerkleStore
    participant Server

    Client->>API: request create/download nodes
    API->>Repo: need to pack/unpack nodes
    Repo->>Repo: merkle_transport()? (may error)
    alt transport OK
        Repo->>Transport: pack_nodes / pack_all / unpack
        Transport->>Store: read/write node blobs
        Transport-->>Repo: packed bytes / unpack result
        Repo-->>API: stream bytes (duplex)
        API-->>Client: respond
    else transport error
        Repo-->>API: return transport initialization error
        API-->>Client: error response
    end

    Note right of Server: Server uses same repo.merkle_transport()? path for downloads
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Oxen-AI/Oxen#634: Modifies the same create_nodes implementation in src/lib/src/api/client/tree.rs.
  • Oxen-AI/Oxen#617: Touches commit/merkle child-loading code in commit_merkle_tree.rs and merkle_tree_node.rs.
  • Oxen-AI/Oxen#283: Related to LocalOrBase::Local and workspace upload refactors.

Suggested reviewers

  • jcelliott
  • rpschoenburg

"🐇
I boxed the repo, I taught it to say,
‘If stores won't open, I’ll fail right away.’
Pack and unpack now knock on the door,
Errors ring true — no secrets anymore,
Hooray for clearer paths and play! 🥕"

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding a MerkleStoreKind enum to RepositoryConfig and introducing the merkle_store field to LocalRepository.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description accurately relates to the changeset by explaining the core additions (MerkleStoreKind enum, merkle_store field initialization, new merkle_transport method) and discusses architectural implications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch mg/config_merkle_store

Comment @coderabbitai help to get the list of available commands and usage tips.

@malcolmgreaves
Copy link
Copy Markdown
Collaborator Author

NOTE: Stacked PR. Must merge #516 before merging.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/lib/src/api/client/tree.rs (1)

78-85: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Resolve merkle_store() before starting the transfer.

merkle_store() can now fail locally, but in both paths that failure is discovered inside spawn_blocking after the upload/download has already been initiated. A MerkleStoreNotInitialized repo will still open the connection and can send or receive an empty/partial body before the join handle returns the real error. Hoisting the lookup ahead of the duplex/request setup would fail fast and avoid noisy remote-side unpack errors.

💡 Suggested shape
-    let repo = local_repo.clone();
+    let merkle_store = local_repo.merkle_store()?;
     let pack_handle = tokio::task::spawn_blocking(move || -> Result<(), OxenError> {
         let mut sync_writer = SyncIoBridge::new(async_writer);
-        repo.merkle_store()?
-            .pack_nodes(&nodes, PackOptions::LegacyClientPush, &mut sync_writer)?;
+        merkle_store.pack_nodes(&nodes, PackOptions::LegacyClientPush, &mut sync_writer)?;
         Ok(())
     });

Apply the same hoist in node_download_request.

Also applies to: 367-372

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lib/src/api/client/tree.rs` around lines 78 - 85, The code currently
calls repo.merkle_store() inside the closure passed to
tokio::task::spawn_blocking (used by pack_nodes with SyncIoBridge), which defers
repository-initialization errors until after the upload/download has started;
instead, call repo.merkle_store() synchronously before creating the
AsyncWriter/duplex/request and before spawn_blocking so any
MerkleStoreNotInitialized (or other) error fails fast; then pass the obtained
merkle_store (or the owned value it needs) into the closure and keep the
existing use of pack_nodes/PackOptions::LegacyClientPush and SyncIoBridge; make
the same change in node_download_request (the block around lines 367-372) so
merkle_store() is resolved prior to initiating network I/O.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/lib/src/config/repository_config.rs`:
- Around line 55-56: The config deserializer fails when older config.toml files
lack merkle_store_kind; add a serde default so missing keys use
MerkleStoreKind::default(). On the RepositoryConfig struct, annotate the
merkle_store_kind field with serde's default (e.g., #[serde(default)] or
#[serde(default = "MerkleStoreKind::default")]) so deserialization in
RepositoryConfig::from_file() falls back to MerkleStoreKind::default(); ensure
MerkleStoreKind implements Default if it doesn't already.

In `@crates/lib/src/core/v_latest/index/commit_merkle_tree.rs`:
- Around line 207-210: The repo currently swallows all errors from
CommitMerkleTreeLatest::read_from_path (in crates/lib/src/repositories/tree.rs
around the 214-220 handling) and returns Ok(None), which hides
initialization/errors like MerkleStoreNotInitialized; change the error handling
so that read_from_path's result is pattern-matched and only map a structured
NotFound variant to Ok(None) while propagating other errors (including
MerkleStoreNotInitialized) as Err(...) back to callers (so callers like
crates/server/src/controllers/tree.rs and get_file_by_path receive the real
error). Apply the same fix for the other read sites you noted (the blocks around
the ranges 232-235, 258-261, 282-285, 314-317, 343-346, 375-378) so only genuine
"not found" is converted to Ok(None) and all other errors are returned.

In `@crates/lib/src/model/repository/local_repository.rs`:
- Around line 127-131: The cached FileBackend captures is_vfs at creation which
can become stale when LocalRepository.set_vfs() flips VFS mode; update the code
so the backend is rebuilt when vfs changes (or remove is_vfs from the stored
backend). Specifically, modify LocalRepository (the place creating the
Box<FileBackend> for MerkleStoreKind::File) to either (a) check current
config.vfs in methods like unpack() and recreate the backend if it differs from
the cached backend's mode, or (b) stop storing is_vfs inside FileBackend and
derive vfs at call time from repository state; ensure set_vfs() invalidates or
replaces the cached store so subsequent calls to unpack() use the correct VFS
behavior.
- Around line 117-125: The code currently reads the on-disk config in the None
branch by calling RepositoryConfig::from_file; instead, when callers (e.g.,
load_merkle_store) expect defaults, avoid hitting disk and use an in-memory
default config. Replace the None branch that sets initialized_config =
RepositoryConfig::from_file(...) with creating a default RepositoryConfig (e.g.,
RepositoryConfig::default() or a RepositoryConfig::from_defaults() helper) and
return a reference to that, so RepositoryConfig::from_file is only used when an
explicit config is provided; update any tests if a helper constructor is added.
Ensure references to symbols: RepositoryConfig, RepositoryConfig::from_file,
initialized_config, load_merkle_store,
LocalRepository::new/new_from_version/from_view/from_remote are handled
accordingly.

---

Outside diff comments:
In `@crates/lib/src/api/client/tree.rs`:
- Around line 78-85: The code currently calls repo.merkle_store() inside the
closure passed to tokio::task::spawn_blocking (used by pack_nodes with
SyncIoBridge), which defers repository-initialization errors until after the
upload/download has started; instead, call repo.merkle_store() synchronously
before creating the AsyncWriter/duplex/request and before spawn_blocking so any
MerkleStoreNotInitialized (or other) error fails fast; then pass the obtained
merkle_store (or the owned value it needs) into the closure and keep the
existing use of pack_nodes/PackOptions::LegacyClientPush and SyncIoBridge; make
the same change in node_download_request (the block around lines 367-372) so
merkle_store() is resolved prior to initiating network I/O.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 09827646-91d7-4980-bd7f-697e8d3dbc03

📥 Commits

Reviewing files that changed from the base of the PR and between 225cb0c and 8d7491c.

📒 Files selected for processing (15)
  • crates/lib/src/api/client/tree.rs
  • crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs
  • crates/lib/src/config/repository_config.rs
  • crates/lib/src/core/db/merkle_node/file_backend.rs
  • crates/lib/src/core/v_latest/commits.rs
  • crates/lib/src/core/v_latest/entries.rs
  • crates/lib/src/core/v_latest/index/commit_merkle_tree.rs
  • crates/lib/src/core/v_latest/merge.rs
  • crates/lib/src/error.rs
  • crates/lib/src/model/merkle_tree.rs
  • crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs
  • crates/lib/src/model/repository/local_repository.rs
  • crates/lib/src/repositories/commits/commit_writer.rs
  • crates/lib/src/repositories/tree.rs
  • crates/server/src/controllers/tree.rs

Comment thread crates/lib/src/config/repository_config.rs
Comment thread crates/lib/src/core/v_latest/index/commit_merkle_tree.rs
Comment thread crates/lib/src/model/repository/local_repository.rs Outdated
Comment thread crates/lib/src/model/repository/local_repository.rs Outdated
@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch from 8d7491c to d70ecc5 Compare May 7, 2026 01:45
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
crates/lib/src/model/repository/local_repository.rs (2)

127-131: ⚠️ Potential issue | 🟠 Major

The cached FileBackend still snapshots vfs once.

is_vfs is captured here, but the repository now reuses this backend across calls. If vfs changes later, the cached store keeps the old mode and subsequent Merkle transport/store operations can take the wrong path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lib/src/model/repository/local_repository.rs` around lines 127 - 131,
The cached FileBackend captures is_vfs once (constructed via
config.vfs.unwrap_or(false)) but the repository reuses that backend across
calls, so changes to vfs later are ignored; update the caching logic in
local_repository.rs so the cache key includes the vfs mode (e.g., include
config.vfs.unwrap_or(false) when deciding to reuse the backend) or avoid reusing
a single Arc<FileBackend> by constructing a new FileBackend when the vfs flag
differs; ensure the creation site that currently matches MerkleStoreKind::File
uses the vfs-aware key or fresh backend rather than the existing cached
instance.

117-123: ⚠️ Potential issue | 🔴 Critical

Don't hit disk when the caller asked for default Merkle-store settings.

LocalRepository::new, new_from_version, from_view, and from_remote all pass None here. This branch now eagerly reads .oxen/config.toml, which breaks those constructors before the repo is initialized and contradicts the contract on Line 206.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lib/src/model/repository/local_repository.rs` around lines 117 - 123,
The code branch that handles config == None must not eagerly read
.oxen/config.toml; instead of calling
RepositoryConfig::from_file(util::fs::config_filepath(&repo_path)) create a
default RepositoryConfig in memory and use that (e.g. set initialized_config =
RepositoryConfig::default() or the repo's default merkle-store config) so
LocalRepository::new, new_from_version, from_view, and from_remote don't hit
disk; only call RepositoryConfig::from_file when the caller provides an explicit
file-backed config. Use the existing symbols (initialized_config, config,
repo_path, RepositoryConfig::from_file) to locate and replace the file-read
branch with a default-initialization branch.
crates/lib/src/core/v_latest/index/commit_merkle_tree.rs (1)

207-210: ⚠️ Potential issue | 🟠 Major

These new ? guards are still being flattened into Ok(None) higher up.

The existence check now correctly propagates MerkleStoreNotInitialized, but read_from_path_maybe() and the tree wrappers still coerce any error into “missing node.” That turns initialization failures back into false not-found results for callers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lib/src/core/v_latest/index/commit_merkle_tree.rs` around lines 207 -
210, The existence check now returns errors (e.g., MerkleStoreNotInitialized)
but higher-level callers (read_from_path_maybe and the tree wrapper methods)
still swallow any error and return Ok(None); change those callers so they no
longer coerce all errors into “missing node” — inspect the result of
repo.merkle_store()?.exists(hash)? in read_from_path_maybe and in the tree
wrapper functions and propagate Err(e) upward for non-not-found errors (or
specifically propagate MerkleStoreNotInitialized), only treating an explicit
NotFound result as Ok(None); update the error handling paths in
read_from_path_maybe and the tree wrapper callers to return the Err from
exists/read operations instead of coercing to Ok(None).
crates/lib/src/config/repository_config.rs (1)

55-56: ⚠️ Potential issue | 🟠 Major

Add a serde default here for older config.toml files.

This field is now required during deserialization, so repositories created before merkle_store_kind existed will fail to load. RepositoryConfig::from_file() should fall back to MerkleStoreKind::default() when the key is absent.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lib/src/config/repository_config.rs` around lines 55 - 56, Add a serde
default for the merkle_store_kind field so old config.toml files that omit it
will deserialize using MerkleStoreKind::default(); specifically, annotate the
merkle_store_kind field with #[serde(default)] in the RepositoryConfig struct
and ensure MerkleStoreKind implements Default (so Default::default() is used),
which will make RepositoryConfig::from_file() fall back to the default when the
key is absent.
🧹 Nitpick comments (1)
crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs (1)

79-87: ⚡ Quick win

Reuse the already acquired store in child lookup.

Line 87 reacquires the store even though store is already available from Line 79. Reusing it avoids redundant lookup and keeps the existence check/read on the same handle.

♻️ Proposed fix
-        repo.merkle_store()?.get_children(hash)
+        store.get_children(hash)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs` around lines 79 -
87, The code currently calls repo.merkle_store() twice; reuse the previously
obtained store variable instead of reacquiring it to avoid redundant lookups and
ensure the exists() check and subsequent read use the same handle—replace the
final repo.merkle_store()?.get_children(hash) call with store.get_children(hash)
and keep the prior exists call using store.exists(hash)? and the same hash
variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/lib/src/model/repository/local_repository.rs`:
- Around line 61-64: Add the serde default attribute to the LocalRepository
struct's merkle_store_kind field so older serialized payloads can deserialize
when that field is missing: annotate the merkle_store_kind field with
#[serde(default)] (MerkleStoreKind implements Default) so deserialization falls
back to MerkleStoreKind::File when the field is absent.

---

Duplicate comments:
In `@crates/lib/src/config/repository_config.rs`:
- Around line 55-56: Add a serde default for the merkle_store_kind field so old
config.toml files that omit it will deserialize using
MerkleStoreKind::default(); specifically, annotate the merkle_store_kind field
with #[serde(default)] in the RepositoryConfig struct and ensure MerkleStoreKind
implements Default (so Default::default() is used), which will make
RepositoryConfig::from_file() fall back to the default when the key is absent.

In `@crates/lib/src/core/v_latest/index/commit_merkle_tree.rs`:
- Around line 207-210: The existence check now returns errors (e.g.,
MerkleStoreNotInitialized) but higher-level callers (read_from_path_maybe and
the tree wrapper methods) still swallow any error and return Ok(None); change
those callers so they no longer coerce all errors into “missing node” — inspect
the result of repo.merkle_store()?.exists(hash)? in read_from_path_maybe and in
the tree wrapper functions and propagate Err(e) upward for non-not-found errors
(or specifically propagate MerkleStoreNotInitialized), only treating an explicit
NotFound result as Ok(None); update the error handling paths in
read_from_path_maybe and the tree wrapper callers to return the Err from
exists/read operations instead of coercing to Ok(None).

In `@crates/lib/src/model/repository/local_repository.rs`:
- Around line 127-131: The cached FileBackend captures is_vfs once (constructed
via config.vfs.unwrap_or(false)) but the repository reuses that backend across
calls, so changes to vfs later are ignored; update the caching logic in
local_repository.rs so the cache key includes the vfs mode (e.g., include
config.vfs.unwrap_or(false) when deciding to reuse the backend) or avoid reusing
a single Arc<FileBackend> by constructing a new FileBackend when the vfs flag
differs; ensure the creation site that currently matches MerkleStoreKind::File
uses the vfs-aware key or fresh backend rather than the existing cached
instance.
- Around line 117-123: The code branch that handles config == None must not
eagerly read .oxen/config.toml; instead of calling
RepositoryConfig::from_file(util::fs::config_filepath(&repo_path)) create a
default RepositoryConfig in memory and use that (e.g. set initialized_config =
RepositoryConfig::default() or the repo's default merkle-store config) so
LocalRepository::new, new_from_version, from_view, and from_remote don't hit
disk; only call RepositoryConfig::from_file when the caller provides an explicit
file-backed config. Use the existing symbols (initialized_config, config,
repo_path, RepositoryConfig::from_file) to locate and replace the file-read
branch with a default-initialization branch.

---

Nitpick comments:
In `@crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs`:
- Around line 79-87: The code currently calls repo.merkle_store() twice; reuse
the previously obtained store variable instead of reacquiring it to avoid
redundant lookups and ensure the exists() check and subsequent read use the same
handle—replace the final repo.merkle_store()?.get_children(hash) call with
store.get_children(hash) and keep the prior exists call using
store.exists(hash)? and the same hash variable.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1894f6db-3835-4507-8b06-d039917f8ea9

📥 Commits

Reviewing files that changed from the base of the PR and between 8d7491c and d70ecc5.

📒 Files selected for processing (15)
  • crates/lib/src/api/client/tree.rs
  • crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs
  • crates/lib/src/config/repository_config.rs
  • crates/lib/src/core/db/merkle_node/file_backend.rs
  • crates/lib/src/core/v_latest/commits.rs
  • crates/lib/src/core/v_latest/entries.rs
  • crates/lib/src/core/v_latest/index/commit_merkle_tree.rs
  • crates/lib/src/core/v_latest/merge.rs
  • crates/lib/src/error.rs
  • crates/lib/src/model/merkle_tree.rs
  • crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs
  • crates/lib/src/model/repository/local_repository.rs
  • crates/lib/src/repositories/commits/commit_writer.rs
  • crates/lib/src/repositories/tree.rs
  • crates/server/src/controllers/tree.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs
  • crates/lib/src/error.rs
  • crates/lib/src/core/db/merkle_node/file_backend.rs

Comment thread crates/lib/src/model/repository/local_repository.rs Outdated
@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch 3 times, most recently from f050933 to e927639 Compare May 7, 2026 02:13
Comment thread crates/lib/src/api/client/internal_types.rs
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/lib/src/api/client/tree.rs (1)

83-87: ⚡ Quick win

Fail fast on merkle transport initialization before starting stream/network work.

merkle_transport()? is resolved inside spawn_blocking, so a MerkleStoreNotInitialized error is raised only after request setup/streaming has already begun. Resolve transport first, then move it into the blocking closure to short-circuit earlier.

Proposed refactor
 pub async fn create_nodes(
     local_repo: &LocalRepository,
@@
 ) -> Result<(), OxenError> {
+    let transport = local_repo.merkle_transport()?;
     let n = nodes.len();
@@
-    let repo = local_repo.clone();
+    let transport_for_pack = Arc::clone(&transport);
     let pack_handle = tokio::task::spawn_blocking(move || -> Result<(), OxenError> {
         let mut sync_writer = SyncIoBridge::new(async_writer);
@@
-        repo.merkle_transport()?.pack_nodes(
+        transport_for_pack.pack_nodes(
             &nodes,
             PackOptions::LegacyClientPush,
             &mut sync_writer,
         )?;
         Ok(())
     });
@@
 async fn node_download_request(
     local_repo: &LocalRepository,
     url: impl AsRef<str>,
 ) -> Result<(), OxenError> {
+    let transport = local_repo.merkle_transport()?;
     let url = url.as_ref();
@@
-    let repo = local_repo.clone();
+    let transport_for_unpack = Arc::clone(&transport);
     tokio::task::spawn_blocking(move || -> Result<(), OxenError> {
@@
-        repo.merkle_transport()?
-            .unpack(&mut sync_reader, UnpackOptions::Overwrite)?;
+        transport_for_unpack.unpack(&mut sync_reader, UnpackOptions::Overwrite)?;
         Ok(())
     })

Also applies to: 373-374

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lib/src/api/client/tree.rs` around lines 83 - 87, The code currently
calls repo.merkle_transport()? inside the closure spawned with spawn_blocking,
which delays raising MerkleStoreNotInitialized until after stream/network work
begins; instead resolve and assign the transport before starting the
spawn_blocking (e.g., let transport = repo.merkle_transport()?;), then move that
transport into the blocking closure and call transport.pack_nodes(&nodes,
PackOptions::LegacyClientPush, &mut sync_writer) there; apply the same change to
the other occurrence around lines 373-374 so transport initialization happens
prior to any streaming/setup.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/lib/src/api/client/tree.rs`:
- Around line 83-87: The code currently calls repo.merkle_transport()? inside
the closure spawned with spawn_blocking, which delays raising
MerkleStoreNotInitialized until after stream/network work begins; instead
resolve and assign the transport before starting the spawn_blocking (e.g., let
transport = repo.merkle_transport()?;), then move that transport into the
blocking closure and call transport.pack_nodes(&nodes,
PackOptions::LegacyClientPush, &mut sync_writer) there; apply the same change to
the other occurrence around lines 373-374 so transport initialization happens
prior to any streaming/setup.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 618af283-95c9-4830-9478-158f8a669083

📥 Commits

Reviewing files that changed from the base of the PR and between d70ecc5 and e927639.

📒 Files selected for processing (16)
  • crates/lib/src/api/client/internal_types.rs
  • crates/lib/src/api/client/tree.rs
  • crates/lib/src/api/client/workspaces/files.rs
  • crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs
  • crates/lib/src/config/repository_config.rs
  • crates/lib/src/core/db/merkle_node/file_backend.rs
  • crates/lib/src/core/v_latest/commits.rs
  • crates/lib/src/core/v_latest/entries.rs
  • crates/lib/src/core/v_latest/index/commit_merkle_tree.rs
  • crates/lib/src/core/v_latest/merge.rs
  • crates/lib/src/model/merkle_tree.rs
  • crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs
  • crates/lib/src/model/repository/local_repository.rs
  • crates/lib/src/repositories/commits/commit_writer.rs
  • crates/lib/src/repositories/tree.rs
  • crates/server/src/controllers/tree.rs
🚧 Files skipped from review as they are similar to previous changes (7)
  • crates/lib/src/core/v_latest/merge.rs
  • crates/lib/src/repositories/commits/commit_writer.rs
  • crates/lib/src/core/v_latest/entries.rs
  • crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs
  • crates/lib/src/core/v_latest/commits.rs
  • crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs
  • crates/lib/src/repositories/tree.rs

@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch 2 times, most recently from 8c85936 to 4cbfa4f Compare May 7, 2026 04:20
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
crates/lib/src/config/repository_config.rs (1)

48-49: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add serde default on merkle_store_kind to keep old config.toml readable.

Line [49] adds a required field, but deserialization in from_file will fail for older repos where the key is missing unless this field has a serde default.

Suggested patch
     /// The type of Merkle tree node storage that the repository uses.
+    #[serde(default)]
     pub merkle_store_kind: MerkleStoreKind,
#!/bin/bash
# Verify whether repo fixtures/examples still have config.toml files without merkle_store_kind.
# Expected: if any are reported, current deserialization is likely to fail for those files.

fd -HI '^config\.toml$' | while read -r f; do
  if ! rg -n '^\s*merkle_store_kind\s*=' "$f" >/dev/null; then
    echo "Missing merkle_store_kind: $f"
  fi
done
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lib/src/config/repository_config.rs` around lines 48 - 49, The new
required field merkle_store_kind in the RepositoryConfig struct will break
deserialization for old config.toml files; add a serde default so missing keys
get a sensible value during deserialization by annotating the merkle_store_kind
field with #[serde(default)] and implement or ensure a Default for
MerkleStoreKind (or provide a function via #[serde(default =
"default_merkle_store_kind")]) so RepositoryConfig::merkle_store_kind
deserializes when the key is absent; update the MerkleStoreKind Default
implementation or add a default function and run tests/fixtures to confirm old
configs parse.
🧹 Nitpick comments (1)
crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs (1)

79-87: ⚡ Quick win

Reuse the already acquired store on Line 87.

You acquire let store = repo.merkle_store()?; on Line 79; calling repo.merkle_store()? again is unnecessary. Use the existing binding instead.

♻️ Suggested change
-        repo.merkle_store()?.get_children(hash)
+        store.get_children(hash)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs` around lines 79 -
87, The code fetches the merkle store once into the local binding store (let
store = repo.merkle_store()?), but then calls repo.merkle_store()? again when
returning children; change the final call to reuse the existing binding by
calling store.get_children(hash) instead of
repo.merkle_store()?.get_children(hash) so you avoid an extra repo lookup and
keep the earlier error handling consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@crates/lib/src/config/repository_config.rs`:
- Around line 48-49: The new required field merkle_store_kind in the
RepositoryConfig struct will break deserialization for old config.toml files;
add a serde default so missing keys get a sensible value during deserialization
by annotating the merkle_store_kind field with #[serde(default)] and implement
or ensure a Default for MerkleStoreKind (or provide a function via
#[serde(default = "default_merkle_store_kind")]) so
RepositoryConfig::merkle_store_kind deserializes when the key is absent; update
the MerkleStoreKind Default implementation or add a default function and run
tests/fixtures to confirm old configs parse.

---

Nitpick comments:
In `@crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs`:
- Around line 79-87: The code fetches the merkle store once into the local
binding store (let store = repo.merkle_store()?), but then calls
repo.merkle_store()? again when returning children; change the final call to
reuse the existing binding by calling store.get_children(hash) instead of
repo.merkle_store()?.get_children(hash) so you avoid an extra repo lookup and
keep the earlier error handling consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b62122f8-b149-4ad7-965d-11f6c9878101

📥 Commits

Reviewing files that changed from the base of the PR and between e927639 and 8c85936.

📒 Files selected for processing (16)
  • crates/lib/src/api/client/internal_types.rs
  • crates/lib/src/api/client/tree.rs
  • crates/lib/src/api/client/workspaces/files.rs
  • crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs
  • crates/lib/src/config/repository_config.rs
  • crates/lib/src/core/db/merkle_node/file_backend.rs
  • crates/lib/src/core/v_latest/commits.rs
  • crates/lib/src/core/v_latest/entries.rs
  • crates/lib/src/core/v_latest/index/commit_merkle_tree.rs
  • crates/lib/src/core/v_latest/merge.rs
  • crates/lib/src/model/merkle_tree.rs
  • crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs
  • crates/lib/src/model/repository/local_repository.rs
  • crates/lib/src/repositories/commits/commit_writer.rs
  • crates/lib/src/repositories/tree.rs
  • crates/server/src/controllers/tree.rs
✅ Files skipped from review due to trivial changes (2)
  • crates/lib/src/model/merkle_tree.rs
  • crates/lib/src/core/v_latest/index/commit_merkle_tree.rs
🚧 Files skipped from review as they are similar to previous changes (6)
  • crates/lib/src/repositories/commits/commit_writer.rs
  • crates/lib/src/api/client/tree.rs
  • crates/lib/src/api/client/internal_types.rs
  • crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs
  • crates/lib/src/core/db/merkle_node/file_backend.rs
  • crates/lib/src/model/repository/local_repository.rs

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
crates/lib/src/config/repository_config.rs (1)

48-49: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing serde default breaks backward-compatible config loading

Line 49 introduces a required deserialized field, but without #[serde(default)] older config.toml files (without merkle_store_kind) will fail in from_file() at Line 79. That contradicts the PR’s compatibility goal.

Suggested fix
     /// The type of Merkle tree node storage that the repository uses.
+    #[serde(default)]
     pub merkle_store_kind: MerkleStoreKind,
#!/bin/bash
set -euo pipefail

file="crates/lib/src/config/repository_config.rs"

echo "Inspect RepositoryConfig field declaration:"
nl -ba "$file" | sed -n '44,54p'

echo
echo "Check for serde(default) on fields in this file:"
rg -n 'serde\(default' "$file" || true

echo
echo "Confirm deserialization path using toml::from_str:"
rg -n 'toml::from_str' "$file"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lib/src/config/repository_config.rs` around lines 48 - 49,
RepositoryConfig's new required field merkle_store_kind breaks deserialization
of older configs; update the RepositoryConfig definition to add
#[serde(default)] on the merkle_store_kind field and ensure MerkleStoreKind has
a stable Default (either impl Default for MerkleStoreKind or provide a function
like default_merkle_store_kind and use #[serde(default =
"default_merkle_store_kind")]) so toml::from_str in the from_file() path can
load configs that omit this field; keep the default value consistent with the
repository's intended legacy behavior.
🧹 Nitpick comments (1)
crates/lib/src/model/merkle_tree.rs (1)

36-42: 💤 Low value

LGTM — Debug supertrait addition is correct and object-safe.

Debug only requires fn fmt(&self, ...), so dyn TransportableMerkleStore remains object-safe and the vtable is well-formed. The transitivity (Arc<dyn TransportableMerkleStore>: Debug) that unblocks LocalRepository's derive(Debug) follows correctly. The ?Sized + Debug in the blanket impl is the standard Rust pattern for this use case.

One note: if TransportableMerkleStore is ever re-exported and treated as a stable API, adding Debug as a supertrait is a semver-breaking change for downstream implementors. Sealing the trait (or documenting it as internal) would future-proof this.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lib/src/model/merkle_tree.rs` around lines 36 - 42, The new Debug
supertrait on TransportableMerkleStore is object-safe but is a semver-breaking
change for downstream implementors; to future-proof it, add the standard Rust
"sealed" pattern: declare a private module-level trait (e.g., Sealed) and
require TransportableMerkleStore: Sealed, then provide the crate-private blanket
impl of Sealed for the same set of types (T: MerkleStore + MerkleTransport +
?Sized + Debug) so only internal types can implement TransportableMerkleStore;
reference TransportableMerkleStore, MerkleStore, MerkleTransport, and
LocalRepository when making this change and ensure the existing public blanket
impl impl<T: MerkleStore + MerkleTransport + ?Sized + Debug>
TransportableMerkleStore for T {} remains consistent with the sealing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@crates/lib/src/config/repository_config.rs`:
- Around line 48-49: RepositoryConfig's new required field merkle_store_kind
breaks deserialization of older configs; update the RepositoryConfig definition
to add #[serde(default)] on the merkle_store_kind field and ensure
MerkleStoreKind has a stable Default (either impl Default for MerkleStoreKind or
provide a function like default_merkle_store_kind and use #[serde(default =
"default_merkle_store_kind")]) so toml::from_str in the from_file() path can
load configs that omit this field; keep the default value consistent with the
repository's intended legacy behavior.

---

Nitpick comments:
In `@crates/lib/src/model/merkle_tree.rs`:
- Around line 36-42: The new Debug supertrait on TransportableMerkleStore is
object-safe but is a semver-breaking change for downstream implementors; to
future-proof it, add the standard Rust "sealed" pattern: declare a private
module-level trait (e.g., Sealed) and require TransportableMerkleStore: Sealed,
then provide the crate-private blanket impl of Sealed for the same set of types
(T: MerkleStore + MerkleTransport + ?Sized + Debug) so only internal types can
implement TransportableMerkleStore; reference TransportableMerkleStore,
MerkleStore, MerkleTransport, and LocalRepository when making this change and
ensure the existing public blanket impl impl<T: MerkleStore + MerkleTransport +
?Sized + Debug> TransportableMerkleStore for T {} remains consistent with the
sealing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 55c43f61-3256-4e91-bfdb-12ff3df5d727

📥 Commits

Reviewing files that changed from the base of the PR and between 8c85936 and 4cbfa4f.

📒 Files selected for processing (16)
  • crates/lib/src/api/client/internal_types.rs
  • crates/lib/src/api/client/tree.rs
  • crates/lib/src/api/client/workspaces/files.rs
  • crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs
  • crates/lib/src/config/repository_config.rs
  • crates/lib/src/core/db/merkle_node/file_backend.rs
  • crates/lib/src/core/v_latest/commits.rs
  • crates/lib/src/core/v_latest/entries.rs
  • crates/lib/src/core/v_latest/index/commit_merkle_tree.rs
  • crates/lib/src/core/v_latest/merge.rs
  • crates/lib/src/model/merkle_tree.rs
  • crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs
  • crates/lib/src/model/repository/local_repository.rs
  • crates/lib/src/repositories/commits/commit_writer.rs
  • crates/lib/src/repositories/tree.rs
  • crates/server/src/controllers/tree.rs
✅ Files skipped from review due to trivial changes (2)
  • crates/lib/src/api/client/internal_types.rs
  • crates/lib/src/core/db/merkle_node/file_backend.rs
🚧 Files skipped from review as they are similar to previous changes (9)
  • crates/lib/src/core/v_latest/merge.rs
  • crates/lib/src/core/v_latest/entries.rs
  • crates/lib/src/repositories/commits/commit_writer.rs
  • crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs
  • crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs
  • crates/lib/src/core/v_latest/commits.rs
  • crates/lib/src/repositories/tree.rs
  • crates/lib/src/core/v_latest/index/commit_merkle_tree.rs
  • crates/lib/src/model/repository/local_repository.rs

@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch 2 times, most recently from 66a369d to 5c4ad32 Compare May 7, 2026 05:44
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_pack_refactor branch from 225cb0c to 7699775 Compare May 7, 2026 19:25
@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch from 5c4ad32 to 9e10a92 Compare May 7, 2026 19:25
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_pack_refactor branch from 7699775 to 5507f6b Compare May 7, 2026 20:34
@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch from 9e10a92 to f57471e Compare May 7, 2026 20:35
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_pack_refactor branch from 5507f6b to b6022bd Compare May 7, 2026 20:41
@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch 2 times, most recently from fd61e09 to 64e1254 Compare May 7, 2026 21:10
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_pack_refactor branch from b6022bd to 02d9373 Compare May 7, 2026 21:42
@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch from 64e1254 to 4886e86 Compare May 7, 2026 21:43
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_pack_refactor branch from 02d9373 to 4bd68a4 Compare May 7, 2026 23:37
@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch from 4886e86 to 3d3b1fd Compare May 7, 2026 23:39
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_pack_refactor branch from 4bd68a4 to 851557c Compare May 12, 2026 23:15
@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch from 3d3b1fd to 703036a Compare May 12, 2026 23:15
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_pack_refactor branch from 851557c to 60d4651 Compare May 13, 2026 20:52
@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch from 703036a to 656fb2a Compare May 13, 2026 20:53
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_pack_refactor branch 3 times, most recently from 9c92dd4 to d5a2e13 Compare May 14, 2026 01:39
@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch from 656fb2a to 01fa5de Compare May 14, 2026 01:41
Copy link
Copy Markdown
Contributor

@CleanCut CleanCut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meta question: could we maybe get rid of the "vfs" feature? There's a lot of logic plumbed through to support it, but a few things make me wonder:

  • It can only be set by manually editing the config toml
  • All it really does is unpack the merkle tarball to the system /tmp and then copy it over, rather than unpacking directly to the destination. It doesn't touch any other write paths, not even version file tarball unpacking. So it's a little hard to imagine it being load-bearing.
  • In #181 where it first showed up, it's framed as a bug fix, though the actual problem and why this would address it aren't really discussed in the PR. Could it have been an AI assumption that slipped through?

Update: I found a thread with a customer where setting this flag via oxen clone --vfs is claimed to have fixed a problem. But the customer also reported immediately running into another problem. 🤷🏻‍♂️ Sounds like we'd need to do some testing with NFS to really figure out if this is necessary or not. If it is, then we should probably aute-detect (or auto-fallback) to this behavior. If it isn't, it'd be nice to remove the logic. In any case, it's all stuff to figure out in the future.

Comment thread crates/lib/src/model/repository/local_repository.rs Outdated
Comment thread crates/lib/src/model/repository/local_repository.rs Outdated
Comment thread crates/lib/src/config/repository_config.rs
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_pack_refactor branch from d5a2e13 to be546fc Compare May 14, 2026 19:20
@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch from 01fa5de to 5a32743 Compare May 14, 2026 19:20
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_pack_refactor branch from be546fc to e7956bc Compare May 14, 2026 22:05
@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch from 5a32743 to d13518d Compare May 14, 2026 22:05
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_pack_refactor branch 5 times, most recently from 0311615 to 6ad4dad Compare May 15, 2026 01:28
Base automatically changed from mg/merkle_dyn_pack_refactor to main May 15, 2026 01:39
@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch from d13518d to 0250547 Compare May 15, 2026 02:13
Adds a new `MerkleStoreKind` enum to the `RepositoryConfig`, which dictates
the type of physical storage that the `LocalRepository` uses for storing
and retrieving Merkle tree nodes. Defaults to the existing custom file format
(aka `FileBackend`) so it is backwards compatible with all existing repositories.

Makes the `TransportableMerkleStore` a field of the `LocalRepository` so that
this is only loaded once at construction time and reused. Also added the
`MerkleStoreKind` uses at load time as another field. Since there's
`Serialization` & `Deserialization` derives for `LocalRepository`, the field
has to be `Option` to provide a default value since it cannot be serialized.
This leads to a brittle design where a `LocalRepository` can be made without
being correctly initialized, hence the new `MerkleStoreNotInitialized` error
variant for `OxenError`. A follow up refactor is necessary to remove these
unnecessary serializaton derives and migrate the HTTP responses to only use
`*View` structs (i.e. `WorkspaceView`, ParsedResourceView`, etc.)

As a result of this initialization problem, `LocalRepository::merkle_store`
has been changed to return a `Result.` It returns a `&dyn MerkleStore` as
the borrow on the `LocalRepository` means we can return a borrow on the
inner state without needing to increment the refcount.

Additionally, uses of the `MerkleTransport` trait from the `LocalRepository`
now go through a new method: `merkle_transport()`. This borrows the internal
`merkle_store` `Arc` and coerces the fat pointer to a `&dyn MerkleTransport`.
The original `merkle_store()` method has been changed to do the same thing,
returning a `&dyn MerkleStore`.

Finally, removes the `set_vfs` method. Merkle store backends need a consistent
`vfs` value and allowing mutation to make this go out of sync. Any places
that needed to use `set_vfs` have been updated to use a modified constructor
that sets the right `vfs` value.
@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch from 0250547 to 26ecf04 Compare May 15, 2026 03:14
@malcolmgreaves
Copy link
Copy Markdown
Collaborator Author

Good question on VFS. It needs to be plumbed through for the LMDB backend because LMDB doesn't support VFS. (So if it's set and a user tries to use that backend, it will error out at creation time).

I think the whole feature should be reexamined and we should potentially redesign how we support and work with VFSes. The vfs config option's history is a bit organic and it's probably time to take another look from first principles on what problems we're trying to solve with it.

@malcolmgreaves malcolmgreaves enabled auto-merge (squash) May 15, 2026 03:20
@malcolmgreaves malcolmgreaves merged commit 62ad7ff into main May 15, 2026
16 of 25 checks passed
@malcolmgreaves malcolmgreaves deleted the mg/config_merkle_store branch May 15, 2026 03:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants