Draft
38 commits
890751a
chore(`gettxoutsetinfo`): start writing common types
dorianvp Oct 8, 2025
26e4b4e
chore(`gettxoutsetinfo`): initial impl of `get_txout_set`
dorianvp Oct 8, 2025
a1ba9cb
docs(`gettxoutsetinfo`): draft utxo set spec
dorianvp Oct 9, 2025
ab42473
chore(`gettxoutsetinfo`): initial impl of `ZAINO-UHS-01`
dorianvp Oct 11, 2025
c91da4e
chore(`gettxoutsetinfo`): add doc comments
dorianvp Oct 12, 2025
0abc45f
chore(`gettxoutsetinfo`): use `utxoset_hash_v1` in `StateServiceSubsc…
dorianvp Oct 12, 2025
7679169
chore(`gettxoutsetinfo`): add doc comments
dorianvp Oct 13, 2025
49499bc
test(`gettxoutsetinfo`): add `fetch_service` test
dorianvp Oct 13, 2025
d170a82
test(`gettxoutsetinfo`): add top-level comment
dorianvp Oct 13, 2025
3978e6b
test(`gettxoutsetinfo`): add top-level comment
dorianvp Oct 13, 2025
582f457
test(`gettxoutsetinfo`): add `state_service_get_txout_set_info`
dorianvp Oct 13, 2025
489c265
chore(`gettxoutsetinfo`): run clippy
dorianvp Oct 13, 2025
986170b
chore(`gettxoutsetinfo`): enable endpoint & add tests
dorianvp Oct 13, 2025
d12c3a6
test(`gettxoutsetinfo`): add `uhs` tests
dorianvp Oct 14, 2025
e4da8d6
chore(`gettxoutsetinfo`): comments
dorianvp Oct 14, 2025
5b5ca76
chore(`gettxoutsetinfo`): typed network enum
dorianvp Oct 14, 2025
77c972a
chore(`gettxoutsetinfo`): re-export `zaino_common::Network`
dorianvp Oct 14, 2025
9e46a43
chore(`gettxoutsetinfo`): address todos
dorianvp Oct 14, 2025
6855bda
chore(`gettxoutsetinfo`): add `byte_order_tests`
dorianvp Oct 14, 2025
012e866
chore(`gettxoutsetinfo`): add `utxo_serialized_size`
dorianvp Oct 14, 2025
fd62e0d
chore(`gettxoutsetinfo`): fix last todo
dorianvp Oct 14, 2025
1aabeb4
add references for CompactSize
zancas Oct 15, 2025
5314817
make reference explicit
zancas Oct 15, 2025
90a869e
chore(`gettxoutsetinfo`): small spec corrections
dorianvp Oct 16, 2025
8bd3748
add header and terminology, propose refinement to abstract
zancas Oct 17, 2025
ac5338a
ZI-ng-P: 0
zancas Oct 17, 2025
bdd2721
start moving sections to more closely align with https://github.com/z…
zancas Oct 17, 2025
a5f0fa2
reference in references, BCP 14
zancas Oct 17, 2025
25d6f8f
fix space in footnote
zancas Oct 17, 2025
dd64863
fix footnote
zancas Oct 17, 2025
35b61d7
futz with terminology
zancas Oct 17, 2025
b3bca67
bold consensus network
zancas Oct 17, 2025
cac6402
make rpc name part of Title
zancas Oct 17, 2025
6694358
docs(`gettxoutsetinfo`): use `network` instead of `consensus network`
dorianvp Oct 17, 2025
9c17d9f
docs(`gettxoutsetinfo`): replace `network` with `genesis_block_hash`
dorianvp Oct 18, 2025
aee7c32
chore(`gettxoutsetinfo`): remove `BlockHash`, update spec & impl
dorianvp Oct 19, 2025
b8fa6a1
fix: rebase gettxoutsetinfo onto dev and resolve dependency conflicts
zancas Apr 14, 2026
6bd542e
Merge branch 'dev' into feat/rpc-gettxoutsetinfo
zancas Apr 20, 2026
38 changes: 31 additions & 7 deletions Cargo.lock


3 changes: 3 additions & 0 deletions docs/json_rpc/gettxoutsetinfo.md
@@ -0,0 +1,3 @@
# `gettxoutsetinfo`

See [Canonical UTXO Set Snapshot Hash (ZAINO-UTXOSET-01)](./gettxoutsetinfo/canonical_utxo_set_snapshot_hash.md) for details on how the UTXO set hash is computed.
168 changes: 168 additions & 0 deletions docs/json_rpc/gettxoutsetinfo/canonical_utxo_set_snapshot_hash.md
@@ -0,0 +1,168 @@
Title: ZAINO-UTXOSET-01 Canonical UTXO Set Snapshot Hash (v1)
Owners: dorianvp <dorianvp@zingolabs.org>
Za Wil <zancas@zingolabs.org>
Status: Draft
Category: Lightclients
Created: 2025-10-16
License: MIT

## Terminology

- The key words **MUST**, **MUST NOT**, **SHOULD**, and **MAY** are to be interpreted as described in BCP 14 [^BCP14] when, and only when, they appear in all capitals.
- Integers are encoded **little-endian** unless otherwise stated.
- “CompactSize” refers to the variable-length integer format [specified for Bitcoin](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer) and [implemented in Zcash's `zcash_encoding` crate](https://docs.rs/zcash_encoding/0.3.0/zcash_encoding/struct.CompactSize.html).
- `BLAKE3` denotes the 32-byte output of the BLAKE3 hash function.
- This specification defines **version 1** (“V1”) of the ZAINO UTXO snapshot.

## Abstract

This document specifies a deterministic, versioned procedure to compute a 32-byte hash of a node’s UTXO set at a specified best block. The snapshot uses a canonical ordering and serialization and is hashed under a domain tag.

Among other uses, the snapshot hash can be used to:

- Verify that two nodes at the same best block have the same UTXO set across implementations and versions.
- Pin failing test fixtures to a snapshot hash to reproduce issues.
- Log periodic hashes to show continuity of state over time.

The hash is _not_ input to consensus validation.

## Motivation

Different nodes (e.g., `zcashd`, Zebra, indexers) may expose distinct internals or storage layouts. Operators often need a cheap way to verify “we’re looking at the same unspent set” without transporting the entire set. A canonical, versioned snapshot hash solves this.

I see that zcashd had this method returning a hash_serialized already, but, why is it not enough to check that the block hashes match?


## Domain Separation

Implementations **MUST** domain-separate the hash with the ASCII header:

```
"ZAINO-UTXOSET-V1\0"
```

Any change to the encoding rules or semantics **MUST** bump the domain string (e.g., `…-V2\0`) and is out of scope of this document.

## Inputs

To compute the snapshot hash, the implementation needs:

Why include anything other than the UTXOs as inputs in the snapshot hash? Shouldn't we already know that we're looking at the same UTXO set if the best block hashes match?


- `genesis_block_hash`: 32-byte hash that uniquely identifies the chain.
- `best_height`: the height of the best block at the time of the snapshot (unsigned 32-bit).
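- `best_block`: the 32-byte block hash of the best chain tip, in the node’s _canonical internal byte order_ (the order in which the hash bytes are serialized, which is the reverse of the usual RPC hex display).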
Member

Can we have a reference for the canonical internal byte order?

- `UTXO set`: a finite map from outpoints `(txid, vout)` to outputs `(value_zat, scriptPubKey)` (each outpoint appears at most once), where:

- `txid` is a 32-byte transaction hash (internal byte order).
Member

Same as above?

- `vout` is a 32-bit output index (0-based).
Member Author

@dorianvp Oct 18, 2025

Leaving a note here:

If we serialize per unspent as txid || value || script and a transaction contains two outputs with identical (value, script), then two different UTXO sets that differ only by which index is unspent will serialize to the same bytes (and hash).

Contributor

vout is a misleading name for this; I would call it output_index because vout is generally used to refer to the vector of outputs of a transaction; I was confused by this name before I got to this line.

- `value_zat` is a non-negative amount in zatoshis, range-checked to the node’s monetary bounds (e.g., `0 ≤ value_zat ≤ MAX_MONEY`).
- `scriptPubKey` is a byte string.

Implementations **MUST** reject negative values or out-of-range amounts prior to hashing.
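The range check can be sketched in Rust as follows. This assumes the node hands over amounts as signed 64-bit integers and uses Zcash's `MAX_MONEY` of 21 million ZEC; `checked_value_zat` and `AmountError` are hypothetical names, not part of Zaino's API.

```rust
/// Assumed monetary bound: 21 million ZEC expressed in zatoshis (10^8 per ZEC).
const MAX_MONEY: i64 = 21_000_000 * 100_000_000;

/// Why an amount was rejected before hashing (hypothetical error type).
#[derive(Debug, PartialEq, Eq)]
enum AmountError {
    Negative(i64),
    AboveMaxMoney(i64),
}

/// Range-checks a signed amount reported by a node and converts it into the
/// unsigned `value_zat` that the snapshot serializes as `u64`.
fn checked_value_zat(value: i64) -> Result<u64, AmountError> {
    if value < 0 {
        Err(AmountError::Negative(value))
    } else if value > MAX_MONEY {
        Err(AmountError::AboveMaxMoney(value))
    } else {
        // Safe cast: 0 <= value <= MAX_MONEY fits in u64.
        Ok(value as u64)
    }
}
```

Converting to `u64` only after the check means the serializer never has to reason about signed values.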

## Canonical Ordering
zancas marked this conversation as resolved.

The snapshot **MUST** be ordered as follows, independent of the node’s in-memory layout:

1. Sort by `txid` ascending, comparing the raw 32-byte values as unsigned bytes.
Contributor

This seems like a bad serialization, because it requires recomputation over the entire UTXO set whenever a new block is received. The UTXO set can be very large; it would be much better to choose a snapshot protocol where snapshot hashes can incrementally build on the snapshot hash prior to the addition of a new block.

Contributor

If the UTXO set were stored in a B-tree data structure that internally kept Merkle hashes at the nodes, then it might be okay to use the Merkle root of that data structure for the snapshot identifier. It would need to be the case that the fanout of the B-tree and the insertion semantics were well-specified to ensure that everyone uses the same hashing approach.

One possibility that would allow for this to work as-specified would be to use a separate B-tree (implementing a set, rather than a map) for producing the hashes; since the txid commits to the effects of each transaction, one could build the snapshot identifier alongside the actual data, but building that identifier in parallel would have a risk of data inconsistencies with the primary store.

In general, I feel like the UTXO set would be best represented as a persistent data structure with good amortized append costs.

@arya2 Oct 28, 2025

> The UTXO set can be very large; it would be much better to choose a snapshot protocol where snapshot hashes can incrementally build on the snapshot hash prior to the addition of a new block.

`sparse-merkle-tree` could be useful here. Zaino could:

- Implement `Value` for a struct representing the transaction output data to which `hash_serialized` is committing,
- Implement `StoreReadOps`/`StoreWriteOps` for on-disk storage of the tree,
- Update the tree and a cache with the other fields in `TxOutSetInfo` when Zaino is indexing blocks, and
- Return the cached `TxOutSetInfo` from the RPC method.

2. For equal `txid`s, sort by `vout` ascending (unsigned 32-bit).

This ordering **MUST** be used for serialization.
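A minimal Rust sketch of the canonical ordering, assuming a hypothetical `Utxo` struct whose fields mirror the inputs above. Rust compares `[u8; 32]` arrays lexicographically as unsigned bytes, so sorting on the tuple `(txid, vout)` gives exactly the required order. The sketch also rejects duplicate outpoints, which this specification requires to cause failure.

```rust
/// One unspent output (hypothetical type; field names follow this spec).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Utxo {
    txid: [u8; 32], // internal byte order
    vout: u32,      // 0-based output index
    value_zat: u64,
    script_pub_key: Vec<u8>,
}

/// Sorts entries by `txid` ascending (unsigned bytes), then `vout` ascending.
/// Returns `None` if any outpoint appears more than once.
fn canonical_order(mut utxos: Vec<Utxo>) -> Option<Vec<Utxo>> {
    // Tuple and array comparison in Rust is lexicographic, element by element.
    utxos.sort_unstable_by(|a, b| (a.txid, a.vout).cmp(&(b.txid, b.vout)));
    // After sorting, duplicate outpoints are adjacent.
    let has_duplicate = utxos
        .windows(2)
        .any(|w| (w[0].txid, w[0].vout) == (w[1].txid, w[1].vout));
    if has_duplicate {
        None
    } else {
        Some(utxos)
    }
}
```

Because the comparison is unsigned, a `txid` starting with `0x80` sorts after one starting with `0x7F`; an implementation that compared signed bytes would disagree.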

## Serialization

The byte stream fed to the hash is the concatenation of a **header** and **entries**:

### Header

- ASCII bytes: `"ZAINO-UTXOSET-V1\0"` (the 16 characters `ZAINO-UTXOSET-V1` followed by one `0x00` byte, 17 bytes in total).
Member

Why is this \0 present?

Member Author

It only acts as a terminator/delimiter. It is not strictly necessary...

- `genesis_block_hash` as 32 raw bytes.
- `best_height` as `u32` little-endian.
- `best_block` as 32 raw bytes.
- `count_txouts` as `u64` little-endian, where `count_txouts` is the total number of serialized entries below.
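The header layout can be checked with a short Rust sketch (the names `DOMAIN_TAG_V1` and `header_bytes` are illustrative). It makes the byte counts explicit: a 17-byte tag including the trailing NUL, then 32 + 4 + 32 + 8 bytes, for a fixed 93-byte header.

```rust
/// Domain tag: 16 ASCII characters followed by a single NUL byte.
const DOMAIN_TAG_V1: &[u8; 17] = b"ZAINO-UTXOSET-V1\0";

/// Serializes the V1 header:
/// tag || genesis_block_hash || best_height (u32 LE) || best_block || count_txouts (u64 LE).
fn header_bytes(
    genesis_block_hash: &[u8; 32],
    best_height: u32,
    best_block: &[u8; 32],
    count_txouts: u64,
) -> Vec<u8> {
    let mut out = Vec::with_capacity(17 + 32 + 4 + 32 + 8);
    out.extend_from_slice(DOMAIN_TAG_V1);
    out.extend_from_slice(genesis_block_hash);
    out.extend_from_slice(&best_height.to_le_bytes());
    out.extend_from_slice(best_block);
    out.extend_from_slice(&count_txouts.to_le_bytes());
    out
}
```

With these offsets, `best_height` occupies bytes 49..53 and `count_txouts` bytes 85..93.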

### Entries (one per outpoint in canonical order)

For each `(txid, vout, value_zat, scriptPubKey)`:

- `txid` as 32 raw bytes.
- `vout` as `u32` little-endian.
- `value_zat` as `u64` little-endian.
- `script_len` as CompactSize (Bitcoin/Zcash varint) of `scriptPubKey.len()`.
- `scriptPubKey` raw bytes.

**Note:** No per-transaction terminators or grouping markers are used; the format commits to individual _outputs_, not to _transactions_.

### CompactSize ([reference](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer))

- If `n < 0xFD`: a single byte `n`.
- Else if `n ≤ 0xFFFF`: `0xFD` followed by `n` as `u16` little-endian.
- Else if `n ≤ 0xFFFF_FFFF`: `0xFE` followed by `n` as `u32` little-endian.
- Else: `0xFF` followed by `n` as `u64` little-endian.
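A Rust sketch of the CompactSize encoder and of one entry's serialization, following the two lists above. The function names are hypothetical; production code could instead rely on the `zcash_encoding` crate's `CompactSize` type referenced in Terminology.

```rust
/// Appends `n` to `out` using the CompactSize rules above.
fn write_compact_size(out: &mut Vec<u8>, n: u64) {
    if n < 0xFD {
        out.push(n as u8); // 1 byte
    } else if n <= 0xFFFF {
        out.push(0xFD); // 3 bytes total
        out.extend_from_slice(&(n as u16).to_le_bytes());
    } else if n <= 0xFFFF_FFFF {
        out.push(0xFE); // 5 bytes total
        out.extend_from_slice(&(n as u32).to_le_bytes());
    } else {
        out.push(0xFF); // 9 bytes total
        out.extend_from_slice(&n.to_le_bytes());
    }
}

/// Appends one entry:
/// txid || vout (u32 LE) || value_zat (u64 LE) || CompactSize(script len) || scriptPubKey.
fn write_entry(out: &mut Vec<u8>, txid: &[u8; 32], vout: u32, value_zat: u64, script: &[u8]) {
    out.extend_from_slice(txid);
    out.extend_from_slice(&vout.to_le_bytes());
    out.extend_from_slice(&value_zat.to_le_bytes());
    write_compact_size(out, script.len() as u64);
    out.extend_from_slice(script);
}

/// Convenience helper: the CompactSize encoding of `n` on its own.
fn compact_size(n: u64) -> Vec<u8> {
    let mut v = Vec::new();
    write_compact_size(&mut v, n);
    v
}
```

For example, a 253-byte script is prefixed with `FD FD 00`, so each entry costs 44 fixed bytes, a 1-, 3-, 5-, or 9-byte length prefix, and the script itself.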

## Hash Function

- The implementation **MUST** stream the bytes above into a BLAKE3 hasher.
- The 32-byte output of the hasher is the **snapshot hash**.

## Pseudocode

```text
function UtxoSnapshotHashV1(genesis_block_hash, best_height, best_block, utxos):
    H ← blake3::Hasher()

    // Header
    H.update("ZAINO-UTXOSET-V1\0")
    H.update(genesis_block_hash)
    H.update(le_u32(best_height))
    H.update(best_block)                  // 32 raw bytes, node’s canonical order
    count ← number_of_outputs(utxos)
    H.update(le_u64(count))

    // Entries in canonical order
    for (txid, vout, value, script) in sort_by_txid_then_vout(utxos):
        assert 0 ≤ value ≤ MAX_MONEY
        H.update(txid)                    // 32 raw bytes
        H.update(le_u32(vout))
        H.update(le_u64(value))           // zatoshis
        H.update(CompactSize(script.len))
        H.update(script)

    return H.finalize()                   // 32-byte BLAKE3 digest
```
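The pseudocode translates to the following Rust sketch. To stay dependency-free it returns the byte stream rather than a digest; with the `blake3` crate, `blake3::hash(&bytes)` would yield the 32-byte snapshot hash, and a production version would stream into `blake3::Hasher` instead of buffering the whole set. `UtxoEntry`, `SnapshotError`, and `snapshot_bytes_v1` are illustrative names, and `MAX_MONEY` assumes Zcash's 21 million ZEC cap.

```rust
/// Domain tag: 16 ASCII characters plus a trailing NUL byte.
const DOMAIN_TAG_V1: &[u8] = b"ZAINO-UTXOSET-V1\0";
/// Assumed bound: 21 million ZEC in zatoshis.
const MAX_MONEY: i64 = 21_000_000 * 100_000_000;

/// Why no snapshot could be produced; on error, no hash must be emitted.
#[derive(Debug, PartialEq, Eq)]
enum SnapshotError {
    ValueOutOfRange { txid: [u8; 32], vout: u32, value: i64 },
    DuplicateOutpoint { txid: [u8; 32], vout: u32 },
}

/// (txid, vout, value, scriptPubKey); `value` is signed so negative amounts can be rejected.
type UtxoEntry = ([u8; 32], u32, i64, Vec<u8>);

/// CompactSize length prefix.
fn write_compact_size(out: &mut Vec<u8>, n: u64) {
    match n {
        0..=0xFC => out.push(n as u8),
        0xFD..=0xFFFF => {
            out.push(0xFD);
            out.extend_from_slice(&(n as u16).to_le_bytes());
        }
        0x1_0000..=0xFFFF_FFFF => {
            out.push(0xFE);
            out.extend_from_slice(&(n as u32).to_le_bytes());
        }
        _ => {
            out.push(0xFF);
            out.extend_from_slice(&n.to_le_bytes());
        }
    }
}

/// Builds the V1 byte stream; hashing it with BLAKE3 gives the snapshot hash.
fn snapshot_bytes_v1(
    genesis_block_hash: &[u8; 32],
    best_height: u32,
    best_block: &[u8; 32],
    mut utxos: Vec<UtxoEntry>,
) -> Result<Vec<u8>, SnapshotError> {
    // Canonical order: txid ascending (unsigned bytes), then vout ascending.
    utxos.sort_by(|a, b| (a.0, a.1).cmp(&(b.0, b.1)));
    // After sorting, duplicate outpoints are adjacent.
    if let Some(w) = utxos.windows(2).find(|w| (w[0].0, w[0].1) == (w[1].0, w[1].1)) {
        return Err(SnapshotError::DuplicateOutpoint { txid: w[0].0, vout: w[0].1 });
    }

    let mut out = Vec::new();
    // Header: tag || genesis_block_hash || best_height || best_block || count_txouts
    out.extend_from_slice(DOMAIN_TAG_V1);
    out.extend_from_slice(genesis_block_hash);
    out.extend_from_slice(&best_height.to_le_bytes());
    out.extend_from_slice(best_block);
    out.extend_from_slice(&(utxos.len() as u64).to_le_bytes());

    // Entries: txid || vout || value_zat || CompactSize(len) || scriptPubKey
    for (txid, vout, value, script) in &utxos {
        if !(0..=MAX_MONEY).contains(value) {
            return Err(SnapshotError::ValueOutOfRange { txid: *txid, vout: *vout, value: *value });
        }
        out.extend_from_slice(txid);
        out.extend_from_slice(&vout.to_le_bytes());
        out.extend_from_slice(&(*value as u64).to_le_bytes());
        write_compact_size(&mut out, script.len() as u64);
        out.extend_from_slice(script);
    }
    Ok(out)
}
```

Because sorting happens inside the function, shuffling the input leaves the output unchanged, and an empty set yields only the 93-byte header with `count_txouts = 0`, in line with the Determinism and Empty Set items in Test Guidance.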

## Error Handling

- If any `value_zat` is negative or exceeds `MAX_MONEY`, the snapshot procedure **MUST** fail and **MUST NOT** produce a hash.
- If the UTXO set changes during iteration (non-atomic read), the implementation **SHOULD** retry using a stable view (e.g., read lock or height-pinned snapshot).
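One way to obtain a stable view is to keep the tip and the UTXO set behind a single `std::sync::RwLock` and compute the snapshot while holding the read guard. The sketch below assumes a hypothetical in-memory `ChainState`; keying a `BTreeMap` by `(txid, vout)` also means iteration already follows the canonical order. A node backed by an on-disk database would use that database's own read snapshot instead.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

/// Hypothetical node state: the tip and the UTXO set live behind one lock.
struct ChainState {
    best_height: u32,
    best_block: [u8; 32],
    /// (txid, vout) -> (value_zat, scriptPubKey). BTreeMap iterates keys in
    /// ascending order: txid (unsigned bytes) first, then vout.
    utxos: BTreeMap<([u8; 32], u32), (u64, Vec<u8>)>,
}

struct Node {
    state: RwLock<ChainState>,
}

impl Node {
    /// Runs `f` against one consistent view. Applying a block needs the write
    /// lock, so the tip and UTXO set cannot change while `f` runs.
    fn with_stable_view<R>(&self, f: impl FnOnce(&ChainState) -> R) -> R {
        let guard = self.state.read().expect("UTXO lock poisoned");
        f(&*guard)
    }
}
```

The trade-off is that holding the read guard blocks block application for as long as the hash takes, which matters for a large UTXO set.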

## Security and Interop Considerations

- This hash is **not a consensus commitment** and **MUST NOT** be used to validate blocks or transactions.
- The domain string identifies the algorithm/format version. Any change **MUST** use a new tag.
- The snapshot **MUST** bind to a specific chain and tip by including `best_block` (32 raw bytes, in the node’s internal byte order, as under Inputs) and **SHOULD** include `best_height`. Implementations **SHOULD** include `genesis_block_hash` (32 bytes) as the chain identifier.
- The serialization **MUST** be injective. Duplicates or out-of-range values **MUST** cause failure.
- Equal hashes indicate equal inputs under this specification. They do not imply authenticity, provenance, or liveness.

## Rationale

- **BLAKE3** is chosen for speed and strong modern security. SHA-256 would also work but is slower on large sets. The domain string ensures local uniqueness regardless of the hash function family.
- Committing to _outputs_ rather than _transactions_ simplifies implementations that don’t have transaction-grouped storage.
- CompactSize matches existing Bitcoin/Zcash encoding and avoids ambiguity.

## Versioning

- Any breaking change to the byte stream, input semantics, or ordering **MUST** bump the domain tag to `ZAINO-UTXOSET-V2\0` (or higher).
- Implementations **SHOULD** publish the version alongside the hash in logs and APIs.

## Test Guidance

Implementations **SHOULD** include tests covering:

1. **Determinism:** Shuffle input, and the hash remains constant.
2. **Sensitivity:** Flip one bit in `value_zat` or `scriptPubKey`, and the hash changes.
3. **Metadata:** Change `genesis_block_hash` or `best_block`, and the hash changes.
4. **Empty Set:** With `count_txouts = 0`, the hash is well-defined.
5. **Large Scripts:** Script lengths on CompactSize boundaries (252, 253, 2^16, 2^32), which together exercise the 1-, 3-, 5-, and 9-byte encodings.
6. **Ordering:** Two entries with the same `txid` but different `vout` values are ordered by `vout`.

## References

[^BCP14]: [Information on BCP 14 — "RFC 2119"](https://www.rfc-editor.org/info/bcp14)
56 changes: 55 additions & 1 deletion integration-tests/tests/fetch_service.rs
@@ -1,4 +1,6 @@
//! These tests compare the output of `FetchService` with the output of `JsonRpcConnector`.
//! These tests compare the output of `FetchService` with the output of [`JsonRpSeeConnector`].
//!
//! Note that they both rely on the [`JsonRpSeeConnector`] to get the data.

use futures::StreamExt as _;
use hex::ToHex as _;
@@ -563,6 +565,53 @@ async fn fetch_service_get_address_tx_ids<V: ValidatorExt>(validator: &Validator
test_manager.close().await;
}

async fn fetch_service_get_txout_set_info() {
dorianvp marked this conversation as resolved.
let (mut test_manager, _fetch_service, fetch_service_subscriber) =
create_test_manager_and_fetch_service(&ValidatorKind::Zcashd, None, true, true, true, true)
.await;

let mut clients = test_manager
.clients
.take()
.expect("Clients are not initialized");
clients.faucet.sync_and_await().await.unwrap();

let recipient_ua = clients.get_recipient_address("unified").await;
let _tx = zaino_testutils::from_inputs::quick_send(
&mut clients.faucet,
vec![(&recipient_ua, 250_000, None)],
)
.await
.unwrap();

test_manager.local_net.generate_blocks(1).await.unwrap();
tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;

let txout_set_info = fetch_service_subscriber.get_txout_set_info().await.unwrap();

let jsonrpc_client = JsonRpSeeConnector::new_with_basic_auth(
test_node_and_return_url(
test_manager.zebrad_rpc_listen_address,
false,
None,
Some("xxxxxx".to_string()),
Some("xxxxxx".to_string()),
)
.await
.unwrap(),
"xxxxxx".to_string(),
"xxxxxx".to_string(),
)
.unwrap();
let json_rpc_txout_set_info = jsonrpc_client.get_txout_set_info().await.unwrap();
dbg!(&json_rpc_txout_set_info);
dbg!(&txout_set_info);

assert_eq!(txout_set_info, json_rpc_txout_set_info);

test_manager.close().await;
}

#[allow(deprecated)]
async fn fetch_service_get_address_utxos<V: ValidatorExt>(validator: &ValidatorKind) {
let mut test_manager =
@@ -2184,6 +2233,11 @@ mod zcashd {
fetch_service_get_address_tx_ids::<Zcashd>(&ValidatorKind::Zcashd).await;
}

#[tokio::test(flavor = "multi_thread")]
pub(crate) async fn txout_set_info() {
fetch_service_get_txout_set_info().await;
}

#[tokio::test(flavor = "multi_thread")]
pub(crate) async fn address_utxos() {
fetch_service_get_address_utxos::<Zcashd>(&ValidatorKind::Zcashd).await;