
Bwatch#9069

Draft
sangbida wants to merge 102 commits into ElementsProject:master from
sangbida:async-block-processing

Conversation

@sangbida
Collaborator

Important

26.04 FREEZE March 11th: Non-bugfix PRs not ready by this date will wait for 26.06.

RC1 is scheduled on March 23rd

The final release is scheduled for April 15th.

Checklist

Before submitting the PR, ensure the following tasks are completed. If an item is not applicable to your PR, please mark it as checked:

  • The changelog has been updated in the relevant commit(s) according to the guidelines.
  • Tests have been added or modified to reflect the changes.
  • Documentation has been reviewed and updated as needed.
  • Related issues have been listed and linked, including any that this PR closes.
  • Important All PRs must consider how to reverse any persistent changes for tools/lightning-downgrade

@sangbida sangbida force-pushed the async-block-processing branch 2 times, most recently from 841b6d8 to 1958906 Compare April 21, 2026 12:13
Comment thread plugins/bwatch/bwatch.c
@@ -0,0 +1,44 @@
#include "config.h"
#include <ccan/array_size/array_size.h>
Collaborator Author


Call these cln-bwatch

@sangbida
Collaborator Author

Think about how we might rescan scriptpubkeys on migration; we don't really need to rescan them more than once.

rustyrussell and others added 24 commits April 23, 2026 23:07
Like bitcoin_txid, they are special backwards-printed snowflakes.

Thanks Obama!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These helper functions decode hex strings from JSON into big-endian 32-bit and 64-bit values, which is useful for parsing datastore entries. Moving them into a common location lets bwatch use them in the future.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
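A minimal sketch of what decoding a fixed-width big-endian hex string could look like. The names hex_digit, sketch_hex_to_be, sketch_hex_to_be32 and sketch_hex_to_be64 are hypothetical and are not the helpers added by the commit; the real ones parse JSON tokens rather than NUL-terminated strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if it is not hex. */
static int hex_digit(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Decode exactly nbytes * 2 hex characters as one big-endian integer. */
static bool sketch_hex_to_be(const char *hex, size_t nbytes, uint64_t *out)
{
	uint64_t v = 0;

	assert(nbytes <= sizeof(v));
	if (strlen(hex) != nbytes * 2)
		return false;
	for (size_t i = 0; i < nbytes * 2; i++) {
		int d = hex_digit(hex[i]);
		if (d < 0)
			return false;
		/* The most significant digit comes first. */
		v = (v << 4) | (uint64_t)d;
	}
	*out = v;
	return true;
}

bool sketch_hex_to_be32(const char *hex, uint32_t *out)
{
	uint64_t v;

	if (!sketch_hex_to_be(hex, sizeof(*out), &v))
		return false;
	*out = (uint32_t)v;
	return true;
}

bool sketch_hex_to_be64(const char *hex, uint64_t *out)
{
	return sketch_hex_to_be(hex, sizeof(*out), out);
}
```

Requiring the exact width up front means a truncated or over-long datastore entry is rejected instead of being silently misread.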
bwatch is an async block scanner that consumes blocks from bcli or any
other bitcoind interface and communicates with lightningd by sending
it updates. In this commit we're only introducing the plugin and some
files that we will populate in future commits.
This wire file primarily contains the data structures used to serialize data for storage in the datastore. bwatch has two datastores:
the block history datastore and the watch datastore. For block history we store the height, the block hash and the hash of the previous block.
For watches we have 4 types: utxo, scriptpubkey, scid and blockdepth. Each type stores its own type-specific info in the datastore. The info common to all watches is the start block and the list of owners interested in the watch.
We have 4 types of watches: utxo (outpoint), scriptpubkey, scid and
blockdepth. Each gets its own hash table with a key shape that makes
lookups direct.
bwatch keeps a tail of recent blocks (height, hash, prev hash) so it
can detect and unwind reorgs without re-fetching from bitcoind. The
datastore key for each block is zero-padded to 10 digits so
listdatastore returns blocks in ascending height order. On startup
we replay the stored history and resume from the most recent block.
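To illustrate why the zero padding matters, here is a small sketch. The names sketch_block_key and key_sorts_before are hypothetical, and the sketch assumes keys are ordered by plain string comparison; it shows only the height component, not the full datastore key path.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_KEY_DIGITS 10

/* Write the zero-padded height into buf. A uint32_t never needs more
 * than 10 decimal digits, so the result is always exactly 10 chars. */
void sketch_block_key(uint32_t height, char buf[BLOCK_KEY_DIGITS + 1])
{
	int n = snprintf(buf, BLOCK_KEY_DIGITS + 1, "%010" PRIu32, height);

	assert(n == BLOCK_KEY_DIGITS);
	(void)n;
}

/* Stand-in for the key ordering (assumption: lexicographic). */
bool key_sorts_before(const char *a, const char *b)
{
	return strcmp(a, b) < 0;
}
```

Without padding, "99999" would sort after "100000" as a string; with padding, string order and numeric order agree.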
Each watch (and its set of owners) is serialized through the wire
format from the earlier commit and stored in the datastore. On startup
we walk each type's prefix and reload the watches into their
respective hash tables, so a restart resumes watching the same things
without anyone re-registering.
bwatch_add_watch and bwatch_del_watch are the high-level entry points
the RPCs (added in a later commit) use. Adding a watch that already
exists merges the owner list and lowers start_block if the new request
needs to scan further back, so a re-registering daemon (e.g. onchaind
on restart) doesn't lose missed events. Removing a watch drops only
the requesting owner; the watch itself is removed once the owner list
is empty.
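The merge and removal rules above can be sketched roughly as follows. struct sketch_watch, the fixed-size owner array and the function names are illustrative assumptions; the real code keeps watches in hash tables and persists every change to the datastore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_OWNERS 8

/* Hypothetical, simplified watch: the common fields only. */
struct sketch_watch {
	uint32_t start_block;
	size_t num_owners;
	const char *owners[MAX_OWNERS];
};

static bool find_owner(const struct sketch_watch *w, const char *owner,
		       size_t *idx)
{
	for (size_t i = 0; i < w->num_owners; i++) {
		if (strcmp(w->owners[i], owner) == 0) {
			*idx = i;
			return true;
		}
	}
	return false;
}

/* Merge a (re-)registration into an existing watch. */
void sketch_watch_add_owner(struct sketch_watch *w, const char *owner,
			    uint32_t start_block)
{
	size_t idx;

	if (!find_owner(w, owner, &idx)) {
		assert(w->num_owners < MAX_OWNERS);
		w->owners[w->num_owners++] = owner;
	}
	/* Only ever move the scan start further back, never forward. */
	if (start_block < w->start_block)
		w->start_block = start_block;
}

/* Drop one owner; returns true once no owners remain, meaning the
 * caller should delete the watch itself. */
bool sketch_watch_del_owner(struct sketch_watch *w, const char *owner)
{
	size_t idx;

	if (find_owner(w, owner, &idx))
		w->owners[idx] = w->owners[--w->num_owners];
	return w->num_owners == 0;
}
```

Re-adding an owner that is already present leaves the owner list unchanged but can still lower start_block, which is the case that matters for a daemon re-registering after a restart.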
Add the chain-polling loop. A timer fires bwatch_poll_chain, which calls
getchaininfo to learn bitcoind's tip; if we're behind, we fetch the next
block via getrawblockbyheight, append it to the in-memory history and
persist it to the datastore. After each successful persist we reschedule
the timer at zero delay so we keep fetching back-to-back until we catch
up to the chain tip. Once getchaininfo reports no new block, we settle
into the steady-state cadence (30s by default, tunable via the
--bwatch-poll-interval option).

This commit only handles the happy path. Reorg detection, watchman
notifications and watch matching land in subsequent commits.
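The rescheduling rule described above reduces to a single decision. The function name and signature below are hypothetical; the plugin itself drives this from timer callbacks rather than a pure function.

```c
#include <assert.h>
#include <stdint.h>

/* Default steady-state cadence from the commit message. */
#define DEFAULT_POLL_INTERVAL_SECS 30

/* Delay before the next poll: zero while we are behind bitcoind's
 * tip (catch-up mode), the configured interval once we are level. */
uint32_t sketch_next_poll_delay(uint32_t our_height,
				uint32_t bitcoind_height,
				uint32_t poll_interval_secs)
{
	if (our_height < bitcoind_height)
		return 0;
	return poll_interval_secs;
}
```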
After bwatch persists a new tip, send a block_processed RPC to watchman
(lightningd) with the height and hash. bwatch only continues polling
for the next block once watchman has acknowledged that it has also
processed the new block height on its end.

This matters for crash safety: on restart we treat watchman's height as
the floor and re-fetch anything above it, so any block we acted on must
be visible to watchman before we move on.

If watchman isn't ready yet (e.g. lightningd still booting) the RPC
errors out non-fatally; we just reschedule and retry.
When handle_block fetches the next block, validate its parent hash
against our current tip. If they disagree we're seeing a reorg: pop our
in-memory + persisted tip via bwatch_remove_tip, walk the history one
back, and re-fetch from the new height. Each fetch may itself reorg
further, so the loop naturally peels off as many stale tips as needed
until the chain rejoins.

After every rollback, tell watchman the new tip via
revert_block_processed so its persisted height tracks bwatch's. If we
crash before the ack lands, watchman's stale height will be higher than
ours on restart, which retriggers the rollback.

If the rollback exhausts our history (we rolled back past the oldest
record we still hold) we zero current_height/current_blockhash and let
the next poll re-init from bitcoind's tip.

Notifying owners that their watches were reverted lands in a subsequent
commit.
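A rough sketch of the parent-hash check and rollback, using an in-memory array only. The types and the names sketch_handle_block, sketch_next_fetch_height and sketch_make_block are assumptions for illustration; the real handle_block also persists changes and notifies watchman.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 32
#define MAX_HISTORY 16

struct sketch_block_rec {
	uint32_t height;
	uint8_t hash[HASH_LEN];
	uint8_t prev_hash[HASH_LEN];
};

struct sketch_history {
	size_t len;
	struct sketch_block_rec blocks[MAX_HISTORY];
};

enum sketch_result { SKETCH_APPENDED, SKETCH_REORG };

/* Extend the tip if the parent hash matches; otherwise pop one stale
 * tip and let the caller re-fetch at the new, lower height. */
enum sketch_result sketch_handle_block(struct sketch_history *h,
				       const struct sketch_block_rec *b)
{
	if (h->len > 0) {
		const struct sketch_block_rec *tip = &h->blocks[h->len - 1];

		if (memcmp(b->prev_hash, tip->hash, HASH_LEN) != 0) {
			h->len--;
			return SKETCH_REORG;
		}
	}
	assert(h->len < MAX_HISTORY);
	h->blocks[h->len++] = *b;
	return SKETCH_APPENDED;
}

/* Height to fetch next. 0 means the history is exhausted and the
 * caller must re-initialise from bitcoind's tip. */
uint32_t sketch_next_fetch_height(const struct sketch_history *h)
{
	return h->len ? h->blocks[h->len - 1].height + 1 : 0;
}

/* Test helper: a block whose hashes are a single repeated byte. */
struct sketch_block_rec sketch_make_block(uint32_t height, uint8_t id,
					  uint8_t parent_id)
{
	struct sketch_block_rec b;

	b.height = height;
	memset(b.hash, id, HASH_LEN);
	memset(b.prev_hash, parent_id, HASH_LEN);
	return b;
}
```

Because each call pops at most one tip, a deep reorg is unwound one block per fetch, matching the "peels off as many stale tips as needed" behaviour described above.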
Add two RPCs for surfacing watches to lightningd on a new block or
reorg.

bwatch_send_watch_found informs lightningd of any watches that were
found in the current processed block.  The owner is used to
disambiguate watches that may pertain to multiple subdaemons.

bwatch_send_watch_revert is sent in case of a revert; it informs the
owner that a previously reported watch has been rolled back.

These functions get wired up in subsequent commits.
After every fetched block, walk each transaction and fire watch_found
for matching scriptpubkey outputs and spent outpoints.

Outputs are matched by hash lookup against scriptpubkey_watches; inputs
by reconstructing the spent outpoint and looking it up in
outpoint_watches.
After the per-tx scriptpubkey/outpoint pass, walk every scid watch and
fire watch_found for any whose encoded blockheight matches the block
just processed.

The watch's scid encodes the expected (txindex, outnum), so we jump
straight there without scanning. If the position is out of range
(txindex past the block, or outnum past the tx) we send watch_found
with tx=NULL, which lightningd treats as the "not found" case.
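For reference, a short_channel_id packs the block height into its top 3 bytes, the transaction index into the next 3 and the output index into the low 2 (per BOLT 7). The sketch below decodes those fields and performs the range check; the function names and struct sketch_block are hypothetical and not Core Lightning's own accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

uint32_t sketch_scid_blocknum(uint64_t scid)
{
	return (uint32_t)(scid >> 40);
}

uint32_t sketch_scid_txnum(uint64_t scid)
{
	return (uint32_t)((scid >> 16) & 0xFFFFFF);
}

uint16_t sketch_scid_outnum(uint64_t scid)
{
	return (uint16_t)(scid & 0xFFFF);
}

/* Helper for building test values. */
uint64_t sketch_make_scid(uint32_t blocknum, uint32_t txnum,
			  uint16_t outnum)
{
	assert(blocknum < (1u << 24) && txnum < (1u << 24));
	return ((uint64_t)blocknum << 40) | ((uint64_t)txnum << 16) | outnum;
}

/* Hypothetical, simplified block: only the output count of each tx. */
struct sketch_block {
	size_t num_txs;
	const size_t *num_outputs;
};

/* True if the scid's (txindex, outnum) exists in the block. When it
 * does not, the caller would report "not found" (tx=NULL). */
bool sketch_scid_in_range(uint64_t scid, const struct sketch_block *b)
{
	uint32_t txnum = sketch_scid_txnum(scid);

	if (txnum >= b->num_txs)
		return false;
	return sketch_scid_outnum(scid) < b->num_outputs[txnum];
}
```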
Subdaemons like channel_open and onchaind care about confirmation
depth, not the underlying tx. Walk blockdepth_watches on every new
block and send watch_found with the current depth to each owner.

This is what keeps bwatch awake in environments like Greenlight,
where we'd otherwise prefer to hibernate: as long as something is
waiting on a confirmation milestone, the blockdepth watch holds the
poll open; once it's deleted, we're free to sleep again.

Depth fires before the per-tx scan so restart-marker watches get a
chance to spin up subdaemons before any outpoint hits land for the
same block. Watches whose start_block is ahead of the tip are stale
(reorged-away, awaiting delete) and skipped.
On init, query bcli for chain name, headercount, blockcount and IBD
state, then forward the result to watchman via the chaininfo RPC
before bwatch starts its normal poll loop. Watchman uses this to
gate any work that depends on bitcoind being synced.

If bitcoind's blockcount comes back lower than our persisted tip,
peel stored blocks off until they line up so watchman gets a
consistent picture. During steady-state polling the same case is
handled by hash-mismatch reorg detection inside handle_block; this
shortcut only matters at startup, before we've fetched anything.

If bcli or watchman is not yet ready, log and fall back to scheduling
the poll loop anyway so init never stalls.

bwatch_remove_tip is exposed in bwatch.h so the chaininfo path in
bwatch_interface.c can use it.
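The startup trim amounts to popping stored tips while they sit above bitcoind's reported blockcount. A minimal sketch over a bare array of heights follows; the function name is hypothetical, and the real code calls bwatch_remove_tip, which also updates the datastore.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Given stored block heights in ascending order, return how many
 * entries remain once every block above blockcount is peeled off. */
size_t sketch_trim_to_blockcount(const uint32_t *heights, size_t len,
				 uint32_t blockcount)
{
	while (len > 0 && heights[len - 1] > blockcount)
		len--;
	return len;
}
```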
addscriptpubkeywatch and delscriptpubkeywatch are how lightningd asks
bwatch to start/stop watching an output script for a given owner.
addoutpointwatch and deloutpointwatch are how lightningd asks bwatch
to start/stop watching a specific (txid, outnum) for a given owner.
addscidwatch and delscidwatch are how lightningd asks bwatch to
start/stop watching a specific short_channel_id for a given owner.
The scid pins the watch to one (block, txindex, outnum), so on each
new block we go straight to that position rather than scanning.
addblockdepthwatch and delblockdepthwatch are how lightningd asks
bwatch to start/stop a depth-tracker for a given (owner, start_block).
start_block doubles as the watch key and the anchor used to compute
depth = tip - start_block + 1 on every new block.
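The depth formula, together with the rule that watches whose start_block is ahead of the tip are skipped, can be sketched as follows. The function name and the out-parameter shape are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Compute depth = tip - start_block + 1. Returns false, leaving
 * *depth untouched, when start_block is ahead of the tip: such a
 * watch is stale and should be skipped. */
bool sketch_blockdepth(uint32_t tip, uint32_t start_block,
		       uint32_t *depth)
{
	if (start_block > tip)
		return false;
	*depth = tip - start_block + 1;
	return true;
}
```

A watch anchored at the tip itself therefore reports depth 1, matching the usual convention that a transaction in the latest block has one confirmation.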
listwatch returns every active watch as a flat array. Each entry
carries its type-specific key (scriptpubkey hex, outpoint, scid
triple, or blockdepth anchor) plus the common type / start_block /
owners fields, so callers can dispatch on the per-type key without
parsing the type string first.

Mostly used by tests and operator tooling to inspect what bwatch
is currently tracking.
@sangbida sangbida force-pushed the async-block-processing branch 2 times, most recently from 0c7e64a to 11a4d26 Compare April 23, 2026 13:59
sangbida added 22 commits April 28, 2026 08:08
Part of the chaintopology spring clean.
The feerate samples now live alongside everything else watchman owns;
chain_topology no longer needs to know about them.
Part of the chaintopology spring clean. Wrap the fee poll in a small
struct fee_poll on lightningd, hardcode the cadence at 30s
(BITCOIND_POLL_SECONDS, matching bwatch's default), and use topo as
the bcli request ctx.

The fee poll is a stopgap; eventually bwatch will push feerate
updates alongside blocks. --dev-bitcoind-poll is kept as a deprecated
no-op so existing test fixtures keep parsing.
Replace topology_synced() and topology_add_sync_waiter() with direct
ld->bitcoind->synced checks.  ld->bitcoind->synced is already driven by
watchman.
lightningd's perspective of the block height should advance only when
bwatch has fully delivered all the txs in a block — exactly what
watchman tracks (last_processed_height).
This used to bump our height up to the bitcoin backend's headercount
when chaintopology hadn't caught up yet, mostly to compute slightly
tighter HTLC expiries.  Bwatch should now start up in parallel with
lightningd, so it may be simpler to use the blockheight provided by bwatch.

A TODO for me would be to verify this using a test.
bwatch now drives block ingestion, so chain_topology has nothing
to bootstrap or stop.  Create the bitcoind backend and start fee
polling inline at wallet-init and drop begin_topology /
stop_topology / broadcast_shutdown.

Rebroadcast was previously kicked off by begin_topology; trigger
it from notify_feerate_change instead so RBF still chases the
new feerate.
The remaining chain_topology stub does nothing useful — bwatch
drives block ingestion and feerate.c owns fee polling. Remove
the file, the lightningd field, the Makefile entry, and the
new_topology call site.

Many consumers used to get broadcast.h and feerate.h indirectly
through chaintopology.h. They now include those directly, which
accounts for the include-only churn across ~16 files. In particular,
channel_control.c (calls rebroadcast_txs) and peer_control.c
(calls broadcast_tx) lost their transitive route to broadcast.h
when notification.h stopped including chaintopology.h, so both
gain a direct include.

Move struct txlocator to feerate.h since it has no other home.
The blanket "skip everything" hook is being unwound suite-by-suite.
Replace it with an empty allowlist so subsequent commits can opt
each test file in once it has been ported and verified, without
having to keep re-tweaking the hook itself.

Behaviour is unchanged at this commit (allowlist is empty, so
everything still skips).
These tests load pre-recorded sqlite3.xz fixtures (or run the
lightning-downgrade tool) that all predate the bwatch-era schema
(our_outputs / our_txs and the dropped utxoset / outputs tables).
Skipping them individually keeps their parent suites unblocked
when we re-enable test files one at a time.

TODO: regenerate or rewrite each of these fixtures against the
new schema and remove the @pytest.mark.skip decorators. Files
touched: test_db.py, test_invoices.py, test_runes.py,
test_wallet.py, test_coinmoves.py, test_bookkeeper.py,
test_downgrade.py.
Test files identical between master and the cherry-pick-blockid-helpers
target — they touch nothing chaintopology / watch.c / txfilter could
have affected, so they should be safe to run on top of the bwatch
migration unchanged.

Adds to BWATCH_MIGRATION_ALLOWLIST:
  test_cln_rs, test_clnrest, test_mkfunding, test_onion,
  test_reckless, test_renepay, test_runes

TODO: revisit test_cln_lsps once the wallet migration story is
complete; it currently fails on the CI builder.
When fundpsbt/addpsbtoutput derive a fresh change address, immediately register the scriptpubkey watch in bwatch so those outputs are tracked consistently before confirmation. This matches the original branch behavior and avoids reservation mismatches in wallet tests.
Test fixtures previously set rescan=1 so the wallet would re-scan the
last block on startup/restart.  In the bwatch world, every wallet
scriptpubkey is registered as a perennial watch, and asking for a 1-block
rescan on startup re-arms every per-keyindex watch and triggers a rescan
loop that drops in-memory reservation state.  This was visible as
test_reserveinputs failing with `assert not True` after `l1.restart()`,
because every UTXO's reserved_til was reset by the rescan path.

Drop the default to 0 to match upstream behaviour.  Tests that need an
explicit rescan (e.g. test_bip86_mnemonic_recovery) opt in via `options=`.
The four wallet_datastore_{get,create,update,remove} helpers used to
require the caller to be inside a wallet transaction; otherwise the
underlying db_prepare_v2 fatals at db/utils.c:103 with "Attempting to
prepare a db_stmt outside of a transaction".
@sangbida sangbida force-pushed the async-block-processing branch from 674cff1 to 09b3f08 Compare April 27, 2026 23:59
Two bugs broke splices on bwatch:

1. Duplicate scid update.  channel_splice_watch_found already sets
   channel->scid, so handle_peer_splice_locked's change_scid call
   re-added the same scid to chanmap and aborted.  Fold change_scid
   into depthcb_update_scid (matching the original branch), make it
   a no-op when the scid hasn't changed, and notify gossip from
   handle_peer_splice_locked directly.

2. Missing funding rawtx.  The peer's splice handshake calls
   splice_lookup_tx, which reads our_txs.  We never stored the
   funding tx there, so every splice failed with "channel control
   unable to find txid".  Save it on first confirmation (annotated
   TX_CHANNEL_FUNDING) and on later splice / unexpected-outpoint
   events.
For CHANNELD_AWAITING_SPLICE, channel_funding_depth_found called
channeld_tell_depth (splicing=false).  channeld then treated the
splice confirmation as the original funding tx, overwrote
short_channel_ids[LOCAL], and aborted with "Duplicate splice_locked
events detected by scid check".

Use channeld_tell_splice_depth for the splice case instead.
bwatch delivers peer_got_splice_locked asynchronously, so the two
peers can advance their gossip state machines a few ms apart.  When
the slower peer retransmits announcement_signatures, the faster
peer's sent_sigs guard suppresses the response and the channel never
finishes announcing.

Clear sent_sigs in WAITING_FOR_MATCHING_PEER_SIGS so we always
respond.  ANNOUNCE_DEPTH is unaffected by the race.
Add test_splice.py, test_splicing.py, test_splicing_disconnect.py
and test_splicing_insane.py to the bwatch-migration allowlist now
that the splice path works under bwatch.
bwatch fires output watches before input watches, so the change
deposit can arrive before the spend withdrawal in the bookkeeper.
Group each tx's spend+change as unordered pairs in
test_script_splice_{out,in}.
@sangbida
Collaborator Author

Fix INSERT OR IGNORE for postgres for our_txs and our_outputs.
