@willcl-ark

.

@github-actions

Benchmark Results

Comparison to nightly master:

  • No nightly history available for comparison

View detailed results
View nightly trend chart

@github-actions

Benchmark Results

Comparison to nightly master:

  • 450 MB: 48 min (nightly: 60 min, 2026-01-12) → +20.0% faster
  • 32000 MB: 38 min (nightly: 45 min, 2026-01-12) → +15.1% faster

View detailed results
View nightly trend chart

@github-actions

Benchmark Results

Comparison to nightly master:

  • 450 MB: 49 min (nightly: 61 min, 2026-01-13) → +20.2% faster
  • 32000 MB: 38 min (nightly: 46 min, 2026-01-13) → +16.5% faster

View detailed results
View nightly trend chart

willcl-ark and others added 21 commits January 15, 2026 02:54
Adds build configuration, benchmarking CI workflows, Python
dependencies, plotting tools, and documentation for benchcoin.

Co-authored-by: David Gumberg <davidzgumberg@gmail.com>
Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
- Fix empty chart: use get_chart_data() instead of to_dict() so JS
  filters can match config strings ("450", "32000") instead of objects
- Capture machine specs on self-hosted runner during build job and pass
  via --machine-specs flag to nightly append, instead of detecting on
  the ubuntu-latest publish runner
willcl-ark and others added 15 commits January 15, 2026 10:40
Add a `Reset()` method to `CCoinsViewCache` that clears `cacheCoins`,
`cachedCoinsUsage`, and `hashBlock` without flushing to the `base` view.
This allows efficiently reusing a cache instance across multiple blocks.

Introduce `m_connect_block_view` as a persistent cache layer for `ConnectBlock`,
avoiding repeated memory allocations. On block validation failure, `Reset()`
discards uncommitted changes without affecting the main cache.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
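
The difference between `Flush()` and `Reset()` described in the commit above can be illustrated with a minimal sketch. It is not the actual `CCoinsViewCache` code: `SketchCache`, `ConnectSketch`, and the integer `OutPoint`/`Coin` aliases are hypothetical stand-ins chosen to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical stand-ins: real outpoints and coins carry far more data.
using OutPoint = std::uint64_t;
using Coin = std::int64_t;

// Simplified write-back cache layered over a parent map.
class SketchCache {
public:
    explicit SketchCache(std::map<OutPoint, Coin>& base) : m_base(base) {}

    void Add(OutPoint out, Coin coin) { m_entries[out] = coin; }

    // Look in this layer first, then fall back to the parent.
    std::optional<Coin> Get(OutPoint out) const {
        if (auto it = m_entries.find(out); it != m_entries.end()) return it->second;
        if (auto it = m_base.find(out); it != m_base.end()) return it->second;
        return std::nullopt;
    }

    // Commit pending entries to the parent, then empty this layer.
    void Flush() {
        for (const auto& [out, coin] : m_entries) m_base[out] = coin;
        m_entries.clear();
    }

    // Discard pending entries without touching the parent.
    void Reset() { m_entries.clear(); }

    std::size_t PendingCount() const { return m_entries.size(); }

private:
    std::map<OutPoint, Coin>& m_base;
    std::map<OutPoint, Coin> m_entries;
};

// Hypothetical driver: apply one change and keep it only if the "block" is valid.
inline bool ConnectSketch(SketchCache& view, OutPoint out, Coin coin, bool valid) {
    assert(view.PendingCount() == 0);  // the reused view must start each block empty
    view.Add(out, coin);
    if (!valid) {
        view.Reset();  // failure: drop uncommitted changes, parent untouched
        return false;
    }
    view.Flush();  // success: commit to the parent
    return true;
}
```

The same `SketchCache` instance serves every call to `ConnectSketch`, which mirrors the idea of keeping one persistent view instead of constructing a fresh cache per block.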
Introduce a helper to look up a Coin through a stack of CCoinsViewCache layers without populating parent caches.

This is useful for ephemeral views (e.g. during ConnectBlock) that want to avoid polluting CoinsTip() when validating invalid blocks.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
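
A rough sketch of the non-populating lookup described in the commit above, assuming a simple linked stack of cache layers. `Layer`, `Fetch`, and `Peek` are hypothetical names for illustration; the real helper and its signature may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical stand-ins for the real outpoint and coin types.
using OutPoint = std::uint64_t;
using Coin = std::int64_t;

struct Layer {
    std::unordered_map<OutPoint, Coin> cache;
    Layer* parent{nullptr};

    // Caching lookup: a miss is pulled from the parent and stored in this layer.
    std::optional<Coin> Fetch(OutPoint out) {
        assert(parent != this);  // a layer must never be its own parent
        if (auto it = cache.find(out); it != cache.end()) return it->second;
        if (!parent) return std::nullopt;
        std::optional<Coin> coin = parent->Fetch(out);
        if (coin) cache.emplace(out, *coin);
        return coin;
    }

    // Non-caching lookup: walk down the stack and never insert anywhere.
    std::optional<Coin> Peek(OutPoint out) const {
        for (const Layer* layer = this; layer != nullptr; layer = layer->parent) {
            if (auto it = layer->cache.find(out); it != layer->cache.end()) return it->second;
        }
        return std::nullopt;
    }
};
```

With a three-layer stack (view over tip over disk), `Peek` on the view returns a coin held only by the bottom layer while leaving both upper caches empty, whereas `Fetch` copies the coin into every layer it passes through.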
Introduce CoinsViewCacheAsync, a CCoinsViewCache subclass that reads coins
without mutating the underlying cache via FetchCoin().

Add GetCoinFromBase() which is called for cache misses in FetchCoin. In
CoinsViewCacheAsync this method is overridden and calls PeekCoin(). This
prevents the main cache from caching inputs pulled from disk for a block that
has not yet been fully validated. Once Flush() is called on
m_connect_block_view, these inputs will be added as spent to cacheCoins in
the main cache via BatchWrite().

This is the foundation for async input fetching, where worker threads must not
mutate shared state.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
Refactor TestCoinsView() to accept the cache as a parameter instead of
creating it internally. This prepares for adding CoinsViewCacheAsync
fuzz targets that need to pass in a different cache type.

This is a non-functional change.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
Add StartFetching() to populate a queue of all transaction inputs in a block,
then fetch them all via ProcessInput() before entering ConnectBlock.
GetCoinFromBase() now checks this queue first.

StartFetching() returns a FetchControl struct which is bound to the lifetime of
the block. When FetchControl goes out of scope and is destroyed, it clears
the fetched inputs so that prevouts referencing the block are not accessed afterwards.

Introduce InputToFetch struct to track each input's outpoint and fetched coin.
GetCoinFromBase() scans the queue sequentially, matching ConnectBlock's access
pattern where inputs are processed in block order.

ProcessInput() fetches coins one at a time using PeekCoin(),
preparing for parallel execution in later commits.

Also add fuzz targets for CoinsViewCacheAsync and add StartFetching() to
unit tests.

Co-authored-by: sedited <seb.kung@gmail.com>
Co-authored-by: l0rinc <pap.lorinc@gmail.com>
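
The queue and its sequential scan from the commit above could look roughly like the following sketch. Only the name `InputToFetch` comes from the commit message; `FetchQueue`, `FetchGuard`, `Take`, the `m_next` cursor, and the `std::map` standing in for the database are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

using OutPoint = std::uint64_t;  // hypothetical stand-in for an outpoint
using Coin = std::int64_t;       // hypothetical stand-in for a coin

struct InputToFetch {
    OutPoint outpoint;
    std::optional<Coin> coin;  // empty until fetched, or if the coin does not exist
};

class FetchQueue {
public:
    // Queue every input of the block, in block order.
    void Start(const std::vector<OutPoint>& block_inputs) {
        assert(m_inputs.empty());  // the previous block must have been cleared
        for (OutPoint out : block_inputs) m_inputs.push_back({out, std::nullopt});
    }

    // Fetch all queued inputs up front (single-threaded at this stage).
    void FetchAll(const std::map<OutPoint, Coin>& db) {
        for (InputToFetch& input : m_inputs) {
            if (auto it = db.find(input.outpoint); it != db.end()) input.coin = it->second;
        }
    }

    // Scan forward from the last hit. When callers ask in block order,
    // the first entry examined is the match.
    std::optional<Coin> Take(OutPoint out) {
        for (std::size_t i = m_next; i < m_inputs.size(); ++i) {
            if (m_inputs[i].outpoint == out) {
                m_next = i + 1;
                return m_inputs[i].coin;
            }
        }
        return std::nullopt;
    }

    void Clear() {
        m_inputs.clear();
        m_next = 0;
    }

    std::size_t Size() const { return m_inputs.size(); }

private:
    std::vector<InputToFetch> m_inputs;
    std::size_t m_next{0};
};

// RAII guard in the spirit of FetchControl: clears the queue on scope exit.
class FetchGuard {
public:
    explicit FetchGuard(FetchQueue& queue) : m_queue(queue) {}
    ~FetchGuard() { m_queue.Clear(); }
    FetchGuard(const FetchGuard&) = delete;
    FetchGuard& operator=(const FetchGuard&) = delete;

private:
    FetchQueue& m_queue;
};
```

Because the cursor only moves forward, a request for an outpoint that was already consumed misses the queue; in this sketch that simply returns an empty result, where a real implementation would fall back to a regular lookup.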
Add a benchmark measuring CoinsViewCacheAsync performance when fetching
inputs for a block. Creates a realistic scenario by adding all inputs from
block 413567 to a chainstate with an in-memory LevelDB backend.

Measures the time to access all inputs through the cache.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
Skip fetching inputs that spend outputs created earlier in the same block,
since these coins won't exist in the cache or database yet.

Store the first 8 bytes of each transaction's txid in a sorted vector for
O(log n) binary search lookups. Using truncated txids is a performance
optimization; in the rare case of a collision, the input simply won't be
prefetched and will fall back to normal fetching on the main thread.

This introduces a performance regression in the benchmark (roughly 18% more ns/op in the
results below) due to the extra sorting and filtering. Since the benchmark uses an
in-memory LevelDB, no real disk I/O is avoided that could offset this cost.

> bench: add CoinsViewCacheAsync benchmark
|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        1,664,383.00 |              600.82 |    2.4% |   31,957,257.00 |    4,017,069.00 |  7.955 |   5,318,396.00 |    0.6% |      0.02 | CoinsViewCacheAsyncBenchmark

> coins: filter inputs created in same block before fetching
|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        1,970,543.00 |              507.47 |    4.0% |   32,640,039.00 |    4,760,784.00 |  6.856 |   5,506,291.00 |    1.2% |      0.02 | CoinsViewCacheAsyncBenchmark

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
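
The truncated-txid filter from the commit above can be sketched as follows. This is an illustration under assumptions, not the code from the commit: `Txid` is modelled as a plain 32-byte array, and `ShortId`, `BuildShortIds`, and `MaybeCreatedInBlock` are hypothetical helper names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

using Txid = std::array<unsigned char, 32>;  // hypothetical stand-in for a txid type

// Pack the first 8 bytes of a txid into an integer. The numeric value depends
// on host endianness, which is fine because ids are only compared in-process.
inline std::uint64_t ShortId(const Txid& txid) {
    std::uint64_t id;
    std::memcpy(&id, txid.data(), sizeof(id));
    return id;
}

// Collect and sort the short ids of every transaction in the block.
inline std::vector<std::uint64_t> BuildShortIds(const std::vector<Txid>& block_txids) {
    std::vector<std::uint64_t> ids;
    ids.reserve(block_txids.size());
    for (const Txid& txid : block_txids) ids.push_back(ShortId(txid));
    std::sort(ids.begin(), ids.end());
    return ids;
}

// True if the prevout's txid may belong to this block, in which case the
// input is not prefetched. A short-id collision yields a false positive,
// which only costs a skipped prefetch.
inline bool MaybeCreatedInBlock(const std::vector<std::uint64_t>& sorted_ids, const Txid& prevout_txid) {
    assert(std::is_sorted(sorted_ids.begin(), sorted_ids.end()));
    return std::binary_search(sorted_ids.begin(), sorted_ids.end(), ShortId(prevout_txid));
}
```

A false positive is harmless here because the skipped input is still fetched through the normal path later, while a false negative cannot occur since every txid in the block contributes its prefix to the vector.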
Allow the main thread to process unfetched inputs while waiting for a
specific coin. Instead of blocking, GetCoinFromBase() calls ProcessInput() to
make forward progress on the queue.

This prepares for parallel fetching where the main thread can help workers
complete the queue rather than idling while waiting.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
Restructure TestCoinsView() to perform all checks that don't mutate the
backend before accessing backend_coins_view with HaveCoin()/GetCoin().

This prepares for CoinsViewCacheAsync testing, where we want to run as many
checks as possible while async fetching is still active. Only at the very
end do we call StopFetching() and perform the backend consistency checks
that require mutating calls (HaveCoin/GetCoin call FetchCoin which writes
to cacheCoins).

Non-mutating operations like GetBestBlock(), EstimateSize(), and Cursor()
can safely run on the backend while workers are still fetching.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
Rename ProcessInput to ProcessInputInBackground.

Add thread-safe synchronization primitives so that any thread can safely call
ProcessInputInBackground once all threads have arrived at a std::barrier via
arrive_and_wait().

Make m_input_head a std::atomic_uint32_t, so workers can claim inputs
atomically in ProcessInputInBackground.

Give each InputToFetch a std::atomic_flag ready flag that acts as a
memory fence. Workers release and the main thread acquires the flag to
ensure the coin is seen correctly no matter which thread has written it.

Add a private StopFetching() method that skips all remaining inputs, waits for
all threads to arrive at the std::barrier, and resets all state in
CoinsViewCacheAsync.

Override Flush(), Sync(), and SetBackend() to call StopFetching() before
calling CCoinsViewCache base class methods. This ensures no worker threads
can access base while it is being mutated.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
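
The claim-and-publish pattern from the commit above can be sketched like this. It is a simplified illustration, not the commit's code: it uses `std::atomic<bool>` in place of `std::atomic_flag` so that it builds under C++17, omits the `std::barrier` entirely, and `Slot`, `ClaimQueue`, `ProcessOne`, `Wait`, and `SumWithWorkers` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

struct Slot {
    std::uint64_t outpoint{0};
    std::int64_t coin{0};            // written by exactly one claiming thread
    std::atomic<bool> ready{false};  // stored with release, loaded with acquire
};

class ClaimQueue {
public:
    explicit ClaimQueue(std::size_t n) : m_slots(n) {
        for (std::size_t i = 0; i < n; ++i) m_slots[i].outpoint = i;
    }

    // Claim the next unclaimed slot and fill it. Returns false once drained.
    bool ProcessOne() {
        // Cheap pre-check so a spinning caller cannot overflow the counter.
        if (m_head.load(std::memory_order_relaxed) >= m_slots.size()) return false;
        const std::uint32_t i = m_head.fetch_add(1, std::memory_order_relaxed);
        if (i >= m_slots.size()) return false;
        // Stand-in for a database read.
        m_slots[i].coin = static_cast<std::int64_t>(m_slots[i].outpoint) * 10;
        // Publish: the coin write above becomes visible to any acquiring reader.
        m_slots[i].ready.store(true, std::memory_order_release);
        return true;
    }

    // Consumer side: help drain the queue until slot i has been published.
    std::int64_t Wait(std::size_t i) {
        assert(i < m_slots.size());
        while (!m_slots[i].ready.load(std::memory_order_acquire)) {
            if (!ProcessOne()) std::this_thread::yield();
        }
        return m_slots[i].coin;
    }

private:
    std::vector<Slot> m_slots;
    std::atomic<std::uint32_t> m_head{0};
};

// Hypothetical driver: workers and the calling thread drain the queue together.
inline std::int64_t SumWithWorkers(std::size_t n, unsigned workers) {
    ClaimQueue queue(n);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&queue] { while (queue.ProcessOne()) {} });
    }
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += queue.Wait(i);
    for (std::thread& t : pool) t.join();
    return sum;
}
```

The release store paired with the acquire load is what lets the consumer read `coin` without a lock: once it observes `ready == true`, the write that preceded the store is guaranteed to be visible, regardless of which thread performed it.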
Spawn a fixed pool of worker threads (default 4) that fetch coins in parallel.
Workers wait at the barrier until StartFetching() signals work is available,
then race to claim and fetch inputs from the queue.

Once all inputs have been fetched, the workers wait at the barrier until the
main thread arrives via StopFetching().

The destructor arrives at the barrier a final time with an empty m_inputs,
which signals to the threads to exit their loop.

> coins: filter inputs created in same block before fetching
|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        1,970,543.00 |              507.47 |    4.0% |   32,640,039.00 |    4,760,784.00 |  6.856 |   5,506,291.00 |    1.2% |      0.02 | CoinsViewCacheAsyncBenchmark

> validation: fetch inputs on parallel threads
|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        1,601,969.00 |              624.23 |    2.9% |    8,345,989.00 |    2,232,468.00 |  3.738 |   1,089,340.00 |    1.8% |      0.03 | CoinsViewCacheAsyncBenchmark

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
3 participants