payments: kv-to-sql migration (tests + wiring)#10485
ziggie1984 wants to merge 10 commits into lightningnetwork:elle-payment-sql-series-new
Summary of Changes (from Gemini Code Assist)
This pull request introduces a significant architectural change by migrating the payments database from a Key-Value (KV) store to a native SQL schema. The primary goal is to enhance database robustness, query flexibility, and overall performance for payment-related operations. It includes the complete implementation of the migration logic, a new SQL-backed payment store, and a comprehensive suite of tests to ensure data integrity and correct functionality during and after the migration. This change also prepares the system for future database optimizations and features by leveraging the capabilities of a relational database.
Code Review
This pull request introduces a comprehensive migration for the payments database from a key-value store to SQL. The changes are well-structured, with the migration logic encapsulated in a new payments/db/migration1 package. This package includes a reader for the old KV store format, cleverly handling historical data quirks like duplicate payments by migrating them as HTLC attempts on the primary payment. The migration is correctly wired into the application startup, guarded by a new migration version and a tombstone mechanism to prevent accidental use of the old database post-migration. The testing strategy is particularly strong, featuring data integrity checks, property-based tests, and specific tests for various payment features, which provides high confidence in the correctness of this complex change. My review is positive, with only a couple of minor stylistic suggestions to improve readability.
return fmt.Errorf("non bucket element" +
"in duplicate bucket")
This string concatenation is unnecessary and can be combined into a single string to improve readability and adhere to the style guide's principle of minimizing lines for error messages.
return fmt.Errorf("non bucket element in duplicate bucket")
References
- The style guide states to 'Minimize lines for log and error messages, while adhering to the 80-character limit.' The current code unnecessarily splits a short line, which goes against the 'minimize lines' principle.
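Beyond style, the two forms also appear to produce different messages: Go joins adjacent literals without inserting a separator, so the split version as quoted in this thread would render without a space between "element" and "in". A minimal sketch, assuming the literals are exactly as shown above (the function names are hypothetical and not part of the PR):

```go
package main

import "fmt"

// splitMessage mirrors the two-literal form quoted in the diff.
// Note that neither literal carries a space at the join point.
func splitMessage() error {
	return fmt.Errorf("non bucket element" +
		"in duplicate bucket")
}

// singleMessage mirrors the single-literal form from the suggestion.
func singleMessage() error {
	return fmt.Errorf("non bucket element in duplicate bucket")
}

func main() {
	fmt.Println(splitMessage())
	fmt.Println(singleMessage())
}
```

If the original source does carry a trailing space that was lost in this view, the two forms are equivalent and the change is purely stylistic.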
d8531d2 to 2930b37
2802da4 to 83bce46
2930b37 to 8c106c7
622021b to fc449ad
/gemini review
Code Review
This pull request introduces the migration logic for moving payment data from the key-value store to a SQL database. The changes are extensive, including a new payments/db/migration1 package that encapsulates the migration logic, new SQL schema definitions for duplicate payments, and the necessary wiring in the configuration builder to trigger the migration. The approach is solid, with good isolation of migration code and thorough validation checks to ensure data integrity. I've found a few minor issues related to code style and typos in error messages and comments, which I've detailed in the review comments.
fc449ad to 83a9946
27f2fec to 5a25e61
0360b30 to 915f282
Do you want to update https://github.com/ziggie1984/lnd/blob/migration-kvdb-sql-payments-part1/docs/postgres.md or is that going to be part of a different PR?
@yyforyongyu: review reminder
ellemouton left a comment
first pass done! great work!!
payments/db/sql_store.go
Outdated
*/

// InsertPaymentMig is a migration-only variant of InsertPayment that
// allows setting fail_reason when inserting historical payments.
cool - i'd just expand this to say "since for real payments, they have not failed at creation time and so no failure reason would exist yet" or something to that tune
Done — expanded the comment to explain why the migration variant needs the fail_reason parameter.
-- Logical identifier for the duplicate payment. This is the payment hash
-- of the duplicate payment.
payment_identifier BLOB NOT NULL,
isn't this implied given the payment_id? ie, is this data not duplicating something already stored in the payments table?
Good point — removed. Since payment_id is a FK to payments.id and payments already stores payment_identifier, the column was redundant. Duplicates by definition share the same hash as the parent payment, so it can always be looked up via the FK join.
// UnknownElementType is an alias for channeldb.UnknownElementType.
type UnknownElementType = channeldb.UnknownElementType
think we should not do this type alias. when we "freeze" code, we really want it to be properly frozen. So rather do a bit of copying here i'd say.
Agreed on the principle — the frozen code should truly be self-contained. The scope is fairly large though (copying UnknownElementType + minimal codec, copying route.Hop/route.Route/Vertex types + methods, removing dead code like generateSphinxPacket that depends on ToSphinxPath, and updating all references across ~8 files). Will address this in a follow-up PR to keep this one focused.
// ReadElement deserializes a single element from the provided io.Reader.
func ReadElement(r io.Reader, element interface{}) error {
err := channeldb.ReadElement(r, element)
same here: to truly be frozen, these helpers from channeldb should also be frozen.
Same goes for the types used in this "freeze" from the route package - route.Hop and route.Route
Same as above — will address the full freeze (channeldb helpers + route types) in a follow-up PR.
return ai.CreatedAt.Before(aj.CreatedAt)
}

return ai.AmountMsat < aj.AmountMsat
won't amount mainly be the same for duplicate payments?
Usually yes, but for zero-amount invoices each duplicate payment could specify a different amount, so the tiebreaker is still meaningful in that case. Also sort.SliceStable preserves insertion order for fully equal elements, so the sort stays deterministic regardless.
@@ -0,0 +1,52 @@
# Payment Migration External Testdata
got this for one of the tests:
go test -v -tags="test_db_sqlite kvdb_sqlite" \
-run TestMigrationWithExternalDB
=== RUN TestMigrationWithExternalDB
=== RUN TestMigrationWithExternalDB/testdata
migration_external_test.go:170: Connecting to channel DB at: testdata/channel.sqlite
test_sqlite.go:51: Creating new SQLite DB for testing
2026-02-02 15:05:11.024 [INF]: Starting payment migration from KV to SQL...
migration_external_test.go:71:
Error Trace: /Users/elle/LL/lnd/payments/db/migration1/migration_external_test.go:71
/Users/elle/LL/lnd/payments/db/migration1/migration_external_test.go:177
Error: Received unexpected error:
migrate payments: migrate payment d775df0ead97b006: migrate attempt 0: HTLC attempt 0 missing payment hash (parent payment hash=d775df0ead97b0069c58664d3eb1a1ec6b75ca4a445b0104f584851d577187d8)
Test: TestMigrationWithExternalDB/testdata
--- FAIL: TestMigrationWithExternalDB (0.29s)
--- FAIL: TestMigrationWithExternalDB/testdata (0.29s)
FAIL
exit status 1
FAIL github.com/lightningnetwork/lnd/payments/db/migration1 0.612s
this for another:
=== RUN TestMigrationWithExternalDB/testdata
migration_external_test.go:170: Connecting to channel DB at: testdata/channel.db
test_postgres.go:72: Creating new Postgres DB 'test_e01f16e8c3aa37b4' for testing
2026-02-02 15:14:04.990 [INF]: Starting payment migration from KV to SQL...
2026-02-02 15:14:07.525 [INF]: Validated 92/92 payments
2026-02-02 15:14:07.528 [INF]: ========================================
2026-02-02 15:14:07.528 [INF]: Payment Migration Summary
2026-02-02 15:14:07.528 [INF]: ========================================
2026-02-02 15:14:07.528 [INF]: Total Payments: 92
2026-02-02 15:14:07.528 [INF]: Successful: 76
2026-02-02 15:14:07.528 [INF]: Failed: 16
2026-02-02 15:14:07.528 [INF]: In-Flight: 0
2026-02-02 15:14:07.528 [INF]: Initiated: 0
2026-02-02 15:14:07.528 [INF]:
2026-02-02 15:14:07.528 [INF]: Total HTLC Attempts: 318
2026-02-02 15:14:07.528 [INF]: Settled: 76
2026-02-02 15:14:07.528 [INF]: Failed: 242
2026-02-02 15:14:07.528 [INF]: In-Flight: 0
2026-02-02 15:14:07.528 [INF]:
2026-02-02 15:14:07.528 [INF]: Total Route Hops: 1283
2026-02-02 15:14:07.528 [INF]:
2026-02-02 15:14:07.528 [INF]: Migration Duration: 2.538169416s
2026-02-02 15:14:07.528 [INF]: ========================================
--- PASS: TestMigrationWithExternalDB/testdata (6.04s)
--- PASS: TestMigrationWithExternalDB (6.04s)
PASS
and:
go test -v -tags="test_db_sqlite kvdb_sqlite" \
-run TestMigrationWithExternalDB
=== RUN TestMigrationWithExternalDB
=== RUN TestMigrationWithExternalDB/testdata
migration_external_test.go:170: Connecting to channel DB at: testdata/channel.db
test_sqlite.go:51: Creating new SQLite DB for testing
2026-02-02 15:18:56.829 [INF]: Starting payment migration from KV to SQL...
2026-02-02 15:18:56.932 [INF]: Validated 92/92 payments
2026-02-02 15:18:56.932 [INF]: ========================================
2026-02-02 15:18:56.932 [INF]: Payment Migration Summary
2026-02-02 15:18:56.932 [INF]: ========================================
2026-02-02 15:18:56.932 [INF]: Total Payments: 92
2026-02-02 15:18:56.932 [INF]: Successful: 76
2026-02-02 15:18:56.932 [INF]: Failed: 16
2026-02-02 15:18:56.932 [INF]: In-Flight: 0
2026-02-02 15:18:56.932 [INF]: Initiated: 0
2026-02-02 15:18:56.932 [INF]:
2026-02-02 15:18:56.932 [INF]: Total HTLC Attempts: 318
2026-02-02 15:18:56.932 [INF]: Settled: 76
2026-02-02 15:18:56.932 [INF]: Failed: 242
2026-02-02 15:18:56.932 [INF]: In-Flight: 0
2026-02-02 15:18:56.932 [INF]:
2026-02-02 15:18:56.932 [INF]: Total Route Hops: 1283
2026-02-02 15:18:56.932 [INF]:
2026-02-02 15:18:56.932 [INF]: Migration Duration: 102.832041ms
2026-02-02 15:18:56.932 [INF]: ========================================
--- PASS: TestMigrationWithExternalDB (0.67s)
--- PASS: TestMigrationWithExternalDB/testdata (0.67s)
PASS
ok github.com/lightningnetwork/lnd/payments/db/migration1 1.147s
Thanks for testing! The "HTLC attempt 0 missing payment hash" failure was caused by legacy payments where the HTLC hash field is nil in bbolt. This is fixed by falling back to the parent payment hash when the HTLC-specific hash is nil (consistent with how patchLegacyPaymentHash works in payment_lifecycle.go). The fix is in a follow-up commit.
Specifically: for legacy/older payments the htlc.Hash field can be nil in bbolt. The fix changes migrateHTLCAttempt to fall back to the parent payment hash (parentPaymentHash) when htlc.Hash is nil, instead of returning an error. This is consistent with how the router already handles these in patchLegacyPaymentHash (payment_lifecycle.go). The validation logic applies the same fallback so the KV↔SQL comparison passes. Fix is in a follow-up commit (adf8264).
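A minimal sketch of the fallback described above, assuming the HTLC hash is modelled as an optional pointer. The `hash` type and `attemptHash` helper are hypothetical and do not reproduce the real `migrateHTLCAttempt` signature.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// hash is a hypothetical 32-byte payment hash type.
type hash [32]byte

// attemptHash returns the HTLC-specific hash when one was stored and
// otherwise falls back to the parent payment hash, instead of failing.
func attemptHash(htlcHash *hash, parentPaymentHash hash) hash {
	if htlcHash == nil {
		return parentPaymentHash
	}

	return *htlcHash
}

func main() {
	parent := hash{0xd7, 0x75}
	own := hash{0x01}

	legacy := attemptHash(nil, parent)  // Legacy record: no HTLC hash.
	modern := attemptHash(&own, parent) // Newer record: own hash wins.

	fmt.Println(hex.EncodeToString(legacy[:2]))
	fmt.Println(hex.EncodeToString(modern[:2]))
}
```

Applying the same helper on the validation side is what would keep the KV and SQL views comparable for legacy records.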
"graph to SQL: %w", err)
}

return nil
another option could be to do "sample validation". so like, validate every 100th payment or something like that. so like, set a ValidationSampleRate or something? just an idea
Interesting idea, but I think we can defer this to a follow-up if needed. A few reasons:
- The current binary choice is clean: SkipMigrationValidation is simple, validate everything or skip entirely. A sample rate adds a config knob most users won't know how to tune.
- Sampling gives a false sense of security: if there's a systematic bug, even 1% sampling catches it. If there's a rare edge case, sampling might miss it. So it's either "validate all" for confidence or "skip" for speed, and the middle ground doesn't buy much.
- Real-world timing looks fine: from your own test output, 92 payments + 318 HTLC attempts validated in ~2.5s. Even for a node with 10k payments full validation should be manageable. The skip flag covers truly massive databases.
- YAGNI: if users report validation being too slow we can add it then with real data on what sample rate is useful.
Happy to add it if you feel strongly though!
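To make the trade-off concrete, here is a rough sketch of how the two options might differ. Everything in it is hypothetical: `sampleRate` models the suggested `ValidationSampleRate` knob, which this PR does not add, and `skipAll` only loosely models the skip flag.

```go
package main

import "fmt"

// shouldValidate decides whether the payment at position idx is checked.
// skipAll models the validate-everything-or-skip choice; sampleRate is the
// hypothetical knob from the suggestion (1 or less means every payment).
func shouldValidate(idx int, skipAll bool, sampleRate int) bool {
	if skipAll {
		return false
	}
	if sampleRate <= 1 {
		return true
	}

	return idx%sampleRate == 0
}

// countChecked reports how many of n payments would be validated.
func countChecked(n int, skipAll bool, sampleRate int) int {
	checked := 0
	for i := 0; i < n; i++ {
		if shouldValidate(i, skipAll, sampleRate) {
			checked++
		}
	}

	return checked
}

func main() {
	fmt.Println(countChecked(92, false, 1))     // Validate everything.
	fmt.Println(countChecked(92, true, 1))      // Skip entirely.
	fmt.Println(countChecked(92, false, 100))   // Every 100th payment.
	fmt.Println(shouldValidate(57, false, 100)) // One specific payment.
}
```

With 92 payments and a rate of 100, only the first payment is checked in this sketch, so a single bad record at index 57 would go unnoticed, which is the "rare edge case" concern raised above.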
8c106c7 to 2104014
915f282 to 0b84764
Add a migration-specific query that allows setting the failure reason when inserting a payment into the db.
Older LND versions could create multiple payments for the same hash. We need to preserve those historical records during KV→SQL migration, but they don’t fit the normal payment schema because we enforce a unique payment hash constraint. Introduce a lean payment_duplicates table to store only the essential fields (identifier, amount, timestamps, settle/fail data). This keeps the primary payment records stable and makes the migration deterministic even when duplicate records lack attempt info. The table is intentionally minimal and can be dropped after migration if no duplicate payments exist. For now, there is no logic in place that allows the node runner to fetch duplicate payments after the migration.
Copy the core payments/db code into payments/db/migration1 and add the required sqlc-generated types/queries from sqldb/sqlc. This effectively freezes the migration code so it stays robust against future query or schema changes in the main payments package.
Implement the KV→SQL payment migration and add an in-migration validation pass that deep-compares KV and SQL payment data in batches. Duplicate payments are migrated into the payment_duplicates table, and duplicates without attempt info or explicit resolution are marked failed to ensure terminal state. Validation checks those rows as well.
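The batched deep comparison could look roughly like the sketch below. The `payment` struct and `validateBatches` helper are hypothetical simplifications; the real migration reads from the KV store and SQL backend rather than from in-memory slices.

```go
package main

import (
	"fmt"
	"reflect"
)

// payment is a hypothetical, simplified payment record.
type payment struct {
	Hash     string
	Amount   int64
	Attempts []string
}

// validateBatches compares kv and sql pairwise in chunks of batchSize and
// returns an error describing the first mismatch it finds.
func validateBatches(kv, sql []payment, batchSize int) error {
	if batchSize < 1 {
		return fmt.Errorf("invalid batch size %d", batchSize)
	}
	if len(kv) != len(sql) {
		return fmt.Errorf("count mismatch: kv=%d sql=%d",
			len(kv), len(sql))
	}

	for start := 0; start < len(kv); start += batchSize {
		end := start + batchSize
		if end > len(kv) {
			end = len(kv)
		}

		// Deep-compare every record in the current batch.
		for i := start; i < end; i++ {
			if !reflect.DeepEqual(kv[i], sql[i]) {
				return fmt.Errorf("payment %s differs",
					kv[i].Hash)
			}
		}
	}

	return nil
}

func main() {
	kv := []payment{{"aa", 1, []string{"x"}}, {"bb", 2, nil}}
	sql := []payment{{"aa", 1, []string{"x"}}, {"bb", 3, nil}}

	fmt.Println(validateBatches(kv, kv, 1))
	fmt.Println(validateBatches(kv, sql, 1))
}
```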
Add test helpers plus sql_migration_test coverage for KV→SQL migration: basic migration, sequence ordering, data integrity, and feature-specific cases (MPP/AMP, custom records, blinded routes, metadata, failure messages). Also cover duplicate payment migration to payment_duplicates, including missing attempt info to ensure terminal failure is recorded. This gives broad regression coverage for the migration path and its edge cases.
Add a developer-facing migration_external_test that allows running the KV→SQL payments migration against a real channel.db backend to debug migration failures on actual data. The accompanying testdata README documents how to supply a database file and configure the test, so users can validate their data and confirm the migration completes successfully. The test is skipped by default and meant for manual diagnostics.
Hook the payments KV→SQL migration into the SQL migration config. The migration is still only available when building with the build tag "test_native_sql". Moreover, a tombstone protection similar to the invoice migration is added to prevent re-running with the KV backend once the migration completes.
Add a config flag to skip in-migration validation for the KV→SQL payments migration. This is offered as an option for operators of larger payment databases who prefer speed over strict validation. This commit wires the option through the config, documents it in the sample config, and disables batch/count validation when requested.
0b84764 to b03694a
🔴 PR Severity: CRITICAL
🔴 Critical (4 files)
🟠 High (30 files)
Payment Database Operations:
SQL Database Layer:
🟡 Medium (4 files)
🟢 Low (2 files)
Analysis
This PR is classified as CRITICAL for the following reasons:
Recommendation: This PR requires review by maintainers with deep knowledge of:
To override, add a |
Adds the KV db migration code to native SQL.
After merging this PR, the migration is still hidden behind the test_native_sql build tag.