graphdb: fix backwards-compat for channel edge feature deserialization#10529
Conversation
Summary of Changes: Hello @Roasbeef, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request resolves a significant backwards compatibility problem in how channel edge features are deserialized from the database: an update to the serialization format in v0.20.x caused feature data written by v0.19.x to fail to decode.
Code Review
This pull request addresses a critical backwards-compatibility issue with channel edge feature deserialization. The introduction of deserializeChanEdgeFeatures to handle both legacy and new formats is a good approach. The accompanying tests are extensive and cover many scenarios, including property-based testing.
However, I've identified a critical flaw in the format detection logic that could lead to data corruption in rare edge cases. The assumption that legacy feature vectors cannot accidentally match the new format's length prefix is incorrect for larger feature vectors. I've provided a detailed explanation and a suggested fix in a specific comment.
Once this critical issue is addressed, the PR will be a solid fix for the original problem.
if len(featureBytes) >= 2 {
	encodedLen := binary.BigEndian.Uint16(featureBytes[:2])
	if int(encodedLen) == len(featureBytes)-2 {
		// New format: skip the 2-byte length prefix and decode
		// the remaining bytes as raw feature bits.
		err := features.DecodeBase256(
			bytes.NewReader(featureBytes[2:]),
			int(encodedLen),
		)
		if err != nil {
			return nil, fmt.Errorf("unable to decode "+
				"features (new format): %w", err)
		}

		return lnwire.NewFeatureVector(
			features, lnwire.Features,
		), nil
	}
}
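The excerpt above only shows the new-format branch. For orientation, here is a hedged sketch of the legacy fallback implied by the commit message (decode the whole slice as raw base256 feature bits with the known length); the PR's exact code may differ:

// Hedged sketch, not the PR's exact code: if the length-prefix check above
// does not match, fall back to the pre-v0.20 layout and treat the entire
// slice as raw base256 feature bits.
err := features.DecodeBase256(
	bytes.NewReader(featureBytes), len(featureBytes),
)
if err != nil {
	return nil, fmt.Errorf("unable to decode features "+
		"(legacy format): %w", err)
}

return lnwire.NewFeatureVector(features, lnwire.Features), nil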
The current format detection logic int(binary.BigEndian.Uint16(featureBytes[:2])) == len(featureBytes)-2 is not safe and can lead to data corruption. A legacy-formatted feature vector can accidentally satisfy this condition, causing it to be misinterpreted as the new format.
For example, a legacy feature vector of 258 bytes where the first two bytes are 0x01, 0x00 would be incorrectly detected as the new format, because 256 == 258 - 2. This would lead to the first two bytes being stripped and the remaining 256 bytes being parsed as a feature vector, losing the information from the first two bytes. This is possible if the original feature vector had a feature bit in the range [2056, 2063] set.
The existing test TestDeserializeChanEdgeFeaturesLegacyFormatNoCollision does not catch this because it only generates feature vectors with a maxBit up to 255, resulting in a maximum length of 32 bytes, which is too small to trigger this collision.
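To make the collision concrete, here is a minimal self-contained sketch (hypothetical, not part of the PR) that encodes a legacy vector with only feature bit 2056 set and runs it through the naive length-prefix test:

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"

	"github.com/lightningnetwork/lnd/lnwire"
)

func main() {
	// Legacy (pre-v0.20) encoding: raw base256 feature bits, no prefix.
	fv := lnwire.NewRawFeatureVector(lnwire.FeatureBit(2056))

	var buf bytes.Buffer
	if err := fv.EncodeBase256(&buf); err != nil {
		panic(err)
	}

	featureBytes := buf.Bytes()
	prefix := binary.BigEndian.Uint16(featureBytes[:2])

	// A 258-byte vector whose first two bytes read as 256 == 258-2, so
	// the naive check would misclassify it as the new format.
	fmt.Printf("len=%d prefix=%d collides=%v\n",
		len(featureBytes), prefix,
		int(prefix) == len(featureBytes)-2)
}

Running this should print len=258 prefix=256 collides=true, matching the scenario described above.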
To fix this, we should add a check for canonical encoding. After decoding the payload (bytes after the supposed length prefix), we should re-encode it and verify that it produces the exact same byte slice. Legacy feature vectors that accidentally match the length check are very unlikely to be canonically encoded, so this will allow us to reliably distinguish between the formats.
if len(featureBytes) >= 2 {
encodedLen := binary.BigEndian.Uint16(featureBytes[:2])
if int(encodedLen) == len(featureBytes)-2 {
// This looks like the new format. To be sure, we decode
// and re-encode to check for canonical encoding, as a
// legacy feature vector could accidentally match the
// length check.
payload := featureBytes[2:]
tempFeatures := lnwire.NewRawFeatureVector()
err := tempFeatures.DecodeBase256(
bytes.NewReader(payload), int(encodedLen),
)
var checkBuf bytes.Buffer
if err == nil {
err = tempFeatures.EncodeBase256(&checkBuf)
}
// If there were no errors and the re-encoded payload
// matches, we are confident it's the new format.
if err == nil && bytes.Equal(checkBuf.Bytes(), payload) {
return lnwire.NewFeatureVector(
tempFeatures, lnwire.Features,
), nil
}
}
}
I think to be consistent we should do this
🟠 PR Severity: HIGH
🟠 High (1 file)
🟢 Test Files (1 file)
Analysis: This PR fixes a backwards-compatibility issue in channel edge feature deserialization within the graph database layer. The PR modifies core graph database logic (60 lines in production code) to handle feature deserialization correctly. While the change is relatively focused (1 production file, 65 non-test lines), it touches critical graph database functionality that could affect channel state persistence and network graph operations if not handled correctly. No severity bump modifiers apply.
pinheadmz
left a comment
ACK 909fa1a
Tested on 32-bit ARM (Raspberry Pi OS); confirmed lnd started up and synced. This commit solves #10528.
{
"version": "0.20.99-beta commit=v0.3-alpha-18056-g909fa1a59",
"commit_hash": "909fa1a59ee7aaf0e65b710633c7dfc0cc0ce6ca",
"identity_pubkey": "0355157b4260b70c7f407a720c527a84e9522cd948e7a8ad92ae00773be52488e3",
"alias": "🌟Star⭐️Service🌟",
"color": "#3399ff",
"num_pending_channels": 0,
"num_active_channels": 3,
"num_inactive_channels": 3,
"num_peers": 5,
"block_height": 934226,
"block_hash": "0000000000000000000105d3650ec8ac8a25a9f93b00cdb06dad558a550ac2f4",
"best_header_timestamp": "1769696561",
"synced_to_chain": true,
"synced_to_graph": true,
"testnet": false,
"chains": [
{
"chain": "bitcoin",
"network": "mainnet"
}
],
@claude review this
Claude encountered an error (view job). I'll analyze this and get back to you.
ziggie1984
left a comment
Looks good, pending CI. We should think about whether it is worth checking for a collision between the legacy and new formats.
@claude review this
Force-pushed from 909fa1a to c65b268.
🟠 PR Severity: HIGH
🟠 High (1 file)
🟢 Low (1 file)
⚪ Test Files (excluded from severity)
Analysis: This PR fixes a backwards-compatibility issue in the graph database layer for deserializing channel edge features. The core change is the new deserializeChanEdgeFeatures() helper. Severity determination:
This requires review by an engineer knowledgeable in the graph database layer, as it affects how channel edge features are persisted and retrieved. The backwards-compatibility aspect is important to ensure existing channel data can be properly read.
This commit fixes a backwards compatibility issue that prevented nodes from upgrading from v0.19.x to v0.20.x.

In v0.19.x, channel edge features were serialized as raw feature bytes without a length prefix. In v0.20.x (commit 2f2845d), the serialization changed to use Features.Encode(), which adds a 2-byte big-endian length prefix before the feature bits. The deserialization code was updated to use Features.Decode(), which expects this length prefix.

When v0.20.x reads a database created by v0.19.x, Decode() tries to read a length prefix that doesn't exist, causing an EOF error:

    unable to decode features: EOF

The fix adds a deserializeChanEdgeFeatures() helper that detects which format is being read and decodes accordingly:

- New format (v0.20+): the first 2 bytes encode the length of the remaining bytes. Detected when uint16(bytes[0:2]) == len(bytes)-2.
- Legacy format (pre-v0.20): raw feature bits without a length prefix. Uses DecodeBase256 with the known length.

The format detection is safe because in the legacy format, the first byte always has at least one bit set (the serialization uses minimum bytes), so the first two bytes can never encode a value equal to len-2.

Fixes lightningnetwork#10528.
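To illustrate the byte-level difference between the two on-disk layouts described above, here is a small hedged sketch using the lnwire raw feature vector API; the encode calls mirror what the commit message describes, but this is illustrative and not the PR's code:

package main

import (
	"bytes"
	"fmt"

	"github.com/lightningnetwork/lnd/lnwire"
)

func main() {
	fv := lnwire.NewRawFeatureVector(lnwire.FeatureBit(0))

	var legacy, prefixed bytes.Buffer

	// Pre-v0.20 layout: raw feature bits only, e.g. 0x01.
	if err := fv.EncodeBase256(&legacy); err != nil {
		panic(err)
	}

	// v0.20+ layout: 2-byte big-endian length prefix followed by the
	// same bits, e.g. 0x00 0x01 0x01.
	if err := fv.Encode(&prefixed); err != nil {
		panic(err)
	}

	fmt.Printf("legacy=%x new=%x\n", legacy.Bytes(), prefixed.Bytes())
}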
Force-pushed from 1676dd6 to 56a7f45.
Created backport PR for v0.20.x-branch.
Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-10529-to-v0.20.x-branch
git worktree add --checkout .worktree/backport-10529-to-v0.20.x-branch backport-10529-to-v0.20.x-branch
cd .worktree/backport-10529-to-v0.20.x-branch
git reset --hard HEAD^
git cherry-pick -x 56a7f45b998054ad290ecfe3ede57f274b58a5b4
git push --force-with-lease
🟠 PR Severity: HIGH
🟠 High (1 file)
🟢 Low (2 files - excluded from severity)
Analysis: This PR addresses a backwards-compatibility issue in the graph database's channel edge feature deserialization. The core change is the new deserializeChanEdgeFeatures() helper. The fix ensures proper handling of feature deserialization for channel edges, which is critical for maintaining network graph integrity but doesn't touch consensus-critical or fund-security components (which would be CRITICAL severity). No severity bump applied.
This requires review from an engineer knowledgeable in graph database operations and Lightning Network gossip protocol semantics.
[v0.20.x-branch] Backport #10529: graphdb: fix backwards-compat for channel edge feature deserialization
This commit fixes a backwards compatibility issue that prevented nodes from upgrading from v0.19.x to v0.20.x.
In v0.19.x, channel edge features were serialized as raw feature bytes without a length prefix. In v0.20.x (commit 2f2845d), the serialization changed to use Features.Encode(), which adds a 2-byte big-endian length prefix before the feature bits. The deserialization code was updated to use Features.Decode(), which expects this length prefix.

When v0.20.x reads a database created by v0.19.x, Decode() tries to read a length prefix that doesn't exist, causing an EOF error:

    unable to decode features: EOF

The fix adds a deserializeChanEdgeFeatures() helper that detects which format is being read and decodes accordingly:

- New format (v0.20+): the first 2 bytes encode the length of the remaining bytes. Detected when uint16(bytes[0:2]) == len(bytes)-2.
- Legacy format (pre-v0.20): raw feature bits without a length prefix. Uses DecodeBase256 with the known length.

The format detection is safe because in the legacy format, the first byte always has at least one bit set (the serialization uses minimum bytes), so the first two bytes can never encode a value equal to len-2.
Fixes #10528.