switchrpc: improve SendOnion error handling #10545

Open
calvinrzachman wants to merge 4 commits into lightningnetwork:elle-base-branch-payment-service from calvinrzachman:switchrpc-error-handle-combined

@calvinrzachman
Contributor

Change Description

This PR refactors the SendOnion RPC's error handling to adhere strictly to gRPC best practices, moving away from error reporting within the response message itself.

Problem: Previously, SendOnion conveyed error information within its response body, even for successful gRPC calls (status.OK). This conflated success and failure states, limited the ability to send rich, structured error details, and complicated client-side error handling and observability.

Solution:

  1. Strict gRPC Status Usage: SendOnionResponse is now an empty message. A gRPC status.OK explicitly indicates a successful dispatch, while all failures result in a non-OK gRPC status.
  2. Structured Error Details: Detailed application-level failure information is now transported via SendOnionFailureDetails messages attached to the gRPC status Details field (as per https://grpc.io/docs/guides/error/).
  3. Refined ErrorCode Enum: The ErrorCode enum has been slimmed down to represent high-level, actionable client states (e.g., HTLC_STATUS_UNKNOWN, DUPLICATE_HTLC). The CLEAR_TEXT_ERROR code has been re-added for forward compatibility: new definitive errors can all be reported under the same clear text error classification, so clients can still interpret them definitively.
  4. Client Helpers: New helper functions, GetSendOnionFailureDetails (extracts raw failure details without any translation) and UnmarshallSendOnionError (translates into the types expected by ChannelRouter), are provided to simplify client interaction with these structured errors.

Benefits:

  • Enhanced Robustness: Clients can implement clearer and more reliable error-handling logic.
  • Improved Observability: Isolating failures to gRPC status codes will enable more precise tracking of SendOnion endpoint issues, offering better insight into remote router communication health.
  • Clearer API Contract: Unambiguous signaling of success and failure states.

NOTE

This is a breaking change for any existing clients of the SendOnion RPC. Given its current usage, this is deemed acceptable. This PR is a follow-up to the error handling improvements for the TrackOnion RPC.

Continues error handling updates from: #10472

Steps to Test

  • make itest icase=send_onion
  • go test -v -tags switchrpc github.com/lightningnetwork/lnd/lnrpc/switchrpc

The new structure uses a top-level `oneof` to provide a
compile-time distinction between a successful payment
(preimage) and a failed one. Additional information on a
failed attempt can be found in FailureDetails.

We now also use a structured ForwardingFailure type for
communicating the failure index and wire message from
failures which occur during htlc forwarding downstream
in the route.

Update both proto and handler to communicate error
information via gRPC status details.
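
As a rough sketch of what the `oneof` result and the structured ForwardingFailure described above mean for a Go client: a oneof surfaces as an interface with one wrapper type per case, so a type switch separates success from failure. The types and field names below (`trackResult`, `preimageResult`, `failureResult`, `FailureSourceIndex`, `WireMessage`) are hypothetical stand-ins written against the standard library only; they are not the generated names from switch.pb.go.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// trackResult models the `result` oneof: exactly one implementation is set.
type trackResult interface{ isTrackResult() }

// preimageResult is the success case.
type preimageResult struct{ Preimage []byte }

// failureResult is the failure case and carries the failure details.
type failureResult struct{ Details failureDetails }

func (preimageResult) isTrackResult() {}
func (failureResult) isTrackResult()  {}

// forwardingFailure models a failure that occurred downstream in the route.
// Field names are assumptions for illustration.
type forwardingFailure struct {
	FailureSourceIndex uint32
	WireMessage        []byte
}

// failureDetails models the FailureDetails message. Forwarding is nil for
// other kinds of failure.
type failureDetails struct {
	ErrorMessage string
	Forwarding   *forwardingFailure
}

type trackOnionResponse struct{ Result trackResult }

// describe renders a response as a short human-readable line.
func describe(resp trackOnionResponse) string {
	switch r := resp.Result.(type) {
	case preimageResult:
		return "settled: " + hex.EncodeToString(r.Preimage)
	case failureResult:
		if f := r.Details.Forwarding; f != nil {
			return fmt.Sprintf("failed at hop %d: %s",
				f.FailureSourceIndex, r.Details.ErrorMessage)
		}
		return "failed: " + r.Details.ErrorMessage
	default:
		return "no result set"
	}
}

func main() {
	ok := trackOnionResponse{Result: preimageResult{Preimage: []byte{0xab, 0xcd}}}
	fmt.Println(describe(ok)) // prints "settled: abcd"

	bad := trackOnionResponse{Result: failureResult{Details: failureDetails{
		ErrorMessage: "temporary channel failure",
		Forwarding:   &forwardingFailure{FailureSourceIndex: 1},
	}}}
	fmt.Println(describe(bad)) // prints "failed at hop 1: temporary channel failure"
}
```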
@gemini-code-assist

Summary of Changes

Hello @calvinrzachman, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly overhauls the error handling mechanism for the SendOnion RPC to align with modern gRPC standards. The primary goal is to provide a more robust, client-friendly, and observable API by moving error reporting out of the response body and into structured gRPC status details. This change enhances the clarity of success and failure states, improves the ability to track issues, and simplifies client-side error logic, albeit introducing a breaking change for existing integrations.

Highlights

  • Refactored SendOnion RPC Error Handling: The SendOnion RPC now strictly adheres to gRPC best practices. Success is indicated by a status.OK gRPC call, while all failures result in a non-OK gRPC status with detailed error information.
  • Structured Error Details: Detailed application-level failure information is now conveyed via SendOnionFailureDetails messages attached to the gRPC status 'Details' field, replacing in-response error reporting.
  • Refined ErrorCode Enum: The ErrorCode enum has been streamlined to represent high-level, actionable client states such as HTLC_STATUS_UNKNOWN and DUPLICATE_HTLC. A CLEAR_TEXT_ERROR code has been re-added for forward-compatibility with new definitive local errors.
  • New Client Helpers: Helper functions GetSendOnionFailureDetails and UnmarshallSendOnionError are provided to simplify client interaction with the new structured error format, allowing for raw detail extraction or translation into ChannelRouter expected types.
  • Breaking Change: This update introduces a breaking change for existing clients of the SendOnion RPC due to the fundamental shift in error reporting.
  • Follow-up to TrackOnion Improvements: This PR continues the error handling improvements previously applied to the TrackOnion RPC, ensuring consistency across related RPCs.

Changelog
  • docs/release-notes/release-notes-0.21.0.md
    • Added release notes for the switchrpc.TrackOnion RPC overhaul, detailing its robust and type-safe error handling.
    • Added release notes for the switchrpc.SendOnion RPC overhaul, highlighting the move to gRPC status codes for failures, structured SendOnionFailureDetails, and the redesigned ErrorCode enum.
  • itest/lnd_sendonion_test.go
    • Removed context and rpc imports.
    • Updated testSendOnion to assert err for successful onion sends instead of resp.Success and resp.ErrorMessage.
    • Modified trackResp.Preimage access to trackResp.GetPreimage().
    • Removed explicit context creation for SendOnion calls in testSendOnionTwice and testSendOnionConcurrency.
    • Updated trackResp.ErrorMessage and trackResp.EncryptedError assertions to use trackResp.GetFailureDetails() and clientFailure.GetEncryptedErrorData() respectively.
    • Adjusted error decryption and comparison logic in testTrackOnion to use new structured failure details.
  • lnrpc/switchrpc/mock.go
    • Added mockErrorDecrypter struct and its DecryptError method to mock htlcswitch.ErrorDecrypter for testing.
  • lnrpc/switchrpc/switch.pb.go
    • Modified ErrorCode enum: removed PAYMENT_ID_NOT_FOUND, FORWARDING_ERROR, NO_LINK; reordered and added HTLC_STATUS_UNKNOWN.
    • Made SendOnionResponse an empty message.
    • Introduced SendOnionFailureDetails message with ErrorCode, ErrorMessage, and ClearTextFailure.
    • Changed TrackOnionResponse to use a oneof field named result which can be either Preimage or FailureDetails.
    • Added FailureDetails message with ErrorMessage and a oneof field named failure for various structured failure types (e.g., ClearTextFailure, ForwardingFailure, EncryptedErrorData, UnreadableFailure, InternalError).
    • Added new message types: ForwardingFailure, ClearTextFailure, InternalError, and UnreadableFailure.
    • Updated message type indices and oneof wrappers.
  • lnrpc/switchrpc/switch.proto
    • Updated SendOnionResponse to be an empty message.
    • Defined new SendOnionFailureDetails message with error_code, error_message, and clear_text_failure.
    • Revised ErrorCode enum to include DUPLICATE_HTLC, CLEAR_TEXT_ERROR, HTLC_STATUS_UNKNOWN, UNREADABLE_FAILURE_MESSAGE, SWITCH_EXITING, and INTERNAL with updated descriptions.
    • Modified TrackOnionResponse to use a oneof field result for either preimage or failure_details.
    • Defined new FailureDetails message with error_message and a oneof field failure for various structured failure types.
    • Defined new messages: ForwardingFailure, ClearTextFailure, InternalError, and UnreadableFailure.
  • lnrpc/switchrpc/switch.swagger.json
    • Removed switchrpcErrorCode definition and its properties from switchrpcSendOnionResponse.
    • Added new definitions for switchrpcClearTextFailure, switchrpcFailureDetails, switchrpcForwardingFailure, switchrpcInternalError, and switchrpcUnreadableFailure.
    • Updated switchrpcSendOnionResponse to be an empty object.
    • Modified switchrpcTrackOnionResponse to reference switchrpcFailureDetails instead of direct error fields.
  • lnrpc/switchrpc/switch_server.go
    • Removed unused imports: encoding/hex, strconv, strings.
    • Added ErrUnknown sentinel error for unmarshalling failures.
    • Modified SendOnion to use marshallSendOnionError for handling dispatch errors and return an empty SendOnionResponse on success.
    • Updated TrackOnion to return gRPC status errors for ErrPaymentIDNotFound (codes.NotFound) and ErrSwitchExiting (codes.Unavailable).
    • Implemented marshallSendOnionError to translate internal errors into gRPC status errors with SendOnionFailureDetails.
    • Added UnmarshallSendOnionError and GetSendOnionFailureDetails for client-side error extraction.
    • Introduced newTrackOnionFailureResponse helper to construct TrackOnionResponse with failure details.
    • Implemented marshallFailureDetails to convert internal errors into FailureDetails for TrackOnion responses.
    • Added UnmarshallFailureDetails, UnmarshallForwardingError, UnmarshallLinkError, and UnmarshallFailureMessage for structured error unmarshalling.
  • lnrpc/switchrpc/switch_server_test.go
    • Removed unused import encoding/hex.
    • Updated TestSendOnion to expect gRPC status codes and use checkFailureDetails for validating structured error details.
    • Added new test cases for SendOnion covering ambiguous attempt init and clear text errors.
    • Introduced requireSendOnionFailureDetails helper for asserting SendOnion failure details.
    • Updated TestTrackOnion to use checkResponse for validating structured responses and gRPC status errors.
    • Added new test cases for TrackOnion covering generic internal errors, clear text errors, and forwarding errors.
    • Removed TestTranslateErrorForRPC, TestParseForwardingError, and TestForwardingErrorEncodeDecode.
    • Added TestMarshallFailureDetails, TestUnmarshallFailureDetails, TestUnmarshallSendOnionError, and TestGetSendOnionFailureDetails to thoroughly test the new error marshalling and unmarshalling logic.
  • lntest/rpc/switch.go
    • Modified the SendOnion helper function to return an error instead of *switchrpc.SendOnionResponse, aligning with the new gRPC error handling approach.
Activity
  • This pull request is a direct follow-up to the error handling improvements implemented for the TrackOnion RPC, continuing the effort to standardize and enhance error reporting across the switchrpc.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is an excellent pull request that significantly improves the error handling of the SendOnion RPC. The refactoring to use gRPC status codes and structured error details aligns with best practices, making the API more robust, explicit, and easier for clients to consume. The changes are implemented consistently across the codebase, including the protobuf definitions, server logic, and tests. The updated ErrorCode enum and the addition of client helper functions for error unmarshalling are particularly valuable. The test suite has been thoroughly updated to reflect the new API contract, with new tests ensuring the correctness of the new error handling logic. The release notes are clear and accurately describe the breaking changes. I have no suggestions for improvement.

@saubyk saubyk added this to v0.21 Feb 3, 2026
@saubyk saubyk moved this to In progress in v0.21 Feb 3, 2026
@bitromortac bitromortac self-requested a review February 3, 2026 17:47
@lightninglabs-deploy
Collaborator

🟠 PR Severity: HIGH

switchrpc: improve SendOnion error handling | 9 files | 1842 additions, 594 deletions

🟠 High (3 files)
  • lnrpc/switchrpc/switch_server.go - RPC server implementation for HTLC switch operations (+316/-109 lines)
  • lnrpc/switchrpc/switch.proto - API definition changes for SendOnion error handling (+104/-41 lines)
  • lnrpc/switchrpc/mock.go - Mock implementation for switchrpc (+11 lines)
🟢 Low (6 files)
  • lnrpc/switchrpc/switch.pb.go - Auto-generated protobuf code
  • lnrpc/switchrpc/switch.swagger.json - Auto-generated swagger documentation
  • lnrpc/switchrpc/switch_server_test.go - Test file
  • itest/lnd_sendonion_test.go - Integration test
  • lntest/rpc/switch.go - Test helper
  • docs/release-notes/release-notes-0.21.0.md - Release notes

Analysis

This PR improves error handling in the SendOnion RPC endpoint, which is part of the switchrpc package that interfaces with the critical htlcswitch component. The changes focus on better error reporting and handling for onion packet sending operations.

Severity Rationale:

  • The lnrpc/switchrpc/* package falls under the HIGH severity category as it defines the RPC/API interface for the HTLC switch
  • While this doesn't directly modify the htlcswitch internals (which would be CRITICAL), it does change how errors are surfaced and handled at the RPC layer
  • The PR includes substantial changes to the switch_server.go implementation (+316 lines) and the protobuf API definition
  • Requires review by an engineer knowledgeable about the HTLC switch and error handling patterns

Key Review Points:

  • Verify new error handling doesn't mask critical failure states
  • Ensure backward compatibility of RPC responses
  • Check that error messages provide sufficient debugging information without leaking sensitive data
  • Validate integration test coverage adequately exercises new error paths

To override, add a severity-override-{critical,high,medium,low} label.

Collaborator

@bitromortac bitromortac left a comment

I like the approach, it seems cleaner this way. Do you know if the details mechanism can be used via REST as well?

// which an rpc client is safe to retry the SendOnion
// RPC until an explicit acknowledgement of HTLC
// dispatch can be received from the server.
name: "idempotency anchor fails",
Collaborator

does that test name make sense? or do we also need a test for InitAttempt general errors?

linkErr := htlcswitch.NewLinkError(wireMsg)
fwdErr := htlcswitch.NewForwardingError(wireMsg, 1)

// Create a forwarding error to be returned by the mock decrypter.
Collaborator

there seem to be changes that belong to the TrackOnion PR/commits

Comment on lines +123 to +130
* The `switchrpc.SendOnion` RPC has been overhauled to provide a more robust,
client-friendly, and forward-compatible API. Failures are no longer reported
in the response body but are instead communicated exclusively via gRPC status
codes with rich, structured `SendOnionFailureDetails` attached. The
`ErrorCode` enum has been redesigned to represent actionable client states,
and a new `CLEAR_TEXT_ERROR` code provides forward-compatibility for clients
when new definitive local errors are introduced. This is a **breaking change**
for any clients of the `SendOnion` RPC.
Collaborator

when merging to master, we can just reflect the final state, maybe

Projects

Status: In progress

3 participants