fix: resolve Bad MAC, No Session, Invalid PreKey errors (#1769) #2372
Open
kobie3717 wants to merge 1 commit into WhiskeySockets:master from
…prekey exhaustion
Problem
Users experience persistent decryption failures manifesting as:
- `Bad MAC` / `Failed to decrypt`
- `No matching session`
- `Invalid PreKey ID`

These errors are the most reported issue in the repository (#1769, #2340, #2362) and have been occurring since the WhatsApp LID (Linked Identity) migration.
Root Causes
1. LID/PN Transaction Race Condition
When WhatsApp sends messages via both PN (Phone Number) and LID (Linked ID) JIDs for the same contact, `decryptMessage` and `encryptMessage` use the raw JID as the transaction mutex key. Two concurrent operations for the same logical session (one via PN, one via LID) acquire different locks, allowing concurrent session state mutations that corrupt the ratchet → `Bad MAC`.
2. Aggressive PN Session Deletion During Migration
`migrateSession()` copies the session from PN→LID address then deletes the PN session (`sessionUpdates[pnAddr] = null`). Any in-flight messages still addressed to the PN JID immediately fail with `No matching session`.
3. Immediate PreKey Deletion
`removePreKey()` deletes the pre-key immediately after first use. When WhatsApp retransmits the same message (common during connectivity issues), the pre-key is already gone → `Invalid PreKey ID`.
Changes
Fix 1: Canonical JID Resolution for Transaction Locks (`src/Signal/libsignal.ts`)
Before entering a transaction in `decryptMessage`/`encryptMessage`, resolve the JID to its canonical (LID-preferred) form via the existing `LIDMappingStore`. This ensures PN and LID operations for the same contact serialize on the same mutex key.
Fix 2: Retain PN Session During LID Migration (`src/Signal/libsignal.ts`)
In `migrateSession()`, copy the session to the LID address but do not delete the PN session. The PN session will naturally fall out of use as new messages arrive under the LID address. This is safe because signal storage already resolves PN→LID internally via `resolveLIDSignalAddress`.
Fix 3: Delayed PreKey Deletion (`src/Signal/libsignal.ts`)
Replace immediate pre-key deletion with a 5-minute grace period. Used pre-keys are scheduled for deletion via a lightweight timer. Retransmissions within the grace window succeed. The timer uses `unref()` to avoid blocking process exit.
Risk Assessment
`getLIDForPN()` lookup. Falls back to the original JID if no mapping exists. Worst case: one extra async lookup per encrypt/decrypt.
Testing Notes
Fixes #1769. Related: #2340, #2362.
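For readers who want a concrete picture of the locking and grace-period ideas described under Changes, here is a minimal, self-contained sketch. It is not the code in this PR: `canonicalLockKey`, `KeyedMutex`, `DelayedPreKeyRemover`, the in-memory `Map` stores, and the choice to ignore repeat calls for an already-scheduled pre-key are all hypothetical stand-ins, and the real implementation goes through `LIDMappingStore` and the signal key store instead.

```typescript
// Illustrative sketch only. All names below are hypothetical and do not
// come from the PR's diff or from the Baileys API.
type Jid = string

/** Fix 1 idea: derive the lock key from the canonical (LID-preferred) JID. */
function canonicalLockKey(jid: Jid, pnToLid: Map<Jid, Jid>): Jid {
  // Fall back to the original JID when no PN -> LID mapping is known.
  return pnToLid.get(jid) ?? jid
}

/** Minimal keyed mutex: tasks that share a key run one after another. */
class KeyedMutex {
  private tails = new Map<string, Promise<unknown>>()

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    // Stored tails never reject, so the next task always gets to run.
    const prev = this.tails.get(key) ?? Promise.resolve()
    const next = prev.then(() => task())
    this.tails.set(key, next.catch(() => undefined))
    return next
  }
}

/** Fix 3 idea: delete a used pre-key only after a grace period. */
class DelayedPreKeyRemover {
  private pending = new Map<number, ReturnType<typeof setTimeout>>()
  private store: Map<number, Uint8Array>
  private graceMs: number

  constructor(store: Map<number, Uint8Array>, graceMs = 5 * 60 * 1000) {
    this.store = store
    this.graceMs = graceMs
  }

  markUsed(id: number): void {
    // Assumption of this sketch: a repeat use does not reset the timer.
    if (this.pending.has(id)) return
    const timer = setTimeout(() => {
      this.store.delete(id)
      this.pending.delete(id)
    }, this.graceMs)
    // In Node.js, unref() stops a pending timer from keeping the process alive.
    ;(timer as unknown as { unref?: () => void }).unref?.()
    this.pending.set(id, timer)
  }

  pendingCount(): number {
    return this.pending.size
  }
}
```

With a mapping from a PN JID to its LID, `canonicalLockKey` returns the same key for both forms, so passing that key to `KeyedMutex.run` serializes work that would otherwise take two different locks. The sketch never evicts finished entries from `tails`, which a long-running process would need to handle.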