fix: clear stale routingInfo on restart to prevent slow/unstable connections #220
Conversation
…sage delivery
Root cause: inbound messages were held in two back-to-back buffer phases before
being emitted to listeners, producing 35-60 s end-to-end delays:
Phase 1 – socket.ts offline buffer
ev.buffer() is called on every (re)connection and flushed only when the server
sends CB:ib,,offline (all offline notifications delivered). On busy accounts
the server can take 10-30+ s to drain the offline queue, holding every
buffered event — including fresh live messages — hostage for that duration.
The existing event-buffer auto-flush was the only safety net (default 15 s).
Phase 2 – chats.ts AwaitingInitialSync (first connection only)
A second ev.buffer() is started after receivedPendingNotifications fires and
held for up to 20 s while waiting for a history-sync notification + doAppStateSync.
Stacking these two phases produced the observed 35-60 s total delay.
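The stacking effect can be modeled with a tiny sketch. This is a hypothetical minimal buffer (the names `TwoPhaseBuffer`, `depth`, and `pending` are illustrative, not the actual Baileys event-buffer code), showing that a message captured in phase 1 is only released when the last active phase also flushes:

```typescript
// Hypothetical model: each buffer phase opens a nesting level; events are only
// released to the listener when the outermost level finally flushes.
type Listener = (ev: string) => void

class TwoPhaseBuffer {
    private depth = 0
    private pending: string[] = []

    constructor(private readonly listener: Listener) {}

    buffer(): void {
        this.depth++ // each phase opens another nesting level
    }

    emit(ev: string): void {
        if (this.depth > 0) {
            this.pending.push(ev) // held while any phase is active
        } else {
            this.listener(ev)
        }
    }

    flush(): void {
        if (this.depth > 0 && --this.depth === 0) {
            for (const ev of this.pending) this.listener(ev)
            this.pending = []
        }
    }
}

const delivered: string[] = []
const ev = new TwoPhaseBuffer(e => delivered.push(e))
ev.buffer()             // phase 1: offline buffer armed on (re)connect
ev.emit('live-message') // fresh live message arrives and is held
ev.buffer()             // phase 2: AwaitingInitialSync buffer
ev.flush()              // phase 1 ends (CB:ib,,offline); message is STILL held
ev.flush()              // phase 2 ends; message finally reaches the listener
```

With 10-30 s spent in phase 1 and up to 20 s in phase 2, the held message only reaches listeners after both phases end, which matches the observed 35-60 s worst case.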
Fixes (surgical, no behaviour change on the send path):
1. src/Utils/event-buffer.ts
• Default bufferTimeoutMs 15 000 → 5 000 ms (BAILEYS_BUFFER_TIMEOUT_MS)
• Default minBufferTimeoutMs 3 000 → 1 000 ms (BAILEYS_BUFFER_MIN_TIMEOUT_MS)
• Default maxBufferTimeoutMs 20 000 → 8 000 ms (BAILEYS_BUFFER_MAX_TIMEOUT_MS)
All three remain fully overridable via environment variables.
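A sketch of how the lowered defaults stay overridable (the helper name `envMs` is assumed for illustration): the environment value wins whenever it parses to a positive number, otherwise the new, shorter default applies.

```typescript
// Assumed helper: read a millisecond timeout from the environment, falling back
// to the (newly lowered) default when the variable is unset or invalid.
const envMs = (name: string, fallback: number): number => {
    const raw = process.env[name]
    const parsed = raw === undefined ? NaN : Number(raw)
    return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback
}

const bufferTimeoutMs = envMs('BAILEYS_BUFFER_TIMEOUT_MS', 5_000)        // was 15 000
const minBufferTimeoutMs = envMs('BAILEYS_BUFFER_MIN_TIMEOUT_MS', 1_000) // was 3 000
const maxBufferTimeoutMs = envMs('BAILEYS_BUFFER_MAX_TIMEOUT_MS', 8_000) // was 20 000
```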
2. src/Socket/socket.ts
• Added OFFLINE_BUFFER_TIMEOUT_MS safety timer (default 5 s, env-configurable).
If CB:ib,,offline does not arrive within 5 s the buffer is force-flushed so
live messages are never delayed beyond that cap.
• CB:ib,,offline handler clears the safety timer on the happy path and marks
didStartBuffer = false to avoid a double-flush.
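A sketch of the safety-timer pattern (function names `startBuffer`/`onOffline` are assumed; this is not the literal socket.ts code): the timer force-flushes if CB:ib,,offline never arrives, and the `didStartBuffer` flag guarantees the buffer is flushed at most once.

```typescript
// Safety timer: cap the offline-buffer phase at 5 s, flush exactly once.
const OFFLINE_BUFFER_TIMEOUT_MS = 5_000

let didStartBuffer = false
let offlineBufferTimeout: ReturnType<typeof setTimeout> | undefined

function startBuffer(flush: () => void): void {
    didStartBuffer = true
    offlineBufferTimeout = setTimeout(() => {
        offlineBufferTimeout = undefined
        if (didStartBuffer) {
            didStartBuffer = false // the happy path must not flush again later
            flush()
        }
    }, OFFLINE_BUFFER_TIMEOUT_MS)
}

// CB:ib,,offline happy path: cancel the timer, flush exactly once.
function onOffline(flush: () => void): void {
    if (offlineBufferTimeout) {
        clearTimeout(offlineBufferTimeout)
        offlineBufferTimeout = undefined
    }
    if (didStartBuffer) {
        didStartBuffer = false // a racing timer callback now does nothing
        flush()
    }
}
```

Clearing the flag before calling `flush()` is what makes the two paths mutually exclusive regardless of which one wins the race.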
3. src/Socket/chats.ts
• AwaitingInitialSync fallback timeout 20 000 → 8 000 ms.
History that arrives late is still processed via processMessage regardless
of the state-machine phase (existing behaviour, unchanged).
Worst-case delivery latency after this change:
Reconnection (accountSyncCounter > 0): ≤ 5 s (was ≤ 15 s)
First connection with history sync : ≤ 5 s + 8 s = 13 s (was 35-60 s)
No changes to: send path, button/list/carousel, tcTokenFetchingJids,
forceSnapshotCollections, LID/PN mapping, or app-state resilience.
https://claude.ai/code/session_015McJNWJwABDTEwx4bfG4C7
…ndependent of env

The offline-buffer safety timer in socket.ts (which caps how long the CB:ib,,offline phase can block live message delivery) must remain short regardless of what operators set for BAILEYS_BUFFER_TIMEOUT_MS. Operators often set BAILEYS_BUFFER_TIMEOUT_MS=30000 (30 s) for better Prometheus/history batching. Reading that env var for the offline timer would have kept the safety net at 30 s, defeating the fix entirely.

The offline-phase timer is now a hardcoded 5_000 ms constant with an explicit comment explaining why it must not inherit the general buffer timeout. All other behaviour is unchanged.

https://claude.ai/code/session_015McJNWJwABDTEwx4bfG4C7
…post-close flush

If the socket closes (auth failure, network drop) before CB:ib,,offline arrives, the 5 s safety timer was still running. After 5 s its callback would find didStartBuffer=true and call ev.flush() on an already-closed session, risking stale/partial events being emitted and reprocessed on the next reconnect.

Fix: clear offlineBufferTimeout and reset didStartBuffer=false inside end(), immediately after the existing clearInterval/clearTimeout block, mirroring how awaitingSyncTimeout is cleaned up in chats.ts on connection close.

Addresses review comments from Codex (P2) and Copilot on PR #217.

https://claude.ai/code/session_015McJNWJwABDTEwx4bfG4C7
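A sketch of the end() cleanup described above (the shape is assumed, not the exact socket.ts code): cancel the pending safety timer and reset the flag so nothing can flush an already-closed session.

```typescript
// State as it would look when the socket closes mid-phase: the buffer was
// started and the 5 s safety timer is still pending.
let didStartBuffer = true
let offlineBufferTimeout: ReturnType<typeof setTimeout> | undefined =
    setTimeout(() => { /* would have flushed a closed session */ }, 5_000)

function end(): void {
    // ...the existing clearInterval/clearTimeout cleanup block runs here first...
    if (offlineBufferTimeout) {
        clearTimeout(offlineBufferTimeout) // timer can no longer fire post-close
        offlineBufferTimeout = undefined
    }
    didStartBuffer = false // a callback that already raced in now flushes nothing
}

end() // socket closed before CB:ib,,offline: no post-close flush is possible
```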
Covers the three interaction points of the 5 s safety timer introduced in socket.ts to cap the offline-buffer phase:
- startBuffer() — arms the timer on reconnection
- onOffline() — CB:ib,,offline happy path: cancels timer, flushes once
- onClose() — end() path: cancels timer, resets flag, no post-close flush

Test cases (15 total):
- Timer fires after exactly 5 s and calls flush + warn
- Timer sets offlineBufferTimeout=undefined and didStartBuffer=false
- No flush if didStartBuffer was already false when timer fires
- CB:ib,,offline cancels timer → only one flush regardless of timing
- CB:ib,,offline is idempotent (spurious second call = no extra flush)
- end() cancels timer → advancing past 5 s triggers no flush
- end() is a no-op when called before startBuffer or after onOffline
- Boundary checks: no flush at 4 999 ms, flushes at exactly 5 000 ms

Follows the same standalone-function pattern used in bad-ack-handling.test.ts to test socket closures without instantiating makeSocket.

Addresses Copilot review comment on PR #217.

https://claude.ai/code/session_015McJNWJwABDTEwx4bfG4C7
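The boundary check can be sketched standalone. This version uses a hand-rolled fake clock instead of Jest's `jest.useFakeTimers()` so it runs without a test framework; `FakeClock` is illustrative, not part of the actual suite.

```typescript
// Minimal fake clock: registered callbacks fire when advance() moves the
// simulated time at or past their deadline.
class FakeClock {
    private now = 0
    private timers: { at: number; fn: () => void; fired: boolean }[] = []

    setTimeout(fn: () => void, ms: number): void {
        this.timers.push({ at: this.now + ms, fn, fired: false })
    }

    advance(ms: number): void {
        this.now += ms
        for (const t of this.timers) {
            if (!t.fired && t.at <= this.now) {
                t.fired = true
                t.fn()
            }
        }
    }
}

let flushes = 0
const clock = new FakeClock()
clock.setTimeout(() => flushes++, 5_000) // the safety timer under test
clock.advance(4_999) // one millisecond early: flushes is still 0
clock.advance(1)     // exactly 5 000 ms: the timer fires, flushes becomes 1
```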
CI runners do not have SSH keys configured, so yarn was failing with "Permission denied (publickey)" when resolving:
git+ssh://git@github.com/whiskeysockets/libsignal-node.git

Changed to HTTPS, which works without any SSH key setup:
git+https://github.com/whiskeysockets/libsignal-node.git

The commit hash and package.json entry are unchanged.

https://claude.ai/code/session_015McJNWJwABDTEwx4bfG4C7
Commit 0bccee8 accidentally replaced the Yarn 4 Berry lock file (format v8, with __metadata, resolution:, checksum:) with a Yarn 1 Classic lock file (# yarn lockfile v1, resolved:, integrity:).

The CI runs `yarn install --immutable` with Yarn 4 (corepack yarn@4.x). When Yarn 4 encounters a Yarn 1-format lock file it needs to migrate/regenerate it, which --immutable forbids → build failure. Restoring the original Yarn 4 format from before the bad commit.

Note: the original lock file already used HTTPS for libsignal-node:
resolution: "libsignal@https://github.com/whiskeysockets/libsignal-node.git#commit=..."
So no further SSH→HTTPS fix is needed.

https://claude.ai/code/session_015McJNWJwABDTEwx4bfG4C7
…lowness

After a code update deployed via pm2 restart, active WhatsApp connections often remain slow even with the new code. The root cause is routingInfo stored in creds.json: it directs the socket to reconnect to the same WhatsApp edge server, which may retain stale server-side state (throttling, bad session state) from the previous version. A QR re-scan fixes it because it creates a new session on a fresh edge server.

This option discards routingInfo before the WebSocket URL is constructed, forcing WhatsApp to assign a fresh edge server — equivalent to the clean state after a QR re-scan, but without invalidating Signal keys or auth credentials (no re-scan needed). The cleared state is immediately persisted via creds.update so that subsequent restarts before the server assigns new routingInfo also benefit.

Usage in zpro-backend: set clearRoutingInfoOnStart: true on the first startSock() call after a deployment, then false on reconnections.

https://claude.ai/code/session_015McJNWJwABDTEwx4bfG4C7
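The clear-and-persist step can be sketched as follows (the creds shape and the helper name `clearRoutingInfo` are assumptions for illustration, not the actual socket.ts code):

```typescript
// Assumed minimal creds shape: only the field this sketch touches.
type Creds = { routingInfo?: Uint8Array }

// Clear any stored routingInfo before the WebSocket URL is built, and persist
// the cleared state so a quick restart cannot resurrect the stale value.
function clearRoutingInfo(
    creds: Creds,
    emitCredsUpdate: (update: Partial<Creds>) => void
): boolean {
    const hadStaleRoutingInfo = creds.routingInfo !== undefined
    if (hadStaleRoutingInfo) {
        delete creds.routingInfo // the server will assign a fresh edge server
        // persist via creds.update so the consumer's saveCreds() writes the
        // clean state to disk before the next restart
        emitCredsUpdate({ routingInfo: undefined })
    }
    return hadStaleRoutingInfo
}
```

Returning whether anything was actually cleared keeps the creds.update emit off the path where routingInfo was already absent, so consumers do not get spurious saveCreds() writes.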
Make clearRoutingInfoOnStart: true the default so every restart (pm2, server reboot, deploy) automatically gets a fresh edge server assignment without any configuration change in the consumer.

The old routingInfo becomes stale after any restart anyway — the WA server always issues a new one during the handshake. Keeping the stale value forces reconnection to a potentially overloaded or broken edge server, causing slow or unstable sessions. With this default, consumers that explicitly pass clearRoutingInfoOnStart: false can still opt out.

https://claude.ai/code/session_015McJNWJwABDTEwx4bfG4C7
…red stale routingInfo

When a channel is reconnected after disconnect (same zpro channel, new QR scan), the auth state still carries creds.me?.id from the previous session. This caused the offline buffer to activate and hold all incoming messages for up to 5 seconds while waiting for CB:ib,,offline (which may arrive late on accounts with large backlogs).

Fix: track whether clearRoutingInfoOnStart actually cleared a stale routingInfo. If it did, this is clearly a reconnect-after-disconnect scenario, not a cold start that needs event batching. In this case, skip the offline buffer entirely so live messages are delivered immediately instead of being held for up to 5 s. Normal cold restarts (routingInfo already absent) are unaffected — they still use the 5 s safety cap as before.

https://claude.ai/code/session_015McJNWJwABDTEwx4bfG4C7
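The decision can be sketched as a small predicate (the function and option names are assumed for illustration): skip the offline buffer when a stale routingInfo was just cleared, keep the 5 s safety cap for cold restarts, and never buffer for a brand-new session.

```typescript
// Assumed predicate: decide whether to arm the offline buffer on connect.
function shouldStartOfflineBuffer(opts: {
    hasPriorSession: boolean        // e.g. creds.me?.id survives from a previous session
    clearedStaleRoutingInfo: boolean
}): boolean {
    if (!opts.hasPriorSession) return false        // fresh QR login: no backlog expected
    if (opts.clearedStaleRoutingInfo) return false // reconnect: deliver live messages now
    return true                                    // cold restart: keep the 5 s safety cap
}
```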
Pull request overview
Adds connection-start mitigations aimed at improving reconnect reliability/latency by forcing fresh edge routing and capping how long “offline backlog” buffering can block live events.
Changes:
- Introduces a `clearRoutingInfoOnStart` SocketConfig option and clears persisted `creds.routingInfo` before connecting (plus persists the cleared state).
- Adds an offline-buffer safety timer in `socket.ts` and ensures it's cleared on socket end; adds focused Jest coverage for this timer logic.
- Reduces several default buffering/initial-sync wait timeouts to flush earlier under stall conditions.
Reviewed changes
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/tests/Socket/offline-buffer-timeout.test.ts | Adds unit tests mirroring the new offline-buffer safety timer behavior. |
| src/Utils/event-buffer.ts | Lowers default BAILEYS_BUFFER_* timeouts to flush buffered events sooner. |
| src/Types/Socket.ts | Adds clearRoutingInfoOnStart option with JSDoc guidance. |
| src/Socket/socket.ts | Clears stored routingInfo on start, persists creds update, adds offline-buffer safety timer + end() cleanup. |
| src/Socket/chats.ts | Reduces AwaitingInitialSync timeout from 20s to 8s and updates logs/comments. |
| src/Defaults/index.ts | Enables clearRoutingInfoOnStart by default in DEFAULT_CONNECTION_CONFIG. |
Comments suppressed due to low confidence (1)
src/Socket/socket.ts:521
- This `creds.update` emit will fire on every socket creation whenever `clearRoutingInfoOnStart` is true and `routingInfo` is already undefined (i.e., even when nothing was cleared). That can trigger unnecessary consumer `saveCreds()` writes/side effects. Emit only when you actually modified `routingInfo` (e.g., gate on `hadStaleRoutingInfo`), and prefer emitting a minimal update payload instead of the entire creds object.
const ev = makeEventBuffer(logger)
// Persist the routingInfo clearing so the consumer's saveCreds() writes the clean state to disk.
// This ensures that if the process restarts again before the server assigns new routingInfo,
// the stale value is not reused.
- Add missing space after 'if'/'else if' keywords in CB:stream:error handler
- Reformat long logger.warn/info lines to stay within line length limit

Fixes CI linting failures introduced by recent commits.

https://claude.ai/code/session_015McJNWJwABDTEwx4bfG4C7
…tion

The merge of master into this branch (a2bd33b) left two merge artifacts:
1. A duplicated `if (config.clearRoutingInfoOnStart...)` block nested inside the first, missing its closing brace — this caused TS1005 '}' expected at the end of the file.
2. A duplicate `const OFFLINE_BUFFER_TIMEOUT_MS` declaration (one from each branch) which would cause a duplicate identifier error.

Both are removed, leaving the correct single implementation.

https://claude.ai/code/session_015McJNWJwABDTEwx4bfG4C7