Fix failing of non-sent queued messages #2123

lawrence-forooghian · 2025-12-05T13:31:25Z

Non-sent queued messages were not being failed when the connection entered the SUSPENDED / FAILED / CLOSED states.

The existing logic relied on the queued messages having a msgSerial, but they don't get a msgSerial until the first time we send the message on the transport.

Resolves #2115.
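
To illustrate the failure mode described above, here is a minimal sketch; it is not the library's actual code. The `Pending` type and the `failBySerialRange` function are hypothetical stand-ins, assuming a completion routine that selects messages by comparing `msgSerial` against a range. Under that assumption, a message that has never been sent has no `msgSerial` and is silently skipped.

```typescript
// Hypothetical, simplified model of a pending message; not the ably-js types.
interface Pending {
  name: string;
  msgSerial?: number; // assigned only once the message is first sent on the transport
  callback: (err: Error | null) => void;
}

// Serial-range selection: fails messages whose msgSerial falls in [serial, serial + count).
// A message with msgSerial === undefined never matches, so it stays in the queue.
function failBySerialRange(queue: Pending[], serial: number, count: number, err: Error): Pending[] {
  const end = serial + count;
  const remaining: Pending[] = [];
  for (const m of queue) {
    if (m.msgSerial !== undefined && m.msgSerial >= serial && m.msgSerial < end) {
      m.callback(err);
    } else {
      remaining.push(m);
    }
  }
  return remaining;
}

const failed: string[] = [];
const queue: Pending[] = [
  { name: 'sent', msgSerial: 0, callback: () => failed.push('sent') },
  { name: 'queued', callback: () => failed.push('queued') }, // never sent, so no msgSerial
];

const leftOver = failBySerialRange(queue, 0, queue.length, new Error('connection suspended'));
console.log(failed); // only the sent message was failed
console.log(leftOver.map((m) => m.name)); // the never-sent message is still waiting
```

In this toy model the caller of the never-sent message would wait indefinitely for its callback, which is the symptom this PR addresses.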

Summary by CodeRabbit

New Features
- Option to complete either a specific range of messages or all pending messages in the queue.
Bug Fixes
- Idle state now emitted only after processing and when the queue is empty.
- Callbacks invoked after messages are removed to ensure correct ordering.
- Improved completion/nack handling distinguishing queued vs sent messages.
Tests
- Expanded and renamed tests covering suspends/failures/closures for sent and queued messages.
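
The two ordering points under Bug Fixes can be sketched as follows. This is an illustrative toy, assuming a queue that removes messages first, then invokes callbacks, and only then reports idle if it is still empty. `TinyQueue` and its methods are hypothetical names, not the library's API.

```typescript
type SketchCallback = (err: Error | null) => void;

// Hypothetical minimal queue used only to demonstrate ordering.
class TinyQueue {
  private messages: { callback: SketchCallback | undefined }[] = [];
  private idleListeners: (() => void)[] = [];

  push(callback?: SketchCallback): void {
    this.messages.push({ callback });
  }

  count(): number {
    return this.messages.length;
  }

  onIdle(listener: () => void): void {
    this.idleListeners.push(listener);
  }

  completeAll(err: Error | null): void {
    // 1. Detach the messages first, so callbacks observe an already-updated queue.
    const completed = this.messages.splice(0, this.messages.length);
    // 2. Then notify each caller.
    for (const m of completed) {
      m.callback?.(err);
    }
    // 3. Only now, and only if no callback re-queued anything, report idle.
    if (this.messages.length === 0) {
      for (const listener of this.idleListeners) listener();
    }
  }
}

const events: string[] = [];
const q = new TinyQueue();
q.onIdle(() => events.push('idle'));
q.push(() => events.push(`callback sees count=${q.count()}`));
q.completeAll(new Error('closed'));
console.log(events);
```

With this ordering a callback that inspects the queue, or publishes again, sees consistent state, and an idle listener is not told the queue is empty while a callback is about to refill it.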

coderabbitai · 2025-12-05T13:31:44Z

Walkthrough

Refactors MessageQueue.completeMessages to accept a selector ('all' | {serial,count}); updates Protocol to pass the new selector object; ensures idle emission occurs after processing; makes completeAllMessages delegate to completeMessages('all', err); expands tests to cover queued-message NACK scenarios and exposes a private API identifier.

Changes

Cohort / File(s)	Summary
MessageQueue API Refactoring `src/common/lib/transport/messagequeue.ts`	`completeMessages` signature changed to `selector: 'all'
Protocol Integration `src/common/lib/transport/protocol.ts`	Call sites updated to pass `{ serial, count }` to `messageQueue.completeMessages(...)`; NACKs pass the error as the second argument.
Test Infrastructure `test/common/modules/private_api_recorder.js`	Added `read.connectionManager.queuedMessages` to recognized private API identifiers.
Failure Test Coverage `test/realtime/failure.test.js`	Renamed `nack_on_connection_failure` → `nack_of_sent_message_on_connection_failure`; added `nack_of_queued_message_on_connection_failure` helper and three queued-message tests for suspended/failed/closed states; updated channel IDs and instrumentation to detect queued vs sent message flows.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through queues both near and far,
Found a selector to clear each jar.
When connections hiccup and messages wait,
I call the callbacks — no lingering fate.
Thump-thump, the queue is tidy; let's celebrate! 🎉

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: fixing the handling of non-sent queued messages when the connection enters certain states.
Linked Issues check	✅ Passed	The PR directly addresses issue #2115 by refactoring completeMessages to accept a selector parameter that handles both 'all' messages and specific serial/count ranges, enabling proper failure of non-sent queued messages.
Out of Scope Changes check	✅ Passed	All changes are directly related to fixing queued message failure: MessageQueue refactoring, Protocol call site updates, test helper additions, and test coverage expansion for queued message scenarios.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

test/realtime/failure.test.js (1)

624-636: Consider adding a timeout to prevent test hangs.

The while(true) polling loop has no escape hatch if the message is never queued. While unlikely in this controlled test, adding a timeout would make failures more diagnosable.

               (async () => {
                 helper.recordPrivateApi('call.Platform.nextTick');
                 helper.recordPrivateApi('read.connectionManager.queuedMessages');
+                const maxAttempts = 100;
+                let attempts = 0;
                 while (true) {
                   await new Promise((res) => Ably.Realtime.Platform.Config.nextTick(res));
                   if (realtime.connection.connectionManager.queuedMessages.count() > 0) {
                     failureFn(realtime, helper.withParameterisedTestTitle(null));
                     break;
                   }
+                  if (++attempts >= maxAttempts) {
+                    throw new Error('Timed out waiting for message to be queued');
+                  }
                 }
               })();

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between ef368ba and a0be5cd.

📒 Files selected for processing (4)

src/common/lib/transport/messagequeue.ts (1 hunks)
src/common/lib/transport/protocol.ts (2 hunks)
test/common/modules/private_api_recorder.js (1 hunks)
test/realtime/failure.test.js (4 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/common/lib/transport/messagequeue.ts (1)

src/common/lib/transport/protocol.ts (1)

PendingMessage (11-28)

🔇 Additional comments (6)

test/common/modules/private_api_recorder.js (1)

102-102: LGTM!

The new private API identifier is correctly added in alphabetical order and aligns with its usage in the new queued message failure tests.

src/common/lib/transport/protocol.ts (1)

46-62: LGTM!

The call sites are correctly updated to use the new object-based predicate format { serial, count } for the completeMessages API.

src/common/lib/transport/messagequeue.ts (2)

42-75: LGTM - Clean solution to the queued message failure issue.

The predicate-based approach elegantly handles both cases:

'all': Clears all messages regardless of msgSerial assignment (fixing the original bug)

{ serial, count }: Maintains existing ACK/NACK behavior for sent messages
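
The two cases above can be sketched as a single function over a union-typed selector. This is a simplified illustration under stated assumptions: `SketchMessage`, `SketchSelector` and `completeBySelector` are hypothetical names, the queue is a plain array, and the range arithmetic (complete everything from the head of the queue up to `serial + count`) is an assumption about how ACK/NACK ranges are applied, not a copy of the real implementation.

```typescript
interface SketchMessage {
  msgSerial?: number; // undefined until the message is first sent
  callback?: (err: Error | null) => void;
}

type SketchSelector = 'all' | { serial: number; count: number };

// Removes the selected messages from `messages`, then invokes their callbacks.
function completeBySelector(
  messages: SketchMessage[],
  selector: SketchSelector,
  err: Error | null = null,
): SketchMessage[] {
  let completed: SketchMessage[];
  if (selector === 'all') {
    // Does not consult msgSerial at all, so never-sent messages are included.
    completed = messages.splice(0, messages.length);
  } else {
    const first = messages[0];
    if (first === undefined || first.msgSerial === undefined) return [];
    const endSerial = selector.serial + selector.count;
    completed = endSerial > first.msgSerial ? messages.splice(0, endSerial - first.msgSerial) : [];
  }
  for (const m of completed) m.callback?.(err);
  return completed;
}

const results: string[] = [];
const pending: SketchMessage[] = [
  { msgSerial: 5, callback: (e) => results.push(`5:${e ? 'nack' : 'ack'}`) },
  { msgSerial: 6, callback: (e) => results.push(`6:${e ? 'nack' : 'ack'}`) },
  { callback: (e) => results.push(`unsent:${e ? 'nack' : 'ack'}`) },
];

completeBySelector(pending, { serial: 5, count: 1 }); // ACK for serial 5 only
completeBySelector(pending, 'all', new Error('connection failed')); // fail everything left
console.log(results);
```

Keeping both paths in one function means the removal and callback logic is shared, while only the selection step differs.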

84-86: LGTM!

Clean delegation to the unified completeMessages implementation.

test/realtime/failure.test.js (2)

486-540: LGTM - Clear rename and well-structured test helper.

The rename to nack_of_sent_message_on_connection_failure clearly distinguishes this scenario from the new queued message tests.

645-684: LGTM - Comprehensive test coverage for the fix.

The three new test cases properly cover SUSPENDED, FAILED, and CLOSED states for queued (not-yet-sent) messages, directly addressing the issue described in #2115.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

test/realtime/failure.test.js (2)

482-492: Clearer naming for sent-message helper; minor comment typo

Renaming to nack_of_sent_message_on_connection_failure and aligning the helper/channel IDs with that name makes the intent (message already sent on the transport) much clearer. There is a small typo in the nearby comment (onProtocolMesage → onProtocolMessage) that you may want to fix for readability.

583-643: Queued-message helper hits the right semantics; consider bounding the polling loop

The new nack_of_queued_message_on_connection_failure helper correctly:

Forces the connection to remain non-CONNECTED by dropping inbound CONNECTED messages on transport.pending, and

Waits until the publish is actually queued before triggering the injected failure, then asserts the resulting error and connection state.

One thing to watch: the while (true) loop that polls queuedMessages.count() via Platform.Config.nextTick has no upper bound. If for any reason the message never appears in queuedMessages (e.g. future refactor of the queuing path), this test will just spin until the global Mocha timeout, which makes failures harder to diagnose. You might want to:

Add a max elapsed time or iteration count and fail the test explicitly if the condition is never met, or

Gate the loop with a shared flag so you can stop polling once the publish promise has settled.

Also, the same minor comment typo (onProtocolMesage) appears here as in the sent-message helper.
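
A bounded version of the polling idea could look like the sketch below. `pollUntil`, `PollOptions` and `defaultTick` are hypothetical helpers, not part of the test suite; the default tick uses `setTimeout` as a stand-in for the `Platform.Config.nextTick` call used in the test.

```typescript
interface PollOptions {
  maxAttempts: number;
  tick?: () => Promise<void>; // how to yield between checks
}

// Stand-in for the platform's nextTick: yield to the event loop once.
const defaultTick = (): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, 0));

// Resolves with the number of attempts used, or rejects once maxAttempts is exhausted.
async function pollUntil(condition: () => boolean, options: PollOptions): Promise<number> {
  const tick = options.tick ?? defaultTick;
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    await tick();
    if (condition()) return attempt;
  }
  throw new Error(`Condition not met after ${options.maxAttempts} attempts`);
}

// Usage: the counter stands in for a check such as queuedMessages.count() > 0.
let checks = 0;
pollUntil(() => ++checks >= 3, { maxAttempts: 100 })
  .then((attempts) => console.log(`condition met after ${attempts} polls`))
  .catch((err) => console.error(err));
```

Because the helper returns a promise, a callback-style test can route a timeout into its completion callback with `.catch(...)` rather than leaving the rejection unhandled.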

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between a0be5cd and 1bfd80d.

📒 Files selected for processing (4)

src/common/lib/transport/messagequeue.ts (1 hunks)
src/common/lib/transport/protocol.ts (2 hunks)
test/common/modules/private_api_recorder.js (1 hunks)
test/realtime/failure.test.js (4 hunks)

🚧 Files skipped from review as they are similar to previous changes (3)

test/common/modules/private_api_recorder.js
src/common/lib/transport/messagequeue.ts
src/common/lib/transport/protocol.ts

🔇 Additional comments (2)

test/realtime/failure.test.js (2)

542-581: Sent-message RTN7c variants comprehensively cover SUSPENDED/FAILED/CLOSED paths

These three tests nicely exercise the “sent but unacked” publish behaviour across suspended (via becomeSuspended), failed (via injected ERROR protocol message with code 40100), and closed (via realtime.close()), and assert the expected error codes on the publish callback. The structure closely mirrors the helper’s contract and should guard the updated failure semantics well.

645-684: Queued-message RTN7c variants look consistent and target the right failure paths

The three queued-message tests (suspended/failed/closed) are well-aligned with the sent-message variants and RTN7c wording: they all drive the connection into the desired terminal state while the message is still queued, then assert both the connection state and the expected error codes (80002 for suspended, 40100 for failed with auth error, 80017 for closed). Using connectionManager.notifyState for suspended/failed keeps the tests tightly scoped to the behaviour under test.

SimonWoolf

lgtm

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

src/common/lib/transport/messagequeue.ts (1)

76-78: Potential runtime error if callback is undefined.

The PendingMessage.callback property is optional (callback?: ErrCallback). Casting to Function and invoking without a null check could throw a TypeError if a message was queued without a callback.

🔎 Proposed fix

     for (const message of completeMessages) {
-      (message.callback as Function)(err);
+      message.callback?.(err);
     }

🧹 Nitpick comments (2)

test/realtime/failure.test.js (1)

625-635: Unhandled promise rejection and potential infinite loop.

The async IIFE lacks error handling. If failureFn throws, the error won't propagate to the test framework. Additionally, if queuedMessages.count() never becomes > 0 (e.g., encoding fails silently), the while (true) loop will run indefinitely until the test times out, making debugging harder.

Consider adding error handling and a safety timeout:

🔎 Proposed improvement

               // We wait for the `publish()`-ed message to appear in the message queue before enacting the connection state change (queueing is preceded by asynchronous encoding). AFAIK there isn't an event-driven internal API for this so we'll just poll.
               (async () => {
                 helper.recordPrivateApi('call.Platform.nextTick');
                 helper.recordPrivateApi('read.connectionManager.queuedMessages');
-                while (true) {
+                const maxAttempts = 100; // Safety limit
+                let attempts = 0;
+                while (attempts++ < maxAttempts) {
                   await new Promise((res) => Ably.Realtime.Platform.Config.nextTick(res));
                   if (realtime.connection.connectionManager.queuedMessages.count() > 0) {
                     failureFn(realtime, helper.withParameterisedTestTitle(null));
-                    break;
+                    return;
                   }
                 }
-              })();
+                throw new Error('Message never appeared in queue');
+              })().catch((err) => cb(err));

src/common/lib/transport/messagequeue.ts (1)

56-58: Consider making the empty queue check selector-aware for defensive clarity.

Currently, completeMessages() throws on an empty queue regardless of selector type. While the 'all' case is guarded upstream by failQueuedMessages (line 1887: if (numQueued > 0)), making this explicit in the method improves code robustness against future changes.

The proposed fix safely handles the empty case for the 'all' selector by returning early, while preserving the error-catching behavior for { serial, count } selectors (which are called from Protocol.onAck/onNack without upstream guards):

🔎 Proposed fix

     const messages = this.messages;
-    if (messages.length === 0) {
+    if (messages.length === 0 && selector !== 'all') {
       throw new Error('MessageQueue.completeMessages(): completeMessages called on any empty MessageQueue');
     }
+    if (messages.length === 0) {
+      return;
+    }

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 1bfd80d and 542f724.

📒 Files selected for processing (4)

src/common/lib/transport/messagequeue.ts
src/common/lib/transport/protocol.ts
test/common/modules/private_api_recorder.js
test/realtime/failure.test.js

🚧 Files skipped from review as they are similar to previous changes (1)

test/common/modules/private_api_recorder.js

🧰 Additional context used

🧬 Code graph analysis (1)

src/common/lib/transport/messagequeue.ts (1)

src/common/lib/transport/protocol.ts (1)

PendingMessage (11-28)

🔇 Additional comments (5)

src/common/lib/transport/messagequeue.ts (2)

65-73: Verify assumption that msgSerial is always defined for ACK/NACK paths.

The type assertion as number on line 67 assumes first.message.msgSerial is defined. Per the JSDoc, the {serial, count} selector is used for ACK/NACK from Ably and assumes all messages have msgSerial assigned. This should hold if messages only receive ACK/NACK after being sent (when msgSerial is assigned), but consider adding a defensive check or assertion for clarity.
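
One way to express such a defensive check is a small helper that narrows the type instead of asserting it. This is a sketch only; `requireMsgSerial` is a hypothetical name and the message shape is reduced to the single field of interest.

```typescript
// Returns the serial if present; otherwise throws with a message naming the call site.
function requireMsgSerial(message: { msgSerial?: number }, context: string): number {
  if (typeof message.msgSerial !== 'number') {
    throw new Error(`${context}: expected msgSerial to be set on a sent message`);
  }
  return message.msgSerial;
}

// Usage: replaces a bare `as number` cast at the point where the range start is computed.
const startSerial = requireMsgSerial({ msgSerial: 7 }, 'completeMessages');
console.log(startSerial);
```

Compared with a cast, a violated assumption then surfaces as an explicit error rather than as `NaN` arithmetic further down.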

42-47: LGTM on the API design.

The selector-based API cleanly separates the "fail all queued messages" case from the "complete a specific range based on ACK/NACK" case. The completeAllMessages delegation to completeMessages('all', err) is a good refactor that eliminates code duplication.

Also applies to: 83-85

src/common/lib/transport/protocol.ts (1)

46-49: LGTM!

The call sites correctly updated to use the new object-based selector { serial, count } matching the updated MessageQueue.completeMessages signature.

Also applies to: 51-62

test/realtime/failure.test.js (2)

486-491: LGTM on test helper structure.

The separation of nack_of_sent_message_on_connection_failure and nack_of_queued_message_on_connection_failure clearly distinguishes the two scenarios being tested (messages sent on transport vs. still queued). The test coverage for SUSPENDED, FAILED, and CLOSED states is comprehensive and aligns with the PR objectives.

Also applies to: 588-593

575-581: LGTM on the simplified close test case.

Good cleanup removing the unnecessary helper parameter since it's not used in the close scenario.

github-actions bot temporarily deployed to staging/pull/2123/bundle-report December 5, 2025 13:32 Inactive

github-actions bot temporarily deployed to staging/pull/2123/features December 5, 2025 13:32 Inactive

github-actions bot temporarily deployed to staging/pull/2123/typedoc December 5, 2025 13:32 Inactive

coderabbitai bot reviewed Dec 5, 2025

lawrence-forooghian force-pushed the 2115-fail-pending-messages-on-connection-state branch 2 times, most recently from 9740968 to 1bfd80d Compare December 5, 2025 13:39

github-actions bot temporarily deployed to staging/pull/2123/bundle-report December 5, 2025 13:40 Inactive

github-actions bot temporarily deployed to staging/pull/2123/features December 5, 2025 13:40 Inactive

github-actions bot temporarily deployed to staging/pull/2123/typedoc December 5, 2025 13:40 Inactive

coderabbitai bot reviewed Dec 5, 2025

lawrence-forooghian requested review from SimonWoolf, VeskeR and ttypic December 5, 2025 14:06

lawrence-forooghian mentioned this pull request Dec 5, 2025

MessageQueue behaviour on connection suspended state #2115

Closed

SimonWoolf approved these changes Dec 16, 2025

lawrence-forooghian force-pushed the 2115-fail-pending-messages-on-connection-state branch from 1bfd80d to 542f724 Compare January 5, 2026 10:01

github-actions bot deployed to staging/pull/2123/bundle-report January 5, 2026 10:02 View deployment

github-actions bot deployed to staging/pull/2123/features January 5, 2026 10:02 View deployment

github-actions bot deployed to staging/pull/2123/typedoc January 5, 2026 10:02 View deployment

coderabbitai bot reviewed Jan 5, 2026

lawrence-forooghian merged commit 2ee11a5 into main Jan 6, 2026
11 of 17 checks passed

lawrence-forooghian deleted the 2115-fail-pending-messages-on-connection-state branch January 6, 2026 09:04
