Skip to content

Conversation

@lawrence-forooghian
Copy link
Collaborator

@lawrence-forooghian lawrence-forooghian commented Dec 5, 2025

Non-sent queued messages were not being failed when the connection entered the SUSPENDED / FAILED / CLOSED states.

The existing logic relied on the queued messages having a msgSerial, but they don't get a msgSerial until the first time we send the message on the transport.

Resolves #2115.

Summary by CodeRabbit

  • New Features

    • Option to complete either a specific range of messages or all pending messages in the queue.
  • Bug Fixes

    • Idle state now emitted only after processing and when the queue is empty.
    • Callbacks invoked after messages are removed to ensure correct ordering.
    • Improved completion/nack handling distinguishing queued vs sent messages.
  • Tests

    • Expanded and renamed tests covering suspends/failures/closures for sent and queued messages.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 5, 2025

Walkthrough

Refactors MessageQueue.completeMessages to accept a selector ('all' | {serial,count}); updates Protocol to pass the new selector object; ensures idle emission occurs after processing; makes completeAllMessages delegate to completeMessages('all', err); expands tests to cover queued-message NACK scenarios and exposes a private API identifier.

Changes

Cohort / File(s) Summary
MessageQueue API Refactoring
src/common/lib/transport/messagequeue.ts
completeMessages signature changed to `selector: 'all'
Protocol Integration
src/common/lib/transport/protocol.ts
Call sites updated to pass { serial, count } to messageQueue.completeMessages(...); NACKs pass the error as the second argument.
Test Infrastructure
test/common/modules/private_api_recorder.js
Added read.connectionManager.queuedMessages to recognized private API identifiers.
Failure Test Coverage
test/realtime/failure.test.js
Renamed nack_on_connection_failurenack_of_sent_message_on_connection_failure; added nack_of_queued_message_on_connection_failure helper and three queued-message tests for suspended/failed/closed states; updated channel IDs and instrumentation to detect queued vs sent message flows.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through queues both near and far,
Found a selector to clear each jar.
When connections hiccup and messages wait,
I call the callbacks — no lingering fate.
Thump-thump, the queue is tidy; let's celebrate! 🎉

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: fixing the handling of non-sent queued messages when the connection enters certain states.
Linked Issues check ✅ Passed The PR directly addresses issue #2115 by refactoring completeMessages to accept a selector parameter that handles both 'all' messages and specific serial/count ranges, enabling proper failure of non-sent queued messages.
Out of Scope Changes check ✅ Passed All changes are directly related to fixing queued message failure: MessageQueue refactoring, Protocol call site updates, test helper additions, and test coverage expansion for queued message scenarios.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch 2115-fail-pending-messages-on-connection-state

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot temporarily deployed to staging/pull/2123/bundle-report December 5, 2025 13:32 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2123/features December 5, 2025 13:32 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2123/typedoc December 5, 2025 13:32 Inactive
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/realtime/failure.test.js (1)

624-636: Consider adding a timeout to prevent test hangs.

The while(true) polling loop has no escape hatch if the message is never queued. While unlikely in this controlled test, adding a timeout would make failures more diagnosable.

               (async () => {
                 helper.recordPrivateApi('call.Platform.nextTick');
                 helper.recordPrivateApi('read.connectionManager.queuedMessages');
+                const maxAttempts = 100;
+                let attempts = 0;
                 while (true) {
                   await new Promise((res) => Ably.Realtime.Platform.Config.nextTick(res));
                   if (realtime.connection.connectionManager.queuedMessages.count() > 0) {
                     failureFn(realtime, helper.withParameterisedTestTitle(null));
                     break;
                   }
+                  if (++attempts >= maxAttempts) {
+                    throw new Error('Timed out waiting for message to be queued');
+                  }
                 }
               })();
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ef368ba and a0be5cd.

📒 Files selected for processing (4)
  • src/common/lib/transport/messagequeue.ts (1 hunks)
  • src/common/lib/transport/protocol.ts (2 hunks)
  • test/common/modules/private_api_recorder.js (1 hunks)
  • test/realtime/failure.test.js (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/common/lib/transport/messagequeue.ts (1)
src/common/lib/transport/protocol.ts (1)
  • PendingMessage (11-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: test-browser (webkit)
  • GitHub Check: test-browser (chromium)
  • GitHub Check: test-browser (firefox)
  • GitHub Check: test-node (20.x)
  • GitHub Check: test-node (16.x)
  • GitHub Check: test-node (18.x)
🔇 Additional comments (6)
test/common/modules/private_api_recorder.js (1)

102-102: LGTM!

The new private API identifier is correctly added in alphabetical order and aligns with its usage in the new queued message failure tests.

src/common/lib/transport/protocol.ts (1)

46-62: LGTM!

The call sites are correctly updated to use the new object-based predicate format { serial, count } for the completeMessages API.

src/common/lib/transport/messagequeue.ts (2)

42-75: LGTM - Clean solution to the queued message failure issue.

The predicate-based approach elegantly handles both cases:

  • 'all': Clears all messages regardless of msgSerial assignment (fixing the original bug)
  • { serial, count }: Maintains existing ACK/NACK behavior for sent messages

84-86: LGTM!

Clean delegation to the unified completeMessages implementation.

test/realtime/failure.test.js (2)

486-540: LGTM - Clear rename and well-structured test helper.

The rename to nack_of_sent_message_on_connection_failure clearly distinguishes this scenario from the new queued message tests.


645-684: LGTM - Comprehensive test coverage for the fix.

The three new test cases properly cover SUSPENDED, FAILED, and CLOSED states for queued (not-yet-sent) messages, directly addressing the issue described in #2115.

@lawrence-forooghian lawrence-forooghian force-pushed the 2115-fail-pending-messages-on-connection-state branch 2 times, most recently from 9740968 to 1bfd80d Compare December 5, 2025 13:39
@github-actions github-actions bot temporarily deployed to staging/pull/2123/bundle-report December 5, 2025 13:40 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2123/features December 5, 2025 13:40 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2123/typedoc December 5, 2025 13:40 Inactive
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
test/realtime/failure.test.js (2)

482-492: Clearer naming for sent-message helper; minor comment typo

Renaming to nack_of_sent_message_on_connection_failure and aligning the helper/channel IDs with that name makes the intent (message already sent on the transport) much clearer. There is a small typo in the nearby comment (onProtocolMesageonProtocolMessage) that you may want to fix for readability.


583-643: Queued-message helper hits the right semantics; consider bounding the polling loop

The new nack_of_queued_message_on_connection_failure helper correctly:

  • Forces the connection to remain non-CONNECTED by dropping inbound CONNECTED messages on transport.pending, and
  • Waits until the publish is actually queued before triggering the injected failure, then asserts the resulting error and connection state.

One thing to watch: the while (true) loop that polls queuedMessages.count() via Platform.Config.nextTick has no upper bound. If for any reason the message never appears in queuedMessages (e.g. future refactor of the queuing path), this test will just spin until the global Mocha timeout, which makes failures harder to diagnose. You might want to:

  • Add a max elapsed time or iteration count and fail the test explicitly if the condition is never met, or
  • Gate the loop with a shared flag so you can stop polling once the publish promise has settled.

Also, the same minor comment typo (onProtocolMesage) appears here as in the sent-message helper.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a0be5cd and 1bfd80d.

📒 Files selected for processing (4)
  • src/common/lib/transport/messagequeue.ts (1 hunks)
  • src/common/lib/transport/protocol.ts (2 hunks)
  • test/common/modules/private_api_recorder.js (1 hunks)
  • test/realtime/failure.test.js (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • test/common/modules/private_api_recorder.js
  • src/common/lib/transport/messagequeue.ts
  • src/common/lib/transport/protocol.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: test-browser (firefox)
  • GitHub Check: test-browser (webkit)
  • GitHub Check: test-browser (chromium)
  • GitHub Check: test-node (16.x)
  • GitHub Check: test-node (18.x)
  • GitHub Check: test-node (20.x)
🔇 Additional comments (2)
test/realtime/failure.test.js (2)

542-581: Sent-message RTN7c variants comprehensively cover SUSPENDED/FAILED/CLOSED paths

These three tests nicely exercise the “sent but unacked” publish behaviour across suspended (via becomeSuspended), failed (via injected ERROR protocol message with code 40100), and closed (via realtime.close()), and assert the expected error codes on the publish callback. The structure closely mirrors the helper’s contract and should guard the updated failure semantics well.


645-684: Queued-message RTN7c variants look consistent and target the right failure paths

The three queued-message tests (suspended/failed/closed) are well-aligned with the sent-message variants and RTN7c wording: they all drive the connection into the desired terminal state while the message is still queued, then assert both the connection state and the expected error codes (80002 for suspended, 40100 for failed with auth error, 80017 for closed). Using connectionManager.notifyState for suspended/failed keeps the tests tightly scoped to the behaviour under test.

Copy link
Member

@SimonWoolf SimonWoolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Non-sent queued messages were not being failed when the connection
entered the SUSPENDED / FAILED / CLOSED states.

The existing logic relied on the queued messages having a msgSerial, but
they don't get a msgSerial until the first time we send the message on
the transport.

Resolves #2115.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/common/lib/transport/messagequeue.ts (1)

76-78: Potential runtime error if callback is undefined.

The PendingMessage.callback property is optional (callback?: ErrCallback). Casting to Function and invoking without a null check could throw a TypeError if a message was queued without a callback.

🔎 Proposed fix
     for (const message of completeMessages) {
-      (message.callback as Function)(err);
+      message.callback?.(err);
     }
🧹 Nitpick comments (2)
test/realtime/failure.test.js (1)

625-635: Unhandled promise rejection and potential infinite loop.

The async IIFE lacks error handling. If failureFn throws, the error won't propagate to the test framework. Additionally, if queuedMessages.count() never becomes > 0 (e.g., encoding fails silently), the while (true) loop will run indefinitely until the test times out, making debugging harder.

Consider adding error handling and a safety timeout:

🔎 Proposed improvement
               // We wait for the `publish()`-ed message to appear in the message queue before enacting the connection state change (queueing is preceded by asynchronous encoding). AFAIK there isn't an event-driven internal API for this so we'll just poll.
               (async () => {
                 helper.recordPrivateApi('call.Platform.nextTick');
                 helper.recordPrivateApi('read.connectionManager.queuedMessages');
-                while (true) {
+                const maxAttempts = 100; // Safety limit
+                let attempts = 0;
+                while (attempts++ < maxAttempts) {
                   await new Promise((res) => Ably.Realtime.Platform.Config.nextTick(res));
                   if (realtime.connection.connectionManager.queuedMessages.count() > 0) {
                     failureFn(realtime, helper.withParameterisedTestTitle(null));
-                    break;
+                    return;
                   }
                 }
-              })();
+                throw new Error('Message never appeared in queue');
+              })().catch((err) => cb(err));
src/common/lib/transport/messagequeue.ts (1)

56-58: Consider making the empty queue check selector-aware for defensive clarity.

Currently, completeMessages() throws on an empty queue regardless of selector type. While the 'all' case is guarded upstream by failQueuedMessages (line 1887: if (numQueued > 0)), making this explicit in the method improves code robustness against future changes.

The proposed fix safely handles the empty case for the 'all' selector by returning early, while preserving the error-catching behavior for { serial, count } selectors (which are called from Protocol.onAck/onNack without upstream guards):

🔎 Proposed fix
     const messages = this.messages;
-    if (messages.length === 0) {
+    if (messages.length === 0 && selector !== 'all') {
       throw new Error('MessageQueue.completeMessages(): completeMessages called on any empty MessageQueue');
     }
+    if (messages.length === 0) {
+      return;
+    }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1bfd80d and 542f724.

📒 Files selected for processing (4)
  • src/common/lib/transport/messagequeue.ts
  • src/common/lib/transport/protocol.ts
  • test/common/modules/private_api_recorder.js
  • test/realtime/failure.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/common/modules/private_api_recorder.js
🧰 Additional context used
🧬 Code graph analysis (1)
src/common/lib/transport/messagequeue.ts (1)
src/common/lib/transport/protocol.ts (1)
  • PendingMessage (11-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: test-browser (webkit)
  • GitHub Check: test-browser (firefox)
  • GitHub Check: test-browser (chromium)
  • GitHub Check: test-node (20.x)
  • GitHub Check: test-node (16.x)
  • GitHub Check: test-node (18.x)
  • GitHub Check: test-npm-package
🔇 Additional comments (5)
src/common/lib/transport/messagequeue.ts (2)

65-73: Verify assumption that msgSerial is always defined for ACK/NACK paths.

The type assertion as number on line 67 assumes first.message.msgSerial is defined. Per the JSDoc, the {serial, count} selector is used for ACK/NACK from Ably and assumes all messages have msgSerial assigned. This should hold if messages only receive ACK/NACK after being sent (when msgSerial is assigned), but consider adding a defensive check or assertion for clarity.


42-47: LGTM on the API design.

The selector-based API cleanly separates the "fail all queued messages" case from the "complete a specific range based on ACK/NACK" case. The completeAllMessages delegation to completeMessages('all', err) is a good refactor that eliminates code duplication.

Also applies to: 83-85

src/common/lib/transport/protocol.ts (1)

46-49: LGTM!

The call sites correctly updated to use the new object-based selector { serial, count } matching the updated MessageQueue.completeMessages signature.

Also applies to: 51-62

test/realtime/failure.test.js (2)

486-491: LGTM on test helper structure.

The separation of nack_of_sent_message_on_connection_failure and nack_of_queued_message_on_connection_failure clearly distinguishes the two scenarios being tested (messages sent on transport vs. still queued). The test coverage for SUSPENDED, FAILED, and CLOSED states is comprehensive and aligns with the PR objectives.

Also applies to: 588-593


575-581: LGTM on the simplified close test case.

Good cleanup removing the unnecessary helper parameter since it's not used in the close scenario.

@lawrence-forooghian lawrence-forooghian merged commit 2ee11a5 into main Jan 6, 2026
11 of 17 checks passed
@lawrence-forooghian lawrence-forooghian deleted the 2115-fail-pending-messages-on-connection-state branch January 6, 2026 09:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

MessageQueue behaviour on connection suspended state

3 participants