Conversation

@sam-golioth
Collaborator

This changes the Pouch GATT transport to use application-level sliding window acknowledgments on top of unreliable Bluetooth GATT transfers. Doing so eliminates the need to wait for a round-trip acknowledgment on each packet, and results in significant speedups, as much as 10x in testing. The window size is configurable, and severely constrained devices can lower the window all the way back down to 1, in which case the transfer speed is essentially equivalent to the previous behavior using reliable GATT writes.


@trond-snekvik left a comment


There are slightly too many corner cases in the sliding window mechanism for me to confidently say whether everything works as it should, but overall, this looks good to me.

Comment on lines +39 to +41
const uint32_t modulus = 1 << (CHAR_BIT * sizeof(uint8_t));
return (modulus + atomic_get(&sender->last_sent) - atomic_get(&sender->last_acknowledged))
% modulus;

The compiler is probably able to optimize this into a bitmask, but even so, this feels like it might be a little easier to understand, and still works across the wraparound:

Suggested change
const uint32_t modulus = 1 << (CHAR_BIT * sizeof(uint8_t));
return (modulus + atomic_get(&sender->last_sent) - atomic_get(&sender->last_acknowledged))
% modulus;
return (atomic_get(&sender->last_sent) - atomic_get(&sender->last_acknowledged)) & UINT8_MAX;

}

enum pouch_gatt_ack_code code;
if (pouch_gatt_packetizer_is_fin(data, length, &code))

Should this set receiver->complete = true? And potentially call receiver->push with empty data to trigger functionality like downlink_finish()? Presumably, if we get FIN, we could still be in an open context.

@sam-golioth (Author)

Yeah, I think that's right. The FIN handling in general is a little incomplete/inconsistent, so I'll clean that up. The reason I put this here originally was mostly for debugging: at first I was handling the FIN message in the caller, so that the receiver could respond to a FIN received while it was idle. But I think it's better to handle FINs received while the receiver is active within the receiver itself, as you suggest.

So the caller only handles received messages directly when the receiver is idle (NACKs in response to packets, nothing in response to FINs). We don't respond to FINs, because the sender will send a FIN if it receives an ACK while idle, so we need to be careful not to get into a loop of the sender and receiver acknowledging each other.

Anyway, this comment is only half responding to you and half me getting my thoughts out for myself 😅. But yes, I think when the receiver is active, we should handle the FIN here, and I'll update the PR to reflect that. While thinking about this, I made a bunch of sequence diagrams to work through the various scenarios, and those will make good figures to include in the design doc.

@sam-golioth (Author)

@trond-snekvik The handling of FINs (and NACKs) should be more robust and consistent now. I'd appreciate another look.

@sam-golioth sam-golioth force-pushed the application_level_acks branch from 634bcd7 to b207674 Compare December 9, 2025 20:43
@sam-golioth sam-golioth force-pushed the application_level_acks branch from b207674 to 36b7b89 Compare December 18, 2025 22:41
@sam-golioth sam-golioth force-pushed the application_level_acks branch from 36b7b89 to 4d796f8 Compare January 8, 2026 22:05
Signed-off-by: Sam Friedman <sam@golioth.io>
@sam-golioth sam-golioth force-pushed the application_level_acks branch from 4d796f8 to 8655791 Compare January 14, 2026 21:04

@trond-snekvik left a comment


@sam-golioth I realize I didn't get a review request here, so I appreciate that some pieces are still moving, but since you suggested I take another look at the deadlock scenarios, I went ahead anyway.

I made a sequence diagram for myself to understand the logic here, and I have attached it with some annotations to illustrate the three scenarios that concern me:

  1. The sender's window size is 0 until it receives an ack, so it can't start sending
  2. No one initiates sending, and there's a potential deadlock in the sender's send_data function that will have to be resolved by whoever initiates (and resumes) the data sending
  3. No one resumes sending after an empty payload packet
[Annotated sequence diagram illustrating the three scenarios above]

It's actually not entirely clear to me what the function of the dynamic window size is for pouch. The receivers don't allocate any resources based on window size, and I don't think they'd need to anyway, as the packetizer just pushes data until it's done.

The real need for a dynamic window size is on the sender side, where the senders could/should keep buffers allocated until they've been acked or nacked, so we can retry sending them. At the moment though, the sender just bails whenever it receives a NACK, so it doesn't need it either at the moment.

From what I can deduce, the acks just serve two purposes at the moment:

  • If something goes wrong in the receiver, it can tell the sender to stop (by NACK) without breaking the connection. The sender currently breaks the connection whenever this happens though, and the receiver will NACK an out-of-order packet (which would include potential retries).
  • At the end of the transfer, the sender will wait for the receiver to ack the messages before it tells pouch that the transfer is done. This is good and necessary, but it doesn't require a sliding window. Could this be the only mechanism we need for now?


send_ack(receiver, POUCH_GATT_ACK);

reset_ack_timer(receiver);

Redundant, technically.

receiver->last_acknowledged = receiver->last_received;
}

reset_ack_timer(receiver);

Should this really be reset if the ack send fails? If that only happens on success, the reset_ack_timer call in the ack_timer_handler below could serve a purpose.

ctx->indicate_params.len = 0;

ctx->state = UPLINK_INDICATE_IN_PROGRESS;
ctx->sender = pouch_gatt_sender_create(ctx->packetizer, send_uplink_data, conn, mtu);

Should also have a NULL check, I think

return err;
}

int pouch_gatt_sender_data_available(struct pouch_gatt_sender *sender)

This function is never called, but the caller will be responsible for getting us out of the deadlock discussed in send_data



finish:
err = send_ack(receiver, err ? POUCH_GATT_NACK_UNKNOWN : POUCH_GATT_ACK);

This doesn't seem right to me. We'll end up here every time we receive data, but we don't actually want to send an ack on every packet -- that would defeat the whole purpose of the sliding window, right?

If the receiver push above is successful, we should just return, as the ack has been scheduled as a timer.

goto finish;
}

atomic_set(&sender->last_sent, pouch_gatt_packetizer_get_sequence(sender->buffer, len));

This should be reset if the send fails

packets_sent++;
}

finish:

No need for gotos here, better to consistently break from the loop

goto finish;
}

if (POUCH_GATT_PACKETIZER_EMPTY_PAYLOAD == ret)

I know we're handling this as a special case in the ack-triggered sending, but I wonder if we might just want to return success here? We're going to hit this every time we have less than a window's worth of data to send, so it shouldn't really be an error IMO.

There could be an argument for returning ENODATA if packets_sent is 0, but even then, that's not really an issue, I think.

{
uint8_t ack[POUCH_GATT_ACK_SIZE];
pouch_gatt_ack_encode(ack, sizeof(ack), code, receiver->last_received, receiver->window);
int err = receiver->send_ack(receiver->send_ack_arg, ack, sizeof(ack));

Should use the size returned from pouch_gatt_ack_encode()


atomic_set(&sender->last_sent, UINT8_MAX);
atomic_set(&sender->last_acknowledged, UINT8_MAX);
atomic_set(&sender->offered_window, 0);

Nothing gets sent until we have a non-zero window size, but the window size doesn't get set until we get an ack, which we won't get until we send something 🤔
