-
Notifications
You must be signed in to change notification settings - Fork 128
Description
I've noticed a possible bug where having an ObservableResource that periodically calls self.updated_state with a new message, i.e.
self.updated_state(aiocoap.Message(code=Code.VALID))... can cause problems if there are multiple observers for the resource.
(Note: in my case the two observers are the same client, which is a bit odd and I plan to fix that on the client side - but I think a version of the bug exists even when the observers are different clients, and it is also possible for a lossy network to result in a redundant observer living longer even if the client tries to cancel it.)
The trivial effect is seeing the "Message ID set on to-be-sent message" in the log each time updated_state is called:
Mon Dec 8 08:46:45 2025 DEBUG Sending message <aiocoap.Message at 0x43aca890: CON 2.03 Valid (MID
38964, token 500c) remote <UDP6EndpointAddress [fdb8:60ca:a3f7:1:a6d4:41b8:946e:7cb8]:8163 (locally fdb8:60ca:a3f7:1:d85b:80bf:b1d9:
1083%wpan0)>, 1 option(s)>
Mon Dec 8 08:46:45 2025 DEBUG Exchange added, message ID: 38964.
Mon Dec 8 08:46:45 2025 WARNING Message ID set on to-be-sent message, this is probably unintended; clearing it.
However the bigger problem happens later if one of the clients goes away and an outgoing observation message times out:
Tue Dec 9 17:48:50 2025 DEBUG Sending message <aiocoap.Message at 0x37c93610: CON 2.03 Valid (MID 23763, token 82f4) remote <UDP6EndpointAddress [fd80:9d9c:eb7c:1:fd6d:5321:80b7:b618]:28164 (locally fd80:9d9c:eb7c:1:b1aa:ef14:6599:8971%wpan0)>, 1 option(s)>
Tue Dec 9 17:48:50 2025 DEBUG Exchange added, message ID: 23763.
Tue Dec 9 17:48:50 2025 WARNING Message ID set on to-be-sent message, this is probably unintended; clearing it.
Tue Dec 9 17:48:50 2025 DEBUG Message to <UDP6EndpointAddress [fd80:9d9c:eb7c:1:fd6d:5321:80b7:b618]:28164 (locally fd80:9d9c:eb7c:1:b1aa:ef14:6599:8971%wpan0)> put into backlog
Tue Dec 9 17:48:52 2025 ERROR Exception in callback MessageManager._schedule_retransmit.<locals>.retr() at /usr/lib/python3.9/site-packages/aiocoap/messagemanager.py:260
Tue Dec 9 17:48:52 2025 handle: <TimerHandle when=13051.910706122799 MessageManager._schedule_retransmit.<locals>.retr() at /usr/lib/python3.9/site-packages/aiocoap/messagemanager.py:260 created at /usr/lib/python3.9/site-packages/aiocoap/messagemanager.py:267>
Tue Dec 9 17:48:52 2025 source_traceback: Object created at (most recent call last):
Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/pipe.py", line 172, in add_response
Tue Dec 9 17:48:52 2025 self._add_event(self.Event(response, None, is_last))
Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/pipe.py", line 160, in _add_event
Tue Dec 9 17:48:52 2025 keep_calling = cb(event)
Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/tokenmanager.py", line 132, in on_event
Tue Dec 9 17:48:52 2025 self.token_interface.send_message(
Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 419, in send_message
Tue Dec 9 17:48:52 2025 self._send_initially(message, messageerror_monitor)
Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 428, in _send_initially
Tue Dec 9 17:48:52 2025 self._add_exchange(message, messageerror_monitor)
Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 206, in _add_exchange
Tue Dec 9 17:48:52 2025 next_retransmission = self._schedule_retransmit(message, timeout, 0)
Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 267, in _schedule_retransmit
Tue Dec 9 17:48:52 2025 return self.loop.call_later(timeout, retr)
Tue Dec 9 17:48:52 2025 Traceback (most recent call last):
Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/asyncio/events.py", line 80, in _run
Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 266, in retr
Tue Dec 9 17:48:52 2025 self._retransmit(message, timeout, retransmission_counter)
Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 273, in _retransmit
Tue Dec 9 17:48:52 2025 messageerror_monitor, next_retransmission = self._active_exchanges.pop(key)
Tue Dec 9 17:48:52 2025 KeyError: (<UDP6EndpointAddress [fd80:9d9c:eb7c:1:fd6d:5321:80b7:b618]:28164 (locally fd80:9d9c:eb7c:1:b1aa:ef14:6599:8971%wpan0)>, 23764)
... I think the root cause here is that the same message object is being sent multiple times with different IDs - each time the MessageManager.send_message() clears the message ID (resulting in the log line) and then sets a new message id on the same object.
Then if a retransmit happens for the first recipient, the message id has been rewritten for a later recipient so the key for self._active_exchanges is wrong and the KeyError shown above triggers. (Possibly with different clients it's not a KeyError but still the retransmit will be incorrect as the remote will have also changed on the message.)
I've observed the cascade impact of the KeyError is that entire remote is stalled and the Observer keeps adding messages to the backlog which are never sent, so it becomes a zombie Observe client...
I think the fix may be simple, will submit a PR shortly.
Install info
# python3 -m aiocoap.cli.defaults
Python version: 3.9.15 (main, Oct 23 2022, 20:48:04)
[GCC 8.4.0]
aiocoap version: 0.4.7
Modules missing for subsystems:
dtls: missing DTLSSocket
oscore: missing cryptography, ge25519
linkheader: everything there
prettyprint: missing termcolor, pygments
Python platform: linux
Default server transports: tcpserver:tcpclient:tlsserver:tlsclient:udp6
Selected server transports: tcpserver:tcpclient:tlsserver:tlsclient:udp6
Default client transports: tcpclient:tlsclient:udp6
Selected client transports: tcpclient:tlsclient:udp6
SO_REUSEPORT available (default, selected): True, True
(OpenWRT router, running old Python unfortunately!)