Skip to content

ObservableResource.updated_state(message) can cause messagemanager exception #401

@projectgus

Description

@projectgus

I've noticed a possible bug where having an ObservableResource that periodically calls self.updated_state with a new message, i.e.

        self.updated_state(aiocoap.Message(code=Code.VALID))

... can cause problems if there are multiple observers for the resource.

(Note: in my case the two observers are the same client, which is a bit odd and I plan to fix that on the client side - but I think a version of the bug exists even when the observers are different clients, and it is also possible for a lossy network to result in a redundant observer living longer even if the client tries to cancel it.)

The trivial effect is seeing the "Message ID set on to-be-sent message" in the log each time updated_state is called:

Mon Dec  8 08:46:45 2025 DEBUG Sending message <aiocoap.Message at 0x43aca890: CON 2.03 Valid (MID
 38964, token 500c) remote <UDP6EndpointAddress [fdb8:60ca:a3f7:1:a6d4:41b8:946e:7cb8]:8163 (locally fdb8:60ca:a3f7:1:d85b:80bf:b1d9:
1083%wpan0)>, 1 option(s)>
Mon Dec  8 08:46:45 2025 DEBUG Exchange added, message ID: 38964.
Mon Dec  8 08:46:45 2025 WARNING Message ID set on to-be-sent message, this is probably unintended; clearing it.

However the bigger problem happens later if one of the clients goes away and an outgoing observation message times out:

Tue Dec  9 17:48:50 2025 DEBUG Sending message <aiocoap.Message at 0x37c93610: CON 2.03 Valid (MID 23763, token 82f4) remote <UDP6EndpointAddress [fd80:9d9c:eb7c:1:fd6d:5321:80b7:b618]:28164 (locally fd80:9d9c:eb7c:1:b1aa:ef14:6599:8971%wpan0)>, 1 option(s)>
Tue Dec  9 17:48:50 2025 DEBUG Exchange added, message ID: 23763.
Tue Dec  9 17:48:50 2025 WARNING Message ID set on to-be-sent message, this is probably unintended; clearing it.
Tue Dec  9 17:48:50 2025 DEBUG Message to <UDP6EndpointAddress [fd80:9d9c:eb7c:1:fd6d:5321:80b7:b618]:28164 (locally fd80:9d9c:eb7c:1:b1aa:ef14:6599:8971%wpan0)> put into backlog
Tue Dec  9 17:48:52 2025 ERROR Exception in callback MessageManager._schedule_retransmit.<locals>.retr() at /usr/lib/python3.9/site-packages/aiocoap/messagemanager.py:260
Tue Dec  9 17:48:52 2025 handle: <TimerHandle when=13051.910706122799 MessageManager._schedule_retransmit.<locals>.retr() at /usr/lib/python3.9/site-packages/aiocoap/messagemanager.py:260 created at /usr/lib/python3.9/site-packages/aiocoap/messagemanager.py:267>
Tue Dec  9 17:48:52 2025 source_traceback: Object created at (most recent call last):
Tue Dec  9 17:48:52 2025   File "/usr/lib/python3.9/site-packages/aiocoap/pipe.py", line 172, in add_response
Tue Dec  9 17:48:52 2025     self._add_event(self.Event(response, None, is_last))
Tue Dec  9 17:48:52 2025   File "/usr/lib/python3.9/site-packages/aiocoap/pipe.py", line 160, in _add_event
Tue Dec  9 17:48:52 2025     keep_calling = cb(event)
Tue Dec  9 17:48:52 2025   File "/usr/lib/python3.9/site-packages/aiocoap/tokenmanager.py", line 132, in on_event
Tue Dec  9 17:48:52 2025     self.token_interface.send_message(
Tue Dec  9 17:48:52 2025   File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 419, in send_message
Tue Dec  9 17:48:52 2025     self._send_initially(message, messageerror_monitor)
Tue Dec  9 17:48:52 2025   File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 428, in _send_initially
Tue Dec  9 17:48:52 2025     self._add_exchange(message, messageerror_monitor)
Tue Dec  9 17:48:52 2025   File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 206, in _add_exchange
Tue Dec  9 17:48:52 2025     next_retransmission = self._schedule_retransmit(message, timeout, 0)
Tue Dec  9 17:48:52 2025   File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 267, in _schedule_retransmit
Tue Dec  9 17:48:52 2025     return self.loop.call_later(timeout, retr)
Tue Dec  9 17:48:52 2025 Traceback (most recent call last):
Tue Dec  9 17:48:52 2025   File "/usr/lib/python3.9/asyncio/events.py", line 80, in _run
Tue Dec  9 17:48:52 2025   File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 266, in retr
Tue Dec  9 17:48:52 2025     self._retransmit(message, timeout, retransmission_counter)
Tue Dec  9 17:48:52 2025   File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 273, in _retransmit
Tue Dec  9 17:48:52 2025     messageerror_monitor, next_retransmission = self._active_exchanges.pop(key)
Tue Dec  9 17:48:52 2025 KeyError: (<UDP6EndpointAddress [fd80:9d9c:eb7c:1:fd6d:5321:80b7:b618]:28164 (locally fd80:9d9c:eb7c:1:b1aa:ef14:6599:8971%wpan0)>, 23764)

... I think the root cause here is that the same message object is being sent multiple times with different IDs - each time the MessageManager.send_message() clears the message ID (resulting in the log line) and then sets a new message id on the same object.

Then if a retransmit happens for the first recipient, the message id has been rewritten for a later recipient so the key for self._active_exchanges is wrong and the KeyError shown above triggers. (Possibly with different clients it's not a KeyError but still the retransmit will be incorrect as the remote will have also changed on the message.)

I've observed the cascade impact of the KeyError is that entire remote is stalled and the Observer keeps adding messages to the backlog which are never sent, so it becomes a zombie Observe client...

I think the fix may be simple, will submit a PR shortly.

Install info

# python3 -m aiocoap.cli.defaults
Python version: 3.9.15 (main, Oct 23 2022, 20:48:04) 
[GCC 8.4.0]
aiocoap version: 0.4.7
Modules missing for subsystems:
    dtls: missing DTLSSocket
    oscore: missing cryptography, ge25519
    linkheader: everything there
    prettyprint: missing termcolor, pygments
Python platform: linux
Default server transports:  tcpserver:tcpclient:tlsserver:tlsclient:udp6
Selected server transports: tcpserver:tcpclient:tlsserver:tlsclient:udp6
Default client transports:  tcpclient:tlsclient:udp6
Selected client transports: tcpclient:tlsclient:udp6
SO_REUSEPORT available (default, selected): True, True

(OpenWRT router, running old Python unfortunately!)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions