Missed notifications because `Postgrex.Notifications` is started with `auto_reconnect: true` #309

From [the `Postgrex.Notifications` documentation](https://hexdocs.pm/postgrex/Postgrex.Notifications.html#module-async-connect-auto-reconnects-and-missed-notifications):

> Note however that when the notification system is waiting for a connection, any notifications that occur during the disconnection period are not queued and cannot be recovered. Similarly, any listen command will be queued until the connection is up.
> 
> There is a race condition between starting to listen and notifications being issued "at the same time", as explained [in the PostgreSQL documentation](https://www.postgresql.org/docs/current/sql-listen.html). If your application needs to keep a consistent representation of data, follow the three-step approach of first subscribing, then obtaining the current state of data, then handling the incoming notifications.
> 
> **Beware that the same race condition applies to auto-reconnects**. A simple way of dealing with this issue is not using the auto-reconnect feature directly, but monitoring and re-starting the Notifications process, then subscribing to channel messages over again, using the same three-step approach.
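
The monitor-and-restart approach from the documentation could look roughly like the following. This is a minimal sketch, not EventStore or Postgrex code. `RestartingListener` and its `start_fun`, `subscribe_fun` and `catch_up_fun` options are hypothetical. They are plain functions so that the pattern can be shown without a database. In a real setup, `start_fun` might call `Postgrex.Notifications.start_link/1` with `auto_reconnect: false`, and `subscribe_fun` might wrap `Postgrex.Notifications.listen/2`.

```elixir
defmodule RestartingListener do
  # Hypothetical sketch: monitor the notifications process and, whenever it
  # dies, start a fresh one and redo subscribe -> catch up -> listen.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # Trap exits so that a *linked* notifications process dying does not take
    # this server down too; the monitor's :DOWN message drives the restart.
    Process.flag(:trap_exit, true)

    state = %{
      start_fun: Keyword.fetch!(opts, :start_fun),
      subscribe_fun: Keyword.fetch!(opts, :subscribe_fun),
      catch_up_fun: Keyword.fetch!(opts, :catch_up_fun),
      pid: nil,
      monitor_ref: nil
    }

    {:ok, connect(state)}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, %{monitor_ref: ref} = state) do
    # Notifications sent while disconnected are lost, so reconnect and catch up.
    {:noreply, connect(state)}
  end

  # Ignore {:EXIT, ...} from the trapped link and any other message.
  def handle_info(_msg, state), do: {:noreply, state}

  defp connect(state) do
    {:ok, pid} = state.start_fun.()
    monitor_ref = Process.monitor(pid)

    # 1. Subscribe first...
    :ok = state.subscribe_fun.(pid)
    # 2. ...then read anything missed from storage...
    state.catch_up_fun.()
    # 3. ...and only afterwards rely on incoming notifications.
    %{state | pid: pid, monitor_ref: monitor_ref}
  end
end
```

Every (re)connection runs the same three steps in order. A notification lost while the connection was down is therefore picked up by the catch-up, instead of waiting for a later event to reveal the gap. As a simplification, if `start_fun` returns an error the server crashes and leaves retrying to its supervisor.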

However, EventStore's configuration starts the `Postgrex.Notifications` process with `auto_reconnect: true`:

https://github.com/commanded/eventstore/blob/0bf4f2e6ecbfe8be72e66b54bd047d9441c0479c/lib/event_store/config.ex#L125-L129

So if the database connection used by the `Postgrex.Notifications` process drops, event notifications can be missed. Even though the process reconnects automatically, notifications issued while it was disconnected are not queued and cannot be recovered, and more may be missed due to the race condition cited above.

The `Subscriptions.Subscription` process can detect when it has missed an intermediate event, as seen here:

https://github.com/commanded/eventstore/blob/0bf4f2e6ecbfe8be72e66b54bd047d9441c0479c/lib/event_store/subscriptions/subscription_fsm.ex#L142-L146
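
The check in the linked FSM code can be summarised as a small pure function. This is an illustrative sketch, not the actual `subscription_fsm.ex` implementation. The module `GapCheck` is hypothetical, and treating an already-seen event as something to ignore is an assumption; only the `future > expected_event` branch comes from the linked code.

```elixir
defmodule GapCheck do
  # Hypothetical sketch of the per-notification check: compare the first event
  # number carried by a notification with the number the subscription expects next.
  def classify(received, expected) when received == expected, do: :process
  def classify(received, expected) when received < expected, do: :ignore
  def classify(received, expected) when received > expected, do: :catch_up
end

# Expecting event 3 but receiving event 5 means 3 and 4 were never notified:
GapCheck.classify(5, 3)
# => :catch_up
```

The `:catch_up` branch can only fire when some later notification actually arrives. If none does, nothing ever runs this check.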

However, this check only runs when the next event is received. On quiet streams, a long time can pass before a new event is emitted, and during that time the subscription stays in a broken state because it does not know that it has missed event notifications.

We have experienced problems in production with subscriptions getting stuck and not processing events until they are restarted. The issue described above might be one of the causes.

To fix this, we would need to disable automatic reconnection and add a way to catch up, so that any missed notifications can be retrieved and processed. I am not sure whether this can be done in a general way from the `Notifications.Listener` process, or whether it has to be done by each individual subscription, based on its last seen event.
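
One way to picture the per-subscription option is a catch-up that reads everything after the subscription's last seen event number. This is a minimal sketch under stated assumptions: `CatchUp.run/3` is hypothetical, `read_fun` stands in for a storage query assumed to return every event numbered after the given one in ascending order and in a single batch, and events are assumed to be maps with an `:event_number` key.

```elixir
defmodule CatchUp do
  # Hypothetical sketch: replay every stored event after `last_seen`, in order,
  # and return the new last seen event number.
  #
  # `read_fun.(last_seen)` is assumed to return all events numbered after
  # `last_seen`, ascending, as maps with an :event_number key.
  # `handle_fun.(event)` processes one event.
  def run(last_seen, read_fun, handle_fun) do
    last_seen
    |> read_fun.()
    |> Enum.reduce(last_seen, fn %{event_number: number} = event, _previous ->
      handle_fun.(event)
      number
    end)
  end
end
```

Because it depends only on the last seen event number, the same function could be triggered after every reconnect rather than waiting for the next event to expose a gap. Reading in batches is left out for brevity.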

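Referenced excerpt from `lib/event_store/config.ex` (lines 125-129):

```elixir
def postgrex_notifications_opts(config, name) do
  config
  |> session_mode_pool_config()
  |> default_postgrex_opts()
  |> Keyword.put(:auto_reconnect, true)
  # ... (anything further is not part of the linked excerpt)
end
```

Referenced excerpt from `lib/event_store/subscriptions/subscription_fsm.ex` (lines 142-146):

```elixir
# Clause matching a received event number ahead of the expected one
future when future > expected_event ->
  Logger.debug(describe(data) <> " received unexpected event(s), requesting catch up")

  # Missed event(s), request catch-up with any unseen events from storage
  next_state(:request_catch_up, data)
```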