SSE Connection Best Practices

This article pertains to: INFORM (V2)

SSE connections are, by their nature, one-way connections: the server sends and the client receives. The client has no way to acknowledge the events, or the connection itself, and confirm to the server that things are OK, so the server isn't listening for that. Ultimately, this means the client is responsible for managing the connection. There are two recommendations, which should ideally be used in conjunction with each other.

The first recommendation is to force a disconnect after a set number of consecutive POKEs. Consecutive POKEs mean no DATA is actively coming through; it takes a large volume of data for a connection never to see consecutive POKEs. You can determine what makes sense for your organization, but 5 is a good starting point. POKEs are expected every 5 seconds, so 5 POKEs is about 25 seconds. If there has been no data in 25 seconds, disconnect, let some data build up, wait a few seconds, and reconnect. If there are then, for example, 5 more consecutive POKEs before any DATA events, disconnect again, this time waiting longer before reconnecting. Continue this pattern, increasing the reconnect wait (up to a maximum), until DATA events arrive again. Any time there is a DATA event, the counters and timers reset. This pattern is recommended because it doesn't leave a connection open with no actionable data, which is good for "connection hygiene" (see recommendation 2 below) and is considered good security practice. Only leave connections open that have value.
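The pattern above can be sketched as a small counter that the client feeds with each event type it receives. This is a minimal illustration, not a reference implementation: the class name IdleBackoff, the 5-second first wait, the doubling factor, and the 300-second cap are all assumptions chosen for the example. Only the POKE and DATA event names and the starting limit of 5 come from this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdleBackoff:
    """Tracks consecutive POKEs and decides when to disconnect and for how long."""
    poke_limit: int = 5        # consecutive POKEs before disconnecting (article's starting point)
    base_delay: float = 5.0    # assumed first reconnect wait, in seconds
    max_delay: float = 300.0   # assumed upper bound on the reconnect wait
    factor: float = 2.0        # assumed growth factor between idle cycles
    consecutive_pokes: int = 0
    idle_cycles: int = 0       # disconnects in a row with no DATA in between

    def on_event(self, event_type: str) -> Optional[float]:
        """Return seconds to wait before reconnecting, or None to stay connected."""
        if event_type == "DATA":
            # Any DATA event resets the counters and timers.
            self.consecutive_pokes = 0
            self.idle_cycles = 0
            return None
        if event_type == "POKE":
            self.consecutive_pokes += 1
            if self.consecutive_pokes >= self.poke_limit:
                delay = min(self.base_delay * self.factor ** self.idle_cycles,
                            self.max_delay)
                self.consecutive_pokes = 0
                self.idle_cycles += 1
                return delay
        return None
```

A client loop would call on_event for every event it reads; when the return value is a number, it closes the connection, sleeps for that many seconds, and reconnects.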

So, why do connections "go stale"? TCP/IP, and the Internet built on it, are designed to work with disruption. Network outages, re-routes, and other hiccups occur all the time. The longer a connection stays open, the higher the likelihood that such a hiccup will affect it. Unfortunately, some use cases require long-running connections that are one-way in nature, such as an SSE stream. As indicated above, in this scenario the client has to be responsible for determining the health of the connection, because it is the side receiving data. The server has no way of knowing whether the client actually received the data, and therefore can't really determine whether there is an issue on the connection.

The second recommendation, then, is for the client to restart the connection as soon as it determines that no events are actually coming through. This is the real value of the POKE events. We expect them every 5 seconds, so theoretically, not receiving any POKE events (or DATA events) for more than 5 seconds should be alarming. But don't move too quickly. Sometimes things happen and a POKE is delayed, though it is not likely ever to be missing. So, wait 30 seconds or a minute. If no POKE comes in during that time, it's probably time to kill the client side of the connection and try again.

The longer this wait is, the longer the server might be trying to send DATA events up the connection without the client receiving them. This can mean dropped events. As such, if the client determines it needs to shut down the connection for this reason, it should also delete the stream and create a new one. Unfortunately, the stream creation process is only granular at the day level, so the client would then need to re-process events for the entire day up to the point of termination. And, because even a single "hung" connection can lose DATA events, you must terminate all connections before recreating the stream. Lost events are lost; only a new stream or a replay stream can be used to recover them. Even though a replay stream is available, recreating the stream is highly recommended unless the org is so high-volume that it would need more than its maximum of 20 connections to handle normal volume along with the replay.
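The stale-connection check described above can be sketched as a simple watchdog timer plus a recovery step. This is a hedged illustration only: StaleWatchdog and recover are hypothetical names, the connection objects are assumed to expose a close() method, and delete_stream and create_stream are placeholder callables standing in for whatever stream-management calls your client actually uses. The 30-second timeout follows the "30 seconds or a minute" guidance.

```python
import time
from typing import Callable, Iterable


class StaleWatchdog:
    """Flags a connection as stale when no POKE or DATA arrives within `timeout`."""

    def __init__(self, timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self._clock = clock            # injectable so the logic can be tested
        self._last_event = clock()

    def on_event(self) -> None:
        """Call for every event received, POKE or DATA."""
        self._last_event = self._clock()

    def is_stale(self) -> bool:
        return self._clock() - self._last_event > self.timeout


def recover(connections: Iterable, delete_stream: Callable[[], None],
            create_stream: Callable[[], object]) -> object:
    """Terminate ALL connections first, then delete and recreate the stream.

    `delete_stream` and `create_stream` are hypothetical placeholders.
    """
    for conn in connections:
        conn.close()                   # assumed method on the connection object
    delete_stream()
    return create_stream()
```

The client would call on_event from its read loop and check is_stale periodically (for example, from a timer). When it returns True, recover closes every connection before the stream is recreated, after which the client re-processes the day's events as described above.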