(Akka 2.6.5, akka-http 10.1.12)
I have a server/client websocket setup using Source.queue and Sink.actorRef on each side of the connection.
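Roughly, the server side is wired like this (a simplified sketch, not my real code - names, buffer size and the complete/failure messages are just illustrative):

```scala
import akka.actor.ActorRef
import akka.http.scaladsl.model.ws.Message
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Flow, Sink, Source}

object WsWiring {
  // Inbound frames are fed to a per-connection actor; outbound frames come from a queue.
  def websocketFlow(handler: ActorRef): Flow[Message, Message, Any] = {
    val in = Sink.actorRef[Message](
      handler,
      "stream-complete",      // onCompleteMessage
      _ => "stream-failure")  // onFailureMessage - the "Stream Failure" mentioned above

    val out = Source
      .queue[Message](16, OverflowStrategy.dropHead)
      .mapMaterializedValue { queue =>
        // in the real code the queue is handed to whatever pushes outbound messages
        queue
      }

    Flow.fromSinkAndSource(in, out)
  }

  // handler is the sink actor for this particular connection
  def route(handler: ActorRef): Route =
    path("ws") {
      handleWebSocketMessages(websocketFlow(handler))
    }
}
```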
It turns out my system has a rather critical and unexpected flaw (found in production no less):
Sink actor fails and terminates (Dead letters are logged)
Sink actor is sent the Stream Failure message (configured in the Sink.actorRef construction) - this is also logged as dead letters, since the actor is indeed dead.
So we have a finished websocket stream, right? That's what Half-closed websockets would suggest (although I notice the heading id is actually "half-closed-client-websockets"...).
What happens instead is... nothing. The connected client stays connected - there's no complete message or failure.
Is there some configuration I need to set to actively tell Akka to fully close the HTTP connection on failures like this?
Testing
I reproduced the issue in an integration test:
Establish connection
Sleep for 70 seconds (just to ensure keep-alives are configured/working properly)
Send a message from server
Ensure receipt on client
Kill server actor sink (and see same Stream Failure -> dead letters as above)
Wait for client to acknowledge completion (100 seconds) - either:
If I did nothing -> Timeout
If I sent a message from client to server before waiting for completion:
After 60s: Aborting tcp connection ... because of upstream failure: TcpIdleTimeoutException
The Stream Failure message is sent to the client sink.
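For what it's worth, the 60 second abort above lines up with akka-http's default idle-timeout. The settings that seem to be in play are these (a sketch; values are defaults/illustrative, and whether these are even the right knobs is part of my question):

```scala
import com.typesafe.config.ConfigFactory

// Assumption: these are the relevant akka-http settings; values shown are
// the defaults / illustrative, not a recommendation.
val wsConfig = ConfigFactory.parseString(
  """
    |# a connection with no traffic is torn down after this long (default 60 s)
    |akka.http.server.idle-timeout = 60 s
    |
    |# transport-level keep-alive pings so a quiet websocket does not trip the idle timeout
    |akka.http.server.websocket.periodic-keep-alive-max-idle = 10 s
    |akka.http.client.websocket.periodic-keep-alive-max-idle = 10 s
    |""".stripMargin)
```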
Notes
I've deliberately not included my actual implementation at this stage because I'm trying to understand the technology properly - either I've found a bug, or I have a fundamental misunderstanding of how the websockets are meant to work (and fail). If you think I should include the real code, you'll need to convince me how it would help produce an answer.
In production, this failure to close meant that the websocket client was waiting for data for 12 hours (it wasn't attempting to send messages at the time)
Related
As I understand it, websockets use a Ping to detect that they are still connected - except, of course, Chrome, which leaves it to apps to do the ping themselves.
I'd like to understand whether it's possible for a connection to become unstable between pings such that a frame of data is not received, but to stabilize again by the time the next ping is sent. In other words: is it possible to have an apparently good websocket connection, but for data to fail to arrive?
This question relates to Is it possible to miss websocket events, which remains unanswered and side-tracked into long-polling and socket-io.
Thanks!
This is heavily dependent on the client software (browser) that you use.
Websockets run over a TCP connection, which will make sure the message arrives at its destination - unless, of course, the network connection is down.
However, some clients (browsers) will suspend inactive tabs and not process events. If your page is inactive, it "may" fail to send data to the server because its code will not be executed at all. Likewise, it "may" fail to receive data because the handler will not be executed at all.
Meanwhile, even if the tab is inactive, the machine will still receive the ping packets. So it really comes down to whether or not your client software hands them back to your code.
I have a client/server setup in which clients send a single request message to the server and get a bunch of data messages back.
The server is implemented using a ROUTER socket and the clients using a DEALER. The communication is asynchronous.
The clients are typically iPads/iPhones and they connect over wifi so the connection is not 100% reliable.
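Roughly, the socket setup is the usual async ROUTER/DEALER pattern, sketched here with the JeroMQ bindings (endpoints and payloads are illustrative; the real clients are not JVM-based):

```scala
import org.zeromq.ZMQ

// Placeholder for the real lookup that produces the response messages.
def dataFor(request: Array[Byte]): Seq[Array[Byte]] =
  Seq("chunk-1".getBytes, "chunk-2".getBytes)

val ctx = ZMQ.context(1)

// Server side: ROUTER socket; replies are addressed with the client's identity frame.
// (In reality the two halves below run in separate processes.)
val router = ctx.socket(ZMQ.ROUTER)
router.bind("tcp://*:5555")

val identity = router.recv(0)       // identity frame that ROUTER prepends
val request  = router.recv(0)       // the single request message
for (chunk <- dataFor(request)) {   // stream the data messages back
  router.send(identity, ZMQ.SNDMORE)
  router.send(chunk, 0)
}

// Client side: DEALER socket; send one request, then read data messages as they arrive.
val dealer = ctx.socket(ZMQ.DEALER)
dealer.connect("tcp://server-host:5555")
dealer.send("request".getBytes, 0)
val firstChunk = dealer.recv(0)
```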
The issue I'm concerned about is when the client connects to the server and sends a request for data, but the communication goes down (e.g. out of wifi coverage) before the response messages are delivered back.
In this case the messages will be queued up on the server side waiting for the client to reconnect. That is fine for a short time but eventually I would like to drop the messages and the connection to release resources.
By checking activity/timeouts it would be possible for both the server and the client applications to identify that the connection is gone. The client can shut down the socket and free resources that way, but how can the same be done on the server?
Per the ZMQ FAQ:
How can I flush all messages that are in the ZeroMQ socket queue?
There is no explicit command for flushing a specific message or all messages from the message queue. You may set ZMQ_LINGER to 0 and close the socket to discard any unsent messages.
Per this mailing list discussion from 2013:
There is no option to drop old messages [from an outgoing message queue].
Your best bet is to implement heartbeating and, when one client stops responding without explicitly disconnecting, restart your ROUTER socket. Messy, I know - this is really something that should have a companion option to HWM. Pieter Hintjens is clearly on board (he created ZMQ), but that was from 2011, so it looks like nothing ever came of it.
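Putting those two pieces together, the workaround looks roughly like this (a sketch using the classic JeroMQ/jzmq API; how you decide a client is dead is left to your heartbeating):

```scala
import org.zeromq.ZMQ

val endpoint = "tcp://*:5555"
val ctx = ZMQ.context(1)
var router = ctx.socket(ZMQ.ROUTER)
router.bind(endpoint)

// ... heartbeating decides a connected client is dead ...

// Drop whatever is still queued for it: LINGER = 0 discards unsent messages
// when the socket is closed, and a fresh socket starts with empty queues.
router.setLinger(0)
router.close()
router = ctx.socket(ZMQ.ROUTER)
router.bind(endpoint)
```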
This is a bit late, but setting TCP keepalive to a reasonable value will cause dead sockets to close after the timeouts have expired.
Heartbeating is necessary for either side to determine the other side is still responding.
The only thing I'm not sure about is how to go about heartbeating many thousands of clients without spending all available cpu just on dealing with the heartbeats.
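For reference, ZMQ exposes the TCP keepalive knobs as socket options (ZMQ_TCP_KEEPALIVE and friends). A sketch of what "a reasonable value" might look like, assuming the JeroMQ setters for those options; the numbers are illustrative:

```scala
import org.zeromq.ZMQ

val ctx = ZMQ.context(1)
val router = ctx.socket(ZMQ.ROUTER)

// Enable TCP keepalive so the OS eventually tears down dead connections.
// These setters correspond to ZMQ_TCP_KEEPALIVE, ZMQ_TCP_KEEPALIVE_IDLE,
// ZMQ_TCP_KEEPALIVE_INTVL and ZMQ_TCP_KEEPALIVE_CNT.
router.setTCPKeepAlive(1)          // turn keepalive on
router.setTCPKeepAliveIdle(60)     // start probing after 60 s of silence
router.setTCPKeepAliveInterval(10) // probe every 10 s
router.setTCPKeepAliveCount(5)     // declare the peer dead after 5 missed probes

router.bind("tcp://*:5555")
```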
I'm using uwsgi's websocket support and so far it's looking great: the server detects when the client disconnects, and likewise the client detects when the server goes down. But I'm concerned this will not work in every case/browser.
In other frameworks, namely sockjs, the connection is monitored by sending regular messages that work as heartbeats/pings. But uwsgi sends PING/PONG control frames (i.e. not regular messages) as the websockets spec requires, and so from the client side I have no way to know when the last ping was received from the server. So my question is this:
If the connection is dropped or blocked by some proxy, will browsers (i.e. Chrome, IE, Firefox, Opera) reliably detect that no PING was received from the server and signal the connection as down, or should I implement some additional ping/pong system so that the connection is detected as closed from the client side?
Thanks
You are totally right. There is no way from the client side to track or send ping/pongs. So if the connection drops, the server is able to detect this condition through the ping/pong, but the client is left hanging... until it tries to send something and the underlying TCP mechanism detects that the other side is not ACKnowledging its packets.
Therefore, if the client application expects to be "listening" most of the time, it may be convenient to implement a keep-alive system that works "both ways", as Stephen Cleary explains in the link you posted. But this keep-alive system would be part of your application layer, rather than part of the transport layer like ping/pongs.
For example, you can have a message "{token:'whatever'}" that the server and client just echo back to each other with a 5 second delay. The client should have a timer with a 10 second timeout that is stopped every time that message is received and started every time the message is echoed; if the timer fires, the connection can be considered dropped.
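A sketch of that scheme's client-side timer logic (plain JVM scheduling here just to show the idea; sendText stands in for whatever your websocket client exposes):

```scala
import java.util.concurrent.{Executors, ScheduledFuture, TimeUnit}

// Application-level keep-alive: the token is echoed back and forth with a
// 5 s delay, and a missing echo within 10 s is treated as a dropped connection.
class EchoKeepAlive(sendText: String => Unit, onDropped: () => Unit) {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()
  @volatile private var deadline: ScheduledFuture[_] = _

  // Call once when the connection opens, and again from the message handler
  // every time the keep-alive token arrives.
  def tokenReceived(): Unit = {
    if (deadline != null) deadline.cancel(false) // the echo arrived in time
    scheduler.schedule(new Runnable {
      def run(): Unit = {
        sendText("""{"token":"whatever"}""")     // echo it back after 5 s...
        deadline = scheduler.schedule(           // ...and expect the next echo within 10 s
          new Runnable { def run(): Unit = onDropped() }, 10, TimeUnit.SECONDS)
      }
    }, 5, TimeUnit.SECONDS)
  }
}
```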
Although browsers that implement the same RFC as uWSGI should reliably detect when the server closes the connection cleanly, they won't detect when the connection is interrupted midway (half-open connections). So from what I understand, we should employ an extra mechanism like application-level pings.
I have an EventMachine server sending TCP data down to a Mac client (via GCDAsyncSocket). It always works flawlessly for a while, but inevitably the server suddenly stops sending data on a connection-by-connection basis. The connection is still maintained, and the server still receives data from the client, but it doesn't go the other way.
When this happens, I've discovered via connection#get_outbound_data_size that the connection send buffer is filling up infinitely (via #send_data) and not being sent to the client.
Are there specific (and hopefully fixable) reasons why this might occur? The reactor keeps humming along, and other active connections to the server continue working fine (though they sometimes fall into buffer hell as well).
I see at least one reason: the remote client no longer reads data from its side of the TCP connection (with a recv() call or whatever).
The scenario is then: the receiving TCP buffer on the client side becomes full, and the OS can no longer accept TCP packets from its peer, since it cannot queue them. As a consequence, the sending TCP buffer on the server side fills up too as your application continues to write packets to the socket! Soon your server is no longer able to write into the socket, since the send() system call will either:
block indefinitely (waiting for the buffer to empty enough for the new packet), or
return with an EWOULDBLOCK error (if you configured your socket as non-blocking), as sketched below.
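A minimal sketch of the non-blocking case (plain NIO rather than EventMachine, just to show the mechanism; on the JVM, EWOULDBLOCK surfaces as a short or zero-byte write):

```scala
import java.nio.ByteBuffer
import java.nio.channels.SocketChannel
import scala.collection.mutable

// With a non-blocking socket, a full kernel send buffer shows up as write()
// accepting only part of the data (possibly none). Whatever is left over has
// to be queued by the application - the ever-growing outbound buffer described above.
def trySend(channel: SocketChannel, payload: Array[Byte],
            pending: mutable.Queue[ByteBuffer]): Unit = {
  require(!channel.isBlocking, "sketch assumes a non-blocking channel")
  val buf = ByteBuffer.wrap(payload)
  channel.write(buf)        // may write fewer bytes than requested
  if (buf.hasRemaining)     // peer (or its application) is not reading fast enough
    pending.enqueue(buf)    // application-level queue keeps growing
}
```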
I usually run into this kind of situation in a TEST environment when I put a breakpoint in my code on the client side.
There was a patch applied to GCDAsyncSocket on March 23 that prevents the reads from stopping. Did this patch solve your problem?
I have observed the following behavior in Firefox 4 and Chrome 7:
If the server running the websocket daemon crashes, reboots, loses network connectivity, etc., then the 'onclose' or 'onerror' events are not fired on the client side. I would expect one of those events to be fired when the connection is broken for any reason.
If however the daemon is shutdown cleanly first, then the 'onclose' event is fired (as expected).
Why do the clients perceive the websocket connection as open when the daemon is not shutdown properly?
I want to rely on the expected behavior to inform the user that the server has become unavailable or that the client's internet connection has suffered a disruption.
TCP is like that. The most recent WebSockets standard draft (v76) has a clean shutdown message mechanism. But without that (or if it doesn't have a chance to be sent) you are relying on normal TCP socket cleanup, which may take several minutes (or hours).
I would suggest adding some sort of signal handler/exit trap to the server so that when the server is killed/shut down, a clean shutdown message is sent to all connected clients.
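For example, a sketch of such an exit trap on the JVM (WsConnection and sendCloseFrame are placeholders for whatever your server actually tracks):

```scala
// Placeholder connection type; stands in for whatever the server keeps per client.
trait WsConnection { def sendCloseFrame(): Unit }

// Exit trap: before the process exits, send every connected client a clean
// close so its 'onclose' handler fires instead of waiting on TCP timeouts.
def installExitTrap(activeConnections: () => Iterable[WsConnection]): Unit = {
  sys.addShutdownHook {
    activeConnections().foreach(_.sendCloseFrame())
  }
}
```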
You could also add a heartbeat mechanism (a la TCP keepalive) to your application to detect when the other side goes away.