How long can a client be connected to a server via websockets? Is there a time limit, can they potentially be connected together for years?
A WebSocket connection can in theory last forever. Assuming the endpoints remain up, one common reason why long-lived TCP connections eventually terminate is inactivity. WebSockets have a ping-pong mechanism which among other things can avoid closure by "smart" network routers which often have an inactivity timeout of a few hours. But if your application actively sends data (in either direction) at least once an hour, that's probably enough.
Still, "forever" is unlikely to be achieved in practice, because TCP connections on most networks eventually get terminated by some outage or other. You'll want intelligent reconnection logic if reliability is important.
Related
I am looking for best practices to handle server restarts. Specifically, I push stock prices to users using websockets for a day trading simulation web app. I have 10k concurrent users. To ensure a responsive ux, I reconnect to the websocket when the onclose event is fired. As our user base has grown we have had to scale our hardware. In addition to better hardware, we have implemented a random delay before reconnecting. The goal of this is to spread out the influx of handshakes when the server restarts ever night (Continuous Deployment). However some of our users have poor internet (isp and or wifi). Their connection constantly drops. For these users I would prefer they reconnect immediately. Is there a solution for this problem that doesn't have the aforementioned tradeoffs?
The question is calling for a subjective response, here is mine :)
Discriminating a client disconnection and a server shutdown:
This can be achieved by sending a shutdown message over the websocket so that active clients can prepare and reconnect with a random delay. Thus, a client that encounters an onclose event without a proper shutdown broadcast would be able to reconnect asap. This means that the client application needs to be modified to account for this special shutdown event.
Handle the handshake load: Some web servers can handle incoming connections as an asynchronous parallel event queue, thus at most X connections will be initialized at the same time (in parallel) and others will wait in a queue until their turn comes. This allows to safeguard the server performance and the websocket handshake will thus be automatically delayed based on the true processing capabilities of the server. Of course, this means a change of web server technology and depends on your use-case.
I have a Websocket connection being served from http-kit (Clojure, and it works great). I send pings from the client to make sure we're still connected, and everything works fine there. My question is, do people bother pinging client from server in these cases?
I was trying to set something up to remove the channel from the server if I didn't get a response, but it's not very functional-friendly to set up timed processes and alter state to track the ping-pong cycle, so it was getting a little ugly. Then I thought, the server can handle hundreds of thousands of simultaneous connections, should I just not worry about a few broken threads? How do people typically handle (or not handle) this?
The WebSocket protocol itself has heart beating to keep the connection alive. If you wanted an additional layer on top of that you could use the STOMP protocol, which coordinates heartbeats between client/server.
The one STOMP implementation I know of for the JVM is Stampy. There’s one for JS too, stompjs. Note: the heartbeat implementation differs between these libs, I believe the Stampy one is incorrect. You’d have to roll your own.
We have a node instance that has about 2500 client socket connections, everything runs fine except occasionally then something happens to the service (restart or failover event in azure), when the node instances comes back up and all socket connections try to reconnect the service comes to a halt and the log just shows repeated socket connect/disconnects. Even if we stop the service and start it the same thing happens, we currently send out a package to our on premise servers to kill the users chrome sessions then everything works fine as users begin logging in again. We have the clients currently connecting with 'forceNew' and force web sockets only and not the default long polling than upgrade. Any one ever see this or have ideas?
In your socket.io client code, you can force the reconnects to be spread out in time more. The two configuration variables that appear to be most relevant here are:
reconnectionDelay
Determines how long socket.io will initially wait before attempting a reconnect (it should back off from there if the server is down awhile). You can increase this to make it less likely they are all trying to reconnect at the same time.
randomizationFactor
This is a number between 0 and 1.0 and defaults to 0.5. It determines how much the above delay is randomly modified to try to make client reconnects be more random and not all at the same time. You can increase this value to increase the randomness of the reconnect timing.
See client doc here for more details.
You may also want to explore your server configuration to see if it is as scalable as possible with moderate numbers of incoming socket requests. While nobody expects a server to be able to handle 2500 simultaneous connections all at once, the server should be able to queue up these connection requests and serve them as it gets time without immediately failing any incoming connection that can't immediately be handled. There is a desirable middle ground of some number of connections held in a queue (usually controllable by server-side TCP configuration parameters) and then when the queue gets too large connections are failed immediately and then socket.io should back-off and try again a little later. Adjusting the above variables will tell it to wait longer before retrying.
Also, I'm curious why you are using forceNew. That does not seem like it would help you. Forcing webSockets only (no initial polling) is a good thing.
We had issues with lot of Applications connecting to MQ server without properly doing a disconnect. Hence we introduced DISCINT on our server connection channels with a value 1800 sec which we found ideal for our transactions. But our Keep Alive interval is pretty high with 900 sec. We would like to reduce that less than 300 as suggested by mqconfig util. But before doing that I would like to know if this is going to affect our disconnect interval value and whether it is going to override our disconnect interval value and make more frequent disconnects which will be a performance hit for us.
How does both these values work and how they are related?
Thanks
TCP KeepAlive works below the application layer in the protocol stack, so it does not affect the disconnecting of the channel configured by the DISCINT.
However lowering the value can result in more frequent disconnects, if your network is unreliable, for example has intermittent very short (shorter then the current KeepAlive, but longer then the new) periods when packets are not flowing.
I think the main difference is, that DISCINT is for disconnecting a technically working channel, which is not used for a given period, while KeepAlive is for detecting a not working TCP connection.
And MQ provides means to detect not working connections in the application layer too, configured by the heartbeat interval.
These may help:
http://www-01.ibm.com/support/knowledgecenter/SSFKSJ_7.5.0/com.ibm.mq.con.doc/q015650_.htm
http://www-01.ibm.com/support/knowledgecenter/SSFKSJ_7.5.0/com.ibm.mq.ref.con.doc/q081900_.htm
http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html
http://www-01.ibm.com/support/knowledgecenter/SSFKSJ_7.5.0/com.ibm.mq.ref.con.doc/q081860_.htm
I am doing some performance testing with a large number of threads. Each thread is sending HTTP requests to another IP. It looks like at some stages the connections are closed (because there are too many threads) and then of course have to be reopned.
I am looking to get some ball park figures for how long it takes windows to Open TCP connections.
Is there any way I can get this?
Thanks.
This is highly dependent on the endpoints you're trying to connect to, is it not?
As an extreme best case, you can test it yourself by targeting an IIS on localhost.
I wouldn't be surprised if routers and servers that you are connecting through may drop connections as a measure against what could be perceived as connection storms or even denial-of-service attacks.