We have a WebSocket server for real-time communication deployed on Kubernetes. When the server is redeployed, all connections are dropped, and each client reconnects as soon as it detects that its connection has closed. At that moment the server receives a huge burst of traffic (about 200K connections) almost instantly, which can overload it.
We tried adding a time interval between restarting the pods, but the number of simultaneous reconnect requests is still higher than expected, and this approach does not work at all if our server version is not forward compatible.
Is there any way to handle WebSocket server deployments in this scenario?
Or is there a way to keep clients connected while the server is being redeployed?
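For context, the standard client-side mitigation is jittered exponential backoff, so a mass disconnect does not turn into a mass simultaneous reconnect. A minimal sketch in TypeScript - the URL, base delay, and cap are illustrative assumptions, not actual values:

```typescript
// Reconnect with exponential backoff plus jitter so the herd of
// clients spreads its reconnects out over time.
function connectWithBackoff(url: string, attempt = 0): void {
  const ws = new WebSocket(url);

  ws.onopen = () => { attempt = 0; };  // a successful connect resets the backoff

  ws.onclose = () => {
    const base = Math.min(1000 * 2 ** attempt, 60_000);   // 1s, 2s, 4s... capped at 60s
    const delay = base / 2 + Math.random() * (base / 2);  // randomize within 50-100% of base
    setTimeout(() => connectWithBackoff(url, attempt + 1), delay);
  };
}

connectWithBackoff("wss://example.com/ws"); // placeholder endpoint
```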
Our infrastructure is composed of:
1 F5 load balancer
3 nodes
We have an application which uses WebSockets: when a user visits our site, the browser opens a WebSocket to the balancer, which connects it to the first available node, and this works as expected.
Our troubles arrive with maintenance tasks. When we have to update our software, we need to take one node offline at a time, deploy the new release, and then bring the node back online. While we do this, the balancer drops the open WebSocket connections to that node, and the clients retry after a few seconds against the first available node, which is an inconvenience for the client because they could miss a signal (or more).
How can we keep the connection between the client and the balancer while changing the backend WebSocket server? Is the load balancer enough to achieve our goal, or do we need to change our infrastructure?
To avoid this kind of problem I recommend reading about Azure SignalR. With it you don't need to think about things like a load balancer, a Redis backplane, and the other infrastructure you would otherwise need for a WebSockets connection.
Basically the clients do not connect to your nodes directly but are redirected to Azure SignalR. You can read more about it here: https://learn.microsoft.com/en-us/azure/azure-signalr/signalr-overview
Since it is important for your application to maintain the connection, I don't see any other way to achieve zero connection drops to your nodes, given that you need to shut them down.
It's important to understand that the F5 is a full TCP proxy. This means that the F5 is the server to the client and the client to the server. If you are using the websockets protocol then you must apply a websockets profile to the F5 Virtual Server in order for the websockets application to be handled properly by the Load Balancer.
Details of the websockets profile can be found here: https://support.f5.com/csp/article/K14754
If both a websockets and an HTTP profile are applied to the Virtual Server - meaning that you have websockets and web traffic using the same port and LB nodes - then the F5 will pass the websockets traffic through. Also keep in mind that if this is an HTTPS Virtual Server, you will need to ensure that client-side and server-side HTTPS profiles (SSL offload) are applied to it.
While there are a variety of ways you can fiddle with load balancers to minimize the downtime caused by a software upgrade, none of them solves the underlying problem, which is that your application-layer protocol does not tolerate small network outages.
Even if you have a perfect load balancer and your software deploys cause zero downtime, the customer's computer may be on flaky wifi that drops the network for half a second - or on ethernet when someone reconfigures routing on their LAN, and so on.
I'd suggest having your server maintain a queue of messages for each client (up to some size/time limit), so that when a client drops a connection - whether due to load balancers, upgrades, or any other reason - it can resume without disruption.
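A minimal sketch of such a replay buffer, assuming each client presents a stable session id and reports the last sequence number it received when it reconnects (all names and limits here are illustrative):

```typescript
type Msg = { seq: number; data: unknown };

const BUFFER_LIMIT = 1000;                 // per-client cap; tune to taste
const buffers = new Map<string, Msg[]>();  // sessionId -> recent messages
let nextSeq = 0;

// Call this whenever the server produces a message for a client;
// it stamps a sequence number and keeps a bounded copy for replay.
function enqueue(sessionId: string, data: unknown): Msg {
  const msg = { seq: nextSeq++, data };
  const buf = buffers.get(sessionId) ?? [];
  buf.push(msg);
  if (buf.length > BUFFER_LIMIT) buf.shift();  // drop the oldest past the cap
  buffers.set(sessionId, buf);
  return msg;
}

// On reconnect, the client reports the last seq it saw and the
// server replays everything newer, so no signal is missed.
function replay(sessionId: string, lastSeenSeq: number): Msg[] {
  return (buffers.get(sessionId) ?? []).filter((m) => m.seq > lastSeenSeq);
}
```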
I am setting up an MQTT/WebSockets server. My client is a Flutter app which connects to the broker on the main screen, and on other screens it sends and receives messages from the broker. My understanding of keepAlive is that it is how often the client and server exchange ping/pong to make sure the connection is still alive. That being said: if my Flutter app connects to the broker on the main screen with a keepAlive of 3600 seconds (1 hour), and I then disconnect the client from the internet for 2 minutes and reconnect, it no longer sends or receives messages on the other screens. Maybe my understanding of keepAlive is not correct. How should I structure my app/server so that it reconnects automatically as soon as the internet connection is back up again?
I have also tried the On.Disconnect method, which I noticed never gets called; the app still thinks it is connected to the broker.
I mentioned websockets in the tags because I could do MQTT over websockets.
I see that no-one else has responded, so I'll try (however I'm new to this also).
Also, have you looked at the Flutter connectivity package?
From my reading of the MQTT specification, it seems the MQTT client **should** disconnect the TCP/IP connection if it doesn't receive a PINGRESP to its PINGREQ within the keep-alive period (i.e. it's not required to disconnect).
My Flutter + MQTT app checks the connection state, and reconnects if needed, every time it sends a message. I haven't needed to check for internet dropouts, but I have noticed the connection is lost on some application state changes. The main app widget is notified of these using didChangeAppLifecycleState() and sends a dummy message if needed.
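One concrete way to get automatic reconnection is to let the client library do it. I'm sketching the idea with the MQTT.js client in TypeScript rather than Dart, since that's what I know; the broker URL, topic, and client id are placeholders:

```typescript
import mqtt from "mqtt";

const client = mqtt.connect("ws://broker.example.com:9001", {
  keepalive: 60,          // ping every 60s so a dead link is noticed in minutes, not an hour
  reconnectPeriod: 5000,  // after a drop, retry every 5s until the network is back
  clean: false,           // keep the session so queued QoS>0 messages survive the outage
  clientId: "my-app-client",  // a fixed id is required for a persistent session
});

client.on("connect", () => client.subscribe("signals/#"));
client.on("offline", () => console.log("link down, retrying in the background"));
client.on("message", (topic, payload) => console.log(topic, payload.toString()));
```

The point being: a one-hour keepAlive only means pings are exchanged hourly; it is the reconnect logic, not the keepalive, that determines how quickly the client comes back after an outage.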
So this doesn't answer exactly what you asked, but I hope it's useful anyway.
We have a Node instance with about 2500 client socket connections. Everything runs fine until something happens to the service (a restart, or a failover event in Azure); when the Node instance comes back up and all the socket connections try to reconnect, the service comes to a halt and the log just shows repeated socket connects/disconnects. Even if we stop the service and start it again, the same thing happens. Currently we send out a package to our on-premise servers to kill the users' Chrome sessions, and then everything works fine as users begin logging in again. The clients currently connect with 'forceNew' and force WebSockets only, rather than the default long-polling-then-upgrade. Has anyone seen this or have ideas?
In your socket.io client code, you can force the reconnects to be spread out in time more. The two configuration variables that appear to be most relevant here are:
reconnectionDelay
Determines how long socket.io will initially wait before attempting a reconnect (it should back off from there if the server stays down for a while). You can increase this to make it less likely that all clients try to reconnect at the same time.
randomizationFactor
This is a number between 0 and 1.0 and defaults to 0.5. It determines how much the above delay is randomly modified to try to make client reconnects be more random and not all at the same time. You can increase this value to increase the randomness of the reconnect timing.
See client doc here for more details.
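For example, with the socket.io client (option names per socket.io-client; the exact delays here are illustrative):

```typescript
import { io } from "socket.io-client";

const socket = io("https://example.com", {
  transports: ["websocket"],    // websocket only, skip the polling upgrade
  reconnectionDelay: 5000,      // wait 5s before the first reconnect attempt
  reconnectionDelayMax: 60000,  // exponential backoff caps out at 60s
  randomizationFactor: 1.0,     // maximum jitter so clients don't retry in lockstep
});
```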
You may also want to review your server configuration to check that it is as scalable as possible with moderate numbers of incoming socket requests. While nobody expects a server to handle 2500 connection attempts arriving all at once, it should be able to queue up those connection requests and serve them as it gets time, rather than immediately failing any incoming connection it can't handle right away. There is a desirable middle ground: some number of connections held in a queue (usually controllable by server-side TCP configuration parameters), and only when the queue gets too large are connections failed immediately, at which point socket.io backs off and tries again a little later. Adjusting the above variables tells it to wait longer before retrying.
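On the server side, the relevant knob for that queue is usually the TCP accept backlog. A sketch with Node and socket.io - the numbers are illustrative, and the effective backlog is capped by the kernel (net.core.somaxconn on Linux):

```typescript
import { createServer } from "node:http";
import { Server } from "socket.io";

const httpServer = createServer();
const io = new Server(httpServer, { transports: ["websocket"] });

io.on("connection", (socket) => {
  // normal application wiring goes here
});

// backlog = how many not-yet-accepted TCP connections the OS will
// queue for this listener before refusing new ones (Node defaults to 511).
httpServer.listen({ port: 3000, backlog: 2048 });
```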
Also, I'm curious why you are using forceNew. That does not seem like it would help you. Forcing webSockets only (no initial polling) is a good thing.
I've noticed that with a socket.io server, if the client's network times out, the server never gets the disconnect event. How do I handle disconnects when the client silently times out? Or do I get the disconnect event eventually, after a very long time? Can anyone explain what happens on a socket.io server when a client silently disconnects (power pulled, router crash, network issues, etc.)?
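For reference, the server's heartbeat settings govern how long a silent drop takes to surface as a disconnect event. A sketch with socket.io v4 option names (the values shown are v4's defaults; older versions differ):

```typescript
import { Server } from "socket.io";

const io = new Server(3000, {
  pingInterval: 25000,  // server pings each client every 25s (v4 default)
  pingTimeout: 20000,   // no pong within 20s => client declared gone (v4 default)
});

io.on("connection", (socket) => {
  socket.on("disconnect", (reason) => {
    // a silently-dead client eventually lands here with reason "ping timeout",
    // i.e. up to pingInterval + pingTimeout after it actually vanished
    console.log(`client ${socket.id} disconnected: ${reason}`);
  });
});
```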
Case:
A WebSocket connection has been established between the client and server endpoints.
Now the network connection goes down (for example, the ADSL dies). After 10 minutes I restore the network and find that the client and server are still able to communicate with each other. Why?
Note:
The client was developed with the Java-WebSocket framework, and the server with ws4py.
1 - If they did not try to exchange any data while only the connection (not the endpoints) between them was down, this is normal behaviour: an idle TCP connection generates no traffic, so neither side has anything to fail on and neither notices the outage.
2 - If the websocket connection actually ended, the browser may have re-established it without you knowing about it. I just checked, and this is not normal behaviour - but maybe there is some parameter somewhere :-)
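If you want either side to notice a dead link sooner than that, the usual fix is an application-level heartbeat. A sketch of the server side using the Node ws package (an assumption - the question used Java-WebSocket and ws4py, but the pattern is the same in any stack):

```typescript
import { WebSocketServer, WebSocket } from "ws";

const wss = new WebSocketServer({ port: 8080 });
const alive = new WeakMap<WebSocket, boolean>();

wss.on("connection", (ws) => {
  alive.set(ws, true);
  ws.on("pong", () => alive.set(ws, true));  // peer answered the last ping
});

// Sweep every 30s: anything that never answered the previous ping is
// assumed dead (e.g. the ADSL dropped) and gets terminated.
const interval = setInterval(() => {
  for (const ws of wss.clients) {
    if (!alive.get(ws)) {
      ws.terminate();
      continue;
    }
    alive.set(ws, false);
    ws.ping();
  }
}, 30_000);

wss.on("close", () => clearInterval(interval));
```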