Amazon Load Balancers dropping Web Socket connections to TorqueBox - ruby

I'm running TorqueBox on Amazon AWS. I've created a load balancer, which does TCP pass through for Web Socket connections on port 8675. When I first load up the page this seems to work quite nicely, however if I leave the page open for a while, the connection just stops working. I don't get an error message, it just silently ignores any further messages sent over the connection. If I reload the page at this point, everything works fine again.
I've tried connecting to individual nodes in the cluster directly, and the connection does not get dropped in that case, so my suspicion is that it has something to do with the load balancers.
Any ideas what might be causing this?

More information about your specific architecture might be useful, but my first guess is that you should enable session stickiness so that requests from the same host get directed to the same machine on AWS (if the request gets directed to another machine the protocol would have to be renegociated).

Related

Laravel-websoket - broadcasts nothing if the backend is on another port

My laravel works like restapi. Front and back on different ports, nginx proxies requests to the back or front.
I can connect from the browser to the web socket on port 6001 without any problems, as can be seen in the statistics, there are no errors in the console. The statistics itself is also available on port 6001, i.e. the web socket is working fine.
But Laravel broadcasts no events on the working server, in statistics and console it is empty.
Nothing blocks traffic, the firewall is disabled.
I spent half a day, but still did not understand what the problem was.
Any thoughts ...
Friends, as always, the answer was simple. In the production process, I did not fix the .env log on the pusher (((

Load balancer and WebSockets

Our infrastructure is composed by
1 F5 load balancer
3 nodes
We have an application which uses websockets, so when a user goes to our site, it opens a websocket to the balancer which it connects to the first available node, and it works as expected.
Our truobles arrives with maintenance tasks, when we have to update our software, we need to turn offline 1 node at a time, deploy the new release and then turn it on again. Doing this task, the balancer drops the open websocket connections to the node and the clients retries to connect after few seconds to the first available nodes, creating an inconvenience for the client because he could miss a signal (or more).
How we can keep the connection between the client and the balancer, changing the backend websocket server? Is the load balancer enough to achieve our goal or we need to change our infrastructure?
To avoid this kind of problems I recommend to read about the Azure SignalR. With this you don't need to thing about stuff like load balancer, redis backplane and other infrastructures that you possibly need to a WebSockets connection.
Basically the clients will not connected to your node directly but redirected to Azure SignalR. You can read more about it here: https://learn.microsoft.com/en-us/azure/azure-signalr/signalr-overview
Since it is important to your application to maintain the connection, I don't see how any other way to archive no connection drop to your nodes, since you need to shut them down.
It's important to understand that the F5 is a full TCP proxy. This means that the F5 is the server to the client and the client to the server. If you are using the websockets protocol then you must apply a websockets profile to the F5 Virtual Server in order for the websockets application to be handled properly by the Load Balancer.
Details of the websockets profile can be found here: https://support.f5.com/csp/article/K14754
If a websockets and an HTTP profile are applied to the Virtual Server - meaning that you have websockets and web traffic using the same port and LB nodes - then the F5 will allow the websockets traffic as passthrough. Also keep in mind that if this is an HTTPS virtual sever that you will need to ensure a client and server side HTTPS profile (SSL offload) are applied to the Virtual Server.
While there are a variety of ways that you can fiddle with load balancers to minimize the downtime caused by a software upgrade, none of them solve the problem, which is that your application-layer protocol seems to not tolerate some small network outages.
Even if you have a perfect load balancer and your software deploys cause zero downtime, the customer's computer may be on flaky wifi which causes a network dropout for half a second - or going over ethernet and someone reconfigures some routing on their LAN, etc.
I'd suggest having your server maintain a queue of messages for clients (up to some size/time limit) so that when a client drops a connection - whether it be due to load balancers/upgrades - or any other reason, it can continue without disruption.

Google Cloud Platform - load balancer websocket keep disconnecting after few seconds

We are using 2 servers and have setup load balancer to redirect the trafic. Both servers are Compute engines.
We are also using websocket (socket.io) to keep the connection between users (online and offline status). When connection is established between users, it gets disconnected after few seconds. We concluded that it is load balancer configuration issue as if we use single server (without load balancer), connection remains alive until user goes offline.
We need help here if we need to do anything extra in load balancer configurations to work it smoothly with websocket.
Using ip addresses, not domain name (if that makes any difference)

HTTP GET requests work but POST requests do not

Our Spring application is running on several different servers. For one of those servers POST requests do not seem to be working. All site functionality that uses GET requests works completely fine; however, as soon as I hit something that uses a POST request (ex. form submit) the site just hangs permanently. The server won't give any response. We can see the requests in Tomcat Manager but they don't time out.
Has anyone ever seen this?
We have found the problem. Our DBA accidentally deleted the MySQL database files on that particular server (/sigh). In our Spring application we use GET requests for record retrieval and the records we were trying to retrieve must have been cached by MySQL. This made it seem as if GET requests were working. When trying to add new data to the database, which we use POST requests to do, Tomcat would wait for a response, which never came, from MySQL.
In my experience if you're getting a timeout error it's almost always due to not having correct ports open for your application. For example, go into your virtual machine's rules and insure port 8080, 8443 or 80, 443 are open for http and https traffic.
In google cloud platform: its under VPC networking -> firewall rules. Azure and AWS are similar.

Loadbalancing web sockets

I have a server which supports web sockets. Browsers connect to my site and each one opens a web socket to www.mydomain.example. That way, my social network app can push messages to the clients.
Traditionally, using just HTTP requests, I would scale up by adding a second server and a load balancer in front of the two web servers.
With web sockets, the connection has to be directly with the web server, not the load balancers, because if a machine has a physical limit of say 64k open ports, and the clients were connecting to the load balancer, then I couldn't support more than 64k concurrent users.
So how do I:
get the client to connect directly to the web server (rather than the load balancer) when the page loads? Do I simply load the JavaScript from a node, and the load balancers (or whatever) randomly modifies the URL for the script, every time the page is initially requested?
handle a ripple start? The browser will notice that the connection is closed as the web server shuts down. I can write JavaScript code to attempt to reopen the connection, but the node will be gone for a while. So I guess I would have to go back to the load balancer to query the address of the next node to use?
I did wonder about the load balancers sending a redirect on the initial request, so that the browser initially requests www.mydomain.example and gets redirected to www34.mydomain.example. That works quite well, until the node goes down - and sites like Facebook don't do that. How do they do it?
Put a L3 load-balancer that distributes IP packets based on source-IP-port hash to your WebSocket server farm. Since the L3 balancer maintains no state (using hashed source-IP-port) it will scale to wire speed on low-end hardware (say 10GbE). Since the distribution is deterministic (using hashed source-IP-port), it will work with TCP (and hence WebSocket).
Also note that a 64k hard limit only applies to outgoing TCP/IP for a given (source) IP address. It does not apply to incoming TCP/IP. We have tested Autobahn (a high-performance WebSocket server) with 200k active connections on a 2 core, 4GB RAM VM.
Also note that you can do L7 load-balancing on the HTTP path announced during the initial WebSocket handshake. In that case the load balancer has to maintain state (which source IP-port pair is going to which backend node). It will probably scale to millions of connections nevertheless on decent setup.
Disclaimer: I am original author of Autobahn and work for Tavendo.
Note that if your websocket server logic runs on nodejs with socket.io, you can tell socket.io to use a shared redis key/value store for synchronization.
This way you don't even have to care about the load balancer, events will propagate among the server instances.
var io = require('socket.io')(3000);
var redis = require('socket.io-redis');
io.adapter(redis({ host: 'localhost', port: 6379 }));
See: Socket IO - Using multiple nodes
But at some point I guess redis can become the bottleneck...
You can also achieve layer 7 load balancing with inspection and "routing functionality"
See "How to inspect and load-balance WebSockets traffic using Stingray Traffic Manager, and when necessary, how to manage WebSockets and HTTP traffic that is received on the same IP address and port." https://splash.riverbed.com/docs/DOC-1451

Resources