Websockets and scalability

I am a beginner with websockets.
In my application, the server needs to notify clients when something changes, and I am planning to use websockets for that.
Single server instance and a single client ==> how many websockets will be created, and how many connections will there be?
Single server instance and 10 clients ==> how many websockets will be created, and how many connections will there be?
Single server instance and 1000 clients ==> how many websockets will be created, and how many connections will there be?
How do you scale websockets when your application has a user base in the thousands?
Thanks much for your feedback.

1) Single server instance and a single client ==> how many websockets will be created, and how many connections will there be?
If your client creates one webSocket connection, then that's what there will be: one webSocket connection on the client and one on the server. It's the client that creates webSocket connections to the server, so it is the client that determines how many there will be. If it creates 3, there will be 3. If it creates 1, there will be 1. Usually, the client would just create 1.
2) Single server instance and 10 clients ==> how many websockets will be created, and how many connections will there be?
As described above, it depends upon what the client does. If each client creates 1 webSocket connection and there are 10 clients connected to the server, then the server will see a total of 10 webSocket connections.
3) Single server instance and 1000 clients ==> how many websockets will be created, and how many connections will there be?
Same as point #2.
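To make the counting concrete, here is a minimal sketch (Node.js with the ws package is an assumption; the question names no stack). Every webSocket a client opens shows up as exactly one connection on the server:

```ts
import { WebSocketServer } from "ws";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (ws) => {
  // wss.clients.size is the number of currently connected clients:
  // 1 client => 1, 10 clients => 10, 1000 clients => 1000
  console.log(`open connections: ${wss.clients.size}`);
  ws.on("close", () => console.log(`open connections: ${wss.clients.size}`));
});
```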
How do you scale websockets when your application has a user base in the thousands?
A single server, configured appropriately, can handle hundreds of thousands of simultaneous webSocket connections that are mostly idle, since an idle webSocket uses pretty much no server CPU. For even larger deployments, one can cluster the server (run multiple server processes) and use sticky load balancing to spread the load, as sketched below.
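A minimal sketch of that clustering idea, assuming Node.js, its built-in cluster module, and the ws package (the answer doesn't prescribe a particular stack):

```ts
import cluster from "node:cluster";
import { cpus } from "node:os";
import { WebSocketServer } from "ws";

if (cluster.isPrimary) {
  // One worker process per CPU core; the primary distributes incoming
  // TCP connections among them. Across multiple machines, a sticky
  // load balancer plays the equivalent role.
  for (let i = 0; i < cpus().length; i++) cluster.fork();
} else {
  const wss = new WebSocketServer({ port: 8080 });
  wss.on("connection", (ws) => {
    ws.send(`handled by worker ${process.pid}`);
  });
}
```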
There are many articles like these worth reading (and more to be found on Google) if you're pursuing large-scale webSocket or socket.io deployments:
The Road to 2 Million Websocket Connections in Phoenix
600k concurrent websocket connections on AWS using Node.js
10 million concurrent webSockets
Ultimately, the achievable scale per properly configured server will likely have more to do with how much activity there is per connection and how much computation is needed to serve it.

Related

How does the AWS Application Load Balancer select a target within a target group? How do you load balance websocket traffic?

I have an AWS Application Load Balancer to distribute http(s) traffic.
Problem 1:
Suppose I have a target group with 2 EC2 instances: micro and xlarge. Obviously they can handle different traffic levels. Does the load balancer distribute traffic proportionally to instance size, or just round robin? If only round robin is used and no other factors are taken into account, then it's not really balancing load, because at some point the micro instance will be suffering under the traffic while the xlarge one starves.
Problem 2:
Suppose I have a target group with 2 EC2 instances, both the same size. But my service does not use a classic http request/response flow; it uses websockets, i.e. a client makes an HTTP request just once, to establish a socket, and then keeps the socket open for a long time, sending and receiving messages (e.g. a chat service). Let's suppose my load balancer uses round robin and both EC2 instances have 1000 clients connected each. Now suppose one of the EC2 instances goes down and its 1000 connected clients drop their socket connections. The instance comes back up quickly and is ready to accept websocket connections again, and the 1000 dropped clients try to reconnect. Now, if the load balancer uses pure round robin, I'll end up with 1500 clients connected to instance #1 and 500 clients connected to instance #2, thus not really balancing the load correctly.
Basically, I'm trying to find out whether some more advanced logic is used to select a target in a group, or whether it's just naive round robin selection. If it's round robin only, how can I really balance the websocket connection load?
Websockets start out as http or https connections, so a load balancer can dispatch them to a server. Once the server accepts the http connection, both the server and the client "upgrade" the connection to use the websocket protocol. They then leave the connection open to use for websocket traffic. As far as the load balancer can tell, the connection is simply a long-lasting http connection.
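For reference, that upgrade is visible on the wire as ordinary HTTP; this is the handshake example from RFC 6455:

```
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```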
Taking a server down when it has websocket connections to clients requires your application to retry lost connections. Reconnecting on connection failure is one of the trickiest parts of websocket client programming. Your application cannot be robust without reconnect logic.
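A minimal sketch of such reconnect logic, assuming the browser-style WebSocket API; the backoff cap and the jitter are illustrative choices, not something the answer prescribes:

```ts
// Reconnect with exponential backoff plus jitter, so a rebooted
// server is not hit by every client at the same instant.
function connect(url: string, attempt = 0): void {
  const ws = new WebSocket(url);
  ws.onopen = () => { attempt = 0; };  // reset after a good connect
  ws.onclose = () => {
    const base = Math.min(30_000, 1_000 * 2 ** attempt); // cap at 30s
    const delay = base / 2 + Math.random() * (base / 2); // add jitter
    setTimeout(() => connect(url, attempt + 1), delay);
  };
}

connect("wss://example.com/socket");
```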
AWS's load balancer has no built-in knowledge of the capabilities of the servers behind it. You have observed that it sends requests equally to big and small servers. That can overwhelm the small ones.
I have managed this by building a /healthcheck endpoint in my servers. It's a straightforward https://example.com/healthcheck web page. You can put a little bit of content on the page announcing how many websocket connections are currently open, or anything else. Don't password-protect it or require a session to hit it.
My /healthcheck endpoints, whenever hit, measure the server load. I simply use the number of current websocket connections, but you can use any metric you want. I compare the current load to a load threshold configured for each server. For example, on a micro instance I can handle 20 open websockets, and on a production instance I can handle 400.
If the server load is too high, my endpoint gives back a 503 http error status along with its content. 503 typically means "I am overloaded, please try again later." It can also mean "I will shut down when all my connections are closed. Please don't use me for any more connections."
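Here is a sketch of that kind of endpoint, assuming plain Node.js http; the threshold and the connection counter are illustrative and would be fed by your websocket layer:

```ts
import { createServer } from "node:http";

const MAX_SOCKETS = 400; // per-instance threshold, set per server size
let openSockets = 0;     // maintained by the websocket layer (not shown)

createServer((req, res) => {
  if (req.url === "/healthcheck") {
    const overloaded = openSockets >= MAX_SOCKETS;
    // 503 tells the load balancer: don't send me new connections.
    res.writeHead(overloaded ? 503 : 200, { "Content-Type": "text/plain" });
    res.end(`open websockets: ${openSockets}\n`);
    return;
  }
  res.writeHead(404);
  res.end();
}).listen(8080);
```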
Then I configure the load balancer to perform those health checks every couple of minutes on all the servers in the server pool (AWS calls the pool a "target group"). The health check operation detects "unhealthy" servers and temporarily takes them out of its rotation. (The health check also detects crashed servers, which is good.)
You need this load balancer health check for a large-scale production setup.
All that being said, you will get the best results if all the server instances in your pool have roughly the same capacity.

How to scale websocket connection load as one adds/removes servers?

To explain the problem:
With HTTP:
Assume there are 100 requests/second arriving.
If there are 4 servers, the load balancer (LB) can distribute the load across them evenly, 25/second per server.
If I add a server (5 servers total), the LB evens things out again at 20/second per server.
If I remove a server (3 servers total), the load per server rises to 33.3/second.
So the load per server rebalances automatically as I add/remove servers, since each connection is so short-lived.
With Websockets
Assume there are 100 clients, 2 servers (behind a LB)
The LB initially balances each incoming connection evenly, so each server has 50 connections.
However, if I add a server (3 servers total), the 3rd server gets 0 connections, since the existing 100 clients are already connected to the first 2 servers.
If I remove a server (1 server total), the 50 clients it was serving will reconnect, and all 100 connections are now served by 1 server.
Problem
Since websocket connections are persistent, adding/removing a server does not increase/decrease load per server until the clients decide to reconnect.
How does one then efficiently scale websockets and manage load per server?
This is similar to problems the gaming industry has been trying to solve for a long time. That is an area where you have many concurrent connections and you have to have fast communication between many clients.
Options:
Slave/master architecture, where the master retains connections to the slaves to monitor health, load, etc. When someone joins the session/application, they ping the master and the master responds with the next server. This is kind of like client-side load balancing, except you are using server-side heuristics.
This prevents your clients from blowing up a single server. You'll have to have the client poll the master before establishing the WS connection but that is simple.
This way you can also scale out to multi master if you need to and put them behind load balancers.
If you need to send a message between servers there are many options for that (handle it yourself, queues, etc).
This is how my drawing app Pixmap for Android, which I built last year, works. Works very well too.
Client side load balancing where client connects to a random host name. This is how Watch.ly works. Each host can then be its own load balancer and cluster of servers for safety. Risky but simple.
Traditional load balancing, i.e. round robin. Hard to beat haproxy. This should be your first approach and will scale to many thousands of concurrent users. It doesn't solve the problem of redistributing load, though. One way to solve that with this setup is to push an event to your clients telling them to reconnect, and have each attempt to reconnect after a random timeout so you don't kill your servers (see the sketch below).
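A sketch of that redistribution trick, assuming the Node.js ws package; the message shape and the 10-second jitter window are illustrative:

```ts
// Server side: ask every connected client to migrate elsewhere.
import { WebSocketServer } from "ws";

const wss = new WebSocketServer({ port: 8080 });

function requestRebalance(): void {
  for (const client of wss.clients) {
    client.send(JSON.stringify({ type: "reconnect" }));
  }
}

// Client side: obey the hint after a random delay so the servers are
// not hammered by simultaneous reconnects. connect() stands for
// whatever (re)connect routine the client already has:
//
//   ws.onmessage = (event) => {
//     const msg = JSON.parse(String(event.data));
//     if (msg.type === "reconnect") {
//       ws.close();
//       setTimeout(() => connect(url), Math.random() * 10_000);
//     }
//   };
```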

Maximum concurrent requests that a Self Hosted SignalR server can handle

I have been doing some load testing on a SignalR server. According to my test case, a self-hosted SignalR server can handle only 20,000 concurrent connections at a time.
When SignalR has 20,000 open connections, the process consumes about 1.5 GB of RAM (which I think is too much). And when the connections exceed 22,000, new clients get a connection timeout error. The server never runs out of memory; it just stops responding to new requests.
I'm aware of server farming, and that SignalR supports it through a backplane, but I'm concerned with vertical scaling here. I have achieved 25,000 connections using long polling (async asp.net handlers). I would expect SignalR to achieve more concurrent connections, as it uses WebSockets.
Is there something I can do to get about 50,000 concurrent connections per SignalR node? The performance tuning guidance I've found is of no help because I'm using OWIN self-hosting. What can I do so that my server application takes less memory per connection?

Does websocket only broadcast data to all connected clients instead of sending to a particular client?

I am new to websockets. While reading about websockets, I have not been able to find answers to some of my doubts. I would be glad if someone clarified them.
Does websocket only broadcast data to all connected clients instead of sending to a particular client? Every example I have tried (mainly chat apps) sends data to all the clients. Is it possible to change this?
How does it work for clients behind NAT (behind a router)?
Since the client-server connection always remains open, how will a large number of connections affect server performance?
Since I want all my clients to get real-time updates, all of them need to stay connected to the server, so how should I handle the client connection limit?
NOTE: My client is not a web browser but a desktop application.
No, websocket is not only for broadcasting. You send messages to specific clients; when you broadcast, you just send the same message to all connected clients. You can also send different messages to different clients, for example in a game session.
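A sketch of the difference, assuming Node.js and the ws package; identifying clients by a query parameter is purely illustrative:

```ts
import { WebSocketServer, WebSocket } from "ws";

const wss = new WebSocketServer({ port: 8080 });
const byUser = new Map<string, WebSocket>(); // who is connected as whom

wss.on("connection", (ws, req) => {
  // Identify the client somehow, e.g. via ?user=alice (illustrative).
  const userId =
    new URL(req.url ?? "/", "http://local").searchParams.get("user") ?? "anon";
  byUser.set(userId, ws);
  ws.on("close", () => byUser.delete(userId));
});

// Send to one particular client:
function sendTo(userId: string, data: string): void {
  byUser.get(userId)?.send(data);
}

// Broadcast the same message to everyone:
function broadcast(data: string): void {
  for (const client of wss.clients) client.send(data);
}
```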
The clients connect to the server and initialise the connections, so NAT is not a problem.
It's good to use a scalable server, e.g. an event-driven server (such as Node.js) that doesn't use a separate thread for each connection, or an Erlang server with lightweight processes (a good choice for a game server).
This should not be a problem if you use a good server OS (e.g. Linux), but it may be a limitation if your server runs a desktop version of Windows (which may be limited to, say, 200 connections).

How can I scale socket.io?

Let's say a server gets 10,000 concurrent connections (via socket.io). That's a lot, and if it can't handle any more, I need to spin up another server.
How can I sync the two servers together with their socket.io?
I wouldn't use Cluster to scale Socket.IO. Socket.IO 0.6 is designed as a single process server, and it uses long living connections or polling connections to achieve a real time connection between the server and client.
If you put Cluster in front of your socket.io server, you will basically distribute the polling transports between different processes that are not aware of the client, which will result in broken connections. Broadcasting to all your clients will also be a pain, as they are distributed across different processes with no IPC between them.
So I would only advise using Cluster if you use only WebSocket & FlashSocket connections and don't need the broadcast functionality.
So what should you do?
You could wait until socket.io 0.7 is released, which is designed from the ground up to be used on multiple processes.
Or you can use pub/sub to send messages between different servers.
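In present-day Socket.IO that pub/sub wiring is packaged as the Redis adapter; here is a sketch assuming current package names (not the 0.x versions discussed above):

```ts
import { createServer } from "node:http";
import { Server } from "socket.io";
import { createClient } from "redis";
import { createAdapter } from "@socket.io/redis-adapter";

const httpServer = createServer();
const pubClient = createClient({ url: "redis://localhost:6379" });
const subClient = pubClient.duplicate();
await Promise.all([pubClient.connect(), subClient.connect()]);

// Every Server instance that shares this Redis pair sees the others'
// broadcasts, so io.emit() reaches clients on all processes.
const io = new Server(httpServer, {
  adapter: createAdapter(pubClient, subClient),
});

io.on("connection", (socket) => {
  socket.on("chat", (msg) => io.emit("chat", msg));
});

httpServer.listen(3000);
```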
You can also try the cluster module to distribute the load across multiple cores (in case you have a multi-core CPU). If that is not enough, you can try a reverse proxy to distribute requests across multiple servers, plus Redis as a central session data store (if that's possible for your scenario).
