Maximum concurrent connections that a self-hosted SignalR server can handle - performance

I have been doing some load testing on a SignalR server. According to my test case, a self-hosted SignalR server can handle only 20,000 concurrent connections at a time.
When SignalR has 20,000 open connections, the process consumes about 1.5 GB of RAM (which I think is too much). And when the connections exceed 22,000, new clients get a connection timeout error. The server never runs out of memory; it just stops responding to new requests.
I'm aware of server farming, and that I can use it in SignalR via a backplane, but I'm concerned about vertical scaling here. I have achieved 25,000 connections using long polling (async ASP.NET handlers). I would expect SignalR to achieve more concurrent connections, since it uses WebSockets.
Is there something I can do to get to about 50,000 concurrent connections per SignalR node? The standard performance-tuning guidance is of no help because I'm using OWIN self-hosting. What can I do so that my server application uses less memory per connection?

Related

How does AWS Application Load balancer select a target within a target group? How to load balance the websocket traffic?

I have an AWS Application load balancer to distribute the http(s) traffic.
Problem 1:
Suppose I have a target group with 2 EC2 instances: micro and xlarge. Obviously they can handle different traffic levels. Does the load balancer distribute traffic proportionally to instance size, or does it just use round robin? If only round robin is used, with no other factors taken into account, then it's not really balancing load, because at some point the micro instance will be suffering under the traffic while the xlarge one sits mostly idle.
Problem 2:
Suppose I have a target group with 2 EC2 instances, both the same size. But my service does not use the classic HTTP request/response flow; it uses WebSockets, i.e. a client makes an HTTP request just once, to establish a socket, and then keeps the socket open for a long time, sending and receiving messages (e.g. a chat service). Let's suppose my load balancer uses round robin and both EC2 instances have 1,000 clients connected each. Now suppose one of the EC2 instances goes down and its 1,000 connected clients drop their socket connections. The instance comes back up quickly and is ready to accept WebSocket calls again. The 1,000 clients who dropped try to reconnect. Now, if the load balancer uses pure round robin, I'll end up with 1,500 clients connected to instance #1 and 500 clients connected to instance #2, thus not really balancing the load correctly.
Basically, I'm trying to find out if some more advanced logic is being used to select a target in a group, or is it just a naive round robin selection. If it's round robin only, then how can I really balance the websocket connections load?
Websockets start out as http or https connections, so a load balancer can dispatch them to a server. Once the server accepts the http connection, both the server and the client "upgrade" the connection to use the websocket protocol. They then leave the connection open to use for websocket traffic. As far as the load balancer can tell, the connection is simply a long-lasting http connection.
Taking a server down when it has websocket connections to clients requires your application to retry lost connections. Reconnecting on connection failure is one of the trickiest parts of websocket client programming. Your application cannot be robust without reconnect logic.
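A minimal sketch of the timing side of such reconnect logic (the parameter values are illustrative, not from any particular client library): exponential backoff, capped, with random jitter so that thousands of clients dropped at the same moment do not all reconnect in lock-step.

```python
import random


def backoff_delays(base=1.0, cap=30.0, attempts=6, jitter=random.random):
    """Yield reconnect delays: exponentially growing, capped at `cap`,
    scaled by jitter into [0.5, 1.0] of the nominal delay so a mass
    disconnect does not produce a thundering herd of reconnects."""
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        yield delay * (0.5 + 0.5 * jitter())
```

A client would sleep for each yielded delay between connection attempts and give up (or alert the user) when the generator is exhausted.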
AWS's load balancer has no built-in knowledge of the capabilities of the servers behind it. You have observed that it sends requests equally to big and small servers. That can overwhelm the small ones.
I have managed this by building a /healthcheck endpoint in my servers. It's a straightforward https://example.com/healthcheck web page. You can put a little bit of content on the page announcing how many websocket connections are currently open, or anything else. Don't password-protect it or require a session to hit it.
My /healthcheck endpoints, whenever hit, measure the server load. I simply use the number of current websocket connections, but you can use any metric you want. I compare the current load to a load threshold configured for each server. For example, on a micro instance I can handle 20 open websockets, and on a production instance I can handle 400.
If the server load is too high, my endpoint gives back a 503 http error status along with its content. 503 typically means "I am overloaded, please try again later." It can also mean "I will shut down when all my connections are closed. Please don't use me for any more connections."
Then I configure the load balancer to perform those health checks every couple of minutes on all the servers in the server pool (AWS calls the pool a "target group"). The health check operation detects "unhealthy" servers and temporarily takes them out of its rotation. (The health check also detects crashed servers, which is good.)
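The decision such an endpoint makes can be sketched as follows (the function name and thresholds are illustrative, not a real API):

```python
def healthcheck(open_websockets, max_websockets):
    """Return (status, body) for a load balancer health check:
    200 while the server can accept more websocket connections,
    503 once the configured per-instance threshold is reached."""
    if open_websockets >= max_websockets:
        return 503, f"overloaded: {open_websockets}/{max_websockets} sockets"
    return 200, f"ok: {open_websockets}/{max_websockets} sockets"
```

With a per-instance threshold (say 20 on a micro, 400 on a production instance), each server takes itself out of rotation exactly when it is full, which is how differently sized servers can coexist behind a naive balancer.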
You need this load balancer health check for a large-scale production setup.
All that being said, you will get best results if all your server instances in your pool have roughly the same capacity as each other.

How to scale websocket connection load as one adds/removes servers?

To explain the problem:
With HTTP:
Assume there are 100 requests/second arriving.
If there are 4 servers, the load balancer (LB) can distribute the load across them evenly: 25 requests/second per server.
If I add a server (5 servers total), the LB spreads the load more thinly: now 20 requests/second per server.
If I remove a server (3 servers total), the LB increases the load per server to 33.3 requests/second.
So the load per server is automatically rebalanced as I add/remove servers, since each connection is so short-lived.
With Websockets
Assume there are 100 clients, 2 servers (behind a LB)
The LB initially balances each incoming connection evenly, so each server has 50 connections.
However, if I add a server (3 servers total), the third server gets 0 connections, since the existing 100 clients are already connected to the first 2 servers.
If I remove a server (1 server left), the 50 clients it was serving reconnect, and all 100 connections are now served by the single remaining server.
Problem
Since websocket connections are persistent, adding/removing a server does not increase/decrease load per server until the clients decide to reconnect.
How does one then efficiently scale websockets and manage load per server?
This is similar to problems the gaming industry has been trying to solve for a long time. That is an area where you have many concurrent connections and you have to have fast communication between many clients.
Options:
Slave/master architecture, where the master retains connections to the slaves to monitor health, load, etc. When someone joins the session/application, they ping the master and the master responds with the next server to use. This is a kind of client-side load balancing, except you are using server-side heuristics.
This prevents your clients from overwhelming a single server. You'll have to have the client poll the master before establishing the WS connection, but that is simple.
This way you can also scale out to multiple masters if you need to and put them behind load balancers.
If you need to send a message between servers there are many options for that (handle it yourself, queues, etc).
This is how my drawing app Pixmap for Android, which I built last year, works. Works very well too.
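The master's server-selection step can be sketched as a least-loaded pick, assuming the master tracks a live connection count per slave (the names here are illustrative):

```python
def next_server(load_by_server):
    """Pick the least-loaded slave for the next client. The master
    answers the client's ping with this server's address; the client
    then opens its websocket directly to that server."""
    if not load_by_server:
        raise ValueError("no servers registered with the master")
    return min(load_by_server, key=load_by_server.get)
```

Because the master sees current load at the moment of each join, a newly added (empty) server immediately starts absorbing new connections, which round robin alone would not guarantee.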
Client side load balancing where client connects to a random host name. This is how Watch.ly works. Each host can then be its own load balancer and cluster of servers for safety. Risky but simple.
Traditional load balancing - ie round robin. Hard to beat haproxy. This should be your first approach and will scale to many thousands of concurrent users. Doesn't solve the problem of redistributing load though. One way to solve that with this setup is to push an event to your clients telling them to reconnect (and have each attempt to reconnect with a random timeout so you don't kill your servers).
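The push-an-event-to-reconnect idea can be sketched with a small simulation: drop every connection, then let a round-robin balancer re-admit the clients, which restores an even split (server names are illustrative; in practice each client would also wait a random timeout before reconnecting, as noted above):

```python
import itertools


def rebalance(counts):
    """Simulate telling every client to reconnect: all connections drop,
    then a round-robin balancer re-admits them one at a time."""
    total = sum(counts.values())
    servers = itertools.cycle(sorted(counts))
    fresh = {server: 0 for server in counts}
    for _ in range(total):
        fresh[next(servers)] += 1
    return fresh
```

For the skewed 1,500/500 split from the earlier question, this ends at 1,000/1,000; the cost is a reconnect storm, which is why the random per-client delay matters.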

Using HTTP2 how can I limit the number of concurrent requests?

We have a client <-> server system working over HTTP/1.1. The client makes hundreds (sometimes thousands) of concurrent requests to the server.
Because of the browsers' default limits on HTTP/1.1 connections, the client actually makes these requests in batches of 6-8 concurrent requests; we thought we could get a performance improvement if we could increase the number of concurrent requests.
We moved the system to HTTP/2, and we see the client issuing all the requests simultaneously, as we wanted.
The problem now is the opposite: the server cannot handle that many concurrent requests.
How can we limit the number of concurrent requests the client makes simultaneously to something more manageable for the server, say 50-100 concurrent requests?
We were assuming that HTTP/2 would allow us to regulate the number of concurrent streams:
https://developers.google.com/web/fundamentals/performance/http2/
With HTTP/2 the client remains in full control of how server push is used. The client can limit the number of concurrently pushed streams; adjust the initial flow control window to control how much data is pushed when the stream is first opened; or disable server push entirely. These preferences are communicated via the SETTINGS frames at the beginning of the HTTP/2 connection and may be updated at any time.
Also here:
https://stackoverflow.com/a/36847527/316700
Or maybe, if possible, we could limit this on the server side (which I think is more maintainable).
But these solutions seem to be about server push, whereas what we have is the client pulling.
In case help in any way our architecture looks like this:
Client ==[http 2]==> ALB(AWS Beanstalk) ==[http 1.1]==> nginx ==[http 1.0]==> Puma
There is a special setting in the HTTP/2 SETTINGS frame:
You can set SETTINGS_MAX_CONCURRENT_STREAMS to 100 on the server side.
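You can also cap concurrency on the client side by gating requests with a semaphore, which gives the same effect regardless of what the intermediaries negotiate. A sketch with Python's asyncio, where `asyncio.sleep(0)` stands in for the real request:

```python
import asyncio


async def fetch_all(urls, limit=50):
    """Issue many requests, but never more than `limit` in flight at
    once (mirrors what a server-advertised
    SETTINGS_MAX_CONCURRENT_STREAMS of `limit` would enforce)."""
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0  # track the observed maximum concurrency

    async def fetch(url):
        nonlocal in_flight, peak
        async with sem:            # blocks while `limit` requests are active
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # stand-in for the real HTTP request
            in_flight -= 1
            return url

    results = await asyncio.gather(*(fetch(u) for u in urls))
    return results, peak
```

All URLs are still fetched; the semaphore simply queues the excess, so the server only ever sees `limit` simultaneous requests from this client.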

Websockets and scalability

I am a beginner with websockets.
I have a need in my application where the server must notify clients when something changes, and I am planning to use websockets.
Single server instance and single client ==> How many websockets will be created and how many connections to websockets?
Single server instance and 10 clients ==> How many websockets will be created and how many connections to websockets?
Single server instance and 1000 clients ==> How many websockets will be created and how many connections to websockets?
How do you scale with websockets when your application has thousands of users?
Thanks much for your feedback.
1) Single server instance and single client ==> How many websockets will be created and how many connections to websockets?
If your client creates one webSocket connection, then that's what there will be: one webSocket connection on the client and one on the server. It's the client that creates webSocket connections to the server, so it is the client that determines how many there will be. If it creates 3, then there will be 3. If it creates 1, then there will be 1. Usually, the client would just create 1.
2) Single server instance and 10 clients ==> How many websockets will be created and how many connections to websockets?
As described above, it depends upon what the client does. If each client creates 1 webSocket connection and there are 10 clients connected to the server, then the server will see a total of 10 webSocket connections.
3) Single server instance and 1000 clients ==> How many websockets will be created and how many connections to websockets?
Same as point #2.
How do you scale with webSockets when your application has thousands of users?
A single server, configured appropriately can handle hundreds of thousands of simultaneous webSocket connections that are mostly idle since an idle webSocket uses pretty much no server CPU. For even larger scale deployments, one can cluster the server (run multiple server processes) and use sticky load balancing to spread the load.
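Sticky load balancing can be sketched as a deterministic hash of a client or session id over the server list, so a given client always lands on the same process (assuming a stable server list; the names are illustrative):

```python
import hashlib


def sticky_server(client_id, servers):
    """Route a client to a stable server by hashing its id, so every
    reconnect (and every request in a long-polling fallback) lands on
    the process that holds that client's in-memory state."""
    digest = hashlib.sha256(client_id.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

Note that a plain hash-mod remaps most clients whenever the server list changes; consistent hashing is the usual refinement to reduce that churn.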
There are many other articles like these on Google worth reading if you're pursuing large scale webSocket or socket.io deployments:
The Road to 2 Million Websocket Connections in Phoenix
600k concurrent websocket connections on AWS using Node.js
10 million concurrent webSockets
Ultimately, the achievable scale per properly configured server will likely have more to do with how much activity there is per connection and how much computation is needed to deliver it.

SignalR: Concurrent connections on Windows 7

I have a self-hosted WebAPI which uses SignalR. The process is running on a client OS (Windows 7). Let's assume most clients fall back to long-polling. How many clients can I roughly have? I'm asking because I read that Microsoft keeps concurrent connections artificially low on client OSes (though the article is about IIS)...
In this regard does it matter whether or not clients are using long-polling or WebSockets?
I don't have a definitive answer regarding request limits for self-hosting on Win 7, but I can say that there can be a difference between long-polling and WebSockets, since with long-polling you can use up several requests at a time, while a WebSocket connection always uses exactly one.
I suspect that the IIS connection limits are not enforced when self-hosting with OWIN, so you'd deal with the ASP.NET limits (defaulting at 5000 concurrent requests per CPU) which you can read about here and here.
Since long-polling can use a variable number of concurrent requests, it's not possible to say how many clients you can have exactly; it depends on what the clients do. So you'll have to estimate or measure based on typical usage scenarios.