IIS internal load balancer does not distribute requests evenly - performance

We have Windows Server 2019.
On the IIS server, we have 10 worker processes.
The IIS internal load balancer does not distribute requests evenly across them.
When IIS receives 20 requests, it puts 10 requests on worker 1 and 10 requests on worker 2, and the other 8 workers stay idle.
I expected each worker to get 2 requests.
Please help me correct the configuration, if such a setting exists.
Thanks

Related

How does AWS Application Load Balancer select a target within a target group? How do you load balance websocket traffic?

I have an AWS Application Load Balancer to distribute HTTP(S) traffic.
Problem 1:
Suppose I have a target group with 2 EC2 instances: micro and xlarge. Obviously they can handle different traffic levels. Does the load balancer distribute traffic proportionally to instance size, or just round robin? If only round robin is used and no other factors are taken into account, then it's not really balancing load, because at some point the micro instance will be suffering from the traffic while the xlarge will starve.
Problem 2:
Suppose I have a target group with 2 EC2 instances, both the same size. But my service is not using a classic HTTP request/response flow. It is using websockets, i.e. a client makes an HTTP request just once, to establish a socket, and then keeps the socket open for a long time, sending and receiving messages (e.g. a chat service). Let's suppose my load balancer is using round robin and both EC2 instances have 1000 clients connected each. Now suppose one of the EC2 instances goes down and its 1000 connected clients drop their socket connections. The instance comes back up quickly and is ready to accept websocket calls again. The 1000 clients who dropped try to reconnect. Now, if the load balancer uses pure round robin, I'll end up with 1500 clients connected to instance #1 and 500 clients connected to instance #2, thus not really balancing the load correctly.
Basically, I'm trying to find out if some more advanced logic is being used to select a target in a group, or is it just a naive round robin selection. If it's round robin only, then how can I really balance the websocket connections load?
Websockets start out as http or https connections, so a load balancer can dispatch them to a server. Once the server accepts the http connection, both the server and the client "upgrade" the connection to use the websocket protocol. They then leave the connection open to use for websocket traffic. As far as the load balancer can tell, the connection is simply a long-lasting http connection.
Taking a server down when it has websocket connections to clients requires your application to retry lost connections. Reconnecting on connection failure is one of the trickiest parts of websocket client programming. Your application cannot be robust without reconnect logic.
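As a rough illustration of what that reconnect logic looks like, here is a minimal Python sketch using the websockets package. The URI and the message handling are placeholders, not part of the original setup; the point is the retry loop with exponential backoff and jitter.

```python
import asyncio
import random

import websockets  # pip install websockets


async def run_client(uri):
    """Keep one websocket open, reconnecting with exponential backoff plus jitter."""
    delay = 1  # seconds; doubles on repeated failures
    while True:
        try:
            async with websockets.connect(uri) as ws:
                delay = 1  # reset the backoff once we are connected again
                async for message in ws:
                    print("received:", message)  # placeholder for real handling
        except (OSError, websockets.ConnectionClosed):
            # The server (or the load balancer) dropped us. The random jitter
            # keeps thousands of clients from all reconnecting at the same instant.
            await asyncio.sleep(delay + random.uniform(0, delay))
            delay = min(delay * 2, 60)


# asyncio.run(run_client("wss://example.com/socket"))
```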
AWS's load balancer has no built-in knowledge of the capabilities of the servers behind it. You have observed that it sends requests equally to big and small servers. That can overwhelm the small ones.
I have managed this by building a /healthcheck endpoint in my servers. It's a straightforward https://example.com/healthcheck web page. You can put a little bit of content on the page announcing how many websocket connections are currently open, or anything else. Don't password protect it or require a session to hit it.
My /healthcheck endpoints, whenever hit, measure the server load. I simply use the number of current websocket connections, but you can use any metric you want. I compare the current load to a load threshold configured for each server. For example, on a micro instance I can handle 20 open websockets, and on a production instance I can handle 400.
If the server load is too high, my endpoint gives back a 503 http error status along with its content. 503 typically means "I am overloaded, please try again later." It can also mean "I will shut down when all my connections are closed. Please don't use me for any more connections."
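A minimal sketch of that kind of endpoint, here in Python/Flask rather than whatever stack your servers actually run. The connection counter and the threshold are placeholders for your own load metric and per-instance limits.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-ins: however your server actually tracks its own load and limit.
open_websockets = 0   # e.g. incremented/decremented by your socket handlers
MAX_WEBSOCKETS = 20   # 20 on a micro instance, 400 on a bigger production box


@app.route("/healthcheck")
def healthcheck():
    body = {"open_websockets": open_websockets, "limit": MAX_WEBSOCKETS}
    if open_websockets >= MAX_WEBSOCKETS:
        # 503 tells the load balancer "I am overloaded, don't send me new clients".
        return jsonify(body), 503
    return jsonify(body), 200
```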
Then I configure the load balancer to perform those health checks every couple of minutes on all the servers in the server pool (AWS calls the pool a "target group"). The health check operation detects "unhealthy" servers and temporarily takes them out of its rotation. (The health check also detects crashed servers, which is good.)
You need this load balancer health check for a large-scale production setup.
All that being said, you will get best results if all your server instances in your pool have roughly the same capacity as each other.

How to scale websocket connection load as one adds/removes servers?

To explain the problem:
With HTTP:
Assume there are 100 requests/second arriving.
If there are 4 servers, the load balancer (LB) can distribute the load across them evenly, 25/second per server.
If I add a server (5 servers total), the LB spreads the load further, to 20/second per server.
If I remove a server (3 servers total), the load per server rises to 33.3/second.
So the load per server is automatically rebalanced as I add/remove servers, since each connection is so short-lived.
With Websockets
Assume there are 100 clients, 2 servers (behind a LB)
The LB initially balances each incoming connection evenly, so each server has 50 connections.
However, if I add a server (3 servers total), the 3rd server gets 0 connections, since the existing 100 clients are already connected to the first 2 servers.
If I remove a server (1 server total), its 50 clients reconnect and all 100 connections are now served by the single remaining server.
Problem
Since websocket connections are persistent, adding/removing a server does not increase/decrease load per server until the clients decide to reconnect.
How does one then efficiently scale websockets and manage load per server?
This is similar to problems the gaming industry has been trying to solve for a long time. That is an area where you have many concurrent connections and you have to have fast communication between many clients.
Options:
Slave/master architecture where the master retains connections to the slaves to monitor health, load, etc. When someone joins the session/application, they ping the master and the master responds with the next server to use. This is kind of client-side load balancing, except you are using server-side heuristics.
This prevents your clients from blowing up a single server. You'll have to have the client poll the master before establishing the WS connection, but that is simple (see the sketch after this list).
This way you can also scale out to multiple masters if you need to and put them behind load balancers.
If you need to send messages between servers, there are many options for that (handle it yourself, queues, etc.).
This is how my drawing app Pixmap for Android, which I built last year, works. Works very well too.
Client side load balancing where client connects to a random host name. This is how Watch.ly works. Each host can then be its own load balancer and cluster of servers for safety. Risky but simple.
Traditional load balancing, i.e. round robin. Hard to beat HAProxy. This should be your first approach and will scale to many thousands of concurrent users. It doesn't solve the problem of redistributing load, though. One way to solve that with this setup is to push an event to your clients telling them to reconnect (and have each attempt to reconnect with a random timeout so you don't kill your servers).
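For the first option, the "ask the master which server to use" step is small: one HTTP endpoint on the master plus one extra request on the client before it opens the socket. A rough Python sketch follows; the /next-server endpoint and the in-memory worker table are hypothetical, and in practice the load numbers would come from the master's monitoring connections to its slaves.

```python
import json
import urllib.request

from flask import Flask, jsonify

# --- master side ----------------------------------------------------------
app = Flask(__name__)

# Hypothetical view the master keeps of its workers: host -> open connections.
# In practice each worker reports this over the master's monitoring connection.
workers = {"ws1.example.com": 120, "ws2.example.com": 80, "ws3.example.com": 95}


@app.route("/next-server")
def next_server():
    # Hand the new client the least-loaded worker the master currently knows about.
    host = min(workers, key=workers.get)
    return jsonify({"host": host})


# --- client side ----------------------------------------------------------
def pick_server(master_url="https://master.example.com"):
    # One extra HTTP round trip, then the client opens its websocket to this host.
    with urllib.request.urlopen(master_url + "/next-server") as resp:
        return json.loads(resp.read())["host"]
```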

Can application sessions share F5 load balancer connections?

I understand that two sessions cannot use a connection at the exact same time. But I had thought it was possible for multiple sessions to share the same connection or pipe, similar in principle to a threaded application, where execution is time-sliced.
The reason I bring this up is I'm somewhat perplexed by how the F5 load balancer manages connections, in relation to application sessions. I have a web server which connects to the F5, which load balances 2 application servers:
Client (i.e. laptop) --> Web Server --> F5 Load balancer for app servers --> (app server 1, app server 2)
I was looking at the number of connections on the F5 load balancer the other day and it showed 7 connections to app server 1 and 3 connections to app server 2; 10 total connections. But the actual application had hundreds of sessions. How could this be? If there are 1000 sessions, let's say, that averages out to 1 connection per 100 sessions. Something doesn't add up here, I'm thinking.
Can and does the F5 load balancer distinguish between inbound connections and outbound connections? If so, how can I see both the inbound and the outbound connections? I'm thinking that, perhaps, there are 10 inbound connections from the web server to the load balancer and 1000 outbound connections (because there are 1000 sessions) to the app servers.
I'm thinking it should be possible to queue or share multiple sessions per connection, but maybe that's not how it works, particularly with load balancers. Any help making sense of all of this would be most appreciated. Thank you.
If you were using the OneConnect feature, this is exactly what it's intended for. The BIG-IP manages an internal session state for these connections and can reuse and maintain server-side sessions for multiple external connections.
Useful for potentially bulky applications, but it can cause issues if you have internal applications that reuse unique keys for session state (Java is a good example).
SOL7208 - Overview of the OneConnect Profile

If the number of requests is huge, can the load balancer cause issues when sending responses back to the respective clients?

My architecture is a load balancer in front of two web application servers and a database. I am sending thousands of HTTP requests to the servers from a JMeter distributed testing environment.
A few requests never get a response back from the server.
I checked the database logs: 100% of the requests were answered.
I checked the web application servers' access logs: 100% of the requests were answered.
Could the load balancer be losing these pending responses on the way back to the respective clients?
Different requests get stuck every time.
Thanks in advance!
If you suspect the load balancer, look at 3 typical causes first:
The server takes longer to respond than the load balancer is willing to wait.
The client's timeout is shorter than the time the server takes to respond.
Port/thread/connection exhaustion on the load balancer, or other LB configuration problems.
In all three cases, I suggest looking at the load balancer logs. Since you didn't specify which LB you are using, I cannot say exactly what the log looks like, but typically an LB log lets you see:
How long it took for a request to be sent to a web server and for the response from the web server to return to the load balancer. You can then compare those numbers to the timeouts configured for the load balancer and the client (problems 1 and 2).
How long it took for a request from the client to be processed by the LB and how long the LB took to respond to the client. If that takes long, then something is not right with the load balancer (problem 3).
And of course, if you have any errors on the load balancer, they may explain exactly what's going on.
If you cannot review the load balancer logs, I suggest temporarily changing your JMeter test to target the servers behind the load balancer directly. You can even configure your script to distribute load evenly between all servers (for example, by using multiple thread groups). That would let you isolate the problem and get more information on what's going on.
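If reworking the JMeter plan is awkward, even a small throwaway script that hits each backend directly can give you the same signal. A rough Python sketch, with made-up addresses standing in for your two web application servers:

```python
import concurrent.futures
import urllib.error
import urllib.request

# Made-up addresses for the two web application servers, bypassing the LB entirely.
BACKENDS = ["http://10.0.0.11:8080/app", "http://10.0.0.12:8080/app"]
REQUESTS_PER_BACKEND = 500


def hit(url):
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status
    except (urllib.error.URLError, TimeoutError) as exc:
        return f"failed: {exc}"


with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    for backend in BACKENDS:
        outcomes = list(pool.map(hit, [backend] * REQUESTS_PER_BACKEND))
        failures = [o for o in outcomes if not isinstance(o, int)]
        print(f"{backend}: {len(failures)} failures out of {len(outcomes)}")

# If every request succeeds when the LB is bypassed but some go missing through it,
# the responses are being lost at the load balancer, not at the web servers.
```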

Maximum concurrent requests that a Self Hosted SignalR server can handle

I have been doing some load testing on a SignalR server. According to my test case, a self-hosted SignalR server can handle only 20,000 concurrent connections at a time.
When SignalR has 20,000 open connections, the process consumes about 1.5 GB of RAM (I think that is too much). And when the connections exceed 22,000, new clients get a connection timeout error. The server never runs out of memory; it just stops responding to new requests.
I'm aware of server farming, and that I can use it in SignalR via a backplane, but I'm concerned about vertical scaling here. I have achieved 25,000 connections using long polling (async ASP.NET handlers). I would expect SignalR to handle more concurrent connections, since it uses WebSockets.
Is there something I can do to get about 50,000 concurrent connections per SignalR node? That performance tuning guidance is of no help because I'm using OWIN self-hosting. What can I do so that my server application uses less memory per connection?
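For context, the load test client is essentially this: open N connections and hold them idle so the server has to keep per-connection state. A bare-bones Python sketch (plain websockets only; it does not speak SignalR's negotiation protocol, and the endpoint and connection count are placeholders):

```python
import asyncio

import websockets  # plain websocket client; SignalR's negotiate handshake is not handled

TARGET = "ws://myserver.example.com:8080/signalr"  # placeholder endpoint
CONNECTIONS = 20_000  # you will also need to raise the client's file-descriptor limits


async def hold(uri):
    # Open one connection and keep it idle so the server must retain its state.
    async with websockets.connect(uri) as ws:
        await asyncio.Future()  # never completes; the connection just stays open


async def main():
    await asyncio.gather(*(hold(TARGET) for _ in range(CONNECTIONS)))


# asyncio.run(main())
```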
