If the number of requests is huge, can the load balancer cause problems when sending responses back to the respective clients? - performance

I have an architecture of a load balancer followed by two web application servers and a database. I am sending thousands of HTTP requests to the servers from a JMeter distributed testing environment.
When the responses come back, a few requests never get a response from the server.
I checked the database logs: 100% of the requests were answered.
I checked the web application servers' access logs: 100% of the requests were answered.
Can the load balancer be at fault in delivering these pending responses to the respective clients?
Different requests get stuck every time.
Thanks in advance!

If you suspect the load balancer, look at 3 typical causes first:
1. The server takes longer to respond than the load balancer is willing to wait.
2. The client has a shorter timeout than the time it takes for the server to respond.
3. Port/thread/connection exhaustion on the load balancer, or other LB configuration problems.
In all three cases, I suggest looking at the load balancer logs. Since you didn't specify which LB you are using, I cannot say exactly what the log looks like, but typically an LB log lets you see:
How long it took for a request to be sent to a web server and for the response from the web server to return to the load balancer. You can then compare those numbers to the timeouts configured for the load balancer and the client (problems 1 and 2).
How long it took for a request from the client to be processed by the LB and how long the LB took to respond to the client. If it takes long, then something is not right with the load balancer (problem 3).
And then of course if there are any errors on the load balancer, they may just explain what's going on.
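The comparison described above can be sketched in a few lines of Python. This is only an illustration: the `LbLogEntry` fields, the `diagnose` function and the `lb_overhead_limit` threshold are hypothetical names, not the log format of any particular load balancer, so map them onto whatever timing fields your LB actually records.

```python
from dataclasses import dataclass


@dataclass
class LbLogEntry:
    # Hypothetical fields; real LB logs name these differently.
    backend_seconds: float  # LB -> web server -> LB round trip
    total_seconds: float    # client request received -> response sent by LB


def diagnose(entry, lb_timeout, client_timeout, lb_overhead_limit=1.0):
    """Return which of the three typical problems an entry points to."""
    findings = []
    if entry.backend_seconds >= lb_timeout:
        findings.append("problem 1: server slower than LB timeout")
    if entry.total_seconds >= client_timeout:
        findings.append("problem 2: response slower than client timeout")
    # Time spent inside the LB itself (assumed limit, tune for your setup).
    if entry.total_seconds - entry.backend_seconds >= lb_overhead_limit:
        findings.append("problem 3: LB itself is slow")
    return findings
```

For example, `diagnose(LbLogEntry(2.0, 31.0), lb_timeout=60, client_timeout=30)` flags problems 2 and 3: the server answered quickly, but the LB held the request long enough for the client to give up.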
If you cannot review the load balancer logs, I suggest changing your JMeter test temporarily to target the servers behind the load balancer directly. You can even configure your script to evenly distribute load between all servers (for example by using multiple thread groups). That would allow you to isolate the problem and get more information on what's going on.

Related

How does AWS Application Load balancer select a target within a target group? How to load balance the websocket traffic?

I have an AWS Application load balancer to distribute the http(s) traffic.
Problem 1:
Suppose I have a target group with 2 EC2 instances: micro and xlarge. Obviously they can handle different traffic levels. Does the load balancer manage traffic proportionally to instance sizes or just round robin? If only round robin is used and no other factors taken into account, then it's not really balancing load, because at some point the micro instance will be suffering from the traffic, while xlarge will starve.
Problem 2:
Suppose I have a target group with 2 EC2 instances, both the same size. But my service is not using a classic HTTP request/response flow. It is using WebSockets, i.e. a client makes an HTTP request just once, to establish a socket, and then keeps the socket open for a long time, sending and receiving messages (e.g. a chat service). Let's suppose my load balancer is using round robin and both EC2 instances have 1000 clients connected each. Now suppose one of the EC2 instances goes down and its 1000 connected clients drop their socket connections. The instance gets back up quickly and is ready to accept websocket calls again. The 1000 clients who dropped are trying to reconnect. Now, if the load balancer used pure round robin, I'd end up with 1500 clients connected to instance #1 and 500 clients connected to instance #2, thus not really balancing the load correctly.
Basically, I'm trying to find out if some more advanced logic is being used to select a target in a group, or is it just a naive round robin selection. If it's round robin only, then how can I really balance the websocket connections load?
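The arithmetic in Problem 2 can be checked with a small simulation. This is a toy model, not a description of how the AWS load balancer actually picks targets: `reconnect_round_robin` and `reconnect_least_connections` are hypothetical helper names, and the least-connections variant is included only for comparison.

```python
from itertools import cycle


def reconnect_round_robin(connections, reconnecting):
    """Distribute reconnecting clients by pure round robin."""
    counts = list(connections)
    targets = cycle(range(len(counts)))
    for _ in range(reconnecting):
        counts[next(targets)] += 1
    return counts


def reconnect_least_connections(connections, reconnecting):
    """Send each reconnecting client to the target with the fewest connections."""
    counts = list(connections)
    for _ in range(reconnecting):
        counts[counts.index(min(counts))] += 1
    return counts


# Instance #1 still holds 1000 clients, instance #2 restarted with 0,
# and the 1000 dropped clients reconnect.
print(reconnect_round_robin([1000, 0], 1000))        # [1500, 500]
print(reconnect_least_connections([1000, 0], 1000))  # [1000, 1000]
```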
Websockets start out as http or https connections, so a load balancer can dispatch them to a server. Once the server accepts the http connection, both the server and the client "upgrade" the connection to use the websocket protocol. They then leave the connection open to use for websocket traffic. As far as the load balancer can tell, the connection is simply a long-lasting http connection.
Taking a server down when it has websocket connections to clients requires your application to retry lost connections. Reconnecting on connection failure is one of the trickiest parts of websocket client programming. Your application cannot be robust without reconnect logic.
AWS's load balancer has no built-in knowledge of the capabilities of the servers behind it. You have observed that it sends requests equally to big and small servers. That can overwhelm the small ones.
I have managed this by building a /healthcheck endpoint in my servers. It's a straightforward https://example.com/healthcheck web page. You can put a little bit of content on the page announcing how many websocket connections are currently open, or anything else. Don't password protect it or require a session to hit it.
My /healthcheck endpoints, whenever hit, measure the server load. I simply use the number of current websocket connections, but you can use any metric you want. I compare the current load to a load threshold configured for each server. For example, on a micro instance I can handle 20 open websockets, and on a production instance I can handle 400.
If the server load is too high, my endpoint gives back a 503 http error status along with its content. 503 typically means "I am overloaded, please try again later." It can also mean "I will shut down when all my connections are closed. Please don't use me for any more connections."
Then I configure the load balancer to perform those health checks every couple of minutes on all the servers in the server pool (AWS calls the pool a "target group"). The health check operation detects "unhealthy" servers and temporarily takes them out of its rotation. (The health check also detects crashed servers, which is good.)
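A minimal sketch of such an endpoint, using only the Python standard library, might look like the following. It is not the code from my servers: `MAX_WEBSOCKETS`, `open_websockets`, `health_status` and `serve` are hypothetical names, port 8080 is an arbitrary choice, and the sketch assumes your websocket code keeps the connection counter up to date.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

MAX_WEBSOCKETS = 400  # assumed per-server threshold (e.g. 20 on a micro instance)
open_websockets = 0   # assumed to be updated by your websocket code


def health_status(open_connections, threshold):
    """Return (HTTP status, body); 503 once the server is at capacity."""
    body = f"open websockets: {open_connections}/{threshold}\n"
    if open_connections >= threshold:
        return 503, body
    return 200, body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthcheck":
            self.send_error(404)
            return
        status, body = health_status(open_websockets, MAX_WEBSOCKETS)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Run the health check server until interrupted."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the server; the load balancer's health check is then pointed at the /healthcheck path, and a 503 takes the server out of rotation until the count drops below the threshold again.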
You need this loadbalancer health check for a large-scale production setup.
All that being said, you will get best results if all your server instances in your pool have roughly the same capacity as each other.

AWS ALB returning 502 without any log entries

We're using Node.js backend servers running in AWS ECS, behind an ALB. We then have AWS API Gateway with a proxy Lambda calling the ALB. This had been running in production for months when suddenly, a few days ago, we started seeing 502 errors from some API calls.
I've checked the proxy lambda logs to see that the 502 is returned from the ALB. However, when I check my node application logs, there are no failing requests, in fact no requests seem to have reached the application at these timestamps. I then enabled access logs on the ALB, which only shows 200/201 responses - no 5xx whatsoever. I'm now a bit confused as to where to look next. What could cause my ALB to return 502 without this being present in the ALB access logs? And what could cause the requests to not reach my node app in ECS? Does anyone have any idea on what logs to check next or what to do to pinpoint the errors? Could some layer within ECS cause those symptoms? I can't see any errors in my docker containers or anything.
It seems to happen in bursts, up to 50 failed requests within a period of time, then all ok for several hours.
It could be due to a number of reasons. The below may be applicable to you -
The load balancer received a TCP RST from the target when attempting to establish a connection.
The load balancer received an unexpected response from the target, such as "ICMP Destination unreachable (Host unreachable)", when attempting to establish a connection. Check whether traffic is allowed from the load balancer subnets to the targets on the target port.
The target closed the connection with a TCP RST or a TCP FIN while the load balancer had an outstanding request to the target. Check whether the keep-alive duration of the target is shorter than the idle timeout value of the load balancer.
The target response is malformed or contains HTTP headers that are not valid.
The load balancer encountered an SSL handshake error or SSL handshake timeout (10 seconds) when connecting to a target.
reference docs
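The keep-alive check from the list above comes down to one comparison, sketched here. The function name `keepalive_is_safe` and the numbers in the example are illustrative assumptions, not values read from any real configuration; look up the actual keep-alive timeout of your target and the idle timeout of your load balancer.

```python
def keepalive_is_safe(target_keepalive_seconds, lb_idle_timeout_seconds):
    """True if the target keeps idle connections open longer than the LB does.

    If the target closes an idle connection first, the LB may still send a
    request over it and receive a TCP RST or FIN, which surfaces as a 502.
    """
    return target_keepalive_seconds > lb_idle_timeout_seconds


# Example values only: a target that closes idle connections after 5 seconds
# behind a load balancer with a 60 second idle timeout is at risk.
print(keepalive_is_safe(5, 60))   # False
print(keepalive_is_safe(65, 60))  # True
```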
This turned out to be caused by memory leaks in my container applications. RAM usage grew with every request until the container crashed. At that point it took a while for ECS and the ALB to react, so a bunch of requests were routed to the dead instance.
The problem was resolved by fixing the leak, but I would have liked better built-in support for alarms on high memory usage from ECS/CloudWatch, with triggers to gracefully replace instances on high usage. It seems I have to build that from scratch.

Google Cloud Platform - load balancer websocket keep disconnecting after few seconds

We are using 2 servers and have set up a load balancer to redirect the traffic. Both servers are Compute Engine instances.
We are also using WebSockets (socket.io) to keep the connection between users (online and offline status). When a connection is established between users, it gets disconnected after a few seconds. We concluded that it is a load balancer configuration issue, because if we use a single server (without the load balancer), the connection remains alive until the user goes offline.
We need help with whether we have to do anything extra in the load balancer configuration to make it work smoothly with WebSockets.
We are using IP addresses, not a domain name (if that makes any difference).

websockets with load balancer scalability

I use a load balancer with my web site. The browser initiates a websocket connection to my app server. Does the open connection consume any resources on the LB or is it direct between the browser and the app server? If there is something open on the LB isn't it a bottleneck? I mean if my LB can handle X open connections then the X+1 user could not even open a connection.
It depends!
The most efficient load balancers listen for requests, do some analysis, then forward the requests; not all of the traffic has to pass through the load balancer. The forwarding happens at a lower network layer than HTTP (it is not an HTTP 302 redirect, so the client never knows it happened, which keeps the internal network configuration private; I think this happens at OSI layer 4).
However, some load balancers add more features, like acting as SSL endpoints or applying gzip compression. In these cases, they are processing bits as they pass through (encrypt/decrypt or compress in this case).
A picture may help. Compare the first diagram with the second & third here, noting redirection in the first that is absent in the others.

When would you need multiple servers to host one web application?

Is that called "clustering" of servers? When a web request is sent, does it go through the main server, and if the main server can't handle the extra load, then it forwards it to the secondary servers that can handle the load? Also, is one "server" that's up and running the application called an "instance"?
[...] Is that called "clustering" of servers?
Clustering does indeed mean transparently using multiple nodes that are seen as a single entity: the cluster. Clustering allows you to scale: you can spread your load across all the nodes and, if you need more power, you can add more nodes (short version). Clustering allows you to be fault tolerant: if one node (physical or logical) goes down, other nodes can still process requests and your service remains available (short version).
When a web request is sent, does it go through the main server, and if the main server can't handle the extra load, then it forwards it to the secondary servers that can handle the load?
In general, this is the job of a dedicated component called a "load balancer" (hardware, software) that can use many algorithms to balance the request: round-robin, FIFO, LIFO, load based...
In the case of EC2, you previously had to load balance with round-robin DNS and/or HAProxy. See Introduction to Software Load Balancing with Amazon EC2. But Amazon has since launched load balancing and auto-scaling (beta) as part of its EC2 offerings. See Elastic Load Balancing.
Also, is one "server" that's up and running the application called an "instance"?
Actually, an instance can be many things (depending on who's speaking): a machine, a virtual machine, a server (software) up and running, etc.
In the case of EC2, you might want to read Amazon EC2 Instance Types.
Here is a real example:
This specific configuration is hosted at RackSpace in their Managed Colo group.
Requests pass through a Cisco firewall. They are then routed across a Gigabit LAN to a Cisco CSS 11501 Content Services Switch (i.e. a load balancer). The load balancer matches the incoming content to a content rule, handles the SSL decryption if necessary, and then forwards the traffic to one of several back-end web servers.
Every 5 seconds, the load balancer requests a URL on each webserver. If the webserver fails (two times in a row, IIRC) to respond with the correct value, that server is not sent any traffic until the URL starts responding correctly.
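The health check rule just described can be sketched as a small state tracker. This is an illustration of the rule as I remember it, not the Cisco CSS implementation; the class name `BackendHealth` and the default threshold of two consecutive failures are assumptions taken from the description above.

```python
class BackendHealth:
    """Tracks one webserver's periodic health check results."""

    def __init__(self, failure_threshold=2):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def record_check(self, ok):
        """Record one health check; a correct response resets the failure count."""
        if ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

    @property
    def in_rotation(self):
        """The server receives traffic until it fails enough checks in a row."""
        return self.consecutive_failures < self.failure_threshold


server = BackendHealth()
server.record_check(False)
print(server.in_rotation)  # True: a single failure is tolerated
server.record_check(False)
print(server.in_rotation)  # False: two failures in a row
server.record_check(True)
print(server.in_rotation)  # True: the URL responds correctly again
```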
Further behind the webservers is a MySQL master/slave configuration. Connections may be made to the master (for transactions) or to the slaves for read-only requests.
Memcached is installed on each of the webservers, with 1 GB of ram dedicated to caching. Each web application may utilize the cluster of memcache servers to cache all kinds of content.
Deployment is handled using rsync to sync specific directories on a management server out to each webserver. Apache restarts, etc. are handled through similar scripting over ssh from the management server.
The amount of traffic that can be handled through this configuration is significant. The advantages of easy scaling and easy maintenance are great as well.
For clustering, any web request would be handled by a load balancer which, being kept up to date on the current load of each server in the cluster, sends the request to the least burdened server. As for whether it's called an instance... I believe so, but I'd wait for confirmation on that first.
You'd need a very large application to be bothered with thinking about clustering and the "fun" that comes with it software- and hardware-wise, though. Unless you're looking to start or are already running something big, it wouldn't be anything to worry about.
Yes, it can be required for clustering. Typically, as the load goes up, you might find yourself with a frontend server that does URL rewriting, HTTPS if required, and caching with, say, Squid. The requests get passed on to multiple backend servers, probably using cookies to associate a session with a particular backend if necessary. You might have the database on a separate server also.
I should add that there are other reasons why you might need multiple servers; for instance, there may be a requirement that the database is not on the frontend server for security reasons.
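The cookie-based association of a session with a particular backend mentioned above could be sketched like this. It is a toy in-memory model under stated assumptions: `StickyBalancer` and `route` are hypothetical names, new sessions are assigned by round robin, and a real frontend would read the session ID from a cookie and handle backends that go away.

```python
from itertools import cycle


class StickyBalancer:
    """Pins each session to one backend; new sessions are assigned round robin."""

    def __init__(self, backends):
        self._next_backend = cycle(list(backends))
        self._by_session = {}

    def route(self, session_id=None):
        """Return the backend for a request, given its session cookie value."""
        if session_id is None:
            # No session yet: any backend will do.
            return next(self._next_backend)
        if session_id not in self._by_session:
            self._by_session[session_id] = next(self._next_backend)
        return self._by_session[session_id]


lb = StickyBalancer(["backend-a", "backend-b"])
print(lb.route("session-1"))  # backend-a
print(lb.route("session-2"))  # backend-b
print(lb.route("session-1"))  # backend-a again, same session
```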
