I am working on a Spring Boot application.
I want to know how I can place a load balancer in front of the application so that requests are distributed across a number of servers.
I googled and found that there are some Netflix libraries like Eureka, Hystrix, Ribbon and Archaius that are supposed to help with the load-balancing job.
But I could not find out how these pieces work together to distribute requests and balance load while at the same time providing high reliability and availability to all users accessing a particular service.
I am going through all of them but cannot find an entry point to start from.
Basically, I am not sure where to begin.
You can use HAProxy.
You can run it on your server with your own configuration file, for example:
global
    daemon
    maxconn 256

defaults
    mode tcp
    timeout connect 5000ms

listen http-in
    timeout client 180s
    timeout server 180s
    bind 127.0.0.1:80
    server server1 157.166.226.27:8080 maxconn 32 check
    server server2 157.166.226.28:8080 maxconn 32 check
    server server3 157.166.226.29:8080 maxconn 32 check
    server server4 157.166.226.30:8080 maxconn 32 check
    server server5 157.166.226.31:8080 maxconn 32 check
    server server6 157.166.226.32:8080 maxconn 32 check
This will distribute every HTTP request arriving on port 80 of localhost across the listed servers, using a round-robin algorithm. For details, please see the HAProxy documentation.
Understanding that your application is offering REST services, I suggest you do not pursue the Netflix API. It is great, but it will not help you with your use case. I suggest you have a look at HAProxy, nginx or httpd for simple load-balancing capabilities. The good part is that you don't have to look into session stickiness, since REST is stateless by default.
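For example, a minimal nginx configuration for two instances of your application (the hostnames and ports here are assumptions; adjust them to your environment) could look roughly like this:

upstream spring_backend {
    # round robin is nginx's default balancing method
    server app1.internal:8080;
    server app2.internal:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://spring_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;  # so the app still sees the client IP
    }
}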
Related
I have an AWS Application Load Balancer to distribute the HTTP(S) traffic.
Problem 1:
Suppose I have a target group with 2 EC2 instances: micro and xlarge. Obviously they can handle different traffic levels. Does the load balancer distribute traffic proportionally to instance size, or just round robin? If only round robin is used and no other factors are taken into account, then it's not really balancing load, because at some point the micro instance will be suffering from the traffic while the xlarge will starve.
Problem 2:
Suppose I have a target group with 2 EC2 instances, both the same size. But my service is not using a classic HTTP request/response flow; it is using websockets, i.e. a client makes an HTTP request just once, to establish a socket, and then keeps the socket open for a long time, sending and receiving messages (e.g. a chat service). Let's suppose my load balancer is using round robin and both EC2 instances have 1,000 clients connected each. Now suppose one of the EC2 instances goes down and its 1,000 connected clients drop their socket connections. The instance comes back up quickly and is ready to accept websocket calls again. The 1,000 clients who dropped try to reconnect. Now, if the load balancer used pure round robin, I'd end up with 1,500 clients connected to instance #1 and 500 clients connected to instance #2, thus not really balancing the load correctly.
Basically, I'm trying to find out if some more advanced logic is being used to select a target in a group, or is it just a naive round robin selection. If it's round robin only, then how can I really balance the websocket connections load?
Websockets start out as http or https connections, so a load balancer can dispatch them to a server. Once the server accepts the http connection, both the server and the client "upgrade" the connection to use the websocket protocol. They then leave the connection open to use for websocket traffic. As far as the load balancer can tell, the connection is simply a long-lasting http connection.
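For illustration, the handshake the load balancer forwards looks roughly like this (the key and accept values below are the sample values from RFC 6455):

GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

After the 101 response the same TCP connection simply carries websocket frames, which is why the load balancer sees it as a long-lived HTTP connection.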
Taking a server down when it has websocket connections to clients requires your application to retry lost connections. Reconnecting on connection failure is one of the trickiest parts of websocket client programming. Your application cannot be robust without reconnect logic.
AWS's load balancer has no built-in knowledge of the capabilities of the servers behind it. You have observed that it sends requests equally to big and small servers. That can overwhelm the small ones.
I have managed this by building a /healthcheck endpoint in my servers. It's a straightforward https://example.com/healthcheck web page. You can put a little bit of content on the page announcing how many websocket connections are currently open, or anything else. Don't password-protect it or require a session to hit it.
My /healthcheck endpoints, whenever hit, measure the server load. I simply use the number of current websocket connections, but you can use any metric you want. I compare the current load to a load threshold configured for each server. For example, on a micro instance I can handle 20 open websockets, and on a production instance I can handle 400.
If the server load is too high, my endpoint gives back a 503 http error status along with its content. 503 typically means "I am overloaded, please try again later." It can also mean "I will shut down when all my connections are closed. Please don't use me for any more connections."
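A rough sketch of such an endpoint in Spring Boot terms (the connection counter, the threshold property and the class itself are hypothetical; plug in whatever load metric you actually use):

import java.util.concurrent.atomic.AtomicInteger;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical /healthcheck endpoint that answers 503 when this instance
// considers itself overloaded (here: too many open websockets).
@RestController
public class HealthCheckController {

    // Assume the application increments/decrements this as websocket sessions open and close.
    private final AtomicInteger openWebsockets = new AtomicInteger();

    // Per-instance threshold, e.g. 20 on a micro instance, 400 on a bigger one.
    @Value("${healthcheck.max-websockets:400}")
    private int maxWebsockets;

    @GetMapping("/healthcheck")
    public ResponseEntity<String> healthcheck() {
        int current = openWebsockets.get();
        String body = "open websockets: " + current + " / " + maxWebsockets;
        if (current >= maxWebsockets) {
            // 503 = "I am overloaded, please take me out of rotation for now"
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE).body(body);
        }
        return ResponseEntity.ok(body);
    }
}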
Then I configure the load balancer to perform those health checks every couple of minutes on all the servers in the server pool (AWS calls the pool a "target group"). The health check operation detects "unhealthy" servers and temporarily takes them out of its rotation. (The health check also detects crashed servers, which is good.)
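If you manage the target group from the AWS CLI rather than the console, the health-check settings can be applied along these lines (the target-group ARN is a placeholder, and you should double-check the option names against the current elbv2 reference):

aws elbv2 modify-target-group \
    --target-group-arn <your-target-group-arn> \
    --health-check-path /healthcheck \
    --health-check-interval-seconds 120 \
    --healthy-threshold-count 2 \
    --unhealthy-threshold-count 2 \
    --matcher HttpCode=200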
You need this load-balancer health check for a large-scale production setup.
All that being said, you will get best results if all your server instances in your pool have roughly the same capacity as each other.
I'm running a Kubernetes application on GKE which serves HTTP requests on port 80 and websockets on port 8080.
Now, the HTTP part needs to know the client's IP address, so I have to use the HTTP load balancer as the ingress service. The websocket part then has to use the TCP load balancer, as the docs clearly state that the HTTP LB doesn't support websockets.
I got them both working, but on different IPs, and I need to have them on one.
I would expect that there is something like iptables on GCE, so I could forward traffic from port 80 to the HTTP LB and from port 8080 to the TCP LB, but I can't find anything like that. Everything I found around forwarding allows only one of them.
I guess I could have one instance with nginx/HAProxy doing only this, but that seems like overkill.
Appreciate any help!
There's not a great answer to this right now. Ingress objects are really HTTP only right now, and we don't really support multiple grades of ingress in a single cluster (though we want to).
GCE's HTTP LB doesn't do websockets yet.
Services have a flaw in that they lose the client IP (we are working on that). Even once we solve this, you won't be able to use GCE's L7 balancer because of the extra port you need.
The best workaround I can think of, and one that a number of users have relied on until we can preserve the source IP, is this:
Run your own haproxy or nginx (or even your own app) as a DaemonSet on some or all nodes (label controlled) with hostPorts; a rough sketch follows below.
Run a GCE network LB (outside of Kubernetes) pointing at the nodes with the hostPorts.
Once we can properly preserve external IPs, you can turn this back into a plain Service.
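A rough sketch of the DaemonSet part, assuming nginx as the edge proxy and a node label role=edge (both names are made up for illustration):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: edge-proxy
spec:
  selector:
    matchLabels:
      app: edge-proxy
  template:
    metadata:
      labels:
        app: edge-proxy
    spec:
      nodeSelector:
        role: edge              # only schedule on the labelled nodes
      containers:
      - name: nginx
        image: nginx:1.25
        ports:
        - containerPort: 80
          hostPort: 80          # exposed directly on the node, so the network LB can target it
        - containerPort: 8080
          hostPort: 8080

The GCE network LB then points at those nodes on ports 80 and 8080, and nginx (or your app) forwards the traffic on to your Pods.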
I'm considering using Gwan for a backend game server. Although Gwan can handle a lot of requests, I want to make it scale automatically. Gwan has an elastic load balancer. Are there examples of how that should be set up in code/deployment?
IMO, load balancing is a function of the cloud or data center you are working with, not of Gwan itself.
In Microsoft's Azure, which is good as it offers Linux VMs, you set up an endpoint (which is essentially a port, like 8080) as a load-balanced endpoint that terminates at that port on each VM.
Set up your Gwan to listen on port 8080.
Set up a load-balanced endpoint on port 8080.
Point the load balancer at the Gwan port 8080.
Clients then hold sessions with either VM1 or VM2.
Auto-scaling is a function of Azure's availability set.
I'm sure a similar process is offered on Amazon and Rackspace.
Say I want to run something like the nyan cat telnet server (http://miku.acm.uiuc.edu/) and I need to handle 10,000 concurrent connections total. I have 10 servers in addition to a load balancer. Each server can handle 1,000 concurrent connections, and I want to put the load balancer in front of them to randomly divide the traffic among the 10 servers.
From what I've read, it's fairly simple for a load balancer to pass an HTTP request (along with the client IP) to the backend server, perhaps with FastCGI or with an X- header.
What would be the simplest way for the load balancer to pass the client IP to the backend server in this case with a simple TCP server? Would a hardware load balancer be needed, or are there ways to do this simply through software?
In other words, is there a uniform way to pass the client IP when load balancing non-HTTP stuff, the same way Google gets the client IP when it load-balances its Google Talk XMPP server or its Gmail IMAP server?
This isn't for anything in specific; I'm just curious about if and how it can be done. Thanks in advance!
The simplest way would be for the load balancer to make itself completely invisible and pass the connection on with the source and destination IP address unmolested. For this to work, the same IP address must be assigned (as a loopback address, not to a physical interface) to all 10 servers and that would be the IP address the clients connect to. Internet traffic to that IP address has to go to the load balancer. The load balancer must be the default gateway for the servers.
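A rough sketch of that setup on Linux backends (the VIP 203.0.113.10 and the gateway address are made up; the exact steps depend heavily on your load balancer):

# On each of the 10 servers:
ip addr add 203.0.113.10/32 dev lo            # the shared service IP, assigned to loopback only
sysctl -w net.ipv4.conf.all.arp_ignore=1      # don't answer ARP requests for the loopback-only IP
sysctl -w net.ipv4.conf.all.arp_announce=2    # don't advertise it as a source in ARP
ip route replace default via 10.0.0.1         # the load balancer's inside address as default gateway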
I want to stop serving requests to my back-end servers if the load on those servers goes above a certain level. Anyone who is already surfing the site will still get routed, but new connections will be sent to a static "server busy" page until the load drops below a predetermined level.
I can use cookies to let the current customers in, but I can't find information on how to do routing based on a custom load metric.
Can anyone point me in the right direction?
Nginx has an HTTP Upstream module for load balancing. Checking the responsiveness of the backend servers is done with the max_fails and fail_timeout options. Routing to an alternate page when no backends are available is done with the backup option. I recommend translating your load metrics into the options that Nginx supplies.
Let's say though that Nginx is still seeing the backend as being "up" when the load is higher than you want. You may be able to adjust that further by tuning the maximum connections of the backend servers. So, if the backend servers can only handle 5 connections before the load is too high, you tune them to allow only 5 connections. Then on the front end, Nginx will time out immediately when trying to send a sixth connection, and mark that server as inoperative.
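A rough sketch of those upstream options (the addresses, the thresholds and the backup "busy page" server are made up, and max_conns needs a reasonably recent Nginx):

upstream backend {
    server 10.0.0.11:8080 max_fails=3 fail_timeout=30s max_conns=5;  # drop the node for 30s after 3 failures; cap it at 5 connections
    server 10.0.0.12:8080 max_fails=3 fail_timeout=30s max_conns=5;
    server 10.0.0.99:8081 backup;                                    # static "server busy" page, used only when the others are unavailable
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}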
Another option is to handle this outside of Nginx. Software like Nagios can not only monitor load, but can also proactively trigger actions based on the monitoring it does.
You can generate your Nginx configs from a template that has options to mark each upstream node as up or down. When a monitor detects that the upstream load is too high, it could re-generate the Nginx config from the template as appropriate and then reload Nginx.
A lightweight version of the same idea could be done with a script that runs on the same machine as your Nagios server, and performs simple monitoring as well as the config file updates.