I'm sure this problem has been solved, but I can't find any information about it anywhere...
How do sysadmins programmatically add a new node to an existing, running load balancer? Say I have a load balancer already balancing my API server across two EC2 instances, and suddenly there's a traffic spike and I need a third node, but I'm asleep... It would be wonderful to have something monitoring RAM usage and some key performance indicators that tells me when I need another node, and even better if it could add the new node to the load balancer on its own...
I'm confident this is possible, even trivial, with node-http-proxy and distribute, but I'd like to know whether it can be done with HAProxy and/or Nginx... I know Amazon's Elastic Load Balancing is probably my best bet, but I want to do it on my own (I want to spawn instances from Rackspace, EC2, Joyent and probably others as convenient).
Once again, spawning a node is easy; what I'd like to know is how to add it to haproxy.cfg (or something similar with Nginx) programmatically, without having to reload the whole proxy. Bash scripting is my best bet for this, but it still has to reload the whole proxy, which is bad because it loses connections...
You have a few questions in there. For the "add nodes to haproxy without restarting it" part:
What I do for a similar problem is pre-populate the config file with server names, e.g. web01, web02 ... web20, even if I only have 5 web servers at the time. Then in my hosts file I map those names to the actual IPs of the web servers.
To add a new server, you just create an entry for it in the hosts file, and it will start passing health checks and get added.
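As a sketch of that layout (server names, port, and IPs below are made up):

```
# haproxy.cfg -- backend pre-populated with more servers than currently exist
backend api_servers
    balance roundrobin
    server web01 web01:8080 check
    server web02 web02:8080 check
    server web03 web03:8080 check   # not provisioned yet; fails health checks

# /etc/hosts -- map the names to real IPs as machines come online
10.0.0.11  web01
10.0.0.12  web02
```

When web03 is spawned, add its IP to /etc/hosts. One caveat: HAProxy normally resolves hostnames when it parses the config, so depending on your version you may still need a reload (or runtime DNS resolvers) for the new mapping to take effect.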
For automated orchestration, it really depends on your environment and you'll probably have to write something custom that fits your needs. There are paid solutions (Scalr comes to mind) to handle orchestration too.
What I do: I have a line in my backend section in haproxy.cfg which says:
# new webservers here
And with a sed script I update haproxy.cfg with something like:
sed -i -e "/new webservers here/a\ server $ip_address $ip_address:12080 check maxconn 28 weight 100" haproxy.cfg
And then reload haproxy. Works transparently.
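Wrapped up as a small script (the config path, PID file, port, and server options are assumptions for illustration), using HAProxy's soft reload so existing connections aren't dropped:

```shell
#!/bin/sh
# Sketch only: CFG, PIDFILE, port and server options are assumptions.
CFG=${CFG:-/etc/haproxy/haproxy.cfg}
PIDFILE=${PIDFILE:-/var/run/haproxy.pid}

add_backend() {
    # add_backend <ip> <cfgfile>: append a server line after the marker comment
    sed -i -e "/new webservers here/a\\    server $1 $1:12080 check maxconn 28 weight 100" "$2"
}

reload_haproxy() {
    # -sf <oldpid>: the new process takes over the listening sockets and asks
    # the old one to finish its existing connections before exiting, so the
    # reload does not drop in-flight connections.
    haproxy -f "$CFG" -p "$PIDFILE" -sf "$(cat "$PIDFILE")"
}

# Typical use after spawning a node:
#   add_backend 10.0.0.5 "$CFG" && reload_haproxy
```

The `-sf` flag is what makes the reload transparent: it is a soft stop of the old process, not a kill.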
HAProxy has a Runtime API that allows you to do just that dynamically.
Please read the official documentation:
Dynamic Configuration HAProxy Runtime API
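For example, assuming haproxy.cfg exposes an admin-level stats socket (`stats socket /var/run/haproxy.sock level admin` in the global section), you can re-point and enable a server slot over that socket; the backend/server names (`be_api/web05`) and address below are placeholders:

```shell
#!/bin/sh
# Assumes: stats socket /var/run/haproxy.sock level admin (in haproxy.cfg).
# be_api/web05 and 10.0.0.5 are placeholder names for this sketch.
SOCK=${SOCK:-/var/run/haproxy.sock}

# Point an existing (or templated) server slot at the new node, then enable it.
cmd="set server be_api/web05 addr 10.0.0.5 port 12080"
if [ -S "$SOCK" ]; then
    echo "$cmd" | socat stdio "$SOCK"
    echo "enable server be_api/web05" | socat stdio "$SOCK"
fi
```

Slots can be pre-allocated with a `server-template` line in the backend, and HAProxy 2.4+ also supports `add server` over the same socket, so no reload is needed at all.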
I've just learned how to use notifications and subscriptions in Chef to carry out actions such as restarting services when a config file changes.
I'm still learning Chef, so I may just not have got to this section yet, but I'd like to know how to carry out those actions conditionally.
E.g. 1: if I change a config file for my standalone Apache server, I only want to restart the service if we are outside core business hours, i.e. the current local time is between 6pm and 6am. If we are inside core business hours, I still want the restart to happen, but at a later time, outside core hours.
E.g. 2: if I change a config file for my load-balanced Apache server cluster, I only want to restart the service if a) the load balancer service status is "running" and b) all other nodes in the cluster have their Apache service status as "running", i.e. I'm not taking down more than one node in the cluster at once.
I imagine we might need to put the action in a ruby block that either loops until the conditions are met, sets a flag, or creates a scheduled task to execute later, but I have no idea what to look for to learn how best to do this.
I guess this topic is somewhat philosophical. For me, Chef should not hold state or logic beyond the current node and run. If I wanted to restart at a specific time, I would create a cron job with a conditional, and just set the conditional with Chef (something like Debian's /var/run/reboot-required). Then crond would trigger the restart.
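A sketch of that flag-file pattern (the flag path, the hours, and the service name are illustrative assumptions): Chef's only job is to touch the flag; a cron job running every few minutes does the conditional restart.

```shell
#!/bin/sh
# Run from cron every few minutes. Restarts Apache only if Chef left a flag
# AND we are outside core business hours (core = 06:00-18:00 local here).
# Flag path and service name are assumptions for this sketch.
FLAG=/var/run/apache-restart-required

outside_core_hours() {
    # $1 is the hour of day, 0-23
    [ "$1" -ge 18 ] || [ "$1" -lt 6 ]
}

if [ -f "$FLAG" ] && outside_core_hours "$(date +%H)"; then
    systemctl restart apache2 && rm -f "$FLAG"
fi
```

On the Chef side, the notification would simply create the flag file instead of restarting the service directly.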
For your second example, the LB should have no issue dealing with a restarting Apache backend and failing over to another backend. Given that Chef runs regularly with a randomized delay called "splay", the probability that no backend is reachable is very low, even with only 2 backends. That said, reloading may be the better way.
I have several stateless app servers packed into Docker containers. I have a lot of load on top of them and I want to horizontally scale this setup. My setup doesn't include load balancer nodes.
What I've done is simply increase the node count, and so far so good.
From my understanding, Jelastic has some internal load balancer which decides which node an incoming request should go to, e.g.:
user -> jelastic.my-provider.com -> one of 10 of app nodes created.
But I've noticed that a lot of my nodes (especially the last ones) are not receiving any requests and just idle, while the first nodes receive the lion's share of the incoming requests (and I have a lot of them!). This looks strange to me, because I thought the internal load balancer did some kind of round-robin distribution.
How do I set up round-robin balancing properly? I've come to the conclusion that I have to create another environment with nginx/haproxy and manually add all 10 of my nodes to its list of upstream servers.
Edit: I've set up a separate HAProxy instance and manually added all my nodes to haproxy.cfg, and it worked like a charm. But the question is still open, since I want automatic/scheduled horizontal scaling.
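For reference, the manual HAProxy setup amounts to a round-robin backend listing every node (addresses and ports below are placeholders):

```
frontend app_in
    bind *:80
    default_backend app_nodes

backend app_nodes
    balance roundrobin
    server node1 10.0.0.1:8080 check
    server node2 10.0.0.2:8080 check
    # ... one "server" line per node, up to node10
```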
Edit 2: I use Jelastic v5.3 Cerebro with custom Docker images (btw, I have something like ~20 environments, all built from custom images except for the databases).
My topology for this specific case is pretty simple: a single Docker environment with the app server configured and scaled to 10 nodes. I don't use a public IP.
Edit 3: I don't need sticky sessions at all. All my requests come from another service deployed to Jelastic (1 node).
I have a really simple setup: an Azure load balancer for http(s) traffic, two application servers running Windows, and one database, which also holds the session data.
The goal is to be able to reboot or update the software on the servers without a single request being dropped. The problem is that the health probe tests every 5 seconds and needs to fail twice in a row. This means that when I kill the application server, a lot of requests during those 10 seconds will time out. How can I avoid this?
I have already tried running the health probe on a different port and then denying all traffic to that port with the Windows firewall. The load balancer then thinks the application is down on that node and no longer sends new traffic to it. However... Azure LB does hash-based load balancing, so the traffic that was already going to the now-killed node will keep going there for a few seconds!
First of all, could you give us additional details: is your database load balanced as well? Are you performing reads and writes on this database, or only reads?
For your information, you can change the Azure Load Balancer distribution mode; please refer to this article for details: https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-distribution-mode
I would suggest disabling the server you are updating at the load balancer level. Wait a couple of minutes (depending on your application) before starting your updates; this should "purge" the endpoint. When the update is done, update your load balancer again and put the server back in.
The cloud concept is infrastructure as code: this could easily be scripted and included in your deployment/update procedure.
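As a hedged sketch of scripting that with the Azure CLI (the resource group, LB, pool, and NIC names below are placeholders; `DRY_RUN=1` just prints the commands instead of calling Azure):

```shell
#!/bin/sh
# DRY_RUN=1 (default here) prints commands instead of executing them.
# myRG / myLB / myBackendPool / the NIC names are placeholders.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi; }

drain_node() {
    # Pull the VM's NIC out of the LB backend pool so no new traffic arrives.
    run az network nic ip-config address-pool remove \
        --resource-group myRG --lb-name myLB --address-pool myBackendPool \
        --nic-name "$1" --ip-config-name ipconfig1
}

restore_node() {
    # Put the NIC back into the backend pool after the update.
    run az network nic ip-config address-pool add \
        --resource-group myRG --lb-name myLB --address-pool myBackendPool \
        --nic-name "$1" --ip-config-name ipconfig1
}

drain_node web01-nic
# ... wait a couple of minutes, run updates, then:
restore_node web01-nic
```

The wait between draining and updating is what lets in-flight connections to the node complete.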
Another solution would be to use Traffic Manager. It gives you additional options to manage your endpoints (it might be a bit oversized for 2 VMs/endpoints).
The last solution is to migrate to a PaaS solution where this kind of feature is already available (deployment slots).
Hoping this will help.
Best regards
Can someone please tell me the pros and cons of mod_jk vs mod_cluster?
We are looking to do very simple load balancing. We are going to use sticky sessions and just need something to route new requests to another server if one server goes down. I feel that mod_jk does this and does it well, so why do I need mod_cluster?
If your JBoss version is 5.x or above, you should use mod_cluster; it will give you better performance and reliability than mod_jk. Here are some reasons:
Better load balancing between app servers: the load-balancing logic is calculated from information and metrics provided directly by the application servers (bear in mind they have first-hand information about their own load), in contrast with mod_jk, where the logic is calculated by the proxy itself. For this, mod_cluster uses an extra connection between the servers and the proxy (apart from the data one), used to send this load information.
Better integration with the lifecycle of the applications deployed on the servers: the servers keep the proxy informed about application changes on each node (for example, if you undeploy the application on one of the nodes, that node informs the proxy immediately), avoiding inconvenient 404 errors.
It doesn't require AJP: you can also use it with HTTP or HTTPS.
Better management of server lifecycle events: when a server shuts down or is restarted, it informs the proxy of its state, so the proxy can reconfigure itself automatically.
You can use sticky sessions with mod_cluster as well, though of course, if one of the nodes fails, mod_cluster won't preserve the user sessions (as with other balancers, unless you have the JBoss nodes in a cluster). But for the reasons given above (tracking of server lifecycle events, and better load balancing mainly), if one of the servers goes down, mod_cluster will manage it better and more transparently to the user: the proxy is informed immediately, so it will never send requests to that node until it's informed that the node has restarted.
Remember that you can use mod_cluster with JBoss AS/EAP 5.x or JBoss Web 2.1.1 or above (in the case of Tomcat I think it's version 6 or above).
To sum up: though your load-balancing use case is simple, mod_cluster offers better performance and scalability.
You can find more information on the JBoss site for mod_cluster, and on its documentation page.
I've been looking at high-availability solutions such as Heartbeat and Keepalived for failing over when an HAProxy load balancer goes down. I realised that although we would like high availability, at this point it's not really worth the expense of having 2 load balancer instances running at all times just to get instant failover (particularly as one LB would sit redundant in our setup).
My alternative solution is to fire up a new load balancer EC2 instance from an AMI if the current load balancer stops working, and associate it with the Elastic IP that our domain name points to. That should limit downtime to the time it takes to fire up the new instance and associate the Elastic IP which, given our current circumstances, seems like a reasonably cost-effective approach to high availability, particularly as we can easily do it across availability zones. I'm looking to do this with the following steps:
Prepare an AMI of the load balancer
Fire up a single ec2 instance acting as the load balancer and assign the Elastic IP to it
Have a micro server ping the current load balancer at regular intervals (we always have an extra micro server running anyway)
If the ping times out, fire up a new EC2 instance using the load balancer AMI
Associate the elastic ip to the new instance
Shut down the old load balancer instance
Repeat step 3 onwards with the new instance
I know how to run the commands in my script to start up and shut down EC2 instances, associate the elastic IP address to an instance, and ping the server.
My question is: what would be a suitable ping here? Would a standard ping at regular intervals suffice, and what would be a good interval? Or is this a rather simplistic approach, and is there a smarter health check I should be doing?
Also, if anyone foresees any problems with this approach, please feel free to comment.
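For step 3, something like this is what I have in mind: an HTTP check rather than an ICMP ping, with a few consecutive failures required before failing over (the URL, interval, and `failover.sh`, which would do steps 4 to 6, are placeholders):

```shell
#!/bin/sh
# lb.example.com and failover.sh are placeholders: failover.sh would fire up
# the replacement instance, move the EIP, and shut down the old LB.
URL=${URL:-http://lb.example.com/health}
MAX_FAILS=3
INTERVAL=10

check_once() {
    # Unlike ICMP ping, this fails on timeouts AND on non-2xx HTTP codes,
    # so it also catches a machine that is up but no longer proxying.
    curl -fsS --max-time 5 "$1" >/dev/null 2>&1
}

monitor() {
    fails=0
    while :; do
        if check_once "$URL"; then
            fails=0
        else
            fails=$((fails + 1))
            [ "$fails" -ge "$MAX_FAILS" ] && { ./failover.sh; fails=0; }
        fi
        sleep "$INTERVAL"
    done
}

# monitor   # run under supervision / in the background on the micro instance
```

Requiring several consecutive failures avoids failing over on a single dropped packet or transient timeout.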
I understand exactly where you're coming from; my company is in the same position. We care about having a highly available, fault-tolerant system, but the overhead cost simply isn't viable for the traffic we get.
One problem I have with your solution is that you're assuming the micro instance and the load balancer won't both die at the same time. From my experience with Amazon I can tell you it's definitely possible, however unlikely, that whatever causes your load balancer to die also takes down the micro instance.
Another potential problem is that you assume you will always be able to start a replacement instance during the downtime. This is simply not the case; take for example the outage Amazon had in their us-east-1 region a few days ago. A power outage caused one of their zones to lose power, and when they restored power and began recovering instances, their APIs were not working properly because of the sheer load. It took almost an hour before they were available again. If an outage like this knocks out your load balancer and you're unable to start another, you'll be down.
That being said, I find the ELBs provided by Amazon a better solution for me. I'm not sure what the reasoning is behind using HAProxy, but I recommend investigating ELBs, as they allow you to do things such as auto-scaling, etc.
For each ELB you create, Amazon creates one load balancer in each zone that has an instance registered. These are still vulnerable to certain problems during severe outages like the one described above; for example, during that downtime I could not add new instances to the load balancers, but my current instances (the ones not affected by the power outage) were still serving requests.
UPDATE 2013-09-30
Recently we changed our infrastructure to use a combination of ELB and HAProxy. I find that ELB gives the best availability, but the fact that it uses DNS load balancing doesn't work well for my application. So our setup is an ELB in front of a 2-node HAProxy cluster. Using HAProxyCloud, a tool I created for AWS, I can easily add auto-scaling groups to the HAProxy servers.
I know this is a little old, but the solution you suggest is overcomplicated; there's a much simpler method that does exactly what you're trying to accomplish...
Just put your HAProxy machine, with your custom AMI, in an auto-scaling group with a minimum AND maximum of 1 instance. That way, when your instance goes down, the ASG will bring it right back up, EIP and all. No external monitoring necessary, and the same, if not faster, response to downed instances.
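A sketch of that with the AWS CLI (group/launch-configuration names and zones are placeholders, and it's assumed the AMI's boot script re-associates the EIP itself; `DRY_RUN=1` just prints the command):

```shell
#!/bin/sh
# DRY_RUN=1 (default here) prints the command instead of calling AWS.
# haproxy-asg / haproxy-lc and the zones are placeholder names.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi; }

# min = max = desired = 1: the ASG's only job is to resurrect the instance
# from the custom AMI (referenced by the launch configuration) if it dies.
run aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name haproxy-asg \
    --launch-configuration-name haproxy-lc \
    --min-size 1 --max-size 1 --desired-capacity 1 \
    --availability-zones us-east-1a us-east-1b
```

The ASG's own EC2 health checks replace the external ping, and the replacement happens without any monitoring box of your own.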