I have several stateless app servers packed into Docker containers. I have a lot of load on top of them and I want to horizontally scale this setup. My setup doesn't include load balancer nodes.
What I've done is simply increased nodes count — so far so good.
From my understanding Jelastic have some internal load balancer which decides to which node it should pass incoming request, e.g.:
user -> jelastic.my-provider.com -> one of 10 of app nodes created.
But I've noticed that lot of my nodes (especially last ones) are not receiving any requests, and just idling, while first nodes receive lion share of incoming requests (I have lot of them!). This looks strange for me, because I thought that internal load balancer doing some king of round-robin distribution.
How to setup round-robin balancing properly? I came to the conclusion that I have to create another environment with nginx/haproxy and manually add all my 10 nodes to list of downstream servers.
Edit: I've setup separate HAProxy instance and manually added all my nodes to haproxy.cfg and it worked like a charm. But the question is still open since I want to achieve automatic/by schedule horizontal scaling.
Edit2: I use Jelastic v5.3 Cerebro. I use custom Docker images (btw, I have something like ~20 envs which all about of custom images except of databases).
My topology for this specific case is pretty simple — single Docker environment with app server configured and scaled to 10 nodes. I don't use public IP.
Edit3: I don't need sticky sessions at all. All my requests came from another service deployed to jelastic (1 node).
Related
Our application is built on top of Intersystems IRIS (previously cache) and consists of a large core and DB that is enhanced with several external modules that connect to the core.
We deploy IRIS and the external apps on premise on the same server (for several reasons).
When we use mirror, we have several servers with the same content (IRIS + external modules) that act as a high availability mirroring system, where only one node is the 'active' one and the rest of them are waiting.
Ideally, our external modules are started up and stopped following the IRIS instance on each node using two callbacks available.
When configured in mirror, they are only started on the 'active' node (by a provided callback) and initially stopped on all other nodes.
When a failover occurs and one of the 'waiting' nodes is promoted to 'active', the external apps are started on that promoting node.
On the demoting node (passing from 'active' to waiting, crashed or hang) we don't have a good way to stop those services as there is no callback from intersystems.
We are analyzing possible alternatives, but any other would be greatly appreciated as well as comments:
Implementing an additional service that keeps track of the IRIS instance
Making the external modules 'mirror' aware
I would recommend to use one more server, where you would not need to stop/start services. Just keep the mirror alone, two servers, just only for the mirror, only data, no running code, no users. And one more Server, connected to the mirror as ECP via VIP. In this case, all of your services would work on that server, and should not care about where the status was changed. There will be a short outage during the switch in the mirror, but nothing fatal. I have such a configuration in production, but I have 10 servers behind the mirror, including 1 is just for interoperability reasons. And we already had a few switches, with no issues.
I have a really simple setup: An azure load balancer for http(s) traffic, two application servers running windows and one database, which also contains session data.
The goal is being able to reboot or update the software on the servers, without a single request being dropped. The problem is that the health probe will do a test every 5 seconds and needs to fail 2 times in a row. This means when I kill the application server, a lot of requests during those 10 seconds will time out. How can I avoid this?
I have already tried running the health probe on a different port, then denying all traffic to the different port, using windows firewall. Load balancer will think the application is down on that node, and therefore no longer send new traffic to that specific node. However... Azure LB does hash-based load balancing. So the traffic which was already going to the now killed node, will keep going there for a few seconds!
First of all, could you give us additional details: is your database load balanced as well ? Are you performing read and write on this database or only read ?
For your information, you have the possibility to change Azure Load Balancer distribution mode, please refer to this article for details: https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-distribution-mode
I would suggest you to disable the server you are updating at load balancer level. Wait a couple of minutes (depending of your application) before starting your updates. This should "purge" your endpoint. When update is done, update your load balancer again and put back the server in it.
Cloud concept is infrastructure as code: this could be easily scripted and included in you deployment / update procedure.
Another solution would be to use Traffic Manager. It could give you additional option to manage your endpoints (It might be a bit oversized for 2 VM / endpoints).
Last solution is to migrate to a PaaS solution where all this kind of features are already available (Deployment Slot).
Hoping this will help.
Best regards
I'm working on a web application frontend to a legacy system which involves a lot of CPU bound background processing. The application is also stateful on the server side and the domain objects needs to be held in memory across the entire session as the user operates on it via the web based interface. Think of it as something like a web UI front end to photoshop where each filter can take 20-30 seconds to execute on the server side, so the app still has to interact with the user in real time while they wait.
The main problem is that each instance of the server can only support around 4-8 instances of each "workspace" at once and I need to support a few hundreds of concurrent users at once. I'm going to be building this on Amazon EC2 to make use of the auto scaling functionality. So to summarize, the system is:
A web application frontend to a legacy backend system
task performed are CPU bound
Stateful, most calls will be some sort of RPC, the user will make multiple actions that interact with the stateful objects held in server side memory
Most tasks are semi-realtime, where they have to execute for 20-30 seconds and return the results to the user in the same session
Use amazon aws auto scaling
I'm wondering what is the best way to make a system like this distributed.
Obviously I will need a web server to interact with the browser and then send the cpu-bound tasks from the web server to a bunch of dedicated servers that does the background processing. The question is how to best hook up the 2 tiers together for my specific neeeds.
I've been looking at message Queue systems such as rabbitMQ but these seems to be geared towards one time task where any worker node can simply grab a job form a queue, execute it and forget the state. My needs are a little different since there could be multiple 'tasks' that needs to be 'sticky', for example if step 1 is started in node 1 then step 2 for the same workspace has to go to the same worker process.
Another problem I see is that most worker queue systems seems to be geared towards background tasks that can be processed anytime rather than a system that has to provide user feedback that I'm dealing with.
My question is, is there an off the shelf solution for something like this that will allow me to easily build a system that can scale? Would love to hear your thoughts.
RabbitMQ is has an RPC tutorial. I haven't used this pattern in particular but I am running RabbitMQ on a couple of nodes and it can handle hundreds of connections and millions of messages. With a little work in monitoring you can detect when there is more work to do then you have consumers for. Messages can also timeout so queues won't backup too greatly. To scale out capacity you can create multiple RabbitMQ nodes/clusters. You could have multiple rounds of RPC so that after the first response you include the information required to get second message to the correct destination.
0MQ has this as a basic pattern which will fanout work as needed. I've only played with this but it is simpler to code and possibly simpler to maintain (as it doesn't need a broker, devices can provide one though). This may not handle stickiness by default but it should be possible to write your own routing layer to handle it.
Don't discount HTTP for this as well. When you want request/reply, a strict throughput per backend node, and something that scales well, HTTP is well supported. With AWS you can use their ELB easily in front of an autoscaling group to provide the routing from frontend to backend. ELB supports sticky sessions as well.
I'm a big fan of RabbitMQ but if this is the whole scope then HTTP would work nicely and have fewer moving parts in AWS than the other solutions.
I've been looking at high availability solutions such as heartbeat, and keepalived to failover when an haproxy load balancer goes down. I realised that although we would like high availability it's not really a requirement at this point in time to do it to the extent of the expenditure on having 2 load balancer instances running at any one time so that we get instant failover (particularly as one lb is going to be redundant in our setup).
My alternate solution is to fire up a new load balancer EC2 instance from an AMI if the current load balancer has stopped working and associate it to the elastic ip that our domain name points to. This should ensure that downtime is limited to the time it takes to fire up the new instance and associate the elastic ip, which given our current circumstance seems like a reasonably cost effective solution to high availability, particularly as we can easily do it multi-av zone. I am looking to do this using the following steps:
Prepare an AMI of the load balancer
Fire up a single ec2 instance acting as the load balancer and assign the Elastic IP to it
Have a micro server ping the current load balancer at regular intervals (we always have an extra micro server running anyway)
If the ping times out, fire up a new EC2 instance using the load balancer AMI
Associate the elastic ip to the new instance
Shut down the old load balancer instance
Repeat step 3 onwards with the new instance
I know how to run the commands in my script to start up and shut down EC2 instances, associate the elastic IP address to an instance, and ping the server.
My question is what would be a suitable ping here? Would a standard ping suffice at regular intervals, and what would be a good interval? Or is this a rather simplistic approach and there is a smarter health check that I should be doing?
Also if anyone foresees any problems with this approach please feel free to comment
I understand exactly where you're coming from, my company is in the same position. We care about having a highly available fault tolerant system however the overhead cost simply isn't viable for the traffic we get.
One problem I have with your solution is that you're assuming the micro instance and load balancer wont both die at the same time. With my experience with amazon I can tell you it's defiantly possible that this could happen, however unlikely, its possible that whatever causes your load balancer to die also takes down the micro instance.
Another potential problem is you also assume that you will always be able to start another replacement instance during downtime. This is simply not the case, take for example an outage amazon had in their us-east-1 region a few days ago. A power outage caused one of their zones to loose power. When they restored power and began to recover the instances their API's were not working properly because of the sheer load. During this time it took almost 1 hour before they were available. If an outage like this knocks out your load balancer and you're unable to start another you'll be down.
That being said. I find the ELB's provided by amazon are a better solution for me. I'm not sure what the reasoning is behind using HAProxy but I recommend investigating the ELB's as they will allow you to do things such as auto-scaling etc.
For each ELB you create amazon creates one load balancer in each zone that has an instance registered. These are still vulnerable to certain problems during severe outages at amazon like the one described above. For example during this downtime I could not add new instances to the load balancers but my current instances ( the ones not affected by the power outage ) were still serving requests.
UPDATE 2013-09-30
Recently we've changed our infrastructure to use a combination of ELB and HAProxy. I find that ELB gives the best availability but the fact that it uses DNS load balancing doesn't work well for my application. So our setup is ELB in front of a 2 node HAProxy cluster. Using this tool HAProxyCloud I created for AWS I can easily add auto scaling groups to the HAProxy servers.
I know this is a little old, but the solution you suggest is overcomplicated, there's a much simpler method that does exactly what you're trying to accomplish...
Just put your HAProxy machine, with your custom AMI in an auto-scaling group with a minimum AND maximum of 1 instance. That way when your instance goes down the ASG will bring it right back up, EIP and all. No external monitoring necessary, same if not faster response to downed instances.
I'm very sure this problem has been solved, but I can't find any information anywhere about it...
How do sysadmins programmatically add a new node to an existing and running load balancer ? Let's say I have a load balancer running and already balancing say my API server between two EC2 instances, and suddenly there's a traffic spike and I need a third node in the load balancer but I'm asleep... It would be wonderful if I had something monitoring probably RAM usage and some key performance indicators that tell me when I should have another node, and even better if it could add a new node to the load balancer alone...
I'm confident that this is possible and even trivial to do with node-http-proxy and distribute, but I'd like to know if this is possible to do with HAproxy and/or Nginx... I know Amazon's elastic load balancing is probably my best bet but I want to do it on my own (I want to spawn instances from rackspace, EC2, Joyent and probably others as it's convenient).
Once again, spawning a node is easy, I'd like to know how to add it to haproxy.cfg or something similar with Nginx without having to reload the whole proxy, and doing that programatically. Bash scripting is my best bet for this but it still does have to reload the whole proxy which is bad because it loses connections...
You have a few questions in there. For the "add nodes to haproxy without restarting it":
What I do for a similar problem is prepopulate the config file with server names.. e.g. web01, web02 ... web20 even if I only have 5 web servers at the time. Then in my hosts file I map those to the actual ips of the web servers.
To add a new server, you just create an entry for it in the hosts file and it will start passing health checks and get added.
For automated orchestration, it really depends on your environment and you'll probably have to write something custom that fits your needs. There are paid solutions (Scalr comes to mind) to handle orchestration too.
What I do: I have a line in my backend section in haproxy.cfg which says:
# new webservers here
And with a sed script I update haproxy.cfg with something like:
sed -i -e "/new webservers here/a\ server $ip_address $ip_address:12080 check maxconn 28 weight 100"
And then reload haproxy. Works transparently.
HAProxy has a Runtime API that allows you to do just that dynamically.
Please read the official documentation:
Dynamic Configuration HAProxy Runtime API