What is the limit on the number of servers in a Consul cluster?

How many servers can Consul handle? Is 10,000 possible? 100,000?
I have an idea to implement a configuration management system using it, and there are going to be a lot of nodes.

There is no hard upper limit, as James Phillips has answered here:
As long as your servers can handle the read/write loads you should be fine with thousands of agents. We have folks in the tens of thousands of agents territory in a single datacenter.
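To put that in concrete terms, here is a minimal sketch (not from the original answer; the host, port, and use of the requests library are assumptions) that asks a local Consul agent's HTTP catalog API how many nodes the cluster currently knows about. Reads like this are exactly the load your servers need to absorb at scale:

```python
# Hypothetical sketch: count the nodes a Consul cluster currently knows
# about via the HTTP catalog API (a local agent on the default port 8500
# is assumed).
import requests

CONSUL = "http://localhost:8500"

def node_count():
    # /v1/catalog/nodes returns one JSON object per registered node
    resp = requests.get(f"{CONSUL}/v1/catalog/nodes", timeout=5)
    resp.raise_for_status()
    return len(resp.json())

if __name__ == "__main__":
    print(f"Cluster currently has {node_count()} nodes")
```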

Related

How to prevent being affected by data-center DDoS attacks & maintenance-related downtime?

I'm hosting a web application which should be highly available. I'm hosting on multiple Linodes and using a NodeBalancer to distribute the traffic. My question might be stupidly simple, but not long ago I was affected by a DDoS hitting the data center, and that made me think about how I can be better prepared next time this happens.
The NodeBalancer and servers are all in the same data center, which should, of course, be fixed. But how does one go about doing this? If I have two load balancers in two different data centers, how can I set up the domain to point to both, but ignore the one affected by the DDoS? Should I look into the DNS manager? Am I making things too complicated?
Really would appreciate some insights.
Thanks everyone...
You have to look at ways to load balance across data centers. There are a few ways to do this, each with pros and cons.
If you have a lot of DB calls, running two data centers hot can introduce a lot of latency problems. What I would do is as follows.
Have the second data center (DC2) be a warm location. It is configured for everything to work and is constantly getting data from the master DB in DC1, but isn't actively receiving traffic.
Use a service like CloudFlare for their extremely fast DNS switching. Have a service in DC2 that constantly pings the load balancer in DC1 to make sure that everything is up and well. When it has trouble contacting DC1, it can connect to CloudFlare via the API and switch the main 'A' record to point to DC2, which then picks up the traffic.
I forget what CloudFlare calls it, but they have a DNS feature that allows you to switch 'A' records almost instantly, because the actual IP address given to the public is their own; they just route the traffic for you. A sketch of such a watchdog is below.
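As a hedged illustration of that watchdog (not the answerer's actual setup; the health URL, zone/record IDs, API token, and IPs are all placeholders), a Python sketch against CloudFlare's v4 DNS records API might look like this:

```python
# Runs in DC2: health-checks the DC1 load balancer and repoints the A record
# via the CloudFlare v4 API when DC1 stops responding. All identifiers below
# are placeholders you would fill in.
import time
import requests

DC1_HEALTH_URL = "http://dc1-lb.example.com/health"   # hypothetical
DC2_IP = "203.0.113.20"                               # hypothetical DC2 address
CF_ZONE_ID = "your-zone-id"
CF_RECORD_ID = "your-a-record-id"
CF_TOKEN = "your-api-token"

def dc1_is_up():
    try:
        return requests.get(DC1_HEALTH_URL, timeout=3).status_code == 200
    except requests.RequestException:
        return False

def dc1_definitely_down(checks=3, pause=5):
    # Require several consecutive failures so one dropped packet
    # doesn't flip DNS.
    for _ in range(checks):
        if dc1_is_up():
            return False
        time.sleep(pause)
    return True

def point_a_record_at(ip):
    url = (f"https://api.cloudflare.com/client/v4/zones/"
           f"{CF_ZONE_ID}/dns_records/{CF_RECORD_ID}")
    resp = requests.put(
        url,
        headers={"Authorization": f"Bearer {CF_TOKEN}"},
        json={"type": "A", "name": "example.com", "content": ip, "proxied": True},
        timeout=10,
    )
    resp.raise_for_status()

while True:
    if dc1_definitely_down():
        point_a_record_at(DC2_IP)
        break  # fail back manually once DC1 recovers
    time.sleep(30)
```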
Amazon also has a similar feature, I believe: Route 53 DNS failover with health checks.
This plan is costly, however, as you're running much more infrastructure that rarely gets used. Linode is rolling out more network improvements, so hopefully this becomes less necessary.
For more advanced load balancing and HA, you can go with more "cloud" providers, but it does come at a cost.
-Ricardo
Developer Evangelist, CircleCI, formerly Linode

How to achieve high availability?

I'm about to build a new system and I want maximum availability! I'll have to use Windows!
I will have clients talking to my system using web services. I'll also get data from surrounding systems. This data is delivered using messaging: MQSeries and MSMQ.
The system will produce some data that is sent back to the surrounding systems using queues.
After new data has come into the system, different processes will use this data to do different tasks, like printing, writing to databases, etc.
To achieve high availability, I'm planning to have two versions of the system running in parallel on two different machines. The clients will try to use the first server that responds correctly.
I think an ideal solution would be that the incoming data from either of the two servers is placed in a COMMON queue (on a third machine?). Data in the queue can be picked up by processes on both servers (think producer-consumer pattern).
I think that maybe NServiceBus will suit my needs. I have a few questions regarding the above.
Can a queue be shared between two servers? I don't want data to be stuck on a server if it goes down. In that case I want the other server to keep processing.
Can two (or more) "consumers"/processes on different machines pick data from a common queue?
Any advice is welcome!
The purpose of the NSB distributor is not to address availability issues but to address scale issues; distributors help scale out systems at a low cost.
Looking at the description, your system consists of web service endpoints, multiple databases, and queuing infrastructure. If you want to achieve complete high availability, you will have to make sure there are no single points of failure. In order to do that you will need:
A load-balanced web farm for the web service endpoints (2 or more servers)
An application cluster for the queues and the applications that rely on those queues
A highly available database server, again clustered
On top of everything, a good SAN
But if you are referring to being available to consumers, you just have to make sure the target queues and web service endpoints are available, and that the overall architecture promotes deferred execution.
Two or more applications can read an MSMQ queue remotely, but that's something you don't want to do, since it's based on DTC, and that's a real performance killer. A broker-based alternative that does this cleanly is sketched below.
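To make the competing-consumers idea concrete without remote MSMQ reads, here is a hedged Python sketch against a broker-based queue (RabbitMQ via the pika library; the host and queue name are assumptions, and this is an illustration of the pattern, not the NServiceBus setup itself):

```python
# Competing consumers over one shared, durable queue: run this same script
# on both servers, and the broker hands each message to exactly one of them.
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters(host="queue-host"))
channel = conn.channel()
channel.queue_declare(queue="work", durable=True)  # survives broker restarts
channel.basic_qos(prefetch_count=1)  # give each consumer one message at a time

def handle(ch, method, properties, body):
    print(f"processing {body!r}")  # printing, DB writes, etc. go here
    ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after success

channel.basic_consume(queue="work", on_message_callback=handle)
channel.start_consuming()
```

With prefetch_count=1 and explicit acks, a message stays on the queue until some consumer finishes it, so a server going down mid-task just means the other server picks the message up.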
Some references:
http://blogs.msdn.com/b/clustering/archive/2012/05/01/10299698.aspx
http://msdn.microsoft.com/en-us/library/ms190202.aspx
In short, you will want to use the distributor: http://support.nservicebus.com/customer/portal/articles/859556-load-balancing-with-the-distributor
The key thing here is that the distributor node is a single point of failure, so you want to run it on a cluster.

Scaling Tigase XMPP server on Amazon EC2

Does anyone have experience running clustered Tigase XMPP servers on Amazon's EC2? Primarily I wish to know about anything that might trip me up that is non-obvious. (For example, apparently running ejabberd on EC2 can cause issues due to Mnesia.)
Or if you have any general advice for installing and running Tigase on Ubuntu.
Extra information:
The system I’m developing uses XMPP just to communicate (in near real-time) between a mobile app and the server(s).
The number of users will initially be small, but hopefully will grow. This is why the system needs to be scalable. Presumably for just a few thousand users you wouldn't need a cc1.4xlarge EC2 instance? (Otherwise this is going to be very expensive to run!)
I plan on using a MySQL database hosted in Amazon RDS for the XMPP server database.
I also plan on creating an external XMPP component written in Python, using SleekXMPP. It will be this external component that does all the 'work' of the server, as the application I'm making is quite different from instant messaging. For this part I have not worked out how to connect an external XMPP component written in Python to a Tigase server. The documentation seems to suggest that components are written specifically for Tigase, and not for a general XMPP server using XEP-0114 (Jabber Component Protocol), as I expected.
With this extra information, if you can think of anything else I should know about I’d be glad to know.
Thank you :)
I have lots of experience. I think there are a load of non-obvious problems, such as the fact that the only reliable instance type for running an application like Tigase is cc1.4xlarge. Others cause problems with CPU availability, and it is a lottery whether you are lucky enough to run your service on a server that is not busy with other people's work.
Also, you need an instance with the highest possible I/O to make sure it can cope with the network traffic. The high I/O requirement applies especially to the database instance.
Not sure if this is obvious or not, but there is a problem with hostnames on EC2: every time you start an instance, the hostname and IP address change. A Tigase cluster is quite sensitive to hostnames. There is a way to force/change the hostname for an instance, so this might be a way around the problem.
Of course, I am talking about a cluster for millions of online users and really high traffic: 100k XMPP packets per second or more. Generally, for a large installation it is way cheaper and more efficient to have dedicated servers.
Generally Tigase runs very well on Amazon EC2, but you really need the latest SVN code, as it has lots of optimizations added, especially after tests on the cloud. If you provide some more details about your service, I may have more suggestions.
More comments:
When it comes to costs, a dedicated server is always the cheaper option for a constantly running service. Unless you plan to switch servers on/off on an hourly basis, I would recommend going for a dedicated service. Costs are lower and performance is way more predictable.
However, if you really want/need to stick to Amazon EC2, let me give you some concrete numbers. Below is a list of instance configurations and how many online users the cluster was able to reliably handle:
5*cc1.4xlarge - 1.7 million online users
1*c1.xlarge - 118k online users
2*c1.xlarge - 127k online users
2*m2.4xlarge (with 5GB RAM for Tigase) - 236k online users
2*m2.4xlarge (with 20GB RAM for Tigase) - 315k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 400k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 312k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 327k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 280k online users
A few more comments:
Why does the amount of memory matter so much? Because CPU power is very unreliable and inconsistent on all but cc1.4xlarge instances. You have 8 virtual CPUs, but if you look at the top command you often see one CPU working and the rest idle. This insufficient CPU power causes internal queues in Tigase to grow; when the CPU power comes back, Tigase can process the waiting packets. The more memory Tigase has, the more packets can be queued and the better it handles CPU deficiencies.
Why is 5*m2.4xlarge listed four times? Because I repeated the tests many times, on different days and at different times of day. As you can see, depending on the time and date, the system could handle a different load. I guess this is because the Tigase instance shared CPU power with some other services; if they were busy, Tigase suffered from a shortage of CPU.
That said, I think with an installation of up to 10k online users you should be fine. However, other factors like roster size matter greatly, as they affect traffic and load. Also, if you have other elements which generate significant traffic, this will put load on your system.
In any case, without some tests it is impossible to tell how your system will really behave or whether it can handle the load.
And the last question, regarding the component:
Of course Tigase does support XEP-0114 and XEP-0225 for connecting external components, so components written in different languages should not be a problem (a minimal Python sketch is below). On the other hand, I recommend using Tigase's API for writing components. They can be deployed either as internal Tigase components or as external components, and this is transparent to the developer; you do not have to worry about it at development time. This is part of the API and framework.
Also, you can use all the goodies from the Tigase framework: scripting capabilities, monitoring, statistics, and much easier development, as you can easily deploy your code as an internal component for tests.
You really do not have to worry about any XMPP-specific stuff; you just fill in the body of the processPacket(...) method and that's it.
There should be enough online documentation for all of this on the Tigase website.
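For the Python/SleekXMPP route the question asks about, a minimal external-component sketch over XEP-0114 might look like the following. The component JID, shared secret, host, and port are placeholders (5270 is assumed here for Tigase's external component port; check your Tigase configuration):

```python
# Minimal XEP-0114 external component using SleekXMPP's ComponentXMPP.
from sleekxmpp.componentxmpp import ComponentXMPP

class Worker(ComponentXMPP):
    def __init__(self):
        super(Worker, self).__init__(
            "worker.example.com",  # component JID registered in Tigase
            "shared-secret",       # must match Tigase's component config
            "tigase.example.com",  # server host
            5270,                  # external component port (assumed)
        )
        self.add_event_handler("message", self.on_message)

    def on_message(self, msg):
        # All the app-specific 'work' of the server would go here.
        msg.reply("got it").send()

if __name__ == "__main__":
    xmpp = Worker()
    xmpp.connect()
    xmpp.process(block=True)
```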
Also, I would suggest reading about Python's support for multi-threading (in particular the GIL) and how it behaves under a very high load. It used to be not so great.

High Performance Highly Available Tracking System

I currently run a tracking service that records visits from various sources. At times we record the visits and redirect to our clients, or we let clients call us to report visits. The architecture is two worker boxes configured behind a load balancer. This system is set up on Amazon EC2, and the load balancer used is Amazon's Elastic Load Balancer (ELB).
I did some benchmarking tests and have noticed significant network latencies. The traffic through the load balancer suffers at least twice the delay of hitting any of the boxes directly.
Has anyone experienced such an issue and attempted to solve it? Is this an Amazon EC2-specific issue?
Is there any other architecture in use that would lower my network latencies significantly, e.g. an HA setup such that traffic needn't go through a load balancer but instead hits the endpoint servers directly? Before I start investing time in that, I wanted to hear what others think.
Thanks a lot for your time,
Santosh
Change your LB and give it another try. HAProxy is a great session/cookie-aware L7 balancer and can be set up in the Amazon cloud, AFAIK. See this: http://agiletesting.blogspot.com/2009/02/load-balancing-in-amazon-ec2-with.html
You have to take into account that ELBs perform better after a while than they do initially. Don't ask me why, but that's how it is (load balancer warming?).
It also really depends on how much traffic you send through the ELB. Keep in mind that the hardware an ELB is provisioned on seems to be like a regular small instance, so throughput is capped at ~25 Mbit/s (last time I checked). If you require more, go dedicated.
In the end, I too would suggest that you look at HAProxy on a dedicated instance. I'd expect some delay, but 2x more delay sounds unreal. Maybe use another small instance and benchmark it directly against the ELB (a sketch of such a benchmark is below), and then try a c1.medium.
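A minimal benchmark along those lines (the URLs are placeholders, and a cheap /ping endpoint on the app plus the requests library are assumed) could be:

```python
# Time the same request against the ELB hostname and against one backend
# directly, and compare medians.
import time
import requests

TARGETS = {
    "via ELB": "http://my-elb-1234.us-east-1.elb.amazonaws.com/ping",
    "direct":  "http://ec2-203-0-113-10.compute-1.amazonaws.com/ping",
}

for label, url in TARGETS.items():
    samples = []
    for _ in range(50):
        start = time.perf_counter()
        requests.get(url, timeout=5)
        samples.append(time.perf_counter() - start)
    samples.sort()
    # The median is less noisy than the mean for latency measurements.
    print(f"{label}: median {samples[len(samples) // 2] * 1000:.1f} ms")
```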

Is it possible to rent CPU cycles?

I have an application that takes days to process data. Is there a service that would let me run my application on powerful computers?
I'm not running a website or a web service. This is taking lots and lots of data files, running them through a big custom application, and outputting a result.
It takes days on my PC and it's something that needs to be done every once in a while, but not continuously.
Cost isn't really an issue, in the sense that my company will pay for it, but of course it should be cheaper than buying a big-ass machine ourselves.
Have you considered Amazon EC2? You pay by the hour for what you use. No more, no less. You could even rent many servers at once to split the workload.
I'm not sure if that meets your requirement of "powerful computers", because they're just average servers, but at least it will give you a pay-as-you-go solution for running the program off your own computer.
Amazon's EC2 Service is an excellent solution for your needs. You only pay for the time you use, and you can scale up to as many machines as you need.
From their information:
Elastic – Amazon EC2 enables you to increase or decrease capacity within minutes, not hours or days. You can commission one, hundreds or even thousands of server instances simultaneously. Of course, because this is all controlled with web service APIs, your application can automatically scale itself up and down depending on its needs.
Flexible – You have the choice of multiple instance types, operating systems, and software packages. Amazon EC2 allows you to select a configuration of memory, CPU, and instance storage that is optimal for your choice of operating system and application. For example, your choice of operating systems includes numerous Linux distributions, Microsoft Windows Server and OpenSolaris.
If your application is not parallel, you won't gain much by running it on a "big machine", unless the bottleneck is virtual memory swapping. Even the Top500 supercomputers are not essentially faster than any PC for sequential workloads.
If your application can exploit parallelism, maybe you could use your company's existing resources more efficiently than just deploying it on one and only one PC. If you have a few dozen computers, you could set up a loosely coupled heterogeneous cluster (or local grid; terminology changes with fashion). A sketch of the single-machine starting point is below.
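As a hedged sketch of that parallelism point (assuming the per-file work is independent; process_one and the data/*.dat layout are stand-ins for the real computation and inputs), Python's multiprocessing pool spreads the files across all local cores:

```python
# Process independent input files in parallel, one worker per CPU core.
from multiprocessing import Pool
from pathlib import Path

def process_one(path):
    # Placeholder for the real per-file computation.
    return path.name, sum(len(line) for line in path.open())

if __name__ == "__main__":
    files = list(Path("data").glob("*.dat"))  # assumed input layout
    with Pool() as pool:
        for name, result in pool.map(process_one, files):
            print(name, result)
```

The same decomposition is what lets you fan the work out across many EC2 instances or a local grid: each machine just takes a different slice of the file list.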
I recommend CPUsage.
It is a "startup" in grid computing.
Its specialty is that any individual can join the grid with spare CPU cycles. That makes grid management cheap, and thus grid usage prices are also very cheap.
They have an API which, if you integrate it into your program, will let it run on their system.
