What are the minimum machine specifications necessary for Admin and Container processes? - spring-xd

The reference material simply states that JDK7 is required for Spring XD.
What are the minimum requirements (RAM, CPU, Disk) for hosts meant to run Spring XD Admin?
What are the minimum requirements (RAM, CPU, Disk) for hosts meant to run Spring XD Containers?

The answer in both cases is that it depends on what you need to use them for. Spring XD seems to be designed for high throughput computing (HTC), so unlike traditional high performance computing, adding GPUs or coprocessors would probably not be particularly beneficial. If you just want to try it out and happen to have several servers lying around, anything powerful enough to run an OS that supports Java should at least make it work. That is enough for the initial stage of testing whether Spring XD will integrate with your existing infrastructure. If you have passed that stage, are confident that Spring XD will work, and would like to purchase hardware to optimize its performance, feel free to continue reading.
I have not used Spring XD before, but based on the documentation I have been reading and some experience with HTC, there are a few considerations for setting up systems to run it. If you take a look at the diagram from the docs and read a little about the services, it seems the Admin, ZooKeeper, Analytics Repo and Batch Job DB could be hosted on virtual machines (VMs) under the hypervisor of your choice.
Running several of the subsystems required by the distributed model on VMs would let you scale resources as necessary. For example, a single hypervisor may be sufficient to run everything at first, but as traffic grows it may be desirable to spread the VMs across multiple hypervisors and give some of them additional resources.
The containers look like many other virtualization or containerization schemes for HTC: more powerful systems (lots of RAM, SSD storage) let you run more containers on a single physical box.
To adequately assess the needs of a new system running any application, it is important to understand what the limiting factor is: is the workload memory bound, IO bound or CPU bound? For large scale parallel applications there are a variety of tools for profiling code and determining where bottlenecks occur. TAU is a common profiling utility in HPC and there are several proprietary offerings available as well.
Once the limitations of the program are clear, speccing out a system with hardware that reduces the bottleneck is a lot easier, and normally less expensive. Hopefully this information is helpful.
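As a very rough first check before reaching for a real profiler, you can compare CPU time with wall-clock time for one run of a workload. This is only a sketch: the function names and the 0.8 threshold are my own assumptions, and it cannot tell memory bound from IO bound, it only shows whether the process spent most of its time computing or waiting.

```python
import time


def classify_workload(func, *args, cpu_ratio_threshold=0.8):
    """Run func once and compare CPU time with wall-clock time.

    The 0.8 threshold is an arbitrary value chosen for illustration.
    Returns (label, ratio) where ratio = cpu_seconds / wall_seconds.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    ratio = cpu_used / wall_used if wall_used > 0 else 0.0
    label = "CPU bound" if ratio >= cpu_ratio_threshold else "not CPU bound"
    return label, ratio


def busy_loop(n):
    """Stand-in for a compute-heavy job."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def mostly_waiting(seconds):
    """Stand-in for a job that waits on disk or network."""
    time.sleep(seconds)


if __name__ == "__main__":
    print(classify_workload(busy_loop, 2_000_000))
    print(classify_workload(mostly_waiting, 0.2))
```

A ratio close to 1 suggests faster cores would help; a low ratio suggests looking at disk, network or memory pressure first.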
Additions based on comments:
It seems like it would run with 128k of memory if you have an OS that will boot and run Java and any other requirements. If backend storage is set up elsewhere, such as a standalone DB server that can host the databases described in the DB Config section of the guide, only a small amount of local storage should be necessary.
Depending on how you deploy the images for the Admin OS, even that may not be necessary, as you could use KIWI to create and deploy a custom OS image of your choosing with configuration files and other customizations embedded in the image. This image could be loaded via the network over PXE or written to one of the other output formats KIWI supports, such as VMs, bootable USB and more.
The exact configuration of the systems running Spring XD will depend on the end goals, available infrastructure and a number of other things. It seems like the Spring XD Admin node could be run on most infrastructure servers. Factors such as reliability, stability and desired performance must also be considered when choosing hardware.
Q: Will Spring XD Admin run on a system with RaspberryPi like specs?
A: Based on the documentation, yes.
Q: Will it run with good performance or reliably on such a system?
A: Probably not if being used for extended periods of time or for large amounts of traffic.

Related

Apache Aurora GPU Resources

I am checking out Apache Aurora with the scope of running scientific workflows (assuming a set of python scripts in a particular sequence). I've successfully managed to run a few of these aurora Jobs, and it looks great for my particular use-case.
I was wondering if there is a way to specify that a particular task (or job, in general) requires a number of GPU resources from my Apache Mesos cluster. Of course Mesos needs to be aware of the GPU resources first, and it seems this is possible by defining these GPU resources as indicated here.
So the question is whether there is a way to communicate with Mesos via Aurora to accept offers with GPU resources available. As far as I can tell, the Resource object in Aurora is limited to CPU/Ram/Disk resources. Any hints are greatly appreciated.
Thanks!
I'm not familiar with Apache Aurora, but Mesosphere Marathon (a framework similar to Aurora in functionality) is limited to cpu, mem, and disk resources as well.
If you would like to use custom resources, you would probably need to write your own framework. Depending on your needs it may not be that difficult. For inspiration, check the RENDLER framework.
As mentioned in the thread you are referencing, Mesos does not provide isolation for GPU (actually, for any custom) resources. Keep this in mind when doing resource math.
From checking the Aurora tutorial, I assume you can just specify this resource as part of your job description:
resources = Resources(cpu = 2, ram = 4*GB, disk = 8*GB, gpu = 1),
Just keep in mind that this is an artificial resource for Mesos, so Mesos will not take care of resource isolation in this case. For example, if you have several GPUs on one system, your code would have to manage the isolation/scheduling between the different GPUs.
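To illustrate what managing the GPUs yourself could look like, here is a minimal sketch. GpuPool and env_for_gpu are hypothetical helpers of mine, not part of Mesos or Aurora, and the pool only coordinates tasks started from a single launcher process. CUDA_VISIBLE_DEVICES is the environment variable the NVIDIA CUDA runtime uses to restrict which devices a process sees; other hardware would need a different mechanism.

```python
import os
import threading


class GpuPool:
    """Hypothetical helper: hands out GPU indices so tasks do not share a device."""

    def __init__(self, gpu_count):
        self._free = list(range(gpu_count))
        self._lock = threading.Lock()

    def acquire(self):
        """Return a free GPU index, or None if every device is in use."""
        with self._lock:
            return self._free.pop(0) if self._free else None

    def release(self, index):
        """Give a GPU index back to the pool."""
        with self._lock:
            if index not in self._free:
                self._free.append(index)
                self._free.sort()


def env_for_gpu(index):
    """Build a child-process environment that exposes only one GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(index)
    return env


if __name__ == "__main__":
    pool = GpuPool(gpu_count=2)  # assumed: two GPUs on this host
    gpu = pool.acquire()
    print("task gets GPU", gpu, "->", env_for_gpu(gpu)["CUDA_VISIBLE_DEVICES"])
    pool.release(gpu)
```

The returned environment would be passed to whatever starts the task, for example the env argument of subprocess.Popen.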

Mesos real world use-cases

I'm trying to figure out what would be the reasons for using Mesos. Can you come up with other ones?
Running all of your services in the same cluster instead of dedicated clusters (your end-applications + DevOps such as Jenkins)
Running different maturity applications in same cluster (dev, test, production), or is this viable? Kubernetes has a similar approach with Labels
Mesos simplifies the use of traditional distributed applications such as Hadoop by easing deployment, unified API, bin-packing of resources
Full-disclosure: I currently work at Twitter and I'm involved in both Apache Mesos and Aurora.
Mesos use cases can vary along a few dimensions: scale (10 servers vs 10s of thousands), available hardware (dedicated/static or in the public cloud/scalable), and workloads (primarily services, batch, or both).
Your list is a great start. Here are a few additional use cases / features to add.
Container Orchestration
As container runtimes like Docker have become popular, lots of potential users are looking at Mesos + a scheduler to manage orchestration once container images are created. Mesos is already quite mature and has been proven at scale, which I think has given it a leg up over some emergent solutions.
Increased Resource Utilization
For companies running >50 servers, a common motivation for adopting Mesos is to increase resource utilization to reduce CapEx. There are a number of examples of this in both the public and private cloud. eBay has been running Jenkins on Mesos and was able to reduce its VM footprint. Mesosphere has also published a case study of HubSpot (running on AWS), and how they've been able to replace hundreds of smaller servers with dozens of larger ones by using their available hardware more efficiently.
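To see why packing many small workloads onto shared machines needs fewer servers than giving each its own box, here is a toy first-fit-decreasing sketch. It is a textbook bin-packing heuristic with made-up CPU demands, not how Mesos or any particular scheduler actually places tasks.

```python
def first_fit_decreasing(task_cpus, server_cpus):
    """Pack tasks (CPU demand each) onto servers of size server_cpus.

    Returns a list of servers, each a list of the task sizes placed on it.
    """
    servers = []
    for demand in sorted(task_cpus, reverse=True):
        if demand > server_cpus:
            raise ValueError("task does not fit on any server")
        for placed in servers:
            # Put the task on the first server that still has room.
            if sum(placed) + demand <= server_cpus:
                placed.append(demand)
                break
        else:
            # No existing server had room, so start a new one.
            servers.append([demand])
    return servers


if __name__ == "__main__":
    tasks = [2, 1, 3, 1, 2, 1, 4, 2]  # hypothetical CPU demands
    packed = first_fit_decreasing(tasks, server_cpus=8)
    print(len(tasks), "dedicated servers ->", len(packed), "shared servers")
```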
Preemption
At Twitter we're running Mesos via one scheduler: Apache Aurora. One of the ways we can improve utilization relates to your use case: running different maturity applications in the same cluster. Aurora has a concept of environments, so you can run applications that are production, development, or test. Additionally, Aurora has a built-in preemption feature which allows it to prioritize production over non-production tasks, killing non-production tasks when those resources are needed to run production ones as well as a priority system within each environment.
Long-term, functionality related to preemption will also be located in the Mesos core itself -- it's a killer feature related to both increased resource utilization and running different maturity applications (dev, test, prod). There are a few Mesos tickets to follow if you're interested in keeping up to date, including MESOS-155 for preemption, and MESOS-1474 for inverse offers.
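The idea of production work preempting non-production work can be sketched roughly as below. This is my own simplified illustration, not Aurora's actual algorithm; the Task fields and the pick_victims function are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    production: bool
    priority: int  # higher wins among tasks of the same class
    cpus: int


def pick_victims(running, incoming, free_cpus):
    """Return the tasks to kill so that `incoming` fits, or None if it cannot."""
    if incoming.cpus <= free_cpus:
        return []
    # Production may preempt any non-production task; otherwise a task may
    # only preempt lower-priority tasks of its own class.
    candidates = [
        t for t in running
        if (incoming.production and not t.production)
        or (t.production == incoming.production and t.priority < incoming.priority)
    ]
    # Kill the least important work first.
    candidates.sort(key=lambda t: (t.production, t.priority))
    victims, reclaimed = [], free_cpus
    for t in candidates:
        if reclaimed >= incoming.cpus:
            break
        victims.append(t)
        reclaimed += t.cpus
    return victims if reclaimed >= incoming.cpus else None


if __name__ == "__main__":
    running = [
        Task("web", True, 10, 4),
        Task("dev-job", False, 5, 2),
        Task("test-job", False, 1, 2),
    ]
    victims = pick_victims(running, Task("api", True, 5, 3), free_cpus=0)
    print([t.name for t in victims])
```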
Colocating Batch and Services
Running batch and services in a shared Mesos cluster will be key to driving up utilization even further as js84 points out. Check out Project Myriad, an effort to colocate Mesos and YARN workloads in the same cluster. At this time I'm not aware of any large deployments running both batch and services, but it's certainly the direction the community is moving in as it becomes easier for multiple frameworks to run in a shared cluster.
At least one additional use case comes to mind: a development SDK for building distributed applications. If you have a look at Mesos Frameworks you will find a number of frameworks which have been developed on top of Mesos. Also interesting is Apple's framework powering Siri.
Regarding your 1): One additional angle you should keep in mind here is scaling your applications within the same cluster. For example, at peak load of your website you can easily shift resources towards the webservers while scaling down the Hadoop analytical processing.

Scaling Tigase XMPP server on Amazon EC2

Does anyone have experience running clustered Tigase XMPP servers on Amazon's EC2? Primarily I wish to know about anything non-obvious that might trip me up. (For example, apparently running Ejabberd on EC2 can cause issues due to Mnesia.)
Or if you have any general advice on installing and running Tigase on Ubuntu.
Extra information:
The system I’m developing uses XMPP just to communicate (in near real-time) between a mobile app and the server(s).
The number of users will initially be small, but hopefully will grow. This is why the system needs to be scalable. Presumably for just a few thousand users you wouldn’t need a cc1.4xlarge EC2 instance? (Otherwise this is going to be very expensive to run!)
I plan on using a MySQL database hosted in Amazon RDS for the XMPP server database.
I also plan on creating an external XMPP component written in Python, using SleekXMPP. It will be this external component that does all the ‘work’ of the server, as the application I’m making is quite different from instant messaging. For this part I have not worked out how to connect an external XMPP component written in Python to a Tigase server. The documentation seems to suggest that components are written specifically for Tigase - and not for a general XMPP server, using XEP-0114: Jabber Component Protocol, as I expected.
With this extra information, if you can think of anything else I should know about I’d be glad to know.
Thank you :)
I have lots of experience, and I think there are a lot of non-obvious problems. For example, the only reliable instance type for running an application like Tigase is cc1.4xlarge. Others cause problems with CPU availability, and it is a lottery whether you are lucky enough to run your service on a server which is not busy with other people's work.
Also, you need an instance with the highest possible I/O to make sure it can cope with the network traffic. High I/O matters especially for the database instance.
Not sure if this is obvious or not, but there is a problem with hostnames on EC2: every time you start an instance, the hostname and IP address change. A Tigase cluster is quite sensitive to hostnames. There is a way to force the hostname for the instance, so this might be a way around the problem.
Of course, I am talking about a cluster for millions of online users and really high traffic, 100k XMPP packets per second or more. Generally, for a large installation it is way cheaper and more efficient to have dedicated servers.
Generally Tigase runs very well on Amazon EC2, but you really need the latest SVN code, as it has lots of optimizations added, especially after tests on the cloud. If you provide some more details about your service I may have some more suggestions.
More comments:
When it comes to costs, a dedicated server is always the cheaper option for a constantly running service. Unless you plan to switch servers on and off on an hourly basis, I would recommend going for a dedicated service. Costs are lower and performance is way more predictable.
However, if you really want/need to stick to Amazon EC2, let me give you some concrete numbers. Below is a list of instances and how many online users the cluster was able to reliably handle:
5*cc1.4xlarge - 1.7 million online users
1*c1.xlarge - 118k online users
2*c1.xlarge - 127k online users
2*m2.4xlarge (with 5GB RAM for Tigase) - 236k online users
2*m2.4xlarge (with 20GB RAM for Tigase) - 315k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 400k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 312k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 327k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 280k online users
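To make the list easier to compare, the figures above can be divided by the number of instances. This is just arithmetic on the numbers already given (the variable and function names are mine); it says nothing about why a given run behaved the way it did.

```python
# (instance type and Tigase RAM, number of instances, online users handled)
RESULTS = [
    ("cc1.4xlarge", 5, 1_700_000),
    ("c1.xlarge", 1, 118_000),
    ("c1.xlarge", 2, 127_000),
    ("m2.4xlarge, 5GB", 2, 236_000),
    ("m2.4xlarge, 20GB", 2, 315_000),
    ("m2.4xlarge, 60GB", 5, 400_000),
    ("m2.4xlarge, 60GB", 5, 312_000),
    ("m2.4xlarge, 60GB", 5, 327_000),
    ("m2.4xlarge, 60GB", 5, 280_000),
]


def users_per_instance(rows):
    """Return (label, instances, users handled per instance) for each run."""
    return [(label, count, users // count) for label, count, users in rows]


if __name__ == "__main__":
    for label, count, per_instance in users_per_instance(RESULTS):
        print(f"{count} x {label}: {per_instance} users per instance")
```

Per instance, the cc1.4xlarge run works out to 340k users, while the two c1.xlarge instances together manage only 63.5k each, so adding instances did not scale the user count proportionally in these tests.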
A few more comments:
Why does the amount of memory matter that much? Because CPU power is very unreliable and inconsistent on all but cc1.4xlarge instances. You have 8 virtual CPUs, but if you look at the top command you often see that one CPU is working and the rest are not. This insufficient CPU power causes Tigase's internal queues to grow. When the CPU power is back, Tigase can process the waiting packets. The more memory Tigase has, the more packets it can queue and the better it handles CPU deficiencies.
Why is 5*m2.4xlarge listed 4 times? Because I repeated the tests many times on different days and at different times of day. As you can see, depending on the time and date the system could handle a different load. I guess this is because the Tigase instance shared CPU power with some other services. If they were busy, Tigase suffered from a lack of CPU power.
That said, I think with an installation of up to 10k online users you should be fine. However, other factors like roster size matter greatly, as they affect traffic and load. Also, if you have other elements which generate significant traffic, this will put load on your system.
In any case, without some tests it is impossible to tell how your system really behaves or whether it can handle the load.
And the last question regarding component:
Of course Tigase supports XEP-0114 and XEP-0225 for connecting external components, so components written in different languages should not be a problem. On the other hand, I recommend using Tigase's API for writing components. They can be deployed either as internal Tigase components or as external components, and this is transparent to the developer: you do not have to worry about it at development time. This is part of the API and framework.
Also, you can use all the goods from the Tigase framework: scripting capabilities, monitoring, statistics, and much easier development, as you can easily deploy your code as an internal component for tests.
You really do not have to worry about any XMPP-specific stuff; you just fill in the body of the processPacket(...) method and that's it.
There should be enough online documentation for all of this on the Tigase website.
Also, I would suggest reading about Python support for multi-threading and how it behaves under a very high load. It used to be not so great.
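If you do go the external component route in Python, the part of XEP-0114 that is specific to components is small: after the server opens the stream, the component proves it knows the shared secret by sending the SHA-1 hash of the stream ID concatenated with that secret, as lowercase hex. A component library would normally do this for you; the sketch below only shows the computation, and the stream ID and secret are made-up placeholders.

```python
import hashlib


def component_handshake(stream_id, shared_secret):
    """Return the hex digest an external component sends in <handshake/>.

    Per XEP-0114: SHA-1 over the stream ID (from the server's stream
    header) concatenated with the shared secret, as lowercase hex.
    """
    data = (stream_id + shared_secret).encode("utf-8")
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    # Both values are placeholders for illustration only.
    digest = component_handshake("3BF96D32", "example-secret")
    print("<handshake>%s</handshake>" % digest)
```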

Is there any easy-to-use cluster building software?

Assume there are several computers, distributed in the same network.
I install a program on all of them, and then there is a cluster.
I can log in to it and run my applications (like a web server, a DB server, and so on).
I don't need to configure the IPs or balance the load.
Is there any software like this now?
edit:
OK, I want to build a cluster that can provide an enterprise web server (and a DB server to store data). We have lots of PCs that are only running a small program now (for shop floor workflow control). I want to use the spare CPU and disk resources to build a service.
What purpose are you planning to serve with your cluster? That will determine the tool you want to use.
That being said, you will have to do some configuration, like IPs, authentication mechanism, et cetera. If you do not tell it what you want, how will it know?
In general, if the application is not designed to be clustered, you will have more pain than advantages.
Is the current load too high for the current single-box hardware?

Is it possible to rent CPU cycles?

I have an application that takes days to process data. Is there a service that would let me run my application on powerful computers?
I'm not running a website or a web service. This is taking lots and lots of data files, running them through a big custom application, and outputting a result.
It takes days on my PC and it's something that needs to be done every once in a while, but not continuously.
Cost isn't really an issue, in the sense that my company will pay for it, but of course it should be cheaper than buying a big-ass machine ourselves.
Have you considered Amazon EC2? You pay by the hour for what you use. No more, no less. You could even rent many servers at once to split the workload.
I'm not sure if that meets your requirement of "powerful computers", because they're just average servers, but at least it will give you a pay-as-you-go solution for running the program off of your own computer.
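To show what splitting the workload does under hourly billing, here is a small sketch. It assumes the work divides evenly across servers and that every started hour is billed in full per server; the 72 machine-hours and the rate of 1 are made-up numbers, so treat the output as relative, not as real prices.

```python
import math


def rental_plan(total_machine_hours, servers, hourly_rate):
    """Return (wall-clock hours, total cost) for an evenly split job.

    Assumes perfectly divisible work and billing rounded up to whole
    hours per server.
    """
    hours_each = math.ceil(total_machine_hours / servers)
    return hours_each, servers * hours_each * hourly_rate


if __name__ == "__main__":
    for n in (1, 4, 16):
        hours, cost = rental_plan(72, n, hourly_rate=1)
        print(n, "servers:", hours, "hours,", cost, "rate units")
```

Under these assumptions the cost stays roughly the same while the wait shrinks; the only extra cost comes from rounding up partial hours.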
Amazon's EC2 Service is an excellent solution for your needs. You only pay for the time you use, and you can scale up to as many machines as you need.
From their information:
Elastic – Amazon EC2 enables you to increase or decrease capacity within minutes, not hours or days. You can commission one, hundreds or even thousands of server instances simultaneously. Of course, because this is all controlled with web service APIs, your application can automatically scale itself up and down depending on its needs.
Flexible – You have the choice of multiple instance types, operating systems, and software packages. Amazon EC2 allows you to select a configuration of memory, CPU, and instance storage that is optimal for your choice of operating system and application. For example, your choice of operating systems includes numerous Linux distributions, Microsoft Windows Server and OpenSolaris.
If your application is not parallel, you won't get much advantage from running it on a "big machine", unless the bottleneck is virtual memory swapping. Even the Top500 supercomputers are not fundamentally faster than a PC for sequential workloads.
If your application can exploit parallelism, maybe you could use your company's existing resources more efficiently than by deploying it on a single PC. If you have a few dozen computers, you could set up a loosely coupled heterogeneous cluster (or local grid; terminology changes with fashion).
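The point about parallelism can be quantified with Amdahl's law: if a fraction p of the run time can be parallelized over n workers, the best possible speedup is 1 / ((1 - p) + p / n). A minimal sketch (the function name and the sample fractions are mine):

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of the work runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.9, 0.99):
        print(f"parallel fraction {p}: "
              f"{amdahl_speedup(p, 32):.2f}x on 32 workers, "
              f"{amdahl_speedup(p, 1000):.2f}x on 1000 workers")
```

A fully sequential program (p = 0) gets no speedup no matter how many machines you rent, and even a half-parallel one can never run more than twice as fast.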
I recommend CPUsage.
It is a "startup" in grid computing.
Its speciality is that any individual can join the grid with spare CPU cycles. That makes grid management cheap, so the usage prices are also very cheap.
They have an API; if you integrate it into your program, it will be able to run on the system.
