Apache Aurora GPU Resources - mesos

I am checking out Apache Aurora with the scope of running scientific workflows (assuming a set of python scripts in a particular sequence). I've successfully managed to run a few of these aurora Jobs, and it looks great for my particular use-case.
I was wondering if there is a way to specify that a particular task (or job, in general) requires a number of GPU resources from my Apache Mesos cluster. Of course, Mesos needs to be aware of the GPU resources first, and it seems this is possible by defining these GPU resources as indicated here.
So the question is whether there is a way to communicate with Mesos via Aurora to accept offers with GPU resources available. As far as I can tell, the Resource object in Aurora is limited to CPU/RAM/Disk resources. Any hints are greatly appreciated.
Thanks!

I'm not familiar with Apache Aurora, but Mesosphere Marathon (a framework similar to Aurora in functionality) is limited to cpu, mem, and disk resources as well.
If you would like to use custom resources, you would probably need to write your own framework. Depending on your needs it may not be that difficult. For inspiration, check the RENDLER framework.
As mentioned in the thread you are referencing, Mesos does not provide isolation for GPU (or, in fact, for any custom) resources. Keep this in mind when doing resource math.

Looking at the Aurora tutorial, I assume you can just specify this resource as part of your job description:
resources = Resources(cpu = 2, ram = 4*GB, disk = 8*GB, gpu = 1),
Just keep in mind that this is an artificial resource for Mesos, so Mesos will not take care of resource isolation in this case. For example, if you have several GPUs on one system, your code would have to manage the isolation/scheduling between the different GPUs.
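To illustrate that last point, here is a minimal sketch of how your own code could hand out GPUs on a host. It assumes NVIDIA GPUs and uses the CUDA_VISIBLE_DEVICES environment variable to restrict a process to one device. The GpuPool class and its method names are hypothetical; they are not part of Aurora or Mesos.

```python
import os


class GpuPool:
    """Hypothetical bookkeeping for the GPUs on one host (not an Aurora/Mesos API)."""

    def __init__(self, gpu_ids):
        self.free = list(gpu_ids)
        self.in_use = {}  # task name -> GPU id

    def acquire(self, task):
        """Reserve a free GPU for `task`; raise if none is left."""
        if not self.free:
            raise RuntimeError("no free GPU on this host")
        gpu = self.free.pop(0)
        self.in_use[task] = gpu
        return gpu

    def release(self, task):
        """Return the GPU held by `task` to the pool."""
        self.free.append(self.in_use.pop(task))

    def env_for(self, task):
        """Copy of the current environment, limited to the task's own GPU."""
        env = dict(os.environ)
        env["CUDA_VISIBLE_DEVICES"] = str(self.in_use[task])
        return env


pool = GpuPool([0, 1])
first = pool.acquire("train-a")
second = pool.acquire("train-b")
env_a = pool.env_for("train-a")
```

The returned environment could then be passed to something like subprocess.run(cmd, env=env_a). Note that this pool only lives inside one process; if several independent tasks land on the same host, they would need some shared coordination (a lock file, for instance), which is not shown here.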

Related

Why does Mesos offer resources?

What is the significance of the decision in Mesos for frameworks to be offered resources by Mesos? This seems to be mentioned a lot, but ultimately all of the logic is in the Mesos allocation module, so whether it's Mesos making and revoking offers, or frameworks asking for resources, is this just a semantic difference?
Interesting question:
The original Mesos paper states the following rationale:
The master implements fine-grained sharing across frameworks using resource offers. Each resource offer is a list of free resources on multiple slaves. The master decides how many resources to offer to each framework according to an organizational policy
Frameworks requesting would have the following consequences:
Frameworks would have to be aware of resources in the cluster (e.g., does the cluster have GPUs)
The logic of choosing the request (by a framework) which should be granted given fairness and existing free cluster resources seems more complex and less scalable than the current allocation mechanisms (Not having hard evidence here, but just a feeling after having touched the Mesos allocator code)
Maybe most interestingly, the Mesos scheduler interface includes a requestResources(const std::vector<Request>& requests) call. The default Mesos DRF allocator does not implement this call, but nothing prevents you from implementing an allocator which does so.
If you are interested in more details about cluster schedulers, I can recommend this blog post or the Omega paper.
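To make the offer model more concrete, here is a rough sketch of the framework side. Plain dictionaries stand in for Mesos offer messages, and the names (NEEDS, fits, handle_offers, the "gpus" resource) are assumptions for illustration only, not the Mesos API. The point is that the framework never has to know up front whether the cluster has GPUs; it simply declines offers that do not carry them.

```python
# Hypothetical task requirements, including a custom "gpus" resource.
NEEDS = {"cpus": 2.0, "mem": 4096.0, "gpus": 1.0}


def fits(offer, needs):
    """True if the offer has at least the needed amount of every resource."""
    return all(offer["resources"].get(name, 0.0) >= amount
               for name, amount in needs.items())


def handle_offers(offers, needs):
    """Split offer ids into those that could host the task and those to decline."""
    accepted, declined = [], []
    for offer in offers:
        (accepted if fits(offer, needs) else declined).append(offer["id"])
    return accepted, declined


offers = [
    {"id": "o1", "resources": {"cpus": 4.0, "mem": 8192.0}},  # no GPUs offered
    {"id": "o2", "resources": {"cpus": 4.0, "mem": 8192.0, "gpus": 2.0}},
]
accepted, declined = handle_offers(offers, NEEDS)
print(accepted, declined)  # ['o2'] ['o1']
```

In a request-based design the same check would have to live in the master, for every framework's requests at once, which is where the extra complexity mentioned above would come from.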
Update:
This MesosCon talk discusses some future extensions to more optimistic offers: http://schd.ws/hosted_files/mesosconna2016/51/MesosCon_2016_OptimisticOffer.pdf

Bluemix Spark and Hadoop Service Configuration

Having run through configuration of both the Hadoop Big Insights and Apache Spark services on Bluemix, I noticed that Hadoop is very configurable. I have a choice of how many nodes there will be in the cluster, the RAM and CPU cores of those nodes, as well as hard disk space.
But the Spark service seems less configurable. The only choice I have is to choose between 2 and 30 Spark executors.
I am working with Bluemix as part of an IBM IC4 project to evaluate these services, so I have a few questions about this.
Is it possible to configure the Spark service in a similar way to the Hadoop service? i.e. choose nodes, RAM of nodes, CPU cores etc.
What are Spark executors in this context? Are they nodes? If so, what are their specifications?
Is there a plan to improve the options for Spark's configuration in the future?
Apologies for the questions but I need to know these specifications in order to carry out my work.
The Big Insights service is what some would call a hosted service. Which is to say, when you provision an instance of this service you get your own cluster with nodes configured as specified in the chosen plan. Consequently, you'll want to know exactly what each node you're paying for gives you. On the other hand, the Apache Spark service is a shared compute service, wherein you pay for compute to run your Spark programs. Running Spark is about in-memory compute, and creating RDDs over sources of data hosted by other data services. So in this context, what matters is how many concurrent jobs you can run, how many parallel tasks you can run with how much memory, and so on. In the Spark service plan, these executors seem to be an abstraction of this compute horsepower; unfortunately, it is hard to map that to physical hardware if you care about that. The plan description needs more elaboration and details about how to translate this abstraction to your workload needs.
However, I understand that this should be improved considerably at some point in the near future. There have been rumors about moving to only a single Spark service plan where you can dial in, whenever you want, how much compute you need and that would take effect when you click "go", for all Spark jobs from that point forward; it seems like you can twiddle the dials until you get what you want, see what that would cost, then lock it in until next time you need to change it. I can imagine something even more dynamic than that on a per-job basis. But anyway, that seems like the direction things may be going for this compute service.

What are the minimum machine specifications necessary for Admin and Container processes?

The reference material simply states that JDK7 is required for Spring XD.
What are the minimum requirements (RAM, CPU, Disk) for hosts meant to run Spring XD Admin?
What are the minimum requirements (RAM, CPU, Disk) for hosts meant to run Spring XD Containers?
The answer in both cases is that it depends on what you need to use them for. It seems like Spring XD is designed for high throughput computing (HTC), so unlike traditional high performance computing, the addition of GPUs or coprocessors in this case would probably not be particularly beneficial. If you just want to try it out and happen to have several servers lying around, it seems like as long as you have something that is powerful enough to run an OS that supports Java you could probably at least make it work. If you are in the initial stages of testing Spring XD to see if it will integrate with your existing infrastructure, this would allow you to at least try it out. If you have passed that stage of testing, are confident that Spring XD will work and would like to purchase hardware to optimize its performance, feel free to continue reading.
I have not used Spring XD before, but based on the documentation I have been reading and some experience with HTC, there are a few considerations for setting up systems to run it. If you take a look at the diagram from the docs and read a little bit about the services, it seems like the Admin, Zookeeper, Analytics Repo and Batch Job DB could be hosted on virtual machines (VMs) under the hypervisor of your choice.
Using a setup with several of the subsystems required to use the distributed model running on VMs would give you the ability to scale resources as necessary, e.g. to begin a single hypervisor system may be sufficient to run everything but as traffic/use grows it may be desirable to separate the VMs onto multiple hypervisors and give some of the VMs additional resources.
With the containers it seems like many other virtualization or containerization schemes for HTC, where more powerful systems e.g. lots of RAM, SSD storage, allow users to run more containers on a single physical box.
To adequately assess the needs for a new system running any application it is important to understand what the limiting factor on the problem is; is it memory bound, IO bound or CPU bound? For large scale parallel applications there are a variety of tools for profiling code and determining where bottlenecks occur. TAU is a common profiling utility in HPC and there are several proprietary offerings available as well.
Once the limitations of the program are clear, speccing out a system with hardware to reduce/minimize the issue is a lot easier, and normally less expensive. Hopefully this information is helpful.
Additions based on comments:
It seems like it would run with 128k of memory if you have an OS that will boot and run Java and any other requirements. If there is a backend storage setup somewhere, like a standalone DB server which can be used for the databases as described in the DB Config section of the guide, it seems like only a small amount of storage would be necessary.
Depending on how you deploy the images for the Admin OS that may not even be necessary as you could use KIWI to create and deploy a custom OS image of your choosing with configuration files and other customizations embedded in the image. This image could be loaded via the network over PXE or to one of the other output formats KIWI supports like VMs, bootable USB and more.
The exact configuration of the systems running Spring XD will depend on the end goals, available infrastructure and a number of other things. It seems like the Spring XD Admin node could be run on most infrastructure servers. Factors such as reliability, stability and desired performance must also be considered when choosing hardware.
Q: Will Spring XD Admin run on a system with RaspberryPi like specs?
A: Based on the documentation, yes.
Q: Will it run with good performance or reliably on such a system?
A: Probably not if being used for extended periods of time or for large amounts of traffic.

Mesos real world use-cases

I'm trying to figure out what would be the reasons for using Mesos. Can you come up with other ones?
Running all of your services in the same cluster instead of dedicated clusters (your end-applications + DevOps such as Jenkins)
Running applications of different maturity (dev, test, production) in the same cluster, or is this not viable? Kubernetes has a similar approach with Labels
Mesos simplifies the use of traditional distributed applications such as Hadoop by easing deployment, unified API, bin-packing of resources
Full-disclosure: I currently work at Twitter and I'm involved in both Apache Mesos and Aurora.
Mesos use cases can vary based upon a few dimensions: scale (10 servers vs 10s of thousands), available hardware (dedicated/static or in the public cloud/scalable), and workloads (primarily services, batch, or both).
Your list is a great start. Here are a few additional use cases / features to add.
Container Orchestration
As container runtimes like Docker have become popular, lots of potential users are looking at Mesos + a scheduler to manage orchestration once container images are created. Mesos is already quite mature and has been proven at scale, which I think has given it a leg up over some emergent solutions.
Increased Resource Utilization
For companies running >50 servers, a common motivation for adopting Mesos is to increase resource utilization to reduce CapEx. There are a number of examples of this in both the public and private cloud. In the case of eBay, they have been running Jenkins on Mesos and were able to reduce their VM footprint. Mesosphere has also published a case study of HubSpot (running on AWS), and how they've been able to replace hundreds of smaller servers with dozens of larger ones by more efficiently using their available hardware.
Preemption
At Twitter we're running Mesos via one scheduler: Apache Aurora. One of the ways we can improve utilization relates to your use case: running different maturity applications in the same cluster. Aurora has a concept of environments, so you can run applications that are production, development, or test. Additionally, Aurora has a built-in preemption feature which allows it to prioritize production over non-production tasks, killing non-production tasks when those resources are needed to run production ones as well as a priority system within each environment.
Long-term, functionality related to preemption will also be located in the Mesos core itself -- it's a killer feature related to both increased resource utilization and running different maturity applications (dev, test, prod). There are a few Mesos tickets to follow if you're interested in keeping up to date, including MESOS-155 for preemption, and MESOS-1474 for inverse offers.
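The preemption idea can be illustrated with a small sketch. This is a deliberately simplified model, not Aurora's actual algorithm: the Task class, the pick_victims function and the single-resource (CPU only) accounting are all assumptions made for the example. A production task that does not fit may evict non-production tasks, lowest priority first.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpus: float
    production: bool
    priority: int = 0


def pick_victims(running, free_cpus, incoming):
    """Return the tasks to kill so `incoming` fits, or None if it cannot fit.

    Only a production task may preempt, and only non-production tasks
    are candidates; the lowest priority is evicted first.
    """
    if incoming.cpus <= free_cpus:
        return []  # fits without killing anything
    if not incoming.production:
        return None  # non-production tasks never preempt in this sketch
    victims = []
    candidates = sorted((t for t in running if not t.production),
                        key=lambda t: t.priority)
    for task in candidates:
        victims.append(task)
        free_cpus += task.cpus
        if incoming.cpus <= free_cpus:
            return victims
    return None  # even killing every candidate would not free enough


running = [
    Task("web", 4, production=True),
    Task("dev-job", 2, production=False, priority=1),
    Task("test-job", 2, production=False, priority=0),
]
victims = pick_victims(running, free_cpus=1, incoming=Task("api", 3, production=True))
print([t.name for t in victims])  # ['test-job']
```

Here one free CPU plus the two CPUs held by the lowest-priority non-production task are enough for the incoming production task, so only that one task is evicted.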
Colocating Batch and Services
Running batch and services in a shared Mesos cluster will be key to driving up utilization even further as js84 points out. Check out Project Myriad, an effort to colocate Mesos and YARN workloads in the same cluster. At this time I'm not aware of any large deployments running both batch and services, but it's certainly the direction the community is moving in as it becomes easier for multiple frameworks to run in a shared cluster.
At least one additional use case comes to mind: Mesos as a development SDK for distributed applications. If you have a look at Mesos Frameworks, you will find a number of frameworks which have been developed on top of Mesos. Also interesting is Apple's framework powering Siri.
Regarding your 1): One additional angle you should keep in mind here is scaling your applications in the same cluster. I.e. at peak load of your website, shift resources easily towards the webservers while scaling down the Hadoop analytical processing.

Docker-Swarm, Kubernetes, Mesos & Core-OS Fleet

I am relatively new to all these, but I'm having troubles getting a clear picture among the listed technologies.
Although all of these try to solve different problems, they have things in common too. I would like to understand what they have in common and where they differ. It is likely that a combination of a few of them would be a great fit; if so, which ones?
I am listing a few of them along with questions, but it would be great if someone lists all of them in detail and answers the questions.
Kubernetes vs Mesos:
This link
What's the difference between Apache's Mesos and Google's Kubernetes
provides a good insight into the differences, but I'm unable to understand as to why Kubernetes should run on top of Mesos. Is it more to do with coming together of two opensource solutions?
Kubernetes vs Core-OS Fleet:
If I use kubernetes, is fleet required?
How does Docker-Swarm fit into all the above?
Disclosure: I'm a lead engineer on Kubernetes
I think that Mesos and Kubernetes are largely aimed at solving similar problems of running clustered applications, but they have different histories and different approaches to solving the problem.
Mesos focuses its energy on very generic scheduling, and plugging in multiple different schedulers. This means that it enables systems like Hadoop and Marathon to co-exist in the same scheduling environment. Mesos is less focused on running containers. Mesos existed prior to widespread interest in containers and has been re-factored in parts to support containers.
In contrast, Kubernetes was designed from the ground up to be an environment for building distributed applications from containers. It includes replication and service discovery as core primitives, whereas such things are added via frameworks in Mesos. The primary goal of Kubernetes is a system for building, running and managing distributed systems.
Fleet is a lower-level task distributor. It is useful for bootstrapping a cluster system, for example CoreOS uses it to distribute the kubernetes agents and binaries out to the machines in a cluster in order to turn-up a kubernetes cluster. It is not really intended to solve the same distributed application development problems, think of it more like systemd/init.d/upstart for your cluster. It's not required if you run kubernetes, you can use other tools (e.g. Salt, Puppet, Ansible, Chef, ...) to accomplish the same binary distribution.
Swarm is an effort by Docker to extend the existing Docker API to make a cluster of machines look like a single Docker API. Fundamentally, our experience at Google and elsewhere indicates that the node API is insufficient for a cluster API. You can see a bunch of discussion on this here: https://github.com/docker/docker/pull/8859 and here: https://github.com/docker/docker/issues/8781
Join us on IRC at #google-containers if you want to talk more.
I think the simplest answer is that there is no simple answer. The swift rise to power of containers, and Docker in particular has left a power vacuum for "container scheduling and orchestration", whatever that might mean. In reality, that means you have a number of technologies that can work in harmony on some levels, but with certain aspects in competition. For example, Kubernetes can be used as a one stop shop for deploying and managing containers on a compute cluster (as Google originally designed it), but could also sit atop Fleet, making use of the resilience tier that Fleet provides on CoreOS.
As this Google video states, Kubernetes is not a complete out-of-the-box container scaling solution, which is a good statement to start from. In the same way, you would at some stage expect Apache Mesos to be able to work with Kubernetes, but not with Marathon, inasmuch as Marathon appears to fulfil the same role as Kubernetes. Somewhere I think I've read these could become part of the same effort, but I could be wrong about that; it's really about the strategic direction of Mesosphere and the corresponding adoption of Kubernetes principles.
In the DockerCon keynote, Solomon Hykes suggested Swarm would be a tier that could provide a common interface onto the many orchestration and scheduling frameworks. From what I can see, Swarm is designed to provide a smooth Docker deployment workflow, working with some existing container workflow frameworks such as Deis, but flexible enough to yield to "heavyweight" deployment and resource management such as Mesos.
Hope this helps - this could be an enormous post. I think the key is that these are young, evolving services that will likely merge and become interoperable, but we need to ride out the next 12 months to see how it plays out. There are some very clever people on the problem, so the future looks very bright.
As far as I understand it:
Mesos, Kubernetes and Fleet are all trying to solve a very similar problem. The idea is that you abstract away all your hardware from developers and the 'cluster management tool' sorts it all out for you. Then all you need to do is give a container to the cluster, give it some info (keep it running permanently, scale up if X happens etc) and the cluster manager will make it happen.
With Mesos, it does all the cluster management for you, but it doesn't include the scheduler. The scheduler is the bit that says, ok this process needs 2 procs and 512MB RAM, and I have a machine over there with that free, so I'll run it on that machine. There are some plugin schedulers available for Mesos: Marathon and Chronos and you can write your own. This gives you a lot of power of resource distribution and cluster scaling etc.
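The placement decision described here ("this process needs 2 procs and 512MB RAM, and I have a machine over there with that free") can be sketched as a first-fit search. The place function and the dictionary layout are made up for illustration; a real Mesos scheduler would work from resource offers rather than a static machine list.

```python
def place(task, machines):
    """First-fit: reserve resources on the first machine with enough free CPU and RAM.

    Returns the chosen machine's name, or None if no machine has room.
    """
    for machine in machines:
        if (machine["free_cpus"] >= task["cpus"]
                and machine["free_mem_mb"] >= task["mem_mb"]):
            machine["free_cpus"] -= task["cpus"]
            machine["free_mem_mb"] -= task["mem_mb"]
            return machine["name"]
    return None


machines = [
    {"name": "node-1", "free_cpus": 1, "free_mem_mb": 2048},  # too few CPUs
    {"name": "node-2", "free_cpus": 4, "free_mem_mb": 1024},
]
task = {"cpus": 2, "mem_mb": 512}
chosen = place(task, machines)
print(chosen)  # node-2
```

First-fit is only one possible policy; schedulers such as Marathon apply their own placement logic and constraints on top of the offers they receive.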
Fleet and Kubernetes seem to abstract away those sorts of details (so you don't have to write your own scheduler basically). This means you have to define your tasks and submit them in the format/manner defined by Fleet or Kubernetes and then they take over and schedule the tasks (containers) for you.
So I guess: Using Mesos may mean a bit more work in writing your own scheduler, but potentially provides more flexibility if required.
I think the idea of running Kubernetes on top of Mesos is that Kubernetes acts as the scheduler for Mesos. Personally I'm not sure what benefits this brings over running one or the other on its own though (hopefully someone will jump in and explain!)
As MikeB said.. it's early days, and it's all up for grabs (keep an eye on Amazon's ECS as well) so there are many competing standards and a lot of overlap!
-edit- I didn't mention Docker swarm as I don't really have much experience with it.
For anyone coming to this after 2017: fleet is deprecated. Do not use it anymore.
Fleet docs say "fleet is no longer actively developed or maintained by CoreOS" and link to Container orchestration: Moving from fleet to Kubernetes. Fleet was removed from Container Linux (formerly known as CoreOS Linux) and replaced with Kubernetes kubelet (agent). This coincided with a corporate pivot to offer Tectonic (a Kubernetes distro) as their primary product.

Resources