Marathon vs Aurora and their purposes - mesos

Both Marathon and Aurora are built on Mesos and are supposedly engineered for running long-running services. My questions are:
What are their differences? I have struggled to find any good explanation of their key differences.
Do these frameworks run anything that runs on Linux? For Marathon they state that it can run anything that "is executable in a shell" but this is sort of vague :)
Thanks!

Disclaimer: I am the VP of Apache Aurora, and have been the tech lead of the Aurora team at Twitter for ~5 years. My likely-biased opinions are my own and do not necessarily represent those of Twitter or the ASF.
Do these frameworks run anything that runs on Linux? For Marathon they state that it can run anything that "is executable in a shell" but this is sort of vague :)
Essentially, yes. Ultimately these systems are sophisticated machinery to execute shell code somewhere in a cluster :-)
What are their differences? I have struggled to find any good explanation of their key differences.
Aurora and Marathon do indeed offer similar feature sets, both being classified as "service schedulers". In other words, you hand us instructions for how to run your application servers, and we do our best to keep them up.
I'll offer some differences in broad strokes. When it comes to shortcomings mentioned in each, I think it's safe to say that the communities are aware and intend to fix them.
Ease of use
Aurora is not easy to install. It will likely feel like you are trailblazing while setting it up. It exposes a thrift API, which means you'll need a thrift client to interact with it programmatically (a REST-like API is coming, but is vaporware at the moment), or use our command line client. Aurora has a DSL for configuration which can be daunting, but allows you to easily share templates and common patterns as you use the system more.
Marathon, on the other hand, helps you to run 'Hello World' as quickly as possible. It has great docs to do this in many environments and there's little overhead to get going. It has a REST API, making it easier to adapt to custom tools. It uses JSON for configuration, which is easy to start with but more prone to cargo culting.
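To give a flavor of that JSON configuration, here is a minimal sketch (not an official example) that POSTs a hypothetical app definition to Marathon's /v2/apps REST endpoint; the app fields and the localhost:8080 address are assumptions about your setup:

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// A minimal Marathon app definition: run a shell command as one instance.
	// All field values here are illustrative.
	app := []byte(`{
		"id": "/hello",
		"cmd": "while true; do echo hello; sleep 5; done",
		"cpus": 0.1,
		"mem": 32,
		"instances": 1
	}`)

	// POST it to Marathon's REST API (assumes Marathon on localhost:8080).
	resp, err := http.Post("http://localhost:8080/v2/apps", "application/json", bytes.NewReader(app))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("Marathon responded:", resp.Status)
}
```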
Targeted use cases
Aurora has always been designed to handle a large engineering organization. The clusters at Twitter have tens of thousands of machines and hundreds of engineers using them. It is critical to Twitter's business. As a result, we take our requirements for scale, stability, and security very seriously. We make sure to endorse only features that we believe are trustworthy at scale in production (for example, we have our Docker support labeled as beta because of known issues with Docker itself and the Mesos-Docker integration). We also have features like preemption that make our clusters suitable for mixing business-critical services with prototypes and experiments.
I can't make any claim for or against Marathon's scalability. On the feature front, Marathon has built out features quickly, but this can feel bleeding-edge in practice (Docker support is a good example). This is not always due to Marathon itself, but also to layers further down the stack. Marathon does not provide preemption.
Ownership
To some, ownership and governance of a project are important. I feel that in practice it does not define the openness of a project, but for some people/companies the legal fine print can be a deal-breaker.
Marathon is owned by a company (Mesosphere)
To some, this is beneficial; to others it is not. It means that you can pay for support and features. It also means that there is something to be sold, and the project direction is ultimately decided by Mesosphere's interests.
Aurora is owned by the Apache Software Foundation
This means it is subject to the governance model of the ASF, driven by the community. Aurora does not have paying customers, and there is not currently a software shop that you can pay for development.
tl;dr If you are just getting your feet wet with running services on Mesos, I would suggest Marathon as your first port of call. It will be easier for you to get running and poke around the ecosystem. If you are forming the 'private cloud strategy' for a company, I suggest seriously considering Aurora, as it is proven and specifically designed for that.

So I've been evaluating both and this is my summary.
Aurora
[+] also handles recurring jobs
[+] finer grained, extensive file-based configuration
[+] has namespaces so multiple environments can co-exist
[-] read-only UI, no official API
[~] file-based configuration and CLI-based execution bring overhead (which can be justified by the more extensive feature set)
Marathon
[+] very easy to setup and use
[+] UI that provides control, and an extensive API (even with some features missing from the UI at the moment)
[+] event bus to listen in on API calls (see the sketch below)
[-] handles only long-running jobs
[-] does not have separate deployment-run-cleanup steps; if necessary, these need to be combined in a script or one-liner
Even though Aurora has better capabilities, I prefer Marathon due to Aurora's complexity/overhead and its lack of a UI (for control) and API.
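To illustrate the event bus item above, here is a minimal sketch that tails Marathon's event stream; it assumes a Marathon version that exposes the /v2/events Server-Sent Events endpoint and a server on localhost:8080:

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"strings"
)

func main() {
	// Subscribe to Marathon's SSE event stream (assumes /v2/events is available).
	req, err := http.NewRequest("GET", "http://localhost:8080/v2/events", nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Accept", "text/event-stream")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Print each event's data line as it arrives (deployments, health changes, ...).
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		if line := scanner.Text(); strings.HasPrefix(line, "data:") {
			fmt.Println(line)
		}
	}
}
```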

I have more experience with Marathon.
Ideological:
Marathon is a relatively tested product that is used in production at AirBnB. Aurora is an early Apache project (so YMMV).
Both are open source and active. Feel free to contribute pull requests or file issues!
Technical:
Marathon doesn't schedule batch tasks or cron jobs
Marathon has a friendly UI and better health indicators (in 0.8.x)
Regarding your second question: you can run any command or Docker container, and Mesos will do the resource isolation for you. If you have 50% CentOS nodes and 50% Ubuntu nodes and you run a task that executes apt-get, the task will have a 50% chance of failure. Mesos and Marathon have no awareness of the actual machines.

Disclaimer: I don't have hands-on experience with Aurora, only with Marathon.
ad Q1: In a nutshell, Apache Aurora is capable of doing what Marathon + Chronos provide, that is, scheduling both long-running services and recurring (batch) jobs; see also the Aurora user guide.
ad Q2: Yes, anything. It is currently based on cgroups and Docker, but hey, you can roll your own.

Related

Stateful workflow engine vs Orchestrated idempotent services

I realize the benefits of a workflow engine, such as easy-to-understand communication, easy waiting, parallelism, and compensating actions with an informative graphical model. The concept is great and more manageable than a dogmatic event-driven architecture without a central coordinator and specified flow.
We are currently using a legacy workflow engine to orchestrate microservices in the insurance business. Over time, chunks of business logic and little helper scripts have crept into the process model, which is not a developer-friendly solution to maintain and test to continuous-integration standards. Also, the lack of available expertise and future support is a huge risk from the project-management perspective.
I played around with Camunda and Activiti, but immediately faced compatibility issues with Spring Boot 3 and a lack of up-to-date examples and general knowledge outside of a relatively small user community. This gives me a bad feeling that in the future we will be drowning in the same swamp we are in now.
We planned to design our own Java-based orchestrator, which just invokes specified microservices in a specified order when the process is started or a user task is completed. The orchestrator will also handle monitoring and versioning of the process flow. It's up to the microservices to validate their business context and halt the process by raising user tasks if necessary. When a user task is completed, the orchestrator restarts the whole process from the beginning with all tasks cleared. It is the responsibility of the microservices to no-op when their work was already done in the previous run. Eventually, the process will reach its end and finish. This solution would be a good balance of modern DX and coordinated process management.
Are there examples of, or a name for, such an idempotent orchestrated architecture?
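For concreteness, here is a minimal sketch of the restart-from-the-beginning loop I have in mind (all names are hypothetical):

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Step is one microservice invocation. Every step must be idempotent:
// a no-op when its work was already done in a previous run.
type Step func(ctx context.Context, processID string) error

// ErrUserTaskRaised halts the current run; once the user task is
// completed, the whole process is restarted from the beginning.
var ErrUserTaskRaised = errors.New("user task raised")

// runProcess re-executes every step in order from the start.
func runProcess(ctx context.Context, processID string, steps []Step) error {
	for i, step := range steps {
		if err := step(ctx, processID); err != nil {
			return fmt.Errorf("step %d halted: %w", i, err)
		}
	}
	return nil // the process reached its end
}

// validatePolicy and calculatePremium stand in for real microservice
// calls; a real step returns ErrUserTaskRaised to pause the process.
func validatePolicy(ctx context.Context, id string) error   { return nil }
func calculatePremium(ctx context.Context, id string) error { return nil }

func main() {
	steps := []Step{validatePolicy, calculatePremium}
	if err := runProcess(context.Background(), "claim-42", steps); err != nil {
		fmt.Println(err)
	}
}
```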
You only get into the challenge of aligning dependencies between your services and the process engine (and other components) if you tightly couple the orchestration engine with the services. That happened to me many times in the past, too. If you separate the engine (called the remote process engine in Camunda 7, and the only architecture with Camunda 8), then you are not influenced by its dependencies. Try, for instance, the Camunda RUN distribution and the external task pattern, or C8 SaaS, to get to a cleaner, decoupled architecture. See Bernd Ruecker's reasoning here.
Details will depend on your specific requirements, but I would definitely advise anyone against building a homegrown solution. There are enough options in the market, and those times are over. Requirements grow over time, there are security vulnerabilities to be aware of and to fix, and so on. High maintenance, no market for resources, no synergies: you would need to maintain proprietary knowledge in the company and could not achieve the same level of quality and feature richness as a more broadly used solution can. For a list of options, see for instance Bernd Ruecker's articles. Among the available options, I would personally prefer an orchestrator that uses a graphical process-modelling approach based on the BPMN 2 standard. It helps clarity, knowledge transfer, and business-IT alignment, and the standard is a vendor-independent skill set.
There is no need to build your own. Use the temporal.io open-source project. Besides the Java SDK, it supports Go, TypeScript/JavaScript, Python, and PHP.
The project started at Uber in 2016. There are hundreds of companies using it for mission-critical applications.
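For a flavor of what this looks like in Go, here is a minimal sketch of a two-step workflow using the go.temporal.io/sdk packages; the activity names and business logic are hypothetical:

```go
package app

import (
	"context"
	"time"

	"go.temporal.io/sdk/workflow"
)

// ValidateClaim and ApproveClaim are hypothetical activities backed by
// your microservices; Temporal retries them until they succeed.
func ValidateClaim(ctx context.Context, claimID string) error { return nil }
func ApproveClaim(ctx context.Context, claimID string) error  { return nil }

// ClaimWorkflow runs the activities in order. Temporal persists the
// workflow's progress, so completed activities are not re-executed
// after a crash or a long wait: no restart-from-scratch needed.
func ClaimWorkflow(ctx workflow.Context, claimID string) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute,
	})
	if err := workflow.ExecuteActivity(ctx, ValidateClaim, claimID).Get(ctx, nil); err != nil {
		return err
	}
	return workflow.ExecuteActivity(ctx, ApproveClaim, claimID).Get(ctx, nil)
}
```

A worker process registers the workflow and activities against a task queue; because Temporal persists workflow progress, waiting on a user task becomes a durable timer or signal rather than a restart of the whole process.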

DC/OS vs just plain Mesos+Marathon

We are looking to build a cluster of compute nodes with NVIDIA GPUs for Deep Learning model-training jobs, some of them in the cloud and others locally. We felt that using Mesos and the framework Marathon (M&M) would be our best option to schedule the cluster. However, the documentation for M&M seems very ambiguous (or at least to me; sorry, I'm an intern), and I'm running into a lot of issues concerning ZooKeeper and the connections between the nodes.
Plus, it seems like Mesosphere is giving much more importance to DC/OS when it comes to tutorials and docs, and I guess it will also be patched more regularly, and its interfaces (GUI and CLI) look much more user-friendly.
So I was wondering: by dropping the exploration of M&M and moving to DC/OS, would we lose a lot of control over the cluster? Does M&M have perks that the Open Source Edition of DC/OS cannot provide, like monitoring the machines, logging results, etc.? If I ask my manager, we might also get the Enterprise edition, so that's not really a problem, but does DC/OS apply an abstraction layer that isn't really preferable to advanced users?
DC/OS is built around Apache Mesos and Marathon and gives you a good default setup for ZooKeeper, networking, and so on. So IMO it is a good place to start, as you can still use all the M&M and Mesos features plus the DC/OS features and ease of setup.
Disclaimer: I am working for Mesosphere.

Looking for help on how to manage microservices in Golang

Currently, I deal with microservices on a daily basis at my 9-5. Almost everything that I touch is written in PHP, and as I am only a software engineer, SysOps manages everything that has to do with the apps running, etc. I have a little familiarity with how the infrastructure and build pipeline are set up, but I am still not a SysOps or DevOps guy.
With that said, I love Golang, and for a side project I am creating a fairly large web application with a lot of moving parts. Writing and designing the code is easy, as I have learned a lot from my day job, but deploying and managing Golang web apps (as they are executables) is quite different from updating files for Apache to serve.
I have researched a lot on how I would build and deploy my microservice apps, but I keep on thinking of more problems that will need to be solved along the way. I have tinkered with the idea of using Docker for all of this, but I would rather not have the added complexity of learning that and managing storage for all of the images as that could be large.
Is there a best practice or a good way to manage Golang applications after they have been deployed? I would need a way to keep track of all the microservice processes to be able to see if they are still up and to be able to stop them when a new build is going to be deployed.
As for the setup, just assume that all the microservices will run on the server, not in a container or a VM. They will all need to be managed, but also to be acted upon independently. Jenkins will be used for building and deploying. I will be using Consul for service discovery and possibly configuration, and most likely for health checks on the services. I'm thinking of having each microservice register itself with Consul when started and deregister when stopping (sketched below).
Again, I am looking for a solution that is hopefully not just "Docker". I have also thought about creating a deploy service that manages the services (adding and removing them), as well as registering them in Consul. So if I cannot find a better solution, I might go down that path. Any help is appreciated.
** Sorry if my question was confusing, but since a couple of people answered on the wrong topic, I will try to clarify. I don't need any help making the microservices or knowing anything about them. I brought that point up to explain why I need to ask my question. Basically, what I need is just the ability to manage the running Go processes of all my microservices so I can do deployments and be able to stop and start processes to update the code. It is easier when you have to worry about one app, but when you can have up to 10-15 different microservices, they become harder to keep track of. After my own research, it seems that Supervisord is what I am looking for, but I'm not sure. That is the direction I am going in with this question. Thanks.
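To make the Consul part concrete, here is a minimal sketch of the register-on-start / deregister-on-stop flow I described, using the github.com/hashicorp/consul/api client; the service name, port, and health-check path are placeholders:

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Connect to the local Consul agent (default address 127.0.0.1:8500).
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Register this microservice with an HTTP health check so Consul
	// can tell whether the process is still up.
	reg := &api.AgentServiceRegistration{
		ID:   "orders-1", // placeholder instance ID
		Name: "orders",   // placeholder service name
		Port: 8080,
		Check: &api.AgentServiceCheck{
			HTTP:     "http://127.0.0.1:8080/health", // placeholder endpoint
			Interval: "10s",
		},
	}
	if err := client.Agent().ServiceRegister(reg); err != nil {
		log.Fatal(err)
	}

	// ... run the service; deregister on SIGINT/SIGTERM so a deploy
	// that stops the old process leaves no stale registration behind.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
	<-sig
	if err := client.Agent().ServiceDeregister("orders-1"); err != nil {
		log.Println("deregister:", err)
	}
}
```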
Golang is great to use for microservices, but I would say there is not a big difference between managing Golang microservices and microservices in other languages.
What I would say is golang specific:
you don't need to install anything on the servers, since Golang compiles to a single binary
you can take advantage of the std lib rpc package and gob binary encoding, instead of using a 3rd-party solution (gorpc, protocol buffers, etc.); see the sketch below
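To illustrate the std lib point, here is a minimal sketch of a net/rpc server; gob is the default wire encoding, and the Arith service is just a toy:

```go
package main

import (
	"log"
	"net"
	"net/rpc"
)

type Args struct{ A, B int }

// Arith is a toy service; exported methods of the form
// Method(args *T, reply *U) error become callable over RPC.
type Arith struct{}

func (a *Arith) Multiply(args *Args, reply *int) error {
	*reply = args.A * args.B
	return nil
}

func main() {
	// net/rpc encodes requests and responses with encoding/gob by default.
	if err := rpc.Register(&Arith{}); err != nil {
		log.Fatal(err)
	}
	ln, err := net.Listen("tcp", ":1234")
	if err != nil {
		log.Fatal(err)
	}
	rpc.Accept(ln) // serve connections until the listener closes
}
```

A client then connects with rpc.Dial("tcp", "host:1234") and invokes client.Call("Arith.Multiply", &Args{7, 8}, &reply).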
Other than that, you need to use your own judgement. There are plenty of ways of doing any one thing in the microservices world; if one day you implement solution A and after 3 months you see that it's better to do B, do that.
There is a lot of reading about microservices on the internet. I will recommend 2 good resources: https://books.google.co.uk/books/about/Building_Microservices.html?id=RDl4BgAAQBAJ&source=kp_cover&redir_esc=y&hl=en
And the article: http://highscalability.com/blog/2014/4/8/microservices-not-a-free-lunch.html
Remember, microservices are not a golden bullet. They often can help make an application easier to maintain and grow, but on the other side they require a lot of additional work, consistency in specifying API contracts, and a strong DevOps culture.

Clustering Microservice Components

We have a set of microservices collaborating with each other in the ecosystem. We used to have occasional problems where one or more of these microservices would go down accidentally. Thankfully, we have some monitoring built around them which would notice this and take corrective action.
Now, we would like to have redundancy built around each of those microservices. I'm thinking of a master/slave approach where a slave is always on standby, and when the master goes down, the slave takes over.
Should we consider using a framework as a service registry, where we register each of those microservices and allow them to be controlled? Any other suggestions on how to achieve the kind of master/slave architecture with the microservices that would enable us to have failover redundancy?
I thought about this for a couple of minutes and this is what I currently think is the best method, based on experience.
There are a couple of problems you will face with availability. The first is always having at least one endpoint up. This is easy enough to do by installing on multiple servers. In the enterprise space, you would use a name for the endpoint and then have it resolve to multiple servers (virtual or hardware). You would also load balance it.
The second is the registry. This is a very easy problem to solve with API management software. The really good software in this space is not cheap, so this is not weekend-hobbyist software. But there are open source API management solutions out there. As I work in the enterprise space, I am very familiar with options like Apigee, CA, Mashery, etc., so I cannot recommend an open source option and feel good about myself.
You could build your own registry, if you desire. Just be careful how you design it, as a "registry of all interface points" leads to a service that becomes more tightly coupled.

Docker-Swarm, Kubernetes, Mesos & Core-OS Fleet

I am relatively new to all these, but I'm having troubles getting a clear picture among the listed technologies.
All of these try to solve different problems, but they do have things in common too. I would like to understand what is common and what is different. It is likely that a combination of a few would be a great fit; if so, what are they?
I am listing a few of them along with questions, but it would be great if someone lists all of them in detail and answers the questions.
Kubernetes vs Mesos:
This link, What's the difference between Apache's Mesos and Google's Kubernetes, provides good insight into the differences, but I'm unable to understand why Kubernetes should run on top of Mesos. Is it more to do with two open-source solutions coming together?
Kubernetes vs Core-OS Fleet:
If I use Kubernetes, is Fleet required?
How does Docker-Swarm fit into all the above?
Disclosure: I'm a lead engineer on Kubernetes
I think that Mesos and Kubernetes are largely aimed at solving similar problems of running clustered applications; they have different histories and different approaches to solving the problem.
Mesos focuses its energy on very generic scheduling and on plugging in multiple different schedulers. This means that it enables systems like Hadoop and Marathon to co-exist in the same scheduling environment. Mesos is less focused on running containers. Mesos existed prior to widespread interest in containers and has been refactored in parts to support containers.
In contrast, Kubernetes was designed from the ground up to be an environment for building distributed applications from containers. It includes replication and service discovery as core primitives, whereas such things are added via frameworks in Mesos. The primary goal of Kubernetes is to be a system for building, running, and managing distributed systems.
Fleet is a lower-level task distributor. It is useful for bootstrapping a cluster system; for example, CoreOS uses it to distribute the Kubernetes agents and binaries out to the machines in a cluster in order to turn up a Kubernetes cluster. It is not really intended to solve the same distributed-application development problems; think of it more like systemd/init.d/upstart for your cluster. It's not required if you run Kubernetes; you can use other tools (e.g. Salt, Puppet, Ansible, Chef, ...) to accomplish the same binary distribution.
Swarm is an effort by Docker to extend the existing Docker API to make a cluster of machines look like a single Docker API. Fundamentally, our experience at Google and elsewhere indicates that the node API is insufficient for a cluster API. You can see a bunch of discussion on this here: https://github.com/docker/docker/pull/8859 and here: https://github.com/docker/docker/issues/8781
Join us on IRC in #google-containers if you want to talk more.
I think the simplest answer is that there is no simple answer. The swift rise to power of containers, and Docker in particular, has left a power vacuum for "container scheduling and orchestration", whatever that might mean. In reality, that means you have a number of technologies that can work in harmony on some levels, but with certain aspects in competition. For example, Kubernetes can be used as a one-stop shop for deploying and managing containers on a compute cluster (as Google originally designed it), but could also sit atop Fleet, making use of the resilience tier that Fleet provides on CoreOS.
As this Google video states, Kubernetes is not a complete out-of-the-box container-scaling solution, but it is a good place to start from. In the same way, you would at some stage expect Apache Mesos to be able to work with Kubernetes, but not with Marathon, inasmuch as Marathon appears to fulfil the same role as Kubernetes. Somewhere I think I've read these could become part of the same effort, but I could be wrong about that - it's really about the strategic direction of Mesosphere and the corresponding adoption of Kubernetes principles.
In the DockerCon keynote, Solomon Hykes suggested Swarm would be a tier that could provide a common interface onto the many orchestration and scheduling frameworks. From what I can see, Swarm is designed to provide a smooth Docker deployment workflow, working with some existing container workflow frameworks such as Deis, but flexible enough to yield to "heavyweight" deployment and resource management such as Mesos.
Hope this helps - this could be an enormous post. I think the key is that these are young, evolving services that will likely merge and become interoperable, but we need to ride out the next 12 months to see how it plays out. There are some very clever people on the problem, so the future looks very bright.
As far as I understand it:
Mesos, Kubernetes and Fleet are all trying to solve a very similar problem. The idea is that you abstract away all your hardware from developers, and the 'cluster management tool' sorts it all out for you. Then all you need to do is give a container to the cluster, give it some info (keep it running permanently, scale up if X happens, etc.), and the cluster manager will make it happen.
With Mesos, it does all the cluster management for you, but it doesn't include the scheduler. The scheduler is the bit that says: OK, this process needs 2 CPUs and 512MB RAM, and I have a machine over there with that free, so I'll run it on that machine. There are some plugin schedulers available for Mesos (Marathon and Chronos), and you can write your own. This gives you a lot of power over resource distribution, cluster scaling, etc.
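As a toy illustration of that decision (deliberately simplified; not actual Mesos or Marathon code):

```go
package main

import "fmt"

// Node is a simplified view of one machine's free resources.
type Node struct {
	Name    string
	FreeCPU float64
	FreeMem int // MB
}

// schedule picks the first node with enough free CPU and RAM: roughly
// the decision a scheduler plugin makes for every task it places.
func schedule(nodes []Node, cpu float64, memMB int) (string, bool) {
	for _, n := range nodes {
		if n.FreeCPU >= cpu && n.FreeMem >= memMB {
			return n.Name, true
		}
	}
	return "", false // nothing fits; the task stays pending
}

func main() {
	nodes := []Node{{"node-a", 1.5, 256}, {"node-b", 4, 2048}}
	name, ok := schedule(nodes, 2, 512) // "needs 2 CPUs and 512MB RAM"
	fmt.Println(name, ok)               // node-b true
}
```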
Fleet and Kubernetes seem to abstract away those sorts of details (so you don't have to write your own scheduler, basically). This means you have to define your tasks and submit them in the format/manner defined by Fleet or Kubernetes, and then they take over and schedule the tasks (containers) for you.
So I guess: Using Mesos may mean a bit more work in writing your own scheduler, but potentially provides more flexibility if required.
I think the idea of running Kubernetes on top of Mesos is that Kubernetes acts as the scheduler for Mesos. Personally I'm not sure what benefits this brings over running one or the other on its own though (hopefully someone will jump in and explain!)
As MikeB said, it's early days and it's all up for grabs (keep an eye on Amazon's ECS as well), so there are many competing standards and a lot of overlap!
-edit- I didn't mention Docker swarm as I don't really have much experience with it.
For anyone coming to this after 2017: Fleet is deprecated. Do not use it anymore.
Fleet docs say "fleet is no longer actively developed or maintained by CoreOS" and link to Container orchestration: Moving from fleet to Kubernetes. Fleet was removed from Container Linux (formerly known as CoreOS Linux) and replaced with Kubernetes kubelet (agent). This coincided with a corporate pivot to offer Tectonic (a Kubernetes distro) as their primary product.
