Mesophere cluster with nodes of different operating systems - mesos

I want to setup a Mesophere cluster (mesos, dc/os, marathon) for running different jobs. The nodes which these jobs run on depend on the nature of the job. For e.g a job with C# code will run on a windows node. A job with pure C++ will run on Ubuntu or freebsd and so on. Each of these can again be a cluster. ie I want to have, lets say, 2 windows nodes and 4 ubuntu nodes. So I would like to know:
Can this be achieved in a single deployment ? Or do i need to setup different clusters for reach environment i want , one for windows, one for Ubuntu etc.
Regradless of a single hybrid or multiple environments, does mesos provide granularity of what the nodes send back. i.e I dont want to see high level status like job failed or running etc. My jobs write stats to a file on the system and i want to relay this back to the "main UI" or the layer that is managing all this

Can this be achieved in a single deployment?
If you want to use DCOS, currently offical just support centos/redhat, for ubuntu, you need to use at least 1604, which use systemd not old upstart of ubuntu. But afaik, windows is not support in DCOS.
So for your scenario, you had to use mesos directly not to use dcos, then with one cluster, you can set different mesos agent on ubuntu or windows. And you may add role or attribute when agent register to mesos master, so the framework can distinguish when dispatch job to proper agent.
BTW, for windows, you had to use at least 1.3.0 mesos which support windows, and you had to build it on windows using microsoft visual studio by yourself.
Does mesos provide granularity of what the nodes send back?
Yes, but you can not use default command executor, you need to write your own executor.
In your executor, you can set the value which you want to send back:
update = mesos_pb2.TaskStatus()
update.task_id.value = task.task_id.value
update.state = mesos_pb2.TASK_RUNNING
update.data = 'data with a \0 byte'
driver.sendStatusUpdate(update)
In your framework, you can receive it as follows:
def statusUpdate(self, driver, update):
slave_id, executor_id = self.taskData[update.task_id.value]
Here is an example I found from github which may help you about how to send your own data back.

Related

Ability to offer only part of the node resources?

Using dc/os we like to schedule tasks close to the data that the task requires that in our case is stored in hadoop/hdfs (on an HDP cluster). Issue is that the hadoop cluster is not run from within dc/os and so we are looking for a way to offer only a subset of the system resources.
For example: say we like to reserve 8GB of memory to data node services, then we like to provide the remainder to dc/os to schedule tasks.
From what i have read so far, the task can specify the resources it requires, but i have not found any means to specify what you want to offer from the node perspective.
I'm aware that a CDH cluster can be run on dc/os, that would be one way to go, but for now that is not provided for HDP.
Thanks for any idea's/tips,
Paul

Airflow setup for high availability

How to deploy apache airflow (formally known as airbnb's airflow) scheduler in high availability?
I am not asking about the backend DB or RabbitMQ that should obviously be deployed in high availability configuration.
My main focus is the scheduler - is there something special needs to be done?
After a bit digging I found that it is not safe to run multiple schedulers simoultanously, this means that out of the box - airflow schedulers are not safe to use in high availablity environments.
The airflow team are planning to solve this issue by adding a lock mechanism on the DAG data structure, but this is not implemented yet (I checked by running 2 schedulers and saw that they schedule the same dag instances which is not good).
This is described here:
https://groups.google.com/forum/#!topic/airbnb_airflow/-1wKa3OcwME
I did found a way to workaround this high availalbilty issue by wrapping the schedulers with my own code and use cluster tools for leader election (I personanlly use consul for this purpose). This way only the elected master is running the scheduler and when the master is down the slave replaces him.
Please consider this when u use airflow in high availabilty environments since out of the box, airflow scheduler is currently not suitable for this (unless you solve this issue yourself).
Edit - an alternative approach to the master slave solution is to use a cluster manager/scheduler to make sure that only one airflow scheduler instance is always available. This approach relies on the self healing abilities of the cluster manager u have. For example both mesos and nomad supports this kind of configuration (I presonally chose nomad for its simplicity).
My personal experience was to follow the instructions I found for some best practices; that is to restart the scheduler every 10 runs ( -N 10 ) and use this software when possible:
https://github.com/teamclairvoyant/airflow-scheduler-failover-controller
I also use a DAG which pings a monitoring system to be sure that the scheduler has not gone away.
In my scenario, I have 2 schedulers (on 2 separate docker swarms), with the standby cluster scheduler turned off (using docker swarm service scale=0). I needed to make sure the primary scheduler had stopped fully before I started up the standby scheduler. What I found was that having 2 running schedulers (even for a brief time period) resulted in an occasional DAG scheduled to run on both clusters leading to duplicate reports generated from two different cluster zone.

Running spark cluster on standalone mode vs Yarn/Mesos

Currently I am running my spark cluster as standalone mode. I am reading data from flat files or Cassandra(depending upon the job) and writing back the processed data to the Cassandra itself.
I was wondering if I switch to Hadoop and start using a Resource manager like YARN or mesos, does it give me an additional performance advantage like execution time and better resource management?
Currently sometime when I am processing huge chunk of data during shuffling with a possibility of stage failure. If I migrate to a YARN, can Resource manager address this issue?
Spark standalone cluster manager can also give you cluster mode capabilities.
Spark standalone cluster will provide almost all the same features as the other cluster managers if you are only running Spark.
When you submit your application in cluster mode all you job related files would be copied on to one of the machines on the cluster which would then submit the job on your behalf, if you submit the application in client mode the machine from which the job is being submitted would be taking care of driver related activities. This means that the machine from which the job has been submitted cannot go offline, whereas in cluster mode the machine from which the job has been submitted can go offline.
Having a Cassandra cluster would also not change any of these behaviors except it can save you network traffic if you can get the nearest contact point for the spark executor(Just like Data locality).
The failed stages gets rescheduled if you use either of the cluster managers.
I was wondering if I switch to Hadoop and start using a Resource manager like YARN or mesos, does it give me an additional performance advantage like execution time and better resource management?
In Standalone cluster model, each application uses all the available nodes in the cluster.
From spark-standalone documentation page:
The standalone cluster mode currently only supports a simple FIFO scheduler across applications. However, to allow multiple concurrent users, you can control the maximum number of resources each application will use. By default, it will acquire all cores in the cluster, which only makes sense if you just run one application at a time.
In other cases (when you are running multiple applications in the cluster) , you can prefer YARN.
Currently sometime when I am processing huge chunk of data during shuffling with a possibility of stage failure. If I migrate to a YARN, can Resource manager address this issue?
Not sure since your application logic is not known. But you can give a try with YARN.
Have a look at related SE question for benefits of YARN over Standalone and Mesos:
Which cluster type should I choose for Spark?

Can Hadoop Yarn run a grid?

Can a Hadoop Yarn instance manage nodes from different places on Earth, networks? Can it manage nodes that use different platforms?
Every note about Yarn I found tells that Yarn manages clusters, but if the app I deploy is written in Java then it should probably work on the nodes regardless of the nodes' hardware.
Similarly, Yarn seems general enough to support more than just a LAN.
YARN is not platform aware. It is also not aware about how application processes on different hosts communicate with each other to perform the work.
In the same time for YARN application master should be run as a command line - and thereof any node on the cluster with enough resources should be able to run it.
If not every platform is capable to run specific app master- then YARN should be aware on it. Today it can not, but I can imegine platform to be special kind of resource - and then YARN will select appropriate node
Regarding LAN if you have application master which knows how to manage job over several LAN - it is should be fine with YARN.

Celery, Resque, or custom solution for processing jobs on machines in my cloud?

My company has thousands of server instances running application code - some instances run databases, others are serving web apps, still others run APIs or Hadoop jobs. All servers run Linux.
In this cloud, developers typically want to do one of two things to an instance:
Upgrade the version of the application running on that instance. Typically this involves a) tagging the code in the relevant subversion repository, b) building an RPM from that tag, and c) installing that RPM on the relevant application server. Note that this operation would touch four instances: the SVN server, the build host (where the build occurs), the YUM host (where the RPM is stored), and the instance running the application.
Today, a rollout of a new application version might be to 500 instances.
Run an arbitrary script on the instance. The script can be written in any language provided the interpreter exists on that instance. E.g. The UI developer wants to run his "check_memory.php" script which does x, y, z on the 10 UI instances and then restarts the webserver if some conditions are met.
What tools should I look at to help build this system? I've seen Celery and Resque and delayed_job, but they seem like they're built for moving through a lot of tasks. This system is under much less load - maybe on a big day a thousand hundred upgrade jobs might run, and a couple hundred executions of arbitrary scripts. Also, they don't support tasks written in any language.
How should the central "job processor" communicate with the instances? SSH, message queues (which one), something else?
Thank you for your help.
NOTE: this cloud is proprietary, so EC2 tools are not an option.
I can think of two approaches:
Set up password-less SSH on the servers, have a file that contains the list of all machines in the cluster, and run your scripts directly using SSH. For example: ssh user#foo.com "ls -la". This is the same approach used by Hadoop's cluster startup and shutdown scripts. If you want to assign tasks dynamically, you can pick nodes at random.
Use something like Torque or Sun Grid Engine to manage your cluster.
The package installation can be wrapped inside a script, so you just need to solve the second problem, and use that solution to solve the first one :)

Resources