I've been doing some preliminary HA testing with Marathon and can't get the recovery time below 1.5 minutes. I am running Mesos 0.22.1 and did not set the checkpoint flag, and setting executor_registration_timeout to 10s or 15s does not seem to improve the failover time. Are there other parameters I need to configure in Mesos/Marathon to achieve faster recovery?
Cheers,
How to deploy the Apache Airflow (formerly known as Airbnb's Airflow) scheduler in high availability?
I am not asking about the backend DB or RabbitMQ, which should obviously be deployed in a high-availability configuration.
My main focus is the scheduler - is there something special that needs to be done?
After a bit of digging I found that it is not safe to run multiple schedulers simultaneously, which means that, out of the box, the Airflow scheduler is not safe to use in high-availability environments.
The Airflow team is planning to solve this issue by adding a lock mechanism on the DAG data structure, but this is not implemented yet (I checked by running two schedulers and saw that they scheduled the same DAG instances, which is not good).
This is described here:
https://groups.google.com/forum/#!topic/airbnb_airflow/-1wKa3OcwME
I did find a way to work around this high-availability issue by wrapping the schedulers with my own code and using cluster tools for leader election (I personally use Consul for this purpose). This way only the elected master runs the scheduler, and when the master goes down the slave takes over.
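A rough sketch of what such a wrapper can look like with the python-consul client (the key path, session TTL, polling interval, and scheduler command below are illustrative assumptions, not my exact production code):

```python
# Sketch: run the Airflow scheduler only on the node that currently holds a Consul lock.
# Assumes a local Consul agent; the key path, TTL, and intervals are illustrative.
import subprocess
import time

import consul

LOCK_KEY = "service/airflow/scheduler/leader"  # hypothetical key path


def run_scheduler_when_leader():
    c = consul.Consul()  # talks to the local Consul agent on 127.0.0.1:8500
    session = c.session.create(name="airflow-scheduler", ttl=15, lock_delay=5)
    scheduler = None
    try:
        while True:
            # Only one session can acquire the key at a time -> leader election.
            is_leader = c.kv.put(LOCK_KEY, "leader", acquire=session)
            if is_leader and scheduler is None:
                scheduler = subprocess.Popen(["airflow", "scheduler"])
            elif not is_leader and scheduler is not None:
                scheduler.terminate()  # lost leadership: stop the local scheduler
                scheduler = None
            c.session.renew(session)  # keep the session (and therefore the lock) alive
            time.sleep(5)
    finally:
        if scheduler is not None:
            scheduler.terminate()
        c.session.destroy(session)


if __name__ == "__main__":
    run_scheduler_when_leader()
```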
Please consider this when you use Airflow in high-availability environments, since out of the box the Airflow scheduler is currently not suitable for this (unless you solve the issue yourself).
Edit - an alternative approach to the master/slave solution is to use a cluster manager/scheduler to make sure that exactly one Airflow scheduler instance is always running. This approach relies on the self-healing abilities of the cluster manager you have. For example, both Mesos and Nomad support this kind of configuration (I personally chose Nomad for its simplicity).
My personal experience was to follow the best-practice instructions I found, that is, to restart the scheduler every 10 runs ( -N 10 ) and to use this software when possible:
https://github.com/teamclairvoyant/airflow-scheduler-failover-controller
I also use a DAG which pings a monitoring system to be sure that the scheduler has not gone away.
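A minimal sketch of such a heartbeat DAG (the monitoring URL is a placeholder, and the import paths follow the older Airflow 1.x layout):

```python
# Sketch: a DAG that pings a monitoring endpoint every few minutes so an
# external "dead man's switch" alert fires when the scheduler stops scheduling it.
# The URL is a placeholder; import paths follow the Airflow 1.x layout.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "airflow",
    "start_date": datetime(2017, 1, 1),
    "retries": 0,
}

dag = DAG(
    dag_id="scheduler_heartbeat",
    default_args=default_args,
    schedule_interval=timedelta(minutes=5),
    catchup=False,
)

# If the scheduler dies, this task stops firing and the monitoring alert goes off.
ping = BashOperator(
    task_id="ping_monitoring",
    bash_command="curl -fsS https://monitoring.example.com/heartbeat/airflow-scheduler",
    dag=dag,
)
```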
In my scenario, I have 2 schedulers (on 2 separate Docker swarms), with the standby cluster's scheduler turned off (using Docker swarm service scale=0). I needed to make sure the primary scheduler had stopped fully before I started up the standby scheduler. What I found was that having 2 running schedulers (even for a brief period) resulted in an occasional DAG being scheduled to run on both clusters, leading to duplicate reports generated from two different cluster zones.
Mesos and Marathon mention checkpointing from time to time, but I couldn't find a good explanation of how it works anywhere. Also, what does it mean in practice?
1) Is the task's current state continuously being stored, or is only the task ID stored? Where is it stored and what does it contain?
2) There are two Marathon instances. Marathon has been running Nginx for a week, then goes down. Does that mean that the actual Nginx application state continues running on the second Marathon instance, or does it just restart the task from the beginning? If the task's actual state is copied, isn't there a lot of data to be continuously persisted and passed around between slaves?
Slave recovery is a feature of Mesos that allows:
- executors/tasks to keep running when the slave process is down, and
- a restarted slave process to reconnect with the running executors/tasks on that slave.
(Mesos Slave Recovery)
So regarding your questions, this means:
1) Enough information (a little more than the task ID) is stored so that a new slave process can reconnect to the still-running executor/task.
2) As the task state itself is not checkpointed, the task would be restarted from the beginning.
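For completeness, checkpointing is something each framework opts into via its FrameworkInfo; a rough sketch with the old mesos.interface Python bindings (the framework name is made up, and Marathon sets this field for you based on its checkpoint flag):

```python
# Sketch: checkpointing is enabled per framework via FrameworkInfo.
# Uses the old mesos.interface protobuf bindings from the Mesos 0.2x era;
# the framework name below is made up.
from mesos.interface import mesos_pb2

framework = mesos_pb2.FrameworkInfo()
framework.user = ""          # let Mesos fill in the current user
framework.name = "example-framework"
framework.checkpoint = True  # ask slaves to checkpoint this framework's tasks

# The FrameworkInfo is then passed to the scheduler driver, e.g.:
# driver = mesos.native.MesosSchedulerDriver(MyScheduler(), framework, "zk://.../mesos")
```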
Hope this helps,
Joerg
I'm working with Apache Mesos and Marathon. I have 3 master nodes and 3 slave nodes, and I configured Mesos with a quorum of 2. Later I posted a JSON app definition to run one job with Marathon, and everything looked fine.
Then I shut down two master nodes to break the quorum. After this, Mesos unregistered all the slaves and everything looked OK, but when I inspected the slaves I found that the started job was still running... Is this normal? I was assuming that Marathon would stop all jobs after the quorum is lost.
Part of the Mesos philosophy, especially for long-running services, is that a failure in one or more Mesos components should not need to stop the user application.
If a slave shuts down and the framework has checkpointing enabled, the executor driver will wait for the slave's --recovery_timeout (default 15min) before shutting down the executor/tasks. To prevent this, disable checkpointing on your framework (in Marathon, just set --checkpoint=false when starting Marathon). See also Marathon's --failover_timeout on https://mesosphere.github.io/marathon/docs/command-line-flags.html
On the other hand, if it's just the Masters/ZKs that shut down, and the Slaves are still up and running, the slaves can still monitor the tasks and queue up status updates, so the tasks can stay alive. If ZK loses quorum, then there is no leading master, and each slave will continue to operate independently until a new leader is detected, at which point it will reregister with the master and send any queued status updates.
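If you want to verify whether your framework actually registered with checkpointing enabled, one option is to query the master's state endpoint; a small sketch (the master address is a placeholder, and the exact field names can vary between Mesos versions):

```python
# Sketch: check which frameworks registered with checkpointing enabled.
# The master address is a placeholder; field names can vary between Mesos versions.
import json
from urllib.request import urlopen

MASTER = "http://mesos-master.example.com:5050"

with urlopen(MASTER + "/master/state.json") as resp:
    state = json.load(resp)

for fw in state.get("frameworks", []):
    print("{}: checkpoint={}".format(fw.get("name"), fw.get("checkpoint")))
```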
I noticed that when I start my Spark EC2 cluster from my local machine with spark/ec2/spark-ec2 start mycluster, the setup routine has a nasty habit of destroying everything I put in my cluster's spark/conf/. Short of having to run a put-my-configs-back.sh script every time I start up my cluster, is there a "correct" way to set up persistent configurations that will survive a stop/start? Or just a better way?
I'm working off of Spark master locally and Spark 1.2 in my cluster.
I have a MySQL master-slave configuration in which replication happens immediately.
I would like replication to lag 60 (or x) minutes behind the master.
How do I accomplish this?
I read that MySQL 5.6 has such an option, but I couldn't find any info for my MySQL version, which is 5.5.
Cheers,
D
http://www.percona.com/doc/percona-toolkit/2.2/pt-slave-delay.html
pt-slave-delay watches a slave and starts and stops its replication SQL thread as necessary to hold it at least as far behind the master as you request. In practice, it will typically cause the slave to lag between --delay and --delay + --interval behind the master.
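For comparison, the built-in option the question mentions exists only in MySQL 5.6+ (so it does not apply to 5.5); a rough sketch of what it looks like, with placeholder connection details:

```python
# Sketch: MySQL 5.6+ delayed replication via MASTER_DELAY (not available on 5.5).
# Connection details are placeholders; run this against the slave.
import pymysql

conn = pymysql.connect(host="slave.example.com", user="repl_admin", password="secret")
try:
    with conn.cursor() as cur:
        cur.execute("STOP SLAVE")
        cur.execute("CHANGE MASTER TO MASTER_DELAY = 3600")  # lag 60 minutes behind
        cur.execute("START SLAVE")
finally:
    conn.close()
```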