Configure Marathon to not restart tasks that enter TASK_FINISHED - mesos

I launch tasks via Marathon. However, when they finish, Marathon restarts them. I would like them to be restarted only if they finish in failure. Is there a way to make Marathon not restart a task that enters the TaskStatus.TASK_FINISHED state, e.g., by suspending the job, i.e., by setting its number of instances to zero?
Currently, when my task completes successfully, I PUT a request to the Marathon REST API scaling the job down to 0 instances. This is fine, except that in response Marathon kills the task, setting its status to TASK_KILLED, when I would like it to be TASK_FINISHED to indicate its success.
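The scale-to-zero call described above can be sketched as follows. This is a minimal, illustrative sketch: it assumes Marathon's standard /v2/apps/{app_id} endpoint accepting a PUT with {"instances": 0}, and takes an injectable opener so it can be exercised without a live Marathon.

```python
import json
import urllib.request

def suspend_app(base_url, app_id, opener=urllib.request.urlopen):
    """Suspend a Marathon app by scaling it to zero instances.

    Sends PUT /v2/apps/{app_id} with body {"instances": 0}, which is
    how Marathon's REST API expresses "keep the app definition, but
    run no tasks". Note Marathon will still report the stopped tasks
    as TASK_KILLED, as observed in the question.
    """
    req = urllib.request.Request(
        url=f"{base_url}/v2/apps/{app_id}",
        data=json.dumps({"instances": 0}).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    return opener(req)
```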

If you have one-off tasks as you describe, I think the better solution would be to use a scheduler like
https://mesos.github.io/chronos/ or its successor
https://github.com/dcos/metronome
Marathon is normally used to keep tasks running, rescheduling them whenever they reach a terminal task state.
See the Marathon docs, and also this explanation of different task types.

Related

Celery: AWS ECS Autoscale scale-in Event (how to not destroy long running tasks?)

I'm running Python Celery (a distributed task queue library) workers in an AWS ECS cluster (1 Celery worker running per EC2 instance), but the tasks are long running and NOT idempotent. This means that when an autoscaling scale-in event happens, which is when ECS terminates one of the containers running a worker because of low task load, the long running tasks currently in progress on that worker would be lost forever.
Does anyone have any suggestions on how to configure ECS autoscaling so no tasks are terminated before completion? Ideally, ECS scale-in event would initiate a warm-shutdown on the Celery worker in the EC2 instance it wants to terminate, but only ACTUALLY terminate the EC2 instance once the Celery worker has finished the warm shutdown, which occurs after all its tasks have completed.
I also understand there is something called instance protection, which can be set programmatically and protects instances from being terminated in a scale-in autoscale event: https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-instance-termination.html#instance-protection-instance
However, I'm not aware of any Celery signals which trigger after all tasks have finished out in a warm shutdown, so I'm not sure how I'd programmatically know when to disable the protection anyways. And even if I found a way to disable the protection at the right moment, who would manage which worker gets sent the shutdown signal in the first place? Can EC2 be configured to do a custom action to instances in a scale-in event (like doing a warm celery shutdown) instead of just terminating the EC2 instance?
I think that when ECS scales in your tasks, it sends SIGTERM, waits 30 seconds (the default), and then kills your task's containers with SIGKILL.
You can increase the time between the two signals with the ECS_CONTAINER_STOP_TIMEOUT agent variable.
That way, your Celery tasks can finish, and no new tasks will be added to that Celery worker (Celery performs a warm shutdown after receiving SIGTERM).
This answer might help you:
https://stackoverflow.com/a/49564080/1011253
What we do in our company is not use ECS at all, just "plain" EC2 (for this particular service). We have an "autoscaling" task that runs every N minutes and, depending on the situation, scales the cluster up by M new machines (all configurable via the AWS Parameter Store), so Celery essentially scales itself up and down. The same task also sends a shutdown signal to every worker older than 10 minutes that is completely idle. When a Celery worker shuts down, the whole machine terminates: the worker powers off the machine from a worker_shutdown.connect handler, and all these EC2 instances have the "terminate" shutdown behavior. The cluster processes millions of tasks per day, some of them running for up to 12 hours.
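The drain-then-release flow discussed in the question (warm shutdown, then drop scale-in protection only once all tasks are done) could be sketched like this. The bookkeeping class is pure Python and illustrative; the boto3 set_instance_protection call is the real Auto Scaling API for clearing instance protection, but the function wrapping it here is a hypothetical helper and is not exercised below.

```python
def release_instance_protection(instance_id, asg_name):
    """Clear scale-in protection once the worker is fully drained.

    Illustration only: assumes standard AWS credentials are configured.
    """
    import boto3  # deferred import so the rest of the sketch runs without AWS
    boto3.client("autoscaling").set_instance_protection(
        InstanceIds=[instance_id],
        AutoScalingGroupName=asg_name,
        ProtectedFromScaleIn=False,
    )

class WarmShutdownGuard:
    """Tracks in-flight tasks and decides when it is safe to terminate.

    task_started/task_finished would be driven by Celery's task_prerun/
    task_postrun signals, and begin_drain by worker_shutting_down.
    """
    def __init__(self):
        self.in_flight = 0
        self.draining = False

    def task_started(self):
        self.in_flight += 1

    def task_finished(self):
        self.in_flight -= 1

    def begin_drain(self):
        # SIGTERM received: stop accepting work, finish what is running.
        self.draining = True

    def safe_to_terminate(self):
        # Only release protection when draining AND nothing is running.
        return self.draining and self.in_flight == 0
```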

Apache Aurora cron jobs are not scheduled

I set up a Mesos cluster running the Apache Aurora framework and registered 100 cron jobs that run every minute on a pool of 5 slave machines. After being scheduled about 100 times, the cron jobs get stuck in the PENDING state. What logs can I inspect, and what is the likely problem?
It could be a couple of things:
Do you still have sufficient resources in your cluster?
Are those resources offered to Aurora? Or maybe only to another framework?
Do you have any task constraints that prevent your tasks from being scheduled?
Possible information source:
What does the tooltip or the expanded status say in the UI?
The Aurora scheduler has log files; however, an end user normally does not need them to figure out why tasks are stuck in PENDING.
If you are still stuck, it would probably be best to drop by the #aurora IRC channel on freenode.

ensuring that a mesos task is not running after a TASK_LOST status update

I am trying to write a simple Mesos framework that can relaunch tasks that don't succeed.
The basic algorithm, which seems to be mostly working, is to read in a task list (e.g. shell commands) and then launch executors, waiting to hear back status messages. If I get TASK_FINISHED, that particular task is done. If I get TASK_FAILED/TASK_KILLED, I can retry the task elsewhere (or maybe give up).
The case I'm not sure about is TASK_LOST (or even slave lost). I want to ensure that I don't launch another copy of a task that is already running. After getting TASK_LOST, is it possible that the executor is still running somewhere, but a network problem has disconnected the slave from the master? Does Mesos deal with this case somehow, perhaps by having the executor kill itself (and the task) when it is unable to contact the master?
More generally, how can I make sure I don't have two of the same task running in this context?
Let me provide some background first and then try to answer your question.
1) The difference between TASK_LOST and the other terminal unsuccessful states is that restarting a lost task could end in TASK_FINISHED, while a failed or killed task most probably will not.
2) Until you get a TASK_LOST you should assume your task is still running. Imagine a Mesos agent (slave) dies for a while: the tasks on it may still be running and will be successfully reconciled, even though the connection is temporarily lost.
3) Now to your original question. The problem is that it is utterly hard to have exactly one instance running (see e.g. [1] and [2]). If you have lost the connection to your task, that can mean either a (temporary) network partition or that your task has died. You basically have to choose between two alternatives: either allowing multiple instances to run at the same time, or accepting periods when no instance is running.
4) It is not easy to guarantee that two tasks are never running concurrently. When you get a TASK_LOST update from Mesos, it means your task is either dead or orphaned (it will be killed once reconciled). Now imagine a slave with your task disconnected from the Mesos master by a network partition: you will get a TASK_LOST update, and the master ensures the task is killed once the connection is restored, but until then the task keeps running on the disconnected slave, which violates the single-instance guarantee if you started another instance upon receiving the TASK_LOST update.
5) Things you may want to look at:
recovery_timeout on Mesos slaves regulates when tasks commit suicide if the mesos-slave process dies.
slave_reregister_timeout on the Mesos master specifies how much time slaves have to reregister with the master and have their tasks reconciled (essentially, when you get TASK_LOST updates for unreachable tasks).
[1] http://antirez.com/news/78
[2] http://bravenewgeek.com/you-cannot-have-exactly-once-delivery/
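The trade-off in points 3) and 4) above can be made concrete as a small status-update policy. This is a hedged sketch, not Mesos API code: plain strings stand in for the protobuf task states, and the `reconciled` flag represents whether the master has confirmed the lost task is really gone (e.g. after slave_reregister_timeout expires). This version prefers at-most-one instance, so it waits for reconciliation before relaunching a LOST task.

```python
# Terminal status updates a framework scheduler might receive.
FINISHED = "TASK_FINISHED"
FAILED = "TASK_FAILED"
KILLED = "TASK_KILLED"
LOST = "TASK_LOST"

def next_action(status, reconciled):
    """Decide what to do after a status update for a task.

    Relaunching a LOST task before reconciliation risks two copies
    running at once (the old one may still be alive on a partitioned
    slave); waiting risks a period with zero copies. This sketch
    waits, i.e. it chooses at-most-one over at-least-one.
    """
    if status == FINISHED:
        return "done"
    if status in (FAILED, KILLED):
        return "relaunch"          # retry elsewhere, or give up after N tries
    if status == LOST:
        return "relaunch" if reconciled else "wait_for_reconciliation"
    return "keep_waiting"          # non-terminal updates: nothing to do
```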
You can assume that TASK_LOST really means your task is lost and there is nothing you can do but launch another instance.
Two things to keep in mind though:
Your framework may register with a failover timeout, which means that if your framework cannot communicate with the slave for any reason (unstable network, dead slave, dead scheduler, etc.), Mesos will kill that framework's tasks once they fail to recover within the timeout. You will get a TASK_LOST status only after the task is actually considered dead (e.g., when the failover timeout expires).
Without a failover timeout, tasks will be killed immediately when connectivity is lost for any reason.

Apache Mesos Schedulers and Executors by example

I am trying to understand how the various components of Mesos work together, and found an excellent tutorial with an architectural overview.
I have a few concerns about this that aren't made clear (either in the article or in the official Mesos docs):
Where are the Schedulers running? Are there "Scheduler nodes" where only the Schedulers should be running?
If I was writing my own Mesos framework, what Scheduler functionality would I need to implement? Is it just a binary yes/no or accept/reject for Offers sent by the Master? Any concrete examples?
If I was writing my own Mesos framework, what Executor functionality would I need to implement? Any concrete examples?
What's a concrete example of a Task that would be sent to an Executor?
Are Executors "pinned" (permanently installed on) Slaves, or do they float around in an "on demand" type fashion, being installed and executed dynamically/on-the-fly?
Great questions!
I believe it would be really helpful to have a look at a sample framework such as Rendler. This will probably answer most of your questions and give you a feeling for the framework internals.
Let me now try to answer the questions that might still be open after that.
Scheduler Location
Schedulers do not run on any special nodes, but keep in mind that schedulers can fail over as well (like any part of a distributed system).
Scheduler functionality
Have a look at Rendler or at the framework development guide.
Executor functionality/Task
I believe Rendler is a good example to understand the Task/Executor relationship. Just start reading the README/description on the main github page.
Executor pinning
Executors are started on a node when the first task requiring that executor is sent to the node. After that, the executor remains on the node.
Hope this helped!
To add to js84's excellent response,
Scheduler Location: Many users like to launch the schedulers via another framework like Marathon to ensure that if the scheduler or its node dies, then it can be restarted elsewhere.
Scheduler functionality: After registering with Mesos, your scheduler will start getting resource offers in the resourceOffers() callback, in which your scheduler should launch (at least) one task on a subset (or all) of the resources being offered. You'll probably also want to implement the statusUpdate() callback to handle task completion/failure.
Note that you may not even need to implement your own scheduler if an existing framework like Marathon/Chronos/Aurora/Kubernetes could suffice.
Executor functionality: You usually don't need to create a custom executor if you just want to launch a Linux process or Docker container and know when it completes. You could just use the default mesos-executor (by specifying a CommandInfo directly in the TaskInfo, instead of embedding it inside an ExecutorInfo). If, however, you want to build a custom executor, at minimum you need to implement launchTask(), and ideally also killTask().
Example Task: An example task could be a simple linux command like sleep 1000 or echo "Hello World", or a docker container (via ContainerInfo) like image : 'mysql'. Or, if you use a custom executor, then the executor defines what a task is and how to run it, so a task could instead be run as another thread in the executor's process, or just become an item in a queue in a single-threaded executor.
Executor pinning: The executor is distributed via CommandInfo URIs, just like any task binaries, so they do not need to be preinstalled on the nodes. Mesos will fetch and run it for you.
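The default-executor path and URI fetching described above can be sketched with plain dicts standing in for the Mesos protobufs (TaskInfo, CommandInfo). This is an illustrative shape, not real binding code; field names mirror the protobuf fields, and the resource amounts are arbitrary example values.

```python
def make_task(task_id, agent_id, shell_cmd, uris=()):
    """Build a TaskInfo-like dict using the default command executor.

    The CommandInfo sits directly on the task (no ExecutorInfo), so
    Mesos runs it with the built-in mesos-executor. Each URI is
    fetched into the task sandbox before launch, which is also how a
    custom executor binary can be shipped without preinstalling it.
    """
    return {
        "task_id": {"value": task_id},
        "agent_id": {"value": agent_id},
        "command": {
            "value": shell_cmd,                      # e.g. 'sleep 1000'
            "uris": [{"value": u} for u in uris],    # fetched to sandbox
        },
        "resources": [
            {"name": "cpus", "scalar": 0.1},
            {"name": "mem", "scalar": 32},
        ],
    }
```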
Schedulers implement the strategy for accepting or rejecting offers. You can write your own scheduler or use an existing one like Chronos. In the scheduler you should evaluate the available resources and then either accept or reject the offer.
Scheduler functionality: For example, suppose you have a task that needs 8 CPUs to run, but the offer from Mesos is only for 6 CPUs; that won't serve the need, so you can reject it.
Executor functionality: The executor handles state-related information about your task. It is a set of APIs you need to implement, e.g., reporting the status of the assigned task on the Mesos slave, or the number of CPUs currently available on the slave where the executor is running.
A concrete example of an executor: Chronos.
On being installed and executed dynamically/on-the-fly: this is not possible; you need to preconfigure the executors. However, you can replicate executors using autoscaling.
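The accept/reject decision described above (e.g. needing 8 CPUs but being offered only 6) reduces to a resource comparison. A minimal sketch, using plain dicts where a real scheduler would read the Offer protobuf inside its resourceOffers() callback:

```python
def accept_offer(offer, needed):
    """Return True if an offer satisfies a task's resource needs.

    `offer` and `needed` map resource names to amounts, e.g.
    {"cpus": 6, "mem": 4096}. Every needed resource must be present
    in the offer in at least the required amount.
    """
    return all(offer.get(name, 0) >= amount for name, amount in needed.items())
```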

Monitor server, process, services, Task scheduler status

I am wondering if there is a way to monitor these automatically. Right now, in our production/QA/dev environments, we have a bunch of services running that are critical to the application. We also have automatic ETLs running on Windows Task Scheduler at a set time of day. Currently, I have to log into each server and check whether all the services are running, check the event logs for errors, and check Task Scheduler to see whether the ETLs ran well. I have to do all of this manually. I am wondering if there is a tool out there that will do the monitoring for me and send emails only when something needs attention (ETLs failing to run, a service stopping for whatever reason, errors in the event log, etc.). Thanks for the help.
Paessler PRTG Network Monitor can do all of that. We have had very good experience with it.
http://www.paessler.com/prtg/features
Nagios is the best tool for monitoring. It checks the server status as well as the services defined on it, and if any service or the system goes down, it sends mail to the specified address.
Refer to: http://nagios.org/
Thanks for the above information. I looked at those options, but they have a price. What I did instead is an inexpensive way to address my concerns.
For my Windows Task Scheduler jobs that run every night, I installed this tool/service from CodePlex, which is working great:
http://motash.codeplex.com/documentation#CommentsAnchor
For Windows services, I just set the Recovery tab in each service's properties with the actions to take when it fails (restart, reboot, or run a program, which could send an email notification).
I built a simple tool (https://cronitor.io) for monitoring periodic/scheduled tasks. The name is a play on "cron" from the unix world, but it is system/task agnostic. All you have to do is make an http request to a unique tracking URL whenever your job runs. If your job doesn't check-in according to the rules you define then it will send you an email/sms message.
It also allows you to track the duration of your jobs by making calls at the beginning and end of your task. This can be really useful for long running jobs since you can be alerted if they start taking too long to run. For example, I once had a backup task that was scheduled every hour. About six months after I set it up it started taking longer than an hour to run!
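The begin/end check-in pattern described above can be sketched as a small wrapper. This is illustrative only: the /run, /complete, and /fail URL suffixes are hypothetical, not Cronitor's actual API, and the `send` callable is injectable so the sketch works without network access.

```python
import time
import urllib.request

def run_with_pings(job, base_url, send=urllib.request.urlopen):
    """Run `job` between start/end pings to a tracking URL.

    Pinging at both ends lets the monitoring service alert not only
    on a missed run but also on runs that take too long, as in the
    hourly-backup example above.
    """
    send(f"{base_url}/run")            # mark the start
    start = time.monotonic()
    try:
        result = job()
    except Exception:
        send(f"{base_url}/fail")       # report failure explicitly
        raise
    send(f"{base_url}/complete")       # mark successful completion
    return result, time.monotonic() - start
```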
There is https://eyewitness.io, which monitors server cron tasks, queues, and websites. It makes sure each of your cron jobs runs when it is supposed to, and alerts you if it fails to run.