My scenario is that a framework is running on server A. It has an executor on server B running a task (a long-running web service with a long initialization time). Server A is shut down. The framework is then restarted somewhere else in the cluster.
Currently, after the restart the new framework registers a new executor which runs a new task. After some time, the Mesos master deactivates the old and no-longer-running framework which in turn kills the old but still-running executor and its task.
I would like the new framework to re-register the old executor rather than register a new one. Is this possible?
This thread on the Mesos mailing list answers my question:
http://www.mail-archive.com/user%40mesos.apache.org/msg00069.html
Included here for reference:
(1) One thing in particular I found unexpected is that the executors are shut down if the scheduler is shut down. Is there a way to keep executors/tasks running when the scheduler is down? I would imagine that when the scheduler comes back, it could somehow re-establish the state and keep going without interrupting the running tasks. Is this a use case that Mesos is designed for?
You can use FrameworkInfo.failover_timeout to tell Mesos how long to wait for the framework to re-register before it cleans up the
framework's executors and tasks.
Also, note that for this to work the framework has to persist its
frameworkId when it first registers with the master. When the
framework comes back up it needs to reconnect by setting
FrameworkInfo.framework_id = persisted id.
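A minimal sketch of that persist-and-reconnect step, assuming file-based persistence. The FrameworkInfo is shown as a plain dict rather than the real protobuf, and the file path and one-hour failover timeout are illustrative choices, not anything mandated by Mesos:

```python
import json
import os

FRAMEWORK_ID_FILE = "/tmp/my_framework_id"  # hypothetical persistence location

def build_framework_info(failover_timeout_secs=3600.0):
    """Build the FrameworkInfo fields (shown as a plain dict here;
    the real API uses the FrameworkInfo protobuf)."""
    info = {
        "name": "my-framework",
        "failover_timeout": failover_timeout_secs,  # seconds Mesos waits before cleanup
    }
    # On restart, reconnect with the persisted id so Mesos treats this as a
    # failover of the old framework rather than a brand-new registration.
    if os.path.exists(FRAMEWORK_ID_FILE):
        with open(FRAMEWORK_ID_FILE) as f:
            info["id"] = json.load(f)["value"]
    return info

def on_registered(framework_id):
    """Callback for when the master assigns a framework id: persist it."""
    with open(FRAMEWORK_ID_FILE, "w") as f:
        json.dump({"value": framework_id}, f)
```

On first registration there is no persisted id, so the master assigns one; on every later start the persisted id is sent back, which is what lets the running executors survive within the failover timeout.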
Related
We are using a quartz scheduler for scheduling a few jobs.
We run on multiple VMs, so we have configured Quartz in clustered mode using the MySQL job store.
Now we are migrating to Kubernetes. There will be multiple main pods handling all the incoming requests, and separate worker pods that will only run these schedulers. The problem is that we don't want the main pods to execute the Quartz jobs.
I was thinking of disabling the schedulers on the main pods and enabling them only on the worker pods; that way we could achieve what is required.
So the question is: if we disable the schedulers, or put them on standby, on the main pods, will that also put the scheduler instances on the worker pods on standby, since all of them are running in clustered mode?
If anyone has a better solution, please help. I don't have any solution in mind at all, not even a workaround. If we don't find a solution, we will have to stick with our older VMs and won't be able to migrate to Kubernetes.
I have a Spring Boot application which takes requests from users and saves data in a database.
Certain integration calls need to be made with the saved data, so I thought of a scheduled task that runs every 15 minutes, picks up this data, and makes the necessary calls.
But my application is deployed on 2 AWS EC2 instances, so this scheduled process will run on both instances, causing duplicate integration calls.
Any suggestions on how to avoid the duplicate calls?
I don't have any code to share as of now.
Please share your thoughts... Thanks.
It seems a similar question was answered here: Spring Scheduled Task running in clustered environment
My take:
1) Easy - you can move the scheduled process to a separate instance from the ones that service request traffic, and only run it on one instance, a "job server" if you will.
2) Most scalable - have the scheduled task on both instances, but make them synchronize somehow on who is active and who is standby (perhaps with a cache such as AWS ElastiCache). Or you can switch to the Quartz job scheduler with JDBCJobStore persistence, which can coordinate which of the 2 instances gets to run the job. http://www.quartz-scheduler.org/documentation/quartz-2.x/tutorials/tutorial-lesson-09.html
I am trying to write a simple Mesos framework that can relaunch tasks that don't succeed.
The basic algorithm, which seems to be mostly working, is to read in a task list (e.g. shell commands) and then launch executors, waiting to hear back status messages. If I get TASK_FINISHED, that particular task is done. If I get TASK_FAILED/TASK_KILLED, I can retry the task elsewhere (or maybe give up).
The case I'm not sure about is TASK_LOST (or even slave lost). I am hoping to ensure that I don't launch another copy of a task that is already running. After getting TASK_LOST, is it possible that the executor is still running somewhere, but a network problem has disconnected the slave from the master? Does Mesos deal with this case somehow, perhaps by having the executor kill itself (and the task) when it is unable to contact the master?
More generally, how can I make sure I don't have two of the same task running in this context?
Let me provide some background first and then try to answer your question.
1) The difference between TASK_LOST and the other terminal unsuccessful states is that restarting a lost task could end in TASK_FINISHED, while a failed or killed task most probably will not.
2) Until you get a TASK_LOST, you should assume your task is running. Imagine a Mesos Agent (Slave) disconnects for a while: its tasks may still be running and will be successfully reconciled, even though the connection is temporarily lost.
3) Now to your original question. The problem is that it is extremely hard to have exactly one instance running (see e.g. [1] and [2]). If you have lost the connection to your task, that can mean either a (temporary) network partition or that your task has died. You basically have to choose between two alternatives: either allow the possibility of multiple instances running at the same time, or accept periods when no instance is running.
4) It is not easy to guarantee that two copies of the same task are not running concurrently. When you get a TASK_LOST update from Mesos, it means your task is either dead or orphaned (it will be killed once reconciled). Now imagine a situation where a slave running your task is disconnected from the Mesos Master (due to a network partition): you will get a TASK_LOST update, and the Master ensures the task is killed once the connection is restored, but until then your task keeps running on the disconnected slave, which violates the guarantee if you have already started another instance after getting the TASK_LOST update.
5) Things you may want to look at:
recovery_timeout on Mesos slaves regulates when tasks commit suicide if the mesos-slave process dies
slave_reregister_timeout on the Mesos Master specifies how much time slaves have to re-register with the Mesos Master and have their tasks reconciled (basically, when you get TASK_LOST updates for unreachable tasks).
[1] http://antirez.com/news/78
[2] http://bravenewgeek.com/you-cannot-have-exactly-once-delivery/
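The retry policy from the question (relaunch on TASK_FAILED/TASK_KILLED/TASK_LOST, give up after a few attempts) could be sketched as below; the class name and attempt limit are illustrative, and the states are plain strings rather than the TaskStatus protobufs the real statusUpdate() callback delivers:

```python
class RetryTracker:
    """Tracks per-task attempts and decides whether to relaunch.

    Caveat from point 4 above: after TASK_LOST, the old instance may still be
    running on a partitioned agent until it is reconciled, so relaunching here
    trades "at most once" for "at least once" execution.
    """

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.attempts = {}  # task_id -> attempts used so far

    def on_update(self, task_id, state):
        if state == "TASK_FINISHED":
            return "done"
        if state in ("TASK_FAILED", "TASK_KILLED", "TASK_LOST"):
            used = self.attempts.get(task_id, 1)  # the original launch counts
            if used < self.max_attempts:
                self.attempts[task_id] = used + 1
                return "relaunch"
            return "give_up"
        return "running"  # TASK_STAGING / TASK_STARTING / TASK_RUNNING
```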
You can assume that TASK_LOST really means your task is lost and there is nothing you can do but launch another instance.
Two things to keep in mind though:
Your framework may register with a failover timeout, which means that if your framework cannot communicate with the slave for any reason (unstable network, dead slave, dead scheduler, etc.), Mesos will kill the framework's tasks after they fail to recover within that timeout. You will get a TASK_LOST status once the task is actually considered dead (e.g. when the failover timeout expires).
Without a failover timeout, tasks will be killed immediately when connectivity is lost for any reason.
I am trying to understand how the various components of Mesos work together, and found this excellent tutorial that contains the following architectural overview:
I have a few concerns about this that aren't made clear (either in the article or in the official Mesos docs):
Where are the Schedulers running? Are there "Scheduler nodes" where only the Schedulers should be running?
If I was writing my own Mesos framework, what Scheduler functionality would I need to implement? Is it just a binary yes/no or accept/reject for Offers sent by the Master? Any concrete examples?
If I was writing my own Mesos framework, what Executor functionality would I need to implement? Any concrete examples?
What's a concrete example of a Task that would be sent to an Executor?
Are Executors "pinned" (permanently installed on) Slaves, or do they float around in an "on demand" type fashion, being installed and executed dynamically/on-the-fly?
Great questions!
I believe it would be really helpful to have a look at a sample framework such as Rendler. This will probably answer most of your questions and give you a feeling for the framework internals.
Let me now try to answer the questions which might still be open after this.
Scheduler Location
Schedulers do not run on any special nodes, but keep in mind that schedulers can fail over as well (as can any part of a distributed system).
Scheduler functionality
Have a look at Rendler or at the framework development guide.
Executor functionality/Task
I believe Rendler is a good example to understand the Task/Executor relationship. Just start reading the README/description on the main github page.
Executor pinning
Executors are started on a node when the first task requiring that executor is sent to the node. After this, the executor remains on that node.
Hope this helped!
To add to js84's excellent response,
Scheduler Location: Many users like to launch the schedulers via another framework like Marathon to ensure that if the scheduler or its node dies, then it can be restarted elsewhere.
Scheduler functionality: After registering with Mesos, your scheduler will start getting resource offers in the resourceOffers() callback, in which your scheduler should launch (at least) one task on a subset (or all) of the resources being offered. You'll probably also want to implement the statusUpdate() callback to handle task completion/failure.
Note that you may not even need to implement your own scheduler if an existing framework like Marathon/Chronos/Aurora/Kubernetes could suffice.
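The resourceOffers()/statusUpdate() flow described above can be sketched as follows. Offers and statuses are shown as plain dicts, a stub driver stands in for the real MesosSchedulerDriver, and the resource thresholds and task shape are illustrative:

```python
class FakeDriver:
    """Stands in for the real MesosSchedulerDriver so the sketch is runnable."""

    def __init__(self):
        self.launched = []
        self.declined = []

    def launchTasks(self, offer_id, tasks):
        self.launched.append((offer_id, tasks))

    def declineOffer(self, offer_id):
        self.declined.append(offer_id)

class MyScheduler:
    """The two callbacks a minimal framework scheduler implements."""

    def __init__(self, cpus_needed=1.0, mem_needed=128):
        self.cpus_needed = cpus_needed
        self.mem_needed = mem_needed
        self.retries = []

    def resourceOffers(self, driver, offers):
        # Every offer must be answered: launch task(s) on it, or decline it.
        for offer in offers:
            if offer["cpus"] >= self.cpus_needed and offer["mem"] >= self.mem_needed:
                driver.launchTasks(offer["id"], [{"name": "my-task"}])
            else:
                driver.declineOffer(offer["id"])

    def statusUpdate(self, driver, status):
        # Track completion/failure of tasks this scheduler launched.
        if status["state"] in ("TASK_FAILED", "TASK_LOST"):
            self.retries.append(status["task_id"])
```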
Executor functionality: You usually don't need to create a custom executor if you just want to launch a Linux process or Docker container and know when it completes. You could just use the default mesos-executor (by specifying a CommandInfo directly in the TaskInfo, instead of embedded inside an ExecutorInfo). If, however, you want to build a custom executor, at minimum you need to implement launchTask(), and ideally also killTask().
Example Task: An example task could be a simple Linux command like sleep 1000 or echo "Hello World", or a Docker container (via ContainerInfo) like image : 'mysql'. Or, if you use a custom executor, the executor defines what a task is and how to run it, so a task could instead run as another thread in the executor's process, or just become an item in a queue in a single-threaded executor.
Executor pinning: The executor is distributed via CommandInfo URIs, just like any task binaries, so they do not need to be preinstalled on the nodes. Mesos will fetch and run it for you.
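The CommandInfo-in-TaskInfo shape from this answer, sketched as nested dicts (the real API uses protobufs; the URI and resource values below are illustrative):

```python
def make_shell_task(task_id, command, uris=(), cpus=1.0, mem=128):
    """Build a task that uses the default mesos-executor: the CommandInfo
    goes directly in the TaskInfo, with no ExecutorInfo at all."""
    return {
        "task_id": task_id,
        "command": {                                # CommandInfo
            "value": command,                       # e.g. "sleep 1000"
            "uris": [{"value": u} for u in uris],   # fetched into the sandbox
        },
        "resources": {"cpus": cpus, "mem": mem},
    }
```

The `uris` list is the mechanism behind the last point: any binaries listed there (including a custom executor) are fetched onto the node before the command runs, so nothing needs to be preinstalled.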
Schedulers: A scheduler implements the strategy for accepting or rejecting offers. You can write your own scheduler or use an existing one like Chronos. In the scheduler you should evaluate the resources available and then either accept or reject the offer.
Scheduler functionality: For example, suppose you have a task which needs 8 CPUs to run, but the offer from Mesos is for only 6 CPUs; since that won't serve the need, you can reject it.
Executor functionality: The executor handles state-related information about your task. There is a set of APIs you need to implement, such as reporting the status of the assigned task on the Mesos slave, or the number of CPUs currently available on the slave where the executor is running.
Concrete example of an executor: Chronos.
Being installed and executed dynamically/on-the-fly: This is not possible; you need to preconfigure the executors. However, you can replicate executors using autoscaling.
From Gearman's main page, they mention running with multiple job servers, so that if a job server dies, the clients can pick up a new job server. Given the statement and diagram below, it seems that the job servers do not communicate with each other.
Our question is: what happens to the jobs queued in the job server that died? What is the best practice for making these servers highly available, so that jobs aren't interrupted by a failure?
You are able to run multiple job servers and have the clients and workers connect to the first available job server they are configured with. This way if one job server dies, clients and workers automatically fail over to another job server. You probably don't want to run too many job servers, but having two or three is a good idea for redundancy.
Source
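A sketch of that client-side failover: try each configured job server in order and use the first one that accepts a TCP connection. 4730 is gearmand's default port; the injectable `connect` function is only there to make the sketch testable and is not part of any Gearman client API:

```python
import socket

def first_available(servers, connect=socket.create_connection, timeout=1.0):
    """Return ((host, port), connection) for the first reachable job server."""
    for host, port in servers:
        try:
            conn = connect((host, port), timeout=timeout)
            return (host, port), conn
        except OSError:
            continue  # this job server is down; fail over to the next one
    raise RuntimeError("no job server available")
```

Real Gearman client libraries implement essentially this loop internally when you configure them with a list of job servers.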
As far as I know there is no proper way to handle this at the moment, but as long as you run both job servers with permanent queues (using MySQL or another datastore - just don't use the same actual queue for both servers), you can simply restart the job server and it'll load its queue from the database. This will allow all the queued tasks to be submitted to available workers, even after the server has died.
There is, however, no automagical way of doing this when a job server goes down, so if both the job server and the datastore go down (e.g. a server running both locally dies), the tasks will be left in limbo until the server comes back online.
The permanent queue is only read on startup (and inserted / deleted from as tasks are submitted and completed).
I'm not sure about the complexity required to add such functionality to gearmand, or whether it's actually wanted, but simple "task added, task handed out, task completed" notifications between servers shouldn't be too complicated to handle.