How are frameworks in Mesos notified when tasks scheduled by them finish? - mesos

My question is a combination of this and this question on Stack Overflow; however, the answers there don't help me. I want to know how, when a task belonging to a framework finishes in a Mesos cluster, the framework's scheduler is informed of this. The more details you can provide (like who initiates the communication, whether there is a lag, and what information is included in the message), the better. I was not able to find the answer even in the Mesos docs.

Frameworks are notified about tasks with the Update event. From the scheduler API docs:
Update: Sent by the master whenever there is a status update that is generated by the executor, agent or master. Status updates should be used by executors to reliably communicate the status of the tasks that they manage. It is crucial that a terminal update (e.g., TASK_FINISHED, TASK_KILLED, TASK_FAILED) is sent by the executor as soon as the task terminates, in order for Mesos to release the resources allocated to the task. It is also the responsibility of the scheduler to explicitly acknowledge the receipt of status updates that are reliably retried. See ACKNOWLEDGE in the Calls section above for the semantics. Note that uuid and data are raw bytes encoded in Base64.
All communication (in the V1 API) is initiated by the framework. The framework makes a SUBSCRIBE call and keeps the connection open to receive updates. Basically, when a task is done the communication looks like this: Task → Executor → Agent → Master → Framework
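To make that flow concrete, here is a minimal sketch (in Go, standard library only) of a V1 scheduler talking to the master over the HTTP API: it sends a SUBSCRIBE call, reads the RecordIO-framed event stream, and acknowledges UPDATE events. The master address, framework name and the bare-bones error handling are illustrative assumptions, not details from the question.

// Minimal V1 scheduler HTTP API sketch: SUBSCRIBE, read events, ACKNOWLEDGE updates.
package main

import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "strconv"
    "strings"
)

const master = "http://localhost:5050/api/v1/scheduler"

type event struct {
    Type       string `json:"type"`
    Subscribed struct {
        FrameworkID map[string]string `json:"framework_id"`
    } `json:"subscribed"`
    Update struct {
        Status struct {
            TaskID  map[string]string `json:"task_id"`
            AgentID map[string]string `json:"agent_id"`
            State   string            `json:"state"`
            UUID    string            `json:"uuid"` // raw bytes, Base64-encoded
        } `json:"status"`
    } `json:"update"`
}

func main() {
    // SUBSCRIBE opens a long-lived connection; the master then pushes events
    // back in RecordIO framing: "<length>\n<json record>".
    sub := `{"type":"SUBSCRIBE","subscribe":{"framework_info":{"user":"root","name":"demo"}}}`
    resp, err := http.Post(master, "application/json", strings.NewReader(sub))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    streamID := resp.Header.Get("Mesos-Stream-Id") // must be echoed on every later call

    var frameworkID map[string]string
    r := bufio.NewReader(resp.Body)
    for {
        lenLine, err := r.ReadString('\n')
        if err != nil {
            return // stream closed or errored
        }
        n, _ := strconv.Atoi(strings.TrimSpace(lenLine))
        rec := make([]byte, n)
        if _, err := io.ReadFull(r, rec); err != nil {
            return
        }

        var ev event
        json.Unmarshal(rec, &ev)
        switch ev.Type {
        case "SUBSCRIBED":
            frameworkID = ev.Subscribed.FrameworkID
        case "UPDATE":
            s := ev.Update.Status
            fmt.Println("task", s.TaskID["value"], "is now", s.State) // e.g. TASK_FINISHED
            if s.UUID == "" {
                continue // updates without a uuid do not need an acknowledgement
            }
            // Acknowledge, or the agent will keep retrying the update.
            ack, _ := json.Marshal(map[string]interface{}{
                "framework_id": frameworkID,
                "type":         "ACKNOWLEDGE",
                "acknowledge": map[string]interface{}{
                    "agent_id": s.AgentID, "task_id": s.TaskID, "uuid": s.UUID,
                },
            })
            req, _ := http.NewRequest("POST", master, bytes.NewReader(ack))
            req.Header.Set("Content-Type", "application/json")
            req.Header.Set("Mesos-Stream-Id", streamID)
            http.DefaultClient.Do(req)
        }
    }
}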

Related

How to know the running status of a Spring Integration flow

I have a simple integration flow that polls data from a database based on a cron job, publishes it on a DirectChannel, then does splitting and transformations, publishes on another executor service channel, does some operations, and finally publishes to an output channel. It is written using the DSL style.
I also have an endpoint where I might receive an HTTP request to trigger this flow; at that point I send messages to one of the mentioned channels.
I want to make sure that the manual trigger doesn't happen if the flow is already running due to either the cron job or another request.
I have used the isRunning method of the StandardIntegrationFlow, but it seems that it's not thread-safe.
I also tried using .wireTap(myService) and .handle(myService), where this service has an AtomicBoolean flag, but it gets set for every message, which is not a solution.
I want to know if the flow is running without much intervention from my side, and if this is not supported, how I can apply the atomic boolean logic to the overall flow rather than to every message.
How can I simulate the race condition in a test in order to make sure my implementation prevents it?
The IntegrationFlow is just a logical container for the configuration phase. It does have those lifecycle methods, but only for internal framework logic. Even though they are there, they don't help, because endpoints are always running if you want them to do something on some event or input message.
It is hard to control all of that since the flow is asynchronous, as you explain. Even if we stop a SourcePollingChannelAdapter at the beginning of that flow to let your manual call do something, it doesn't mean that messages in other threads are no longer in process. The AtomicBoolean cannot help here for the same reason: even if you set it to true in MessageSourceMutator.beforeReceive() and reset it back to false in its afterReceive() when the message is null, it still doesn't mean that the messages you pushed downstream in other threads have already been processed.
You might consider using an aggregator to reset the AtomicBoolean at the end of a batch: since you mention that you pull data from a DB, there is probably a number of records per poll that you can track downstream. This way your manual call could be skipped until the aggregator collects the results for that batch.
You also need to think about stopping the SourcePollingChannelAdapter at the moment the manual action is permitted, so there won't be any further race conditions with the cron trigger.
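This is not Spring Integration code, but as a language-agnostic sketch of the batch-tracking idea above (written in Go, with made-up names): record how many items a poll produced, decrement as each record finishes downstream, which is the role the aggregator plays, and only allow the manual trigger while nothing is in flight.

package flowguard

import "sync/atomic"

// FlowGuard tracks how many records from the current poll are still in flight.
type FlowGuard struct {
    inFlight int64
}

// BatchStarted is called when a poll returns n records.
func (g *FlowGuard) BatchStarted(n int) { atomic.AddInt64(&g.inFlight, int64(n)) }

// RecordDone is called by the last step of the flow for every record,
// i.e. the point where the aggregator would release the batch.
func (g *FlowGuard) RecordDone() { atomic.AddInt64(&g.inFlight, -1) }

// ManualTriggerAllowed reports whether nothing is currently in process.
// The caller should also pause the poller before running the manual action,
// otherwise the cron can still race with this check.
func (g *FlowGuard) ManualTriggerAllowed() bool {
    return atomic.LoadInt64(&g.inFlight) == 0
}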

Service synchronization issue

I've created two services.
One of them (scheduler) only sends requests to the other (backoffice) to perform some "large" operations.
When backoffice receives a request, it first creates a mark (a key in Redis) to record that the process has started.
Each time a request arrives, backoffice checks whether the mark exists; if it does, it means the previous process has not finished yet, so the request is skipped.
Otherwise it performs the large process, and when the process is finished the key in Redis is removed.
It would be something like this:
if (key exists)
    return;
make long process...;
remove key;
The problem arises when the service is destroyed before the process has finished, and so it never removes the mark in Redis. That means the process will never run again.
Is there any way to solve this kind of problem?
The way to solve this problem is to use an existing engine, as building a custom, scalable, and robust solution for reliable service orchestration is really hard.
I recommend looking at Uber Cadence Workflow, which would allow you to convert your pseudocode into a real production application with minor changes.
You can fire a background job that updates a timestamp under the key, e.g. every minute.
When the service attempts to start the process it must verify the key's existence (as it does now) plus the timestamp under the key. If the timestamp is more than a minute old, then the previous attempt is stale and you can start over.
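A closely related variant of this heartbeat idea is an expiring key that is refreshed while the process is alive, so that if the service dies mid-process the mark simply disappears. Below is a minimal Go sketch using the go-redis client; the key name, timings and Redis address are illustrative assumptions.

// Heartbeat-style mark in Redis: created with a TTL, refreshed while alive.
package main

import (
    "context"
    "log"
    "time"

    "github.com/redis/go-redis/v9"
)

const lockKey = "backoffice:large-process"

func runLargeProcess(ctx context.Context, rdb *redis.Client) {
    // Try to create the mark with a short TTL; if it already exists and has
    // not expired, a previous run is still alive, so skip this request.
    ok, err := rdb.SetNX(ctx, lockKey, "running", 2*time.Minute).Result()
    if err != nil || !ok {
        log.Println("previous process still running (or redis error), skipping")
        return
    }

    // Heartbeat: keep extending the TTL while we are alive. If the service
    // dies mid-process, the key simply expires and the next request can run.
    stop := make(chan struct{})
    go func() {
        t := time.NewTicker(time.Minute)
        defer t.Stop()
        for {
            select {
            case <-t.C:
                rdb.Expire(ctx, lockKey, 2*time.Minute)
            case <-stop:
                return
            }
        }
    }()

    doTheLongProcess() // placeholder for the "large" operation

    close(stop)
    rdb.Del(ctx, lockKey) // normal completion: remove the mark
}

func doTheLongProcess() { time.Sleep(5 * time.Minute) }

func main() {
    rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
    runLargeProcess(context.Background(), rdb)
}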
Sounds like you should be using a messaging queue to schedule tasks for the back office service. Queuing solutions like RabbitMQ allow you to manually acknowledge (or “ack”) that the process is complete. Whenever a subscriber crashes, the queue detects that the connection dropped without acknowledgement and will re-enqueue the same task which will be picked up by the next available subscriber. Here’s another thread talking about this problem specifically focused on messaging queues:
What happens to fetched messages when RabbitMQ consumer crashes?
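To illustrate the manual-acknowledgement behaviour, here is a minimal Go consumer sketch using the streadway/amqp client; the queue name and broker URL are assumptions, and the queue is assumed to already exist. If the consumer crashes before calling Ack, the broker re-queues the message for another worker.

// Consumer with manual acknowledgement: ack only after the work is done.
package main

import (
    "log"

    "github.com/streadway/amqp"
)

func main() {
    conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    ch, err := conn.Channel()
    if err != nil {
        log.Fatal(err)
    }
    defer ch.Close()

    // autoAck=false: the broker keeps the message until we explicitly ack it.
    msgs, err := ch.Consume("large-process", "", false, false, false, false, nil)
    if err != nil {
        log.Fatal(err)
    }
    for d := range msgs {
        doTheLongProcess(d.Body)
        // Only acknowledge after the work is done. If this consumer crashes
        // before the Ack, the broker re-delivers the message.
        d.Ack(false)
    }
}

func doTheLongProcess(payload []byte) { log.Printf("processing %s", payload) }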

Golang background processing

How can one do background processing/queueing in Go?
For instance, a user signs up, and you send them a confirmation email - you want to send the confirmation email in the background as it may be slow, and the mail server may be down etc etc.
In Ruby a very nice solution is DelayedJob, which queues your job to a relational database (i.e. simple and reliable), and then uses background workers to run the tasks, and retries if the job fails.
I am looking for a simple and reliable solution, not something low level if possible.
While you could just open a goroutine and do every async task you want, this is not a great solution if you want reliability, i.e. the promise that if you trigger a task it will get done.
If you really need this to be production grade, opt for a distributed work queue. I don't know of any such queues that are specific to golang, but you can work with rabbitmq, beanstalk, redis or similar queuing engines to offload such tasks from your process and add fault tolerance and queue persistence.
A simple goroutine can do the job:
http://golang.org/doc/effective_go.html#goroutines
Open a goroutine that does the email delivery and then answer the HTTP request (or whatever else you need to do).
If you want a work queue you can use a RabbitMQ or Beanstalk client, such as:
https://github.com/streadway/amqp
https://github.com/kr/beanstalk
Or you can create an in-process FIFO queue running in a goroutine, for example:
https://github.com/iNamik/go_container
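As a rough sketch of that in-process approach, independent of any particular library: a buffered channel used as a FIFO queue, drained by a small pool of worker goroutines. Note that queued jobs are lost if the process dies, which is why the distributed-queue suggestions above exist.

// In-process FIFO queue: buffered channel plus a fixed pool of workers.
package main

import (
    "log"
    "sync"
)

func main() {
    jobs := make(chan string, 100) // buffered FIFO queue of email addresses

    var wg sync.WaitGroup
    for i := 0; i < 5; i++ { // 5 concurrent workers
        wg.Add(1)
        go func() {
            defer wg.Done()
            for addr := range jobs {
                sendEmail(addr) // retry/backoff omitted in this sketch
            }
        }()
    }

    // e.g. from an HTTP handler after a user signs up:
    jobs <- "user@example.com"

    close(jobs) // no more jobs; let the workers drain the queue
    wg.Wait()
}

func sendEmail(addr string) { log.Println("sending confirmation email to", addr) }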
But maybe the best solution is this job queue library, which also lets you set the concurrency limit, etc.:
https://github.com/otium/queue
import "github.com/otium/queue"
q := queue.NewQueue(func(email string) {
    // your mail delivery code
}, 20)
q.Push("foo@bar.com")
I have created a library for running asynchronous tasks using a message queue (currently RabbitMQ and Memcache are supported brokers but other brokers like Redis or Cassandra could easily be added).
You can take a look. It might be good enough for your use case (and it also supports chaining and workflows).
https://github.com/RichardKnop/machinery
It is an early stage project though.
You can also use the goworker library to schedule jobs.
http://www.goworker.org/
If you are coming from a Ruby background and looking for something like Sidekiq, Resque, or DelayedJob, please check out the asynq library.
Its queue semantics are very similar to Sidekiq's.
https://github.com/hibiken/asynq
If you want a robust library with a very simple, Go-like interface that uses Redis as the backend and RabbitMQ as the message broker, you can try
https://github.com/Joker666/cogman

How to manage several worker processes with one master process in ruby?

I need to develop a multi-process application in Ruby: one master process manages several worker processes. The master creates (forks! I don't want multithreading, but multiprocessing) a pool of workers and then periodically asks the database for new records to be processed; if there are any, it sends the records to the workers via pipes. When a worker has done its job it must notify the master, so the master can send that worker another record to process. I can't find a solution for the notifications the workers can send to the master; it seems like it should be done asynchronously, but IO pipes work in sync mode. Please give me a direction to go in. Any fork best practices in the comments are also welcome! Thank you.
PS: I don't want to use external solutions like EventMachine or Parallel, only forks.
Well, I really do think an MQ system is more suitable: the master publishes the job, and the workers pull and process it.
Rails also has a publish/subscribe facility; see:
http://api.rubyonrails.org/classes/ActiveSupport/Notifications.html
Rails pub/sub with faye

TWS alert questions

Please let me know the answers to these questions on Tivoli Workload Scheduler.
How could TWS warn the admins when the job didn’t kick off as scheduled?
How could TWS warn the admins if the job is running longer than expected?
How could TWS send email about the completion of the job once it is finished without errors?
Starting from TWS 8.4 you can define an event rule with two correlated events. In your case, these two events need to be defined with type Job Status Changed and with the Status property set to "Running". You also need to define a timeout of X seconds/minutes/hours in this rule and associate the timeout with a SendEmail action. Then, when the timeout expires and the status remains unchanged (running), the email is sent automatically. Hope this helps.
It would depend on what version of TWS you are running, but if you are talking about versions later than 8.3, all these alerts and notifications can be set up in the Tivoli Dynamic Workload Console (TDWC).
These newer versions of TWS introduce the concept of event-driven scheduling, where you can set events and triggers for each job, which can include sending an email or forwarding the event to an event management system.
