Isn't Chronos a centralized scheduler? - mesos

Why is Chronos called a distributed and fault-tolerant scheduler? As far as I understand, there is only one scheduler instance running that manages job schedules.
According to the Chronos documentation, the scheduler's main loop is internally quite simple.
The pattern is as follows:
1. Chronos reads all job state from the state store (ZooKeeper).
2. Jobs are registered within the scheduler and loaded into the job graph for tracking dependencies.
3. Jobs are separated into a list of those which should be run at the current time (based on the clock of the host machine), and those which should not.
4. Jobs in the list of jobs to run are queued, and will be launched as soon as a sufficient offer becomes available.
5. Chronos will sleep until the next job is scheduled to run, and begin again from step 1.
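The split-and-sleep logic of that loop can be pictured as a toy simplification (Python; `Job` and `scheduler_tick` are hypothetical names for illustration, not actual Chronos code):

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    next_run: float  # epoch seconds of the next scheduled run

def scheduler_tick(jobs, now):
    """One pass of the loop: split jobs into due / not-yet-due and
    compute how long to sleep until the next scheduled job."""
    due = [j for j in jobs if j.next_run <= now]
    pending = [j for j in jobs if j.next_run > now]
    # Due jobs would be queued and launched once a sufficient Mesos
    # offer becomes available; here we just return them.
    sleep_for = min((j.next_run - now for j in pending), default=0.0)
    return due, sleep_for
```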
Experts please opine?

You can run Chronos as a single node (which is what you are describing), but Chronos is designed to be run with multiple nodes, each on a different host, achieving HA via a ZooKeeper quorum. This follows the standard leader/follower methodology, where only the leader is active and the follower(s) redirect traffic to the leader. This is considered HA in many open-source frameworks, including Mesos, as seen here.
The leader can abdicate or fail, which is where ZooKeeper comes in: after a leader failure, Chronos holds a new leader election, assuming quorum was established and maintained prior to the event.
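The ZooKeeper-style election can be pictured as each candidate creating an ephemeral sequential znode, with the lowest sequence number becoming leader; a minimal sketch (hypothetical names, not the actual Chronos/Curator code):

```python
def elect_leader(candidates):
    """Each candidate is (node_name, seq), where seq is the sequence
    number of its ephemeral sequential znode; the lowest seq leads.
    If the leader's znode disappears (the process died), re-running
    the election over the surviving candidates picks the next leader."""
    return min(candidates, key=lambda c: c[1])[0]
```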
See references for multi-node setups here and here.
How leader election is specified:
JobSchedulerElectionSpec.scala
Leader redirection:
RedirectFilter.scala

Related

How to shrink a Storm cluster (take a supervisor machine offline)

As the title says.
I have a Storm cluster with 20 machines: one for Nimbus and 19 for supervisors.
Now I've found that we don't need so many machines for the Storm cluster, and I want to take 2 supervisor machines offline.
I don't know how to do that gracefully. Should I just stop the supervisor process on the 2 machines? But there are executors serving online traffic running on these two machines.
Any suggestions would be helpful, thanks.
I am writing from memory here, so please try this out on a non-production cluster before you go do it and find out I misremembered something.
If your topologies are written to handle message loss (i.e. they either don't care about at-least-once semantics, or you're using acking), you can just kill the supervisor and workers. Nimbus will figure out that the supervisor is dead and reassign the executors pretty quickly. When the new executors come up, the topologies will replay the lost messages, since they were never acked.
If you can't handle message loss, you can deactivate the topologies in Storm UI, wait for them to stop processing, and then kill the supervisor. Afterwards, reactivate the topologies and Nimbus will reassign them.
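The deactivate-drain-stop sequence could be scripted along these lines (a sketch only: `storm deactivate`/`storm activate` are the standard Storm CLI commands, while the `storm-supervisor` systemd unit name and SSH access are assumptions about your deployment):

```python
def decommission_plan(topologies, hosts):
    """Build the ordered list of commands for gently draining the
    topologies before stopping supervisors on the given hosts."""
    steps = [["storm", "deactivate", t] for t in topologies]
    # ... wait roughly topology.message.timeout.secs here for
    # in-flight tuples to drain before the next step ...
    steps += [["ssh", h, "sudo", "systemctl", "stop", "storm-supervisor"]
              for h in hosts]
    steps += [["storm", "activate", t] for t in topologies]
    return steps
```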

Storm Pacemaker with upgraded KafkaSpout

I had a question regarding the usage of Pacemaker. We have a Storm cluster currently running 1.0.2 and are in the process of migrating it to 1.2.2. We also use KafkaSpout to consume data from Kafka topics.
Now, since this release is for Kafka 0.10+, most of that load would be taken off ZK, since the offsets won't be stored in ZK anymore.
Considering this, does it make sense for us to also start looking at Pacemaker to reduce load further on ZK?
Our cluster has 70+ supervisors and around 70 workers with a few unused slots. We also have around 9,100+ executors/tasks running.
Another question I have is regarding the heartbeats: who sends them to whom? From what I have read, workers and supervisors send their heartbeats to ZK, which is the load Pacemaker alleviates. What about the tasks? Do they also send heartbeats? If so, to ZK or somewhere else? There's a config called task.heartbeat.frequency.secs which has added to my confusion.
The reason I ask is that if the task-level heartbeats aren't being sent to ZK, then it's pretty evident that Pacemaker won't be needed, because with no offsets being committed to ZK the load would already be reduced dramatically. Is my assessment correct, or would Pacemaker still be a feasible option? Any leads would be appreciated.
Pacemaker is an optional Storm daemon designed to process heartbeats from workers, implemented as an in-memory store. You can use it if ZK becomes a bottleneck as the Storm cluster scales up.
Supervisors report heartbeats to Nimbus to signal that they are alive (used for fault tolerance); the frequency is set via supervisor.heartbeat.frequency.secs, and these heartbeats are stored in ZK.
Workers heartbeat to their supervisor; the frequency is set via worker.heartbeat.frequency.secs. These heartbeats are stored on the local file system.
task.heartbeat.frequency.secs: how often a task (executor) should heartbeat its status to the master (Nimbus). It never took effect in Storm, and has been deprecated in favor of the RPC heartbeat reporting in Storm 2.0.
Executor heartbeats state which executors are assigned to which worker, and are stored in ZK.
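Should you decide to try Pacemaker, it is enabled via storm.yaml on all nodes; the keys below follow the pattern shown in the Storm Pacemaker documentation (the host name is a placeholder, and you should verify the keys against your exact Storm version):

```yaml
# storm.yaml: route worker heartbeats to Pacemaker instead of ZooKeeper
pacemaker.host: "pacemaker.example.com"   # placeholder host
pacemaker.port: 6699
storm.cluster.state.store: "org.apache.storm.pacemaker.pacemaker_state_factory"
pacemaker.auth.method: "NONE"
```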

Hive jobs getting stuck after log initialization in a specified queue

It seems to be a lack of resources due to other jobs running in the same queue. Is there any workaround to prioritize some jobs over the jobs already running in the same queue, so that they execute first?
If you're using YARN to schedule jobs, there is no way to preempt jobs within the same queue. A workaround is to move jobs to another queue if you need to free up resources in a particular queue (e.g. with yarn application -movetoqueue <application-id> -queue <target-queue>).
YARN also supports reservations (described in YARN-1051, "YARN Admission Control/Planner: enhancing the resource allocation model with time"), which allow you to reserve vcores for future jobs. This landed in 2.6.0, but most of the documentation is in 2.8.0.
Reservation System
Resource Manager REST APIs for Reservations

Apache Sling Job distribution

I need some advice. I have to choose between Sling events and jobs.
The documentation explicitly says that events are distributed to all nodes in the cluster, so I could handle them on each node separately, and that's fine.
But it also states that jobs are more reliable, and reliability is what I want to achieve.
But there's a catch: a job can only be executed by one job consumer.
Is there a mechanism similar to events? I mean, could I consume a job on each cluster node and notify the sender about success/failure on each node?

How does Storm assign tasks to workers?

How does Storm assign tasks to its workers? How does load balancing work?
Storm assigns tasks to workers when you submit the topology via "storm jar ...".
A typical Storm cluster will have many supervisors (aka Storm nodes). Each supervisor node (server) will run many worker processes; the number of workers per supervisor is determined by how many ports you assign with supervisor.slots.ports.
When the topology is submitted via "storm jar", the Storm platform determines which workers will host each of your spouts and bolts (aka tasks). The number of workers and executors that host your topology depends on the "parallelism" you set during development, which can also be set when the topology is submitted or changed on a live running topology using "storm rebalance".
Michael Noll has a great breakdown of Parallelism, Workers and Tasks in his blog post here: http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/#example-of-a-running-topology
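By default, Storm's even scheduler effectively spreads executors round-robin across the available worker slots; a toy illustration of that idea (hypothetical function, not Storm's actual scheduler code):

```python
def assign_round_robin(executors, slots):
    """Distribute executors evenly across worker slots, as Storm's
    default even scheduler effectively does."""
    assignment = {slot: [] for slot in slots}
    for i, executor in enumerate(executors):
        assignment[slots[i % len(slots)]].append(executor)
    return assignment
```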
