Hadoop Heartbeat Message Exchange

I want to exchange heartbeat messages in Hadoop YARN. In Hadoop MapReduce 1 we can exchange heartbeat messages between the JobTracker and the TaskTracker. How can I achieve this in YARN?

Heartbeats are implemented in YARN as well; the same concept underlies communication from the DataNodes to the NameNode. Containers communicate with the ApplicationMaster, and the NodeManagers send heartbeats to the ResourceManager as well.
You can also implement this yourself by running background threads that listen on specific ports and handle the incoming requests.
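For the do-it-yourself route, here is a minimal sketch of such a background heartbeat thread. The port, interval, message format and class name are made up for illustration and are not part of any Hadoop API:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Hypothetical example: a worker-side thread that sends a heartbeat line to a
    // master every few seconds, and a master-side thread that listens for them.
    public class SimpleHeartbeat {

        // Worker side: schedule a periodic heartbeat to the master.
        public static void startSender(final String masterHost, final int port, final String workerId) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(() -> {
                try (Socket socket = new Socket(masterHost, port);
                     OutputStream out = socket.getOutputStream()) {
                    out.write(("HEARTBEAT " + workerId + " " + System.currentTimeMillis() + "\n")
                            .getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                    // Master unreachable; real code would count consecutive failures.
                    System.err.println("Heartbeat failed: " + e.getMessage());
                }
            }, 0, 3, TimeUnit.SECONDS);
        }

        // Master side: a daemon thread that accepts heartbeat connections.
        public static void startReceiver(final int port) {
            Thread listener = new Thread(() -> {
                try (ServerSocket server = new ServerSocket(port)) {
                    while (!Thread.currentThread().isInterrupted()) {
                        try (Socket client = server.accept()) {
                            byte[] buf = client.getInputStream().readAllBytes();
                            // Real code would update a liveness table keyed by worker id.
                            System.out.println("Received: " + new String(buf, StandardCharsets.UTF_8).trim());
                        }
                    }
                } catch (IOException e) {
                    System.err.println("Receiver stopped: " + e.getMessage());
                }
            }, "heartbeat-listener");
            listener.setDaemon(true);
            listener.start();
        }
    }

Within YARN itself you normally don't have to hand-roll this: the ApplicationMaster's periodic allocate() calls to the ResourceManager and the NodeManager's node status updates already act as heartbeats.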
Hope it helps!

Related

How to shrink a Storm cluster (take a supervisor machine offline)

As the title says.
I have a Storm cluster with 20 machines: one for Nimbus and 19 for supervisors.
Now I have found that we don't need that many machines for the Storm cluster, and I want to take 2 supervisor machines offline.
I don't know how to do that gently. Do I just stop the supervisor process on the 2 machines? The problem is that some executors serving online traffic are running on those two machines.
Any suggestions will be helpful, thanks.
I am writing from memory here, so please try this out on a non-production cluster before you go do it and find out I misremembered something.
If your topologies are written to handle message loss (i.e. they either don't care about at-least-once, or you're using acking), you can just kill the supervisor and workers. Nimbus will figure out that the supervisor is dead and reassign the executors pretty quickly. When the new executors come up, the topologies will handle the lost messages, since anything that wasn't acked will be replayed.
If you can't handle message loss, you can deactivate the topologies in Storm UI, wait for them to stop processing, and kill the supervisor. Then reactivate the topologies and Nimbus will reassign them.
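If you prefer the command line over Storm UI, the deactivate/reactivate steps would be roughly the following (the topology name is illustrative):

    storm deactivate my-topology        (spouts stop emitting, in-flight tuples drain)
    ... stop the supervisor process and the workers on the 2 machines ...
    storm activate my-topology          (resume once Nimbus has reassigned the executors)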

Storm Pacemaker with upgraded KafkaSpout

I have a question regarding the usage of Pacemaker. We have a Storm cluster currently running on 1.0.2 and are in the process of migrating it to 1.2.2. We also use KafkaSpout to consume data from the Kafka topics.
Now, since this release is for Kafka 0.10+, most of the load would be taken off ZK, because the offsets won't be stored in ZK anymore.
Considering this, does it make sense for us to also start looking at Pacemaker to reduce the load on ZK further?
Our cluster has 70+ supervisors and around 70 workers with a few unused slots. Also, we have around 9100+ executors/tasks running.
Another question I have is regarding the heartbeats: who sends them to whom? From what I have read, workers and supervisors send their heartbeats to ZK, which is what Pacemaker alleviates. How about the tasks? Do they also send heartbeats? If yes, is it to ZK or somewhere else? There's a config called task.heartbeat.frequency.secs which has led me to some more confusion.
The reason I ask is that if the task-level heartbeats aren't being sent to ZK, then it's pretty evident that Pacemaker won't be needed, because with no offsets being committed to ZK the load would already be reduced dramatically. Is my assessment correct, or would Pacemaker still be a feasible option? Any leads would be appreciated.
Pacemaker is an optional Storm daemon designed to process heartbeats from workers; it is implemented as an in-memory store. You could use it if ZK becomes a bottleneck because the Storm cluster has scaled up.
Supervisors report a heartbeat to Nimbus to signal that they are alive; this is used for fault tolerance, the frequency is set via supervisor.heartbeat.frequency.secs, and these heartbeats are stored in ZK.
Workers heartbeat to their supervisor; the frequency is set via worker.heartbeat.frequency.secs, and these heartbeats are stored on the local file system.
task.heartbeat.frequency.secs controls how often a task (executor) should heartbeat its status to the master (Nimbus). It never actually took effect in Storm and has been deprecated in favor of RPC heartbeat reporting in Storm 2.0.
This heartbeat states which executors are assigned to which worker, and it is stored in ZK.
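If you do end up trying Pacemaker, it is mostly a configuration change on the worker/supervisor side plus running the Pacemaker daemon itself. As far as I remember from the Storm 1.x Pacemaker docs it comes down to something like the following in storm.yaml, but do verify the key names against your release:

    pacemaker.servers: ["pacemaker-host"]
    storm.cluster.state.store: "org.apache.storm.pacemaker.pacemaker_state_factory"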

Apache Sling Job distribution

I need some advice. I have to choose between Sling events and jobs.
The documentation clearly says that events are distributed to all nodes in the cluster, so I could handle them on each node separately, and that's OK.
But it also says that jobs are more reliable, and reliability is what I want to achieve.
There's a catch, though: a job is only executed by one job consumer.
Is there a mechanism similar to events, i.e. could I consume a job on each cluster node and notify the sender about success/failure on each node?
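For reference, here is a minimal Java sketch of the two mechanisms being compared. The class names are made up for illustration, and the OSGi service registration (the event.topics / job.topics properties) is omitted:

    import org.apache.sling.event.jobs.Job;
    import org.apache.sling.event.jobs.consumer.JobConsumer;
    import org.osgi.service.event.Event;
    import org.osgi.service.event.EventHandler;

    public class SlingDistributionSketch {

        // An OSGi EventHandler registered for a topic receives a distributed event
        // on every cluster node on which the handler is active.
        public static class ClusterWideHandler implements EventHandler {
            @Override
            public void handleEvent(final Event event) {
                // runs on each node
            }
        }

        // A Sling JobConsumer processes a given job on exactly one node in the
        // cluster, with retries if it returns FAILED.
        public static class SingleNodeConsumer implements JobConsumer {
            @Override
            public JobResult process(final Job job) {
                // runs on exactly one consumer instance
                return JobResult.OK;
            }
        }
    }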

What are the implications of using NFS3 file system for multi-instance queue managers in WebSphere MQ

We are stuck in a difficult scenario in our new MQ infrastructure implementation using multi-instance queue managers with WebSphere MQ v7.5 on Linux.
The concern is that our network team is not able to configure NFSv4, so we are still on NFSv3. We understand that multi-instance queue managers will not function properly with NFSv3. But are there any issues if we define the queue managers in multi-instance fashion on NFSv3 and expect them to work perfectly in single-instance mode?
Thanks
I would not expect you to have issues running single-node queue managers on NFSv3; we do so on a regular basis. The requirement for NFSv4 is about the file locking mechanism that multi-instance queue managers need to determine when the primary instance has lost control and a secondary queue manager should take over.
If you do define the queue manager as multi-instance and the queue manager attempts to fail over, it may not do so successfully; at worst it may corrupt your queue manager files.
If you control the failover yourself -- as in, shut down the queue manager on one node and start it again on another node -- that should work for you, as there is no file sharing taking place and all files would be closed on the primary node before being opened on the secondary node. You would have to make sure the secondary queue manager is NOT running in standby mode -- ever.
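Concretely, a manual switchover under that constraint might look like this (the queue manager name is illustrative):

    On node A:   endmqm -w QM1      (end the queue manager and wait until it has stopped completely)
    On node B:   strmqm QM1         (start it as a normal single instance)

The command to stay away from is strmqm -x QM1, since -x is what starts an instance that permits a standby.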
I hope this helps.
Dave

MQ Cluster - how to properly disable one node in production environments

I have some messages flowing through the MQ cluster using cluster and alias queues. Some queues are defined multiple times, so the load-balancing mechanism is used.
What is the proper way to remove one queue manager from the cluster without disturbing the whole message flow? Disabling the cluster-receiver channel, the cluster-sender channels, or something else?
Use the
suspend qmgr
command. This suspends the queue manager's participation in the cluster. See the SUSPEND QMGR entry in the MQSC command reference for the full syntax.
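For example, in runmqsc on the queue manager you want to take out of service (the cluster name is illustrative):

    SUSPEND QMGR CLUSTER('MYCLUSTER')

and once the maintenance is done:

    RESUME QMGR CLUSTER('MYCLUSTER')

While the queue manager is suspended, the cluster workload algorithm prefers the other instances of the clustered queues, so traffic drains away from it.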
