How do we allocate different numbers of reducers to hosts in a heterogeneous cluster? - hadoop

Our system has a cluster of 5 hosts (i.e., data nodes or slave machines). I want to allocate a different number of reducers to each host because one host is slow. We are using Hadoop YARN. The ResourceManager (the counterpart of the JobTracker in MapReduce 1) always spreads the reducers evenly across the 5 hosts. Is there any way to limit the number of reducers on a specific host? For example, for a job with 40 reducers, I want the 4 fast computers to run 36 reducers (9 per host) and the slow computer to run only 4.

It is entirely possible, and common, to have heterogeneous systems in a Hadoop cluster. Typically, as the cluster grows and scales horizontally, new nodes with different configurations get added to it.
In such scenarios, to apply a configuration to a specific node or to a group of nodes, we need to modify the configuration on those hosts accordingly.
For example, in the case of Hortonworks Data Platform, where the cluster is managed through Ambari, host config groups can be used for this purpose.
Please see the link below for further information:
https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.1.0/bk_Ambari_Users_Guide/content/_using_host_config_groups.html
Also see the question below, which discusses increasing the number of YARN containers at the node level. The same per-node settings apply in your case, except that you would lower them on the slow host rather than raise them:
How to increase the number of containers in nodemanager in YARN
Another useful link:
http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
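To make the per-node idea concrete, here is a minimal sketch of the arithmetic. It assumes that the number of reducer containers a NodeManager can run at once is bounded by the resources it advertises (yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores) divided by what each reducer requests (mapreduce.reduce.memory.mb and mapreduce.reduce.cpu.vcores). The function name and all numbers are made up for illustration. Depending on the scheduler configuration, vcores may not be considered at all and requests may be rounded up, so treat the result as an upper bound, not a guaranteed distribution.

```python
def max_concurrent_reducers(node_memory_mb, node_vcores,
                            reducer_memory_mb, reducer_vcores=1):
    """Upper bound on reducer containers that fit on one NodeManager.

    Hypothetical helper: it only divides advertised node resources by the
    per-reducer request and takes the tighter of the two limits.
    """
    by_memory = node_memory_mb // reducer_memory_mb
    by_vcores = node_vcores // reducer_vcores
    return min(by_memory, by_vcores)


# Illustrative values only: a fast host advertising 18 GB / 16 vcores and a
# slow host advertising 8 GB / 4 vcores, with 2 GB requested per reducer.
fast = max_concurrent_reducers(18432, 16, 2048)
slow = max_concurrent_reducers(8192, 4, 2048)
print(fast, slow)
```

With these assumed values the fast hosts can hold 9 reducers each and the slow host 4, which is the 36 + 4 split asked for in the question. The scheduler still decides where containers actually land, so the per-node limits only cap the slow host.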

Related

How does the Hadoop framework decide which node runs a map task

As per my understanding, files stored in HDFS are divided into blocks, and each block is replicated to multiple nodes, 3 by default. How does the Hadoop framework choose the node to run a map task, out of all the nodes on which a particular block is replicated?
As far as I know, there will be as many map tasks as there are blocks.
See manual here.
Usually, the framework chooses a node close to the input block, to reduce the network bandwidth used by the map task.
That's all I know.
In MapReduce 1, it depends on how many map tasks are already running on each DataNode that hosts a replica, because the number of map slots per node is fixed in MR1. In MR2 there are no fixed slots, so it depends on the number of tasks already running on that node.
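The locality preference described in these answers can be sketched roughly as follows. This is a simplified illustration, not the scheduler's actual code: pick_node, its arguments, and the node and rack names are all hypothetical, and the real schedulers also account for queues, delays, and resource sizes.

```python
def pick_node(replica_nodes, free_nodes, rack_of):
    """Pick a node for a map task: node-local, then rack-local, then any.

    replica_nodes: set of nodes holding a replica of the input block
    free_nodes:    list of nodes that currently have spare capacity
    rack_of:       dict mapping node name -> rack name
    """
    # 1. Prefer a free node that already stores a replica of the block.
    for node in free_nodes:
        if node in replica_nodes:
            return node, "node-local"
    # 2. Otherwise prefer a free node on the same rack as some replica.
    replica_racks = {rack_of[n] for n in replica_nodes}
    for node in free_nodes:
        if rack_of[node] in replica_racks:
            return node, "rack-local"
    # 3. Otherwise fall back to any free node; the block crosses racks.
    if free_nodes:
        return free_nodes[0], "off-rack"
    return None, "no capacity"


# Made-up topology for illustration.
rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}
print(pick_node({"n1", "n3"}, ["n2", "n3"], rack_of))
print(pick_node({"n1"}, ["n4", "n2"], rack_of))
print(pick_node({"n1"}, ["n4"], rack_of))
```

The point of the sketch is that which replica holder gets the task depends on which of them has free capacity at the moment, which matches the "number of tasks already running" remark above.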

Can a slave node have multiple blocks of the same file in hadoop?

Say I have a Hadoop cluster where one node is the master node and the other is a data node. The slave node is an 8-core machine, so there are enough cores to process jobs in parallel. Can I still split the file into, say, 3 blocks and have the slave node store all three blocks separately? In other words, if we want to utilize all the slave nodes in a Hadoop cluster, is there a 1:1 relation between the number of slave nodes and the maximum number of blocks of a file? If yes, how would MapReduce work in such a case? Will the master node launch three map tasks on the slave node and have each mapper pick up one block on the slave node?
My question can be seen in a different way. If we have a 1GB file on a cluster with 3 data nodes then how do the 64 MB blocks get divided and how are they distributed between the three nodes?
The second question is easier for me to understand, so I will take that first.
From HDFS Perspective:
With a 64 MB block size, a 1 GB file consists of 16 blocks. Blocks are placed somewhat randomly across the DataNodes if you have more DataNodes than the replication factor, but you can expect an even distribution between the nodes as long as you do not load the data from one of the DNs. If you do, that DN will hold a replica of every block, and the other DNs will hold the remaining replicas, distributed more or less evenly (still randomly placed). So yes: for example, if a file consists of 16 blocks and you have only 3 DNs with a replication factor of 3, all 3 DNs will hold all 16 blocks.
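The numbers in the paragraph above follow from simple arithmetic, sketched here. The helper names are made up, and the second function assumes a perfectly even spread, which real placement only approximates.

```python
import math


def block_count(file_size_mb, block_size_mb):
    """Number of HDFS blocks needed for a file (the last block may be short)."""
    return math.ceil(file_size_mb / block_size_mb)


def avg_replicas_per_node(blocks, replication, datanodes):
    """Average block replicas stored per DataNode, assuming an even spread.

    A block cannot have more replicas than there are DataNodes, so the
    effective replication is capped at the node count.
    """
    effective_replication = min(replication, datanodes)
    return blocks * effective_replication / datanodes


blocks = block_count(1024, 64)
print(blocks)
print(avg_replicas_per_node(blocks, 3, 3))
print(avg_replicas_per_node(blocks, 3, 8))
```

With 3 DataNodes and replication 3, the average equals the block count, so every node holds every block. With 8 DataNodes the same file averages 6 replicas per node.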
From YARN's perspective when you run the MapReduce job:
YARN tries to find a container for each mapper on a node that holds the data locally. There is a configurable wait time for a free container on such nodes before YARN starts the mapper on a node that does not have the data.
YARN does not rely on physical cores directly. You can configure the number of virtual cores and the amount of memory a container uses, and based on these values YARN determines how many containers are available on a NodeManager.
Further reading on YARN tuning on Cloudera Engineering blog
However:
From the first part of the question, as I understand it, you want to achieve parallelism by choosing the block size used to split your data files.
MapReduce does not care about HDFS blocks directly. It has its own abstraction for splitting the input, called InputSplit. InputSplits are fed to the mappers by the InputFormat. InputSplits also record the locations where the split is available locally, so that YARN can look for a container on a node that has the split in local storage. I suggest checking the API and the available InputFormat implementations, as they most likely suit your needs. If they do not, you can still write your own implementation and specify it via the job configuration.
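As an illustration of how splits relate to blocks, here is a small Python sketch modelled on how the stock FileInputFormat sizes its splits, as far as I recall it: the split size is the block size clamped between a minimum and a maximum (mapreduce.input.fileinputformat.split.minsize and split.maxsize), and a small tail is folded into the last split. The 1.1 tolerance and the function names are assumptions of this sketch, so check the InputFormat you actually use.

```python
SPLIT_SLOP = 1.1  # assumed tolerance: the last split may be up to 10% oversized

MB = 1024 * 1024


def compute_split_size(block_size, min_size, max_size):
    """Clamp the block size between the configured min and max split size."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, split_size):
    """Count the splits (and so map tasks) for one splittable file."""
    splits = 0
    remaining = file_size
    # Cut full-size splits while more than SPLIT_SLOP splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


default_size = compute_split_size(64 * MB, 1, 2**63 - 1)
print(count_splits(1024 * MB, default_size))

smaller = compute_split_size(64 * MB, 1, 32 * MB)
print(count_splits(1024 * MB, smaller))
```

Under these assumptions a 1 GB file with 64 MB blocks yields 16 splits by default, and capping the maximum split size at 32 MB doubles that to 32 without touching the HDFS block size.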

Hadoop, uneven load between machines

I have a cluster of 4 machines that I need to run a benchmark against.
I decided to use Terasort as the benchmark.
However, when I run the benchmark, only one of the four machines is under load, while the other three are completely idle.
If I run the test again, a different machine is fully loaded while the other three are idle.
When I create the dataset with Teragen, everything works fine and the load is evenly distributed across all four machines.
What could be wrong in this configuration?
Thanks
I assume your cluster is laid out as 4 nodes (1 NameNode, 1 Secondary NameNode, 2 DataNodes).
The flow starts at the NameNode, and the JobTracker schedules the job's tasks on the TaskTrackers that hold the data blocks.
How heavily the DataNodes are used depends on a few factors, such as the replication factor, the number of mappers, and the number of blocks.
If there are many blocks, they will be placed evenly across all the DataNodes of your cluster. If the replication factor is 2, the blocks will be available on both DataNodes, so both can run the mappers that process those blocks.
If a file has two blocks, two mappers can run simultaneously on the DataNodes and use the resources properly.
In your case, the block size seems to be the problem. Try reducing it so that there are at least 2 blocks, which should improve both utilization and performance.
Hadoop can be tuned as per your need with the below settings.
dfs.replication in hdfs-site.xml
dfs.block.size in hdfs-site.xml (named dfs.blocksize in Hadoop 2 and later)
Good luck !!!

Actual need of Zookeepers

I am new to HBase and I am still learning it. How many ZooKeepers do we actually need? Is it one per RegionServer or one per cluster? Thanks
ZooKeeper is per cluster, not per RegionServer.
From the HBase definitive guide:
How many ZooKeepers should I run? You can run a ZooKeeper ensemble
that comprises 1 node only but in production it is recommended that
you run a ZooKeeper ensemble of 3, 5 or 7 machines; the more members
an ensemble has, the more tolerant the ensemble is of host failures.
Also, run an odd number of machines. In ZooKeeper, an even number of
peers is supported, but it is normally not used because an even sized
ensemble requires, proportionally, more peers to form a quorum than an
odd sized ensemble requires. For example, an ensemble with 4 peers
requires 3 to form a quorum, while an ensemble with 5 also requires 3
to form a quorum. Thus, an ensemble of 5 allows 2 peers to fail, and
thus is more fault tolerant than the ensemble of 4, which allows only
1 down peer.
Give each ZooKeeper server around 1GB of RAM, and if possible, its own
dedicated disk (A dedicated disk is the best thing you can do to
ensure a performant ZooKeeper ensemble). For very heavily loaded
clusters, run ZooKeeper servers on separate machines from
RegionServers (DataNodes and TaskTrackers).
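The quorum arithmetic in the quoted passage can be checked with a few lines of Python. This is only a sketch of the majority rule described there, and the function names are made up for illustration.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many peers can be down while a quorum is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 3, 4, 5, 7):
    print(n, quorum_size(n), tolerated_failures(n))
```

An ensemble of 4 needs 3 peers for a quorum and survives 1 failure, while an ensemble of 5 also needs 3 but survives 2, which is why odd sizes are recommended.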

HADOOP HDFS imbalance issue

I have a Hadoop cluster with 8 machines, and all 8 machines are data nodes.
A program running on one machine (say machine A) continuously creates sequence files (each about 1 GB) in HDFS.
Here's the problem: all 8 machines have the same hardware and the same capacity. While the other machines still have about 50% of their HDFS disk space free, machine A has only 5% left.
I checked the block info and found that almost every block has one replica on machine A.
Is there any way to balance the replicas?
Thanks.
This is the default placement policy: when the writer runs on a DataNode, the first replica of each block is placed on that node. It works well for the typical M/R pattern, where each HDFS node is also a compute node and the writer machines are uniformly distributed.
If you don't like it, there is HDFS-385, "Design a pluggable interface to place replicas of blocks in HDFS". You need to write a class that implements the BlockPlacementPolicy interface, and then set this class as dfs.block.replicator.classname in hdfs-site.xml.
There is a way: you can use the Hadoop command-line balancer tool.
HDFS data might not always be placed uniformly across the DataNodes. The balancer can be used to spread HDFS data uniformly across the DataNodes in the cluster.
hadoop balancer [-threshold <threshold>]
where threshold is a percentage of disk capacity.
see the following links for details:
http://hadoop.apache.org/docs/r1.0.4/commands_manual.html
http://hadoop.apache.org/docs/r1.0.4/hdfs_user_guide.html#Rebalancer
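To show what the threshold means in practice, here is a rough Python sketch. It assumes the rule from the HDFS user guide that a DataNode counts as balanced when its utilization differs from the overall cluster utilization by no more than the threshold. The function, the node names, and the simplification that all nodes have equal capacity (so cluster utilization is a plain mean) are assumptions of this example, not the balancer's code.

```python
def classify_nodes(used_pct_by_node, threshold_pct=10.0):
    """Label each DataNode relative to the average cluster utilization.

    Assumes equal-capacity nodes, so the cluster utilization is the mean
    of the per-node utilization percentages.
    """
    avg = sum(used_pct_by_node.values()) / len(used_pct_by_node)
    status = {}
    for node, used in used_pct_by_node.items():
        if used > avg + threshold_pct:
            status[node] = "over-utilized"
        elif used < avg - threshold_pct:
            status[node] = "under-utilized"
        else:
            status[node] = "within threshold"
    return avg, status


# The situation from the question: machine A is 95% full, the rest about 50%.
usage = {"A": 95.0}
usage.update({f"M{i}": 50.0 for i in range(2, 9)})
avg, status = classify_nodes(usage)
print(avg)
print(status)
```

With a threshold of 10, only machine A falls outside the band around the roughly 55.6% average, so the balancer would move blocks off A until it comes within the threshold. A smaller threshold gives a more even result at the cost of a longer run.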
