I have a Cloudera cluster with 7 worker nodes, each with:
30 GB RAM
4 vCPUs
Here are the settings that, from what I found on Google, matter most for tuning the performance of my cluster. I am running with:
yarn.nodemanager.resource.cpu-vcores => 4
yarn.nodemanager.resource.memory-mb => 17GB (Rest reserved for OS and other processes)
mapreduce.map.memory.mb => 2GB
mapreduce.reduce.memory.mb => 2GB
Running nproc => 4 (Number of processing units available)
My concern is this: in the ResourceManager I see Available Memory as 119 GB, which is fine. But when I run a heavy Sqoop job and the cluster is at its peak, it uses only ~59 GB of memory, leaving ~60 GB unused.
One way I can see to fix this unused-memory issue is to increase mapreduce.map.memory.mb and mapreduce.reduce.memory.mb to 4 GB, so that we can use up to 16 GB per node.
The other way is to increase the number of containers, which I am not sure how to do.
4 cores x 7 nodes = 28 possible containers. With 3 being used by other processes, only 5 are currently available for the Sqoop job.
What would be the right configuration to improve cluster performance in this case? Can I increase the number of containers to, say, 2 containers per core? And is that recommended?
Any help or suggestions on the cluster configuration would be highly appreciated. Thanks.
If your input data is in 26 splits, YARN will create 26 mappers to process those splits in parallel.
If you have 7 nodes with 2 GB mappers for 26 splits, the distribution should be something like:
Node1 : 4 mappers => 8 GB
Node2 : 4 mappers => 8 GB
Node3 : 4 mappers => 8 GB
Node4 : 4 mappers => 8 GB
Node5 : 4 mappers => 8 GB
Node6 : 3 mappers => 6 GB
Node7 : 3 mappers => 6 GB
Total : 26 mappers => 52 GB
So the total memory used by your MapReduce job, if all mappers are running at the same time, will be 26 x 2 = 52 GB. If you add the memory used by the reducer(s) and the ApplicationMaster container, you can reach your 59 GB at some point, as you said.
If this is the behaviour you are witnessing, and the job is finished after those 26 mappers, then there is nothing wrong. You only need around 60 GB to complete your job by spreading tasks across all your nodes without needing to wait for container slots to free themselves. The other free 60 GB are just waiting around, because you don't need them. Increasing heap size just to use all the memory won't necessarily improve performance.
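The layout in the table above can be reproduced with a few lines of Python. This is only a sketch: it assumes the mappers are spread as evenly as possible, which YARN does not guarantee, and the function name spread_mappers is made up for the illustration.

```python
def spread_mappers(num_mappers, num_nodes, mapper_gb):
    """Spread mappers as evenly as possible; return (mappers, GB) per node."""
    base, extra = divmod(num_mappers, num_nodes)
    # The first `extra` nodes get one more mapper than the rest.
    per_node = [base + 1 if i < extra else base for i in range(num_nodes)]
    return [(n, n * mapper_gb) for n in per_node]


layout = spread_mappers(num_mappers=26, num_nodes=7, mapper_gb=2)
for i, (mappers, gb) in enumerate(layout, start=1):
    print(f"Node{i}: {mappers} mappers => {gb} GB")

total_mappers = sum(m for m, _ in layout)
total_gb = sum(gb for _, gb in layout)
print(f"Total: {total_mappers} mappers => {total_gb} GB")
```

The total only covers the mappers; reducers and the ApplicationMaster container come on top of it.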
Edited:
However, if you still have lots of mappers waiting to be scheduled, then maybe it's because your installation is configured to calculate container allocation using vcores as well. This is not the default in Apache Hadoop, but it can be configured:
yarn.scheduler.capacity.resource-calculator :
The ResourceCalculator implementation to be used to compare Resources in the scheduler. The default i.e. org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator only uses Memory while DominantResourceCalculator uses Dominant-resource to compare multi-dimensional resources such as Memory, CPU etc. A Java ResourceCalculator class name is expected.
Since you defined yarn.nodemanager.resource.cpu-vcores to 4, and since each mapper uses 1 vcore by default, you can only run 4 mappers per node at a time.
In that case you can double your value of yarn.nodemanager.resource.cpu-vcores to 8. It's just an arbitrary value, but it should double the number of mappers that can run on each node.
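To make the vcore limit concrete, here is a rough sketch of the per-node container count under both calculators, using the numbers from the question (17 GB and 4 vcores per node, 2 GB and 1 vcore per mapper). The helper containers_per_node is hypothetical and ignores the ApplicationMaster and any queue limits.

```python
def containers_per_node(node_mb, node_vcores, task_mb, task_vcores, use_vcores):
    """Containers that fit on one node, by memory only or by memory and vcores."""
    by_memory = node_mb // task_mb
    if not use_vcores:
        # Memory-only comparison (DefaultResourceCalculator behaviour).
        return by_memory
    # With vcores taken into account, the scarcer resource decides.
    return min(by_memory, node_vcores // task_vcores)


NODES = 7
NODE_MB = 17 * 1024
TASK_MB = 2 * 1024

for vcores in (4, 8):
    for use_vcores in (False, True):
        per_node = containers_per_node(NODE_MB, vcores, TASK_MB, 1, use_vcores)
        total = per_node * NODES
        print(f"vcores={vcores} use_vcores={use_vcores}: "
              f"{per_node} per node, {total} total, {total * 2} GB")
```

With vcores counted and 4 vcores per node, this gives 28 containers and 56 GB, which is in the same range as the ~59 GB peak reported in the question.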
Related
I have a 10-node cluster where each machine has 8 GB of RAM. When I run my topology, the assigned memory is always proportional to the number of workers, and each worker takes approximately 1 GB of memory. I want to allot 2 GB to each worker. Since I am running three workers on each node, I tried setting worker.childopts: "-Xmx6g -Xms6g" in storm.yaml, but the assigned memory decreased to below 1 GB.
How can I tune my topology better?
I am getting the following error in one of my bolts
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.kafka.common.memory.MemoryPo
You can refer to the default settings here:
https://github.com/apache/storm/blob/master/conf/defaults.yaml
I configured two parameters:
worker.heap.memory.mb: 768
supervisor.memory.capacity.mb: 4096.0
If you have 10 workers, you have to set supervisor.memory.capacity.mb = (768 x 10), since that is the memory capacity of the supervisor.
I think your configuration is slightly wrong. The worker.childopts setting is passed to each worker JVM, so when you set -Xmx6g -Xms6g you are giving each of your three workers 6 GB of memory (18 GB total for the node).
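The arithmetic behind both answers can be checked with a small sketch. The helper names are invented for the example, and the parsing only handles -Xmx values given in g or m.

```python
import re


def heap_mb_from_childopts(childopts):
    """Extract the -Xmx value (in MB) from a JVM options string."""
    match = re.search(r"-Xmx(\d+)([gGmM])", childopts)
    if match is None:
        raise ValueError("no -Xmx flag found")
    value, unit = int(match.group(1)), match.group(2).lower()
    return value * 1024 if unit == "g" else value


def node_heap_demand_mb(childopts, workers_per_node):
    """Total heap requested on one node: childopts applies to every worker."""
    return heap_mb_from_childopts(childopts) * workers_per_node


def supervisor_capacity_mb(worker_heap_mb, workers):
    """Capacity needed so that `workers` workers of the given heap size fit."""
    return worker_heap_mb * workers


NODE_RAM_MB = 8 * 1024

demand_6g = node_heap_demand_mb("-Xmx6g -Xms6g", workers_per_node=3)
demand_2g = node_heap_demand_mb("-Xmx2g -Xms2g", workers_per_node=3)
print(demand_6g, demand_6g <= NODE_RAM_MB)
print(demand_2g, demand_2g <= NODE_RAM_MB)
print(supervisor_capacity_mb(768, 10))
```

Three workers at 6 GB each ask for more than twice the 8 GB a node has, while 2 GB each (the value the question actually wants) fits, leaving about 2 GB for the OS and the supervisor.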
I have looked at the answer to
Why is Spark detecting 8 cores, when I only have 4?
And it doesn't seem to explain the following scenario: I am setting spark.executor.cores to 5, and I have spark.dynamicAllocation.enabled set to true. According to the Spark History Server, my 10-node cluster is running 30 executors, indicating that Spark is using 3 executors per node. This seems to suggest that 15 cores are available (3 executors x 5 cores) per node. The specs for an m4.xlarge instance are 4 vCPUs with 16 GB of memory. Where are these extra cores coming from?
Note: I am setting spark.executor.memory to 3g and yarn.nodemanager.resource.memory-mb to 12200.
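One possible reading, offered as a sketch rather than a confirmed diagnosis: if the scheduler compares resources by memory only, the number of executors per node is decided by memory, and the cores each executor asks for are never checked against the 4 vCPUs. The code below assumes Spark's documented default overhead (10% of executor memory, minimum 384 MB); the increment_mb parameter stands in for the scheduler's allocation rounding, whose value on this cluster is unknown.

```python
def container_mb(executor_memory_mb, overhead_factor=0.10, min_overhead_mb=384,
                 increment_mb=1):
    """Memory YARN must grant for one executor: heap plus memory overhead."""
    overhead = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    requested = executor_memory_mb + overhead
    # Round up to the allocation increment (1 means no rounding).
    return -(-requested // increment_mb) * increment_mb


def executors_per_node(node_mb, executor_memory_mb, increment_mb=1):
    """Executors that fit on one node when only memory is considered."""
    return node_mb // container_mb(executor_memory_mb, increment_mb=increment_mb)


per_node = executors_per_node(node_mb=12200, executor_memory_mb=3 * 1024)
claimed_vcores = per_node * 5  # spark.executor.cores = 5
print(per_node, claimed_vcores)
```

Under these assumptions, three 3 GB executors fit into 12200 MB, which would match the 30 executors seen on 10 nodes. A coarse allocation increment (for example 1024 MB) would lower the count, so the real setting matters.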
Summary
When I run a simple select count(*) from table query in Hive, only two nodes in my large cluster are being used for mapping. I would like to use the whole cluster.
Details
I am using a somewhat large cluster (tens of nodes, each with more than 200 GB RAM) running HDFS and Hive 1.2.1 (IBM-12).
I have a table of several billion rows. When I perform a simple
select count(*) from mytable;
Hive creates hundreds of map tasks, but only 4 are running simultaneously.
This means that my cluster is mostly idle during the query, which seems wasteful. I have tried ssh'ing to the nodes in use, and they are not utilizing CPU or memory fully. Our cluster is backed by Infiniband networking and Isilon file storage, neither of which seems very loaded at all.
We are using MapReduce as the engine. I have tried removing any limits on resources that I could find, but it does not change the fact that only two nodes are being used (4 concurrent mappers).
The memory settings are as follows:
yarn.nodemanager.resource.memory-mb 188928 MB
yarn.scheduler.minimum-allocation-mb 20992 MB
yarn.scheduler.maximum-allocation-mb 188928 MB
yarn.app.mapreduce.am.resource.mb 20992 MB
mapreduce.map.memory.mb 20992 MB
mapreduce.reduce.memory.mb 20992 MB
and we are running on 41 nodes. By my calculation I should be able to get 41*188928/20992 = 369 map/reduce tasks. Instead I get 4.
Vcore settings:
yarn.nodemanager.resource.cpu-vcores 24
yarn.scheduler.minimum-allocation-vcores 1
yarn.scheduler.maximum-allocation-vcores 24
yarn.app.mapreduce.am.resource.cpu-vcores 1
mapreduce.map.cpu.vcores 1
mapreduce.reduce.cpu.vcores 1
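The expected container count from the settings above can be double-checked with a short sketch. It only restates the arithmetic in the question and says nothing about why the cluster actually stops at 4 mappers; queue or user limits are not modelled.

```python
NODES = 41

# Values copied from the memory and vcore settings listed above.
NODE_MEMORY_MB = 188928
TASK_MEMORY_MB = 20992
NODE_VCORES = 24
TASK_VCORES = 1

per_node_by_memory = NODE_MEMORY_MB // TASK_MEMORY_MB
per_node_by_vcores = NODE_VCORES // TASK_VCORES

# Memory is the tighter limit here, so it decides the per-node count.
per_node = min(per_node_by_memory, per_node_by_vcores)
expected_containers = NODES * per_node

print(per_node_by_memory, per_node_by_vcores, expected_containers)
```

One of those containers goes to the ApplicationMaster, so the number of concurrent map/reduce tasks for a single job would be one lower.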
Is there a way to get Hive/MapReduce to use more of my cluster?
How would I go about figuring out the bottleneck?
Could it be that YARN is not assigning tasks fast enough?
I guess that using Tez would improve performance (we do not have it installed at the moment), but I am still interested in why resource utilization is so limited.
The number of tasks that can run in parallel depends on your memory settings in YARN.
For example, suppose you have 4 data nodes and your YARN memory properties are defined as below:
yarn.nodemanager.resource.memory-mb 1 GB
yarn.scheduler.minimum-allocation-mb 1 GB
yarn.scheduler.maximum-allocation-mb 1 GB
yarn.app.mapreduce.am.resource.mb 1 GB
mapreduce.map.memory.mb 1 GB
mapreduce.reduce.memory.mb 1 GB
With these settings and 4 data nodes, the total yarn.nodemanager.resource.memory-mb is 4 GB, which is what you can use to launch containers.
Since each container takes 1 GB of memory, at any given point in time you can launch 4 containers. One will be used by the application master, so at most 3 mapper or reducer tasks can run at any given time, because the application master, mappers, and reducers each use 1 GB of memory.
So you need to increase yarn.nodemanager.resource.memory-mb to increase the number of map/reduce tasks.
P.S. Here we are talking about the maximum number of tasks that can be launched; the actual number may be somewhat lower.
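The example can be written down as a tiny calculation. This is a simplified sketch: it assumes the application master occupies exactly one container of the same size as a task, and the function name is made up for the illustration.

```python
def max_concurrent_tasks(nodes, node_mb, container_mb, am_containers=1):
    """Upper bound on map/reduce tasks running at once for a single job."""
    total_containers = nodes * (node_mb // container_mb)
    # The application master takes its share before any task can run.
    return max(total_containers - am_containers, 0)


# The 4-node, 1 GB example from the answer.
small = max_concurrent_tasks(nodes=4, node_mb=1024, container_mb=1024)

# Same cluster after raising yarn.nodemanager.resource.memory-mb to 4 GB.
larger = max_concurrent_tasks(nodes=4, node_mb=4096, container_mb=1024)

print(small, larger)
```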
I'd like to maximize Hadoop performance in a distributed environment (using Apache Spark with YARN), and I'm following the hints in a Cloudera blog post that uses this configuration:
6 nodes, 16 cores/node, 64 GB RAM/node
and the proposed solution is:
--num-executors 17 --executor-cores 5 --executor-memory 19G
But I didn't understand why they use 17 executors (in other words, about 3 executors per node).
Our configuration is instead:
8 nodes, 8 cores/node, 8 GB RAM/node
What is the best solution?
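As far as I understand the blog's reasoning, the 17 comes from reserving 1 core and 1 GB per node for the OS and Hadoop daemons, fitting 5-core executors into what is left, and giving up one executor slot for the ApplicationMaster; the 19G is the per-executor share minus roughly 7% for memory overhead. The sketch below encodes that reading. The function name and the reserved amounts are assumptions of the example, not an official formula.

```python
def size_executors(nodes, cores_per_node, ram_gb_per_node, executor_cores,
                   reserved_cores=1, reserved_gb=1, overhead_fraction=0.07):
    """Return (num_executors, executor_cores, executor_memory_gb)."""
    usable_cores = cores_per_node - reserved_cores
    usable_gb = ram_gb_per_node - reserved_gb
    per_node = usable_cores // executor_cores
    # One executor slot is given up for the ApplicationMaster.
    num_executors = nodes * per_node - 1
    # Leave room for the memory overhead on top of the heap, rounding down.
    memory_gb = int(usable_gb / per_node * (1 - overhead_fraction))
    return num_executors, executor_cores, memory_gb


# The cluster from the blog post: 6 nodes, 16 cores and 64 GB each.
blog = size_executors(6, 16, 64, executor_cores=5)

# The same rule applied to the 8-node, 8-core, 8 GB cluster in the question.
ours = size_executors(8, 8, 8, executor_cores=5)

print(blog, ours)
```

On the smaller cluster a 5-core executor leaves 2 usable cores per node idle, so the rule is a starting point to experiment from rather than a final answer.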
Your RAM is pretty low. I would expect this to be higher.
But we start off with 8 nodes and 8 cores per node. To determine the maximum number of executors, compute nodes*(cores-1) = 56, subtracting 1 core from each node for management.
So I would start off with
56 executors, 1 executor core, 1G ram.
If you have out-of-memory issues, double the RAM, halve the executors, and up the cores:
28 executors, 2 executor cores, 2G ram
But your max executors will be less, because an executor must fit onto a node. You will be able to get a total of 24 allocated containers at most.
I would try 3 cores before 4, since with 3 cores 2 executors fit on each node, while with 4 cores you get the same number of executors as with 7.
Or, you can skip right to...
8 executors, 7 cores, 7 GB RAM (you want to leave some for the rest of the cluster).
I also found that if CPU scheduling was disabled, YARN was overriding my cores setting, and it always stayed at 1 no matter what I configured. Other settings must also be changed to turn this on.
yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
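The executor counts quoted in this answer follow from whole executors having to fit into the 7 usable cores of each node. A minimal sketch, with a hypothetical helper name and considering cores only (memory is ignored):

```python
def executor_layout(nodes, cores_per_node, executor_cores, reserved_cores=1):
    """Return (executors per node, executors in the cluster) by core count."""
    usable = cores_per_node - reserved_cores
    # Only whole executors fit; leftover cores on a node stay unused.
    per_node = usable // executor_cores
    return per_node, per_node * nodes


for cores in (1, 2, 3, 4, 7):
    per_node, total = executor_layout(nodes=8, cores_per_node=8,
                                      executor_cores=cores)
    print(f"{cores} cores/executor: {per_node} per node, {total} total")
```

This reproduces the 56 executors at 1 core, the cap of 24 at 2 cores, 2 executors per node at 3 cores, and the same single executor per node at 4 and at 7 cores.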
I have this configuration:
Hadoop: v2.7.1 (Yarn)
An input file: Size = 100 GB.
3 Slaves: each has 4 VCORES with Speed = 2 GHz and RAM = 8 GB
5 Slaves: each has 2 VCORES with Speed = 1 GHz and RAM = 2 GB
MapReduce program: WordCount
How can I minimize WordCount execution time by assigning small input splits to the 5 slower slaves and big input splits to the 3 fastest slaves?
For each machine you can set the number of map/reduce slots. So if you want to send less workload to the slower machines, you can define, for example, 2 map/reduce task slots for each slower machine and 4 map/reduce task slots for each of the fast machines. This way you can control how much workload each node in the cluster receives.
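To see what that slot assignment means for the cluster in the question, here is a small sketch. It assumes every slot processes tasks at the same rate, which the differing clock speeds make unlikely, so the shares are only a first approximation. Note also that on YARN (Hadoop 2.x) there are no fixed slots; the per-node limit comes from the memory and vcores each NodeManager advertises.

```python
# Machine groups from the question with the slot counts suggested above.
groups = {
    "fast": {"machines": 3, "slots_per_machine": 4},
    "slow": {"machines": 5, "slots_per_machine": 2},
}

# Concurrent task capacity contributed by each group.
slots = {name: g["machines"] * g["slots_per_machine"] for name, g in groups.items()}
total_slots = sum(slots.values())

for name, count in slots.items():
    share = 100 * count / total_slots
    print(f"{name}: {count} concurrent tasks ({share:.0f}% of capacity)")
```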