Swapping of the RAM while Shuffling in HADOOP - hadoop

I work with Hadoop 1.1.1. My project processes more than 6000 documents. My cluster contains 2 nodes: master (CPU: Core i7, RAM: 6G) and slave (CPU: Core i3, RAM: 12G). The number of mappers is 16. When I set the number of reducers to more than 1 (i.e. 2, ..., 16), the RAM begins to swap during the shuffle phase, and this slows my system down significantly.
How can I stop the RAM from swapping?
What is kept in RAM in the process between MAP and REDUCE?
Is there any reference?
Thanks a lot.

So on the master:
6G physical RAM;
2G allocated per process;
8 mappers and 8 reducers can run concurrently;
8x2 + 8x2, 32G memory required if all tasks are maxed out - over 5x your physical amount.
On the slave:
12G physical RAM;
2G allocated per task;
4 mappers, 4 reducers;
4x2 + 4x2, 16G memory required - about a third more than physical.
Now if you're only running a single job at a time, you can set the slowstart configuration property (mapred.reduce.slowstart.completed.maps in Hadoop 1.x) to 1.0 so that reducers only start once all mappers have finished. That keeps mappers and reducers from running concurrently and will help, but you're still maxed out on the master.
I suggest you either reduce the memory allocation to 1G per task (if you really want that many map / reduce slots on each node), or reduce the maximum number of tasks on both nodes, so that the worst case is closer to the physical amount of RAM.
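The arithmetic above can be reproduced with a small sketch. It only multiplies slot counts by the per-task allocation and ignores the OS and the Hadoop daemons, so treat the numbers as a lower bound; the function name is made up for illustration.

```python
def worst_case_demand_gb(map_slots, reduce_slots, heap_gb_per_task):
    """Memory needed if every map and reduce slot runs a task at its full allocation."""
    return (map_slots + reduce_slots) * heap_gb_per_task

# Figures from the answer above: (physical RAM in GB, map slots, reduce slots)
nodes = {"master": (6, 8, 8), "slave": (12, 4, 4)}

for name, (ram_gb, maps, reduces) in nodes.items():
    for heap_gb in (2, 1):
        demand = worst_case_demand_gb(maps, reduces, heap_gb)
        print(f"{name}: {heap_gb}G per task -> {demand}G needed, "
              f"{demand / ram_gb:.2f}x physical RAM")
```

With 1G per task the slave fits (8G against 12G), but the master would still need 16G against 6G, so on the master the number of slots has to come down as well.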

Related

Increasing assigned memory for a topology in Storm

I have a 10-node cluster in which each machine has 8 GB of RAM. When I run my topology, the assigned memory is always proportional to the number of workers, and each worker takes approximately 1 GB of memory. I want to allot 2 GB to each worker, so I tried setting worker.childopts: "-Xmx6g -Xms6g" in storm.yaml, since I am running three workers on each node. But the assigned memory decreased to below 1 GB.
How can I tune my topology better?
I am getting the following error in one of my bolts:
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.kafka.common.memory.MemoryPo
You can refer to the defaults in this file:
https://github.com/apache/storm/blob/master/conf/defaults.yaml
I configured two parameters:
worker.heap.memory.mb: 768
supervisor.memory.capacity.mb: 4096.0
supervisor.memory.capacity.mb is the memory capacity of the supervisor, so if you have 10 workers you have to set it to (768 x 10).
I think your configuration is slightly wrong. The worker.childopts setting is passed to each worker JVM, so when you set -Xmx6g -Xms6g you are giving each of your three workers 6 GB of heap (18 GB in total for the node), which is far more than the 8 GB each machine has. To give each worker 2 GB, the value would be -Xmx2g.
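As a rough check of the point about worker.childopts, the sketch below adds up the heap requested on one node. It counts heap only (no JVM overhead, no OS, no supervisor), and the function name is just for illustration.

```python
def node_heap_demand_gb(workers_per_node, xmx_gb):
    # worker.childopts is applied to every worker JVM, so the heaps add up per node
    return workers_per_node * xmx_gb

RAM_GB = 8   # per machine, from the question
WORKERS = 3  # workers per node, from the question

for xmx in (6, 2):
    demand = node_heap_demand_gb(WORKERS, xmx)
    verdict = "fits in" if demand <= RAM_GB else "exceeds"
    print(f"-Xmx{xmx}g x {WORKERS} workers = {demand} GB, {verdict} {RAM_GB} GB of RAM")
```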

YARN: maximum parallel Map task count

The following is mentioned in Hadoop: The Definitive Guide:
"What qualifies as a small job? By default one that has less than 10 mappers, only one reducer, and the input size is less than the size of one HDFS block. "
But how does it count the number of mappers in a job before executing it on YARN?
In MR1 the number of mappers depends on the number of input splits. Does the same apply to YARN as well?
In YARN, containers are flexible. So is there any way of computing the maximum number of map tasks that can run in parallel on a given cluster (some kind of tight upper bound, because it would give me a rough idea of how much data I can process in parallel)?
"But how does it count the number of mappers in a job before executing it on YARN? In MR1 the number of mappers depends on the number of input splits. Does the same apply to YARN as well?"
Yes. In YARN as well, if you are using MapReduce-based frameworks, the number of mappers depends on the input splits.
"In YARN, containers are flexible. So is there any way of computing the maximum number of map tasks that can run in parallel on a given cluster?"
The number of map tasks that can run in parallel on a YARN cluster depends on how many containers can be launched and run in parallel on the cluster. This ultimately depends on how you configure MapReduce in the cluster, which is clearly explained in the YARN tuning guide from Cloudera (see the references below).
mapreduce.job.maps = MIN(yarn.nodemanager.resource.memory-mb / mapreduce.map.memory.mb, yarn.nodemanager.resource.cpu-vcores / mapreduce.map.cpu.vcores, number of physical drives x workload factor) x number of worker nodes
mapreduce.job.reduces = MIN(yarn.nodemanager.resource.memory-mb / mapreduce.reduce.memory.mb, yarn.nodemanager.resource.cpu-vcores / mapreduce.reduce.cpu.vcores, number of physical drives x workload factor) x number of worker nodes
The workload factor can be set to 2.0 for most workloads. Consider a higher setting for CPU-bound workloads.
yarn.nodemanager.resource.memory-mb (memory available on a node for containers) = total system memory - reserved memory (e.g. 10-20% of memory for Linux and its daemon services) - resources for task buffers (such as the HDFS sort I/O buffer) - memory allocated to other services (HDFS DataNode, 1024 MB by default; NodeManager; RegionServer; etc.)
Hadoop is a disk I/O-centric platform by design. The number of independent physical drives (“spindles”) dedicated to DataNode use limits how much concurrent processing a node can sustain. As a result, the number of vcores allocated to the NodeManager should be the lesser of either:
[(total vcores) – (number of vcores reserved for non-YARN use)] or [ 2 x (number of physical disks used for DataNode storage)]
So:
yarn.nodemanager.resource.cpu-vcores = min{ ((total vcores) - (number of vcores reserved for non-YARN use)), (2 x (number of physical disks used for DataNode storage)) }
Available vcores on a node for containers = total number of vcores - vcores for the operating system - vcores for the YARN NodeManager (default is 1) - vcores for the HDFS DataNode (default is 1). To estimate vcore demand, take the number of concurrent processes or tasks each service runs as an initial guide; for the OS we take 2.
Note:
mapreduce.map.memory.mb is the sum of mapreduce.map.java.opts.max.heap and some headroom (a safety margin).
The settings for mapreduce.[map | reduce].java.opts.max.heap specify the default memory allotted for the mapper and reducer heap size, respectively.
The mapreduce.[map | reduce].memory.mb settings specify the memory allotted to their containers, and the value assigned should allow for overhead beyond the task heap size. Cloudera recommends applying a factor of 1.2 to the mapreduce.[map | reduce].java.opts.max.heap setting. The optimal value depends on the actual tasks. Cloudera also recommends setting mapreduce.map.memory.mb to 1-2 GB and setting mapreduce.reduce.memory.mb to twice the mapper value. The ApplicationMaster heap size is 1 GB by default, and can be increased if your jobs contain many concurrent tasks.
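To make the formulas above concrete, here is a minimal sketch that evaluates them for a made-up cluster. All the node figures (16 vcores, 6 DataNode disks, 24576 MB for containers, 10 worker nodes) are assumptions chosen for illustration, and the function names are not part of any Hadoop API.

```python
def nodemanager_vcores(total_vcores, reserved_vcores, datanode_disks):
    # yarn.nodemanager.resource.cpu-vcores = min(total - reserved, 2 x disks)
    return min(total_vcores - reserved_vcores, 2 * datanode_disks)

def tasks_per_node(node_memory_mb, task_memory_mb, node_vcores, task_vcores,
                   physical_drives, workload_factor=2.0):
    # The tightest of the memory, CPU and disk limits decides the task count
    by_memory = node_memory_mb // task_memory_mb
    by_cpu = node_vcores // task_vcores
    by_disk = int(physical_drives * workload_factor)
    return min(by_memory, by_cpu, by_disk)

def heap_mb(container_mb, overhead_factor=1.2):
    # The container size is roughly 1.2 x heap, so the heap is container / 1.2
    return int(container_mb / overhead_factor)

# Hypothetical worker node: 2 vcores for the OS, 1 NodeManager, 1 DataNode
vcores = nodemanager_vcores(total_vcores=16, reserved_vcores=4, datanode_disks=6)
maps = tasks_per_node(node_memory_mb=24576, task_memory_mb=2048,
                      node_vcores=vcores, task_vcores=1, physical_drives=6)
worker_nodes = 10
print("vcores for YARN per node:", vcores)
print("parallel map tasks per node:", maps)
print("parallel map tasks in the cluster:", maps * worker_nodes)
print("map heap for a 2048 MB container:", heap_mb(2048), "MB")
```

Whichever of the three limits is smallest is the one worth tuning first; in this made-up example all three happen to allow 12 tasks per node.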
References:
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_yarn_tuning.html
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html

Why does more memory for Hadoop map tasks make a MapReduce job slower?

Why does my job get slower when I set mapreduce.map/reduce.memory.mb and mapreduce.map/reduce.java.opts in mapred-site.xml to values larger than the defaults?
But if I set them too low, my tasks fail. So it seems that in this situation my Hadoop memory configuration is not really needed...
Can you give me an explanation?
What might be happening in your environment is this: when you raise mapreduce.map/reduce.memory.mb and mapreduce.map/reduce.java.opts towards their upper bound, you reduce the number of containers that can execute map/reduce tasks on each node, which in turn increases the overall job time.
If you have 2 nodes, each with 25 GB of free RAM, and you set mapreduce.map/reduce.memory.mb to 4 GB, you get about 6 containers on every node, 12 in total. So you can run 12 mapper/reducer tasks in parallel.
If you set mapreduce.map/reduce.memory.mb to 10 GB, you get only 2 containers on every node, 4 in total, to execute your mapper/reducer tasks in parallel. The mapper/reducer tasks then mostly run one after another for lack of free containers, which delays overall job completion.
You should choose an appropriate value for the configuration by considering the resources available and the amount of resources the map/reduce containers actually need in your environment. Hope this makes sense.
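The container counts in this example follow from an integer division, which can be checked with a short sketch (memory is the only limit considered here, and the function name is illustrative):

```python
def parallel_containers(nodes, free_ram_gb_per_node, container_gb):
    # Only whole containers fit on a node, hence the integer division
    per_node = free_ram_gb_per_node // container_gb
    return per_node, per_node * nodes

for size_gb in (4, 10):
    per_node, total = parallel_containers(nodes=2, free_ram_gb_per_node=25,
                                          container_gb=size_gb)
    print(f"{size_gb} GB containers: {per_node} per node, {total} in parallel")
```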
You can allocate memory for map/reduce containers based on two factors:
the available memory on each DataNode;
the total number of cores (vcores) you have.
Try to create a number of containers equal to the number of cores you have on each DataNode (including hyper-threading).
For example, if you have 10 physical cores (20 cores including hyper-threading), the total number of containers you can plan for is 19 (leaving 1 core for other processes).
Assume that you have X GB of RAM in each DataNode, then:
leave some memory (assume Y GB) for the heap of other processes such as the DataNode, NodeManager, RegionServer, etc.;
the memory available for YARN is now X - Y = Z;
Memory for a map container = Z / number of containers per node
Memory for a reduce container = Z / (2 * number of containers per node)
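A minimal sketch of this sizing for the 20-core example, covering the container count and the map container size only. The RAM figures (X = 64 GB, Y = 7 GB) are hypothetical and were picked so the division comes out even; the function name is made up.

```python
def plan_containers(logical_cores, ram_gb, reserved_gb, spare_cores=1):
    containers = logical_cores - spare_cores   # leave one core for other processes
    yarn_gb = ram_gb - reserved_gb             # memory left for YARN after daemons
    map_gb = yarn_gb / containers              # memory per map container
    return containers, yarn_gb, map_gb

containers, yarn_gb, map_gb = plan_containers(logical_cores=20, ram_gb=64, reserved_gb=7)
print(f"{containers} containers, {yarn_gb} GB for YARN, {map_gb:.1f} GB per map container")
```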

Yarn and MapReduce resource configuration

I currently have a pseudo-distributed Hadoop system running. The machine has 8 cores (16 virtual cores) and 32 GB of RAM.
My input files range from a few MB to ~68 MB (gzipped log files, which get uploaded to my server once they reach >60 MB, hence no fixed maximum size). I want to run some Hive jobs on about 500-600 of those files.
Because the input file sizes vary so much, I haven't changed the block size in Hadoop so far. As I understand it, the best case would be block size = input file size, but will Hadoop fill a block until it is full if the file is smaller than the block size? And how do the size and number of input files affect performance, as opposed to, say, one big ~40 GB file?
And what would the optimal configuration for this setup look like?
Based on this guide (http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/) I came up with this configuration:
32 GB Ram, with 2 GB reserved for the OS gives me 30720 MB that can be allocated to Yarn containers.
yarn.nodemanager.resource.memory-mb=30720
With 8 cores I thought a maximum of 10 containers should be safe, so each container gets 30720 / 10 = 3072 MB of RAM.
yarn.scheduler.minimum-allocation-mb=3072
For map task containers I doubled the minimum container size, which allows a maximum of 5 map tasks:
mapreduce.map.memory.mb=6144
And since I want a maximum of 3 reduce tasks, I allocate:
mapreduce.map.memory.mb=10240
With JVM heap size to fit into the containers:
mapreduce.map.java.opts=-Xmx5120m
mapreduce.reduce.java.opts=-Xmx9216m
Do you think this configuration would be good, or would you change anything, and why?
Yes, this configuration is good, but I would suggest a few changes.
For reducer memory, it should be
mapreduce.reduce.memory.mb=10240 (I think it's just a typo.)
One major addition I would suggest is the CPU configuration. You should set
Container Virtual CPU Cores=15
For the reducers, as you are running only 3 of them, you can give
Reduce Task Virtual CPU Cores=5
And for the mappers
Mapper Task Virtual CPU Cores=3
The number of containers that will run in parallel (for either mappers or reducers) = min(total RAM / mapreduce.(map OR reduce).memory.mb, total cores / (Map OR Reduce) Task Virtual CPU Cores).
Please refer to http://openharsh.blogspot.in/2015/05/yarn-configuration.html for a detailed understanding.
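Plugging the numbers from this thread into that min() formula gives the 5 mappers and 3 reducers the question aimed for. This is a sketch, taking "total cores" to mean the 15 container vcores suggested above; the function name is illustrative.

```python
def parallel_tasks(total_ram_mb, task_memory_mb, total_vcores, task_vcores):
    # Both memory and CPU must have room for a task, so the smaller quotient wins
    return min(total_ram_mb // task_memory_mb, total_vcores // task_vcores)

mappers = parallel_tasks(total_ram_mb=30720, task_memory_mb=6144,
                         total_vcores=15, task_vcores=3)
reducers = parallel_tasks(total_ram_mb=30720, task_memory_mb=10240,
                          total_vcores=15, task_vcores=5)
print("parallel mappers:", mappers)
print("parallel reducers:", reducers)
```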

How to decide on number of parallel mappers/reducers along with Heap memory?

Say I have an EMR job running on an 11-node cluster: one m1.small master node and 10 m1.xlarge slave nodes.
One m1.xlarge node has 15 GB of RAM.
How do I then decide on the number of parallel mappers and reducers to set?
My jobs are memory intensive and I would like to have as much heap as possible allotted to the JVM.
Another related question:
If we set the following parameters:
<property><name>mapred.child.java.opts</name><value>-Xmx4096m</value></property>
<property><name>mapred.job.reuse.jvm.num.tasks</name><value>1</value></property>
<property><name>mapred.tasktracker.map.tasks.maximum</name><value>2</value></property>
<property><name>mapred.tasktracker.reduce.tasks.maximum</name><value>2</value></property>
Will this 4 GB be shared by the 4 processes (2 mappers and 2 reducers), or will they each get 4 GB?
They will each get 4 GB.
You should check what your heap setting is for the task trackers and the data nodes; then you'll have an idea of how much memory you have left over to allocate to the children (the actual mappers / reducers).
Then it's just a balancing act. If you need more memory, you'll want fewer mappers / reducers, and vice versa.
Also keep in mind how many cores your CPU has; you don't want 100 map tasks on a single core. To tweak, it's best to monitor both heap usage and CPU utilization over time so you can fiddle with the knobs.
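To see the balancing act in numbers, the sketch below adds up the child heaps for the settings in the question and then works backwards from a memory budget. The 3072 MB set aside for the OS, TaskTracker and DataNode is an assumption for illustration, not a measured value, and the function names are made up.

```python
def child_heap_total_mb(map_slots, reduce_slots, child_xmx_mb):
    # Every child JVM gets its own -Xmx, so the heaps add up across slots
    return (map_slots + reduce_slots) * child_xmx_mb

def max_child_xmx_mb(node_ram_mb, daemons_and_os_mb, map_slots, reduce_slots):
    # Largest per-child heap that still fits next to the daemons and the OS
    return (node_ram_mb - daemons_and_os_mb) // (map_slots + reduce_slots)

NODE_RAM_MB = 15 * 1024  # m1.xlarge, from the question

demand = child_heap_total_mb(map_slots=2, reduce_slots=2, child_xmx_mb=4096)
print(f"child heaps: {demand} MB vs {NODE_RAM_MB} MB of RAM")

# Hypothetical reservation of 3072 MB for the OS, TaskTracker and DataNode
print("max -Xmx per child:", max_child_xmx_mb(NODE_RAM_MB, 3072, 2, 2), "MB")
```

With 2 map and 2 reduce slots at -Xmx4096m, the children alone can ask for 16384 MB, which is already more than the 15 GB on the node.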
