Hadoop - MapReduce runs extremely slow when using YARN


I know there is another question about this, but it has no answers yet, so I'm going to try asking it in more detail.
I am running a map-reduce job using Hadoop 2.2.0 on a 2-node cluster that I have set up on two Amazon EC2 instances; the master node and the slave node are both medium instances. The job runs extremely slowly, taking over 17 minutes, but when I run the exact same job on the same cluster without YARN it finishes in under 1 minute. Here is what my mapred-site.xml looks like:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
If I change the mapreduce.framework.name property to 'local', so that the file simply reads:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>local</value>
</property>
</configuration>
I can then run the same map-reduce job in less than a minute. However, I would like to use YARN so that I can track the map-reduce job through the webapp. When I run the job with the mapreduce.framework.name property set to yarn, the exact same job takes 17+ minutes. I cannot imagine that YARN would slow down a map-reduce job to such an extreme degree.
I am also using "top" to track my CPU usage. When I run the job with yarn, the CPU usage is split between the nodes; when I run it with "local", all of the processing is done on the master node. I'm not sure how this makes sense, because it seems to me that when the processing is split between the nodes, the job should run faster, not slower. Is there something I forgot to configure in Hadoop to make running on a cluster faster?
Here are the rest of my configuration files:
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://namenode:8020</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>file:/home/ubuntu/hadoop/hdfs/snn</value>
</property>
<property>
<name>fs.checkpoint.edits.dir</name>
<value>file:/home/ubuntu/hadoop/hdfs/snn</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/ubuntu/hadoop/hdfs/nn</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/ubuntu/hadoop/hdfs/nn</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>namenode:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>namenode:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>namenode:8030</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>namenode:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>namenode:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Is there something wrong with the way I set this up? Has anyone else run into this problem? Any help will be greatly appreciated. Thanks!
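One detail worth noting in the hdfs-site.xml above is that dfs.namenode.name.dir is defined twice. This is not necessarily related to the slowdown, but repeated entries like this are easy to miss by eye. The following is a minimal Python sketch, not part of the original question, that lists property names occurring more than once in a Hadoop *-site.xml file. The function name duplicate_properties and the SAMPLE string are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_properties(xml_text):
    """Return the property names that occur more than once in a *-site.xml."""
    root = ET.fromstring(xml_text)
    # Collect the text of every <name> element found inside a <property>.
    names = [p.findtext("name", default="").strip() for p in root.iter("property")]
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical sample, shortened from the hdfs-site.xml shown in the question.
SAMPLE = """<configuration>
<property><name>dfs.replication</name><value>2</value></property>
<property><name>dfs.namenode.name.dir</name><value>file:/home/ubuntu/hadoop/hdfs/nn</value></property>
<property><name>dfs.namenode.name.dir</name><value>file:/home/ubuntu/hadoop/hdfs/nn</value></property>
</configuration>"""

print(duplicate_properties(SAMPLE))
```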

I wish I still remembered where I read this so I could give you a reference, but you won't benefit from YARN unless you have a large cluster.

Related

No Disk Space allocated to HDFS filesystem

I am trying to set up Hadoop on my local machine. However, the wordcount MapReduce example fails when I run it (I ran hdfs namenode -format earlier).
This may be hard to read, but I end up with: "Job failed with state FAILED due to: Application failed 2 times due to AM Container exited with exitCode: -1000 Failing this attempt.Diagnostics: No space available in any of the local directories."
I don't understand why I get this error. This is what my applications and attempts look like:
I followed several tutorials and ended up with these parameters:
mapred-site.xml :
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml :
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.disk-health-checker.enable</name>
<value>false</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>
%HADOOP_HOME%\etc\hadoop,
%HADOOP_HOME%\share\hadoop\common\*,
%HADOOP_HOME%\share\hadoop\common\lib\*,
%HADOOP_HOME%\share\hadoop\hdfs\*,
%HADOOP_HOME%\share\hadoop\hdfs\lib\*,
%HADOOP_HOME%\share\hadoop\mapreduce\*,
%HADOOP_HOME%\share\hadoop\mapreduce\lib\*,
%HADOOP_HOME%\share\hadoop\yarn\*,
%HADOOP_HOME%\share\hadoop\yarn\lib\*
</value>
</property>
</configuration>
core-site.xml :
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml :
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///C:/hadoop-3.3.0/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///C:/hadoop-3.3.0/data/datanode</value>
</property>
</configuration>
Can you help me with this, please? I already tried what is mentioned in these questions:
Hadoop errorcode -1000, No space available in any of the local directories (except for the part about emptying the namenode cache).
Hadoop Windows setup. Error while running WordCountJob: "No space available in any of the local directories"
What do you think?
Thank you!

Odd Hadoop behavior, master performing all work?

I have setup a cluster using this guide: https://medium.com/#jootorres_11979/how-to-set-up-a-hadoop-3-2-1-multi-node-cluster-on-ubuntu-18-04-2-nodes-567ca44a3b12
Currently I have one datanode and one master node.
When I run a Hadoop job, the datanode's network activity shows that it is sending a lot of data, and the namenode receives that data. Also, the namenode's CPU is fully utilized while the datanode's CPU is not used at all. See the figure:
The nodes are VMs on the same machine. This happens for several different scripts, the figure is from running a WordCount algorithm.
Why is the work not being performed on the datanode? What could cause such a behavior?
Any help is appreciated.

According to the guide, mapred-site.xml was not changed. This means that the default values are used. The default for mapreduce.framework.name is "local". This means that all calculations will be performed locally. This must be changed to "yarn".
I created the following mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value> $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*
</value>
</property>
</configuration>
I also had to change yarn-site.xml to:
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
After restarting YARN and Hadoop, everything worked as expected. The work is now performed on the datanodes.
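To check which framework a given mapred-site.xml actually selects, a small script can help. The following is a minimal Python sketch, not part of the original answer. It assumes the fallback value "local" described above; the function name framework_name is hypothetical, and if the property is repeated the sketch simply returns the last occurrence in the file.

```python
import xml.etree.ElementTree as ET


def framework_name(mapred_site_xml, default="local"):
    """Return the value of mapreduce.framework.name, or the default if unset."""
    root = ET.fromstring(mapred_site_xml)
    value = default
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == "mapreduce.framework.name":
            # Keep overwriting so the last occurrence in the file is returned.
            value = prop.findtext("value", default="").strip()
    return value


# An empty file, as when the guide leaves mapred-site.xml unchanged.
UNCHANGED = "<configuration></configuration>"
# The relevant part of the file created in this answer.
FIXED = """<configuration>
<property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>"""

print(framework_name(UNCHANGED))
print(framework_name(FIXED))
```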

Hadoop Cluster - Zero Memory

I have two VMs set up for the Hadoop cluster, as below.
VM-MASTER, 4GB Memory
VM-SLAVE, 4GB Memory
I have the following config for yarn-site.xml. When I go to http://VM-MASTEr:8088/cluster, I see that Memory Total is 0 and VCores Total is 0.
Am I missing something here?
I think this problem is why the job I submitted always stays in the ACCEPTED state and never moves into the RUNNING state. I'm using Hadoop 2.8.0.
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>500</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>1</value>
</property>
</configuration>

MapReduce job hangs, waiting for AM container to be allocated

I tried to run a simple word count as a MapReduce job. Everything works fine when it runs locally (all work is done on the NameNode). But when I try to run it on a cluster using YARN (adding mapreduce.framework.name=yarn to mapred-site.xml), the job hangs.
I came across a similar problem here:
MapReduce jobs get stuck in Accepted state
Output from job:
*** START ***
15/12/25 17:52:50 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/12/25 17:52:51 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/12/25 17:52:51 INFO input.FileInputFormat: Total input paths to process : 5
15/12/25 17:52:52 INFO mapreduce.JobSubmitter: number of splits:5
15/12/25 17:52:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1451083949804_0001
15/12/25 17:52:53 INFO impl.YarnClientImpl: Submitted application application_1451083949804_0001
15/12/25 17:52:53 INFO mapreduce.Job: The url to track the job: http://hadoop-droplet:8088/proxy/application_1451083949804_0001/
15/12/25 17:52:53 INFO mapreduce.Job: Running job: job_1451083949804_0001
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.job.tracker</name>
<value>localhost:54311</value>
</property>
<!--
<property>
<name>mapreduce.job.tracker.reserved.physicalmemory.mb</name>
<value></value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1024</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>3000</value>
<source>mapred-site.xml</source>
</property> -->
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!--
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>3000</value>
<source>yarn-site.xml</source>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>500</value>
</property>
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>3000</value>
</property>
-->
</configuration>
I left the commented-out options in the files; they did not solve the problem.
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
What can be the problem?
EDIT:
I tried this configuration (with the commented-out options) on these machines: NameNode (8GB RAM) + 2x DataNode (4GB RAM). I get the same effect: the job hangs in the ACCEPTED state.
EDIT2:
Changed the configuration (thanks @Manjunath Ballur) to:
yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-droplet</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop-droplet:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop-droplet:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop-droplet:8030</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop-droplet:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop-droplet:8088</value>
</property>
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
$YARN_HOME/*,$YARN_HOME/lib/*
</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/data/1/yarn/local,/data/2/yarn/local,/data/3/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/data/1/yarn/logs,/data/2/yarn/logs,/data/3/yarn/logs</value>
</property>
<property>
<description>Where to aggregate logs</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/var/log/hadoop-yarn/apps</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>50</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>390</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>390</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>50</value>
</property>
<property>
<name>yarn.app.mapreduce.am.command-opts</name>
<value>-Xmx40m</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>50</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>50</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx40m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx40m</value>
</property>
</configuration>
Still not working.
Additional info: I can see no nodes on cluster preview (similar problem here: Slave nodes not in Yarn ResourceManager )

You should check the status of the NodeManagers in your cluster. If the NM nodes are short on disk space, the RM will mark them "unhealthy" and those NMs can't allocate new containers.
1) Check the unhealthy nodes: http://<active_RM>:8088/cluster/nodes/unhealthy
If the "health report" tab says "local-dirs are bad", you need to free up some disk space on these nodes.
2) Check the dfs.data.dir property in hdfs-site.xml. It points to the location on the local file system where HDFS data is stored.
3) Log in to those machines and use the df -h and hadoop fs -du -h commands to measure the space occupied.
4) Check the Hadoop trash and delete it if it's blocking you:
hadoop fs -du -h /user/user_name/.Trash and hadoop fs -rm -r /user/user_name/.Trash/*
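The local disk check in step 3 can also be scripted. The following is a minimal Python sketch, not part of the original answer, that computes the used percentage of the filesystem holding a directory and compares it with a threshold. The function names are hypothetical, the threshold is deliberately left as a parameter because the limit the NodeManager applies depends on your configuration, and this is not how YARN itself performs the check.

```python
import shutil


def percent_used(total_bytes, used_bytes):
    """Return disk utilization as a percentage."""
    return 100.0 * used_bytes / total_bytes


def over_threshold(path, threshold_percent):
    """Return True if the filesystem holding `path` is fuller than the threshold."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    return percent_used(usage.total, usage.used) > threshold_percent


if __name__ == "__main__":
    # Report utilization for the current directory; substitute a local-dirs path.
    usage = shutil.disk_usage(".")
    print(f"{percent_used(usage.total, usage.used):.1f}% used")
```

Running it against each directory listed in yarn.nodemanager.local-dirs gives a quick view of which disks are close to full.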

I think you are getting your memory settings wrong.
To understand how to tune the YARN configuration, I found this to be a very good source: http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_ig_yarn_tuning.html
I followed the instructions given in this blog and was able to get my jobs running. You should scale your settings in proportion to the physical memory you have on your nodes.
The key things to remember are:
The values of mapreduce.map.memory.mb and mapreduce.reduce.memory.mb should be at least the value of yarn.scheduler.minimum-allocation-mb.
The values of mapreduce.map.java.opts and mapreduce.reduce.java.opts should be around 0.8 times the value of the corresponding mapreduce.map.memory.mb and mapreduce.reduce.memory.mb settings. (In my case that is 983 MB ~ (0.8 * 1228 MB).)
Similarly, the value of yarn.app.mapreduce.am.command-opts should be around 0.8 times the value of yarn.app.mapreduce.am.resource.mb.
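The 0.8 rule above is simple arithmetic, and it can be sketched in Python as follows. This is only an illustration, not part of the original answer: the function names are hypothetical, and rounding up is an assumption chosen because it reproduces the 983 MB figure quoted above.

```python
def heap_opts(container_mb):
    """Return a -Xmx option of about 0.8 times the container size, rounded up."""
    # Integer ceiling of 0.8 * container_mb, avoiding floating-point rounding.
    heap_mb = (container_mb * 8 + 9) // 10
    return f"-Xmx{heap_mb}m"


def meets_minimum(container_mb, minimum_allocation_mb):
    """Check the first rule: container size is at least the scheduler minimum."""
    return container_mb >= minimum_allocation_mb


print(heap_opts(1228))            # heap option for a 1228 MB container
print(meets_minimum(1228, 1228))  # container size against the scheduler minimum
```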
Following are the settings I use and they work perfectly for me:
yarn-site.xml:
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1228</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>9830</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>9830</value>
</property>
mapred-site.xml
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>1228</value>
</property>
<property>
<name>yarn.app.mapreduce.am.command-opts</name>
<value>-Xmx983m</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1228</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>1228</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx983m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx983m</value>
</property>
You can also refer to the answer here: Yarn container understanding and tuning
You can add vCore settings, if you want your container allocation to take into account CPU also. But, for this to work, you need to use CapacityScheduler with DominantResourceCalculator. See the discussion about this here: How are containers created based on vcores and memory in MapReduce2?

This solved the error in my case:
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>100</value>
</property>

Check the hosts file on your master and slave nodes. I had exactly this problem. For example, my hosts file on the master node looked like this:
127.0.0.0 localhost
127.0.1.1 master-virtualbox
192.168.15.101 master
I changed it to the following:
192.168.15.101 master master-virtualbox localhost
After that, it worked.
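One possible reading of this fix, which the answer itself does not state, is that the machine's own hostname used to resolve to a loopback address, so other nodes could not reach services registered under that name. The following minimal Python sketch, under that assumption, lists non-localhost names that a hosts file maps to loopback addresses. The function name loopback_names is hypothetical.

```python
import ipaddress


def loopback_names(hosts_text):
    """Return hostnames (other than localhost) mapped to a loopback address."""
    names = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *hostnames = line.split()
        if ipaddress.ip_address(address).is_loopback:
            names.extend(h for h in hostnames if h != "localhost")
    return names


# The two hosts files quoted in this answer.
BEFORE = """127.0.0.0 localhost
127.0.1.1 master-virtualbox
192.168.15.101 master"""
AFTER = "192.168.15.101 master master-virtualbox localhost"

print(loopback_names(BEFORE))
print(loopback_names(AFTER))
```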

These lines
<property>
<name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
<value>100</value>
</property>
in the yarn-site.xml solved my problem since the node will be marked as unhealthy when disk usage is >=95%. Solution mainly suitable for pseudodistributed mode.

You have 512 MB of RAM on each instance, yet all the memory settings in your yarn-site.xml and mapred-site.xml range from 500 MB to 3 GB. You will not be able to run anything on the cluster. Change everything to ~256 MB.
Also, your mapred-site.xml sets the framework to yarn, yet it still specifies a job tracker address, which is not correct. On a multi-node cluster you need the ResourceManager-related parameters in yarn-site.xml (including the ResourceManager web address). Without them, the nodes do not know where the ResourceManager is.
You need to revisit both of your XML files.
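The mismatch described above can be checked mechanically. The following is a minimal Python sketch, not part of the original answer, that flags memory settings larger than a node's physical RAM. The function name oversized is hypothetical, the values are taken from the commented-out settings in the question, and a real check would also need to leave room for the operating system and the Hadoop daemons.

```python
def oversized(requests_mb, node_ram_mb):
    """Return the settings whose value exceeds the node's physical memory."""
    return {name: mb for name, mb in requests_mb.items() if mb > node_ram_mb}


# Values from the commented-out settings in the question's config files.
REQUESTS = {
    "mapreduce.map.memory.mb": 1024,
    "mapreduce.reduce.memory.mb": 2048,
    "yarn.app.mapreduce.am.resource.mb": 3000,
    "yarn.nodemanager.resource.memory-mb": 3000,
    "yarn.scheduler.minimum-allocation-mb": 500,
}

# Print every setting that cannot fit on a 512 MB instance.
for name, mb in sorted(oversized(REQUESTS, 512).items()):
    print(f"{name} = {mb} MB exceeds 512 MB")
```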

Anyway, that works for me. Thanks a lot, @KaP!
This is my yarn-site.xml:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>MacdeMacBook-Pro.local</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
This is my mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

The first thing to do is check the YARN ResourceManager logs. I searched the Internet about this problem for a very long time, but nobody told me how to find out what was really happening. Checking the ResourceManager logs is straightforward and simple, and I am puzzled that people ignore them.
For me, there was an error in the log:
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=172.16.0.167/172.16.0.167:55622]
That happened because I had switched Wi-Fi networks at my workplace, so my computer's IP address changed.
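To pull the underlying exceptions out of a long log, a few lines of Python are enough. The following is a minimal sketch, not part of the original answer. The function name root_causes is hypothetical, and the sample text reuses two log lines quoted elsewhere on this page purely as test input.

```python
def root_causes(log_text):
    """Return every 'Caused by:' line found in a Java-style log."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if line.lstrip().startswith("Caused by:")
    ]


# Sample input assembled from log lines quoted on this page.
SAMPLE_LOG = """15/12/25 17:52:50 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=172.16.0.167/172.16.0.167:55622]"""

for cause in root_causes(SAMPLE_LOG):
    print(cause)
```

To use it on a real log, read the ResourceManager log file into a string and pass it to root_causes; the location of that file depends on your installation.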

This is an old question, but I ran into the same issue recently, and in my case it was caused by manually setting the master to local in the code.
Search for conf.setMaster("local[*]") and remove it.
Hope it helps.

Hadoop 2.5.1 job stuck at map 0% and reduce 0%

I am trying to run a word count example. My current testing setup is:
NameNode and ResourceManager on one machine (10.38.41.134).
DataNode and NodeManager on another (10.38.41.135).
They can ssh between them without passwords.
When reading the logs, I don't get any warnings, except a security warning (I didn't set it up for testing) and a containermanager.AuxServices 'mapreduce_shuffle' warning. Upon submitting the example job, nodes react to it and output logs, which suggests that they can communicate well. NodeManager outputs memory usage, but the job doesn't budge.
Where should I even start looking for problems? Everything else I could find is either old or irrelevant. I followed the official cluster setup tutorial for version 2.5.1, which left far too many questions unanswered.
My conf files are as follows:
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://10.38.41.134:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.rpc-bind-host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>dfs.namenode.servicerpc-bind-host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
<value>NEVER</value>
<description>
</description>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs.
Can be one of local, classic or yarn.
</description>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>300</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>mapreduce.jobtracker.address</name>
<value>10.38.41.134:50030</value>
</property>
</configuration>
Everything else is default.

I suggest you first try to get it working with a single server cluster so it's easier to debug.
When that is working, continue with two nodes.
As already suggested, memory might be an issue. Without tweaking the settings, about 2GB seems to be the minimum, and I'd recommend at least 4GB per server. Also remember to check the job's logs (under logs/userlogs, especially syslog).
P.S. I share your frustration about old and irrelevant documentation.
