Configure Yarn with Hadoop 2.7.4 resources issue - hadoop

I have configured hadoop 2.7.4 by following this tutorial. DataNode, NameNode and SecondaryNameNode are working properly.
But when I run yarn, NodeManager goes down with the following message
org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved
SHUTDOWN signal from Resourcemanager ,Registration of NodeManager
failed, Message from ResourceManager: NodeManager from localhost
doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the
NodeManager.
My system has 8 cpu with 8 GB RAM. How to configure yarn with these resources? I have found a lot such as this but could not find any solution that solve my problem.

I had the same problem during a course. We were using Amazon virtual machines with 2 cores.
After various modifications in yarn-site.xml, we got our NodeManager running setting the following properties
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
In your case, you may need to establish 8 virtual cores.

Related

Hadoop-Installation-Multinode

Hi all I am trying to install the multinode hadoop installation. Everything works fine but my nodemanager for yarn is not working. When I looked at the log file for Yarn nodemanager, I got following information
"org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Initialized nodemanager for null: physical-memory=-1 virtual-memory=-2
virtual-cores=-1"
I have no idea why its not showing the actual memory and virtual core. My VM has 8GB memory and 8Vcpus. Because of above values I am getting this error:
"org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved
SHUTDOWN signal from Resourcemanager ,Registration of NodeManager
failed, Message from ResourceManager: NodeManager from SFeUbuntuVM2
doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the
NodeManager"
Can someone help me out with this issue?
Check if you have
Selinux disabled
firewall disabled
Check your configuration files.
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>{your host name}</value>
</property>
After all do format your namenode, and start all services again.

hadoop Protocol message tag had invalid wire type

I Set up hadoop 2.6 cluster using two nodes of 8 cores each on Ubuntu 12.04. sbin/start-dfs.sh and sbin/start-yarn.sh both succeed. And I can see the following after jps on the master node.
22437 DataNode
22988 ResourceManager
24668 Jps
22748 SecondaryNameNode
23244 NodeManager
The jps outcome on the slave node is
19693 DataNode
19966 NodeManager
I then run the PI example.
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 30 100
Which gives me there error-log
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "Master-R5-Node/xxx.ww.y.zz"; destination host is: "Master-R5-Node":54310;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
The problem seems with the HDFS file system since trying out the command bin/hdfs dfs -mkdir /user fails with the similar exception.
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "Master-R5-Node/xxx.ww.y.zz"; destination host is: "Master-R5-Node":54310;
where xxx.ww.y.zz is the ip-address of Master-R5-Node
I have checked and followed all the recommendations of ConnectionRefused on Apache and on this site.
Despite the week long effort, I cannot get it fixed.
Thanks.
There are so many reasons to what may lead to the problem I faced. But I finally ended up fixing it using some of the following things.
Make sure that you have the needed permission to the /hadoop and hdfs temporary files. (you have to figure out where that is for your paticular case)
remove the port number from fs.defaultFS in $HADOOP_CONF_DIR/core-site.xml. It should look like this:
`<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://my.master.ip.address/</value>
<description>NameNode URI</description>
</property>
</configuration>`
Add the following two properties to `$HADOOP_CONF_DIR/hdfs-site.xml
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
Voila! You should now be up and running!

How to optimize and tune hadoop cluster performance

I am not very familiar with hadoop cluster configs and I have recently integrated Apache Nutch with Apache Hadoop and I have crawled data indexed in Solr successfully.
I have my master-slave sources as below:
Master:
CPU : 4 cores
memory :12G
hard disk : 37G
Slave1 :
CPU : 2 cores
memory :4G
hard disk : 18G
Slave2:
CPU : 2 cores
memory :4G
hard disk : 16G
Slave3 :
CPU : 2 cores
memory :4G
hard disk : 16G
Slave4 :
CPU : 4 cores
memory :4G
hard disk : 50G
I have configed core-site.xml, mapred-site.xml, hdfs-site.xml, masters and slaves.
Here is my core-site.xml :
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/My Project Name/hadoop-datastore</value>
<description>store data</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>the name of default file system</description>
</property>
</configuration>
Here is my mapred-site.xml :
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>host and port</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>10</value>
<description></description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>20</value>
<description></description>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>8</value>
<description></description>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>8</value>
<description></description>
</property>
</configuration>
And here is my hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>default block</description>
</property>
</configuration>
And here is my conf/masters :
master
And finally my conf/slaves:
master
slave1
slave2
slave3
slave4
This story goes well: When I run master and run the Jps command, I have the folowings on master:
19031 TaskTracker
18644 DataNode
18764 SecondaryNameNode
18884 JobTracker
13226 Jps
18506 NameNode
And when I run the Jps command on all the slaves, I have the followings:
4969 DataNode
5057 TaskTracker
5592 Jps
When I look at Master Hadoop Map/Reduce administration I have the following Cluster Summary:
<h2>Cluster Summary (Heap Size is 114.5 MB/889 MB)</h2>
<table border="1" cellpadding="5" cellspacing="0">
<tr><th>Running Map Tasks</th><th>Running Reduce Tasks</th><th>Total Submissions</th><th>Nodes</th><th>Occupied Map Slots</th><th>Occupied Reduce Slots</th><th>Reserved Map Slots</th><th>Reserved Reduce Slots</th><th>Map Task Capacity</th><th>Reduce Task Capacity</th><th>Avg. Tasks/Node</th><th>Blacklisted Nodes</th><th>Graylisted Nodes</th><th>Excluded Nodes</th></tr>
<tr><td>8</td><td>8</td><td>1607</td><td>1</td><td>8</td><td>8</td><td>0</td><td>0</td><td>8</td><td>8</td><td>16.00</td><td>0</td><td>0</td><td>0</td></tr></table>
<br>
The problem is this procedure works fine with topN :1000 but There is load on master with high cpu and memory usage but when I find top on slaves, Neither cpu nor memory has loads. I mean both cpu and memory usage is low and cpu idle is high.
I wonder whether it is natural and OK or not. I am looking for some solutions and configs so that I am able to share the load on all slaves and make the procedure faster.
Any links, documentations and solutions are very much appreciated.
Your master node is running a lot of services :
TaskTracker DataNode SecondaryNameNode JobTracker NameNode
Typically in a decent sized cluster the Master would not have the datanode service.
Name Node & secondary Name node should be on different nodes. You can set secondary name node on one of your data nodes.
Similarly Task Tracker - Master typically does not have task Tracker. I.e. you do not run MR tasks on Master.
On the other hand for pure experimentation the setup you have done is ok & the CPU usage you are noticing is obvious.
I found an error about version 1.2.1 looking deeply at logs directory, saying this version is a 1.2.1 snapshot version. So I changed the server, installing simply version 1.2.1 and making all slaves and master similar in version. That fixed my problem. Now happily I have five nodes equal to the count of my machines.
And I really thank ... for his greate help

Unable to start nodemanager of Hadoop YARN at OS X 10.8

After starting all other nodes, when I try to start nodemanager, it seems it has been opened and then automatically terminated. Like the following:
Yitongs-MacBook-Pro:hadoop timyitong$ sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /Users/timyitong/Dev/hadoop/logs/yarn-timyitong-nodemanager-Yitongs-MacBook-Pro.local.out
Yitongs-MacBook-Pro:hadoop timyitong$ jps
8981 DataNode
9300 Jps
9139 JobHistoryServer
8932 NameNode
9038 ResourceManager
I don't get any error, any exception, but the nodemanger is not there. And when I try to stop it, it says like this (the stopnodes.sh is just a script), which confirms that the nodemanager is not there:
Yitongs-MacBook-Pro:hadoop timyitong$ sh stopnodes.sh
stopping namenode
stopping datanode
stopping resourcemanager
no nodemanager to stop
stopping historyserver
And I am not sure whether it is because nodemanager is not started, when I try to run the sample wordcount program, I always got my task pending forever.
My environment is OS X 10.8, Hadoop YARN 2.2.0.
And I already solved the java version issue with export JAVA_HOME=$(/usr/libexec/java_home -v 1.6).
Acctually I used bin/yarn nodemanger to start the server directly and found out the problem. It is in my yarn-site.xml where I should not set the name of yarn.nodemanager.aux-services containing dots (.) like mapreduce.shuffle. After change mapreduce.shuffle to mapreduce_shuffle, the problem is solved.
Really don't understand why it does not allow dots, since I config everything according to this blog post, where this setting seems to be fine.
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
The mapreduce.shuffle should be mapreduce_shuffle . Please observe _ (underscore instead of dot). Also have a look at http://www.thecloudavenue.com/2012/01/getting-started-with-nextgen-mapreduce.html

Installing Hadoop on NFS

As a start, I've installed Hadoop (0.15.2) and setup a cluster of 3 nodes: one each for NameNode, DataNode and the JobTracker. All the daemons are up and running. But when I issue any command I get the above error. For instance, when I do a copyFromLocal, I get the following error:
Am I missing something?
More details:
I am trying to install Hadoop on an NFS file system. I've installed 1.0.4 version and tried running it but to of no avail. The 1.0.4 version doesn't start the datanode. And the log files for the datanode are empty. Hence I switched back to 0.15 version which started all the daemons atleast.
I believe the problem is due to the underlying NFS file system i.e. all the datanodes and masters using the same files and folders. But I am not sure if that is actually the case.
But I don't see any reason why I shouldn't be able to run Hadoop on NFS (after appropriately setting the configuration parameters).
Currently I am trying and figuring out if I could set the name and data directories differently for different machines based on the individual machine names.
Configuration file: (hadoop-site.xml)
<property>
<name>fs.default.name</name>
<value>mumble-12.cs.wisc.edu:9001</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>mumble-13.cs.wisc.edu:9001</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.secondary.info.port</name>
<value>9002</value>
</property>
<property>
<name>dfs.info.port</name>
<value>9003</value>
</property>
<property>
<name>mapred.job.tracker.info.port</name>
<value>9004</value>
</property>
<property>
<name>tasktracker.http.port</name>
<value>9005</value>
</property>
Error using Hadoop 1.0.4 (DataNode doesn't get started):
2013-04-22 18:50:50,438 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001, call addBlock(/tmp/hadoop-akshar/mapred/system/jobtracker.info, DFSClient_502734479, null) from 128.105.112.13:37204: error: java.io.IOException: File /tmp/hadoop-akshar/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /tmp/hadoop-akshar/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
Error using Hadoop 0.15.2:
[akshar#mumble-12] (38)$ bin/hadoop fs -copyFromLocal lib/junit-3.8.1.LICENSE.txt input
13/04/17 03:22:11 WARN fs.DFSClient: Error while writing.
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:189)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.net.SocketInputStream.read(SocketInputStream.java:203)
at java.io.DataInputStream.readShort(DataInputStream.java:312)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1660)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1733)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:83)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:140)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:826)
at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:120)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:1360)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:1478)
13/04/17 03:22:12 WARN fs.DFSClient: Error while writing.
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:189)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.net.SocketInputStream.read(SocketInputStream.java:203)
at java.io.DataInputStream.readShort(DataInputStream.java:312)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1660)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1733)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:83)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:140)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:826)
at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:120)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:1360)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:1478)
13/04/17 03:22:12 WARN fs.DFSClient: Error while writing.
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:189)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.net.SocketInputStream.read(SocketInputStream.java:203)
at java.io.DataInputStream.readShort(DataInputStream.java:312)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1660)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1733)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:83)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:140)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:826)
at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:120)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:1360)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:1478)
copyFromLocal: Connection reset
I was able to get Hadoop to run over NFS using version 1.1.2. It might work for other versions, but I can't guarantee anything.
If you have an NFS file system then each node should have access to the filesystem. The fs.default.name tells Hadoop the filesystem URI to use, so it should be pointed to the local disk. I'll assume that your NFS directory is mounted to each node at /nfs.
In core-site.xml you should define:
<property>
<name>fs.default.name</name>
<value>file:///</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/nfs/tmp</value>
</property>
In mapred-site.xml you should define:
<property>
<name>mapred.job.tracker</name>
<value>node1:8021</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/tmp/mapred-local</value>
</property>
Since hadoop.tmp.dir is pointed to the nfs drive then the default locations of mapred.system.dir and mapreduce.jobtracker.staging.root.dir point to locations on the nfs drive. It might run with leaving the default value for mapred.local.dir, but it is supposed to point to the local filesystem so to be safe you can put that in /tmp.
You don't have to worry about hdfs-site.xml. This configuration file is used when you start the namenode, but with everything being distributed on the nfs drive you shouldn't run HDFS.
Now you can run start-mapred.sh on the jobtracker node and run a hadoop job. Don't run start-all.sh or start-dfs.sh because those will start HDFS. If you run multiple DataNodes that point to the same NFS directory, then one DataNode will lock that directory and the others will shutdown because they are unable to obtain a lock.
I tested the configuration with:
bin/hadoop jar hadoop-examples-1.1.2.jar wordcount /nfs/data/test.text /nfs/out
Note that you need to specify full paths to the input and output locations.
I also tried:
bin/hadoop jar hadoop-examples-1.1.2.jar grep /nfs/data/loremIpsum.txt /nfs/out2 lorem
It gave me the same output as when I run it in Standalone, so I assume it is performing correctly.
Here is more information on fs.default.name:
http://www.greenplum.com/blog/dive-in/usage-and-quirks-of-fs-default-name-in-hadoop-filesystem

Resources