Hadoop MR2 Job statistics - hadoop

I have Hadoop version 2.6.0 installed on my machine.
hduser@vagrant:/usr/local/hadoop$ hadoop version
Hadoop 2.6.0
I also started the HDFS daemons using bash sbin/start-dfs.sh and can see the DataNode, NameNode and SecondaryNameNode running.
hduser@vagrant:/usr/local/hadoop$ jps
2627 DataNode
2503 NameNode
3634 Jps
2825 SecondaryNameNode
I'm also able to submit a job and see its output without any issues.
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 5
Question:
1. I don't see the YARN daemons (NodeManager and ResourceManager) running, but the jobs still complete. Where did the MR job run, and where can I see the status of the job and the number of jobs run?
Here are my netstat results:
hduser@vagrant:/usr/local/hadoop$ netstat -tulpn|grep java
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 2503/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 2627/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 2627/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 2627/java
tcp 0 0 127.0.0.1:54310 0.0.0.0:* LISTEN 2503/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 2825/java

You still have to configure and launch the YARN services (the start-yarn.sh script) and configure your MapReduce jobs to use them:
etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Currently your jobs are being launched in "local" mode (the job runs inside the JVM you launched with "hadoop jar"), not in "yarn" mode. That works for debugging, but since only one JVM is involved, you are not doing parallel/distributed computing in "local" mode.
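To see which mode a given configuration selects, one can read mapred-site.xml and fall back to "local" when the property is missing, since "local" is the default value of mapreduce.framework.name. The sketch below is only an illustration: the function name and the two sample documents are made up for the example, and a real file would be loaded from etc/hadoop/mapred-site.xml instead.

```python
import xml.etree.ElementTree as ET


def framework_name(mapred_site_xml: str) -> str:
    """Return mapreduce.framework.name, or "local" if the property is absent."""
    root = ET.fromstring(mapred_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "mapreduce.framework.name":
            return prop.findtext("value", "").strip()
    # Nothing configured: Hadoop falls back to its default, "local".
    return "local"


# Hypothetical sample documents, standing in for etc/hadoop/mapred-site.xml.
EMPTY = "<configuration></configuration>"
WITH_YARN = """<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>"""

print(framework_name(EMPTY))
print(framework_name(WITH_YARN))
```

With an empty configuration the function reports "local", which matches the situation in the question: the job completes even though no ResourceManager or NodeManager is running.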

Related

could only be written to 0 of the 1 minReplication nodes. There are 0 datanode(s) running and 0 node(s) are excluded in this operation

I set up Hadoop on two machines, and on the master node, when I tried to put a file using:
hadoop fs -put test.txt /mydata/
I got the following error:
put: File /mydata/test.txt._COPYING_ could only be written to 0 of the 1 minReplication nodes. There are 0 datanode(s) running and 0 node(s) are excluded in this operation.
When I ran hdfs dfsadmin -report, it gave me the following information:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: 0.00%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Then, when I tried to access HDFS from the datanode with hadoop fs -ls /, it gave me the following:
INFO ipc.Client: Retrying connect to server: master/172.31.81.91:10001. Already tried 0 time(s); maxRetries=45
I set up the cluster on two AWS Ubuntu instances and opened all TCP/IPv4 ports. The configuration is as follows.
On both instances:
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://172.31.81.91:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/namenode </value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/datanode </value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
</configuration>
/etc/hosts
127.0.0.1 localhost
172.31.81.91 master
172.31.45.232 slave-1
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
workers
172.31.45.232
When I run jps I get:
master:
12532 NameNode
12847 SecondaryNameNode
13599 Jps
datanode:
5172 Jps
4810 DataNode
When I run sudo netstat -ntlp I get:
master:
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:9870 0.0.0.0:* LISTEN 12532/java
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 696/systemd-resolve
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1106/sshd
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 809/cupsd
tcp 0 0 172.31.81.91:9000 0.0.0.0:* LISTEN 12532/java
tcp 0 0 0.0.0.0:9868 0.0.0.0:* LISTEN 12847/java
tcp6 0 0 :::80 :::* LISTEN 1176/apache2
tcp6 0 0 :::22 :::* LISTEN 1106/sshd
tcp6 0 0 ::1:631 :::* LISTEN 809/cupsd
datanode:
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:9864 0.0.0.0:* LISTEN 4810/java
tcp 0 0 0.0.0.0:9866 0.0.0.0:* LISTEN 4810/java
tcp 0 0 0.0.0.0:9867 0.0.0.0:* LISTEN 4810/java
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 691/systemd-resolve
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1142/sshd
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 854/cupsd
tcp 0 0 127.0.0.1:45029 0.0.0.0:* LISTEN 4810/java
tcp6 0 0 :::22 :::* LISTEN 1142/sshd
tcp6 0 0 ::1:631 :::* LISTEN 854/cupsd
I am using Hadoop 3.1.3. Any help would be appreciated! Thanks!
A quick answer from myself, after many tries:
If using AWS, all IPs should be public IPs.
In core-site.xml, use the public DNS name instead of the IP.
Delete the DataNode data directory after formatting. Not sure why, but this really solved my problem.
Thanks for the help!
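One plausible explanation for the last step (an assumption; the poster did not confirm it) is a clusterID mismatch. Formatting the NameNode generates a new clusterID, and a DataNode whose storage directory still carries the old one refuses to register, which would leave the NameNode with 0 datanodes. Both daemons record the ID in a current/VERSION file under their storage directory. The sketch below compares two such files; the function names and the sample IDs are invented for illustration.

```python
from typing import Optional


def read_cluster_id(version_text: str) -> Optional[str]:
    """Extract the clusterID value from the text of a VERSION file."""
    for line in version_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and anything that is not key=value
        key, _, value = line.partition("=")
        if key.strip() == "clusterID":
            return value.strip()
    return None


def cluster_ids_match(namenode_version: str, datanode_version: str) -> bool:
    """True only if both files carry the same, non-missing clusterID."""
    nn_id = read_cluster_id(namenode_version)
    dn_id = read_cluster_id(datanode_version)
    return nn_id is not None and nn_id == dn_id


# Made-up contents standing in for <name dir>/current/VERSION
# and <data dir>/current/VERSION.
NAMENODE_VERSION = "#example\nclusterID=CID-new-after-format\nstorageType=NAME_NODE\n"
DATANODE_VERSION = "#example\nclusterID=CID-old-before-format\nstorageType=DATA_NODE\n"

print(cluster_ids_match(NAMENODE_VERSION, DATANODE_VERSION))
```

If the IDs differ, clearing the DataNode's data directory (as the poster did) lets the DataNode adopt the new ID on its next start, at the cost of discarding any blocks stored there.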

hadoop slave cannot connect to master:8031

I'm a total Hadoop newbie. I set up Hadoop on two machines, master and slave, following this tutorial (I obtained the same error following this other tutorial).
Problem: After starting dfs and yarn, the only node appearing on localhost:50070 is the master, even though the right processes are running on the master (NameNode, DataNode, SecondaryNameNode, ResourceManager) and on the slave (DataNode).
The nodemanager log of the slave reports: INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.0.14:8031. Already tried 10 times.
Note that:
I have already edited yarn-site.xml following this thread.
I have disabled the firewall on the master
I ran netstat -anp | grep 8031 on the master and confirmed that there are a couple of processes listening on port 8031 using tcp.
I ran into the same problem, and these steps may help.
Make sure your namenodes can communicate with your datanodes.
hadoop@namenode-01:~$ jps
8678 NameNode
9530 WebAppProxyServer
9115 ResourceManager
8940 SecondaryNameNode
9581 Jps
hadoop@datanode-01:~$ jps
8592 NodeManager
8715 Jps
8415 DataNode
Check what your nodemanager-datanode-01.log in $HADOOP_HOME/logs reports.
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode-01/192.168.4.2:8031. Already tried 8 time(s);
Check whether your yarn.resourcemanager.resource-tracker.address is listening on IPv6 instead of IPv4.
hadoop@namenode-01:~$ netstat -natp
tcp6 0 0 127.0.2.1:8088 :::* LISTEN 9115/java
tcp6 0 0 127.0.2.1:8030 :::* LISTEN 9115/java
tcp6 0 0 127.0.2.1:8031 :::* LISTEN 9115/java
tcp6 0 0 127.0.2.1:8032 :::* LISTEN 9115/java
tcp6 0 0 127.0.2.1:8033 :::* LISTEN 9115/java
If your YARN addresses are listening on IPv6, you may want to disable IPv6 first.
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1
Finally, the default YARN resource-tracker address looks like this:
yarn.resourcemanager.resource-tracker.address ${yarn.resourcemanager.hostname}:8031
Check your /etc/hosts to avoid misconfigurations.
hadoop@namenode-01:~$ cat /etc/hosts
127.0.2.1 namenode-01 namenode-01 (for example, this line maps the hostname to 127.0.2.1; delete this line)
192.168.4.2 namenode-01
192.168.4.3 datanode-01
192.168.4.4 datanode-02
192.168.4.5 datanode-03
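The /etc/hosts check above can be automated. The sketch below flags any cluster hostname that is mapped to a loopback address, which is the situation where the ResourceManager ends up bound to 127.0.2.1:8031 and the slaves cannot reach it. The function name is hypothetical, and the sample text simply mirrors the hosts file shown above.

```python
import ipaddress


def loopback_aliases(hosts_text: str, cluster_names: set) -> list:
    """Return (address, name) pairs where a cluster hostname maps to loopback."""
    hits = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not a valid address line
        if not addr.is_loopback:
            continue
        for name in fields[1:]:
            pair = (str(addr), name)
            if name in cluster_names and pair not in hits:
                hits.append(pair)
    return hits


HOSTS = """127.0.0.1 localhost
127.0.2.1 namenode-01 namenode-01
192.168.4.2 namenode-01
192.168.4.3 datanode-01
"""

print(loopback_aliases(HOSTS, {"namenode-01", "datanode-01"}))
# [('127.0.2.1', 'namenode-01')]
```

Any pair it reports is a line worth deleting or correcting before restarting the daemons.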

Why hdfs namenode and datanode both always listen on 0.0.0.0 at a random port?

I installed a fresh hadoop-2.2.0 today and found that after HDFS is started (using /sbin/start-dfs.sh), the namenode and the datanode each always listen on 0.0.0.0 at a random port. I cannot find a related setting in http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml.
The port is not 50070, 50470, 50090, 50010, 50020, 50075, 50475, etc., which are listed in hdfs-default.xml; it is just a random port.
8369 Jps
8109 DataNode
7936 NameNode
The namenode listens on the following:
tcp 0 0 0.0.0.0:46628 0.0.0.0:* LISTEN 7936/java <==
tcp 0 0 10.173.130.119:9000 0.0.0.0:* LISTEN 7936/java
tcp 0 0 10.173.130.119:50070 0.0.0.0:* LISTEN 7936/java
The datanode listens on the following:
tcp 0 0 10.173.130.119:50020 0.0.0.0:* LISTEN 8109/java
tcp 0 0 0.0.0.0:35114 0.0.0.0:* LISTEN 8109/java <==
tcp 0 0 10.173.130.119:50010 0.0.0.0:* LISTEN 8109/java
tcp 0 0 10.173.130.119:50075 0.0.0.0:* LISTEN 8109/java
Thanks for any advice.
Yes, it assigns a random port every time a namenode or datanode is restarted. But notice that all the namenode listeners run under the same process ID (7936 in this case) and all the datanode listeners run under the same process ID (8109). So internally it is the same process.
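The observation that every listener belongs to one of two processes can be checked by grouping the netstat lines by PID. The sketch below assumes the column layout of netstat -tlpn as shown in the question (local address in the fourth column, PID/program right after LISTEN); the function name and the CONFIGURED set are made up for the example.

```python
from collections import defaultdict


def ports_by_pid(netstat_text: str) -> dict:
    """Map each PID to the sorted list of TCP ports it listens on."""
    grouped = defaultdict(set)
    for line in netstat_text.splitlines():
        fields = line.split()
        if "LISTEN" not in fields:
            continue
        owner_index = fields.index("LISTEN") + 1
        if owner_index >= len(fields):
            continue
        pid_text = fields[owner_index].split("/", 1)[0]
        if not pid_text.isdigit():
            continue  # netstat prints "-" when it cannot identify the owner
        port = int(fields[3].rsplit(":", 1)[1])
        grouped[int(pid_text)].add(port)
    return {pid: sorted(ports) for pid, ports in grouped.items()}


SAMPLE = """tcp 0 0 0.0.0.0:46628 0.0.0.0:* LISTEN 7936/java <==
tcp 0 0 10.173.130.119:9000 0.0.0.0:* LISTEN 7936/java
tcp 0 0 10.173.130.119:50070 0.0.0.0:* LISTEN 7936/java
tcp 0 0 10.173.130.119:50020 0.0.0.0:* LISTEN 8109/java
tcp 0 0 0.0.0.0:35114 0.0.0.0:* LISTEN 8109/java <==
tcp 0 0 10.173.130.119:50010 0.0.0.0:* LISTEN 8109/java
tcp 0 0 10.173.130.119:50075 0.0.0.0:* LISTEN 8109/java
"""

# Ports the asker recognises from the configuration.
CONFIGURED = {9000, 50070, 50010, 50020, 50075}

for pid, ports in ports_by_pid(SAMPLE).items():
    print(pid, "unexplained:", sorted(set(ports) - CONFIGURED))
```

On the sample this leaves exactly one unexplained port per process (46628 for 7936 and 35114 for 8109), which are the two lines marked with <== in the question.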

What ports does Apache Hadoop version 1.0.3 use for intracluster communication of the daemons

I know port 22 is only used for control scripts.
But I need to know which ports I should open for my 3-node cluster: 2 slaves and 1 namenode/jobtracker.
On what port do the daemons run? On what ports are the URLs displayed?
The hadoop distro is: Apache Hadoop version 1.0.3
By URL I assume you mean the JobTracker and TaskTracker interfaces. The breakdown is as follows:
Namenode 50070
Datanodes 50075
Secondarynamenode 50090
JobTracker 50030
TaskTracker 50060
Here are the default port numbers in Hadoop.
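As a small illustration of how the list above might be used, the sketch below turns it into web UI URLs and offers a basic TCP reachability probe for checking firewall rules. The port numbers are the ones listed in the answer; the function names and the host name "master" are assumptions made for the example.

```python
import socket

# Default web UI ports for Hadoop 1.x, as listed in the answer above.
DEFAULT_WEB_PORTS = {
    "NameNode": 50070,
    "DataNode": 50075,
    "SecondaryNameNode": 50090,
    "JobTracker": 50030,
    "TaskTracker": 50060,
}


def web_ui_urls(host: str, daemons: list) -> dict:
    """Build the web UI URL for each named daemon on the given host."""
    return {d: f"http://{host}:{DEFAULT_WEB_PORTS[d]}" for d in daemons}


def is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# "master" is a hypothetical host name for the namenode/jobtracker machine.
print(web_ui_urls("master", ["NameNode", "JobTracker"]))
```

Calling is_open("master", 50070) from a slave is a quick way to tell a closed firewall port from a daemon that is not running, provided the host name resolves.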

Hadoop dfs error : INFO ipc.Client: Retrying connect to server: localhost

I have successfully set up a Hadoop cluster with 6 nodes (master, slave<1-5>)
Formatted the namenode -> done
Starting up and shutting down cluster -> works fine
Executing "hadoop dfs -ls /" gives this error -> Error: INFO ipc.Client: Retrying connect to server: localhost
I tried to see the services running using:
sudo netstat -plten | grep java
hduser#ubuntu:~$ sudo netstat -plten | grep java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1000 93307 11384/java
tcp 0 0 0.0.0.0:44440 0.0.0.0:* LISTEN 1000 92491 11571/java
tcp 0 0 0.0.0.0:40633 0.0.0.0:* LISTEN 1000 92909 11758/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 1000 93449 11571/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 1000 93673 11571/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 1000 93692 11571/java
tcp 0 0 127.0.0.1:40485 0.0.0.0:* LISTEN 1000 93666 12039/java
tcp 0 0 0.0.0.0:44582 0.0.0.0:* LISTEN 1000 93013 11852/java
tcp 0 0 10.42.43.1:54310 0.0.0.0:* LISTEN 1000 92471 11384/java
tcp 0 0 10.42.43.1:54311 0.0.0.0:* LISTEN 1000 93290 11852/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 1000 93460 11758/java
tcp 0 0 0.0.0.0:34154 0.0.0.0:* LISTEN 1000 92179 11384/java
tcp 0 0 0.0.0.0:50060 0.0.0.0:* LISTEN 1000 94200 12039/java
tcp 0 0 0.0.0.0:50030 0.0.0.0:* LISTEN 1000 93550 11852/java
It's the master IP that is bound to ports 54310 and 54311, not localhost (loopback).
The core-site.xml has been properly configured:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
</property>
</configuration>
Why is it expecting localhost to be bound to 54310 rather than the master, which I have configured here? Help appreciated. How do I resolve this?
Cheers
Apparently, someone had added the older Hadoop (1.0.3) bin directory to the PATH variable before I added the new Hadoop (1.0.4) bin directory. As a result, whenever I ran "hadoop" from the CLI, it executed the binaries of the older Hadoop rather than the new one.
Solution:
Remove the entire bin path of the older Hadoop from PATH
Shut down the cluster and exit the terminal
Log in to a new terminal session
Start up the node
Tried hadoop dfs -ls / -> works fine! Good lesson learnt.
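A PATH mix-up like this one can be spotted by listing every executable named hadoop in PATH order: the first hit is the one the shell runs, and any later hits are shadowed. A minimal sketch, with a made-up function name:

```python
import os


def find_all_on_path(name: str, path: str) -> list:
    """Return every executable called `name` on `path`, in lookup order."""
    found = []
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            found.append(candidate)
    return found


matches = find_all_on_path("hadoop", os.environ.get("PATH", ""))
for index, location in enumerate(matches):
    # The first match wins; the rest are never reached by the shell.
    print("used    " if index == 0 else "shadowed", location)
```

If more than one line is printed, the "used" entry shows which installation actually runs when you type hadoop.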
Looks like many people ran into this problem.
There might be no need to change /etc/hosts; just make sure the master and slave can reach each other, and that your core-site.xml files are identical and point to the right master node and port number.
Then run $HADOOP/bin/stop-all.sh and $HADOOP/bin/start-all.sh on the master node ONLY (running them on a slave might lead to problems). Use jps to check that all services are there, as follows.
On master node:
4353 DataNode
4640 JobTracker
4498 SecondaryNameNode
4788 TaskTracker
4989 Jps
4216 NameNode
On slave node:
3143 Jps
2827 DataNode
2960 TaskTracker
In addition, check your firewall rules between the namenode and the datanode.
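The jps check described above can be scripted by comparing the daemon names in the output with the set expected on each node. In the sketch below the expected sets are taken from the listings in this answer; the function name is hypothetical.

```python
def missing_daemons(jps_output: str, expected: set) -> set:
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # jps prints "<pid> <name>" per JVM; ignore anything else.
        if len(parts) == 2 and parts[0].isdigit():
            running.add(parts[1])
    return expected - running


MASTER_EXPECTED = {"NameNode", "SecondaryNameNode", "JobTracker", "DataNode", "TaskTracker"}
SLAVE_EXPECTED = {"DataNode", "TaskTracker"}

SLAVE_JPS = """3143 Jps
2827 DataNode
2960 TaskTracker"""

print(missing_daemons(SLAVE_JPS, SLAVE_EXPECTED))
print(missing_daemons(SLAVE_JPS, MASTER_EXPECTED))
```

An empty result means every expected daemon is up; any name it returns points to the log file worth reading first.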

Resources