Hadoop slave cannot connect to master:8031

I'm a total Hadoop newbie. I set up Hadoop on two machines, master and slave, following this tutorial (I got the same error following this other tutorial).
Problem: After starting dfs and yarn, the only node appearing on localhost:50070 is the master, even though the right processes are running on the master (NameNode, DataNode, SecondaryNameNode, ResourceManager) and on the slave (DataNode).
The NodeManager log on the slave reports: INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.0.14:8031. Already tried 10 times.
Note that:
I have already edited yarn-site.xml following this thread.
I have disabled the firewall on the master
I ran netstat -anp | grep 8031 on the master and confirmed that there are a couple of processes listening on port 8031 using tcp.

I suffered from the same problem, and these steps may help.
Make sure your namenodes can communicate with your datanodes.
hadoop@namenode-01:~$ jps
8678 NameNode
9530 WebAppProxyServer
9115 ResourceManager
8940 SecondaryNameNode
9581 Jps
hadoop@datanode-01:~$ jps
8592 NodeManager
8715 Jps
8415 DataNode
Check what your nodemanager-datanode-01.log in $HADOOP_HOME/logs reports.
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode-01/192.168.4.2:8031. Already tried 8 time(s);
Check whether your yarn.resourcemanager.resource-tracker.address listens on IPv6 instead of IPv4.
hadoop@namenode-01:~$ netstat -natp
tcp6 0 0 127.0.2.1:8088 :::* LISTEN 9115/java
tcp6 0 0 127.0.2.1:8030 :::* LISTEN 9115/java
tcp6 0 0 127.0.2.1:8031 :::* LISTEN 9115/java
tcp6 0 0 127.0.2.1:8032 :::* LISTEN 9115/java
tcp6 0 0 127.0.2.1:8033 :::* LISTEN 9115/java
If your YARN addresses are listening on IPv6, you may need to disable IPv6 first.
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1
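To keep the change after a reboot, here is a rough sketch (assuming a Debian/Ubuntu-style sysctl setup; the preferIPv4Stack flag is a common alternative not mentioned in the original post):
echo "net.ipv6.conf.all.disable_ipv6 = 1" | sudo tee -a /etc/sysctl.conf
echo "net.ipv6.conf.default.disable_ipv6 = 1" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
# alternatively, in $HADOOP_HOME/etc/hadoop/hadoop-env.sh, make the JVM prefer IPv4:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"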
Finally, the default yarn addresses are like this:
yarn.resourcemanager.resource-tracker.address ${yarn.resourcemanager.hostname}:8031
Check your /etc/hosts to avoid misconfigurations (a yarn-site.xml sketch tying the addresses above to the master's hostname follows this example).
hadoop@namenode-01:~$ cat /etc/hosts
127.0.2.1 namenode-01 namenode-01 (for example, this line maps the hostname to 127.0.2.1; delete this line)
192.168.4.2 namenode-01
192.168.4.3 datanode-01
192.168.4.4 datanode-02
192.168.4.5 datanode-03
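As a sketch (hostnames taken from the example above), yarn-site.xml on every node can pin the ResourceManager addresses to the master's hostname, so slaves resolve 192.168.4.2 rather than a loopback address; restart YARN afterwards:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>namenode-01</value>
</property>
<!-- with the hostname set, the tracker address defaults to namenode-01:8031 -->
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>namenode-01:8031</value>
</property>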

Related

Datanode and Namenode run but are not reflected in UI

I have hit a small setback configuring my master and slave in Hadoop: the namenode and datanode on both master and slave are up and running.
However, the Live Nodes count in the web UI does not reflect them, even though the nodes are running.
I have already tried disabling the firewall and formatting the nodes, but I am unable to resolve this.
Any help would be highly appreciated!
Here are the snippets:
Master:
jps command output :
5088 Jps
4446 NameNode
4681 SecondaryNameNode
Slave :
jps command output:
2478 Jps
2410 DataNode
ubuntu@hadoop-master:/usr/local/hadoop/etc/hadoop$ $HADOOP_HOME/bin/hdfs dfsadmin -refreshNodes
16/04/28 02:22:37 WARN ipc.Client: Address change detected. Old: hadoop-master/52.200.230.29:50077 New: hadoop-master/127.0.0.1:50077
refreshNodes: Call From hadoop-master/127.0.0.1 to hadoop-master:50077 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Log file of hadoop-slave-1:
2016-04-28 21:23:07,248 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-master/52.200.230.29:9000
2016-04-28 21:23:12,257 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-master/52.200.230.29:9000
2016-04-28 21:23:17,265 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-master/52.200.230.29:9000
2016-04-28 21:23:22,273 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-master/52.200.230.29:9000
2016-04-28 21:23:27,282 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-master/52.200.230.29:9000
2016-04-28 21:23:32,291 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-master/52.200.230.29:9000
Log File of Hadoop-master:
2016-04-28 21:21:04,002 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 127.0.0.1
2016-04-28 21:21:04,002 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
2016-04-28 21:21:04,002 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 407
2016-04-28 21:21:04,002 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 22
2016-04-28 21:21:04,003 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 23
2016-04-28 21:21:04,003 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /usr/local/hadoop/hadoop_data/hdfs/namenode/current/edits_inprogress_0000000000000000407 -> /usr/local/hadoop/hadoop_data/hdfs/namenode/current/edits_0000000000000000407-0000000000000000408
2016-04-28 21:21:04,004 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 409
netstat -pant command on my master:
ubuntu@hadoop-master:/usr/local/hadoop/etc/hadoop$ netstat -pant
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 21491/java
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:50077 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:50078 0.0.0.0:* LISTEN 21491/java
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 21726/java
tcp 0 0 172.31.63.189:50070 128.235.8.68:57225 ESTABLISHED 21491/java
tcp 0 0 127.0.0.1:41471 127.0.0.1:50078 TIME_WAIT -
tcp 0 124 172.31.63.189:22 128.235.8.68:56950 ESTABLISHED -
tcp 0 0 172.31.63.189:50070 128.235.8.68:57224 ESTABLISHED 21491/java
tcp 0 0 172.31.63.189:50070 128.235.8.68:57223 ESTABLISHED 21491/java
tcp 0 0 172.31.63.189:22 128.235.8.68:57084 ESTABLISHED -
tcp 0 0 172.31.63.189:22 58.218.204.215:39402 ESTABLISHED -
tcp 0 0 172.31.63.189:50070 128.235.8.68:57227 ESTABLISHED 21491/java
tcp 0 0 172.31.63.189:50070 128.235.8.68:57228 ESTABLISHED 21491/java
tcp 0 0 172.31.63.189:50070 128.235.8.68:57226 ESTABLISHED 21491/java
tcp6 0 0 :::22 :::* LISTEN -
tcp6 0 0 :::50077 :::* LISTEN -
tcp6 0 0 :::9000 :::* LISTEN -
Connection refused
I can see this error in your post. I suggest you check three things:
make sure port 50077 is being listened on by a process, and that the process is your Hadoop process
make sure the port is accessible using a tool like telnet
besides the firewall, SELinux can also block access, so shut it down, restart your services, and try again (a rough sketch of these checks follows)
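A rough sketch of those checks (ports taken from the post above; the firewall/SELinux tooling depends on your distribution):
# from the slave, check the ports are reachable on the master
nc -zv hadoop-master 50077
nc -zv hadoop-master 9000        # the port the datanode log keeps retrying
# on the master, confirm which process owns the port
sudo netstat -pant | grep 50077
# temporarily relax the firewall and SELinux, then restart the daemons and retry
sudo ufw disable                 # Ubuntu firewall
sudo setenforce 0                # only if SELinux is installed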

Hadoop MR2 Job statistics

I have Hadoop version 2.6.0 installed on my machine.
hduser@vagrant:/usr/local/hadoop$ hadoop version
Hadoop 2.6.0
Also, I started the Hadoop cluster using bash sbin/start-dfs.sh and can see the DataNode, NameNode, and SecondaryNameNode running.
hduser@vagrant:/usr/local/hadoop$ jps
2627 DataNode
2503 NameNode
3634 Jps
2825 SecondaryNameNode
I'm also able to submit a job and see the output without any issues.
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 5
Question:
1. I don't see YARN (the nodemanager and resourcemanager) running, but the jobs still complete. Where did the MR job run, and where can I see the status of the job and the number of jobs run?
Here are my netstat results:
hduser@vagrant:/usr/local/hadoop$ netstat -tulpn|grep java
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 2503/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 2627/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 2627/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 2627/java
tcp 0 0 127.0.0.1:54310 0.0.0.0:* LISTEN 2503/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 2825/java
You still have to configure and launch YARN services (start-yarn.sh script) and configure your mapreduce jobs to use it:
etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Currently your jobs are being launched in "local" mode (the job runs inside the JVM you've launched with "hadoop jar"), not in "yarn" mode. That works for debugging, but since there is only one JVM involved, you are not doing parallel/distributed computing in "local" mode.
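A minimal sketch of what that usually involves on a single machine (standard property names; the shuffle-service value below is an assumption about a default 2.6.0 setup, adjust as needed):
etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
Then start YARN and rerun the example; job status and history show up in the ResourceManager web UI (port 8088 by default):
sbin/start-yarn.sh
jps    # should now also list ResourceManager and NodeManager
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 5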

Why hdfs namenode and datanode both always listen on 0.0.0.0 at a random port?

I installed a new hadoop-2.2.0 today and found that after HDFS is started (using /sbin/start-dfs.sh), the namenode and datanode both listen on 0.0.0.0 at a random port. I cannot find a related configuration option in http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml.
The port is not 50070, 50470, 50090, 50010, 50020, 50075, 50475, etc., which are listed in hdfs-default.xml; it is just a random port.
8369 Jps
8109 DataNode
7936 NameNode
Namenode listens on the followings:
tcp 0 0 0.0.0.0:46628 0.0.0.0:* LISTEN 7936/java <==
tcp 0 0 10.173.130.119:9000 0.0.0.0:* LISTEN 7936/java
tcp 0 0 10.173.130.119:50070 0.0.0.0:* LISTEN 7936/java
Datanode listens on the followings:
tcp 0 0 10.173.130.119:50020 0.0.0.0:* LISTEN 8109/java
tcp 0 0 0.0.0.0:35114 0.0.0.0:* LISTEN 8109/java <==
tcp 0 0 10.173.130.119:50010 0.0.0.0:* LISTEN 8109/java
tcp 0 0 10.173.130.119:50075 0.0.0.0:* LISTEN 8109/java
Thanks for any advice.
Yes, it assigns a random port every time a namenode or datanode is restarted. But if you observe, all the namenode listeners run under the same process id (7936 in this case) and the datanode listeners run under the same process id, i.e. 8109. So internally it is the same process.
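As a quick sanity check (a sketch reusing the pids from the question; they will differ on your machine), you can confirm the random port belongs to the same daemon:
sudo netstat -natp | grep 7936    # every NameNode listener, including the random port
sudo netstat -natp | grep 8109    # every DataNode listener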

HMaster fails to run in pseudo-distributed mode

I have installed Hadoop 1.2.0 on Ubuntu. All the services (namenode, secondary namenode, datanode, jobtracker, tasktracker) are running well.
I then installed hbase-0.94.8 and I hope the configuration is okay as well. But HMaster fails to start on port 9000; it actually starts and then drops down.
I have SSH to localhost enabled and it is working.
My /etc/hosts entries:
127.0.0.1 localhost prakashl
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
kandabap@prakashl:/usr/lib/hbase/hbase-0.94.8/conf$ jps
2735 HQuorumPeer
3017 HRegionServer
2270 TaskTracker
3715 Jps
2100 JobTracker
1845 DataNode
2009 SecondaryNameNode
1688 NameNode
hbase-master.log
2014-05-14 09:28:37,015 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Master=localhost,60000,1400023716583
2014-05-14 09:28:38,108 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
2014-05-14 09:28:39,109 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
2014-05-14 09:28:40,109 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
2014-05-14 09:28:41,110 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
2014-05-14 09:28:42,111 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
2014-05-14 09:28:43,111 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
2014-05-14 09:28:44,112 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
2014-05-14 09:28:45,112 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
2014-05-14 09:28:46,113 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
2014-05-14 09:28:47,113 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried
I copied the jar files from the Hadoop folders to the hbase/lib folder to address any incompatibilities.
kandabap@prakashl:/usr/lib/hbase/hbase-0.94.8/logs$ netstat -ntla
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:48575 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:58304 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:47238 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:42987 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50060 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50030 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:60020 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:8020 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:8021 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:35255 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN
Change the value of hbase.rootdir from hdfs://localhost:9000/hbase to hdfs://localhost:8020/hbase in hbase-site.xml, and restart HBase.
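For reference, a sketch of the relevant hbase-site.xml entry; the value must match the address the NameNode actually listens on (8020 here, per the netstat output above):
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>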

Hadoop dfs error: INFO ipc.Client: Retrying connect to server: localhost

I have successfully set up a Hadoop cluster with 6 nodes (master, slave<1-5>)
Formatted the namenode -> done
Starting up and shutting down cluster -> works fine
Executing "hadoop dfs -ls /" gives this error -> Error: INFO ipc.Client: Retrying connect to server: localhost
I tried to see the services running using:
sudo netstat -plten | grep java
hduser@ubuntu:~$ sudo netstat -plten | grep java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1000 93307 11384/java
tcp 0 0 0.0.0.0:44440 0.0.0.0:* LISTEN 1000 92491 11571/java
tcp 0 0 0.0.0.0:40633 0.0.0.0:* LISTEN 1000 92909 11758/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 1000 93449 11571/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 1000 93673 11571/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 1000 93692 11571/java
tcp 0 0 127.0.0.1:40485 0.0.0.0:* LISTEN 1000 93666 12039/java
tcp 0 0 0.0.0.0:44582 0.0.0.0:* LISTEN 1000 93013 11852/java
tcp 0 0 10.42.43.1:54310 0.0.0.0:* LISTEN 1000 92471 11384/java
tcp 0 0 10.42.43.1:54311 0.0.0.0:* LISTEN 1000 93290 11852/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 1000 93460 11758/java
tcp 0 0 0.0.0.0:34154 0.0.0.0:* LISTEN 1000 92179 11384/java
tcp 0 0 0.0.0.0:50060 0.0.0.0:* LISTEN 1000 94200 12039/java
tcp 0 0 0.0.0.0:50030 0.0.0.0:* LISTEN 1000 93550 11852/java
It's the master IP that is bound to ports 54310 and 54311, not localhost (the loopback).
The core-site.xml has been properly configured:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
</property>
</configuration>
Why is it expecting localhost to be bound to 54310 rather than the master which I have configured here? How do I resolve this? Help appreciated.
Cheers
Apparently, someone had added the older Hadoop (1.0.3) bin directory to the PATH variable before I added the new Hadoop (1.0.4) bin directory, so whenever I ran "hadoop" from the CLI, it executed the binaries of the older Hadoop rather than the new one.
Solution:
Remove the entire bin path of older hadoop
Shutdown cluster - Exit terminal
Login in new terminal session
Startup node
Tried hadoop dfs -ls / -> works fine! Good lesson learnt.
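For anyone hitting the same thing, here is a sketch of the PATH cleanup in ~/.bashrc (install paths are illustrative; adjust to your layout):
# remove any earlier line exporting the old install, e.g.
# export PATH=$PATH:/usr/local/hadoop-1.0.3/bin
export HADOOP_HOME=/usr/local/hadoop-1.0.4
export PATH=$PATH:$HADOOP_HOME/bin
# open a new shell and verify which binary is picked up
which hadoop
hadoop version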
Looks like many people have run into this problem.
There might be no need to change /etc/hosts; just make sure you can reach the master and slave from each other, and that your core-site.xml files are the same and point to the right master node and port number.
Then run $HADOOP/bin/stop-all.sh and $HADOOP/bin/start-all.sh on the master node ONLY (running them on a slave might lead to problems). Use jps to check whether all services are there, as follows.
On master node:
4353 DataNode
4640 JobTracker
4498 SecondaryNameNode
4788 TaskTracker
4989 Jps
4216 NameNode
On slave node:
3143 Jps
2827 DataNode
2960 TaskTracker
In addition, check your firewall rules between the namenode and datanodes; a rough sketch follows.
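A sketch of those firewall checks on both machines (assuming iptables/ufw; the ports are the ones from the core-site.xml above):
sudo iptables -L -n      # list the current rules
sudo ufw status          # on Ubuntu
# temporarily open the HDFS/JobTracker ports used above
sudo ufw allow 54310/tcp
sudo ufw allow 54311/tcp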
