Hadoop Multi-Node Cluster: exception: java.net.ConnectException: Connection refused

I have set up a 4-node Hadoop cluster following http://pingax.com/install-apache-hadoop-ubuntu-cluster-setup/:
Namenode: node04
Datanode: node01
Datanode: node02
Datanode: node03
I can see only two nodes (node01, node03) running in my cluster. Node02's log contains the following error messages:
2015-12-11 10:15:18,698 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node04/127.17.0.224:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-12-11 10:15:19,699 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node04/127.17.0.224:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-12-11 10:15:20,699 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node04/127.17.0.224:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
Every node's /etc/hosts contains the following:
127.0.0.1 localhost
127.17.0.221 node01
127.17.0.222 node02
127.17.0.223 node03
127.17.0.224 node04
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
/etc/hadoop/masters contains node04, and /etc/hadoop/slaves contains node01, node02 and node03.
Would you please help me understand how to fix this?
Thanks!

Perform these actions:
Go to node02 and run the telnet node04 9000 and ping node04 commands to confirm there is connectivity between node02 and node04.
On all nodes, check whether core-site.xml and hdfs-site.xml have the same contents.
Check ssh and sshd, and verify the ssh connection between the nodes.
Check the port binding details (the Hadoop DataNodes cannot find the NameNode).
Also refer to https://wiki.apache.org/hadoop/ServerNotAvailable
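A minimal sketch of those checks, using the hostnames from the question (core-site.xml and hdfs-site.xml are assumed to sit under /etc/hadoop alongside the masters and slaves files, and the choice of tools such as netstat and md5sum is an assumption about what is installed):
# On node02: basic reachability and the NameNode RPC port
ping -c 3 node04
telnet node04 9000          # "Connection refused" means nothing is listening there, or a firewall blocks it
# On node04: which address is the NameNode bound to on port 9000?
sudo netstat -tlnp | grep 9000
# On every node: a quick way to spot differing configs
md5sum /etc/hadoop/core-site.xml /etc/hadoop/hdfs-site.xml
# Passwordless ssh between the nodes
ssh node04 'hostname'
Note that addresses in 127.0.0.0/8 (such as the 127.17.0.x entries in the hosts file above) are loopback addresses, so node02 connecting to 127.17.0.224 would be talking to itself; if those entries were meant to be 172.17.0.x, the /etc/hosts files need fixing first.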

Related

Apache Hadoop multi-node cluster not showing remote Datanode

I'm having a hard time setting up a multi-node cluster. I have a Razer running Ubuntu 20.04 and an iMac running OSX Catalina. The Razer is the NameNode host, and both the Razer and the iMac are set up as DataNodes (slave workers). Both computers have SSH keys exchanged so they can connect over SSH without a password. However, I'm having problems getting the remote DataNode on the iMac to show as Live on my Hadoop dashboard; I can see the DataNode on the Razer as live. I think it has something to do with my remote iMac not being able to connect to HDFS, which I set in core-site.xml as hdfs://hadoopmaster:9000.
RAZER = Hostname: Hadoopmaster
IMAC = Hostname: Hadoopslave
Based on some troubleshooting, I reviewed the DataNode logs on the iMac and saw that it is being refused when connecting to hadoopmaster on port 9000:
2020-06-01 13:44:33,193 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopmaster/192.168.1.191:8070. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:35,550 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopmaster/192.168.1.191:8070. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:36,574 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopmaster/192.168.1.191:8070. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:37,597 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopmaster/192.168.1.191:8070. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:37,619 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoopmaster/192.168.1.191:8070
2020-06-01 13:44:44,660 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopmaster/192.168.1.191:8070. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:45,534 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
2020-06-01 13:44:45,537 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
Here are my settings:
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopmaster:8070</value>
</property>
</configuration>
So I think there are issues with connecting to port 9000 on my machine. My next step was testing the SSH connections from my terminal:
iMac command: ssh username@hadoopmaster -p 9000
Results:
Refused to connect
Then I ran the same SSH command on my Razer machine:
Razer command: ssh hadoopmaster -p 9000
Results:
Refused to connect
On my Razer I also modified the UFW firewall to open port 9000 (allow from any to hadoopmaster, all ports), and still no luck.
Please help me get my remote iMac to connect to port 9000 on the Razer so I can create the Hadoop cluster on my network and view the remote slave machines as live DataNodes on the dashboard.
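For reference, a rough sketch of checks one might run here (hadoopmaster and the iMac are from the question; the port to test should match whatever fs.default.name says, which is 8070 in the core-site.xml above even though the prose mentions 9000). ssh -p only tests whether an SSH server answers on that port, so a plain TCP probe such as nc is a more direct test of the NameNode port:
# On hadoopmaster: is the NameNode listening, and on which address/port?
sudo ss -tlnp | grep -E '8070|9000'
# If the address shown is 127.0.0.1, remote DataNodes cannot reach it;
# the NameNode must bind to the machine's LAN address or 0.0.0.0.
# Allow the configured HDFS RPC port through ufw
sudo ufw allow 8070/tcp
# On the iMac: probe the port directly (nc ships with macOS)
nc -vz hadoopmaster 8070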

Why does the slave node suddenly lose its connection to the master node in Hadoop?

I have set up Hadoop 2.7.2 on a cluster with a master (Ubuntu 15.10) and two slaves (slave2, slave3) hosted on the master in VirtualBox.
I have run several examples like wordcount and they all work fine. But when I try to run my own job, say MyJob, it runs well at first, but after a while it is invariably interrupted by this error:
INFO ipc.Client: Retrying connect to server: slave3/xxx.216.227.176(the ip of slave):38046. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
Sometimes it is slave2, sometimes slave3. And my ssh connection to that slave shows that the connection was closed by the remote host.
But VirtualBox shows that the slave is running fine, and I can ssh to it again; however, all the Hadoop processes have been killed.
I should mention that my own job runs longer than the example jobs.
At first I thought it might be an error caused by my config files, so I reinstalled Hadoop on the master and slaves. But the error remains.
Then I thought it might be caused by the network config on the slave nodes, so I changed the last field of the slave's IP, e.g. from xxx.xxx.xxx.183 to xxx.xxx.xxx.176, and reinstalled Hadoop.
I reran the job; this time it ran longer than usual, but in the end, when the map stage had mostly finished (map 86%, reduce 28%), it failed with the same error:
INFO ipc.Client: Retrying connect to server: slave3/125.xxx.227.xxx(the ip of slave):38046. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
There is also this in yarn-user-resourcemanager-Master.log:
java.net.ConnectException: Call From Master/xxx.216.227.186 to slave2:44592 failed on connection exception: java.net.ConnectException: refuse to connect; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
It seems that the longer the app runs, the greater the chance it will fail.
Here is my hosts file:
127.0.0.1 localhost
#127.0.1.1 Master
xxx.216.227.186 Master
xxx.216.227.185 slave1# the slave1 has some problem thus do not connect to the cluster
xxx.216.227.176 slave2
xxx.216.227.166 slave3
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Why? How to fix it? Thanks!
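Since the slave's Hadoop processes disappear while the VM itself stays up, a few things one might check on the affected slave right after a failure (a sketch assuming a standard Ubuntu guest and the default $HADOOP_HOME/logs layout; adjust paths to your install):
# Are any Hadoop daemons still alive on the slave?
jps
# Did the kernel's OOM killer terminate anything inside the guest?
dmesg | grep -i 'killed process'
# Look at the DataNode and NodeManager logs around the time the processes died
tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log
tail -n 100 $HADOOP_HOME/logs/yarn-*-nodemanager-*.log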

Datanode is not showing up when running the jps command

I am a newbie in Hadoop. I have set up a multi-node cluster, but when I run the jps command on the master node it shows only the NameNode, not the DataNode, and when I open the URL 'Master:50070' it shows no live nodes. Because of this I am unable to copy data from my local system into HDFS; it throws this error:
hduser#oodles-Latitude-3540:~$ hadoop fs -copyFromLocal /home/oodles/input/test /tmp
15/06/28 16:27:56 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/test._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
After starting the Hadoop cluster with start-dfs.sh, my NameNode started successfully but the DataNode didn't. When I check the DataNode log it shows this:
ToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-06-28 04:01:53,496 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Master/192.168.0.126:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-06-28 04:01:54,498 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Master/192.168.0.126:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-06-28 04:01:55,499 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Master/192.168.0.126:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-06-28 04:01:56,500 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Master/192.168.0.126:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
I googled but did not find a solution for this.
When I run the jps command on the slave node, it shows only the DataNode.
One more thing: when I open 'Master:50070' in the browser and browse the file system,
it shows me this error:
HTTP ERROR 500
Problem accessing /nn_browsedfscontent.jsp. Reason:
Can't browse the DFS since there are no live nodes available to redirect to.
Caused by:
java.io.IOException: Can't browse the DFS since there are no live nodes available to redirect to.
at org.apache.hadoop.hdfs.server.namenode.NamenodeJspHelper.redirectToRandomDataNode(NamenodeJspHelper.java:666)
at org.apache.hadoop.hdfs.server.namenode.nn_005fbrowsedfscontent_jsp._jspService(nn_005fbrowsedfscontent_jsp.java:70)
My Hadoop cluster configuration is like this:
1) /etc/hosts file on master
2) /etc/hosts file on slave
I have edited the masters and slaves files in the Hadoop configuration folder: in the masters file I added Master, and in the slaves file I added Slave1.
Can anybody help me solve these problems?
The DataNode logs are shown in two pictures.
Did you configure ssh? Try using ssh to log in to the other node to check the ssh connection.
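As a rough sketch of that check (Master, Slave1, port 9000 and the hduser account are taken from the question; everything else is an assumption):
# From the master, confirm passwordless ssh to the slave
ssh hduser@Slave1 'hostname'
# From the slave, confirm the NameNode address from the log is reachable
ping -c 3 Master
telnet Master 9000          # "Connection refused" means the NameNode is not listening on that address
# On the master, check which address the NameNode bound port 9000 to
sudo netstat -tlnp | grep 9000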

Unable to add a datanode to Hadoop

I got all my settings right and I am able to run Hadoop (1.1.2) on a single node. However, after making the changes to the relevant files (/etc/hosts, *-site.xml), I am not able to add a DataNode to the cluster, and I keep getting the following error on the slave.
Does anybody know how to rectify this?
2013-05-13 15:36:10,135 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-05-13 15:36:11,137 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-05-13 15:36:12,140 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Check the value of fs.default.name in your core-site.xml conf file (on each node in your cluster). This needs to be the network name of the NameNode, and I suspect you currently have it as hdfs://localhost:54310.
Failing that, check for any mention of localhost in your Hadoop configuration files on all nodes in your cluster:
grep localhost $HADOOP_HOME/conf/*.xml
Try replacing localhost with the NameNode's IP address or network name.
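A minimal sketch of that replacement (the hostname master is an assumption; use your NameNode's actual network name and keep the port your cluster already uses):
# List the config files that still point at localhost
grep -l localhost $HADOOP_HOME/conf/*.xml
# Replace it with the NameNode's network name, then restart the daemons
sed -i 's/localhost/master/g' $HADOOP_HOME/conf/core-site.xml
$HADOOP_HOME/bin/stop-all.sh && $HADOOP_HOME/bin/start-all.sh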

Failed to read data from a hadoop URL

I am following Tom White's 'Hadoop: The Definitive Guide'.
When I try to use the Java interface to read data from a Hadoop URL, I get the following error message:
hadoop#ubuntu:/usr/local/hadoop$ hadoop URLCat hdfs://master/hdfs/data/SampleText.txt
12/11/21 13:46:32 INFO ipc.Client: Retrying connect to server: master/192.168.9.55:8020. Already tried 0 time(s).
12/11/21 13:46:33 INFO ipc.Client: Retrying connect to server: master/192.168.9.55:8020. Already tried 1 time(s).
12/11/21 13:46:34 INFO ipc.Client: Retrying connect to server: master/192.168.9.55:8020. Already tried 2 time(s).
12/11/21 13:46:35 INFO ipc.Client: Retrying connect to server: master/192.168.9.55:8020. Already tried 3 time(s).
12/11/21 13:46:36 INFO ipc.Client: Retrying connect to server: master/192.168.9.55:8020. Already tried 4 time(s).
12/11/21 13:46:37 INFO ipc.Client: Retrying connect to server: master/192.168.9.55:8020. Already tried 5 time(s).
The contents of the URLCat file are as follows:
import java.net.URL;
import java.io.InputStream;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;

public class URLCat {
    static {
        // Register Hadoop's handler so java.net.URL understands hdfs:// URLs
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            // Open the HDFS URL given on the command line and copy it to stdout
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
The /etc/hosts file contents are:
127.0.0.1 localhost
127.0.1.1 ubuntu.ubuntu-domain ubuntu
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
# /ect/hosts Master and slaves
192.168.9.55 master
192.168.9.56 slave1
192.168.9.57 slave2
192.168.9.58 slave3
First I'd check whether the Hadoop daemons are running. A convenient tool is jps. Make sure that (at least) the NameNode and the DataNodes are running.
If you still can't connect, check whether the URL is correct. Since you provided hdfs://master/ (without a port number), Hadoop assumes that your NameNode listens on port 8020 (the default). That is what you see in the logs.
A quick look at fs.default.name in core-site.xml will tell you whether you have a custom port defined for the filesystem URI (in this case 54310).
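A minimal sketch of those checks (the conf path and the 54310 port are assumptions; use whatever fs.default.name actually contains):
# Are the daemons up? Expect to see at least NameNode and DataNode.
jps
# Which filesystem URI (and port) is configured?
grep -A1 fs.default.name $HADOOP_HOME/conf/core-site.xml
# Retry the read using the host and port taken from that value
hadoop URLCat hdfs://master:54310/hdfs/data/SampleText.txt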
