Hadoop master host fails to connect to localhost: connection refused

I've set up HDFS with two nodes on different hosts in the same network, and I'm using the HDFS C++ API. The name node and data nodes start normally, but when I try to read any data or open a file, I get the following error:
Call From master/192.168.X.X to localhost:54310 failed on connection exception: connection refused
So I guess it's related to SSH.
On master box, the following commands work (/etc/hosts file contains master and slave):
ssh master
ssh slave
ssh localhost
ssh user@localhost
ssh localhost -p 22
But when I try ssh localhost -p 54310, it fails with a 'connection refused' error, even though ps -ef | grep :54310 shows that the name node listens on that port.
Any ideas how to fix that?
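For reference, a plain TCP probe is a more direct check than ssh for whether anything is listening on the HDFS port; this is just a sketch and assumes nc and netstat are installed:
# from the master: is anything listening on 54310 locally?
nc -zv localhost 54310
# from the slave: is the master's 54310 reachable over the network?
nc -zv master 54310
# which address is the port actually bound to (127.0.0.1 vs 192.168.x.x)?
sudo netstat -tlnp | grep 54310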
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.datanode.max.locked.memory</name>
<value>0</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
slaves
master
slave
masters
master
EDIT: results from netstat -an
tcp 0 0 127.0.0.1:54310 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:54310 127.0.0.1:45156 ESTABLISHED
tcp 0 0 127.0.0.1:45156 127.0.0.1:54310 ESTABLISHED
tcp 0 0 127.0.0.1:54310 127.0.0.1:45140 TIME_WAIT
tcp 0 0 127.0.0.1:54310 127.0.0.1:45134 TIME_WAIT
I've also replaced master with localhost on the master host, which solved the problem on the master. Now the only error I'm getting is on the slave, which fails to connect to the master:
2018-01-21 23:53:18,597 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.0.237:54310. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 23:53:19,599 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.0.237:54310. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 23:53:19,609 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: master/192.168.0.237:54310
2018-01-21 23:53:25,613 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.0.237:54310. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 23:53:26,615 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.0.237:54310. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
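The netstat output above shows the RPC port bound to 127.0.0.1:54310, which is only reachable from the master itself. A sketch of the usual fix (assuming the master's LAN address is 192.168.0.237 as in the log; the slave's address is a placeholder): map the hostnames to their LAN IPs in /etc/hosts on both boxes, keep fs.default.name pointing at the master hostname, then restart and re-check the bind address.
# /etc/hosts on both master and slave (sketch; slave IP is a placeholder)
127.0.0.1       localhost
192.168.0.237   master
192.168.0.X     slave
# after restarting HDFS, the name node should listen on the LAN address:
sudo netstat -tlnp | grep 54310
# expected: tcp ... 192.168.0.237:54310 ... LISTEN ... java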

Related

Apache Hadoop multi-node cluster: not showing remote Datanode

I'm having a hard time setting up a multi-node cluster. I have a Razer running Ubuntu 20.04 and an IMAC running OSX Catalina. The Razer is the host namenode, and both the Razer and the IMAC are set up as datanodes (slave workers). Both computers have SSH keys replicated so they can connect over SSH without a password. However, I'm having problems getting the remote datanode on the IMAC to show as Live on my Hadoop dashboard; I can only see the datanode live from the Razer. I think it has something to do with my remote IMAC machine not being able to connect to HDFS, which I set in core-site.xml as hdfs://hadoopmaster:9000.
RAZER = Hostname: Hadoopmaster
IMAC = Hostname: Hadoopslave
Based on some troubleshooting, I reviewed the datanode logs in the IMAC and saw that it is refusing to connect to hadoopmaster on port 9000.
2020-06-01 13:44:33,193 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopmaster/192.168.1.191:8070. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:35,550 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopmaster/192.168.1.191:8070. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:36,574 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopmaster/192.168.1.191:8070. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:37,597 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopmaster/192.168.1.191:8070. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:37,619 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoopmaster/192.168.1.191:8070
2020-06-01 13:44:44,660 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopmaster/192.168.1.191:8070. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:45,534 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
2020-06-01 13:44:45,537 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
Here are my settings:
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopmaster:8070</value>
</property>
</configuration>
So I think there's an issue with connecting to port 9000 on my machine. My next step was testing the SSH connections from my terminal:
IMAC Command: ssh username@hadoopmaster -p 9000
Results:
Refused to connect
So my next step was performing the SSH command on my Razer machine:
Razer Command: ssh hadoopmaster -p 9000
Results:
Refused to connect
So on my Razer I tried modifying the UFW firewall to open port 9000 (any to hadoopmaster, all ports), and still no luck.
Please help me get my remote IMAC machine to connect to port 9000 on the Razer so I can create the Hadoop cluster on my network and see the remote slave machines as live datanodes on the dashboard.
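One way to narrow this down (just a sketch; the hostnames are the ones above, and the port to test is whichever one core-site.xml actually uses) is to confirm which address and port the NameNode is really listening on on the Razer, and then probe that exact address/port from the IMAC with a plain TCP check:
# on the Razer (hadoopmaster): what is the NameNode bound to?
sudo netstat -tlnp | grep -E ':(8070|9000)'
# on the IMAC (hadoopslave): is that address/port reachable at all?
nc -zv hadoopmaster 8070
nc -zv hadoopmaster 9000
If the port only shows up bound to 127.0.0.1 on the Razer, the DataNode on the IMAC will get connection refused no matter what the firewall allows.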

Hadoop Kerberos: Datanode cannot connect to Namenode. Started Datanode with jsvc to bind to privileged ports (not using SASL)

I've set up an HA Hadoop cluster that worked, but after adding Kerberos authentication the datanode cannot connect to the namenode.
I verified that the Namenode servers start successfully and log no errors. I start all services as user 'hduser'.
$ sudo netstat -tuplen
...
tcp 0 0 10.28.94.150:8019 0.0.0.0:* LISTEN 1001 20218 1518/java
tcp 0 0 10.28.94.150:50070 0.0.0.0:* LISTEN 1001 20207 1447/java
tcp 0 0 10.28.94.150:9000 0.0.0.0:* LISTEN 1001 20235 1447/java
Datanode
I start the datanode as root, using jsvc to bind the service to privileged ports (ref. Secure Datanode):
$ sudo -E sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/hadoop-2.7.3/logs//hadoop-hduser-datanode-STWHDDN01.out
I got errors showing that the datanode cannot connect to the namenodes:
...
2018-01-08 09:25:40,051 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnUserName = hduser
2018-01-08 09:25:40,052 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: supergroup = supergroup
2018-01-08 09:25:40,114 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2018-01-08 09:25:40,125 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020
2018-01-08 09:25:40,152 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /0.0.0.0:50020
2018-01-08 09:25:40,219 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: ha-cluster
2018-01-08 09:25:41,189 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: ha-cluster
2018-01-08 09:25:41,226 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2018-01-08 09:25:41,227 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2018-01-08 09:25:42,297 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: STWHDRM02/10.28.94.151:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-08 09:25:42,300 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: STWHDRM01/10.28.94.150:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
datanode hdfs-site.xml (excerpt):
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.keytab.file</name>
<value>/opt/hadoop/etc/hadoop/hdfs.keytab</value>
</property>
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>hduser/_HOST@FDATA.COM</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:1004</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:1006</value>
</property>
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>700</value>
</property>
I have set HADOOP_SECURE_DN_USER=hduser and JSVC_HOME in hadoop-env.sh
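For reference, the relevant hadoop-env.sh lines look roughly like this (a sketch; the JSVC path is an assumption and depends on where jsvc is installed on the host):
# hadoop-env.sh (excerpt)
export HADOOP_SECURE_DN_USER=hduser
export JSVC_HOME=/usr/bin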
hdfs.keytab on datanode:
$ klist -ke etc/hadoop/hdfs.keytab
Keytab name: FILE:etc/hadoop/hdfs.keytab
KVNO Principal
---- --------------------------------------------------------------------------
1 hduser/stwhddn01@FDATA.COM (aes256-cts-hmac-sha1-96)
1 hduser/stwhddn01@FDATA.COM (aes128-cts-hmac-sha1-96)
1 hduser/stwhddn01@FDATA.COM (des3-cbc-sha1)
1 hduser/stwhddn01@FDATA.COM (arcfour-hmac)
1 hduser/stwhddn01@FDATA.COM (des-hmac-sha1)
1 hduser/stwhddn01@FDATA.COM (des-cbc-md5)
1 HTTP/stwhddn01@FDATA.COM (aes256-cts-hmac-sha1-96)
1 HTTP/stwhddn01@FDATA.COM (aes128-cts-hmac-sha1-96)
1 HTTP/stwhddn01@FDATA.COM (des3-cbc-sha1)
1 HTTP/stwhddn01@FDATA.COM (arcfour-hmac)
1 HTTP/stwhddn01@FDATA.COM (des-hmac-sha1)
1 HTTP/stwhddn01@FDATA.COM (des-cbc-md5)
OS: Centos 7
Hadoop: 2.7.3
Kerberos: MIT 1.5.1
I guess that when running the datanode as user root it does not authenticate with Kerberos.
Any ideas?
I found the problem: I needed to change /etc/hosts so that 127.0.0.1 maps to localhost only.
Before
127.0.0.1 STWHDDD01
127.0.0.1 localhost
...
After
127.0.0.1 localhost
...
I still wonder why the old mapping worked in the context of no Kerberos authentication.
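A quick way to double-check the mapping after editing /etc/hosts (a sketch; hostnames are the ones from this cluster) is to make sure the node's own name resolves to its LAN address rather than to 127.0.0.1, since _HOST in the principal is substituted with the host's own name:
hostname -f
getent hosts STWHDDN01
# the second command should return the LAN address (e.g. 10.28.94.x), not 127.0.0.1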

Hadoop: datanode not connecting to namenode, localhost:50070 cluster summary shows 0

logs
2014-05-12 16:41:26,773 INFO org.apache.hadoop.ipc.RPC: Server at namenode/192.168.12.196:10001 not available yet, Zzzzz...
2014-05-12 16:41:28,777 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/192.168.12.196:10001. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://user@namenode:10001</value>
</property>
</configuration>
In /etc/hosts I put:
192.168.12.196 namenode
In masters:
user@namenode
In slaves:
localhost
My namenode is on user@192.168.12.196. If I do jps on every node, it shows the datanode, namenode and job/tasktracker working fine.
You need to change localhost to namenode in the slaves and masters files and restart Hadoop once; then it will work fine.
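For example, following that suggestion on a two-box setup, the files might look roughly like this (a sketch; a Hadoop 1.x conf/ layout is assumed and datanode1 is a placeholder hostname):
# conf/masters
namenode
# conf/slaves
namenode
datanode1
# then restart once
bin/stop-all.sh
bin/start-all.sh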
Thanks for your comment.
If I put the hostname in the namenode's slaves file, it runs the datanode and namenode on the same node.
The configuration of my masters and slaves files is as follows:
on namenode's masters
user@namenode
on namenode's slaves
hdname1@data1 (data1 corresponds to the node's IP and hdname1 is the user)
hdname2@data2
on datanode's masters
user@namenode
on datanode's slaves
hdname1@data1

How to change the address the 'hadoop jar' command connects to?

I have been trying to start a MapReduce job on my cluster with the following command:
bin/hadoop jar myjar.jar MainClass /user/hduser/input /user/hduser/output
But I get the following error over and over again, until connection is refused:
13/08/08 00:37:16 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
I then checked with netstat to see if the service was listening to the correct port:
~> sudo netstat -plten | grep java
tcp 0 0 10.1.1.4:54310 0.0.0.0:* LISTEN 10022 38365 11366/java
tcp 0 0 10.1.1.4:54311 0.0.0.0:* LISTEN 10022 32164 11829/java
Now I notice that my service is listening on 10.1.1.4:54310, which is the IP of my master, but the 'hadoop jar' command seems to connect to 127.0.0.1 (localhost, i.e. the same machine) and therefore doesn't find the service. Is there any way to force 'hadoop jar' to look at 10.1.1.4 instead of 127.0.0.1?
My NameNode, DataNode, JobTracker, TaskTracker, ... are all running. I even checked for DataNode and TaskTracker on the slaves and it all seems to be working. I can check the WebUI on the master and it shows my cluster is online.
I expect the problem to be DNS related, since it seems that the 'hadoop jar' command finds the correct port but always uses the 127.0.0.1 address instead of 10.1.1.4.
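A quick resolution check from the master (a sketch) can confirm whether 'master' and 'localhost' resolve to the addresses you expect:
getent hosts master
getent hosts localhost
# 'master' should come back as 10.1.1.4 and 'localhost' as 127.0.0.1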
UPDATE
Configuration in core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
Configuration in mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
Configuration in hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
Although it seemed to be a DNS issue, it was actually Hadoop trying to resolve a reference to localhost in the code. I was deploying someone else's jar and assumed it was correct. Upon further inspection I found the reference to localhost and changed it to master, which solved my issue.
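More generally, if the driver class goes through ToolRunner/GenericOptionsParser, the target filesystem can be overridden on the command line instead of being baked into the jar (a sketch; whether it applies depends on how MainClass parses its arguments):
bin/hadoop jar myjar.jar MainClass -fs hdfs://master:54310 /user/hduser/input /user/hduser/output
# or equivalently
bin/hadoop jar myjar.jar MainClass -D fs.default.name=hdfs://master:54310 /user/hduser/input /user/hduser/output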

Unable to check nodes on Hadoop [Connection refused]

If I go to http://localhost:50070 or http://localhost:9000 to see the nodes, my browser shows me nothing; I think it can't connect to the server.
I tested my hadoop with this command:
hadoop jar hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
but that didn't work either, and it just keeps trying to connect to the server. This is the output:
12/06/06 17:25:24 INFO mapred.FileInputFormat: nrFiles = 10
12/06/06 17:25:24 INFO mapred.FileInputFormat: fileSize (MB) = 1000
12/06/06 17:25:24 INFO mapred.FileInputFormat: bufferSize = 1000000
12/06/06 17:25:25 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
12/06/06 17:25:26 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
12/06/06 17:25:27 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
12/06/06 17:25:28 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
12/06/06 17:25:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
12/06/06 17:25:30 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
12/06/06 17:25:31 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
12/06/06 17:25:32 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
12/06/06 17:25:33 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
12/06/06 17:25:34 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
I changed some files like this:
in conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
in conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
in conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
Thanks for your attention. If I run this command:
cat /etc/hosts
I see:
127.0.0.1 localhost
127.0.1.1 ubuntu.ubuntu-domain ubuntu
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
and if I run this one:
ps axww | grep hadoop
I see this result:
2170 pts/0 S+ 0:00 grep --color=auto hadoop
but nothing has helped. Do you have any idea how I can solve my problem?
There are a few things that you need to take care of before starting the Hadoop services.
Check what this returns:
hostname --fqdn
In your case this should be localhost.
Also comment out the IPv6 entries in /etc/hosts.
Did you format the namenode before starting HDFS?
hadoop namenode -format
How did you install Hadoop? The location of the log files depends on that; it is usually "/var/log/hadoop/" if you used Cloudera's distribution.
If you are a complete newbie, I suggest installing Hadoop using Cloudera SCM, which is quite easy. I have posted my approach to installing Hadoop with Cloudera's distribution.
Also make sure the DFS location has write permission. It usually sits at /usr/local/hadoop_store/hdfs.
That is a common cause.
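Putting those checks together as commands (a sketch; the exact paths depend on how Hadoop was installed):
hostname --fqdn                      # should print localhost for this setup
hadoop namenode -format              # only before the first start; it wipes HDFS metadata
ls -ld /usr/local/hadoop_store/hdfs  # the DFS location must be writable by the hadoop user
ls /var/log/hadoop/                  # typical log location for a Cloudera-style install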
I got the same problem and this solved it: the problem lies with the permissions given to the folders. Use chmod 755 or more permissive on the folders under
/home/username/hadoop/*
Another possibility is that the namenode is not running.
You can remove the HDFS files:
rm -rf /tmp/hadoop*
Reformat the HDFS
bin/hadoop namenode -format
And restart hadoop services
bin/start-all.sh (Hadoop 1.x)
or
sbin/start-all.sh (Hadoop 2.x)
Also edit your /etc/hosts file and change 127.0.1.1 to 127.0.0.1 (see the sketch after the properties below); proper DNS resolution is very important for Hadoop and a bit tricky too. Also add the following property to your core-site.xml file:
<property>
<name>hadoop.tmp.dir</name>
<value>/path_to_temp_directory</value>
</property>
The default location for this property is the /tmp directory, which gets emptied after each system restart, so you lose all your info at each restart. Also add these properties to your hdfs-site.xml file:
<property>
<name>dfs.name.dir</name>
<value>/path_to_name_directory</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/path_to_data_directory</value>
</property>
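And the /etc/hosts edit mentioned above, applied to the file shown earlier, would look roughly like this (a sketch):
127.0.0.1 localhost
127.0.0.1 ubuntu.ubuntu-domain ubuntu
# (was: 127.0.1.1 ubuntu.ubuntu-domain ubuntu)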
I am assuming that this is your first installation of Hadoop.
To begin with, please check whether your daemons are running. To do that, use (in a terminal):
jps
If only jps appears, that means all daemons are down. Please check the log files, especially the namenode's. The log folder is probably somewhere around /usr/lib/hadoop/logs
If you have permission problems, use this guide during the installation.
Good installation guide
I am shooting in the dark with these explanations, but these are the most common problems.
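For reference, on a healthy single-node Hadoop 1.x install, jps would show something roughly like this (PIDs are placeholders):
$ jps
12001 NameNode
12102 DataNode
12203 SecondaryNameNode
12304 JobTracker
12405 TaskTracker
12506 Jps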
Hi. Edit your conf/core-site.xml and change localhost to 0.0.0.0. Use the conf below. That should work.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://0.0.0.0:9000</value>
</property>
</configuration>
