Hadoop Kerberos: DataNode cannot connect to NameNode when started via jsvc to bind to privileged ports (not using SASL)

I've set up an HA Hadoop cluster that worked. But after adding Kerberos authentication, the datanode cannot connect to the namenode.
I verified that the Namenode servers start successfully and log no errors. I start all services as user 'hduser'.
$ sudo netstat -tuplen
tcp 0 0* LISTEN 1001 20218 1518/java
tcp 0 0* LISTEN 1001 20207 1447/java
tcp 0 0* LISTEN 1001 20235 1447/java
I start the datanode as root, using jsvc to bind the service to privileged ports (ref. Secure Datanode):
$ sudo -E sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/hadoop-2.7.3/logs//hadoop-hduser-datanode-STWHDDN01.out
The log shows that the datanode cannot connect to the namenodes:
2018-01-08 09:25:40,051 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnUserName = hduser
2018-01-08 09:25:40,052 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: supergroup = supergroup
2018-01-08 09:25:40,114 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2018-01-08 09:25:40,125 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020
2018-01-08 09:25:40,152 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /
2018-01-08 09:25:40,219 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: ha-cluster
2018-01-08 09:25:41,189 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: ha-cluster
2018-01-08 09:25:41,226 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2018-01-08 09:25:41,227 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2018-01-08 09:25:42,297 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: STWHDRM02/ Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-08 09:25:42,300 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: STWHDRM01/ Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
datanode hdfs-site.xml (excerpt):
I have set HADOOP_SECURE_DN_USER=hduser and JSVC_HOME in hadoop-env.sh
hdfs.keytab on datanode:
$ klist -ke etc/hadoop/hdfs.keytab
Keytab name: FILE:etc/hadoop/hdfs.keytab
KVNO Principal
---- --------------------------------------------------------------------------
1 hduser/stwhddn01@FDATA.COM (aes256-cts-hmac-sha1-96)
1 hduser/stwhddn01@FDATA.COM (aes128-cts-hmac-sha1-96)
1 hduser/stwhddn01@FDATA.COM (des3-cbc-sha1)
1 hduser/stwhddn01@FDATA.COM (arcfour-hmac)
1 hduser/stwhddn01@FDATA.COM (des-hmac-sha1)
1 hduser/stwhddn01@FDATA.COM (des-cbc-md5)
1 HTTP/stwhddn01@FDATA.COM (aes256-cts-hmac-sha1-96)
1 HTTP/stwhddn01@FDATA.COM (aes128-cts-hmac-sha1-96)
1 HTTP/stwhddn01@FDATA.COM (des3-cbc-sha1)
1 HTTP/stwhddn01@FDATA.COM (arcfour-hmac)
1 HTTP/stwhddn01@FDATA.COM (des-hmac-sha1)
1 HTTP/stwhddn01@FDATA.COM (des-cbc-md5)
OS: Centos 7
Hadoop: 2.7.3
Kerberos: MIT 1.5.1
I guess that when the datanode runs as user root, it does not authenticate with Kerberos.
Any ideas?

I found the problem. I needed to change /etc/hosts so that the loopback entry maps to localhost only.
Before: STWHDDD01 localhost
After: localhost
I still wonder why the old mapping worked when Kerberos authentication was not enabled.
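A minimal sketch of how a mapping like the "Before" line can be spotted, assuming the usual hosts-file layout (an address followed by one or more names). The function name names_on_loopback and the sample entries are hypothetical and only illustrate the idea; the addresses are not taken from the cluster above.

```python
import ipaddress


def names_on_loopback(hosts_text):
    """Return host names that a hosts-file text maps to a loopback address."""
    names = []
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        address, *aliases = line.split()
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first field is not a valid IP address, skip the line
        if is_loopback:
            names.extend(aliases)
    return names


# Hypothetical hosts-file content, for illustration only.
sample = """
127.0.0.1   examplenode localhost
192.0.2.10  othernode
"""
print(names_on_loopback(sample))
```

Any real host name that shows up in the result (anything other than localhost and its usual aliases) is a candidate for the kind of fix described above.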


Apache Hadoop multi-node cluster fails to show remote DataNode

I'm having a hard time setting up a multi-node cluster. I have a Razer running Ubuntu 20.04 and an IMAC running macOS Catalina. The Razer hosts the namenode, and both the Razer and the IMAC are set up as datanodes (slave workers). SSH keys are replicated on both computers so they can connect over SSH without a password. However, the remote datanode on the IMAC does not show up as Live on my Hadoop dashboard, although I can see the live datanode on the Razer. I think it has something to do with the IMAC not being able to connect to HDFS, which I set in core-site.xml as hdfs://hadoopmaster:9000.
RAZER = Hostname: Hadoopmaster
IMAC = Hostname: Hadoopslave
Based on some troubleshooting, I reviewed the datanode logs on the IMAC and saw that it fails to connect to hadoopmaster on port 9000.
2020-06-01 13:44:33,193 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopmaster/ Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:35,550 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopmaster/ Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:36,574 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopmaster/ Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:37,597 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopmaster/ Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:37,619 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoopmaster/
2020-06-01 13:44:44,660 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopmaster/ Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:45,534 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED
2020-06-01 13:44:45,537 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
Here are my settings:
<description>A base for other temporary directories</description>
So I think there is an issue connecting to port 9000 on my machine. My next step was to test the connection with ssh in a terminal window:
IMAC Command: ssh username@hadoopmaster -p 9000
Refused to connect
So my next step was performing the SSH command on my Razer machine:
Razer Command: ssh hadoopmaster -p 9000
Refused to connect
I then tried modifying the UFW firewall on my Razer (opening port 9000, allowing any source to hadoopmaster, and opening all ports), and still no luck.
Please help me have my remote machine IMAC connect to port 9000 on the Razer so I can create the hadoop cluster in my network and view the remote slave machines as live datanodes on the dashboard.
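A refused connection from ssh already suggests that nothing is accepting TCP connections on that address and port, but ssh speaks its own protocol, so a plain TCP probe is a cleaner test of a Hadoop RPC port. The sketch below uses only the Python standard library; the helper names can_connect and probe are hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


def probe(host, port):
    """Print a one-line result for a single host/port pair."""
    state = "open" if can_connect(host, port) else "unreachable"
    print(f"{host}:{port} is {state}")
```

Calling probe("hadoopmaster", 9000) on both machines, and probe("localhost", 9000) on the Razer, would show whether the port is reachable only through the loopback interface, from everywhere, or from nowhere.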

Unable to start Hadoop (3.1.0) in Pseudomode on Ubuntu (16.04)

I am trying to follow the Getting Started guide from the Apache Hadoop website, in particular the pseudo-distributed configuration,
Getting started guide from Apache Hadoop 3.1.0
but I am unable to start the Hadoop NameNode and DataNode. Can anyone advise? Even things I can run to debug or investigate further would help.
At the end of the logs I see an error message (not sure if it's important or a red herring).
2018-04-18 14:15:40,003 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
2018-04-18 14:15:40,006 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
2018-04-18 14:15:40,014 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Total number of blocks = 0
2018-04-18 14:15:40,014 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of invalid blocks = 0
2018-04-18 14:15:40,014 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of under-replicated blocks = 0
2018-04-18 14:15:40,014 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of over-replicated blocks = 0
2018-04-18 14:15:40,014 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of blocks being written = 0
2018-04-18 14:15:40,014 INFO org.apache.hadoop.hdfs.StateChange: STATE* Replication Queue initialization scan for invalid, over- and under-replicated blocks completed in 11 msec
2018-04-18 14:15:40,028 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2018-04-18 14:15:40,028 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: starting
2018-04-18 14:15:40,029 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: NameNode RPC up at: localhost/
2018-04-18 14:15:40,031 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state
2018-04-18 14:15:40,031 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Initializing quota with 4 thread(s)
2018-04-18 14:15:40,033 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Quota initialization completed in 2 milliseconds name space=1 storage space=0 storage types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0, PROVIDED=0
2018-04-18 14:15:40,037 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds
> 2018-04-18 14:15:40,232 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 15:
> 2018-04-18 14:15:40,236 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 1:
> 2018-04-18 14:15:40,236 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at c0315/
I have confirmed that I can ssh to localhost without a password prompt. I have also run the following steps from the Apache Getting Started guide mentioned above:
$ bin/hdfs namenode -format
$ sbin/start-dfs.sh
But I can't run step 3, browsing to http://localhost:9870/. When I run jps from the terminal prompt, all I get back is:
14900 Jps
I was expecting a list of my nodes.
I will attach the full logs.
Can anyone help, even with ways to debug this, please?
Java Version,
$ java --version
java 9.0.4
Java(TM) SE Runtime Environment (build 9.0.4+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)
EDIT1: I have repeated the steps with Java 8 as well and get the same error message.
EDIT2: Following the comment suggestions below, I have checked that I am definitely pointing at Java 8 now, and I have also commented out the localhost setting in the /etc/hosts file.
Ubuntu version,
$ lsb_release -a
No LSB modules are available.
Distributor ID: neon
Description: KDE neon User Edition 5.12
Release: 16.04
Codename: xenial
I have tried running the commands, bin/hdfs version
Hadoop 3.1.0
Source code repository https://github.com/apache/hadoop -r 16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d
Compiled by centos on 2018-03-30T00:00Z
Compiled with protoc 2.5.0
From source with checksum 14182d20c972b3e2105580a1ad6990
This command was run using /home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/hadoop-common-3.1.0.jar
When I try bin/hdfs groups, it doesn't return but gives me:
2018-04-18 15:33:34,590 INFO ipc.Client: Retrying connect to server: localhost/ Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
When I try $ bin/hdfs lsSnapshottableDir:
lsSnapshottableDir: Call From c0315/ to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
when I try, $ bin/hdfs classpath
I have not been able to figure this out (I just tried again since I miss NEON so much), but even though port 9000 is not in use, the OS sends a SIGTERM in my case too.
Sadly, the only way I have found to solve this was to go back to stock Ubuntu.

hadoop master host fails to connect to localhost: connection refused

I've set up HDFS with two nodes on different hosts in the same network. I'm using the HDFS C++ API. The HDFS name node and data nodes start normally, but when I try to read any data or open a file, I get the following error:
Call From master/192.168.X.X to localhost:54310 failed on connection exception: connection refused
So I guess it's connected with ssh.
On master box, the following commands work (/etc/hosts file contains master and slave):
ssh master
ssh slave
ssh localhost
ssh user@localhost
ssh localhost -p 22
But when I try ssh localhost -p 54310, it fails with a 'connection refused' error, even though ps -ef | grep :54310 shows that the name node listens on that port.
Any ideas how to fix that?
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
<description>A base for other temporary directories.</description>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
EDIT: results from netstat -an
tcp 0 0* LISTEN
tcp 0 0 TIME_WAIT
tcp 0 0 TIME_WAIT
I've also replaced master with localhost on the master host, which solved the problem on the master. Now the only error I'm getting is on the slave, which fails to connect to the master:
2018-01-21 23:53:18,597 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/ Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 23:53:19,599 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/ Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 23:53:19,609 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: master/
2018-01-21 23:53:25,613 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/ Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 23:53:26,615 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/ Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
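When the master works through localhost but the slave cannot connect, it is worth checking which address the namenode port is bound to: a listener bound only to a loopback address cannot be reached from another host. The sketch below assumes Linux-style netstat -an output in which the fourth column is the local address and the sixth is the state; the function names and any sample lines fed to it are hypothetical, not taken from the EDIT above.

```python
def listening_addresses(netstat_text, port):
    """Return local addresses that are in LISTEN state on the given port."""
    suffix = f":{port}"
    found = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected layout: proto recv-q send-q local-address foreign-address state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, state = fields[3], fields[5]
        if state == "LISTEN" and local.endswith(suffix):
            found.append(local[: -len(suffix)])
    return found


def loopback_only(addresses):
    """True if there is at least one listener and all are bound to loopback."""
    return bool(addresses) and all(
        a.startswith("127.") or a == "::1" for a in addresses
    )
```

If loopback_only(listening_addresses(output, 54310)) is true on the master, the slave's retries are expected, and the filesystem URI on the master probably needs to name an address the slave can reach.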

hadoop namenode not starting/formatting on Ubuntu

I am trying to set up a Hadoop instance on Ubuntu. The namenode is not starting up. When I run the jps command I can see all daemons except the namenode. Here is my hdfs-site.xml file.
And here's my core-site.xml.
The error that I got is:
ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: NameNode is not formatted.
When I formatted the namenode I got this at the prompt:
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hanu/
STARTUP_MSG: args = [–format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.8.0_31
Usage: java NameNode [-format [-force ] [-nonInteractive]] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] | [-recover [ -force ] ]
15/02/03 15:03:41 INFO namenode.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at hanu/
I've tried to change files as per various suggestions out there, but nothing is working. I think the namenode is not formatting properly.
What's wrong in my setup and how can I correct it? Any help is appreciated. Thanks.
The error message comes from a typo in the command: the NameNode class prints its Usage text because it did not recognize the option you passed.
Make sure you type the command properly:
bin/hadoop namenode -format
Then try to start the NameNode. You could start the NameNode service in the foreground just to see if everything is working properly; if you don't see any errors, you could kill the process and start all the services using the start-all.sh script.
Here's how you could start the NameNode process in the foreground:
bin/hadoop namenode
Once it has started, these are the log messages to look for to validate a proper startup:
15/02/04 10:42:44 INFO http.HttpServer: Jetty bound to port 50070
15/02/04 10:42:44 INFO mortbay.log: jetty-6.1.26
15/02/04 10:42:45 INFO mortbay.log: Started SelectChannelConnector@
15/02/04 10:42:45 INFO namenode.NameNode: Web-server up at:
15/02/04 10:42:45 INFO ipc.Server: IPC Server Responder: starting
15/02/04 10:42:45 INFO ipc.Server: IPC Server listener on 8020: starting
15/02/04 10:42:45 INFO ipc.Server: IPC Server handler 0 on 8020: starting
15/02/04 10:42:45 INFO ipc.Server: IPC Server handler 1 on 8020: starting
15/02/04 10:42:45 INFO ipc.Server: IPC Server handler 2 on 8020: starting
15/02/04 10:42:45 INFO ipc.Server: IPC Server handler 3 on 8020: starting
15/02/04 10:42:45 INFO ipc.Server: IPC Server handler 4 on 8020: starting
15/02/04 10:42:45 INFO ipc.Server: IPC Server handler 5 on 8020: starting
15/02/04 10:42:45 INFO ipc.Server: IPC Server handler 6 on 8020: starting
15/02/04 10:42:45 INFO ipc.Server: IPC Server handler 7 on 8020: starting
15/02/04 10:42:45 INFO ipc.Server: IPC Server handler 8 on 8020: starting
15/02/04 10:42:45 INFO ipc.Server: IPC Server handler 9 on 8020: starting
You could kill the service by sending <Ctrl+C> to the process.
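The startup log in the question shows args = [–format]; the leading character there looks like an en dash (U+2013) rather than an ASCII hyphen, which would explain why NameNode printed its Usage text. That reading is an assumption based on how the log was pasted. A small sketch for spotting such look-alike dashes in a command line, with a hypothetical helper name suspicious_args:

```python
import unicodedata

# Dash-like characters that are easy to paste in place of an ASCII hyphen.
LOOKALIKE_DASHES = {"\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2212"}


def suspicious_args(args):
    """Return (argument, character name) pairs for args starting with a non-ASCII dash."""
    hits = []
    for arg in args:
        if arg and arg[0] in LOOKALIKE_DASHES:
            hits.append((arg, unicodedata.name(arg[0])))
    return hits


print(suspicious_args(["namenode", "\u2013format"]))  # option copied from a web page
print(suspicious_args(["namenode", "-format"]))       # option typed by hand
```

Retyping the option by hand instead of copying it from a web page or document is usually enough to avoid the problem.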

unable to check nodes on hadoop [Connection refused]

If I open http://localhost:50070 or http://localhost:9000 to see the nodes, my browser shows me nothing. I think it can't connect to the server.
I tested my hadoop with this command:
hadoop jar hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
but that didn't work either; it keeps trying to connect to the server. This is the output:
12/06/06 17:25:24 INFO mapred.FileInputFormat: nrFiles = 10
12/06/06 17:25:24 INFO mapred.FileInputFormat: fileSize (MB) = 1000
12/06/06 17:25:24 INFO mapred.FileInputFormat: bufferSize = 1000000
12/06/06 17:25:25 INFO ipc.Client: Retrying connect to server: localhost/ Already tried 0 time(s).
12/06/06 17:25:26 INFO ipc.Client: Retrying connect to server: localhost/ Already tried 1 time(s).
12/06/06 17:25:27 INFO ipc.Client: Retrying connect to server: localhost/ Already tried 2 time(s).
12/06/06 17:25:28 INFO ipc.Client: Retrying connect to server: localhost/ Already tried 3 time(s).
12/06/06 17:25:29 INFO ipc.Client: Retrying connect to server: localhost/ Already tried 4 time(s).
12/06/06 17:25:30 INFO ipc.Client: Retrying connect to server: localhost/ Already tried 5 time(s).
12/06/06 17:25:31 INFO ipc.Client: Retrying connect to server: localhost/ Already tried 6 time(s).
12/06/06 17:25:32 INFO ipc.Client: Retrying connect to server: localhost/ Already tried 7 time(s).
12/06/06 17:25:33 INFO ipc.Client: Retrying connect to server: localhost/ Already tried 8 time(s).
12/06/06 17:25:34 INFO ipc.Client: Retrying connect to server: localhost/ Already tried 9 time(s).
java.net.ConnectException: Call to localhost/ failed on connection exception: java.net.ConnectException: Connection refused
I changed some files like this:
in conf/core-site.xml:
in conf/hdfs-site.xml:
in conf/mapred-site.xml:
Thanks for your attention. If I run this command:
cat /etc/hosts
I see: localhost ubuntu.ubuntu-domain ubuntu
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
and if I run this one:
ps axww | grep hadoop
I see this result:
2170 pts/0 S+ 0:00 grep --color=auto hadoop
But still no effect. Do you have any idea how I can solve my problem?
There are a few things that you need to take care of before starting Hadoop services.
Check what this returns:
hostname --fqdn
In your case this should be localhost.
Also comment out the IPv6 entries in /etc/hosts.
Did you format the namenode before starting HDFS?
hadoop namenode -format
How did you install Hadoop? The location of the log files will depend on that. Usually it is "/var/log/hadoop/" if you have used Cloudera's distribution.
If you are a complete newbie, I suggest installing Hadoop using Cloudera SCM, which is quite easy. I have posted my approach to installing Hadoop with Cloudera's distribution.
Make sure the DFS location has write permission. It usually sits at /usr/local/hadoop_store/hdfs
That is a common reason.
I had the same problem, and this solved it:
the problem lies with the permissions given to the folders.
Use "chmod" 755 or greater for the folders.
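A quick way to check this before restarting the daemons is sketched below. It reads "755 or greater" as "every permission bit of 755 is set", which is an assumption; the function names are hypothetical, and ownership of the directory by the user running Hadoop matters as well and is not checked here.

```python
import os
import stat


def mode_at_least(path, required=0o755):
    """Return True if every permission bit in `required` is set on `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & required) == required


def report(path, required=0o755):
    """Describe the permission bits of a path in chmod-style octal."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    verdict = "ok" if (mode & required) == required else "too restrictive"
    return f"{path}: {oct(mode)[2:]} ({verdict})"
```

Running report() on each HDFS data and name directory listed in your configuration shows at a glance which folders need a chmod.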
Another possibility is the namenode is not running.
You can remove the HDFS files:
rm -rf /tmp/hadoop*
Reformat HDFS:
bin/hadoop namenode -format
And restart the Hadoop services:
bin/start-all.sh (Hadoop 1.x)
sbin/start-all.sh (Hadoop 2.x)
Also edit your /etc/hosts file; DNS resolution is very important for Hadoop and a bit tricky too. Also add the following property to your core-site.xml file:
The default location for this property is the /tmp directory, which gets emptied after each system restart, so you lose all your info at each restart. Also add these properties to your hdfs-site.xml file:
I am assuming that is your first installation of hadoop.
At the beginning, please check whether your daemons are running. To do that, use (in a terminal):
jps
If only Jps appears, that means all daemons are down. Please check the log files, especially the namenode's. The log folder is probably somewhere like /usr/lib/hadoop/logs
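The check can be scripted. The sketch below parses jps output (one "pid name" pair per line) and lists which daemons are missing. The expected set used here (NameNode, DataNode, SecondaryNameNode) is an assumption for a single-node HDFS setup; a Hadoop 1.x install would also expect JobTracker and TaskTracker. The function names are hypothetical.

```python
EXPECTED_HDFS_DAEMONS = ("NameNode", "DataNode", "SecondaryNameNode")


def running_daemons(jps_output):
    """Return the set of process names listed in jps output, excluding Jps itself."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit() and parts[1] != "Jps":
            names.add(parts[1])
    return names


def missing_daemons(jps_output, expected=EXPECTED_HDFS_DAEMONS):
    """Return the expected daemons that do not appear in the jps output."""
    running = running_daemons(jps_output)
    return [name for name in expected if name not in running]
```

For output that contains only the Jps line, missing_daemons returns every expected name, which matches the "all daemons are down" case described above.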
If you have permission problems, use this guide during the installation:
Good installation guide
These explanations are guesses, but these are the most common problems.
Hi Edit your core conf/core-site.xml and change localhost to Use the conf below. That should work.
