Secure Hadoop - Datanode cannot connect with namenode

I am using Hadoop 2.6.0 and have created an HA-enabled cluster with Kerberos security on Windows. Everything works fine when permissions are disabled, but when I enable the property below,
hdfs-site.xml
<property>
<name>dfs.permissions</name>
<value>true</value>
</property>
the datanode cannot connect to the namenode. I am getting the following exception:
Exception
2015-05-21 10:44:42,461 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: kumar/192.168.3.4:9000
2015-05-21 10:44:46,079 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: dinesh/192.168.3.3:9000
2015-05-21 10:44:47,471 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: kumar/192.168.3.4:9000
2015-05-21 10:44:51,085 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: dinesh/192.168.3.3:9000
2015-05-21 10:44:52,477 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: kumar/192.168.3.4:9000
I cannot find the exact root cause of this problem and need help solving it.

I changed the default supergroup to a newly created group that contains all Hadoop users. Every user in that group now acts as a superuser, so it works fine.
<property>
<name>dfs.permissions.superusergroup</name>
<value>Hadoopgroup</value>
</property>
Refer to the HDFS superuser documentation.
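To verify that the NameNode resolves the new group for a given account, you can query its group mapping with the hdfs groups command; the username below is a placeholder:
hdfs groups someuser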

Related

hbase master not starting

I am running HBase on Hadoop in standalone mode. I have successfully installed Hadoop, ZooKeeper, and HBase, but the HBase master is not starting. Below is my hbase-site.xml:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/kumar/hdata/zookeeper</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
</configuration>
I have started the Hadoop and ZooKeeper services:
start-all.sh
zkServer.sh start
start-hbase.sh
and these are the processes shown by the jps command:
2133 DataNode
1974 NameNode
2679 NodeManager
2365 SecondaryNameNode
3917 QuorumPeerMain
2527 ResourceManager
3935 Jps
The HBase shell starts successfully, but when I run any command in the shell, such as 'list', I get the error below:
ERROR: KeeperErrorCode = NoNode for /hbase/master
After that I tried to run the master with the command below:
hbase master start
and I got the error below:
2018-10-15 18:51:51,380 ERROR [main] server.ZooKeeperServer: ZKShutdownHandler is not registered, so ZooKeeper server won't take any action on ERROR or SHUTDOWN server state changes
2018-10-15 18:51:51,437 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182] server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:34034
2018-10-15 18:51:51,479 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182] server.ServerCnxn: The list of known four letter word commands is : [{1936881266=srvr, 1937006964=stat, 2003003491=wchc, 1685417328=dump, 1668445044=crst, 1936880500=srst, 1701738089=envi, 1668247142=conf, 2003003507=wchs, 2003003504=wchp, 1668247155=cons, 1835955314=mntr, 1769173615=isro, 1920298859=ruok, 1735683435=gtmk, 1937010027=stmk}]
2018-10-15 18:51:51,479 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182] server.ServerCnxn: The list of enabled four letter word commands is : [[wchs, stat, stmk, conf, ruok, mntr, srvr, envi, srst, isro, dump, gtmk, crst, cons]]
2018-10-15 18:51:51,479 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182] server.NIOServerCnxn: Processing stat command from /127.0.0.1:34034
2018-10-15 18:51:51,485 INFO [Thread-2] server.NIOServerCnxn: Stat command output
2018-10-15 18:51:51,491 INFO [main] zookeeper.MiniZooKeeperCluster: Started MiniZooKeeperCluster and ran successful 'stat' on client port=2182
Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
2018-10-15 18:51:51,497 ERROR [main] master.HMasterCommandLine: Master exiting
java.io.IOException: Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:217)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2983)
2018-10-15 18:51:51,500 INFO [Thread-2] server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:34034 (no session established for client)
I also get no response from the local HBase web UI:
localhost:60010
HBase ships with a built-in ZooKeeper instance intended for development environments only, and by default the start-hbase.sh command starts that ZooKeeper daemon too. The error occurs because you already started a standalone ZooKeeper that uses port 2181; when HBase then tries to start its built-in ZooKeeper on port 2181 as well, it fails.
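To confirm that a standalone ZooKeeper is already holding port 2181, you can probe it with the ruok four-letter command (assuming nc is available; a healthy server answers imok):
echo ruok | nc localhost 2181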
If you want to keep using the standalone ZooKeeper, first edit hbase-env.sh and add the line export HBASE_MANAGES_ZK=false (you can also search for the HBASE_MANAGES_ZK variable in the file and set it to false). Now when you start HBase, it starts only the HBase daemons and not ZooKeeper. Remember to start the ZooKeeper daemon before HBase.
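A minimal sketch of the change in conf/hbase-env.sh:
# Use the external ZooKeeper; HBase will no longer start its own
export HBASE_MANAGES_ZK=false
With this set, start ZooKeeper first (zkServer.sh start) and then HBase (start-hbase.sh).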
I solved this by adding the following to hbase-site.xml:
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
In my case I was working with HBase 2.2.0 in standalone mode.

ERROR in datanode execution while running Hadoop first time in Windows 10

I am trying to run Hadoop 3.1.1 on my Windows 10 machine. I modified all of the following files:
hdfs-site.xml
mapred-site.xml
core-site.xml
yarn-site.xml
Then, I executed the following command:
C:\hadoop-3.1.1\bin> hdfs namenode -format
The format ran correctly, so I changed to C:\hadoop-3.1.1\sbin and executed the following command:
C:\hadoop-3.1.1\sbin> start-dfs.cmd
The command prompt opens two new windows: one for the datanode and another for the namenode.
The namenode window keeps running:
2018-09-02 21:37:06,232 INFO ipc.Server: IPC Server Responder: starting
2018-09-02 21:37:06,232 INFO ipc.Server: IPC Server listener on 9000: starting
2018-09-02 21:37:06,247 INFO namenode.NameNode: NameNode RPC up at: localhost/127.0.0.1:9000
2018-09-02 21:37:06,247 INFO namenode.FSNamesystem: Starting services required for active state
2018-09-02 21:37:06,247 INFO namenode.FSDirectory: Initializing quota with 4 thread(s)
2018-09-02 21:37:06,247 INFO namenode.FSDirectory: Quota initialization completed in 3 milliseconds
name space=1
storage space=0
storage types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0, PROVIDED=0
2018-09-02 21:37:06,279 INFO blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds
While the datanode gives the following error:
ERROR: datanode.DataNode: Exception in secureMain
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 0, volumes configured: 1, volumes failed: 1, volume failures tolerated: 0
at org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker.check(StorageLocationChecker.java:220)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2762)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2677)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2719)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2863)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2887)
2018-09-02 21:37:04,250 INFO util.ExitUtil: Exiting with status 1: org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 0, volumes configured: 1, volumes failed: 1, volume failures tolerated: 0
2018-09-02 21:37:04,250 INFO datanode.DataNode: SHUTDOWN_MSG:
And then the datanode shuts down! I tried several ways to overcome this error, but this is the first time I am installing Hadoop on Windows and I can't work out what to do next.
I got things working after I removed the file-system reference for the datanode in hdfs-site.xml. I found that this let the software create and initialize its own datanode directory, which then appeared under sbin. After that I could use HDFS without a hitch. Here is what worked for me for Hadoop 3.1.3 on Windows:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///C:/Users/myusername/hadoop/hadoop-3.1.3/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>datanode</value>
</property>
</configuration>
I had the same problem and what worked for me was editing hdfs-site.xml as follows:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///C:/Hadoop/hadoop-3.1.2/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/C:/Hadoop/hadoop-3.1.2/data/datanode</value>
</property>

Hadoop Basic - error while creating directory

I have started learning Hadoop recently and am getting the error below while creating a new folder:
vm4learning@vm4learning:~/Installations/hadoop-1.2.1/bin$ ./hadoop fs -mkdir helloworld
Warning: $HADOOP_HOME is deprecated.
15/06/14 19:46:35 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Please help. Below are the namenode logs:
2015-06-14 22:01:08,158 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /home/vm4learning/Installations/hadoop-1.2.1/data/dfs/name does not exist
2015-06-14 22:01:08,161 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/vm4learning/Installations/hadoop-1.2.1/data/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:304)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:104)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:427)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:395)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:299)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:569)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1479)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1488)
2015-06-14 22:01:08,182 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/vm4learning/Installations/hadoop-1.2.1/data/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:304)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:104)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:427)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:395)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:299)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:569)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1479)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1488)
2015-06-14 22:01:08,185 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at vm4learning/192.168.1.102
************************************************************/
Before trying to create a directory, you should make sure your Hadoop installation is healthy: run the jps command and look for any missing process.
In your case, the namenode isn't up.
The logs show that some folders were never created. Create them:
mkdir -p $HADOOP_HOME/dfs/name
mkdir -p $HADOOP_HOME/dfs/name/data
Then specify the following in hdfs-site.xml:
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop/dfs/name/data</value>
<final>true</final>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/dfs/name</value>
<final>true</final>
</property>
Restart Hadoop, and remember to format the namenode before doing anything else.
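A minimal sketch of that sequence, using the Hadoop 1.x commands from the question (formatting erases existing HDFS metadata):
stop-all.sh
hadoop namenode -format
start-all.sh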
Another reason the same error may appear is that you have not yet formatted the namenode, like this:
hdfs namenode -format
I had missed step 7 of this tutorial (https://kontext.tech/column/hadoop/377/latest-hadoop-321-installation-on-windows-10-step-by-step-guide) and faced the same error.
hdfs is the utility in HADOOP_HOME/bin.
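For example, on Windows it can be invoked with an explicit path (a sketch; it assumes the HADOOP_HOME environment variable is set):
%HADOOP_HOME%\bin\hdfs namenode -format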

Getting Time out Error while executing hadoop jar ./hadoop-examples-1.0.3.jar pi 2 5

When I run hadoop jar ./hadoop-examples-1.0.3.jar pi 2 5, it shows the following error:
hduser@ubuntu:/usr/local/hadoop-1.0.3$ hadoop jar ./hadoop-examples-1.0.3.jar pi 2 5
Warning: $HADOOP_HOME is deprecated.
Number of Maps = 2
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Starting Job
14/12/07 09:47:33 INFO ipc.Client: Retrying connect to server: 172.16.76.1/172.16.76.1:9002. Already tried 0 time(s).
14/12/07 09:47:54 INFO ipc.Client: Retrying connect to server: 172.16.76.1/172.16.76.1:9002. Already tried 1 time(s).
^C^Chduser@ubuntu:/usr/local/hadoop-1.0.3$
hduser@ubuntu:~/data/dfs$ jps
7176 DataNode
7484 JobTracker
6960 NameNode
7704 TaskTracker
7400 SecondaryNameNode
7766 Jps
hduser@ubuntu:~/data/dfs$ cd /usr/local/hadoop-1.0.3/
As you can see, all the processes are running. But when I check the logs, I find the error below:
ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/hduser/data/dfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
I checked my XML configuration files, which are as follows:
mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>172.16.76.1:9002</value>
</property>
core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9001</value>
</property>
hdfs-site.xml
<property>
<name>dfs.name.dir</name>
<value>/home/hduser/data/dfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hduser/data/dfs/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
I tried formatting the namenode and starting again, but nothing happened.
I changed the directory owner permissions to 755 and tried again, but the issue was still not resolved.
I exhausted all the suggestions given on this blog, but they did not help.
I have tried every possible solution, many times over, but the same issue keeps coming back.
Here is the detailed error message:
2014-12-07 06:25:05,376 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /home/hduser/data/dfs/namenode does not exist.
2014-12-07 06:25:05,377 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/hduser/data/dfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:303)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
2014-12-07 06:25:05,399 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/hduser/data/dfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:303)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
2014-12-07 06:25:05,407 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
Can you try deleting the namenode storage directory /home/hduser/data/dfs/namenode and then formatting the namenode?
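A minimal sketch of that, using the paths and Hadoop 1.x commands from the question (this permanently deletes the existing namenode metadata):
stop-all.sh
rm -rf /home/hduser/data/dfs/namenode
hadoop namenode -format
start-all.sh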

Hadoop: datanode not connecting to namenode; localhost:50070 cluster summary shows 0

Logs:
2014-05-12 16:41:26,773 INFO org.apache.hadoop.ipc.RPC: Server at namenode/192.168.12.196:10001 not available yet, Zzzzz...
2014-05-12 16:41:28,777 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/192.168.12.196:10001. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://user@namenode:10001</value>
</property>
</configuration>
I put this in /etc/hosts:
192.168.12.196 namenode
In masters:
user@namenode
In slaves:
localhost
and my namenode is on user@192.168.12.196.
If I run jps on every node, it shows the datanode, namenode, and job/tasktracker working fine.
You need to change localhost to namenode in the slaves and masters files and restart Hadoop; then it will work fine.
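A minimal sketch of the two files after that change, assuming the single-node layout from the question:
# conf/masters
namenode
# conf/slaves
namenode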
Thanks for your comment.
If I put the hostname in the namenode's slaves file, it runs the datanode and namenode on the same node.
The configuration of my masters and slaves files is as follows:
On the namenode's masters:
user@namenode
On the namenode's slaves:
hdname1@data1 (data1 maps to the node's IP and hdname1 is the user)
hdname2@data2
On the datanode's masters:
user@namenode
On the datanode's slaves:
hdname1@data1
