DataNode Does Not Start - hadoop

I am having trouble starting my Hadoop DataNode. I did all the research I could, and none of the methods were helpful in solving my issue. Here's my terminal console output when I try to start it using
hadoop datanode -start
This is what happens:
root@Itanium:~/Desktop/hadoop# hadoop datanode -start
Warning: $HADOOP_HOME is deprecated.
13/09/29 22:11:42 INFO datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = Itanium/127.0.1.1
STARTUP_MSG: args = [-start]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_25
************************************************************/
Usage: java DataNode
[-rollback]
13/09/29 22:11:42 INFO datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at Itanium/127.0.1.1
************************************************************/
root@Itanium:~/Desktop/hadoop# jps
31438 SecondaryNameNode
32013 Jps
31818 TaskTracker
1146 Bootstrap
31565 JobTracker
30930 NameNode
root@Itanium:~/Desktop/hadoop#
As we can see, the DataNode attempts to start but then shuts down. All the while, I have been having trouble with the NameNode starting up. I used to fix this by manually starting it using
start-dfs.sh
And now the problem is with the DataNode. I would really appreciate your help in resolving this issue.
And one more generic question: why is Hadoop displaying such inconsistent behavior? I am sure I did not change any of the *-site.xml settings.

Use this command: hadoop datanode -rollback
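Note that the usage message above lists -rollback as the only accepted flag; there is no -start option, which is why the daemon prints usage and exits. A minimal sketch of the usual ways to start the daemon in Hadoop 1.x, assuming the bin directory is on your PATH:
# Run the DataNode in the foreground (useful for debugging; Ctrl+C to stop):
hadoop datanode
# Or start it in the background via the daemon helper script:
hadoop-daemon.sh start datanode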

I had a similar issue. Following up on the comment posted by Anup ("seems to be an issue with namespaceIDs not matching"), I found a reference that showed me how to solve my issue:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#caveats
I took a look at the log files on the slave nodes where the DataNodes did not start. They both had the following exception:
2014-11-05 10:26:14,289 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /scratch/hdfs/data/srinivasand: namenode namespaceID = 1296690356; datanode namespaceID = 1228298945
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:385)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
Fixing this exception solved the issue.
The fix is to either:
a) delete the dfs data directory and reformat using namenode -format, or
b) update the VERSION file so that the two namespace IDs match.
I was able to use option b), and the DataNodes started successfully after that.
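A minimal sketch of option b), using the data directory and the two IDs from the log line above; in Hadoop 1.x, each dfs.data.dir holds the namespaceID in its current/VERSION file:
# Stop the DataNode first, then rewrite its namespaceID to match the NameNode's
# (adjust the path to your own dfs.data.dir):
sed -i 's/namespaceID=1228298945/namespaceID=1296690356/' /scratch/hdfs/data/srinivasand/current/VERSION
Then restart the DataNode.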
The bug report behind this issue is recorded at: https://issues.apache.org/jira/browse/HDFS-107

I once got the same issue; it turned out that port 50010 was occupied by another application. Stop that application, then restart Hadoop.
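A quick way to check what is holding the DataNode port (50010 is the default dfs.datanode.address port), shown here as a sketch using standard Linux tools:
# Show which process is listening on port 50010:
sudo netstat -tlnp | grep ':50010'
# or, equivalently:
sudo lsof -i :50010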

Related

Namenode keeps shutting down after start-dfs.sh

The NameNode for my fully distributed Hadoop installation on Ubuntu will not stay open. It starts and shuts down with the error below. I tried a few things, but nothing works. The NameNode log is below; it automatically shuts down. Any help is appreciated.
Directory /usr/hdfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
2019-03-25 01:34:44,354 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at
I have already tried reformatting the namenode. Here is my jps output:
7952 SecondaryNameNode
7714 DataNode
23346 NodeManager
10555 Jps
23167 ResourceManager
Kindly recheck the files ~/.bashrc, core-site.xml, and hdfs-site.xml (e.g. with vim).
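The error also says the storage directory /usr/hdfs/namenode does not exist or is not accessible, so a minimal sketch of a fix, assuming that path from the error message and that you run Hadoop as the current user:
# Recreate the NameNode storage directory, make it writable by the user that
# runs Hadoop, then reformat (reformatting erases the HDFS metadata):
sudo mkdir -p /usr/hdfs/namenode
sudo chown -R $(whoami) /usr/hdfs/namenode
hdfs namenode -format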

Why I can't start the NameNode in this Hadoop 1.2.1 installation?

I am new to Apache Hadoop and I am following a video course on Udemy.
The course is based on Hadoop 1.2.1; is that too old a version? Is it better to start my study with another course based on a more recent version, or is it OK?
So I have installed Hadoop 1.2.1 on an Ubuntu 12.04 system and configured it in pseudo-distributed mode.
According to the tutorial, I did it using the following settings in the following configuration files:
conf/core-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
conf/hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
conf/mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
Then in the Linux shell I do:
ssh localhost
So I am connected through SSH to my local system.
Then I go into the Hadoop bin directory, /home/andrea/hadoop/hadoop-1.2.1/bin/, and there I run this command, which is supposed to format the name node (what exactly does that mean?):
bin/hadoop namenode –format
And this is the output I obtained:
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ ./hadoop namenode –format
16/01/17 12:55:25 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = andrea-virtual-machine/127.0.1.1
STARTUP_MSG: args = [–format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_79
************************************************************/
Usage: java NameNode [-format [-force ] [-nonInteractive]] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] | [-recover [ -force ] ]
16/01/17 12:55:25 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at andrea-virtual-machine/127.0.1.1
************************************************************/
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$
Then I try to start all the nodes by performing this command:
./start-all.sh
and now I obtain:
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ ./start-all.sh
starting namenode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-namenode-andrea-virtual-machine.out
localhost: starting datanode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-datanode-andrea-virtual-machine.out
localhost: starting secondarynamenode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-secondarynamenode-andrea-virtual-machine.out
starting jobtracker, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-jobtracker-andrea-virtual-machine.out
localhost: starting tasktracker, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-tasktracker-andrea-virtual-machine.out
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$
Now I try to open the following URLs in the browser:
http://localhost:50070/
which I can't open (page not found),
and:
http://localhost:50030/
which opens correctly and redirects to this JSP page:
http://localhost:50030/jobtracker.jsp
So, in the shell I run the jps command, which lists all the running Java processes for the user:
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ jps
6247 Jps
5720 DataNode
5872 SecondaryNameNode
6116 TaskTracker
5965 JobTracker
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$
As you can see, it seems that the NameNode has not started.
The tutorial that I am following says:
If the NameNode or DataNode is not listed, it might be that the
namenode's or datanode's root directory, which is set by the property
'dfs.name.dir', is getting messed up. By default it points to the /tmp
directory, which the operating system changes from time to time. Thus,
when HDFS comes up after some changes by the OS, it gets confused and
the namenode doesn't start.
To solve this problem, the tutorial provides this solution (which didn't work for me).
First, stop all nodes with the stop-all.sh script.
Then explicitly set 'dfs.name.dir' and 'dfs.data.dir'.
So I created a dfs directory in the Hadoop path, and inside it I created two directories (at the same level): data and name (the idea is to make two folders inside it to be used by the datanode daemon and the namenode daemon).
So I have something like this:
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/dfs$ tree
.
├── data
└── name
Then I use this configuration for hdfs-site.xml, where I explicitly set the previous two directories:
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/home/andrea/hadoop/hadoop-1.2.1/dfs/data/</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/andrea/hadoop/hadoop-1.2.1/dfs/name/</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
So, after this change, I run the command to format the NameNode again:
hadoop namenode –format
And I obtain this output:
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/dfs$ hadoop namenode –format
16/01/17 13:14:53 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = andrea-virtual-machine/127.0.1.1
STARTUP_MSG: args = [–format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_79
************************************************************/
Usage: java NameNode [-format [-force ] [-nonInteractive]] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] | [-recover [ -force ] ]
16/01/17 13:14:53 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at andrea-virtual-machine/127.0.1.1
************************************************************/
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/dfs$
So I start all the nodes again with start-all.sh, and this is the output I obtained:
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ start-all.sh
starting namenode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-namenode-andrea-virtual-machine.out
localhost: starting datanode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-datanode-andrea-virtual-machine.out
localhost: starting secondarynamenode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-secondarynamenode-andrea-virtual-machine.out
starting jobtracker, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-jobtracker-andrea-virtual-machine.out
localhost: starting tasktracker, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-tasktracker-andrea-virtual-machine.out
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$
Then I run the jps command to see if all the nodes started correctly, but this is what I get:
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ jps
8041 SecondaryNameNode
8310 TaskTracker
8406 Jps
8139 JobTracker
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$
The situation has worsened, because now I have two nodes that are not started: the NameNode and the DataNode.
What am I missing? How can I try to solve this issue and start all my nodes?
Would you try turning off iptables once and reformatting, along with exporting the Java path?
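A minimal sketch of those three steps, assuming Ubuntu and an OpenJDK 7 install; the firewall command and the JAVA_HOME path are assumptions to adjust for your system:
# 1) Temporarily disable the firewall:
sudo ufw disable   # on distros using iptables directly: sudo service iptables stop
# 2) Export the Java path (this JDK location is an assumption):
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# 3) Reformat the name node:
hadoop namenode -format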
If you have configured hdfs-site.xml with
<property>
<name>dfs.name.dir</name>
<value>/home/andrea/hadoop/hadoop-1.2.1/dfs/name/</value>
</property>
then while formatting the name node you should see a
> successfully formatted /home/andrea/hadoop/hadoop-1.2.1/dfs/name/
message if the name node format is successful. From your logs, I cannot see that success message; check for permission issues.
If it still didn't start, try another command:
hadoop-daemon.sh start namenode
Hope it works...
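If the daemon still dies, the real error usually lands in the NameNode log file rather than on the console; a quick sketch for inspecting it (the path pattern matches the .out paths shown earlier, with a .log extension):
# Show the tail of the current NameNode log:
tail -n 50 /home/andrea/hadoop/hadoop-1.2.1/logs/hadoop-andrea-namenode-andrea-virtual-machine.log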

Hadoop name node not starting

I am trying to run Hadoop as the root user. I executed the namenode format command hadoop namenode -format. After that I tried to start the Hadoop daemons, but the namenode is not starting. I ran the command hadoop namenode -importCheckpoint and it gives the following error:
14/09/15 01:25:55 INFO common.Storage: Storage directory /home/umaima/cloudera_namedir is not formatted.
14/09/15 01:25:55 INFO common.Storage: Formatting ...
14/09/15 01:25:55 ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:336)
at org.apache.hadoop.hdfs.server.namenode.FSImage.doImportCheckpoint(FSImage.java:531)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:375)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:110)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:372)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:335)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:467)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1330)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1339)
14/09/15 01:25:55 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at umaima-Lenovo-IdeaPad-S510p/127.0.1.1
************************************************************/
I am stuck on this. Any help is highly appreciated. Thanks in advance.
Before formatting the namenode, delete the tmp folder (which contains the datanode and namenode directories), and then format the namenode.
Then start the Hadoop services.
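A minimal sketch of that sequence, assuming the default hadoop.tmp.dir of /tmp/hadoop-<user> (adjust the path if core-site.xml sets a custom location):
# Stop everything, wipe the temporary HDFS storage, reformat, and restart.
# Warning: this destroys all data stored in HDFS.
stop-all.sh
rm -rf /tmp/hadoop-$(whoami)
hadoop namenode -format
start-all.sh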

Incompatible build versions error in Hadoop slave node

My Hadoop cluster was running without any problems. I don't know what has changed, but when I start the Hadoop components with the start-all.sh command from the master and check the running processes with the jps command, I see that the DataNode is not working on the slave node.
The datanode log is below. The installed Hadoop version (1.0.4) is the same on all machines in the cluster. I could not figure out how to solve the problem.
2013-09-18 09:35:21,638 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = noon101/10.240.20.30
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.4-SNAPSHOT
STARTUP_MSG: build = -r ; compiled by 'hduser' on Wed May 29 10:55:16 EEST 2013
************************************************************/
2013-09-18 09:35:21,752 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-09-18 09:35:21,761 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2013-09-18 09:35:21,762 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-09-18 09:35:21,762 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2013-09-18 09:35:21,867 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2013-09-18 09:35:21,869 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2013-09-18 09:35:26,964 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Incompatible build versions: namenode BV = 1393290; datanode BV =
2013-09-18 09:35:27,070 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible build versions: namenode BV = 1393290; datanode BV =
at org.apache.hadoop.hdfs.server.datanode.DataNode.handshake(DataNode.java:566)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:362)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
2013-09-18 09:35:27,071 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at noon101/10.240.20.30
************************************************************/
Below are parts of the datanode logs.
Slave node:
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = noon101/10.240.20.30
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.4-SNAPSHOT
STARTUP_MSG: build = -r ; compiled by 'hduser' on Wed May 29 10:55:16 EEST 2013
Master node:
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = noon102/10.240.20.32
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.4
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
Looking at the nodes, I see that the version and build values are different. Even if this is the problem, I still do not know how to solve it.
This generally happens if there was some kind of change to the conf directory on the master NameNode. Are you sure an ant script didn't run, or something else didn't mess with the jar files in Hadoop's 'lib' directory?
This issue seems to be fairly prevalent. The following link has clear step-by-step instructions for salvaging the situation; the gist is that you copy the conf directory from the DataNode to the master node after backing it up.
http://permalink.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/22499
See if this helps.
I had the same problem, and it's solved! I received the error "Incompatible build versions: namenode BV = ; datanode BV = 985326" many times because I had many <property> ... </property> tags under the <configuration> tag in mapred-site.xml. I had to make sure all configurations were the same on all nodes (master + slaves). If a single value in a <property> ... </property> tag differs from one node to another, you are likely to see the Incompatible Build Versions error.
So here is the summary:
1) Make sure you have the same Hadoop version on all nodes.
2) Make sure all *-site.xml files are the same on all nodes, i.e., you use the same configuration in core-site.xml, mapred-site.xml, and hdfs-site.xml. A sync sketch follows below.
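A quick sketch for keeping the configuration identical across nodes, assuming passwordless SSH to each slave, a Hadoop 1.x layout with a conf directory, and placeholder hostnames:
# Push the master's conf directory to every slave (hostnames are examples):
for host in slave1 slave2; do
  rsync -av --delete "$HADOOP_HOME/conf/" "$host:$HADOOP_HOME/conf/"
done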
Good Luck!

Unable to identify the .xml/file to update the host after installing Hadoop

I encountered the below-mentioned error after installing Hadoop and executing the hadoop namenode -format command.
Based on the displayed logs, I figured out that I need to update the "host" in the configuration, but I am unable to find the exact location of the configuration file (.xml) that needs to be updated.
I am installing on Fedora on a single node. I am looking for your help in addressing this issue. Please point me to any specific link or documentation that could be helpful while debugging.
[hadoop@hadoop ~]$ hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.
13/02/03 11:33:09 INFO namenode.NameNode: STARTUP_MSG:
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = java.net.UnknownHostException: hadoop: hadoop
13/02/03 11:34:27 INFO namenode.NameNode: SHUTDOWN_MSG: :
SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: hadoop: hadoop
The configuration files are in $HADOOP_HOME/etc/hadoop, where $HADOOP_HOME is probably /usr/local/hadoop. There can be several files containing your hostname; you should check there.
Also, you should check /etc/hosts.
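The java.net.UnknownHostException: hadoop in the log means the machine's hostname ("hadoop") does not resolve. A minimal sketch of the usual fix, assuming that hostname and the loopback convention seen in the other logs here:
# Map the hostname to a loopback address in /etc/hosts:
echo '127.0.1.1 hadoop' | sudo tee -a /etc/hosts
# Verify that it now resolves:
ping -c 1 hadoop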
