The NameNode does not start when I run start-all.sh after stop-all.sh. If I run hadoop namenode -format and then hadoop-daemon.sh start namenode, everything works, but my data in HDFS is lost.
I do not want to lose data, so the hadoop namenode -format route is not an acceptable solution for me. How can I start the NameNode with start-all.sh?
Thanks
First of all, stop-all.sh and start-all.sh are deprecated. Use start-dfs.sh and start-yarn.sh instead of start-all.sh, and stop-dfs.sh and stop-yarn.sh instead of stop-all.sh (the script itself already says so).
Secondly, hadoop namenode -format formats your HDFS and should therefore be used only once, at the time of installation.
By default, Hadoop sets the hadoop.tmp.dir property to a directory in /tmp, where the files are deleted after every restart. Set the hadoop.tmp.dir property in $HADOOP_HOME/etc/hadoop/core-site.xml to a location where the files are not usually deleted. Run hadoop namenode -format (actually it is hdfs namenode -format now; the old form is also deprecated) one last time and start the daemons.
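As a rough sketch of that workflow (the directory below is just a placeholder, and the format step still wipes any existing HDFS metadata, so it belongs on a fresh setup only):
# in $HADOOP_HOME/etc/hadoop/core-site.xml, point hadoop.tmp.dir at a
# persistent location, e.g. /home/hduser/hadoop_tmp (placeholder path)
hdfs namenode -format   # one-time format; erases existing HDFS metadata
start-dfs.sh
start-yarn.sh
jps                     # NameNode, DataNode, etc. should now be listed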
PS: If you can post the log file or the terminal screenshot of the error, it will be easier to help you.
hadoop.temp.dir
"temp" should be "tmp" => hadoop.tmp.dir
Right, my only mistake was the extra "e".
I have an old Hadoop install that I'm looking to update to Hadoop 2. In the old setup, I have a $HADOOP_HOME/conf/masters file that specifies the secondary namenode.
Looking through the Hadoop 2 documentation I can't find any mention of a "masters" file, or how to set up a secondary namenode.
Any help in the right direction would be appreciated.
The slaves and masters files in the conf folder are only used by some scripts in the bin folder, such as the start-mapred.sh, start-dfs.sh and start-all.sh scripts.
These scripts are a mere convenience so that you can run them from a single node to ssh into each master/slave node and start the desired Hadoop service daemons.
You only need these files on the name node machine if you intend to launch your cluster from this single node (using password-less ssh).
Alternatively, you can also start a Hadoop daemon manually on a machine via
bin/hadoop-daemon.sh start [namenode | secondarynamenode | datanode | jobtracker | tasktracker]
In order to run the secondary namenode, use the above script on the designated machine, providing the 'secondarynamenode' value to the script.
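For reference, on the older 1.x-style layout the conf/masters file is just a list of hostnames, one per line, that start-dfs.sh uses to decide where to launch the secondary namenode. A minimal sketch (the hostname is a placeholder):
cat $HADOOP_HOME/conf/masters
# snn-host.example.com   <- one hostname per line; the secondary namenode starts here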
See pwnz0r's 2nd comment on the answer to "How to separate hadoop secondary namenode from primary namenode?"
To reiterate here:
In hdfs-site.xml:
<property>
<name>dfs.secondary.http.address</name>
<value>$secondarynamenode.full.hostname:50090</value>
<description>SecondaryNameNodeHostname</description>
</property>
I am using Hadoop 2.6 and had to use
<property>
<name>dfs.secondary.http.address</name>
<value>secondarynamenode.hostname:50090</value>
</property>
For further details, refer to https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
Update the hdfs-site.xml file by adding the following property.
cd $HADOOP_HOME/etc/hadoop
sudo vi hdfs-site.xml
Then paste these lines inside the <configuration> tag:
<property>
<name>dfs.secondary.http.address</name>
<value>hostname:50090</value>
</property>
I tried to start the namenode using bin/start-all.sh, but this command doesn't start the namenode. I know that if I do bin/hadoop namenode -format the namenode will start, but in that case I will lose all my data. Is there a way to start the namenode without formatting it?
Your problem might be related to the following:
Hadoop writes its NameNode data to the /tmp/hadoop-${user.name} folder by default, which is cleaned after every reboot.
Add the following property to conf/hdfs-site.xml:
<property>
<name>dfs.name.dir</name>
<value><path to your desired folder></value>
</property>
The "dfs.name.dir" property allows you to control where Hadoop writes NameNode metadata.
bin/start-all.sh should start the namenode, as well as the datanodes, the jobtracker and the tasktrackers. So, check the log of the namenode for possible errors.
An alternative way to skip starting the jobtracker and the tasktrackers and just start the namenode (and the datanodes) is by using the command:
bin/start-dfs.sh
Actually, bin/start-all.sh is equivalent to using the commands:
bin/start-dfs.sh, which starts the namenode and datanodes and
bin/start-mapred.sh, which starts the jobtracker and the tasktrackers.
For more details, visit this page.
I have installed Hadoop in pseudo distributed mode on my laptop, OS is Ubuntu.
I have changed the paths where Hadoop stores its data (by default, Hadoop stores data in the /tmp folder).
My hdfs-site.xml file looks like this:
<property>
<name>dfs.data.dir</name>
<value>/HADOOP_CLUSTER_DATA/data</value>
</property>
Now whenever I restart the machine and try to start the Hadoop cluster using the start-all.sh script, the datanode never starts. I confirmed that the datanode is not starting by checking the logs and by using the jps command.
Then I
Stopped cluster using stop-all.sh script.
Formatted HDFS using hadoop namenode -format command.
Started cluster using start-all.sh script.
Now everything works fine, even if I stop and start the cluster again. The problem occurs only when I restart the machine and try to start the cluster.
Has anyone encountered a similar problem?
Why is this happening, and how can we solve it?
By changing dfs.datanode.data.dir away from /tmp you indeed made the data (the blocks) survive across a reboot. However, there is more to HDFS than just blocks. You need to make sure all the relevant dirs point away from /tmp, most notably dfs.namenode.name.dir (I can't tell what other dirs you have to change, it depends on your config, but the namenode dir is mandatory and could also be sufficient).
I would also recommend using a more recent Hadoop distribution. BTW, the 1.1 namenode dir setting is dfs.name.dir.
For those who use Hadoop 2.0 or later, the config file locations and property names may be different.
As this answer points out, go to the etc/hadoop directory of your Hadoop installation.
Open the file hdfs-site.xml. This user configuration will override the default Hadoop configuration, which is loaded earlier by the Java classloader.
Add the dfs.namenode.name.dir property and set a new namenode dir (the default is file://${hadoop.tmp.dir}/dfs/name).
Do the same for the dfs.datanode.data.dir property (the default is file://${hadoop.tmp.dir}/dfs/data).
For example:
<property>
<name>dfs.namenode.name.dir</name>
<value>/Users/samuel/Documents/hadoop_data/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/Users/samuel/Documents/hadoop_data/data</value>
</property>
Another property where a tmp dir appears is dfs.namenode.checkpoint.dir; its default value is file://${hadoop.tmp.dir}/dfs/namesecondary.
If you want, you can also add this property:
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/Users/samuel/Documents/hadoop_data/namesecondary</value>
</property>
I set up and configured a multi-node Hadoop cluster using this tutorial.
When I type in the start-all.sh command, it shows all the processes initializing properly as follows:
starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-namenode-jawwadtest1.out
jawwadtest1: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-datanode-jawwadtest1.out
jawwadtest2: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-datanode-jawwadtest2.out
jawwadtest1: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-secondarynamenode-jawwadtest1.out
starting jobtracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-jobtracker-jawwadtest1.out
jawwadtest1: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-tasktracker-jawwadtest1.out
jawwadtest2: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-tasktracker-jawwadtest2.out
However, when I type the jps command, I get the following output:
31057 NameNode
4001 RunJar
6182 RunJar
31328 SecondaryNameNode
31411 JobTracker
32119 Jps
31560 TaskTracker
As you can see, there's no datanode process running. I tried configuring a single-node cluster but got the same problem. Would anyone have any idea what could be going wrong here? Are there any configuration files that are not mentioned in the tutorial or that I may have overlooked? I am new to Hadoop and am kind of lost; any help would be greatly appreciated.
EDIT:
hadoop-root-datanode-jawwadtest1.log:
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/$
************************************************************/
2012-08-09 23:07:30,717 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loa$
2012-08-09 23:07:30,734 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapt$
2012-08-09 23:07:30,735 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:30,736 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:31,018 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapt$
2012-08-09 23:07:31,024 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:32,366 INFO org.apache.hadoop.ipc.Client: Retrying connect to $
2012-08-09 23:07:37,949 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: $
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(Data$
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransition$
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNo$
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java$
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNod$
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode($
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataN$
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.$
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1$
2012-08-09 23:07:37,951 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: S$
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at jawwadtest1/198.101.220.90
************************************************************/
You need to do something like this:
bin/stop-all.sh (or stop-dfs.sh and stop-yarn.sh in the 2.x series)
rm -Rf /app/tmp/hadoop-your-username/*
bin/hadoop namenode -format (or hdfs namenode -format in the 2.x series)
The solution was taken from: http://pages.cs.brandeis.edu/~cs147a/lab/hadoop-troubleshooting/. Basically it consists of restarting from scratch, so make sure you won't lose data by formatting the HDFS.
I ran into the same issue. I had created an hdfs folder '/home/username/hdfs' with the sub-directories name, data, and tmp, which were referenced in the config XML files under hadoop/conf.
When I started Hadoop and ran jps, I couldn't find the datanode, so I tried to manually start it using bin/hadoop datanode. Then I realized from the error message that it had a permissions issue accessing dfs.data.dir=/home/username/hdfs/data/, which was referenced in one of the Hadoop config files. All I had to do was stop Hadoop, delete the contents of the /home/username/hdfs/tmp/* directory, run chmod -R 755 /home/username/hdfs/, and then start Hadoop. I could find the datanode!
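A minimal sketch of that recovery sequence, assuming the same /home/username/hdfs layout described above:
stop-all.sh                         # stop Hadoop first
rm -rf /home/username/hdfs/tmp/*    # clear the tmp contents
chmod -R 755 /home/username/hdfs/   # fix permissions on the dfs directories
start-all.sh                        # start Hadoop again
jps                                 # the DataNode should now be listed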
I faced similar issue while running the datanode. The following steps were useful.
In [hadoop_directory]/sbin directory use ./stop-all.sh to stop all the running services.
Remove the tmp dir using rm -r [hadoop_directory]/tmp (The path configured in [hadoop_directory]/etc/hadoop/core-site.xml)
sudo mkdir [hadoop_directory]/tmp (Make a new tmp directory)
Go to */hadoop_store/hdfs directory where you have created namenode and datanode as sub-directories. (The paths configured in [hadoop_directory]/etc/hadoop/hdfs-site.xml). Use
rm -r namenode
rm -r datanode
In */hadoop_store/hdfs directory use
sudo mkdir namenode
sudo mkdir datanode
In case of a permissions issue, use
chmod -R 755 namenode
chmod -R 755 datanode
In [hadoop_directory]/bin use
hadoop namenode -format (To format your namenode)
In [hadoop_directory]/sbin directory use ./start-all.sh or ./start-dfs.sh to start the services.
Use jps to check the services running.
Delete the datanode directory under your Hadoop folder, then rerun start-all.sh.
I was having the same problem running a single-node pseudo-distributed instance. I couldn't figure out how to solve it, but a quick workaround is to manually start a DataNode with
hadoop-x.x.x/bin/hadoop datanode
Follow these steps and your datanode will start again.
Stop dfs.
Open hdfs-site.xml
Remove the data.dir and name.dir properties from hdfs-site.xml and format the namenode again.
Then remove the hadoopdata directory, add the data.dir and name.dir properties back to hdfs-site.xml, and format the namenode again.
Then start dfs again.
You need to follow three steps.
(1) Go to the logs and check the most recent datanode log (e.g. hadoop-2.6.0/logs/hadoop-user-datanode-ubuntu.log).
If the error looks like this:
java.io.IOException: Incompatible clusterIDs in /home/kutty/work/hadoop2data/dfs/data: namenode clusterID = CID-c41df580-e197-4db6-a02a-a62b71463089; datanode clusterID = CID-a5f4ba24-3a56-4125-9137-fa77c5bb07b1
i.e. the namenode clusterID and the datanode clusterID are not identical.
(2) Copy the namenode clusterID, which is CID-c41df580-e197-4db6-a02a-a62b71463089 in the error above.
(3) Replace the datanode clusterID with the namenode clusterID in hadoopdata/dfs/data/current/VERSION (see the sketch after these steps):
clusterID=CID-c41df580-e197-4db6-a02a-a62b71463089
Restart Hadoop and the DataNode will run.
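A hedged sketch of steps (2) and (3), assuming the name and data directories both live under /home/kutty/work/hadoop2data/dfs as suggested by the error above (adjust both paths to your own dfs.namenode.name.dir and dfs.datanode.data.dir settings):
grep clusterID /home/kutty/work/hadoop2data/dfs/name/current/VERSION    # the namenode's ID
sed -i 's/^clusterID=.*/clusterID=CID-c41df580-e197-4db6-a02a-a62b71463089/' \
    /home/kutty/work/hadoop2data/dfs/data/current/VERSION               # write it into the datanode's VERSION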
Stop all the services: ./stop-all.sh
Clear the HDFS tmp directory on all the masters and slaves. Don't forget to clear it on the slaves.
Format the namenode (hadoop namenode -format).
Now start the services on the namenode.
./bin/start-all.sh
Here is what made the difference for me in getting the datanode service to start.
Stop the dfs and yarn first.
Remove the datanode and namenode directories as specified in the core-site.xml file.
Re-create the directories.
Then re-start the dfs and the yarn as follows.
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
Hope this works fine.
Delete the files under $hadoop_User/dfsdata and $hadoop_User/tmpdata
then run:
hdfs namenode -format
finally run:
start-all.sh
Your problem should then be solved.
Please check whether the tmp directory property in core-site.xml is pointing to a valid directory:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/data/tmp</value>
</property>
If the directory is misconfigured, the datanode process will not start properly.
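A quick way to check, using the placeholder path from the snippet above:
ls -ld /home/hduser/data/tmp     # must exist and be writable by the hadoop user
mkdir -p /home/hduser/data/tmp   # create it if it is missing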
Run the commands below in order:
stop-all.sh (stops all the Hadoop processes)
rm -r /usr/local/hadoop/tmp/ (your Hadoop tmp directory, as configured in hadoop/conf/core-site.xml)
sudo mkdir /usr/local/hadoop/tmp (recreate the same directory)
hadoop namenode -format (format your namenode)
start-all.sh (starts all the Hadoop processes)
jps (shows the running processes)
Step 1: stop-all.sh
Step 2: go to this path
cd /usr/local/hadoop/bin
Step 3: run this command
hadoop datanode
Now the DataNode works.
Check whether the hadoop.tmp.dir property in core-site.xml is correctly set.
If you set it, navigate to this directory and remove or empty it.
If you didn't set it, navigate to its default folder, /tmp/hadoop-${user.name}, and likewise remove or empty it.
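A sketch for the default case, assuming the Java user.name matches your shell $USER (which it normally does):
ls -d /tmp/hadoop-$USER      # the default hadoop.tmp.dir location
rm -rf /tmp/hadoop-$USER/*   # emptying it discards HDFS data, so be prepared to reformat the namenode afterwards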
In the case of macOS (pseudo-distributed mode):
Open terminal
Stop dfs: sbin/stop-all.sh
cd /tmp
rm -rf hadoop*
Navigate to the hadoop directory and format HDFS: bin/hdfs namenode -format
sbin/start-dfs.sh
Error in the datanode.log file:
$ more /usr/local/hadoop/logs/hadoop-hduser-datanode-ubuntu.log
Shows:
java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop_tmp/hdfs/datanode: namenode clusterID = CID-e4c3fed0-c2ce-4d8b-8bf3-c6388689eb82; datanode clusterID = CID-2fcfefc7-c931-4cda-8f89-1a67346a9b7c
Solution: stop your cluster, issue the command below, and then start your cluster again.
sudo rm -rf /usr/local/hadoop_tmp/hdfs/datanode/*
I found the details of the issue in the log file, like below:
"Invalid directory in dfs.data.dir: Incorrect permission for /home/hdfs/dnman1, expected: rwxr-xr-x, while actual: rwxrwxr-x"
From there I identified that the file permissions on my datanode folder were 777. I corrected them to 755 and it started working.
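For reference, a one-liner matching the expected rwxr-xr-x from that log message (the path is the one from my log; use your own dfs.data.dir):
chmod 755 /home/hdfs/dnman1   # rwxr-xr-x; a group-writable mode (775/777) is rejected by the datanode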
Instead of deleting everything under the "hadoop tmp dir", you can set another one. For example, if your core-site.xml has this property:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/data/tmp</value>
</property>
You can change this to:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/data/tmp2</value>
</property>
and then scp core-site.xml to each node, run "hadoop namenode -format", and restart Hadoop.
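A sketch of that roll-out, with placeholder hostnames (node1 and node2) and assuming the same $HADOOP_HOME path on every node:
for node in node1 node2; do               # placeholder hostnames
  scp core-site.xml "$node:$HADOOP_HOME/conf/"
done
hadoop namenode -format                   # wipes HDFS metadata
start-all.sh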
This is for a newer version of Hadoop (I am running 2.4.0).
In this case, stop the cluster: sbin/stop-all.sh
Then go to etc/hadoop (under your Hadoop installation directory) for the config files.
In the file hdfs-site.xml, look for the directory paths corresponding to:
dfs.namenode.name.dir
dfs.datanode.data.dir
Delete both the directories recursively (rm -r).
Now format the namenode via bin/hadoop namenode -format
And finally sbin/start-all.sh
Hope this helps.
You need to check:
/app/hadoop/tmp/dfs/data/current/VERSION and /app/hadoop/tmp/dfs/name/current/VERSION
Compare the namespaceID of the namenode and the datanode in those two files.
Your datanode will run only if the datanode's namespaceID is the same as the namenode's namespaceID.
If they are different, copy the namenode's namespaceID into the datanode's VERSION file using vi or gedit, save, and re-run the daemons; it will work perfectly.
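A quick way to compare the two values described above (the /app/hadoop/tmp paths are the ones used in this answer; adjust them to your own hadoop.tmp.dir):
grep namespaceID /app/hadoop/tmp/dfs/name/current/VERSION   # namenode
grep namespaceID /app/hadoop/tmp/dfs/data/current/VERSION   # datanode
# if they differ, edit the datanode's VERSION file so its namespaceID matches the namenode's, then restart the daemons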
If formatting the tmp directory is not working, then try this:
First stop all the entities like the namenode, datanode, etc. (you will have some script or command to do that).
Format the tmp directory.
Go to /var/cache/hadoop-hdfs/hdfs/dfs/ and delete all the contents of the directory manually.
Now format your namenode again.
Start all the entities, then use the jps command to confirm that the datanode has started.
Now run whichever application you have.
Hope this helps.
I configured hadoop.tmp.dir in conf/core-site.xml
I configured dfs.data.dir in conf/hdfs-site.xml
I configured dfs.name.dir in conf/hdfs-site.xml
Deleted everything under "/tmp/hadoop-/" directory
Changed the file permissions from 777 to 755 for the directory listed under dfs.data.dir
And the data node started working.
Even after removing and remaking the directories, the datanode wasn't starting.
So, I started it manually using bin/hadoop datanode
The command did not return to the prompt. I opened another terminal as the same user, ran jps, and it showed me the running datanode process.
It's working, but I just have to keep the unfinished terminal open on the side.
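If keeping that terminal open becomes a nuisance, the hadoop-daemon.sh wrapper mentioned earlier on this page should run the datanode in the background instead (a sketch; I have not verified it on this exact setup):
hadoop-x.x.x/bin/hadoop-daemon.sh start datanode   # detaches and writes output to the logs/ directory
jps                                                # DataNode should appear without a foreground terminal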
Follow these steps and your datanode will start again.
1) Stop dfs.
2) Open hdfs-site.xml.
3) Remove the data.dir and name.dir properties from hdfs-site.xml and format the namenode again.
4) Then start dfs again.
Got the same error. Tried to start and stop dfs several times, cleared all directories that are mentioned in previous answers, but nothing helped.
The issue was resolved only after rebooting the OS and configuring Hadoop from scratch. (Configuring Hadoop from scratch without rebooting didn't work.)
Once, when I was not able to find the datanode using jps, I deleted the current folder in the Hadoop installation's data directory (/opt/hadoop-2.7.0/hadoop_data/dfs/data), restarted Hadoop using start-all.sh, and checked with jps.
This time I could find the datanode, and the current folder was created again.
Try this
stop-all.sh
vi hdfs-site.xml
change the value given for the dfs.data.dir property
format the namenode
start-all.sh
I applied a mix of these configurations, and it worked for me.
First >>
Stop all Hadoop services using
${HADOOP_HOME}/sbin/stop-all.sh
Second >>
Check the mapred-site.xml located at ${HADOOP_HOME}/etc/hadoop/mapred-site.xml and change the localhost to master.
Third >>
Remove the temporary folder created by hadoop
rm -rf /path/to/your/hadoop/temp/folder
Fourth >>
Add recursive permissions on the temp folder.
sudo chmod -R 777 /path/to/your/hadoop/temp/folder
Fifth >>
Now start all the services again, and first check that all services, including the datanode, are running.
mv /usr/local/hadoop_store/hdfs/datanode /usr/local/hadoop_store/hdfs/datanode.backup
mkdir /usr/local/hadoop_store/hdfs/datanode
hadoop datanode OR start-all.sh
jps