I installed hadoop 2.6.0 in my laptop running Ubuntu 14.04LTS. I successfully started the hadoop daemons by running start-all.sh and I run a WourdCount example successfully, then I tried to run a jar example that didn't work with me so I decide to format using hadoop namenode -format and start all over again but when I start all daemons using start-dfs.sh && start-yarn.sh then jps all daemons runs but not the datanode as shown bellow:
hdferas#feras-Latitude-E4310:/usr/local/hadoop$ jps
12628 NodeManager
12110 NameNode
12533 ResourceManager
13335 Jps
12376 SecondaryNameNode
How to solve that?
I have faced this issue and it is very easy to solve. Your datanode is not starting because after your namenode and datanode started running you formatted the namenode again. That means you have cleared the metadata from namenode. Now the files which you have stored for running the word count are still in the datanode and datanode has no idea where to send the block reports since you formatted the namenode so it will not start.
Here are the things you need to do to fix it.
Stop all the Hadoop services (stop-all.sh) and close any active ssh connections.
cat /usr/local/hadoop/etc/hadoop/hdfs-site.xml
This step is important, see where datanode's data is gettting stored. It is the value associated for datanode.data.dir. For me it is /usr/local/hadoop/hadoop_data/hdfs/datanode. Open your terminal and navigate to above directory and delete the directory named current which will be there under that directory. Make sure you are only deleting the "current" directory.
sudo rm -r /usr/local/hadoop/hadoop_data/hdfs/datanode/current
Now format the namenode and check whether everything is fine.
hadoop namenode -format
say yes if it asks you for anything.
jps
Hope my answer solves the issue. If it doesn't let me know.
Little advice: Don't format your namenode. Without namenode there is no way to reconstruct the data. If your wordcount is not running that is some other problem.
I had this issue when formatting namenode too. What i did to solve the issue was:
Find your dfs.name.dir location. Consider for example, your dfs.name.dir is /home/hadoop/hdfs.
(a) Now go to, /home/hadoop/hdfs/current.
(b) Search for the file VERSION. Open it using a text editor.
(c) There will be a line namespaceID=122684525 (122684525 is my ID, yours will be different). Note the ID down.
Now find your hadoop.tmp.dir location. Mine is /home/hadoop/temp.
(a) Go to /home/hadoop/temp/dfs/data/current.
(b) Search for the file VERSION and open it using a text editor.
(c) There will be a line namespaceID=. The namespaceID in this file and previous one must be same.
(d) This is the main reason why my datanode was not started. I made them both same and now datanode starts fine.
Note: copy the namespaceID from /home/hadoop/hdfs/current/VERSION to
/home/hadoop/temp/dfs/data/current/VERSION. Dont do it in reverse.
Now do start-dfs.sh && start-yarn.sh. Datanode will be started.
You Just need To Remove All The Contents Of DataNode Folder And Format The Datanode By Using The Following Command
hadoop namenode -format
Even I had same issue and checked the log and found below error
Exception - Datanode log
FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.io.IOException: All directories in dfs.datanode.data.dir are invalid: "/usr/local/hadoop_store/hdfs/datanode/
Ran the below command to resolve the issue
sudo chown -R hduser:hadoop /usr/local/hadoop_store
Note - I have create the namenode and datanode under the path /usr/local/hadoop_store
The above problem is occurred due to format the namenode (hadoop namenode -format) without stopping the dfs and yarn daemons. While formating namenode, the question given below is appeared and you press Y key for this.
Re-format filesystem in Storage Directory /tmp/hadoop-root/dfs/name ? (Y or N)
Solution,
You need to delete the files within the current(directory name) directory of dfs.name.dir, you mention in hdfs.site.xml. In my system dfs.name.dir is available in /tmp/hadoop-root/dfs/name/current.
rm -r /tmp/hadoop-root/dfs/name/current
By using the above comment I removed files inside in the current directory. Make sure you are only deleting the "current" directory.Again format the namenode after stopped the dfs and yarn daemons (stop-dfs.sh & stop-yarn.sh). Now datanode will start normally!!
at core-site.xml check for absolute path of temp directory, If this is not pointed correctly or not created (mkdir). The data node cant be started.
add below property in yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
not the right way to do it. but surely works~
remove files from your datanode ,namenode and tmp folder. any files/folders created inside these are owned by hadoop and may have some reference to the last run datanode details which may have failed or locked due to which the datanode does not star at the next attempt
I got the same issue (DataNode & TaskTracker would not come up).
RESOLUTION:
DELETE EVERY "CURRENT" SUB-DIRECTORY UNDER: data, name, and namesecondary to resolve DataNode/taskTracker not showing when you start-all.sh, then jps
(My dfs.name.dir location is: /home/training/hadoop-temp/dfs/data/current; /home/training/hadoop-temp/dfs/name/current; /home/training/hadoop-temp/dfs/namesecondary/current
Make sure you stop services: stop-all.sh
1. Go to each "current" sub-directory under data, name, namesecondary and remove/delete (example: rm -r name/current)
2. Then format: hadoop namenode -format
3. mkdir current under /home/training/hadoop-temp/dfs/data/current
4. Take the directory and contents from /home/training/hadoop-temp/dfs/name/current and copy into the /data/current directory
EXAMPLE: files under:
/home/training/hadoop-temp/dfs/name/current
[training#CentOS current]$ ls -l
-rw-rw-r--. 1 training training 9901 Sep 25 01:50 edits
-rw-rw-r--. 1 training training 582 Sep 25 01:50 fsimage
-rw-rw-r--. 1 training training 8 Sep 25 01:50 fstime
-rw-rw-r--. 1 training training 101 Sep 25 01:50 VERSION
5. Change the storageType=NAME_NODE in VERSION to storageType=DATA_NODE in the data/current/VERSION that you just copied over.
BEFORE:
[training#CentOS dfs]$ cat data/current/VERSION
namespaceID=1018374124
cTime=0
storageType=NAME_NODE
layoutVersion=-32
AFTER:
[training#CentOS dfs]$ cat data/current/VERSION
namespaceID=1018374124
cTime=0
storageType=DATA_NODE
layoutVersion=-32
6. Make sure each subdirectory below has the same files that name/current has for data, name, namesecondary
[training#CentOS dfs]$ pwd
/home/training/hadoop-temp/dfs/
[training#CentOS dfs]$ ls -l
total 12
drwxr-xr-x. 5 training training 4096 Sep 25 01:29 data
drwxrwxr-x. 5 training training 4096 Sep 25 01:19 name
drwxrwxr-x. 5 training training 4096 Sep 25 01:29 namesecondary
7. Now start the services: start-all.sh
You should see all 5 services when you type: jps
I am using hadoop-2.6.0.I resolved using:
1.Deleting all files within
/usr/local/hadoop_store/hdfs
command : sudo rm -r /usr/local/hadoop_store/hdfs/*
2.Format hadoop namenode
command : hadoop namenode -format
3.Go to ..../sbin directory(cd /usr/local/hadoop/sbin)
start-all.sh
use command==> hduser#abc-3551:/$ jps
Following services would be started now :
19088 Jps
18707 ResourceManager
19043 NodeManager
18535 SecondaryNameNode
18329 DataNode
18159 NameNode
When I had this same issue, the 'Current' folder wasn't even being created in my hadoop/data/datanode folder. If this is the case for you too,
~copy the contents of 'Current' from namenode and paste it into datanode folder.
~Then, open VERSION for datanode and change the storageType=NAME_NODE to storageType=DATA_NODE
~run jps to see that the datanode continues to run
I am new student on Hadoop clusters, and I built a multi-node in the lab
But I cannot start NameNode or DataNode.
After I execute start-all.sh and jps: only shows jobtracker, tasktracker, secondenamenode, jps on Master. But slaves works good with datanode and tasktracker
And when I execute stop-all.sh:
it should shows: No tasttracker to stop, but it did show in jps
And this is the log file about NameNode:
1.Cannot access storage directory /app/hadoop/tmp/dfs/name
2.ERROR org.apache.hadoop.hdfs.server![enter image description here][2].namenode.FSNamesystem: FSNamesystem initialization failed.
3.org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /app/hadoop/tmp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
4.org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /app/hadoop/tmp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
And I did try the namenode -format, yet it doesn't work.
Could somebody show me the way, and tell me why this happens?
Lots of thanks ahead.
PS: I am using hadoop1.0.3 + java1.7.0_51
I think you did not give permissions to data dir of tmp.data.dir.
Try bellow command to give permissions and try your start-all.sh once.
sudo chown $USER /(DIR NAME).
And try this command:
hadoop namenode -format
I set up and configured a multi-node Hadoop cluster using this tutorial.
When I type in the start-all.sh command, it shows all the processes initializing properly as follows:
starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-namenode-jawwadtest1.out
jawwadtest1: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-datanode-jawwadtest1.out
jawwadtest2: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-datanode-jawwadtest2.out
jawwadtest1: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-secondarynamenode-jawwadtest1.out
starting jobtracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-jobtracker-jawwadtest1.out
jawwadtest1: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-tasktracker-jawwadtest1.out
jawwadtest2: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-tasktracker-jawwadtest2.out
However, when I type the jps command, I get the following output:
31057 NameNode
4001 RunJar
6182 RunJar
31328 SecondaryNameNode
31411 JobTracker
32119 Jps
31560 TaskTracker
As you can see, there's no datanode process running. I tried configuring a single-node cluster but got the same problem. Would anyone have any idea what could be going wrong here? Are there any configuration files that are not mentioned in the tutorial or I may have looked over? I am new to Hadoop and am kinda lost and any help would be greatly appreciated.
EDIT:
hadoop-root-datanode-jawwadtest1.log:
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/$
************************************************************/
2012-08-09 23:07:30,717 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loa$
2012-08-09 23:07:30,734 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapt$
2012-08-09 23:07:30,735 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:30,736 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:31,018 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapt$
2012-08-09 23:07:31,024 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:32,366 INFO org.apache.hadoop.ipc.Client: Retrying connect to $
2012-08-09 23:07:37,949 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: $
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(Data$
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransition$
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNo$
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java$
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNod$
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode($
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataN$
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.$
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1$
2012-08-09 23:07:37,951 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: S$
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at jawwadtest1/198.101.220.90
************************************************************/
You need to do something like this:
bin/stop-all.sh (or stop-dfs.sh and stop-yarn.sh in the 2.x serie)
rm -Rf /app/tmp/hadoop-your-username/*
bin/hadoop namenode -format (or hdfs in the 2.x serie)
the solution was taken from:
http://pages.cs.brandeis.edu/~cs147a/lab/hadoop-troubleshooting/. Basically it consists in restarting from scratch, so make sure you won't loose data by formating the hdfs.
I ran into the same issue. I have created a hdfs folder '/home/username/hdfs' with sub-directories name, data, and tmp which were referenced in config xml files of hadoop/conf.
When I started hadoop and did jps, I couldn't find datanode so I tried to manually start datanode using bin/hadoop datanode. Then I realized from error message that it has permissions issue accessing the dfs.data.dir=/home/username/hdfs/data/ which was referenced in one of the hadoop config files. All I had to do was stop hadoop, delete the contents of /home/username/hdfs/tmp/* directory and then try this command - chmod -R 755 /home/username/hdfs/ and then start hadoop. I could find the datanode!
I faced similar issue while running the datanode. The following steps were useful.
In [hadoop_directory]/sbin directory use ./stop-all.sh to stop all the running services.
Remove the tmp dir using rm -r [hadoop_directory]/tmp (The path configured in [hadoop_directory]/etc/hadoop/core-site.xml)
sudo mkdir [hadoop_directory]/tmp (Make a new tmp directory)
Go to */hadoop_store/hdfs directory where you have created namenode and datanode as sub-directories. (The paths configured in [hadoop_directory]/etc/hadoop/hdfs-site.xml). Use
rm -r namenode
rm -r datanode
In */hadoop_store/hdfs directory use
sudo mkdir namenode
sudo mkdir datanode
In case of permission issue, use
chmod -R 755 namenode
chmod -R 755 datanode
In [hadoop_directory]/bin use
hadoop namenode -format (To format your namenode)
In [hadoop_directory]/sbin directory use ./start-all.sh or ./start-dfs.sh to start the services.
Use jps to check the services running.
Delete the datanode under your hadoop folder then rerun start-all.sh
I was having the same problem running a single-node pseudo-distributed instance. Couldn't figure out how to solve it, but a quick workaround is to manually start a DataNode with
hadoop-x.x.x/bin/hadoop datanode
Follow these steps and your datanode will start again.
Stop dfs.
Open hdfs-site.xml
Remove the data.dir and name.dir properties from hdfs-site.xml and -format namenode again.
Then remove the hadoopdata directory and add the data.dir and name.dir in hdfs-site.xml and again format namenode.
Then start dfs again.
Need to follow 3 steps.
(1) Need to go to the logs and check the most recent log (In hadoop-
2.6.0/logs/hadoop-user-datanode-ubuntu.log)
If the error is as
java.io.IOException: Incompatible clusterIDs in /home/kutty/work/hadoop2data/dfs/data: namenode clusterID = CID-c41df580-e197-4db6-a02a-a62b71463089; datanode clusterID = CID-a5f4ba24-3a56-4125-9137-fa77c5bb07b1
i.e. namenode cluster id and datanode cluster id's are not identical.
(2) Now copy the namenode clusterID which is CID-c41df580-e197-4db6-a02a-a62b71463089 in above error
(3) Replace the Datanode cluster ID with Namenode cluster ID in hadoopdata/dfs/data/current/version
clusterID=CID-c41df580-e197-4db6-a02a-a62b71463089
Restart Hadoop. Will run DataNode
Stop all the services - ./stop-all.sh
Format all the hdfs tmp directory from all the master and slave. Don't forget to format from slave.
Format the namenode.(hadoop namenode -format)
Now start the services on namenode.
./bin/start-all.sh
This made a difference for me to start the datanode service.
Stop the dfs and yarn first.
Remove the datanode and namenode directories as specified in the core-site.xml file.
Re-create the directories.
Then re-start the dfs and the yarn as follows.
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
Hope this works fine.
Delete the files under $hadoop_User/dfsdata and $hadoop_User/tmpdata
then run:
hdfs namenode -format
finally run:
start-all.sh
Then your problem gets solved.
Please control if the the tmp directory property is pointing to a valid directory in core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/data/tmp</value>
</property>
If the directory is misconfigured, the datanode process will not start properly.
Run Below Commands in Line:-
stop-all.sh (Run Stop All to Stop all the hadoop process)
rm -r /usr/local/hadoop/tmp/ (Your Hadoop tmp directory which you configured in hadoop/conf/core-site.xml)
sudo mkdir /usr/local/hadoop/tmp (Make the same directory again)
hadoop namenode -format (Format your namenode)
start-all.sh (Run Start All to start all the hadoop process)
JPS (It will show the running processes)
Step 1:- Stop-all.sh
Step 2:- got to this path
cd /usr/local/hadoop/bin
Step 3:- Run that command
hadoop datanode
Now DataNode work
Check whether the hadoop.tmp.dir property in the core-site.xml is correctly set.
If you set it, navigate to this directory, and remove or empty this directory.
If you didn't set it, you navigate to its default folder /tmp/hadoop-${user.name}, likewise remove or empty this directory.
In case of Mac os(Pseudo-distributed mode):
Open terminal
Stop dfs. 'sbin/stop-all.sh'.
cd /tmp
rm -rf hadoop*
Navigate to hadoop directory. Format the hdfs. bin/hdfs namenode -format
sbin/start-dfs.sh
Error in datanode.log file
$ more /usr/local/hadoop/logs/hadoop-hduser-datanode-ubuntu.log
Shows:
java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop_tmp/hdfs/datanode: namenode clusterID = CID-e4c3fed0-c2ce-4d8b-8bf3-c6388689eb82; datanode clusterID = CID-2fcfefc7-c931-4cda-8f89-1a67346a9b7c
Solution: Stop your cluster and issue the below command & then start your cluster again.
sudo rm -rf /usr/local/hadoop_tmp/hdfs/datanode/*
I have got details of the issue in the log file like below :
"Invalid directory in dfs.data.dir: Incorrect permission for /home/hdfs/dnman1, expected: rwxr-xr-x, while actual: rwxrwxr-x"
and from there I identified that the datanote file permission was 777 for my folder. I corrected to 755 and it started working.
Instead of deleting everything under the "hadoop tmp dir", you can set another one. For example, if your core-site.xml has this property:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/data/tmp</value>
</property>
You can change this to:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/data/tmp2</value>
</property>
and then scp core-site.xml to each node, and then "hadoop namenode -format", and then restart hadoop.
This is for newer version of Hadoop (I am running 2.4.0)
In this case stop the cluster sbin/stop-all.sh
Then go to /etc/hadoop for config files.
In the file: hdfs-site.xml
Look out for directory paths corresponding to
dfs.namenode.name.dir
dfs.namenode.data.dir
Delete both the directories recursively (rm -r).
Now format the namenode via bin/hadoop namenode -format
And finally sbin/start-all.sh
Hope this helps.
You need to check :
/app/hadoop/tmp/dfs/data/current/VERSION and /app/hadoop/tmp/dfs/name/current/VERSION ---
in those two files and that to Namespace ID of name node and datanode.
If and only if data node's NamespaceID is same as name node's NamespaceID then your datanode will run.
If those are different copy the namenode NamespaceID to your Datanode's NamespaceID using vi editor or gedit and save and re run the deamons it will work perfectly.
if formatting the tmp directory is not working then try this:
first stop all the entities like namenode, datanode etc. (you will
be having some script or command to do that)
Format tmp directory
Go to /var/cache/hadoop-hdfs/hdfs/dfs/ and delete all the contents
in the directory manually
Now format your namenode again
start all the entities then use jps command to confirm that the
datanode has been started
Now run whichever application you have
Hope this helps.
I configured hadoop.tmp.dir in conf/core-site.xml
I configured dfs.data.dir in conf/hdfs-site.xml
I configured dfs.name.dir in conf/hdfs-site.xml
Deleted everything under "/tmp/hadoop-/" directory
Changed file permissions from 777 to 755 for directory listed under dfs.data.dir
And the data node started working.
Even after removing the remaking the directories, the datanode wasn't starting.
So, I started it manually using bin/hadoop datanode
It did not reach any conclusion. I opened another terminal from the same username and did jps and it showed me the running datanode process.
It's working, but I just have to keep the unfinished terminal open by the side.
Follow these steps and your datanode will start again.
1)Stop dfs.
2)Open hdfs-site.xml
3)Remove the data.dir and name.dir properties from hdfs-site.xml and -format namenode again.
4)Then start dfs again.
Got the same error. Tried to start and stop dfs several times, cleared all directories that are mentioned in previous answers, but nothing helped.
The issue was resolved only after rebooting OS and configuring Hadoop from the scratch. (configuring Hadoop from the scratch without rebooting didn't work)
Once I was not able to find data node using jps in hadoop, then I deleted the
current folder in the hadoop installed directory (/opt/hadoop-2.7.0/hadoop_data/dfs/data) and restarted hadoop using start-all.sh and jps.
This time I could find the data node and current folder was created again.
Try this
stop-all.sh
vi hdfs-site.xml
change the value given for property dfs.data.dir
format namenode
start-all.sh
I Have applied some mixed configuration, and its worked for me.
First >>
Stop Hadoop all Services using
${HADOOP_HOME}/sbin/stop-all.sh
Second >>
Check mapred-site.xml which is located at your ${HADOOP_HOME}/etc/hadoop/mapred-site.xml and change the localhost to master.
Third >>
Remove the temporary folder created by hadoop
rm -rf //path//to//your//hadoop//temp//folder
Fourth >>
Add the recursive permission on temp.
sudo chmod -R 777 //path//to//your//hadoop//temp//folder
Fifth >>
Now Start all the services again. And First check that all service including datanode is running.
enter image description here
mv /usr/local/hadoop_store/hdfs/datanode /usr/local/hadoop_store/hdfs/datanode.backup
mkdir /usr/local/hadoop_store/hdfs/datanode
hadoop datanode OR start-all.sh
jps