I'm new to hadoop and is in learning phase.
As per Hadoop Definitve guide, i have set up my hadoop in pseudo distributed mode and everything was working fine. I was even able to execute all the examples from chapter 3 yesterday. Today, when i rebooted my unix and tried to run start-dfs.sh and then tried localhost:50070... it is showing error and when i try to stop dfs (stop-dfs.sh) it says no namenode to stop. I have been googling the issue but no result. Also, when i format my namenode again...everything starts working fine and i'm able to connect to the localhost:50070 and even replicate files and directories in hdfs but as soon as i restart my linux and try to connect to hdfs the same problem comes up.
Below is the error log:
************************************************************/
2011-06-22 15:45:55,249 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu/127.0.1.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.203.0
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011
************************************************************/
2011-06-22 15:45:56,383 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2011-06-22 15:45:56,455 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2011-06-22 15:45:56,494 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2011-06-22 15:45:56,494 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
2011-06-22 15:45:57,007 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2011-06-22 15:45:57,031 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2011-06-22 15:45:57,059 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2011-06-22 15:45:57,070 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source NameNode registered.
2011-06-22 15:45:57,374 INFO org.apache.hadoop.hdfs.util.GSet: VM type = 32-bit
2011-06-22 15:45:57,374 INFO org.apache.hadoop.hdfs.util.GSet: 2% max memory = 19.33375 MB
2011-06-22 15:45:57,374 INFO org.apache.hadoop.hdfs.util.GSet: capacity = 2^22 = 4194304 entries
2011-06-22 15:45:57,374 INFO org.apache.hadoop.hdfs.util.GSet: recommended=4194304, actual=4194304
2011-06-22 15:45:57,854 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=anshu
2011-06-22 15:45:57,854 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-06-22 15:45:57,854 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-06-22 15:45:57,868 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.block.invalidate.limit=100
2011-06-22 15:45:57,869 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
2011-06-22 15:45:58,769 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean
2011-06-22 15:45:58,809 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
**2011-06-22 15:45:58,825 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /tmp/hadoop-anshu/dfs/name does not exist.
2011-06-22 15:45:58,827 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-anshu/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.h**adoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:291)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:97)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:379)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:353)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:254)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:434)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1153)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1162)
2011-06-22 15:45:58,828 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-anshu/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:291)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:97)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:379)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:353)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:254)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:434)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1153)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1162)
2011-06-22 15:45:58,829 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
Any help is appreciated
Thank-you
here is the kicker:
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException:
Directory /tmp/hadoop-anshu/dfs/name
is in an inconsistent state: storage
directory does not exist or is not
accessible.
i'd been having similar issues. i used stop-all.sh to shut down hadoop. i guess it was foolish of me to think this would properly save the data in my HDFS.
but as far as i can tell from what appears to be the appropriate code chunk in the hadoop-daemon.sh script, this is not the case - it just kills the processes:
(stop)
if [ -f $pid ]; then
if kill -0 `cat $pid` > /dev/null 2>&1; then
echo stopping $command
kill `cat $pid`
else
echo no $command to stop
fi
else
echo no $command to stop
fi
did you look to see if the directory it's complaining about existed? i checked and mine did not, although there was an (empty!) data folder in there here I imagine data might have once lived.
so my guess was that what we need to do is configure Hadoop such that our namenode and datanode are NOT stored in a tmp directory. there is some possibility that the OS is doing maintenance and getting rid of these files. either that hadoop figures you don't care about them anymore because you wouldn't have left them in a tmp directory if you did, and you wouldn't be restarting your machine in the middle of a map-reduce job. I don't really think this should happen (i mean, that's not how i would design things) but it seemed like a good guess.
so, based on this site http://wiki.datameer.com/display/DAS11/Hadoop+configuration+file+templates
i edited my conf/hdfs-site.xml file to point to the following paths (obviously, make your own directories as you see fit):
<property>
<name>dfs.name.dir</name>
<value>/hadoopstorage/name/</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/hadoopstorage/data/</value>
</property>
Did this, formatted the new namenode (sadly, data loss seems inevitable in this situation), stopped and started hadoop with the shell scripts, restarted the machine, and my files were still there...
YMMV...hope this works for you! i'm on OS X but i don't think you should have dissimilar results.
J
If you dont care about losing data just execute the command:
./hadoop namenode -format
I had similar issue and this helped
chown -R hdfs:hadoop /path/to/namenode/date/dir
Setting this properties in conf/hdfs-site.xml file worked for me!!!
Thanks jsh
<property>
<name>dfs.name.dir</name>
<value>/hadoopstorage/name/</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/hadoopstorage/data/</value>`enter code here`
</property>
Dont forget to set proper permissions to those directories
JSH answer is correct.
Just a couple of changes for hadoop 2.6 i had to do:
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoopstorage/name/</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoopstorage/data/</value>
</property>
If you have not resolved the problem, try this:
give the dfs.name.dir directory to in the user group hadoop and give the group to write permission.
See your coresite.xml in hadoop config directory
Go to config directory
vi core-site.xml,hdf.site.xml
Make sure your port numbers and paths are correct
I have the similar problem, but slightly different.
Running start-all.sh seams quite well, but jps shows that there is no namenodes and I could not see the list when I run hdfs dfs -ls /.
My first attempt is to run hadoop namenode -format, then namenode appears but datanode disappears.
After googling the solution, I run rm -rf /usr/local/hadoop_store/hdfs/datanode/* and restart hadoop, jps shows:
12912 ResourceManager
13391 FsShell
13420 Jps
13038 NodeManager
12733 SecondaryNameNode
12432 NameNode
12556 DataNode
Now I can use hadoop commands as usual.
HTH!
Related
I have a cluster running Hadoop 1.2.1 with Giraph on top. The server runs ok, but when I stop it, I am unable to make it run again. In the datanode log I get the following error: ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Cannot lock storage /pathToFolder/data/datanode. The directory is already locked.
I have tried many solutions that I found online:
Checking permissions of folders.
Checking equal versions of VERSION file for namenode and datanode.
Checking configuration files (core-site, hdfs-site, mapred-site, master, slaves, ...)
Deleting / Changing the namenode and datanode data folders
Removing hadoop temporary files
Bottomline is, everything seems fine, but it is still failing to start the datanode. A complete log file for the datanode is the following:
2020-06-24 11:23:46,624 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/********************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = XXXX
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.8.0_212
********************/
2020-06-24 11:23:46,719 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2020-06-24 11:23:46,725 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2020-06-24 11:23:46,726 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2020-06-24 11:23:46,726 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2020-06-24 11:23:46,791 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2020-06-24 11:23:46,794 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2020-06-24 11:23:46,903 INFO org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage /users/lahdak/rojas/AppHadoop/data/datanode. The directory is already locked.
2020-06-24 11:23:47,004 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Cannot lock storage /users/lahdak/rojas/AppHadoop/data/datanode. The directory is already locked.
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:599)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:452)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:111)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:414)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:321)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)
2020-06-24 11:23:47,004 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/********************
SHUTDOWN_MSG: Shutting down DataNode at XXXX
********************/
Still haven't managed to get rid of the problem (Datanode not shutting down correctly), but I found a workaround to the situation. I used lsof +D /pathto detect active processes and killed them. The weird part is that this process was invisible to top and jps commands.
I was using Hadoop 1.2.1 in a pseudo-distributed mode in Ubuntu and everything was working fine. But then I had to restart my system . And now when I am hit jps command after giving start-all.sh i was able to see only tasktracker and jobtracker running. Could anyone tell me the possible reason of this problem? and guide me getting this resolved?
************************************************************/
2017-03-13 18:41:16,733 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = keerthana-VirtualBox/127.0.1.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.8.0_121
************************************************************/
2017-03-13 18:41:19,383 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-03-13 18:41:19,628 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2017-03-13 18:41:19,653 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-03-13 18:41:19,653 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2017-03-13 18:41:21,947 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2017-03-13 18:41:22,117 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2017-03-13 18:41:23,564 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /home/keerthana/hadoop/dfs/data, expected: rwxr-xr-x, while actual: rwxrwxr-x
2017-03-13 18:41:23,564 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: All directories in dfs.data.dir are invalid.
2017-03-13 18:41:23,564 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2017-03-13 18:41:23,630 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at keerthana-VirtualBox/127.0.1.1
************************************************************/
As suggested from the logs, there are some problems with the dfs.data.dir "/home/keerthana/hadoop/dfs/data" location.
Provide check if the folder exists or not and whether they have the proper permission or not. If folder exists , then provide 755 access for that folder.
chmod -R 755 /home/keerthana/hadoop/dfs/data
I have started learning hadoop recently and I am getting the below error while creating new folder -
vm4learning#vm4learning:~/Installations/hadoop-1.2.1/bin$ ./hadoop fs
-mkdir helloworld Warning: $HADOOP_HOME is deprecated. 15/06/14 19:46:35 INFO ipc.Client: Retrying connect to server:
localhost/127.0.0.1:9000. Already tried 0 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
SECONDS)
Request you to help.below are the namdenode logs -
015-06-14 22:01:08,158 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /home/vm4learning/Installations/hadoop-1.2.1/data/dfs/name does not exist
2015-06-14 22:01:08,161 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/vm4learning/Installations/hadoop-1.2.1/data/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:304)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:104)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:427)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:395)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:299)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:569)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1479)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1488)
2015-06-14 22:01:08,182 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/vm4learning/Installations/hadoop-1.2.1/data/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:304)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:104)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:427)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:395)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:299)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:569)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1479)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1488)
2015-06-14 22:01:08,185 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at vm4learning/192.168.1.102
************************************************************/
Before starting to create a directory, you should be sure that your hadoop installation is correct, through jps command, and looking for any process missing.
In your case, the namenode isn't up.
If you see in the logs, it appears to be that some folders aren't created. Do this:
mkdir -p $HADOOP_HOME/dfs/name
mkdir -p $HADOOP_HOME/dfs/name/data
And specify in hdfs-site.xml the following.
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop/dfs/name/data</value>
<final>true</final>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/dfs/name</value>
<final>true</final>
</property>
Reinitialize hadoop, and remember to format previous to do anything.
Another reason why the same error may appear is that you still not executed the formatting of the namenode like that:
hdfs namenode -format
I have missed the step 7 from this tutorial (https://kontext.tech/column/hadoop/377/latest-hadoop-321-installation-on-windows-10-step-by-step-guide ) and faced the same error
hdfs is the utility from HADOOP_HOME/bin
I have tried all the different solutions provided at stackoverflow on this topic, but of no help
Asking again with the specific log and the details
Any help is appreciated
I have one master node and 5 slave nodes in my Hadoop cluster. ubuntu user and ubuntu group is the owner of the ~/Hadoop folder
Both the ~/hadoop/hdfs/data & ~/hadoop/hdfs/name folder exist
and permission for both the folders are set to 755
successfully formated the namenode before starting the script start-all.sh
THE SCRIPT FAILS TO LAUNCH THE "NAMENODE"
These are running at the master node
ubuntu#master:~/hadoop/bin$ jps
7067 TaskTracker
6914 JobTracker
7237 Jps
6834 SecondaryNameNode
6682 DataNode
ubuntu#slave5:~/hadoop/bin$ jps
31438 TaskTracker
31581 Jps
31307 DataNode
Below is the log from name-node log files.
..........
..........
.........
014-12-03 12:25:45,460 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2014-12-03 12:25:45,461 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source NameNode registered.
2014-12-03 12:25:45,532 INFO org.apache.hadoop.hdfs.util.GSet: Computing capacity for map BlocksMap
2014-12-03 12:25:45,532 INFO org.apache.hadoop.hdfs.util.GSet: VM type = 64-bit
2014-12-03 12:25:45,532 INFO org.apache.hadoop.hdfs.util.GSet: 2.0% max memory = 1013645312
2014-12-03 12:25:45,532 INFO org.apache.hadoop.hdfs.util.GSet: capacity = 2^21 = 2097152 entries
2014-12-03 12:25:45,532 INFO org.apache.hadoop.hdfs.util.GSet: recommended=2097152, actual=2097152
2014-12-03 12:25:45,588 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=ubuntu
2014-12-03 12:25:45,588 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2014-12-03 12:25:45,588 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2014-12-03 12:25:45,622 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.block.invalidate.limit=100
2014-12-03 12:25:45,623 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
2014-12-03 12:25:45,716 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean
2014-12-03 12:25:45,777 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
2014-12-03 12:25:45,777 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2014-12-03 12:25:45,785 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /home/ubuntu/hadoop/file:/home/ubuntu/hadoop/hdfs/name does not exist
2014-12-03 12:25:45,787 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/ubuntu/hadoop/file:/home/ubuntu/hadoop/hdfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:304)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:104)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:427)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:395)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:299)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:569)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1479)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1488)
2014-12-03 12:25:45,801 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/ubuntu/hadoop/file:/home/ubuntu/hadoop/hdfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:304)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:104)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:427)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:395)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:299)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:569)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1479)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1488)
Removed the "file:" from the hdfs-site.xml file
[WRONG HDFS-SITE.XML]
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>
[CORRECT HDFS-SITE.XML]
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hduser/mydata/hdfs/datanode</value>
</property>
Thanks to Erik for the help.
Follow the below steps,
1.Stop all services
2.Format your namenode
3.Delete your data node directory
4.start all services
run these commands on terminal
$ cd ~
$ mkdir -p mydata/hdfs/namenode
$ mkdir -p mydata/hdfs/datanode
give permission to both directory 755
then,
Add this property in conf/hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>
if not work ,then
stop-all.sh
start-all.sh
1) name node directory you should be owner and give chmod 750 appropriately
2)stop all services
3)use hadoop namenode -format to format namenode
4)add this to hdfs-site.xml
<property>
<name>dfs.data.dir</name>
<value>path/to/hadooptmpfolder/dfs/name/data</value>
<final>true</final>
</property>
<property>
<name>dfs.name.dir</name>
<value>path/to/hadooptmpfolder/dfs/name</value>
<final>true</final>
</property>
5) to run hadoop namenode -format
add export PATH=$PATH:/usr/local/hadoop/bin/ in ~/.bashrc
wherever hadoop is unzip add that in path
Had similar problem, I formatted the namenode then started it
Hadoop namenode -format
hadoop-daemon.sh start namenode
You can follow given below steps to remove this error:
Stop all hadoop daemons
Delete all files from given below directory:
/tmp/hadoop-{user}/dfs/name/current and /tmp/hadoop-{user}/dfs/data/current
where user is the user with which you logged in into the box.
Format namenode
Start all services
You will now see a new file VERSION created in directory /tmp/hadoop-/dfs/name/current
One thing to notice here is that value of Cluster ID in file /tmp/hadoop-eip/dfs/name/current/VERSION must be same as in /tmp/hadoop-eip/dfs/data/current/VERSION
-Hitesh
I am trying to learn Hadoop by following a tutorial and trying to do pseudo-distributed mode on my machine.
My core-site.xml is:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.
</description>
</property>
</configuration>
My hdfs-site.xml file is:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>The actual number of replications can be specified when the
file is created.
</description>
</property>
</configuration>
My mapred-site.xml file is:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
<description>The host and port that the MapReduce job tracker runs
at.
</description>
</property>
</configuration>
When I run the command it ran successfully but what it is doing actually:
hadoop-1.2.1$ bin/hadoop namenode -format
14/11/26 12:37:16 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = myhost/127.0.0.8
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.6.0_45
************************************************************/
14/11/26 12:37:17 INFO util.GSet: Computing capacity for map BlocksMap
14/11/26 12:37:17 INFO util.GSet: VM type = 64-bit
14/11/26 12:37:17 INFO util.GSet: 2.0% max memory = 932118528
14/11/26 12:37:17 INFO util.GSet: capacity = 2^21 = 2097152 entries
14/11/26 12:37:17 INFO util.GSet: recommended=2097152, actual=2097152
14/11/26 12:37:17 INFO namenode.FSNamesystem: fsOwner=myuser
14/11/26 12:37:17 INFO namenode.FSNamesystem: supergroup=supergroup
14/11/26 12:37:17 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/11/26 12:37:17 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/11/26 12:37:17 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/11/26 12:37:17 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
14/11/26 12:37:17 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/11/26 12:37:17 INFO common.Storage: Image file /tmp/hadoop-myuser/dfs/name/current/fsimage of size 115 bytes saved in 0 seconds.
14/11/26 12:37:18 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hadoop-myuser/dfs/name/current/edits
14/11/26 12:37:18 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hadoop-myuser/dfs/name/current/edits
14/11/26 12:37:18 INFO common.Storage: Storage directory /tmp/hadoop-myuser/dfs/name has been successfully formatted.
14/11/26 12:37:18 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at chaitanya-OptiPlex-3010/127.0.0.8
************************************************************/
Can someone please let me know what it is doing internally.
I have gone through these posts but there is no correct explanation.
What exactly is hadoop namenode formatting?
hadoop namenode is not formatting
How can I check this practically on my machine so I can see the differences before and after running the command. I am new to Hadoop so this can be a trivial question.
hadoop namenode -format this command deletes all files in your hdfs.
tmp directory contains two folders datanode, namenode in local filesystem. if you format the namenode these two folders becomes empty.
Note : if you want to format your namenode first stop all hadoop services then delete the tmp(contains namenode and datanode) folder in your local file system and start hadoop service surely it will take effect.
Reason for Hadoop namenode -format :
Hadoop NameNode is the centralized place of an HDFS file system which keeps the directory tree of all files in the file system, and tracks where across the cluster the file data is kept. In short, it keeps the metadata related to datanodes. When we format namenode it formats the meta-data related to data-nodes. By doing that, all the information on the datanodes are lost and they can be reused for new data.
By default the namenode location will be at "/tmp/hadoop-myuser/dfs/name"
While you formatting the namenode, this file location was cleared.
To change the namenode location add the follwing properties At hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/search/data/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/search/data/dfs/datanode</value>
</property>
I hope this will help you.. :-)
Hadoop namenode -format
Hadoop namenode directory contains the fsimage and edit files which
holds the basic information's about hadoop file system such as where is
data available, which user created files like that
If you format the namenode then the above information's are deleted
from namenode directory which is specified in the hdfs-site.xml as dfs.namenode.name.dir
But you still have the datas on the hadoop but not namenode meta data
Actually formatting a Namenode will not format the Datanode.
It will just format the contents of your namenode (which contains details of datanode). Your namenode will no longer know where your data is. Also namenode -format will assign a new namespace ID to the namenode
You have to change your namespaceID in your datanode to make your datanode work. This will be at dfs/data/current/VERSION
There is a JIRA open now for the same suggesting to format Datanode as well when you format Namenode. HDFS-107
Namenode contains metadata about the Hadoop filesystem.
This command (hadoop-1.2.1$ bin/hadoop namenode -format) will format whole Hadoop distributed file system(HDFS). So if you run this command on existing filesystem you will lose all your data.
Steps
start all the services using "start-all.sh"
check the services are running or not using "JPS"
note: if you use hadoop2.3.0 then following services are need to run
Namenode
Datanode
Resourcemanager
Nodemanager
Move some file from local to HDFS using hdfs -put /
Now check at location "/tmp/hadoop-myuser/dfs/name" you may find this file split into some BLOCKS conatain 64 MB each.
Then start Formatting using **hadoop namenode -format**
Now the file is not available phisically on that location
Further information click here