Failure to start Hadoop after having stopped a running (and working) instance before, because Datanode says that the directory is locked - hadoop

I have a cluster running Hadoop 1.2.1 with Giraph on top. The server runs ok, but when I stop it, I am unable to make it run again. In the datanode log I get the following error: ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Cannot lock storage /pathToFolder/data/datanode. The directory is already locked.
I have tried many solutions that I found online:
Checking permissions of folders.
Checking equal versions of VERSION file for namenode and datanode.
Checking configuration files (core-site, hdfs-site, mapred-site, master, slaves, ...)
Deleting / Changing the namenode and datanode data folders
Removing hadoop temporary files
Bottom line: everything seems fine, but the datanode still fails to start. The complete datanode log is the following:
2020-06-24 11:23:46,624 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/********************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = XXXX
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.8.0_212
********************/
2020-06-24 11:23:46,719 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2020-06-24 11:23:46,725 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2020-06-24 11:23:46,726 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2020-06-24 11:23:46,726 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2020-06-24 11:23:46,791 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2020-06-24 11:23:46,794 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2020-06-24 11:23:46,903 INFO org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage /users/lahdak/rojas/AppHadoop/data/datanode. The directory is already locked.
2020-06-24 11:23:47,004 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Cannot lock storage /users/lahdak/rojas/AppHadoop/data/datanode. The directory is already locked.
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:599)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:452)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:111)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:414)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:321)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)
2020-06-24 11:23:47,004 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/********************
SHUTDOWN_MSG: Shutting down DataNode at XXXX
********************/

I still haven't managed to get rid of the underlying problem (the datanode not shutting down correctly), but I found a workaround. I used lsof +D /path to detect active processes and killed them. The weird part is that this process was invisible to the top and jps commands.
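For anyone who lands here without lsof installed, here is a rough sketch of the same lookup done by reading /proc directly (Linux only). The function name lock_holders is made up for illustration. It assumes the stale process still has in_use.lock open, which is the lock file Hadoop 1.x keeps inside each storage directory. Where lsof is available, lsof -t on that file should give the same PIDs.

```shell
#!/bin/sh
# Sketch: print the PIDs of processes that hold a given file open, by reading
# the /proc/<pid>/fd symlinks (Linux only). Run it as root, or as the user that
# started Hadoop, otherwise other users' processes are not visible.
lock_holders() {
    target=$(readlink -f "$1") || return 1
    for fd in /proc/[0-9]*/fd/*; do
        # Each fd entry is a symlink to the file the process has open.
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            pid=${fd#/proc/}
            echo "${pid%%/*}"
        fi
    done | sort -un
}

# Usage (placeholder path taken from the question):
#   lock_holders /pathToFolder/data/datanode/in_use.lock
```

Each PID it prints can then be inspected with ps -p and, if it really is a leftover DataNode, stopped with kill.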

Related

NameNode, DataNode and SecondaryNameNode not started when I run the jps command

I was using Hadoop 1.2.1 in pseudo-distributed mode on Ubuntu and everything was working fine. Then I had to restart my system. Now, when I run jps after start-all.sh, I see only the TaskTracker and JobTracker running. Could anyone tell me the possible reason for this problem and guide me in resolving it?
************************************************************/
2017-03-13 18:41:16,733 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = keerthana-VirtualBox/127.0.1.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.8.0_121
************************************************************/
2017-03-13 18:41:19,383 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-03-13 18:41:19,628 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2017-03-13 18:41:19,653 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-03-13 18:41:19,653 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2017-03-13 18:41:21,947 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2017-03-13 18:41:22,117 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2017-03-13 18:41:23,564 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /home/keerthana/hadoop/dfs/data, expected: rwxr-xr-x, while actual: rwxrwxr-x
2017-03-13 18:41:23,564 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: All directories in dfs.data.dir are invalid.
2017-03-13 18:41:23,564 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2017-03-13 18:41:23,630 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at keerthana-VirtualBox/127.0.1.1
************************************************************/
As the log shows, the problem is with the dfs.data.dir location "/home/keerthana/hadoop/dfs/data".
Check that the folder exists and that it has the expected permissions. If the folder exists, set its mode to 755:
chmod -R 755 /home/keerthana/hadoop/dfs/data
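If you want to check before changing anything, a small sketch like the one below compares the directory's current mode with the one the log asks for. The function name check_data_dir is just for illustration, and it assumes GNU stat (on BSD/macOS the equivalent would be stat -f %Lp).

```shell
#!/bin/sh
# Sketch: compare a directory's permission bits with the mode the DataNode
# expects (755 in the log above) and report any difference.
# Assumes GNU stat (Linux).
check_data_dir() {
    dir=$1
    expected=${2:-755}
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    actual=$(stat -c %a "$dir")
    if [ "$actual" = "$expected" ]; then
        echo "ok: $dir is $actual"
    else
        echo "mismatch: $dir is $actual, expected $expected"
        return 1
    fi
}

# Usage, with the path from the log:
#   check_data_dir /home/keerthana/hadoop/dfs/data
```

The warning in the log names only the directory itself, so a plain chmod 755 on that directory may be enough; the -R form also changes every file underneath it.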

Unable to start TaskTracker: "Can not start task tracker because java.lang.IllegalArgumentException: Does not contain a valid host:port authority"

I edited mapred-site.xml, core-site.xml, hadoop-env.sh, hdfs-site.xml, masters and slaves.
I have 1 DataNode and 2 NameNodes. Both NameNodes started successfully and I am able to see them in the browser.
I ran start-mapred.sh, which started the JobTracker and a TaskTracker on the NameNode, but the TaskTracker on the DataNode does not start.
I then started the TaskTracker by hand; the output is below.
->hadoop tasktracker
Warning: $HADOOP_HOME is deprecated.
13/10/17 03:21:55 INFO mapred.TaskTracker: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting TaskTracker
STARTUP_MSG: host = tintin/10.193.184.157
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.1.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782; compiled by 'hortonfo' on Thu Jan 31 02:03:24 UTC 2013
************************************************************/
13/10/17 03:21:55 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
13/10/17 03:21:55 INFO impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
13/10/17 03:21:55 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
13/10/17 03:21:55 INFO impl.MetricsSystemImpl: TaskTracker metrics system started
13/10/17 03:21:55 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/10/17 03:21:55 INFO impl.MetricsSourceAdapter: MBean for source ugi registered.
13/10/17 03:21:55 WARN impl.MetricsSystemImpl: Source name ugi already exists!
13/10/17 03:21:55 ERROR mapred.TaskTracker: Can not start task tracker because java.lang.IllegalArgumentException: Does not contain a valid host:port authority:
10.193.184.132:54311
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:149)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:130)
at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:2312)
at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1532)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3906)
13/10/17 03:21:55 INFO mapred.TaskTracker: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at tintin/10.193.184.157
************************************************************/
This is quite a late answer, but does your mapred-site.xml contain the mapred.job.tracker property?
My job tracker runs on 8021, so the configuration is as follows:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>http://localhost:8021</value>
</property>
</configuration>
I faced such an issue due to missing properties.
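One more thing that may be worth checking: in the log above the address is printed on its own line, which could mean the configured value starts with a newline or other whitespace. That is only a guess from the log; the post does not confirm it. The sketch below is a stand-alone sanity check for a host:port string. It is not Hadoop's own parsing code, and the function name is_host_port is made up.

```shell
#!/bin/sh
# Sketch: succeed only if the argument contains no whitespace and ends in
# ":<digits>" with a non-empty host part before the colon.
is_host_port() {
    # Reject any space, tab or newline anywhere in the value.
    stripped=$(printf '%s' "$1" | tr -d '[:space:]')
    [ "$stripped" = "$1" ] || return 1
    host=${1%:*}
    port=${1##*:}
    # A colon must be present and the host part must not be empty.
    [ -n "$host" ] && [ "$host" != "$1" ] || return 1
    case $port in
        ''|*[!0-9]*) return 1 ;;   # port must be digits only
    esac
    return 0
}

# Usage:
#   is_host_port "10.193.184.132:54311" && echo "looks fine"
```

If the check fails for the value copied out of mapred-site.xml, put the value on one line directly between the <value> tags and try again.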

Incompatible build versions error in Hadoop slave node

My Hadoop cluster was running without any problems. I don't know what has changed, but when I start the Hadoop components with start-all.sh from the master and then check the running processes with jps, I see that the DataNode is not running on the slave node.
The datanode log is below. Versions of Hadoop installation (1.0.4) are the same on the machines in the cluster. I could not find how to solve the problem.
2013-09-18 09:35:21,638 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = noon101/10.240.20.30
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.4-SNAPSHOT
STARTUP_MSG: build = -r ; compiled by 'hduser' on Wed May 29 10:55:16 EEST 2013
************************************************************/
2013-09-18 09:35:21,752 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-09-18 09:35:21,761 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2013-09-18 09:35:21,762 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-09-18 09:35:21,762 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2013-09-18 09:35:21,867 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2013-09-18 09:35:21,869 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2013-09-18 09:35:26,964 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Incompatible build versions: namenode BV = 1393290; datanode BV =
2013-09-18 09:35:27,070 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible build versions: namenode BV = 1393290; datanode BV =
at org.apache.hadoop.hdfs.server.datanode.DataNode.handshake(DataNode.java:566)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:362)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
2013-09-18 09:35:27,071 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at noon101/10.240.20.30
************************************************************/
Below are parts of the datanode logs from both nodes.
Slave node:
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = noon101/10.240.20.30
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.4-SNAPSHOT
STARTUP_MSG: build = -r ; compiled by 'hduser' on Wed May 29 10:55:16 EEST 2013
Master node:
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = noon102/10.240.20.32
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.4
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
When looking at the nodes, I see version and build values are different. Even if this is the problem, I still do not know how to solve it.
This generally happens when something has changed in the conf directory on the master NameNode. Are you sure no ant script (or something similar) ran and modified the jar files in Hadoop's 'lib' directory?
This issue seems to be fairly common. The following link has clear step-by-step instructions for salvaging the situation. The gist is that you back up the master node's conf directory and then copy the DataNode's conf directory over to the master.
http://permalink.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/22499
See if this helps.
I had the same problem and solved it. I received the error "Incompatible build versions: namenode BV = ; datanode BV = 985326" many times because I had many <property>...</property> tags under the <configuration> tag in mapred-site.xml. I had to make sure all configurations were the same on all nodes (master and slaves). If a single value in a <property>...</property> tag differs from one node to another, you are likely to see the Incompatible Build Version error.
So here is the summary:
1) Make sure you have same hadoop version in all nodes.
2) Make sure all *-site.xml files are the same on all nodes, that is, you use the same configuration in core-site.xml, mapred-site.xml, and hdfs-site.xml.
Good Luck!
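To compare builds across nodes without reading whole logs, the revision can be pulled out of the "STARTUP_MSG: build" line that every daemon prints at startup. A minimal sketch, assuming the line format shown in the logs above; the function name build_rev is made up.

```shell
#!/bin/sh
# Sketch: extract the source revision from a "STARTUP_MSG: build = ..." line,
# i.e. the text between "-r " and the following ";". An empty result means the
# build carries no revision number.
build_rev() {
    printf '%s\n' "$1" | sed -n 's/.*build = .*-r \([^;]*\);.*/\1/p'
}

# Usage (log path is a placeholder):
#   build_rev "$(grep 'STARTUP_MSG: build' /path/to/datanode.log | tail -n 1)"
```

Run against the two lines quoted in the question, this gives 1393290 for the master and an empty string for the slave, which matches the "namenode BV = 1393290; datanode BV =" message.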

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to localhost/127.0.0.1:54310 failed on local exception

I am getting an error in starting the data node while initiating the single node cluster set up on my machine
************************************************************/
2013-02-18 20:21:32,300 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = somnath-laptop/127.0.1.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.4
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
************************************************************/
2013-02-18 20:21:32,593 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-02-18 20:21:32,618 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2013-02-18 20:21:32,620 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-02-18 20:21:32,620 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2013-02-18 20:21:33,052 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2013-02-18 20:21:33,056 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2013-02-18 20:21:37,890 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to localhost/127.0.0.1:54310 failed on local exception: java.io.IOException: Connection reset by peer
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107)
at org.apache.hadoop.ipc.Client.call(Client.java:1075)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at sun.proxy.$Proxy5.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:370)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:429)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:331)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:296)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:356)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:251)
at sun.nio.ch.IOUtil.read(IOUtil.java:224)
Any idea about how to resolve this error?
Ok got the problem solved.
Since I was using my single-node cluster through a network proxy, I had added the following property line to $HADOOP_HOME/conf/mapred-site.xml to by-pass proxy server while communicating across Hadoop daemons.
However, this time I was trying out on a direct internet connection, so had to comment out the property that I added in mapred-site.xml.
Below is the property from mapred-site.xml that I commented out:
<!--
<property>
<name>hadoop.rpc.socket.factory.class.default</name>
<value>org.apache.hadoop.net.StandardSocketFactory</value>
<final>true</final>
<description>
Prevent proxy settings set up by clients in their job configs from affecting our connectivity.
</description>
</property>
-->

no namenode error in pseudo-mode

I'm new to Hadoop and still in the learning phase.
Following Hadoop: The Definitive Guide, I set up Hadoop in pseudo-distributed mode and everything was working fine. I was even able to run all the examples from chapter 3 yesterday. Today, after rebooting my Unix machine, I ran start-dfs.sh and then tried localhost:50070; it shows an error, and when I try to stop DFS (stop-dfs.sh) it says there is no namenode to stop. I have been googling the issue without result. When I format the namenode again, everything works: I can connect to localhost:50070 and even replicate files and directories in HDFS. But as soon as I restart Linux and try to connect to HDFS, the same problem comes back.
Below is the error log:
************************************************************/
2011-06-22 15:45:55,249 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu/127.0.1.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.203.0
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011
************************************************************/
2011-06-22 15:45:56,383 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2011-06-22 15:45:56,455 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2011-06-22 15:45:56,494 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2011-06-22 15:45:56,494 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
2011-06-22 15:45:57,007 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2011-06-22 15:45:57,031 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2011-06-22 15:45:57,059 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2011-06-22 15:45:57,070 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source NameNode registered.
2011-06-22 15:45:57,374 INFO org.apache.hadoop.hdfs.util.GSet: VM type = 32-bit
2011-06-22 15:45:57,374 INFO org.apache.hadoop.hdfs.util.GSet: 2% max memory = 19.33375 MB
2011-06-22 15:45:57,374 INFO org.apache.hadoop.hdfs.util.GSet: capacity = 2^22 = 4194304 entries
2011-06-22 15:45:57,374 INFO org.apache.hadoop.hdfs.util.GSet: recommended=4194304, actual=4194304
2011-06-22 15:45:57,854 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=anshu
2011-06-22 15:45:57,854 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-06-22 15:45:57,854 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-06-22 15:45:57,868 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.block.invalidate.limit=100
2011-06-22 15:45:57,869 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
2011-06-22 15:45:58,769 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean
2011-06-22 15:45:58,809 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2011-06-22 15:45:58,825 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /tmp/hadoop-anshu/dfs/name does not exist.
2011-06-22 15:45:58,827 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-anshu/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:291)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:97)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:379)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:353)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:254)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:434)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1153)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1162)
2011-06-22 15:45:58,828 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-anshu/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:291)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:97)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:379)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:353)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:254)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:434)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1153)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1162)
2011-06-22 15:45:58,829 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
Any help is appreciated
Thank-you
here is the kicker:
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException:
Directory /tmp/hadoop-anshu/dfs/name
is in an inconsistent state: storage
directory does not exist or is not
accessible.
i'd been having similar issues. i used stop-all.sh to shut down hadoop. i guess it was foolish of me to think this would properly save the data in my HDFS.
but as far as i can tell from what appears to be the appropriate code chunk in the hadoop-daemon.sh script, this is not the case - it just kills the processes:
(stop)
  if [ -f $pid ]; then
    if kill -0 `cat $pid` > /dev/null 2>&1; then
      echo stopping $command
      kill `cat $pid`
    else
      echo no $command to stop
    fi
  else
    echo no $command to stop
  fi
did you look to see if the directory it's complaining about existed? i checked and mine did not, although there was an (empty!) data folder in there where I imagine data might have once lived.
so my guess was that what we need to do is configure Hadoop such that our namenode and datanode are NOT stored in a tmp directory. there is some possibility that the OS is doing maintenance and getting rid of these files. either that, or hadoop figures you don't care about them anymore because you wouldn't have left them in a tmp directory if you did, and you wouldn't be restarting your machine in the middle of a map-reduce job. I don't really think this should happen (i mean, that's not how i would design things) but it seemed like a good guess.
so, based on this site http://wiki.datameer.com/display/DAS11/Hadoop+configuration+file+templates
i edited my conf/hdfs-site.xml file to point to the following paths (obviously, make your own directories as you see fit):
<property>
<name>dfs.name.dir</name>
<value>/hadoopstorage/name/</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/hadoopstorage/data/</value>
</property>
Did this, formatted the new namenode (sadly, data loss seems inevitable in this situation), stopped and started hadoop with the shell scripts, restarted the machine, and my files were still there...
YMMV...hope this works for you! i'm on OS X but i don't think you should have dissimilar results.
J
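A quick way to tell whether a setup is exposed to this: when dfs.name.dir and dfs.data.dir are not set, Hadoop 1.x derives them from hadoop.tmp.dir, which defaults to /tmp/hadoop-${user.name}, exactly the path in the error log above. The sketch below only tests whether a path sits under /tmp; the function name is_under_tmp is made up, and whether /tmp is actually wiped on reboot depends on the OS.

```shell
#!/bin/sh
# Sketch: succeed if the given path is /tmp itself or lies below it.
# This is a plain string test; it does not resolve symlinks.
is_under_tmp() {
    case $1 in
        /tmp|/tmp/*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Usage:
#   is_under_tmp /tmp/hadoop-anshu/dfs/name && echo "storage is under /tmp"
```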
If you don't care about losing data, just execute the command:
./hadoop namenode -format
I had similar issue and this helped
chown -R hdfs:hadoop /path/to/namenode/data/dir
Setting these properties in the conf/hdfs-site.xml file worked for me!
Thanks jsh
<property>
<name>dfs.name.dir</name>
<value>/hadoopstorage/name/</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/hadoopstorage/data/</value>
</property>
Don't forget to set proper permissions on those directories.
JSH answer is correct.
Just a couple of changes for hadoop 2.6 i had to do:
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoopstorage/name/</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoopstorage/data/</value>
</property>
If you have not resolved the problem, try this:
Assign the dfs.name.dir directory to the hadoop user group and give that group write permission.
Then check core-site.xml and hdfs-site.xml in the Hadoop config directory:
Go to the config directory
vi core-site.xml hdfs-site.xml
Make sure your port numbers and paths are correct
I had a similar problem, but slightly different.
Running start-all.sh seemed to go well, but jps showed no NameNode and I could not get a listing when I ran hdfs dfs -ls /.
My first attempt was to run hadoop namenode -format; then the NameNode appeared but the DataNode disappeared.
After googling for a solution, I ran rm -rf /usr/local/hadoop_store/hdfs/datanode/* and restarted Hadoop. jps now shows:
12912 ResourceManager
13391 FsShell
13420 Jps
13038 NodeManager
12733 SecondaryNameNode
12432 NameNode
12556 DataNode
Now I can use hadoop commands as usual.
HTH!
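Some background on why wiping the datanode folder helps after a format: formatting the NameNode generates a new ID, and a DataNode whose storage still carries the old one refuses to start. Both sides record the ID in a current/VERSION file inside their storage directory (namespaceID on Hadoop 1.x, clusterID on 2.x). Below is a rough sketch for comparing them before deleting anything; the function names are made up, and the namenode path in the usage line is an assumption based on the datanode path above.

```shell
#!/bin/sh
# Sketch: read one key from <storage dir>/current/VERSION, a key=value file.
version_value() {
    # $1 = storage directory, $2 = key (e.g. namespaceID or clusterID)
    sed -n "s/^$2=//p" "$1/current/VERSION" 2>/dev/null
}

# Succeed only if both directories have the key and the values match.
same_id() {
    # $1 = namenode dir, $2 = datanode dir, $3 = key
    nn=$(version_value "$1" "$3")
    dn=$(version_value "$2" "$3")
    [ -n "$nn" ] && [ "$nn" = "$dn" ]
}

# Usage (the namenode path is assumed, adjust to your hdfs-site.xml):
#   same_id /usr/local/hadoop_store/hdfs/namenode \
#           /usr/local/hadoop_store/hdfs/datanode clusterID || echo "IDs differ"
```

If the IDs differ and the blocks on the DataNode are disposable, clearing the datanode directory as described above lets it register with the newly formatted NameNode.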
