No data nodes are started - Hadoop
I am trying to set up Hadoop version 0.20.203.0 in a pseudo-distributed configuration using the following guide:
http://www.javacodegeeks.com/2012/01/hadoop-modes-explained-standalone.html
After running the start-all.sh script I run "jps" and get this output:
4825 NameNode
5391 TaskTracker
5242 JobTracker
5477 Jps
5140 SecondaryNameNode
When I try to add files to HDFS using:
bin/hadoop fs -put conf input
I get the following error:
hadoop@m1a2:~/software/hadoop$ bin/hadoop fs -put conf input
12/04/10 18:15:31 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
at org.apache.hadoop.ipc.Client.call(Client.java:1030)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
12/04/10 18:15:31 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
12/04/10 18:15:31 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/hadoop/input/core-site.xml" - Aborting...
put: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
12/04/10 18:15:31 ERROR hdfs.DFSClient: Exception closing file /user/hadoop/input/core-site.xml : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
at org.apache.hadoop.ipc.Client.call(Client.java:1030)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
I am not totally sure, but I believe this may be because the DataNode is not running.
Does anybody know what I have done wrong, or how to fix this problem?
EDIT: This is the datanode.log file:
2012-04-11 12:27:28,977 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = m1a2/139.147.5.55
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.203.0
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011
************************************************************/
2012-04-11 12:27:29,166 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-04-11 12:27:29,181 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-04-11 12:27:29,183 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-04-11 12:27:29,183 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-04-11 12:27:29,342 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-04-11 12:27:29,347 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2012-04-11 12:27:29,615 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /tmp/hadoop-hadoop/dfs/data: namenode namespaceID = 301052954; datanode namespaceID = 229562149
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:354)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:268)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1480)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1419)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1437)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1563)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1573)
2012-04-11 12:27:29,617 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at m1a2/139.147.5.55
************************************************************/
That error you are getting in the DN log is described here: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#java-io-ioexception-incompatible-namespaceids
From that page:
At the moment, there seem to be two workarounds as described below.
Workaround 1: Start from scratch
I can testify that the following steps solve this error, but the side effects won’t make you happy (me neither). The crude workaround I have found is to:
Stop the cluster
Delete the data directory on the problematic DataNode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml; if you followed this tutorial, the relevant directory is /app/hadoop/tmp/dfs/data
Reformat the NameNode (NOTE: all HDFS data is lost during this process!)
Restart the cluster
If deleting all the HDFS data and starting from scratch does not sound like a good idea (it might be OK during the initial setup/testing), you might give the second approach a try.
Workaround 2: Updating namespaceID of problematic DataNodes
Big thanks to Jared Stehler for the following suggestion. I have not tested it myself yet, but feel free to try it out and send me your feedback. This workaround is “minimally invasive” as you only have to edit one file on the problematic DataNodes:
Stop the DataNode
Edit the value of namespaceID in the DataNode's current/VERSION file to match the namespaceID of the current NameNode
Restart the DataNode
If you followed the instructions in my tutorials, the full paths of the relevant files are:
NameNode: /app/hadoop/tmp/dfs/name/current/VERSION
DataNode: /app/hadoop/tmp/dfs/data/current/VERSION
(background: dfs.data.dir is by default set to
${hadoop.tmp.dir}/dfs/data, and we set hadoop.tmp.dir
in this tutorial to /app/hadoop/tmp).
If you wonder what the contents of VERSION look like, here's one of mine:
# contents of /current/VERSION
namespaceID=393514426
storageID=DS-1706792599-10.10.10.1-50010-1204306713481
cTime=1215607609074
storageType=DATA_NODE
layoutVersion=-13
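For anyone who wants to script Workaround 2, here is a minimal shell sketch, assuming the tutorial's /app/hadoop/tmp layout and the Hadoop 0.20.x daemon scripts (on the asker's setup the data directory would be /tmp/hadoop-hadoop/dfs/data instead):
bin/hadoop-daemon.sh stop datanode
# note the NameNode's ID (301052954 in the asker's log)
grep namespaceID /app/hadoop/tmp/dfs/name/current/VERSION
# overwrite the DataNode's namespaceID with that value
sed -i 's/^namespaceID=.*/namespaceID=301052954/' /app/hadoop/tmp/dfs/data/current/VERSION
bin/hadoop-daemon.sh start datanode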
Okay, I post this once more:
In case someone needs this for a newer version of Hadoop (I am running 2.4.0):
First stop the cluster: sbin/stop-all.sh
Then go to /etc/hadoop for the config files.
In hdfs-site.xml, look for the directory paths configured for dfs.namenode.name.dir and dfs.datanode.data.dir.
Delete both the directories recursively (rm -r).
Now format the namenode via bin/hadoop namenode -format
And finally sbin/start-all.sh
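Putting that together, a minimal sketch assuming hypothetical data directories under /usr/local/hadoop/data (substitute the paths you find in your own hdfs-site.xml):
sbin/stop-all.sh
# the paths below are placeholders for the dfs.namenode.name.dir / dfs.datanode.data.dir values
rm -r /usr/local/hadoop/data/hdfs/namenode /usr/local/hadoop/data/hdfs/datanode
bin/hadoop namenode -format
sbin/start-all.sh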
Hope this helps.
I had the same issue on a pseudo-distributed node using Hadoop 1.1.2,
so I ran bin/stop-all.sh to stop the cluster
and then checked the configuration of my Hadoop tmp directory in hdfs-site.xml:
<name>hadoop.tmp.dir</name>
<value>/root/data/hdfstmp</value>
So I went into /root/data/hdfstmp and deleted all the files using the command below (you may lose your data):
rm -rf *
and then format namenode again
bin/hadoop namenode -format
and then start the cluster using
bin/start-all.sh
The main reason is that bin/hadoop namenode -format did not remove the old data, so we have to delete it manually.
Do the following steps:
1. bin/stop-all.sh
2. remove the dfs/ and mapred/ folders under the hadoop.tmp.dir directory configured in core-site.xml (see the sketch after these steps)
3. bin/hadoop namenode -format
4. bin/start-all.sh
5. jps
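For step 2, a small sketch of how to locate and remove those folders, assuming the default conf/core-site.xml location and a hypothetical /app/hadoop/tmp value:
# print the configured hadoop.tmp.dir value
grep -A1 'hadoop.tmp.dir' conf/core-site.xml
# assuming it came back as /app/hadoop/tmp (substitute your own value)
rm -r /app/hadoop/tmp/dfs /app/hadoop/tmp/mapred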
Try formatting your datanode and restarting it.
I have been using CDH4 as my version of hadoop and have been having trouble configuring it. Even after trying to reformat my namenode I was still receiving the error.
My VERSION file was located in
/var/lib/hadoop-hdfs/cache/{username}/dfs/data/current/VERSION
You can find the location of the HDFS cache directory by looking for the hadoop.tmp.dir property:
more /etc/hadoop/conf/hdfs-site.xml
I found that by doing
cd /var/lib/hadoop-hdfs/cache/
rm -rf *
and then reformatting the namenode I was finally able to fix the issue. Thanks to the first reply for helping me to figure out what folder I needed to bomb.
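If you want to verify the mismatch before deleting the cache, a quick check to try (the glob stands in for the {username} part of the path, which varies per install):
sudo grep -H namespaceID /var/lib/hadoop-hdfs/cache/*/dfs/data/current/VERSION
grep -A1 hadoop.tmp.dir /etc/hadoop/conf/hdfs-site.xml   # confirms where the cache actually lives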
I tried approach 2 as suggested by Jared Stehler in Chris Shain's answer, and I can confirm that after making these changes I was able to resolve the problem mentioned above.
I used the same namespaceID for both the name and data VERSION files; that is, I copied the namespaceID from the VERSION file inside /app/hadoop/tmp/dfs/name/current to the VERSION file inside /app/hadoop/tmp/dfs/data/current, and it worked like a charm.
Cheers!
I encountered this issue when using an unmodified Cloudera QuickStart VM 4.4.0-1.
For reference, Cloudera Manager said my datanode was in good health, even though the error message in the DataStreamer stack trace said no datanodes were running.
Credit goes to workaround #2 from https://stackoverflow.com/a/10110369/249538, but I'll detail my specific experience using the Cloudera QuickStart VM.
Specifically, I did:
In this order, stopped the services hue1, hive1, mapreduce1, hdfs1
via Cloudera Manager at http://localhost.localdomain:7180/cmf/services/status
Found my VERSION files via:
sudo find / -name VERSION
i got:
/dfs/dn/current/BP-780931682-127.0.0.1-1381159027878/current/VERSION
/dfs/dn/current/VERSION
/dfs/nn/current/VERSION
/dfs/snn/current/VERSION
I checked the contents of those files; they all had a matching namespaceID except one file that was missing it entirely, so I added an entry to it.
Then I restarted the services in reverse order via Cloudera Manager.
Now I can -put stuff onto HDFS.
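A one-liner that reproduces the manual check described above, using the VERSION paths that find returned (whether each file is expected to carry a namespaceID is as the answer describes, not something verified here):
sudo grep -H namespaceID /dfs/nn/current/VERSION /dfs/snn/current/VERSION /dfs/dn/current/VERSION /dfs/dn/current/BP-*/current/VERSION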
In my case, I had wrongly set dfs.name.dir and dfs.data.dir to the same destination. The correct format is:
<property>
<name>dfs.name.dir</name>
<value>/path/to/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/path/to/data</value>
</property>
I had the same problem with the DataNode missing,
and these steps worked for me:
1. Find the folder where the DataNode data is located:
cd hadoop/hadoopdata/hdfs
2. Look in the folder and you will see what files you have in HDFS:
ls
3. Delete the contents of the datanode folder, because they belong to the old DataNode:
rm -rf datanode/*
4. You will get the new version after running the previous command.
5. Start the new DataNode:
hadoop-daemon.sh start datanode
6. Refresh the web UI. You will see the lost node appear.
Related
Namenode not starting without formatting
I have a hadoop cluster setup manually in high availability with one primary namenode,one standby namenode and one datanode. I formatted the namenode in the initial startup process, but if all the servers shut down due to electricity outage, I have to restart all the services again such as journal node,zookeeper,namenode,datanode and failover daemons. However, when I start failover daemons(dfszkfailover) on both the namenodes, the namenode stops on both of them. For it to start, I have to format the namenode(on primary namenode) and delete the temp files. This is my local setup and have to integrate on production as well. I cannot keep formatting namenode in production environment as I will lose all the data. I read some of the answers explaining to point the namenode and datanode directory(in hdfs-site.xml) to some specific folder other than /tmp. I am already pointing it in my home directory. Please suggest some method so that namenode and failover daemons both start without formatting the namenode. EDIT-- I am sharing the last 100 lines of my namenode nn1 logs 192.168.12.12:8485: Journal Storage Directory /tmp/hadoop/dfs/journalnode/ha-cluster not formatted at org.apache.hadoop.hdfs.qjournal.server.Journal.checkFormatted(Journal.java:479) at org.apache.hadoop.hdfs.qjournal.server.Journal.getLastPromisedEpoch(Journal.java:247) at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getJournalState(JournalNodeRpcServer.java:139) at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getJournalState(QJournalProtocolServerSideTranslatorPB.java:118) at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25415) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606) at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81) at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:286) at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createNewUniqueEpoch(QuorumJournalManager.java:180) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnfinalizedSegments(QuorumJournalManager.java:438) at org.apache.hadoop.hdfs.server.namenode.JournalSet$8.apply(JournalSet.java:624) at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393) at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:621) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1521) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1180) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1919) at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) at 
org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49) at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1777) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1649) at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107) at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606) 2019-12-06 10:57:57,541 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [192.168.12.10:8485, 192.168.12.11:8485, 192.168.12.12:8485], stream=null)) 2019-12-06 10:57:57,546 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: /************************************************************ SHUTDOWN_MSG: Shutting down NameNode at nn1/192.168.12.10 ************************************************************/
HDFS No valid image files found
we have a old hadoop cluster machine hadoop - version 2.6 all machines in the cluster are redhat version - 7.3 we have a problem to start the standby name-node on the last master machine from the logs ( under /var/log/hadoop/hdf ) , we can see the error - No valid image files found I am not sure about my solution , but is its mean that we need to delete the files - ( edits_inprogress_XXXXX ) under /hadoop/hdfs/journal/hdfsha/current , and then restart the standby name-node service ? 2018-01-24 16:10:27,826 ERROR namenode.NameNode (NameNode.java:main(1774)) - Failed to start namenode. java.io.FileNotFoundException: No valid image files found at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:165) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:618) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:289) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1045) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:703) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:688) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:752) at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:992) at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:976) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1701) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1769) 2018-01-24 16:10:27,829 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1 2018-01-24 16:10:27,845 INFO namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG: /************************************************************ SHUTDOWN_MSG: Shutting down NameNode at master03.sys573.com/102.14.22.29 ************************************************************/ I also try with hadoop namenode -format but its return with the following: Could not format one or more JournalNodes. 1 exceptions thrown: :8485: Directory /data/hadoop/hdfs/journal/hdfsha is in an inconsistent state: Can't format the storage directory because the current directory is not empty. all output: 18/01/24 17:36:19 ERROR namenode.NameNode: Failed to start namenode. org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not format one or more JournalNodes. 1 exceptions thrown: :8485: Directory /data/hadoop/hdfs/journal/hdfsha is in an inconsistent state: Can't format the storage directory because the current directory is not empty. 
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.checkEmptyCurrent(Storage.java:482) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:558) at org.apache.hadoop.hdfs.qjournal.server.JNStorage.format(JNStorage.java:185) at org.apache.hadoop.hdfs.qjournal.server.Journal.format(Journal.java:217) at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.format(JournalNodeRpcServer.java:145) at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.format(QJournalProtocolServerSideTranslatorPB.java:145) at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25419) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)
How to copy files into HDFS?
I am trying to start a hadoop single node cluster on my local machine. I have configured the following files according to https://amodernstory.com/2014/09/23/installing-hadoop-on-mac-osx-yosemite/: hadoop-env.sh, core-site.xml, mapred-site.xml and hdfs-site.xml. When I run the script start-dfs.sh and then the command jps (right after running start-dfs.sh) I see that the datanode is up and running: 15735 Jps 15548 DataNode 15660 SecondaryNameNode 15453 NameNode A few seconds later, I re-run the command jps and I see that the datanode is not running. Why? How to resolve this? After that I run the script start-yarn.sh and then the command jps. I see this: 15955 NodeManager 16011 Jps 15660 SecondaryNameNode 15453 NameNode 15854 ResourceManager My ultimate aim is to copy files into the HDFS from my local filesystem. For doing so, I run the command hdfs dfs -copyFromLocal /source-file-path/filename /destination-file-path/. I get the following error: 17/07/10 17:09:00 WARN hdfs.DataStreamer: DataStreamer Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /pay/txnlinking/redshift.yml._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1733) at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2496) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:828) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:788) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2455) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1481) at org.apache.hadoop.ipc.Client.call(Client.java:1427) at org.apache.hadoop.ipc.Client.call(Client.java:1337) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) at com.sun.proxy.$Proxy10.addBlock(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:440) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163) at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335) at com.sun.proxy.$Proxy11.addBlock(Unknown Source) at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1733) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1536) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:658) copyFromLocal: File /pay/txnlinking/redshift.yml._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation. How to avoid the above error and copy files into the HDFS? P.S: I created the destination path folders in HDFS explicitly before doing the copy.
Do hadoop namenode -format, then stop all services using stop-all.sh, then restart all services using start-all.sh. Note that start-all.sh and stop-all.sh are deprecated; use start-dfs.sh and stop-dfs.sh instead.
First delete the contents of the hadoop.tmp.dir folder that you specified in core-site.xml. Then do a namenode format using hdfs namenode -format. Your datanode should then be up and running properly, after which all the copy operations will execute successfully.
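A rough sketch of that sequence, assuming hadoop.tmp.dir is set to /usr/local/hadoop/tmp (a placeholder; read the real value from your own core-site.xml first):
grep -A1 hadoop.tmp.dir $HADOOP_HOME/etc/hadoop/core-site.xml
rm -rf /usr/local/hadoop/tmp/*    # placeholder path; substitute your hadoop.tmp.dir
hdfs namenode -format
start-dfs.sh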
Datanode does not start correctly
I am trying to install Hadoop 2.2.0 in pseudo-distributed mode. While I am trying to start the datanode services it is showing the following error, can anyone please tell how to resolve this? **2**014-03-11 08:48:15,916 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (storage id unknown) service to localhost/127.0.0.1:9000 starting to offer service 2014-03-11 08:48:15,922 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2014-03-11 08:48:15,922 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting 2014-03-11 08:48:16,406 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /home/prassanna/usr/local/hadoop/yarn_data/hdfs/datanode/in_use.lock acquired by nodename 3627#prassanna-Studio-1558 2014-03-11 08:48:16,426 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-611836968-127.0.1.1-1394507838610 (storage id DS-1960076343-127.0.1.1-50010-1394127604582) service to localhost/127.0.0.1:9000 java.io.IOException: Incompatible clusterIDs in /home/prassanna/usr/local/hadoop/yarn_data/hdfs/datanode: namenode clusterID = CID-fb61aa70-4b15-470e-a1d0-12653e357a10; datanode clusterID = CID-8bf63244-0510-4db6-a949-8f74b50f2be9 at**** org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219) at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:837) at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:808) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664) at java.lang.Thread.run(Thread.java:662) 2014-03-11 08:48:16,427 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-611836968-127.0.1.1-1394507838610 (storage id DS-1960076343-127.0.1.1-50010-1394127604582) service to localhost/127.0.0.1:9000 2014-03-11 08:48:16,532 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-611836968-127.0.1.1-1394507838610 (storage id DS-1960076343-127.0.1.1-50010-1394127604582) 2014-03-11 08:48:18,532 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode 2014-03-11 08:48:18,534 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0 2014-03-11 08:48:18,536 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
You can use the following method: copy the datanode clusterID (in your example, CID-8bf63244-0510-4db6-a949-8f74b50f2be9) and run the following command under the HADOOP_HOME/bin directory:
./hdfs namenode -format -clusterId CID-8bf63244-0510-4db6-a949-8f74b50f2be9
This formats the namenode with the datanode's cluster ID.
You must do as follows:
bin/stop-all.sh
rm -Rf /home/prassanna/usr/local/hadoop/yarn_data/hdfs/*
bin/hadoop namenode -format
I had the same problem until I found this answer on this web site.
Whenever you get the error below when trying to start a DN on a slave machine:
java.io.IOException: Incompatible clusterIDs in /home/hadoop/dfs/data: namenode clusterID = ****; datanode clusterID = ****
it is because after you set up your cluster you decided, for whatever reason, to reformat your NN. Your DNs on the slaves still bear a reference to the old NN. To resolve this, simply delete and recreate the data folder on that machine in the local Linux FS, namely /home/hadoop/dfs/data. Restarting that DN's daemon on that machine will recreate the data/ folder's contents and resolve the problem.
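In shell terms, a hedged sketch of that fix on the affected slave (the data path is the one named above; the daemon script location assumes a standard Hadoop 2.x layout):
rm -rf /home/hadoop/dfs/data
mkdir -p /home/hadoop/dfs/data
# the DataNode recreates the data/ contents when its daemon comes back up
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode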
Do the following simple steps:
1. Clear the data directory of Hadoop.
2. Format the namenode again.
3. Start the cluster.
After this your cluster will start normally, provided you do not have any other configuration issue.
The DataNode dies because its clusterID is incompatible with the NameNode's. To fix this problem you need to delete the directory /tmp/hadoop-[user]/hdfs/data and restart Hadoop:
rm -r /tmp/hadoop-[user]/hdfs/data
I got a similar issue in my pseudo-distributed environment. I stopped the cluster first, then copied the cluster ID from the NameNode's VERSION file into the DataNode's VERSION file, and after restarting the cluster it was all fine. My data paths are /usr/local/hadoop/hadoop_store/hdfs/datanode and /usr/local/hadoop/hadoop_store/hdfs/namenode. FYI: the VERSION file is under /usr/local/hadoop/hadoop_store/hdfs/datanode/current/; likewise for the NameNode.
Here, the datanode gets stopped immediately because the clusterIDs of the datanode and namenode are different, so you have to format the namenode with the clusterID of the datanode. Copy the datanode clusterID (for your example, CID-8bf63244-0510-4db6-a949-8f74b50f2be9) and run the following command from your home directory. You can go to your home dir by just typing cd in your terminal. From your home dir, now type the command:
hdfs namenode -format -clusterId CID-8bf63244-0510-4db6-a949-8f74b50f2be9
Delete the namenode and datanode directories as specified in core-site.xml. After that, create the new directories and restart dfs and yarn.
I also had a similar issue. I deleted the namenode and datanode folders from all the nodes, and reran:
$HADOOP_HOME/bin> hdfs namenode -format -force
$HADOOP_HOME/sbin> ./start-dfs.sh
$HADOOP_HOME/sbin> ./start-yarn.sh
To check the health report from the command line (which I would recommend):
$HADOOP_HOME/bin> hdfs dfsadmin -report
After that I got all the nodes working correctly.
I had the same issue for Hadoop 2.7.7.
I removed the namenode/current and datanode/current directories on the namenode and all the datanodes, removed the files at /tmp/hadoop-ubuntu/*, then formatted the namenode and datanode and restarted all the nodes. Things work fine.
Steps: stop all nodes/managers, then attempt the steps below.
rm -rf /tmp/hadoop-ubuntu/* (all nodes)
rm -r /usr/local/hadoop/data/hdfs/namenode/current (namenode: check hdfs-site.xml for the path)
rm -r /usr/local/hadoop/data/hdfs/datanode/current (datanode: check hdfs-site.xml for the path)
hdfs namenode -format (on the namenode)
hdfs datanode -format (on the namenode)
Reboot the namenode and datanodes.
There have been different solutions to this problem, but I tested another easy solution and it worked like a charm: if someone gets the same error, you just need to change the clusterID in the datanodes to the clusterID of the namenode in the VERSION file.
In your case, here is where you can change it on the datanode side: namenode clusterID = CID-fb61aa70-4b15-470e-a1d0-12653e357a10; datanode clusterID = CID-8bf63244-0510-4db6-a949-8f74b50f2be9
Back up the current VERSION:
cp /home/prassanna/usr/local/hadoop/yarn_data/hdfs/datanode/current/VERSION /home/prassanna/usr/local/hadoop/yarn_data/hdfs/datanode/current/VERSION.BK
vim /home/prassanna/usr/local/hadoop/yarn_data/hdfs/datanode/current/VERSION
and change clusterID=CID-8bf63244-0510-4db6-a949-8f74b50f2be9 to clusterID=CID-fb61aa70-4b15-470e-a1d0-12653e357a10
Restart the datanode and it should work.
Unable to identify the .xml/file to update the host after installing Hadoop
I encountered below mentioned error after installing Hadoop and executing hadoop namenode -format command. Based on the displayed logs, I figured out that I need to update the "host" in the configuration. But, I am unable to find the exact location of the configuration file (.xml), which needs to be updated. I am installing on Fedora on a single node. I am looking for your help in addressing this issue. Please point me to any specific link or documentation that could be helpful while debugging. [hadoop#hadoop ~]$ hadoop namenode -format Warning: $HADOOP_HOME is deprecated. 13/02/03 11:33:09 INFO namenode.NameNode: STARTUP_MSG: STARTUP_MSG: Starting NameNode STARTUP_MSG: host = java.net.UnknownHostException: hadoop: hadoop 13/02/03 11:34:27 INFO namenode.NameNode: SHUTDOWN_MSG: : SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: hadoop: hadoop
Configuration files are in $HADOOP_HOME/etc/hadoop, where $HADOOP_HOME is probably /usr/local/hadoop. There can be several files with your hostname; you should check there. Also, you should check /etc/hosts.
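A quick sketch for checking the hostname resolution that the java.net.UnknownHostException points at (the 127.0.0.1 mapping is only an example for a single-node setup):
hostname                      # e.g. prints "hadoop"
getent hosts "$(hostname)"    # empty output means the name does not resolve
# if missing, add a line like this to /etc/hosts:
#   127.0.0.1   localhost hadoop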