Namenode not starting without formatting - hadoop

I have a hadoop cluster setup manually in high availability with one primary namenode,one standby namenode and one datanode. I formatted the namenode in the initial startup process, but if all the servers shut down due to electricity outage, I have to restart all the services again such as journal node,zookeeper,namenode,datanode and failover daemons.
However, when I start failover daemons(dfszkfailover) on both the namenodes, the namenode stops on both of them. For it to start, I have to format the namenode(on primary namenode) and delete the temp files.
This is my local setup and have to integrate on production as well. I cannot keep formatting namenode in production environment as I will lose all the data.
I read some of the answers explaining to point the namenode and datanode directory(in hdfs-site.xml) to some specific folder other than /tmp. I am already pointing it in my home directory.
Please suggest some method so that namenode and failover daemons both start without formatting the namenode.
EDIT-- I am sharing the last 100 lines of my namenode nn1 logs
192.168.12.12:8485: Journal Storage Directory /tmp/hadoop/dfs/journalnode/ha-cluster not formatted
at org.apache.hadoop.hdfs.qjournal.server.Journal.checkFormatted(Journal.java:479)
at org.apache.hadoop.hdfs.qjournal.server.Journal.getLastPromisedEpoch(Journal.java:247)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getJournalState(JournalNodeRpcServer.java:139)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getJournalState(QJournalProtocolServerSideTranslatorPB.java:118)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25415)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:286)
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createNewUniqueEpoch(QuorumJournalManager.java:180)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnfinalizedSegments(QuorumJournalManager.java:438)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$8.apply(JournalSet.java:624)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:621)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1521)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1180)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1919)
at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1777)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1649)
at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
2019-12-06 10:57:57,541 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [192.168.12.10:8485, 192.168.12.11:8485, 192.168.12.12:8485], stream=null))
2019-12-06 10:57:57,546 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nn1/192.168.12.10
************************************************************/

Related

hdfs put fails from laptop to remote hadoop cluster

I have my hadoop cluster set up on a different network. Because of this, hdfs put is failing when I run it from my laptop.
Is there a port I should forward or something to access the datanodes remotely? I see it's using the local ip address in the error message.
Here is the command: hdfs dfs -put ~/Documents/reddit-streaming/redditStreaming/target/redditStreaming-1.0-SNAPSHOT.jar hdfs://mydns.asuscomm.com:8021/user/me/jars/
and here is the error message:
2021-10-14 18:04:55,704 WARN hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073742036_1212
java.net.UnknownHostException
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:591)
at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:253)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1757)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1711)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:707)
2021-10-14 18:04:55,708 WARN hdfs.DataStreamer: Abandoning BP-668799564-192.168.50.7-1633461871664:blk_1073742036_1212
2021-10-14 18:04:55,752 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.50.31:9866,DS-60974173-31d6-4dcb-a2ba-05ab6431db66,DISK]
2021-10-14 18:05:00,801 WARN hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073742037_1213
java.net.UnknownHostException
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:591)
at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:253)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1757)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1711)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:707)
2021-10-14 18:05:00,801 WARN hdfs.DataStreamer: Abandoning BP-668799564-192.168.50.7-1633461871664:blk_1073742037_1213
2021-10-14 18:05:00,833 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.50.19:9866,DS-aeaca5a1-562c-4f35-b2fb-6f0b51c5f695,DISK]
2021-10-14 18:05:00,869 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/me/jars/redditStreaming-1.0-SNAPSHOT.jar._COPYING_ could only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2329)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2942)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:915)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:593)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:600)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:568)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:552)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1093)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1035)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:963)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2966)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1573)
at org.apache.hadoop.ipc.Client.call(Client.java:1519)
at org.apache.hadoop.ipc.Client.call(Client.java:1416)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:530)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1084)
at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1898)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1700)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:707)
I have this property in my hdfs-site.xml file on my laptop:
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
I can also see in the UI that both datanodes are running.
I assume you've forwarded the namenode port (8021) since it can see that 2 datanodes exist?
Yes, the datanodes have their own ports that need to be available to the client for data to actually be written
Check the value for dfs.datanode.address and make sure you can establish a connection to the port listed there for each datanode.
If you look at the error, you can see this is 9866
Excluding datanode DatanodeInfoWithStorage[192.168.50.31:9866
And also, IIUC, the use.datanode.hostname config needs to actually be in the cluster, not your local laptop config, for the protocol to return the hostnames rather than the IPs
There is also an HTTP Port you can open if you want to see each Datanode's web-portal (should be available to be accessed from the Namenode UI as well)
The alternative, more secure / less exposed, option is to establish an edge-node between the networks that you can only SSH to & SFTP files into (assuming you don't otherwise have a shared fileserver), then run your hdfs commands from there. You can setup a SOCKS proxy if you needed to access a Web UI in that network
To re-iterate, you should not expose a Hadoop cluster without Kerberos & TLS over dynamic DNS through any internet-facing router

How to copy files into HDFS?

I am trying to start a hadoop single node cluster on my local machine. I have configured the following files according to https://amodernstory.com/2014/09/23/installing-hadoop-on-mac-osx-yosemite/: hadoop-env.sh, core-site.xml, mapred-site.xml and hdfs-site.xml. When I run the script start-dfs.sh and then the command jps (right after running start-dfs.sh) I see that the datanode is up and running:
15735 Jps
15548 DataNode
15660 SecondaryNameNode
15453 NameNode
A few seconds later, I re-run the command jps and I see that the datanode is not running. Why? How to resolve this?
After that I run the script start-yarn.sh and then the command jps. I see this:
15955 NodeManager
16011 Jps
15660 SecondaryNameNode
15453 NameNode
15854 ResourceManager
My ultimate aim is to copy files into the HDFS from my local filesystem. For doing so, I run the command hdfs dfs -copyFromLocal /source-file-path/filename /destination-file-path/. I get the following error:
17/07/10 17:09:00 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /pay/txnlinking/redshift.yml._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1733)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2496)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:828)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:788)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2455)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1481)
at org.apache.hadoop.ipc.Client.call(Client.java:1427)
at org.apache.hadoop.ipc.Client.call(Client.java:1337)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:440)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335)
at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1733)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1536)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:658)
copyFromLocal: File /pay/txnlinking/redshift.yml._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
How to avoid the above error and copy files into the HDFS?
P.S: I created the destination path folders in HDFS explicitly before doing the copy.
Do
hadoop namenode -format
then stop all services using
stop-all.sh
then restart all services using
start-all.sh
start-all.sh and stop-all.sh are deprecated use start-dfs.sh and stop-dfs.sh instead
First delete the contents of the hadoop.tmp.dir folder that you specified in core-site.xml. Then do a namenode format using hdfs namenode -format. Your datanode should be up and running properly after which all the copy operations will be successfully executed.

Hadoop Name node is not getting started

I'm trying to configure Hadoop in fully distributed mode with 1 master and 1 slave as different nodes. I have attached a screenshot showing the status of my master and slave nodes.
In Master:
ubuntu#hadoop-master:/usr/local/hadoop/etc/hadoop$ $HADOOP_HOME/bin/hdfs dfsadmin -refreshNodes
refreshNodes: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "hadoop-master/127.0.0.1"; destination host is: "hadoop-master":8020;
This is the error I'm getting when I try to run the refresh nodes command. Can anyone tell me what I'm missing or what mistake have I done ?
Master & Slave Screenshot
2016-04-26 03:29:17,090 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for standby state
2016-04-26 03:29:17,095 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup#0.0.0.0:50070
2016-04-26 03:29:17,095 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2016-04-26 03:29:17,095 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2016-04-26 03:29:17,096 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2016-04-26 03:29:17,097 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.net.BindException: Problem binding to [hadoop-master:8020] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
at org.apache.hadoop.ipc.Server.bind(Server.java:425)
at org.apache.hadoop.ipc.Server$Listener.(Server.java:574)
at org.apache.hadoop.ipc.Server.(Server.java:2215)
at org.apache.hadoop.ipc.RPC$Server.(RPC.java:938)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:534)
at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:509)
at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:783)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.(NameNodeRpcServer.java:344)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:673)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:646)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:811)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:795)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1488)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:463)
at sun.nio.ch.Net.bind(Net.java:455)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.apache.hadoop.ipc.Server.bind(Server.java:408)
... 13 more
2016-04-26 03:29:17,103 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2016-04-26 03:29:17,109 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop-master/127.0.0.1
************************************************************/
ubuntu#hadoop-master:/usr/local/hadoop$
DFS needs to be formatted. Just issue the following command ;
hadoop namenode -format
Or
hdfs namenode -format
Check your namenode address in core-site.xml. Change to 50070 or 9000 and try
The default address of namenode web UI is http://localhost:50070/. You can open this address in your browser and check the namenode information.
The default address of namenode server is hdfs://localhost:8020/. You can connect to it to access HDFS by HDFS api. The is the real service address.
http://blog.cloudera.com/blog/2009/08/hadoop-default-ports-quick-reference/
Try to format the Namenode.
$] hadoop namenode -format
Your error logs clearly say that It is not able to bind the default port.
java.net.BindException: Problem binding to [hadoop-master:8020] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
You need to change the default port to some port which is free.
Here is the list of ports given in hdfs-default.xml and here.

Namenode starting but Datanode not starting

I get the following exception when I start the distributed file system. I am using hadoop 2.6.0.
2015-08-26 23:10:58,222 FATAL datanode.DataNode (DataNode.java:secureMain(2385)) - Exception in secureMain
java.net.UnknownHostException: IM1948-X0: IM1948-X0
at java.net.InetAddress.getLocalHost(InetAddress.java:1475)
at org.apache.hadoop.security.SecurityUtil.getLocalHostName(SecurityUtil.java:187)
at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:207)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2153)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2202)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2378)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2402)
Caused by: java.net.UnknownHostException: IM1948-X0
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1295)
at java.net.InetAddress.getLocalHost(InetAddress.java:1471)
2015-08-26 23:10:58,227 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2015-08-26 23:10:58,229 INFO datanode.DataNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at java.net.UnknownHostException: IM1948-X0: IM1948-X0
Even deleting the hadoop/hdfs-data/current directory doesnot help; I tried formatting the namenode but without success. This generally happens to me when I restart hadoop.
Basically to sum up, datanode process is not running at all for the hadoop cluster.

No data nodes are started

I am trying to setup Hadoop version 0.20.203.0 in a pseudo distributed configuration using the following guide:
http://www.javacodegeeks.com/2012/01/hadoop-modes-explained-standalone.html
After running the start-all.sh script I run "jps".
I get this output:
4825 NameNode
5391 TaskTracker
5242 JobTracker
5477 Jps
5140 SecondaryNameNode
When I try to add information to the hdfs using:
bin/hadoop fs -put conf input
I got an error:
hadoop#m1a2:~/software/hadoop$ bin/hadoop fs -put conf input
12/04/10 18:15:31 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
at org.apache.hadoop.ipc.Client.call(Client.java:1030)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
12/04/10 18:15:31 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
12/04/10 18:15:31 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/hadoop/input/core-site.xml" - Aborting...
put: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
12/04/10 18:15:31 ERROR hdfs.DFSClient: Exception closing file /user/hadoop/input/core-site.xml : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
at org.apache.hadoop.ipc.Client.call(Client.java:1030)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
I am not totally sure but I believe that this may have to do with the fact that the datanode is not running.
Does anybody know what I have done wrong, or how to fix this problem?
EDIT: This is the datanode.log file:
2012-04-11 12:27:28,977 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = m1a2/139.147.5.55
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.203.0
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011
************************************************************/
2012-04-11 12:27:29,166 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-04-11 12:27:29,181 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-04-11 12:27:29,183 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-04-11 12:27:29,183 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-04-11 12:27:29,342 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-04-11 12:27:29,347 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2012-04-11 12:27:29,615 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /tmp/hadoop-hadoop/dfs/data: namenode namespaceID = 301052954; datanode namespaceID = 229562149
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:354)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:268)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1480)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1419)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1437)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1563)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1573)
2012-04-11 12:27:29,617 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at m1a2/139.147.5.55
************************************************************/
That error you are getting in the DN log is described here: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#java-io-ioexception-incompatible-namespaceids
From that page:
At the moment, there seem to be two workarounds as described below.
Workaround 1: Start from scratch
I can testify that the following steps solve this error, but the side effects won’t make you happy (me neither). The crude workaround I have found is to:
Stop the cluster
Delete the data directory on the problematic DataNode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml; if you followed this tutorial, the relevant directory is /app/hadoop/tmp/dfs/data
Reformat the NameNode (NOTE: all HDFS data is lost during this process!)
Restart the cluster
When deleting all the HDFS data and starting from scratch does not sound like a good idea (it might be ok during the initial setup/testing), you might give the second approach a try.
Workaround 2: Updating namespaceID of problematic DataNodes
Big thanks to Jared Stehler for the following suggestion. I have not tested it myself yet, but feel free to try it out and send me your feedback. This workaround is “minimally invasive” as you only have to edit one file on the problematic DataNodes:
Stop the DataNode
Edit the value of namespaceID in /current/VERSION to match the value of the current NameNode
Restart the DataNode
If you followed the instructions in my tutorials, the full path of the relevant files are:
NameNode: /app/hadoop/tmp/dfs/name/current/VERSION
DataNode: /app/hadoop/tmp/dfs/data/current/VERSION
(background: dfs.data.dir is by default set to
${hadoop.tmp.dir}/dfs/data, and we set hadoop.tmp.dir
in this tutorial to /app/hadoop/tmp).
If you wonder how the contents of VERSION look like, here’s one of mine:
# contents of /current/VERSION
namespaceID=393514426
storageID=DS-1706792599-10.10.10.1-50010-1204306713481
cTime=1215607609074
storageType=DATA_NODE
layoutVersion=-13
Okay, I post this once more:
In case someone needs this, for newer version of Hadoop (basically I am running 2.4.0)
In this case stop the cluster sbin/stop-all.sh
Then go to /etc/hadoop for config files.
In the file: hdfs-site.xml Look out for directory paths corresponding to dfs.namenode.name.dir dfs.namenode.data.dir
Delete both the directories recursively (rm -r).
Now format the namenode via bin/hadoop namenode -format
And finally sbin/start-all.sh
Hope this helps.
I had the same issue on pseudo node using hadoop1.1.2
So I ran bin/stop-all.sh to stop the cluster
then saw the configuration of my hadoop tmp directory in hdfs-site.xml
<name>hadoop.tmp.dir</name>
<value>/root/data/hdfstmp</value>
So I went into /root/data/hdfstmp and deleted all the files using command (you may loose ur data)
rm -rf *
and then format namenode again
bin/hadoop namenode -format
and then start the cluster using
bin/start-all.sh
Main reason is bin/hadoop namenode -format didn't remove the old data. So we have to delete it manually.
Do following steps:
1. bin/stop-all.sh
2. remove dfs/ and mapred/ folder of hadoop.tmp.dir in core-site.xml
3. bin/hadoop namenode -format
4. bin/start-all.sh
5. jps
Try formatting your datanode and restart it.
I have been using CDH4 as my version of hadoop and have been having trouble configuring it. Even after trying to reformat my namenode I was still receiving the error.
My VERSION file was located in
/var/lib/hadoop-hdfs/cache/{username}/dfs/data/current/VERSION
You can find the location of the HDFS cache directory by looking for the hadoop.tmp.dir property:
more /etc/hadoop/conf/hdfs-site.xml
I found that by doing
cd /var/lib/hadoop-hdfs/cache/
rm -rf *
and then reformatting the namenode I was finally able to fix the issue. Thanks to the first reply for helping me to figure out what folder I needed to bomb.
I tried with the approach 2 as suggested by Jared Stehler in the Chris Shain answer and i can confirm that after doing these changes , i was able to resolve the above mentioned problem.
I used the same version number for both the name and data VERSION file. Mean to say copied the version number from file VERSION inside (/app/hadoop/tmp/dfs/name/current) to VERSION inside (/app/hadoop/tmp/dfs/data/current) and it worked like charm
Cheers !
I encountered this issue when using an unmodified cloudera quickstart vm 4.4.0-1
For reference, the cloudera manager said my datanode was in good health, even though the error message in the DataStreamer stacktrace said no datanodes were running.
credit goes to workaround #2 from https://stackoverflow.com/a/10110369/249538 but i'll detail my specific experience using the cloudera quickstart vm.
Specifically, I did:
in this order, stop the services hue1, hive1, mapreduce1, hdfs1
via the cloudera manager http://localhost.localdomain:7180/cmf/services/status
found my VERSION files via:
sudo find / -name VERSION
i got:
/dfs/dn/current/BP-780931682-127.0.0.1-1381159027878/current/VERSION
/dfs/dn/current/VERSION
/dfs/nn/current/VERSION
/dfs/snn/current/VERSION
i checked the contents of those files, but they all had a matching namespaceID except one file was just totally missing it. so i added an entry to it.
then i restarted the services in reverse order via the cloudera manager.
now i can -put stuff onto hdfs.
In my case, I wrongly set one destination for dfs.name.dir and dfs.data.dir. The correct format is
<property>
<name>dfs.name.dir</name>
<value>/path/to/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/path/to/data</value>
</property>
I have the same problem with datanode missing
and i follow this step that worked for me
1.find the folder that datanode located in.
cd hadoop/hadoopdata/hdfs
2.look in the folder and you will see what file you have in hdfs
ls
3.delete the datanode folder because it is old version of datanode
rm -rf/datanode/*
4. you will get the new version after run the previous command
5. start new datanode
hadoop-daemon.sh start datanode
6. refresh the web services. You will see the lost node appears
my terminal

Resources