Hadoop Name node is not getting started - hadoop

I'm trying to configure Hadoop in fully distributed mode with 1 master and 1 slave as different nodes. I have attached a screenshot showing the status of my master and slave nodes.
In Master:
ubuntu@hadoop-master:/usr/local/hadoop/etc/hadoop$ $HADOOP_HOME/bin/hdfs dfsadmin -refreshNodes
refreshNodes: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "hadoop-master/127.0.0.1"; destination host is: "hadoop-master":8020;
This is the error I'm getting when I try to run the refreshNodes command. Can anyone tell me what I'm missing or what mistake I have made?
Master & Slave Screenshot
2016-04-26 03:29:17,090 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for standby state
2016-04-26 03:29:17,095 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50070
2016-04-26 03:29:17,095 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2016-04-26 03:29:17,095 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2016-04-26 03:29:17,096 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2016-04-26 03:29:17,097 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.net.BindException: Problem binding to [hadoop-master:8020] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
at org.apache.hadoop.ipc.Server.bind(Server.java:425)
at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:574)
at org.apache.hadoop.ipc.Server.<init>(Server.java:2215)
at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:938)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:534)
at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:509)
at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:783)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.<init>(NameNodeRpcServer.java:344)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:673)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:646)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:811)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:795)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1488)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:463)
at sun.nio.ch.Net.bind(Net.java:455)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.apache.hadoop.ipc.Server.bind(Server.java:408)
... 13 more
2016-04-26 03:29:17,103 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2016-04-26 03:29:17,109 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop-master/127.0.0.1
************************************************************/
ubuntu@hadoop-master:/usr/local/hadoop$

DFS needs to be formatted. Just issue the following command:
hadoop namenode -format
or
hdfs namenode -format
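If you do go this route, keep in mind that formatting wipes the existing HDFS metadata, so it is really only appropriate on a fresh or throwaway cluster. A rough sequence, assuming the standard sbin scripts are on your PATH:
stop-dfs.sh                 # stop the HDFS daemons first
hdfs namenode -format       # re-initialize NameNode metadata (destroys existing HDFS data)
start-dfs.sh                # bring HDFS back up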

Check your namenode address in core-site.xml. Change it to 50070 or 9000 and try again.
The default address of the namenode web UI is http://localhost:50070/. You can open this address in your browser and check the namenode information.
The default address of the namenode server is hdfs://localhost:8020/. You can connect to it to access HDFS through the HDFS API. This is the real service address.
http://blog.cloudera.com/blog/2009/08/hadoop-default-ports-quick-reference/
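For reference, the namenode service address is the fs.defaultFS property (fs.default.name in older releases) in core-site.xml; a minimal sketch, assuming the hadoop-master host from this question and that port 9000 is free:
<configuration>
  <property>
    <!-- NameNode RPC endpoint; clients and DataNodes connect here -->
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>
</configuration>
Restart HDFS after changing it so the NameNode binds to the new address.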

Try to format the Namenode.
$ hadoop namenode -format

Your error log clearly says that it is not able to bind to the default port.
java.net.BindException: Problem binding to [hadoop-master:8020] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
You need to change the default port to some port that is free.
The list of default ports is documented in hdfs-default.xml.
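Before changing the port, it is also worth checking whether a stale NameNode from an earlier start is still holding 8020; a quick check on Linux (assuming netstat is available) looks like:
jps                                # is a leftover NameNode process still listed?
sudo netstat -tlnp | grep 8020     # which PID is bound to port 8020?
kill <pid>                         # <pid> is the stale process id reported above
After that, start the NameNode again.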

Related

Namenode not starting without formatting

I have a Hadoop cluster set up manually in high availability with one primary namenode, one standby namenode and one datanode. I formatted the namenode in the initial startup process, but if all the servers shut down due to a power outage, I have to restart all the services again, such as the journal node, zookeeper, namenode, datanode and failover daemons.
However, when I start the failover daemons (dfszkfailover) on both namenodes, the namenode stops on both of them. For it to start, I have to format the namenode (on the primary namenode) and delete the temp files.
This is my local setup, and I have to integrate it in production as well. I cannot keep formatting the namenode in a production environment, as I would lose all the data.
I read some of the answers explaining that the namenode and datanode directories (in hdfs-site.xml) should point to some specific folder other than /tmp. I am already pointing them to my home directory.
Please suggest some method so that namenode and failover daemons both start without formatting the namenode.
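For reference, these locations are controlled in hdfs-site.xml, roughly as in the sketch below (paths are illustrative). Note that the log shared in the edit below shows the JournalNode storage directory under /tmp (/tmp/hadoop/dfs/journalnode/ha-cluster), which is also lost on reboot unless dfs.journalnode.edits.dir is moved somewhere persistent:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/hadoop/hdfs/namenode</value>    <!-- illustrative path -->
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/hadoop/hdfs/datanode</value>    <!-- illustrative path -->
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/hadoop/hdfs/journalnode</value>        <!-- keep this off /tmp as well -->
</property>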
EDIT-- I am sharing the last 100 lines of my namenode nn1 logs
192.168.12.12:8485: Journal Storage Directory /tmp/hadoop/dfs/journalnode/ha-cluster not formatted
at org.apache.hadoop.hdfs.qjournal.server.Journal.checkFormatted(Journal.java:479)
at org.apache.hadoop.hdfs.qjournal.server.Journal.getLastPromisedEpoch(Journal.java:247)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getJournalState(JournalNodeRpcServer.java:139)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getJournalState(QJournalProtocolServerSideTranslatorPB.java:118)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25415)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:286)
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createNewUniqueEpoch(QuorumJournalManager.java:180)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnfinalizedSegments(QuorumJournalManager.java:438)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$8.apply(JournalSet.java:624)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:621)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1521)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1180)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1919)
at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1777)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1649)
at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
2019-12-06 10:57:57,541 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [192.168.12.10:8485, 192.168.12.11:8485, 192.168.12.12:8485], stream=null))
2019-12-06 10:57:57,546 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nn1/192.168.12.10
************************************************************/

Failed to start namenode. java.lang.IllegalStateException

I am using an Apache Hadoop 2.7.1 high-availability cluster that consists of two namenodes, mn1 and mn2, and 3 journal nodes, but while I was working on the cluster I faced the following error.
When I issue start-dfs.sh, mn1 is standby and mn2 is active, but after that, if one of these two namenodes goes down, there is no way to bring it back up.
Here are the last lines of the log of one of these two namenodes:
2017-08-05 09:37:21,063 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Need to save fs image? false (staleImage=true, haEnabled=true, isRollingUpgrade=false)
2017-08-05 09:37:21,063 INFO org.apache.hadoop.hdfs.server.namenode.NameCache: initialized with 3 entries 72 lookups
2017-08-05 09:37:21,088 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 7052 msecs
2017-08-05 09:37:21,300 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: RPC server is binding to mn2:8020
2017-08-05 09:37:21,304 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2017-08-05 09:37:21,316 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8020
2017-08-05 09:37:21,353 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemState MBean
2017-08-05 09:37:21,354 WARN org.apache.hadoop.hdfs.server.common.Util: Path /opt/hadoop/metadata_dir should be specified as a URI in configuration files. Please update hdfs configuration.
2017-08-05 09:37:21,361 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:119)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:5741)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1063)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:678)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:664)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:811)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:795)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1488)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
2017-08-05 09:37:21,364 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-08-05 09:37:21,365 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at mn2/192.168.25.22
************************************************************/
This may be because:
1. The NameNode port may be configured differently on each node.
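In an HA setup, the RPC address of each namenode is defined per nameservice in hdfs-site.xml, and both entries normally use the same port; a sketch assuming a hypothetical nameservice id of mycluster and the mn1/mn2 namenodes from this question:
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>                             <!-- hypothetical nameservice id -->
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>mn1,mn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.mn1</name>
  <value>mn1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.mn2</name>
  <value>mn2:8020</value>
</property>
Checking that these entries match on both machines is a quick way to rule this possibility out.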
This is a particularly vexing problem.
Swallow IllegalStateExceptions thrown by removeShutdownHook in FileSystem. The javadoc states:
public boolean removeShutdownHook(Thread hook)
Throws:
IllegalStateException - If the virtual machine is already in the process of shutting down
So if we are getting this exception, it MEANS we are already in the process of shutdown, so we CANNOT, try what we may, removeShutdownHook. If Runtime had a method Runtime.isShutdownInProgress(), we could have checked for it before the removeShutdownHook call. As it stands, there is no such method. In my opinion, this would be a good patch regardless of the needs for this JIRA.
Not send SIGTERMs from the NM to the MR-AM in the first place. Rather, we should expose a mechanism for the NM to politely tell the AM it's no longer needed and should shut down ASAP. Even after this, if an admin were to kill the MRAppMaster with a SIGTERM, the JobHistory would be lost, defeating the purpose of 3614.
I discovered that my problem was in the journal node and not in the namenode, even though the namenode log shows the error mentioned in the question.
jps lists the journal node, but that is misleading: the journal node service had actually shut down even though it still appears in the jps output.
So, as a solution, I issued hadoop-daemon.sh stop journalnode followed by hadoop-daemon.sh start journalnode, and then the namenode started working again.
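Roughly, the recovery sequence (hadoop-daemon.sh lives under $HADOOP_HOME/sbin and has to be run on each journal node):
hadoop-daemon.sh stop journalnode     # stop the half-dead journal node process
hadoop-daemon.sh start journalnode    # start it cleanly
hdfs haadmin -getServiceState mn1     # optionally confirm the namenode state afterwards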

Namenode starting but Datanode not starting

I get the following exception when I start the distributed file system. I am using hadoop 2.6.0.
2015-08-26 23:10:58,222 FATAL datanode.DataNode (DataNode.java:secureMain(2385)) - Exception in secureMain
java.net.UnknownHostException: IM1948-X0: IM1948-X0
at java.net.InetAddress.getLocalHost(InetAddress.java:1475)
at org.apache.hadoop.security.SecurityUtil.getLocalHostName(SecurityUtil.java:187)
at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:207)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2153)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2202)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2378)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2402)
Caused by: java.net.UnknownHostException: IM1948-X0
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1295)
at java.net.InetAddress.getLocalHost(InetAddress.java:1471)
2015-08-26 23:10:58,227 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2015-08-26 23:10:58,229 INFO datanode.DataNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at java.net.UnknownHostException: IM1948-X0: IM1948-X0
Even deleting the hadoop/hdfs-data/current directory does not help; I tried formatting the namenode, but without success. This generally happens to me when I restart Hadoop.
Basically, to sum up, the datanode process is not running at all for the hadoop cluster.
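The java.net.UnknownHostException: IM1948-X0 in the trace above usually just means that the machine's hostname cannot be resolved locally. On a single-node setup, a common fix is to map the hostname in /etc/hosts (the loopback entries below are illustrative):
127.0.0.1   localhost
127.0.1.1   IM1948-X0    # must match the output of the hostname command
After that, the DataNode should be able to look up its own address when it starts.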

Hadoop ResourceManager Can Not Start

I got the following error, but netstat shows that 8088 is not in use.
This is a 3-node cluster, with the Namenode, Jobtracker and Datanode running on different EC2 instances.
2014-02-04 02:49:43,519 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:262)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startWepApp(ResourceManager.java:623)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:655)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:872)
Caused by: java.net.BindException: Port in use: jobtracker.hdp-dev.XYZ.com:8088
at org.apache.hadoop.http.HttpServer.openListener(HttpServer.java:742)
at org.apache.hadoop.http.HttpServer.start(HttpServer.java:686)
at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:257)
... 4 more
Caused by: java.net.BindException: Cannot assign requested address
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
at org.apache.hadoop.http.HttpServer.openListener(HttpServer.java:738)
... 6 more
2014-02-04 02:49:43,522 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at name01.hdp-dev.XYZ.com/10.xxx.xxx.xxx
************************************************************/
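One note on the trace above: java.net.BindException: Cannot assign requested address usually means that jobtracker.hdp-dev.XYZ.com resolves to an IP address that is not bound on the machine where the ResourceManager is starting, so it is worth comparing what the name resolves to against the addresses actually configured locally (standard Linux commands):
hostname -f                                  # the name this host believes it has
getent hosts jobtracker.hdp-dev.XYZ.com      # what the configured name resolves to here
ip addr show                                 # the addresses actually bound on this machine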
On a Debian-based system, you can run something like
apt-cache policy zookeeper at the terminal. That command will list all the repositories where the package zookeeper is available.
If the package zookeeper is available from two or more repositories, e.g. Ubuntu's Raring universe repository and the CDH repository, then you have a problem, especially since this can turn into a package mix/match issue.
The solution is to create a file at /etc/apt/preferences.d/cloudera.pref with the following contents:
Package: *
Pin: release o=Cloudera, l=Cloudera
Pin-Priority: 501
No apt-get update is required after creating this file.
Here, the default priority of packages is 500. By creating the file above, you give a higher priority of 501 to any package that has its origin specified as "Cloudera" (o=Cloudera) and is coming from Cloudera's repo (l=Cloudera), which does the trick.
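You can then confirm that the pin took effect by re-running the policy check; the candidate version should now come from the Cloudera repository (exact output varies by system):
apt-cache policy zookeeper    # the Candidate line should now point at the Cloudera repo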
Hope this helps.

hadoop namenode not getting detected

I am trying to configure a pseudo-distributed, single-node Hadoop cluster on my Ubuntu machine. However, when I run the following command
bin/start-all.sh
it starts all the daemons, but when I do jps it does not list the namenode, and when I go to the namenode logs,
the following message is displayed:
2013-04-26 11:59:09,927 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: Incomplete HDFS URI, no host: hdfs://vikasXXX.XX.XX.XX:X000
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:85)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at org.apache.hadoop.fs.Trash.<init>(Trash.java:62)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startTrashEmptier(NameNode.java:314)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:310)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
2013-04-26 11:59:09,929 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at vikas/XXX.XX.XX.XX
************************************************************/
What could be the reason?
Thanks
I assume that you have masked your URL before sharing it:
hdfs://vikasXXX.XX.XX.XX:X000
I think it is not recognizing your machine by name. Try using localhost and check if it works.
hdfs://localhost:8020
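For a pseudo-distributed Hadoop 1.x setup, that address goes into core-site.xml as fs.default.name; a minimal sketch (port 8020 assumed, use whichever port your setup expects):
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8020</value>
</property>
Restart the daemons (bin/stop-all.sh, then bin/start-all.sh) after editing it.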
