ERROR namenode.NameNode (NameNode.java:main(1715)) - Failed to start namenode - hadoop

I am trying to restart one of the namenode (nn2) but i get the following error in the logs:
2021-12-17 10:23:53,676 ERROR namenode.NameNode (NameNode.java:main(1715)) - Failed to start namenode.
org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 0. Expected transaction ID was 274488049
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:226)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:160)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:890)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1090)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
Caused by: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: got premature end-of-file at txid 274488048; expected file to go up to 274488109
at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:197)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.skipUntil(EditLogInputStream.java:151)
at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:179)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:213)
... 12 more
2021-12-17 10:23:53,678 INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 0. Expected transaction ID was 274488049
2021-12-17 10:23:53,681 INFO namenode.NameNode (LogAdapter.java:info(51)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at XX-XXX-XX-XXXX.XXXXX.XX/XX.X.XX.XX
************************************************************/
i tryied to do the following steps in order to solve the issue:
i copied from nn01 to the NameNode directories of nn02 the following logs
edits_0000000000274487928-0000000000274488048
edits_0000000000274488049-0000000000274488109
So far the nn02 is still not starting and i get the same error.
Can you please help?

If that is an HA setup, and your NN1 is working properly. Format your NN2(hdfs namenode -format) and do a bootstrap (hdfs namenode -bootstrapStandby)
Then try restarting the NN2.

Related

Hadoop HA Namenode goes down with the Error: flush failed for required journal

Namenode goes down with below error:
using hadoop version : 2.0.0
ERROR
INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at XXXXXX
FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=FileJournalManager(root=/XXXXXXXX/HA/edits), stream=EditLogFileOutputStream(/XXXXXXXXXXXX/HA/edits/current/edits_inprogress_00000000000XXXXXXX)))
java.io.IOException: Input/output error
at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:75)
at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:363)
......
.org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:456)
at
org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at CDCTPHADOOPAP1/IP_ADDRESS
************************************************************/
Can someone help me in understanding the exact cause for this issue and how to resolve this.

Failed to start namenode.java.lang.IllegalStateException

iam using hadoop apache 2.7.1 high availability cluster that consists of
two name nodes mn1,mn2 and 3 journal nodes
but while i was working on cluster i faced the following error
when i issue start-dfs.sh mn1 is standby and mn2 is active
but after that if one of theses two namenodes are off there is no possibility
to turn it on again
and here are the last lines of log of one of these two name nodes
2017-08-05 09:37:21,063 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Need to save fs image? false (staleImage=true, haEnabled=true, isRollingUpgrade=false)
2017-08-05 09:37:21,063 INFO org.apache.hadoop.hdfs.server.namenode.NameCache: initialized with 3 entries 72 lookups
2017-08-05 09:37:21,088 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 7052 msecs
2017-08-05 09:37:21,300 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: RPC server is binding to mn2:8020
2017-08-05 09:37:21,304 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2017-08-05 09:37:21,316 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8020
2017-08-05 09:37:21,353 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemState MBean
2017-08-05 09:37:21,354 WARN org.apache.hadoop.hdfs.server.common.Util: Path /opt/hadoop/metadata_dir should be specified as a URI in configuration files. Please update hdfs configuration.
2017-08-05 09:37:21,361 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:119)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:5741)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1063)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:678)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:664)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:811)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:795)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1488)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
2017-08-05 09:37:21,364 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-08-05 09:37:21,365 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at mn2/192.168.25.22
************************************************************/
This may be
1.Namenode PORT may be Change for each NODE.
This is a particularly vexing problem.
Swallow IllegalStateExceptions thrown by removeShutdownHook in FileSystem. The javadoc states:
public boolean removeShutdownHook(Thread hook)
Throws:
IllegalStateException - If the virtual machine is already in the process of shutting down
So if we are getting this exception, it MEANS we are already in the process of shutdown, so we CANNOT, try what we may, removeShutdownHook. If Runtime had a method Runtime.isShutdownInProgress(), we could have checked for it before the removeShutdownHook call. As it stands, there is no such method. In my opinion, this would be a good patch regardless of the needs for this JIRA.
Not send SIGTERMs from the NM to the MR-AM in the first place. Rather we should expose a mechanism for the NM to politely tell the AM its no longer needed and should shutdown asap. Even after this, if an admin were to kill the MRAppMaster with a SIGTERM, the JobHistory would be lost defeating the purpose of 3614
i discovered that my problem was in journal node and not in namenode
even though the log of namenode shows the error mentioned in question
jps shows journal node but it is fake because journal node service is shut down
even though it is found in jps output
so as a solution i issue hadoop-daemon.sh stop journalnode
then hadoop-daemon.sh start journalnode
and then namenode starts to work again

Namenode starting but Datanode not starting

I get the following exception when I start the distributed file system. I am using hadoop 2.6.0.
2015-08-26 23:10:58,222 FATAL datanode.DataNode (DataNode.java:secureMain(2385)) - Exception in secureMain
java.net.UnknownHostException: IM1948-X0: IM1948-X0
at java.net.InetAddress.getLocalHost(InetAddress.java:1475)
at org.apache.hadoop.security.SecurityUtil.getLocalHostName(SecurityUtil.java:187)
at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:207)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2153)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2202)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2378)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2402)
Caused by: java.net.UnknownHostException: IM1948-X0
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1295)
at java.net.InetAddress.getLocalHost(InetAddress.java:1471)
2015-08-26 23:10:58,227 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2015-08-26 23:10:58,229 INFO datanode.DataNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at java.net.UnknownHostException: IM1948-X0: IM1948-X0
Even deleting the hadoop/hdfs-data/current directory doesnot help; I tried formatting the namenode but without success. This generally happens to me when I restart hadoop.
Basically to sum up, datanode process is not running at all for the hadoop cluster.

Issue with Hadoop Upgrade from 1.0 to 2.2.0

I'm trying to upgrade my 5 node hadoop cluster from 1.0 to 2.2.0. When I try to upgrade the namenode using hadoop-daemon.sh start namenode -upgrade command and check the log files I get the following error message.
015-03-13 10:02:24,549 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority.
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:347)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:335)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getRpcServerAddress(NameNode.java:388)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:471)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:483)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:684)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:669)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1254)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1320)
2015-03-13 10:02:24,586 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2015-03-13 10:02:24,593 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nn.cluster.com/192.168.1.75
************************************************************/
Please check your core-site.xml and it should contain valid namenode address.

invalid last txid in stream

I am trying to configure hadoop namenode HA and resourcemanager HA as well. However, when I start namenode as standby, I got IllegalArgumentException as below:
=====================================================
About to bootstrap Standby ID nn2 from:
Nameservice ID: mycluster
Other Namenode ID: nn1
Other NN's HTTP address: http://my1.namenode.com:50070
Other NN's IPC address: my1.namenode.com/xxx.xxx.xxx.xxx:8020
Namespace ID: 1915209867
Block pool ID: BP-740716617-xxx.xxx.xxx.xxx-1409206617148
Cluster ID: CID-51cea219-ffe7-4a52-8a6c-fb83d501ccaa
Layout version: -56
=====================================================
Data exists in Storage Directory /hadoop1/hadoop/hdfs/nn. Formatting anyway.
14/11/05 16:41:20 INFO common.Storage: Storage directory /hadoop1/hadoop/hdfs/nn has been successfully formatted.
14/11/05 16:41:20 WARN common.Util: Path /hadoop1/hadoop/hdfs/nn should be specified as a URI in configuration files. Please update hdfs configuration.
14/11/05 16:41:20 WARN common.Util: Path /hadoop1/hadoop/hdfs/nn should be specified as a URI in configuration files. Please update hdfs configuration.
14/11/05 16:41:21 FATAL namenode.NameNode: Exception in namenode join
java.io.IOException: java.lang.IllegalArgumentException: invalid last txid in stream: http://my3.namenode.com:8480/getJournal?jid=mycluster&segmentTxId=74823&storageInfo=-56%3A1915209867%3A0%3ACID-51cea219-ffe7-4a52-8a6c-fb83d501ccaa
at org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.run(BootstrapStandby.java:317)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1306)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1395)
Caused by: java.lang.IllegalArgumentException: invalid last txid in stream: http://my3.namenode.com:8480/getJournal?jid=mycluster&segmentTxId=74823&storageInfo=-56%3A1915209867%3A0%3ACID-51cea219-ffe7-4a52-8a6c-fb83d501ccaa
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.<init>(RedundantEditLogInputStream.java:101)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.chainAndMakeRedundantStreams(JournalSet.java:300)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:494)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:260)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1399)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1418)
at org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.checkLogsAvailableForRead(BootstrapStandby.java:236)
at org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.doRun(BootstrapStandby.java:203)
at org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.access$000(BootstrapStandby.java:69)
at org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby$1.run(BootstrapStandby.java:106)
at org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby$1.run(BootstrapStandby.java:102)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
at org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.run(BootstrapStandby.java:102)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.run(BootstrapStandby.java:312)
... 2 more
14/11/05 16:41:21 INFO util.ExitUtil: Exiting with status 1
14/11/05 16:41:21 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at
************************************************************/
All others are working well and I've checked hdfs-site.xml configuration for the problem but I couldn't find anything.
Please help me...
Thank you
Restart :
./sbin/hadoop-daemon.sh start journalnode

Resources