CDH upgrade from 5.1 to 5.3 - hadoop

After I finished all the distribution and activation steps on the manager website,
I got the error below when I restarted the cluster:
2016-07-14 14:51:12,335 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@UT190320.shis.uth.tmc.edu:50070
2016-07-14 14:51:12,436 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2016-07-14 14:51:12,436 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2016-07-14 14:51:12,436 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2016-07-14 14:51:12,436 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.IOException:
File system image contains an old layout version -55.
An upgrade to version -59 is required.
Please restart NameNode with the "-rollingUpgrade started" option if a rolling upgrade is already started; or restart NameNode with the "-upgrade" option to start a new upgrade.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:232)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1006)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:736)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:553)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:609)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:776)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:760)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1466)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1534)
2016-07-14 14:51:12,439 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1

You will need to perform the upgrade as the error message suggests. It is not clear exactly what you did, but I suggest you follow the documentation at http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_earlier_cdh5_upgrade.html
sudo service hadoop-hdfs-namenode upgrade is possibly what you need.
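To make the two numbers in the error easier to read, here is a minimal Python sketch that pulls them out of the message text. The function names are made up for illustration, and the comparison relies on HDFS layout versions being negative numbers that decrease with newer releases.

```python
import re

def parse_layout_versions(message):
    """Return (on_disk, required) layout versions from the NameNode error text,
    or None if the message does not have the expected wording."""
    old = re.search(r"old layout version (-?\d+)", message)
    new = re.search(r"upgrade to version (-?\d+) is required", message)
    if not (old and new):
        return None
    return int(old.group(1)), int(new.group(1))

def needs_upgrade(on_disk, required):
    # Layout versions are negative and decrease with newer releases,
    # so a larger (less negative) on-disk value is the older layout.
    return on_disk > required

message = (
    "File system image contains an old layout version -55.\n"
    "An upgrade to version -59 is required."
)
versions = parse_layout_versions(message)
```

Here `versions` holds the on-disk layout first and the layout the new binaries expect second, so `needs_upgrade(*versions)` is true for the log in the question.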

Related

Failed to start namenode.java.lang.IllegalStateException

I am using an Apache Hadoop 2.7.1 high-availability cluster that consists of
two namenodes (mn1, mn2) and 3 journal nodes.
While working on the cluster I ran into the following problem:
when I issue start-dfs.sh, mn1 is standby and mn2 is active,
but after that, if one of these two namenodes goes down, it cannot be
started again.
Here are the last lines of the log of one of these two namenodes:
2017-08-05 09:37:21,063 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Need to save fs image? false (staleImage=true, haEnabled=true, isRollingUpgrade=false)
2017-08-05 09:37:21,063 INFO org.apache.hadoop.hdfs.server.namenode.NameCache: initialized with 3 entries 72 lookups
2017-08-05 09:37:21,088 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 7052 msecs
2017-08-05 09:37:21,300 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: RPC server is binding to mn2:8020
2017-08-05 09:37:21,304 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2017-08-05 09:37:21,316 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8020
2017-08-05 09:37:21,353 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemState MBean
2017-08-05 09:37:21,354 WARN org.apache.hadoop.hdfs.server.common.Util: Path /opt/hadoop/metadata_dir should be specified as a URI in configuration files. Please update hdfs configuration.
2017-08-05 09:37:21,361 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:119)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:5741)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1063)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:678)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:664)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:811)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:795)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1488)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
2017-08-05 09:37:21,364 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-08-05 09:37:21,365 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at mn2/192.168.25.22
************************************************************/
This may be because:
1. The namenode port may be different on each node.
This is a particularly vexing problem.
Swallow IllegalStateExceptions thrown by removeShutdownHook in FileSystem. The javadoc states:
public boolean removeShutdownHook(Thread hook)
Throws:
IllegalStateException - If the virtual machine is already in the process of shutting down
So if we are getting this exception, it MEANS we are already in the process of shutdown, so we CANNOT, try what we may, removeShutdownHook. If Runtime had a method Runtime.isShutdownInProgress(), we could have checked for it before the removeShutdownHook call. As it stands, there is no such method. In my opinion, this would be a good patch regardless of the needs for this JIRA.
Do not send SIGTERMs from the NM to the MR-AM in the first place. Rather, we should expose a mechanism for the NM to politely tell the AM that it is no longer needed and should shut down as soon as possible. Even after this, if an admin were to kill the MRAppMaster with a SIGTERM, the JobHistory would be lost, defeating the purpose of 3614.
I discovered that my problem was in the journal node and not in the namenode,
even though the namenode log shows the error mentioned in the question.
jps lists the journal node, but that is misleading: the journal node service had shut down
even though it still appears in the jps output.
As a solution I ran hadoop-daemon.sh stop journalnode,
then hadoop-daemon.sh start journalnode,
and then the namenode started to work again.
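Since jps can list a process that is no longer serving requests, one way to double-check is to test whether the JournalNode port actually accepts connections. The following is a rough Python sketch, not something from the original answer. The hostnames jn1 to jn3 are hypothetical, and 8485 is assumed to be the JournalNode RPC port (the default for dfs.journalnode.rpc-address); adjust both to your cluster.

```python
import socket

def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False

# Hypothetical hostnames; 8485 is assumed to be the JournalNode RPC port.
JOURNAL_NODES = [("jn1", 8485), ("jn2", 8485), ("jn3", 8485)]

def report(nodes, timeout=2.0):
    """Map "host:port" to whether that endpoint accepted a connection."""
    return {f"{host}:{port}": is_listening(host, port, timeout)
            for host, port in nodes}
```

Calling `report(JOURNAL_NODES)` from one of the namenode hosts shows which journal nodes are reachable. A `False` entry for a node that jps still lists would match the situation described above.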

Issue in Rollback (after rolling upgrade) from hadoop 2.7.1 to 2.4.0

I tried to do a rolling upgrade from Hadoop 2.4.0 to Hadoop 2.7.1. As per http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html#dfsadmin_-rollingUpgrade one can roll back to the previous release provided the finalize step has not been done. I upgraded the setup but did not finalize the upgrade, and then tried to roll back HDFS to 2.4.0.
I tried the following steps:
1. Shut down all NNs and DNs.
2. Restore the pre-upgrade release on all machines.
3. Start NN1 as Active with the "-rollingUpgrade rollback" option (http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html#namenode_-rollingUpgrade).
I am getting the following error after the 3rd step:
15/09/01 17:53:35 INFO namenode.AclConfigFlag: ACLs enabled? false
15/09/01 17:53:35 INFO common.Storage: Lock on <<NameNode dir>>/in_use.lock acquired by nodename 12152@VM-2
15/09/01 17:53:35 WARN namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected version of storage directory /data/yarn/namenode. Reported: -63. Expecting = -56.
at org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:178)
at org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:131)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.setFieldsFromProperties(NNStorage.java:608)
at org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:228)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:309)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:202)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:882)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:639)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:455)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:511)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:670)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:655)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1304)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1370)
15/09/01 17:53:35 INFO mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50070
15/09/01 17:53:35 INFO impl.MetricsSystemImpl: Stopping NameNode metrics system...
15/09/01 17:53:35 INFO impl.MetricsSystemImpl: NameNode metrics system stopped.
15/09/01 17:53:35 INFO impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
15/09/01 17:53:35 FATAL namenode.NameNode: Exception in namenode join
From the rolling upgrade documentation it can be inferred that rolling upgrade is supported from Hadoop 2.4.0 onwards, but "-rollingUpgrade rollback" to Hadoop 2.4.0 seems to be broken in Hadoop 2.4.0: it throws the error shown above.
Are there any other steps to perform a rollback (from a rolling upgrade), or is rolling back to Hadoop 2.4.0 not supported?

Getting java.net.SocketTimeoutException when trying to run Hadoop MapReduce on a fresh install of Hortonworks

I have a fresh install of Hortonworks version 2.3_1 for Oracle VirtualBox and I get a java.net.SocketTimeoutException whenever I try to run a MapReduce job. I changed nothing other than the memory and the cores available to the VM.
Full text of the run:
WARNING: Use "yarn jar" to launch YARN applications.
15/09/01 01:15:17 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
15/09/01 01:15:20 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
15/09/01 01:16:19 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/09/01 01:18:09 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-601678901-10.0.2.15-1439987491556:blk_1073742292_1499
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.2.15:52924 remote=/10.0.2.15:50010]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2280)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:749)
15/09/01 01:18:11 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/root/.staging/job_1441069639378_0001
Exception in thread "main" java.io.IOException: All datanodes DatanodeInfoWithStorage[10.0.2.15:50010,DS-56099a5f-3cb3-426e-8e1a-ff3b53df9bf2,DISK] are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
Full name of the .ova file I am using: Sandbox_HDP_2.3_1_virtualbox.ova
My host is a Windows 7 Home Premium machine with eight threads of execution (four hyperthreaded cores, I think).
The problem was exactly what it seemed: a timeout error. I fixed it by going to the Hadoop config folder and raising all the timeouts as well as the number of retries (although, judging from the log, the retries didn't come into play), and by stopping unnecessary services on both the host and guest operating systems.
Thanks, sunrise76; one of those issues pointed me to the config folder.
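The answer does not say which settings were raised, so the following is only a sketch of how such a change could be scripted in Python. The property names dfs.client.socket-timeout and dfs.datanode.socket.write.timeout are my assumption about which HDFS timeouts are relevant, and the values are illustrative, not recommendations.

```python
import xml.etree.ElementTree as ET

def set_property(xml_text, name, value):
    """Return hdfs-site.xml text with property `name` set to `value`,
    updating an existing <property> or appending a new one."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = str(value)
            break
    else:
        # No matching property found: add one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")

site = "<configuration></configuration>"
# Assumed property names, illustrative values in milliseconds.
site = set_property(site, "dfs.client.socket-timeout", 120000)
site = set_property(site, "dfs.datanode.socket.write.timeout", 960000)
```

In practice you would read the existing hdfs-site.xml, pass its text through `set_property`, write it back, and restart the affected services. On a sandbox managed by Ambari, the same values are normally changed through the management UI instead.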

Hadoop: slaves in service but doing nothing at all

I set up a hadoop cluster and started a MapReduce job on the cluster.
The master node is running actively but all slaves are doing nothing at all.
Running jps on a slave node produces:
20390 DataNode
20492 NodeManager
21256 Jps
Here is the screen cast:
The next to last row corresponds to the master node.
So why are the slaves using no blocks?
Also, running top on the master node shows the Java process (hadoop jar jar-file.jar args) taking almost 100% of CPU resources. However, no such process exists on any of the slave machines.
That is why I think slaves are at rest, doing nothing at all.
Here is one example of the slave datanode log:
2014-07-24 23:28:01,302 INFO org.apache.hadoop.util.GSet: Computing capacity for map BlockMap
2014-07-24 23:28:01,302 INFO org.apache.hadoop.util.GSet: VM type = 64-bit
2014-07-24 23:28:01,304 INFO org.apache.hadoop.util.GSet: 0.5% max memory 889 MB = 4.4 MB
2014-07-24 23:28:01,304 INFO org.apache.hadoop.util.GSet: capacity = 2^19 = 524288 entries
2014-07-24 23:28:01,304 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Periodic Block Verification Scanner initialized with interval 504 hours for block pool BP-1752077220-193.167.138.8-1406217332464
2014-07-24 23:28:01,310 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Added bpid=BP-1752077220-193.167.138.8-1406217332464 to blockPoolScannerMap, new size=1
2014-07-24 23:31:01,116 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1752077220-193.167.138.8-1406217332464 Total blocks: 0, missing metadata files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0
And nothing more.
However, for the master data node, the log file contains lines like the following:
2014-07-24 22:27:23,443 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1752077220-193.167.138.8-1406217332464:blk_1073742749_1925 src: /193.167.138.8:44210 dest: /193.167.138.8:50010
which I think means the node is receiving tasks and processing the data.
The following is from the YARN log file of one of the slave nodes:
2014-07-24 23:28:13,811 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:8042
2014-07-24 23:28:13,812 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 8042
2014-07-24 23:28:14,122 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2014-07-24 23:28:14,130 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at ugluk/193.167.138.8:8031
2014-07-24 23:28:14,176 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using finished containers :[]
2014-07-24 23:28:14,366 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id 1336429163
2014-07-24 23:28:14,369 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for nm-tokens, got key with id :1986181585
2014-07-24 23:28:14,370 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as shagrat.hiit.fi:48662 with total resource of <memory:8192, vCores:8>
2014-07-24 23:28:14,370 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests
I am using Hadoop 2.4.0
It seems that you formatted the namenode more than once.
A block pool ID mismatch is mostly caused by formatting the namenode multiple times.
Every time you format a namenode, the block pool ID, cluster ID, and namespace ID change.
So first compare these attributes on the namenode, the datanodes, and the secondary namenode.
You can check them in the VERSION file in the current directory of each node. To find it, first see where the node's storage directory is configured by checking the path in Hadoop's hdfs-site.xml.
Go to that path, look for the current directory, and make the necessary changes.
Please let me know if this helps.
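To compare the IDs without eyeballing the files, a small Python sketch like the one below can parse two VERSION files and report which keys differ. The sample contents are made up for illustration; in practice you would read the real files from each node's current directory.

```python
def parse_version(text):
    """Parse the key=value lines of an HDFS VERSION file into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the timestamp comment
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def mismatches(namenode, datanode, keys=("clusterID",)):
    """Return {key: (namenode_value, datanode_value)} for keys that differ."""
    return {
        key: (namenode.get(key), datanode.get(key))
        for key in keys
        if namenode.get(key) != datanode.get(key)
    }

# Made-up sample contents, not taken from a real cluster.
nn = parse_version("#sample\nclusterID=CID-aaaa\nnamespaceID=1\nstorageType=NAME_NODE\n")
dn = parse_version("#sample\nclusterID=CID-bbbb\nstorageType=DATA_NODE\n")
diff = mismatches(nn, dn)
```

An empty result means the compared keys agree. Only clusterID is compared by default because, as far as I know, it is the one ID stored in both the namenode and the datanode top-level VERSION files; pass other keys explicitly if your files contain them.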

Hadoop: Datanode process killed

I am currently using Hadoop-2.0.3-alpha. I could work perfectly with HDFS (copying files into HDFS, getting success from an external framework, using the web frontend), but after restarting my VM, the datanode process stops after a while. The namenode process and all YARN processes work without a problem. I installed Hadoop in a folder under an additional user, as I also still have Hadoop 0.2 installed, which worked fine too.
Taking a look at the log file of the datanode process, I found the following information:
2013-04-11 16:23:50,475 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-04-11 16:24:17,451 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-04-11 16:24:23,276 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-04-11 16:24:23,279 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2013-04-11 16:24:23,480 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is user-VirtualBox
2013-04-11 16:24:28,896 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /0.0.0.0:50010
2013-04-11 16:24:29,239 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
2013-04-11 16:24:38,348 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2013-04-11 16:24:44,627 INFO org.apache.hadoop.http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2013-04-11 16:24:45,163 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2013-04-11 16:24:45,164 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2013-04-11 16:24:45,164 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2013-04-11 16:24:45,355 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 0.0.0.0:50075
2013-04-11 16:24:45,508 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = false
2013-04-11 16:24:45,536 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075
2013-04-11 16:24:45,576 INFO org.mortbay.log: jetty-6.1.26
2013-04-11 16:25:18,416 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50075
2013-04-11 16:25:42,670 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020
2013-04-11 16:25:44,955 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /0.0.0.0:50020
2013-04-11 16:25:45,483 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: null
2013-04-11 16:25:47,079 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: <default>
2013-04-11 16:25:47,660 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (storage id unknown) service to localhost/127.0.0.1:8020 starting to offer service
2013-04-11 16:25:50,515 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2013-04-11 16:25:50,631 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2013-04-11 16:26:15,068 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /home/hadoop/workspace/hadoop_space/hadoop23/dfs/data/in_use.lock acquired by nodename 3099@user-VirtualBox
2013-04-11 16:26:15,720 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-474150866-127.0.1.1-1365686732002 (storage id DS-317990214-127.0.1.1-50010-1365505141363) service to localhost/127.0.0.1:8020
java.io.IOException: Incompatible clusterIDs in /home/hadoop/workspace/hadoop_space/hadoop23/dfs/data: namenode clusterID = CID-1745a89c-fb08-40f0-a14d-d37d01f199c3; datanode clusterID = CID-bb3547b0-03e4-4588-ac25-f0299ff81e4f
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:850)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:821)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
at java.lang.Thread.run(Thread.java:722)
2013-04-11 16:26:16,212 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-474150866-127.0.1.1-1365686732002 (storage id DS-317990214-127.0.1.1-50010-1365505141363) service to localhost/127.0.0.1:8020
2013-04-11 16:26:16,276 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-474150866-127.0.1.1-1365686732002 (storage id DS-317990214-127.0.1.1-50010-1365505141363)
2013-04-11 16:26:18,396 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2013-04-11 16:26:18,940 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2013-04-11 16:26:19,668 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at user-VirtualBox/127.0.1.1
************************************************************/
Any ideas? Maybe I made a mistake during the installation process? But it is strange that it worked once. I should also say that when I am logged in as my additional user and execute ./hadoop-daemon.sh start namenode (and the same for the datanode), I need to add sudo.
I used this installation guide: http://jugnu-life.blogspot.ie/2012/0...rial-023x.html
By the way, I use the Oracle Java-7 version.
The problem could be that the namenode was formatted after the cluster was set up and the datanodes were not, so the slaves are still referring to the old namenode.
We have to delete and recreate the folder /home/hadoop/dfs/data on the local filesystem for the datanode.
Check your hdfs-site.xml file to see where dfs.data.dir points, delete that folder, and then restart the datanode daemon on the machine.
The steps above should recreate the folder and resolve the problem.
Please share your config info if the instructions above do not work.
The DataNode dies because of incompatible cluster IDs. To fix this problem:
If you are using Hadoop 2.x, delete everything in the folder that you have specified in hdfs-site.xml as "dfs.datanode.data.dir" (but NOT the folder itself). Note that this discards any blocks stored on that datanode.
The cluster ID is stored in that folder. Delete its contents and restart HDFS (for example with start-dfs.sh). This should work!
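If you want to script "delete the contents but keep the folder", a cautious Python sketch could look like this. The function name and the dry-run default are my own choices, and the demo runs against a temporary directory rather than a real datanode directory, since the operation destroys block data.

```python
import shutil
import tempfile
from pathlib import Path

def clear_directory(path, dry_run=True):
    """Remove everything inside `path` but keep `path` itself.

    Returns the entries that were removed (or would be, when dry_run is True).
    """
    entries = sorted(Path(path).iterdir())
    if not dry_run:
        for entry in entries:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    return entries

# Demo on a throwaway directory that mimics a datanode data dir.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    (data_dir / "current").mkdir(parents=True)
    (data_dir / "current" / "VERSION").write_text("clusterID=CID-old\n")
    planned = [p.name for p in clear_directory(data_dir)]  # dry run only
    clear_directory(data_dir, dry_run=False)                # actually delete
    remaining = list(data_dir.iterdir())
    still_exists = data_dir.exists()
```

Stop the datanode before doing this for real, and run with `dry_run=True` first to see what would be deleted.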
You need to delete both the
C:\hadoop\data\dfs\datanode and
C:\hadoop\data\dfs\namenode folders.
If you don't have these folders, open your C:\hadoop\etc\hadoop\hdfs-site.xml file and look up the paths of the folders to delete. For me it says:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/hadoop/data/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/hadoop/data/dfs/datanode</value>
</property>
Then run the command to format the namenode:
c:\hadoop\bin>hdfs namenode -format
Now it should work!
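For looking up those paths programmatically, here is a small Python sketch that reads a property from hdfs-site.xml text and strips the file: scheme. The helper names are hypothetical, and the sample XML simply repeats the two properties shown above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

def to_path(value):
    # Values may be plain paths or file: URIs such as file:/hadoop/data/dfs/namenode.
    parsed = urlparse(value)
    return parsed.path if parsed.scheme == "file" else value

def storage_dirs(xml_text, name):
    """Return the paths configured for property `name` (comma-separated values)."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            raw = prop.findtext("value") or ""
            return [to_path(v.strip()) for v in raw.split(",") if v.strip()]
    return []

SITE = """
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop/data/dfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/data/dfs/datanode</value>
  </property>
</configuration>
"""

name_dirs = storage_dirs(SITE, "dfs.namenode.name.dir")
data_dirs = storage_dirs(SITE, "dfs.datanode.data.dir")
```

On Windows the returned path is relative to the current drive (for example /hadoop/data/dfs/namenode on C:), so check the result before deleting anything.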
I think the recommended way of doing this without deleting the data directory is simply to change the clusterID value in the datanode's VERSION file.
If you look in your daemons directory, you will see the datanode directory, for example:
data/hadoop/daemons/datanode
The VERSION file should look like this:
cat current/VERSION
#Tue Oct 14 17:31:58 CDT 2014
storageID=DS-23bf7f3a-085c-4531-808f-801ff6d52d14
clusterID=CID-bb3547b0-03e4-4588-ac25-f0299ff81e4f
cTime=0
datanodeUuid=63154929-ae68-4149-9f75-9a6558545041
storageType=DATA_NODE
layoutVersion=-55
You need to change the clusterID to the first value reported in the error message (the namenode clusterID), so in your case that would be CID-1745a89c-fb08-40f0-a14d-d37d01f199c3 instead of CID-bb3547b0-03e4-4588-ac25-f0299ff81e4f.
The updated file should look like this, with the altered clusterID:
cat current/VERSION
#Tue Oct 14 17:31:58 CDT 2014
storageID=DS-23bf7f3a-085c-4531-808f-801ff6d52d14
clusterID=CID-1745a89c-fb08-40f0-a14d-d37d01f199c3
cTime=0
datanodeUuid=63154929-ae68-4149-9f75-9a6558545041
storageType=DATA_NODE
layoutVersion=-55
Restart hadoop and the datanode should start just fine.
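If you would rather not edit the file by hand, the same change can be sketched in Python. This is only an illustration with a made-up function name. It rewrites the clusterID line in the VERSION text and leaves every other line untouched; back up the original file and stop the datanode before writing the result.

```python
import re

def with_cluster_id(version_text, new_cluster_id):
    """Return VERSION text with its single clusterID line replaced."""
    updated, count = re.subn(
        r"^clusterID=.*$",
        lambda match: "clusterID=" + new_cluster_id,
        version_text,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError(f"expected exactly one clusterID line, found {count}")
    return updated

original = """#Tue Oct 14 17:31:58 CDT 2014
storageID=DS-23bf7f3a-085c-4531-808f-801ff6d52d14
clusterID=CID-bb3547b0-03e4-4588-ac25-f0299ff81e4f
cTime=0
datanodeUuid=63154929-ae68-4149-9f75-9a6558545041
storageType=DATA_NODE
layoutVersion=-55
"""

updated = with_cluster_id(original, "CID-1745a89c-fb08-40f0-a14d-d37d01f199c3")
```

The function raises a ValueError when the text has no clusterID line or more than one, so a malformed file is not silently rewritten.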

Resources