Hadoop Datanode failed to start and is not running

I am trying to install Hadoop 2.7 on Ubuntu 14.04 in VMware, but the datanode always fails to start. When I run hadoop datanode, I get this error:

This issue occurs because the datanode's clusterID does not match the namenode's clusterID; the two must be identical before the datanode will communicate with the namenode.
Try stopping all services, formatting the namenode, and then starting it again.
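To confirm the mismatch before reformatting anything, you can compare the clusterID recorded on each side. A minimal sketch, assuming the data directory layout used later in this thread (substitute the dfs.namenode.name.dir and dfs.datanode.data.dir values from your hdfs-site.xml):
# If these two lines print different IDs, you have hit this exact problem
grep '^clusterID=' /usr/local/hadoop/hadoop_store/hdfs/namenode/current/VERSION
grep '^clusterID=' /usr/local/hadoop/hadoop_store/hdfs/datanode/current/VERSION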

Related

Namenode service unstable in hadoop 1.2.1

I have set up a Hadoop cluster with 1 namenode and 1 datanode (Hadoop 1.2.1), but when I start both nodes, the namenode service dies within seconds (it disappears from the list of running Java processes), while the datanode service stays up. Can anyone please help me find the reason?
I have tried removing the temporary files and re-formatting the namenode before starting it again, but that did not help.
I have attached screenshots of the core-site.xml and hdfs-site.xml entries for both my namenode and datanode.
Please let me know the reason if possible.
hadoop version and location screenshot
core-site.xml of namenode
hdfs-site.xml of namenode
No errors in formatting namenode
jps listing and unstable namenode
hdfs-site.xml of datanode
namenode log
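No answer was posted for this one, but since the namenode starts and then exits, the actual reason will be in the namenode log attached above. A quick way to surface it from the command line (the log file name is an assumption based on Hadoop 1.x defaults under $HADOOP_HOME/logs):
# Show the fatal entries that made the namenode exit
grep -iE 'FATAL|ERROR|Exception' $HADOOP_HOME/logs/hadoop-*-namenode-*.log | tail -n 20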

Apache Spark deployment on Hadoop Yarn Cluster with HA Capability

I am new to the big data environment and have just finished installing a 3-node Hadoop 2.6 cluster with HA capability using ZooKeeper.
Everything works well so far: I have tested the failover scenario via ZooKeeper between NN1 and NN2, and it works.
Now I would like to install Apache Spark on my Hadoop YARN cluster, also with HA capability.
Can anyone guide me through the installation steps? I could only find instructions for setting up Spark in standalone mode, which I have done successfully. Now I want to install it on the YARN cluster, along with HA capability.
I have a three-node cluster (NN1, NN2, DN1); the following daemons are currently running on each of these servers:
Daemons running on the master NameNode (NN1):
Jps
DataNode
DFSZKFailoverController
JournalNode
ResourceManager
NameNode
QuorumPeerMain
NodeManager
Daemons running on the standby NameNode (NN2):
Jps
DFSZKFailoverController
NameNode
QuorumPeerMain
NodeManager
JournalNode
DataNode
Daemons running on the DataNode (DN1):
QuorumPeerMain
Jps
DataNode
JournalNode
NodeManager
You should set up ResourceManager HA (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html). Spark, when run on YARN, does not run its own daemon processes, so there is no Spark component that requires HA in YARN mode.
You can then configure Spark in YARN mode; there, you size the driver and executors according to the cluster capacity, e.g.:
spark.executor.memory <value>
The number of executors allocated depends on your YARN container memory.
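For reference, submitting an application in YARN mode needs no extra Spark daemons; resource requests go through the ResourceManager, so an RM-HA setup covers Spark too. A minimal sketch (memory and executor counts are placeholder values, and the examples jar path varies by Spark version):
# Submit a Spark example in YARN cluster mode;
# HADOOP_CONF_DIR must point at this cluster's Hadoop configuration
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 1g \
  --executor-memory 2g \
  --num-executors 2 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 100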

Datanode does not start correctly

I am trying to install Hadoop 2.2.0 in pseudo-distributed mode. When I try to start the datanode service, it shows the following error. Can anyone please tell me how to resolve this?
2014-03-11 08:48:15,916 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (storage id unknown) service to localhost/127.0.0.1:9000 starting to offer service
2014-03-11 08:48:15,922 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-03-11 08:48:15,922 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2014-03-11 08:48:16,406 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /home/prassanna/usr/local/hadoop/yarn_data/hdfs/datanode/in_use.lock acquired by nodename 3627@prassanna-Studio-1558
2014-03-11 08:48:16,426 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-611836968-127.0.1.1-1394507838610 (storage id DS-1960076343-127.0.1.1-50010-1394127604582) service to localhost/127.0.0.1:9000
java.io.IOException: Incompatible clusterIDs in /home/prassanna/usr/local/hadoop/yarn_data/hdfs/datanode: namenode clusterID = CID-fb61aa70-4b15-470e-a1d0-12653e357a10; datanode clusterID = CID-8bf63244-0510-4db6-a949-8f74b50f2be9
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:837)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:808)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
at java.lang.Thread.run(Thread.java:662)
2014-03-11 08:48:16,427 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-611836968-127.0.1.1-1394507838610 (storage id DS-1960076343-127.0.1.1-50010-1394127604582) service to localhost/127.0.0.1:9000
2014-03-11 08:48:16,532 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-611836968-127.0.1.1-1394507838610 (storage id DS-1960076343-127.0.1.1-50010-1394127604582)
2014-03-11 08:48:18,532 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2014-03-11 08:48:18,534 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2014-03-11 08:48:18,536 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
You can use the following method:
Copy the datanode clusterID, in your example CID-8bf63244-0510-4db6-a949-8f74b50f2be9,
and run the following command from the HADOOP_HOME/bin directory:
./hdfs namenode -format -clusterId CID-8bf63244-0510-4db6-a949-8f74b50f2be9
This formats the namenode with the datanode's cluster ID.
You must do the following:
bin/stop-all.sh
rm -Rf /home/prassanna/usr/local/hadoop/yarn_data/hdfs/*
bin/hadoop namenode -format
I had the same problem until I found an answer on this website.
Whenever you get the error below while trying to start a DN on a slave machine:
java.io.IOException: Incompatible clusterIDs in /home/hadoop/dfs/data: namenode clusterID = ****; datanode clusterID = ****
it is because, after you set up your cluster, you decided for whatever reason to reformat your NN. Your DNs on the slaves still hold a reference to the old NN.
To resolve this, simply delete and recreate the data folder on that machine in the local Linux FS, namely /home/hadoop/dfs/data.
Restarting that DN's daemon on that machine will recreate the data/ folder's contents and resolve the problem.
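As a concrete sketch of that fix (the /home/hadoop/dfs/data path is the one from this answer; substitute whatever dfs.datanode.data.dir points to in your hdfs-site.xml):
# On the affected slave: wipe the stale datanode storage, then restart the daemon,
# which repopulates data/ with the namenode's current clusterID
rm -rf /home/hadoop/dfs/data
mkdir -p /home/hadoop/dfs/data
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode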
Do the following simple steps:
Clear the data directory of Hadoop
Format the namenode again
Start the cluster
After this, your cluster will start normally, provided you don't have any other configuration issue.
The DataNode dies because its clusterID is incompatible with the NameNode's. To fix this, delete the directory /tmp/hadoop-[user]/hdfs/data and restart Hadoop:
rm -r /tmp/hadoop-[user]/hdfs/data
I got a similar issue in my pseudo-distributed environment. I stopped the cluster first, then copied the cluster ID from the NameNode's VERSION file into the DataNode's VERSION file; after restarting the cluster, everything was fine.
My data paths are /usr/local/hadoop/hadoop_store/hdfs/datanode and /usr/local/hadoop/hadoop_store/hdfs/namenode.
FYI: the VERSION file is under /usr/local/hadoop/hadoop_store/hdfs/datanode/current/; likewise for the NameNode.
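That copy can be scripted so the ID cannot be mistyped. A minimal sketch, assuming the same paths as above:
# Read the namenode's clusterID and write it into the datanode's VERSION file
NN_CID=$(grep '^clusterID=' /usr/local/hadoop/hadoop_store/hdfs/namenode/current/VERSION | cut -d= -f2)
sed -i "s/^clusterID=.*/clusterID=${NN_CID}/" /usr/local/hadoop/hadoop_store/hdfs/datanode/current/VERSION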
Here, the datanode stops immediately because the clusterIDs of the datanode and namenode are different, so you have to format the namenode with the clusterID of the datanode.
Copy the datanode clusterID, in this example CID-8bf63244-0510-4db6-a949-8f74b50f2be9, and run the following command from your home directory. (You can go to your home directory by just typing cd in your terminal.)
From your home directory, type the command:
hdfs namenode -format -clusterId CID-8bf63244-0510-4db6-a949-8f74b50f2be9
Delete the namenode and datanode directories specified in core-site.xml.
Then create the new directories and restart dfs and yarn.
I also had a similar issue.
I deleted the namenode and datanode folders from all the nodes and reran:
$HADOOP_HOME/bin> hdfs namenode -format -force
$HADOOP_HOME/sbin> ./start-dfs.sh
$HADOOP_HOME/sbin> ./start-yarn.sh
To check the health report from the command line (which I would recommend):
$HADOOP_HOME/bin> hdfs dfsadmin -report
and I got all the nodes working correctly.
I had the same issue with Hadoop 2.7.7.
I removed the namenode/current and datanode/current directories on the namenode and all the datanodes,
removed the files at /tmp/hadoop-ubuntu/*,
then formatted the namenode and datanode,
and restarted all the nodes; after that, things worked fine.
Steps:
Stop all nodes/managers, then attempt the steps below:
rm -rf /tmp/hadoop-ubuntu/* (all nodes)
rm -r /usr/local/hadoop/data/hdfs/namenode/current (namenode: check hdfs-site.xml for path)
rm -r /usr/local/hadoop/data/hdfs/datanode/current (datanode:check hdfs-site.xml for path)
hdfs namenode -format (on namenode)
hdfs datanode -format (on each datanode)
Restart the namenode and datanodes
There have been different solutions to this problem, but I tested another easy one and it worked like a charm:
if you get the same error, you just need to change the clusterID in the datanodes' VERSION file to the clusterID of the namenode.
In your case, here's where you can change it on the datanode side:
namenode clusterID = CID-fb61aa70-4b15-470e-a1d0-12653e357a10; datanode clusterID = CID-8bf63244-0510-4db6-a949-8f74b50f2be9
Back up the current VERSION: cp /home/prassanna/usr/local/hadoop/yarn_data/hdfs/datanode/current/VERSION /home/prassanna/usr/local/hadoop/yarn_data/hdfs/datanode/current/VERSION.BK
vim /home/prassanna/usr/local/hadoop/yarn_data/hdfs/datanode/current/VERSION and change
clusterID=CID-8bf63244-0510-4db6-a949-8f74b50f2be9
with
clusterID=CID-fb61aa70-4b15-470e-a1d0-12653e357a10
Restart the datanode and it should work.

Hadoop 2.2.0 jobtracker is not starting

It seems I have no jobtracker with Hadoop 2.2.0: jps does not show it, nothing is listening on port 50030, and there are no jobtracker logs inside the logs folder. Is this because of YARN? How can I configure and start the jobtracker?
If you are using the YARN framework, there is no jobtracker in it; its functionality is split between the ResourceManager and the ApplicationMaster. Here is the expected jps printout while running YARN:
$jps
18509 Jps
17107 NameNode
17170 DataNode
17252 ResourceManager
17309 NodeManager
17626 JobHistoryServer
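If MapReduce jobs still look for a jobtracker, the usual thing to verify is that MapReduce is pointed at YARN in mapred-site.xml. A minimal sketch, assuming the standard config directory (back the file up first if it already has content):
# Point MapReduce at YARN instead of the removed jobtracker
cat > $HADOOP_HOME/etc/hadoop/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
# The JobHistoryServer in the listing above is started separately:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver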

Datanode Dies After a Few Seconds

I am running Apache Hadoop version 1.0.4.
I followed a tutorial here: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ with some tweaks to set up Hadoop.
The datanode starts when I use the start-dfs.sh script, but dies after a short time (less than a second):
hduser@mudit-Studio-1535:/usr/local/hadoop/bin$ jps
25672 NameNode
26276 SecondaryNameNode
25970 DataNode
26328 Jps
hduser@mudit-Studio-1535:/usr/local/hadoop/bin$ jps
25672 NameNode
26276 SecondaryNameNode
26360 Jps
A similar problem occurs when I use start-all.sh.
I tried formatting the namenode with ./hadoop namenode -format, but I am still getting the same problem.
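No answer is recorded here either, but since the DataNode appears in jps and then disappears, the exit reason will be in its log. A quick diagnostic sketch (the log path is an assumption based on the tutorial's /usr/local/hadoop layout):
# Look for the fatal entry that made the datanode exit
grep -iE 'FATAL|ERROR|Incompatible' /usr/local/hadoop/logs/hadoop-hduser-datanode-*.log | tail -n 20
On Hadoop 1.x, a datanode that dies right after a namenode reformat typically logs an "Incompatible namespaceIDs" error, the 1.x analogue of the clusterID mismatch discussed above, and the same delete-the-data-directory fix applies.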
