I have set up a 2-node Hadoop cluster (Cloudera Hadoop 2.7.1):
namenode and
datanode.
Is it possible to make the namenode work as a datanode as well?
Although I added the datanode entry to the namenode's hdfs-site.xml and added the namenode's host name to the slaves file, it is still not working as a datanode.
Here are my files:
# masters # : NameNode
# slaves # : NameNode
dataprocessor1
hdfs-site.xml:
This hdfs-site.xml is from the NameNode. In this XML file I have added the properties for both the namenode and the datanode.
I have set up a Hadoop cluster with 1 namenode and 1 datanode (Hadoop version 1.2.1), but when I start both nodes, the namenode service dies within seconds (it no longer appears in the list of running Java processes), while the datanode service stays up. Can anyone help me find the reason?
I have tried removing the temporary files and re-formatting the namenode before starting it again, but that did not help.
I have attached the screenshots of my core-site.xml and hdfs-site.xml entries for both my namenode and datanodes.
Please let me know the reason if possible.
[Screenshots: hadoop version and location; core-site.xml of namenode; hdfs-site.xml of namenode; no errors in formatting namenode; jps listing and unstable namenode; hdfs-site.xml of datanode; namenode log]
I'm learning Hadoop 2.9.2, especially namenode HA.
When I first set up the cluster, I put dfs.namenode.secondary.http-address into hdfs-site.xml and it worked properly.
After reading this, I found out that a secondary namenode is not needed under Hadoop HA:
Note that, in an HA cluster, the Standby NameNode also performs
checkpoints of the namespace state, and thus it is not necessary to
run a Secondary NameNode, CheckpointNode, or BackupNode in an HA
cluster. In fact, to do so would be an error. This also allows one who
is reconfiguring a non-HA-enabled HDFS cluster to be HA-enabled to
reuse the hardware which they had previously dedicated to the
Secondary NameNode.
Finally, I removed dfs.namenode.secondary.http-address from hdfs-site.xml and restarted the cluster, but the secondarynamenode on 0.0.0.0 is still there.
Here is my hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/data/nameNode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/data/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
The hdfs command shows the same thing:
$ hdfs getconf -secondarynamenodes
18/12/21 00:35:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
0.0.0.0
I have made sure that all config files are synchronized across all nodes.
As I understand it, I need to get rid of the secondarynamenode before moving to an HA setup.
If that is right, how can I do it?
I know that I can run hadoop-daemon.sh stop secondarynamenode, but I don't want it to start with start-dfs.sh in the first place.
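One way to double-check that the property is really gone from every node's copy of hdfs-site.xml is to print whatever value the file still sets for it. The sketch below is a minimal, line-oriented helper (property_value is a hypothetical name, not a Hadoop tool) and assumes each <name> and <value> tag sits on its own line, as in the listing above. An empty result means the file does not set the property; in that case Hadoop presumably falls back to its built-in default, which may be where the 0.0.0.0 comes from.

```shell
# Print the <value> that follows <name>PROP</name> in a Hadoop *-site.xml file.
# Usage: property_value FILE PROP
# Assumption: one tag per line; this is not a real XML parser.
property_value() {
    awk -v prop="$2" '
        # Remember that we have seen the requested property name.
        index($0, "<name>" prop "</name>") { want = 1; next }
        # Print the text between <value> and </value>, then stop.
        want && match($0, /<value>.*<\/value>/) {
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
        # Never carry a match over into the next <property> element.
        /<\/property>/ { want = 0 }
    ' "$1"
}
```

For example, running property_value /path/to/hdfs-site.xml dfs.namenode.secondary.http-address on each node (the path is a placeholder) should print nothing once the property has been removed everywhere.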
I'm deploying Hadoop as a multi-node cluster (distributed mode), but each datanode has a different cluster ID.
On slave1,
java.io.IOException: Incompatible clusterIDs in /home/pushuser1/hadoop/tmp/dfs/data: namenode clusterID = CID-c72a7d30-ec64-4e4f-9a80-e6f9b6b1d78c; datanode clusterID = CID-2ecca585-6672-476e-9931-4cfef9946c3b
On slave2,
java.io.IOException: Incompatible clusterIDs in /home/pushuser1/hadoop/tmp/dfs/data: namenode clusterID = CID-c72a7d30-ec64-4e4f-9a80-e6f9b6b1d78c; datanode clusterID = CID-e24b0548-2d8d-4aa4-9b8c-a336193c006e
I also followed the linked question "Datanode not starts correctly", but I don't know which cluster ID I should pick. If I pick one, the datanode starts on that machine but not on the other. Also, when I format the namenode with the basic command (hadoop namenode -format), the datanodes on the slave nodes start, but then the namenode on the master machine doesn't start.
The clusterIDs of the datanodes and the namenode must match; only then can the datanodes communicate with the namenode. When you format the namenode, it is assigned a new clusterID, so the clusterIDs stored on the datanodes no longer match.
You can find a VERSION file containing the clusterID in the datanode directory (/home/pushuser1/hadoop/tmp/dfs/data/current/) and in the namenode directory (/home/pushuser1/hadoop/tmp/dfs/name/current/, depending on the value you specified for dfs.namenode.name.dir).
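To see which clusterID each side actually holds, the VERSION files can be read directly. The following is a minimal sketch, assuming the usual key=value layout with a line such as clusterID=CID-...; the function names cluster_id and same_cluster are hypothetical helpers, not Hadoop commands.

```shell
# Print the clusterID recorded in a Hadoop storage VERSION file.
# Assumption: the file contains a line of the form clusterID=CID-...
cluster_id() {
    sed -n 's/^clusterID=//p' "$1"
}

# Succeed (exit status 0) only when both VERSION files carry the same,
# non-empty clusterID.
# Usage: same_cluster NAMENODE_VERSION DATANODE_VERSION
same_cluster() {
    nn_id=$(cluster_id "$1")
    dn_id=$(cluster_id "$2")
    [ -n "$nn_id" ] && [ "$nn_id" = "$dn_id" ]
}
```

Since the namenode and datanode directories normally live on different machines, you would either run cluster_id on each host and compare the output by eye, or copy one VERSION file over and call same_cluster on the pair.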
If you are ready to format your HDFS namenode, stop all HDFS services and clear out all files inside the following directories:
rm -rf /home/pushuser1/hadoop/tmp/dfs/data/* (run this on all datanodes)
rm -rf /home/pushuser1/hadoop/tmp/dfs/name/*
Then format HDFS again (hadoop namenode -format).
I am trying to set up an Apache Hadoop 2.3.0 cluster. I have a master and three slave nodes; the slave nodes are listed in the $HADOOP_HOME/etc/hadoop/slaves file, and I can telnet from the slaves to the master namenode on port 9000. However, when I start the datanode on any of the slaves, I get the following exception.
2014-08-03 08:04:27,952 FATAL
org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed
for block pool Block pool BP-1086620743-xx.xy.23.162-1407064313305
(Datanode Uuid null) service to
server1.mydomain.com/xx.xy.23.162:9000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException):
Datanode denied communication with namenode because hostname cannot be
resolved .
The following are the contents of my core-site.xml.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://server1.mydomain.com:9000</value>
</property>
</configuration>
Also in my hdfs-site.xml I am not setting any value for dfs.hosts or dfs.hosts.exclude properties.
Thanks.
Each node needs a unique, fully qualified hostname.
Your error says:
hostname cannot be resolved
Check the /etc/hosts file on each slave (cat /etc/hosts) and make sure every node has a distinct hostname.
After that, try again.
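The /etc/hosts check can be scripted so it is easy to repeat on every node. This is a rough sketch that only inspects a hosts-format file; it does not query DNS. The helper name host_in_file is made up for illustration.

```shell
# Succeed when NAME appears as a hostname or alias on a non-comment line of a
# hosts-format file.
# Usage: host_in_file NAME [FILE]   (FILE defaults to /etc/hosts)
host_in_file() (
    name=$1
    file=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/#.*/, "") }    # strip comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$file"
)
```

For instance, host_in_file server1.mydomain.com || echo "missing" run on each slave would flag a node whose hosts file has no entry for the master named in core-site.xml; the same check can be repeated for each slave's own hostname.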
I'm trying to build a Hadoop cluster consisting of 1 namenode, 1 secondary namenode, and 3 datanodes on EC2.
So I wrote the address of the secondary namenode in the masters file and executed start-dfs.sh.
:~/hadoop/etc/hadoop$ cat masters
ec2-54-187-222-213.us-west-2.compute.amazonaws.com
But the secondary namenode didn't start at the address written in the masters file. It just started on the node where the start-dfs.sh script was executed.
:~/hadoop/etc/hadoop$ start-dfs.sh
...
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/ubuntu/hadoop/logs/hadoop-ubuntu-secondarynamenode-ip-172-31-26-190.out
I can't figure out why the secondary namenode started at [0.0.0.0]. It should start at ec2-54-187-222-213.us-west-2.compute.amazonaws.com.
Does anyone know the reason?
============================================================
Oh I solved this problem. I added
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>ec2-54-187-222-213.us-west-2.compute.amazonaws.com:50090</value>
</property>
to hdfs-site.xml file and it works! The masters file is useless.
That is fine as long as the node roles are configured correctly in the Hadoop configuration. You can use dfsadmin to check the IP address of the secondary namenode; if it is 172.31.26.190, everything is fine. A secondary namenode serving at 0.0.0.0 accepts incoming connections from localhost or from any node within the network.