Datanode not allowed to connect to the Namenode in Hadoop 2.3.0 cluster - hadoop

I am trying to set up a Apache Hadoop 2.3.0 cluster , I have a master and three slave nodes , the slave nodes are listed in the $HADOOP_HOME/etc/hadoop/slaves file and I can telnet from the slaves to the Master Name node on port 9000, however when I start the datanode on any of the slaves I get the following exception .
2014-08-03 08:04:27,952 FATAL
org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed
for block pool Block pool BP-1086620743-xx.xy.23.162-1407064313305
(Datanode Uuid null) service to
server1.mydomain.com/xx.xy.23.162:9000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException):
Datanode denied communication with namenode because hostname cannot be
resolved .
The following are the contents of my core-site.xml.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://server1.mydomain.com:9000</value>
</property>
</configuration>
Also in my hdfs-site.xml I am not setting any value for dfs.hosts or dfs.hosts.exclude properties.
Thanks.

Each node needs fully qualified unique hostname.
Your error says
hostname cannot be resolved
Can you cat /etc/hosts file on your each slave an make them having distnct hostname
After that try again

Related

Hadoop java.net.SocketException: Network is unreachable

I´m executing this command in a 4 node hadoop cluster on the namenode node:
hadoop fs -ls /
But it shows an error:
ls: Failed on local exception: java.net.SocketException:
Network is unreachable; Host Details: local host is "namenode/172.16.1.2";
destination host is: "namenode":9000;
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:9000</value>
</property>
</configuration>
cat /etc/hosts:
172.16.1.2 namenode
172.16.1.3 datanode1
172.16.1.4 datanode2
172.16.1.5 datanode3
First try to ping namenode and see what happen. If ping reaches the host, check the firewall via iptables on your current machine and namenode because it is probably blocking related traffic.
For me work setting JVM config
-Djava.net.preferIPv4Stack=true

HBase fails to start in single node cluster mode on Mac OSX

I am trying to get a personal HBase development environment set up. I have hdfs and yarn running, but cannot get HBase to start.
I have started up hadoop 2.7.1, by running start-dfs.sh and start-yarn.sh. I have verified these are running by testing hdfs dfs -mkdir /test and running a sample MR job bundled in the examples, I have browsed HDFS at port 50070.
I have started zookeeper 3.4.6 on port 2181 and set its dataDir. My zoo.cfg has:
dataDir=/Users/.../tools/hd/zookeeper_data
clientPort=2181
I observe its zookeeper_server.PID file in the dataDir I chose, and when I run jps I see the below:
51074 NodeManager
50743 DataNode
50983 ResourceManager
50856 SecondaryNameNode
57848 QuorumPeerMain
58731 Jps
50653 NameNode
QuorumPeerMain above matches the PID in zookeeper_server.PID, as I would expect. Is this expectation correct? From what I have done so far, should it be expected that any more processes should be showing here?
I installed hbase-1.1.2. I configure hbase-site.xml. I set the hbase.rootDir to be hdfs://localhost:8200/hbase, my hdfs is running at localhost:8200. I set hbase.zookeeper.property.dataDir to my zookeeper's dataDir, with the expectation that it will use this property to find the PID of a running zookeeper. Is this expectation correct or have I misunderstood? The config in hbase-site.xml is:
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>Users/.../tools/hd/zookeeper_data</value>
</property>
When I run start-hbase.sh my server fails to start. I see this log message:
2015-09-26 19:32:43,617 ERROR [main] master.HMasterCommandLine: Master exiting
To investigate I ran hbase master start and get more detail:
2015-09-26 19:41:26,403 INFO [Thread-1] server.NIOServerCnxn: Stat command output
2015-09-26 19:41:26,405 INFO [Thread-1] server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:63334 (no session established for client)
2015-09-26 19:41:26,406 INFO [main] zookeeper.MiniZooKeeperCluster: Started MiniZooKeeperCluster and ran successful 'stat' on client port=2182
Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
2015-09-26 19:41:26,406 ERROR [main] master.HMasterCommandLine: Master exiting
java.io.IOException: Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:214)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:139)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2304)
So I have a few questions:
Should I be trying to set up a zookeeper before running HBase?
Why when I have started a zookeeper and told HBase where its dataDir is, does HBase try to start its own zookeeper?
Anything obviously stupid/misguided in the above?
The script you are using to start hbase start-hbase.sh will try to start the following components, in order:
zookeeper
hbase master
hbase regionserver
hbase master-backup
So, you could either stop the zookeeper which is started by you (or) you could start the daemons individually yourself:
# start hbase master
bin/hbase-daemon.sh --config ${HBASE_CONF_DIR} start master
# start region server
bin/hbase-daemons.sh --config ${HBASE_CONF_DIR} --hosts ${HBASE_CONF_DIR}/regionservers start regionserver
HBase stand alone starts it's own zookeeper (if you run start-hbase.sh), but it if fails to start or keep running, the other need hbase daemons won't work.
Make sure you explicitly set the properties for your interface lo0 in the hbase-site.xml file:
<property>
<name>hbase.zookeeper.dns.interface</name>
<value>lo0</value>
</property>
<property>
<name>hbase.regionserver.dns.interface</name>
<value>lo0</value>
</property>
<property>
<name>hbase.master.dns.interface</name>
<value>lo0</value>
</property>
I found that when my wifi was on, if these entries were missing, zookeeper filed to start.

hadoop Protocol message tag had invalid wire type

I Set up hadoop 2.6 cluster using two nodes of 8 cores each on Ubuntu 12.04. sbin/start-dfs.sh and sbin/start-yarn.sh both succeed. And I can see the following after jps on the master node.
22437 DataNode
22988 ResourceManager
24668 Jps
22748 SecondaryNameNode
23244 NodeManager
The jps outcome on the slave node is
19693 DataNode
19966 NodeManager
I then run the PI example.
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 30 100
Which gives me there error-log
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "Master-R5-Node/xxx.ww.y.zz"; destination host is: "Master-R5-Node":54310;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
The problem seems with the HDFS file system since trying out the command bin/hdfs dfs -mkdir /user fails with the similar exception.
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "Master-R5-Node/xxx.ww.y.zz"; destination host is: "Master-R5-Node":54310;
where xxx.ww.y.zz is the ip-address of Master-R5-Node
I have checked and followed all the recommendations of ConnectionRefused on Apache and on this site.
Despite the week long effort, I cannot get it fixed.
Thanks.
There are so many reasons to what may lead to the problem I faced. But I finally ended up fixing it using some of the following things.
Make sure that you have the needed permission to the /hadoop and hdfs temporary files. (you have to figure out where that is for your paticular case)
remove the port number from fs.defaultFS in $HADOOP_CONF_DIR/core-site.xml. It should look like this:
`<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://my.master.ip.address/</value>
<description>NameNode URI</description>
</property>
</configuration>`
Add the following two properties to `$HADOOP_CONF_DIR/hdfs-site.xml
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
Voila! You should now be up and running!

Hadoop: How to start secondary namenode on other node?

I'm trying to construct hadoop cluster which consists of 1 namenode, 1 secondary namenode, and 3 datanodes in ec2.
So I wrote the address of secondary namenode to the masters file and executed start-dfs.sh .
:~/hadoop/etc/hadoop$ cat masters
ec2-54-187-222-213.us-west-2.compute.amazonaws.com
But, the secondary namenode didn't start at the address which was written in the masters file. It just started at the node where the stat-dfs.sh script was executed.
:~/hadoop/etc/hadoop$ start-dfs.sh
...
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/ubuntu/hadoop/logs/hadoop-ubuntu-secondarynamenode-ip-172-31-26-190.out
I don't figure why secondary namenode started at [0.0.0.0]. It should start at ec2-54-187-222-213.us-west-2.compute.amazonaws.com.
Are there anyone who know this reason?
============================================================
Oh I solved this problem. I added
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>ec2-54-187-222-213.us-west-2.compute.amazonaws.com:50090</value>
</property>
to hdfs-site.xml file and it works! The masters file is useless.
It is okay, as long as the node roles are configured correctly in hadoop configuration. You can use dfsadmin to check the IP address of the secondary namenode. If it is 172.31.26.190 then it means fine. The secondary namenode serves at 0.0.0.0 means it accepts any incoming connections from localhost or from any nodes within the network.

Start Hadoop 50075 Port is not resolved

I have installed Hadoop in my system, Jobtracker : localhost:50030/jobtracker.jsp is working fine but localhost:50075/ host is not resolved. Can anybody help my what is the problem in my Ubuntu system. Below check my code-site.xml configuration :
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
never seen 50075 before, but 50070 is the local NameNode, I suggest you format the NameNode to have a try:
rm -r /tmp/hadoop-*;
bin/hadoop namenode -format;
./bin/start-all.sh
Check your port configurations. Here is a list of hadoop daemon configurable parameters
dfs.http.address: The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.
The default for the name node is usually set to port 50070, so try localhost:50070.
http://hadoop.apache.org/common/docs/r0.20.2/hdfs-default.html
I think you mean core-site.xml (not code-site.xml). The fs.default.name configuration here does not determine the location of the hadoop web ui/dashboard. This is the port used by data nodes to communicate with the name node.

Resources