I have set up a Hadoop cluster of 5 virtual machines, using plain vanilla Hadoop. The cluster details are below:
192.168.1.100 - Configured to run NameNode and SNN daemons
192.168.1.101 - Configured to run ResourceManager daemon
192.168.1.102 - Configured to run DataNode and NodeManager daemons
192.168.1.103 - Configured to run DataNode and NodeManager daemons
192.168.1.104 - Configured to run DataNode and NodeManager daemons
I have kept the masters and slaves files on each virtual server.
masters file:
192.168.1.100
192.168.1.101
slaves file:
192.168.1.102
192.168.1.103
192.168.1.104
Now, when I run the start-all.sh command from the NameNode machine, how is it able to start all the daemons? I am not able to understand it. There are no adapters installed (or none that I am aware of); there are just plain Hadoop jars present on all the machines, so how is the NameNode machine able to start all the daemons on all the machines (virtual servers)?
Can anyone help me understand this?
The namenode connects to the slaves via SSH and runs the slave services there.
That is why you need the public SSH keys in ~/.ssh/authorized_keys on the slaves, with their private counterparts present for the user running Hadoop on the namenode.
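Conceptually, the start scripts just loop over the hosts listed in the slaves file and launch each daemon remotely. A rough sketch of what the helper scripts (slaves.sh / hadoop-daemons.sh) effectively do, simplified and not the literal script:
# simplified sketch of the start-all.sh / start-dfs.sh / start-yarn.sh behaviour
for host in $(cat "$HADOOP_CONF_DIR/slaves"); do
    ssh "$host" "$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode"
    ssh "$host" "$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager"
done
So the only thing the slaves need, besides the Hadoop installation itself, is a running SSH server that accepts the master's key.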
I have an Ubuntu server VM in VirtualBox (on Mac OS X), and I configured a Hadoop cluster via Docker: 1 master (172.17.0.3) and 2 slave nodes (172.17.0.4, 172.17.0.6). After running "./sbin/start-dfs.sh" under the Hadoop home folder, I found the below error on the datanode machine:
Datanode denied communication with namenode because hostname cannot be resolved (ip=172.17.0.4, hostname=172.17.0.4): DatanodeRegistration(0.0.0.0, datanodeUuid=4c613e35-35b8-41c1-a027-28589e007e78, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=CID-9bac5643-1f9f-4bc0-abba-34dba4ddaff6;nsid=1748115706;c=0)
Because Docker does not support bidirectional name linking and, furthermore, my Docker version does not allow editing the /etc/hosts file, I used IP addresses to set the name node and slaves. The following is my slaves file:
172.17.0.4
172.17.0.6
After searching on Google and Stack Overflow, no solution worked for my problem. However, I guess that the Hadoop NameNode regards 172.17.0.4 as a "hostname", so it reports "hostname cannot be resolved" where "hostname=172.17.0.4".
Any Suggestions?
Finally I found a solution, which confirmed my guess:
1. Upgrade Docker to 1.4.1, following the instructions from https://askubuntu.com/questions/472412/how-do-i-upgrade-docker.
2. Write IP => hostname mappings of the master and slaves into /etc/hosts.
3. Use hostnames instead of IP addresses in the Hadoop slaves file (see the example after this list).
4. Run ./sbin/start-dfs.sh.
5. Done!
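For example (the hostnames here are illustrative; use whatever names you gave your containers), /etc/hosts on every node would contain the IP => hostname mappings:
172.17.0.3 master
172.17.0.4 slave1
172.17.0.6 slave2
and the slaves file would then list the hostnames instead of the IPs:
slave1
slave2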
I have set up multinode Hadoop with 3 datanodes and 1 namenode using VirtualBox on Ubuntu. My host system serves as the NameNode (also a datanode) and two VMs serve as DataNodes. My systems are:
192.168.1.5: NameNode (also datanode)
192.168.1.10: DataNode2
192.168.1.11: DataNode3
I am able to SSH to all systems from each system. My hadoop/etc/hadoop/slaves file on all systems has the entries:
192.168.1.5
192.168.1.10
192.168.1.11
The hadoop/etc/hadoop/master file on all systems has the entry: 192.168.1.5
All of core-site.xml, yarn-site.xml, hdfs-site.xml, mapred-site.xml, and hadoop-env.sh are the same on all machines, except that the dfs.namenode.name.dir entry in hdfs-site.xml is missing on both DataNodes.
When I execute start-yarn.sh and start-dfs.sh from the NameNode, everything works fine, and through jps I am able to see all required services on all machines.
Jps on NameNode:
5840 NameNode
5996 DataNode
7065 Jps
6564 NodeManager
6189 SecondaryNameNode
6354 ResourceManager
Jps on DataNodes:
3070 DataNode
3213 NodeManager
3349 Jps
However, when I check namenode:50070/dfshealth.html#tab-datanode and namenode:50070/dfshealth.html#tab-overview, both indicate only 2 datanodes.
tab-datanode shows NameNode and DataNode2 as active datanodes. DataNode3 is not displayed at all.
I checked all the configuration files (the xml and sh files mentioned above, plus slaves/master) multiple times to make sure nothing is different on either datanode.
The /etc/hosts file on all systems also contains entries for all nodes:
127.0.0.1 localhost
#127.0.1.1 smishra-VM2
192.168.1.11 DataNode3
192.168.1.10 DataNode2
192.168.1.5 NameNode
One thing I would like to mention is that I configured 1 VM first and then made a clone of it, so both VMs have the same configuration. That makes it even more confusing why one datanode is shown but not the other.
Take a look at http://blog.cloudera.com/blog/2014/01/how-to-create-a-simple-hadoop-cluster-with-virtualbox/
I'll bet that your problems come from the network configuration on your VirtualBox VMs. The post above has a lot of detail on how to ensure that the internal network between the VMs is set up correctly, with forward and reverse name resolution working, no duplicate MAC addresses, etc., which is critical for a Hadoop cluster to work correctly.
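As a quick sanity check (a sketch only; substitute your own hostnames and adjust for your interfaces), you can run something like this on each node:
hostname -f                     # the name this node reports for itself
getent hosts DataNode3          # forward lookup of the node that is missing
getent hosts 192.168.1.11       # reverse lookup should come back as DataNode3
ping -c 1 DataNode3             # basic reachability
ip link show | grep ether       # cloned VMs must not share a MAC address
If any of these disagree between nodes, the datanode may be registering under a name or address the namenode cannot match against its slaves list.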
I have properly configured a two-node cluster environment for Hadoop, and the Master is also configured as a datanode.
So currently I have two datanodes, and I am able to start all the services on the Master without any issue.
The slave datanode can also be started and stopped from the Master node.
But when I check the health using the URL http://<IP>:50070/dfshealth.jsp, the live node count always shows only one, not two.
Master Process:
~/hadoop-1.2.0$ jps
9112 TaskTracker
8805 SecondaryNameNode
9182 Jps
8579 DataNode
8887 JobTracker
8358 NameNode
Slave Process:
~/hadoop-1.2.0$ jps
18130 DataNode
18380 Jps
18319 TaskTracker
Please help me understand what I am doing wrong.
The second DataNode is running but not connecting to the NameNode. Chances are you re-formatted the NameNode and now have different version numbers in the NameNode and DataNode.
A fix is to manually delete the directory where the DataNode keeps its data (dfs.datanode.data.dir) and then reformat the NameNode. A less extreme fix is to manually edit the VERSION file, but for study purposes you can just axe the whole directory.
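A rough sketch of the "axe the whole directory" approach (the data path below is only an example; use whatever dfs.datanode.data.dir points to in your hdfs-site.xml, and note that reformatting destroys all HDFS data):
stop-all.sh                          # on the master: stop the cluster first
rm -rf /app/hadoop/tmp/dfs/data/*    # on each datanode: example path, check dfs.datanode.data.dir
hadoop namenode -format              # on the namenode: reformat
start-all.sh                         # start the cluster again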
Finally, I got the solution.
After @charles's input, I checked the DataNode logs and found the below error:
org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.157.132:8020. Already tried 9 time(s);
I was able to SSH, but there was an issue with telnet from the datanode to the master on port 8020:
>telnet 192.168.157.132 8020
Trying 192.168.157.132...
telnet: connect to address 192.168.157.132: No route to host
I just added a rule in iptables to allow port 8020 using the below command, restarted the Hadoop services, and everything worked fine.
iptables -I INPUT 5 -p tcp --dport 8020 -j ACCEPT
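Note that a rule added this way is typically not persistent across reboots; depending on the distribution you may also need something along the lines of (RHEL/CentOS-style example):
service iptables save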
It was only a firewall issue.
Thanks to all for the valuable input.
I am trying to configure a multinode cluster with one master and one slave on my laptop. When I run start-all.sh from the master, all daemon processes run on the master node, but the DataNode and TaskTracker do not start on the slave node. Passwordless SSH is enabled, and I can SSH to both the master and the slave from my master node without a password, but if I try to SSH to the master from the slave node, it asks for a password. Is this the reason the daemon processes are not starting on the slave node? Do we need passwordless SSH on both master and slave?
SSH from the slave node to the slave itself does not ask for a password; only SSH to the master asks. Please give me some solution for why I am not able to start the processes on the slave node from the master node.
You don't need password-less ssh from slave to master, only from master to slave.
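If it is not already in place, passwordless SSH from master to slave can be set up roughly like this (run as the user that starts Hadoop on the master; "hduser@slave" is just a placeholder for your own user and slave hostname):
ssh-keygen -t rsa            # accept the defaults, leave the passphrase empty
ssh-copy-id hduser@slave     # appends the public key to ~/.ssh/authorized_keys on the slave
ssh hduser@slave             # should now log in without asking for a password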
A few things to consider (quick command-line checks follow the list):
Can you run hadoop locally on the slave node?
Is the slave node included in the $HADOOP_CONF_DIR/slaves file of the master?
Have you added the slave node in the /etc/hosts file of the master?
Are there any error messages in the log files of the slave?
Is the same version of hadoop installed on the same path on both machines?
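A quick way to check the last few points (paths assume a typical tarball install; adjust to your layout):
cat $HADOOP_CONF_DIR/slaves                    # on the master: is the slave listed?
grep -i slave /etc/hosts                       # on the master: is the slave's IP/hostname mapped?
tail -n 50 $HADOOP_HOME/logs/*datanode*.log    # on the slave: look for connection errors
hadoop version                                 # run on both machines and compare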
I'm setting up a Hadoop 2.2 cluster. I have successfully configured a master and a slave. When I run start-dfs.sh and start-yarn.sh on the master, all the daemons start correctly.
To be specific, on the master the following are running:
DataNode
NodeManager
NameNode
ResourceManager
SecondaryNameNode
On the slave, the following are running:
DataNode
NodeManager
When I open http://master-host:50070 I see that there is only 1 "Live Node" and it is referring to the datanode on the master.
The datanode on the slave is started, but it is not able to tell the master that it started. This is the only error I can find:
From /logs/hadoop-hduser-datanode.log on the slave:
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ec2-xx-xxx-xx-xx.compute-1.amazonaws.com/xx.xxx.xx.xxx:9001
Things I have checked/verified:
9001 is open
both nodes can ssh into each other
both nodes can ping each other
Any suggestions are greatly appreciated.
My issue was in the hosts file:
The hosts file on the slave and the master needed to be (they're identical):
127.0.0.1 localhost
<master internal ip> master
<slave internal ip> slave
For AWS you need to use the internal IP, which is something like xx.xxx.xxx.xxx (not the external IP in ec2-xx-xx-xxx.aws.com, and not ip-xx-xx-xxx).
Also, core-site.xml should refer to the location of HDFS as hdfs://master:9000.
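In other words, the fs.defaultFS property in core-site.xml on both nodes should look something like the following (property name as in Hadoop 2.x; the hostname must match the hosts file above):
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>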