Hadoop Multinode Cluster, slave permission denied

I'm trying to set up a multinode cluster (actually 2 nodes: 1 master and 1 slave) on Hadoop. I followed the instructions in Multinode Cluster for Hadoop 2.x.
When I execute the command:
./sbin/start-all.sh
I get this error message for my slave node:
slave: Permission denied (publickey)
I have already modified the .ssh/authorized_keys files on both master and slave, adding the public keys from .ssh/id_rsa.pub of both master and slave.
Finally, I restarted ssh with sudo service ssh restart on both nodes (master and slave).
When I run ./sbin/start-all.sh, the master node has no problem, but the slave node returns the permission denied error.
Does anybody have an idea why I cannot reach the slave node?
Running jps currently gives the following result:
master
18339 Jps
17717 SecondaryNameNode
18022 NodeManager
17370 NameNode
17886 ResourceManager
slave
2317 Jps
I think the master is OK, but I have trouble with the slave.

After running ssh-keygen on the master, copy the master's id_rsa.pub into authorized_keys on all the slaves using cat id_rsa.pub >> authorized_keys. Test the password-less ssh using:
ssh <slave_node_IP>

If you copied the whole hadoop folder from the master to the slave nodes (for easy replication), make sure that the slave node's hadoop folder is owned by the correct user on the slave system.
sudo chown -R <slave's username> </path/to/hadoop>
I ran this command on my slave system and it solved my problem.
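To make the ownership fix concrete, here is a minimal sketch. It assumes the daemons run as the current user and uses a throwaway temp directory in place of the real Hadoop path, so hadoop_dir and hadoop_uid are illustration names only, not anything Hadoop defines.

```shell
set -eu

# Stand-in for </path/to/hadoop>: a temp directory, so the sketch is safe to run.
hadoop_dir="$(mktemp -d)"
mkdir -p "$hadoop_dir/logs"

# Assumption: the Hadoop daemons on the slave run as the current user.
hadoop_uid="$(id -u)"

# Recursively hand the whole tree to that user (a real install usually needs sudo here).
chown -R "$hadoop_uid" "$hadoop_dir"

# Read back the numeric owner of logs/, which the daemons must be able to write to.
owner_uid="$(ls -ldn "$hadoop_dir/logs" | awk '{print $3}')"
echo "logs/ is owned by uid $owner_uid"
```

On a real slave you would point this at the actual install path and use the name of the account that starts Hadoop.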

Related

How does master node start all the process in a hadoop cluster?

I have set up a Hadoop cluster of 5 virtual machines, using plain vanilla Hadoop. The cluster details are below:
192.168.1.100 - Configured to Run NameNode and SNN daemons
192.168.1.101 - Configured to Run ResourceManager daemon.
192.168.1.102 - Configured to Run DataNode and NodeManager daemons.
192.168.1.103 - Configured to Run DataNode and NodeManager daemons.
192.168.1.104 - Configured to Run DataNode and NodeManager daemons.
I have kept the masters and slaves files on each virtual server.
masters:
192.168.1.100
192.168.1.101
slaves file:
192.168.1.102
192.168.1.103
192.168.1.104
Now when I run the start-all.sh command from the NameNode machine, how is it able to start all the daemons? I don't understand it. There are no adapters installed (or none that I am aware of); there are only the plain Hadoop jars on all the machines, so how is the NameNode machine able to start all the daemons on all the machines (virtual servers)?
Can anyone help me understand this?
The start scripts on the namenode machine connect to the slaves via SSH and launch the slave services there.
That is why you need the public SSH key in ~/.ssh/authorized_keys on the slaves, with the matching private key available to the user running the Hadoop namenode.
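Roughly, the idea can be sketched as a loop over the slaves file that issues one ssh command per host. This is a dry run that only prints the commands; plan_remote_starts is a made-up helper, and the remote command is a simplified placeholder, since the real scripts build it from the install path and configuration.

```shell
set -eu

# Hypothetical slaves file holding the three worker IPs from the question.
slaves_file="$(mktemp)"
printf '%s\n' 192.168.1.102 192.168.1.103 192.168.1.104 > "$slaves_file"

# Print (rather than run) one ssh command per host listed in the file.
plan_remote_starts() {
  while IFS= read -r host; do
    # Skip blank lines and comment lines.
    case "$host" in ''|'#'*) continue ;; esac
    echo "ssh $host hadoop-daemon.sh start datanode"
  done < "$1"
}

plan="$(plan_remote_starts "$slaves_file")"
echo "$plan"
```

Each printed line corresponds to a login the master must be able to make without a password prompt, which is why the keys matter.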

Unable to run start-dfs.sh in Hadoop multinode cluster

I have created a Hadoop multinode cluster and configured SSH on both the master and slave nodes. I can now connect to the slave from the master node without a password.
But when I run start-dfs.sh on the master node, it is unable to connect to the slave node, and execution stops at the line below.
log:
HNname@master:~$ start-all.sh
starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-HNname-namenode-master.out
HDnode@slave's password: master: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-HNname-datanode-master.out
I pressed Enter
slave: Connection closed by 192.168.0.2
master: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-HNname-secondarynamenode-master.out
jobtracker running as process 10396. Stop it first.
HDnode@slave's password: master: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-HNname-tasktracker-master.out
slave: Permission denied, please try again.
HDnode@slave's password:
After entering the slave password, the connection is closed.
Things I have tried, with no result:
formatted the namenode on both the master and slave nodes
created a new ssh key and configured it on both nodes
overrode the default HADOOP_LOG_DIR, following this post
I think you missed this step "Add the SSH Public Key to the authorized_keys file on your target hosts"
Just redo the password-less ssh step correctly. Follow this:
Generate public and private SSH keys
ssh-keygen
Copy the SSH Public Key (id_rsa.pub) to the root account on your
target hosts
.ssh/id_rsa
.ssh/id_rsa.pub
Add the SSH Public Key to the authorized_keys file on your target
hosts
cat id_rsa.pub >> authorized_keys
Depending on your version of SSH, you may need to set permissions on
the .ssh directory (to 700) and the authorized_keys file in that
directory (to 600) on the target hosts.
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
Check the connection:
ssh root@<remote.target.host>
where <remote.target.host> has the value of each host name in your cluster.
If the following warning message displays during your first
connection: Are you sure you want to continue connecting (yes/no)?
Enter Yes.
Refer: Set Up Password-less SSH
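The key and permission steps above can be sketched as follows. To stay safe to run, this works in a scratch directory standing in for the target host's home directory, and the key line is a placeholder rather than a real key; on a real host the paths would be ~/.ssh and ~/.ssh/authorized_keys.

```shell
set -eu

# Scratch directory standing in for the target host's home directory.
fake_home="$(mktemp -d)"
ssh_dir="$fake_home/.ssh"
mkdir -p "$ssh_dir"

# Placeholder public key line; a real one comes from the master's ~/.ssh/id_rsa.pub.
pubkey="ssh-rsa AAAAexamplekey hadoop@master"

# Append the key only if it is not already present, so reruns do not duplicate it.
touch "$ssh_dir/authorized_keys"
grep -qxF "$pubkey" "$ssh_dir/authorized_keys" || echo "$pubkey" >> "$ssh_dir/authorized_keys"

# Restrict access: directory to 700, key file to 600, as in the steps above.
chmod 700 "$ssh_dir"
chmod 600 "$ssh_dir/authorized_keys"

ls -ld "$ssh_dir" "$ssh_dir/authorized_keys"
```

The duplicate check is a design choice: a plain cat >> works too, but it adds another copy of the key every time the step is repeated.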
Note: you will not be asked for a password if password-less ssh is set up properly.
Make sure to start the hadoop services as a new user called hadoop.
Then make sure to add the public key to the slaves as that new user.
If this doesn't work, check your firewall or iptables.
I hope it helps.
That means you haven't created the public key properly.
Follow the sequence below:
Create User
Give all required permissions to that user
Generate public key with same user
Format Name Node
Start hadoop services.
Now it should not ask for password.
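One way to confirm that no password will be asked, before starting the daemons, is to try a login that is not allowed to prompt. This is only a sketch: can_login_without_password is a made-up helper name, and the host names are whatever you pass on the command line.

```shell
set -u

# Returns 0 if key-based login to the host works without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
can_login_without_password() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null
}

# Check every host passed on the command line, e.g.: ./check.sh slave
for host in "$@"; do
  if can_login_without_password "$host"; then
    echo "$host: key login OK"
  else
    echo "$host: would prompt for a password (or is unreachable)"
  fi
done
```

Run it as the same user that starts the Hadoop services, since the keys are per user.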

Does the Hadoop slaves file regard an IP as a hostname?

I have an Ubuntu Server VM in VirtualBox (on Mac OS X), and I configured a Hadoop cluster via Docker: 1 master (172.17.0.3) and 2 slave nodes (172.17.0.4, 172.17.0.6). After running "./sbin/start-dfs.sh" from the Hadoop home folder, I found the error below on the datanode machine:
Datanode denied communication with namenode because hostname cannot be
resolved (ip=172.17.0.4, hostname=172.17.0.4): DatanodeRegistration(0.0.0.0,
datanodeUuid=4c613e35-35b8-41c1-a027-28589e007e78, infoPort=50075,
ipcPort=50020, storageInfo=lv=-55;cid=CID-9bac5643-1f9f-4bc0-abba-
34dba4ddaff6;nsid=1748115706;c=0)
Because Docker does not support bidirectional name linking and, furthermore, my Docker version does not allow editing the /etc/hosts file, I use IP addresses to set the name node and slaves. The following is my slaves file:
172.17.0.4
172.17.0.6
After searching on Google and Stack Overflow, I found no solution that works for my problem. However, I guess that the Hadoop Namenode regards 172.17.0.4 as a "hostname", so it reports "hostname cannot be resolved" where "hostname=172.17.0.4".
Any suggestions?
Finally I got a solution, which confirmed my guess:
1. Upgrade Docker to 1.4.1, following the instructions at https://askubuntu.com/questions/472412/how-do-i-upgrade-docker.
2. Write the IP => hostname mappings of the master and slaves into /etc/hosts.
3. Use hostnames instead of IP addresses in the Hadoop slaves file.
4. Run "./sbin/start-dfs.sh".
5. Done!
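As an illustration of steps 2 and 3, the sketch below writes a hosts file and a slaves file into a temp directory and checks that every slave name has a mapping. The hostnames (hadoop-master, hadoop-slave1, hadoop-slave2) are made up for this example; only the IPs come from the question.

```shell
set -eu

work="$(mktemp -d)"

# Stand-in for /etc/hosts; the hostnames are invented for illustration.
cat > "$work/hosts" <<'EOF'
172.17.0.3 hadoop-master
172.17.0.4 hadoop-slave1
172.17.0.6 hadoop-slave2
EOF

# Stand-in for the Hadoop slaves file, now listing hostnames instead of IPs.
cat > "$work/slaves" <<'EOF'
hadoop-slave1
hadoop-slave2
EOF

# Report any slave name that has no mapping in the hosts file.
missing=0
while IFS= read -r name; do
  [ -n "$name" ] || continue
  if ! awk -v n="$name" '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$work/hosts"; then
    echo "no hosts entry for $name"
    missing=$((missing + 1))
  fi
done < "$work/slaves"
echo "unmapped slaves: $missing"
```

The same mappings would need to be present on every container, so that the namenode and the datanodes resolve each other's names consistently.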

start-all.sh not working to run the process on slave node

I am trying to configure a multinode cluster with one master and one slave on my laptop. When I run start-all.sh from the master, all daemon processes start on the master node, but the DataNode and TaskTracker do not start on the slave node. Password-less ssh is enabled, and I can ssh to both master and slave from my master node without a password, but if I try to ssh to the master from the slave node, it asks for a password. Is this why the daemon processes are not starting on the slave node? Do we require password-less ssh on both master and slave?
ssh slave from the slave node does not ask for a password; only ssh to the master does. Please suggest why I am not able to start the processes on the slave node from the master node.
You don't need password-less ssh from slave to master, only from master to slave.
A few things to consider:
Can you run hadoop locally on the slave node?
Is the slave node included in the $HADOOP_CONF_DIR/slaves file of the master?
Have you added the slave node in the /etc/hosts file of the master?
Are there any error messages in the log files of the slave?
Is the same version of hadoop installed on the same path in both machines?

DataNode can't talk to NameNode in Hadoop 2.2

I'm setting up a hadoop 2.2 cluster. I have successfully configured a master and a slave. When I enter start-dfs.sh and start-yarn.sh on the master, all the daemons start correctly.
To be specific, on the master the following are running:
DataNode
NodeManager
NameNode
ResourceManager
SecondaryNameNode
On the slave, the following are running:
DataNode
NodeManager
When I open http://master-host:50070 I see that there is only 1 "Live Node" and it is referring to the datanode on the master.
The datanode on the slave has started, but it is not able to tell the master that it started. This is the only error I can find:
From /logs/hadoop-hduser-datanode.log on the slave:
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ec2-xx-xxx-xx-xx.compute-1.amazonaws.com/xx.xxx.xx.xxx:9001
Things I have checked/verified:
9001 is open
both nodes can ssh into each other
both nodes can ping each other
Any suggestions are greatly appreciated.
My issue was in the hosts file:
The hosts file on the slave and master needed to be (they're identical):
127.0.0.1 localhost
<master internal ip> master
<slave internal ip> slave
For AWS you need to use the internal IP, which is something like xx.xxx.xxx.xxx (not the external IP in the ec2-xx-xx-xxx.aws.com name and not the ip-xx-xx-xxx name).
Also, core-site.xml should refer to the location of hdfs as hdfs://master:9000.
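For reference, a minimal core-site.xml along those lines might look like the sketch below. It assumes the Hadoop 2.x property name fs.defaultFS and the hdfs:// scheme for the filesystem URI, and it writes to a temp directory (conf_dir is a stand-in) rather than the real configuration directory.

```shell
set -eu

conf_dir="$(mktemp -d)"   # stand-in for the Hadoop configuration directory

# Minimal core-site.xml; "master" must resolve through the hosts file on every node.
cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
EOF

# Pull the configured URI back out to confirm the scheme and host.
fs_uri="$(sed -n 's:.*<value>\(.*\)</value>.*:\1:p' "$conf_dir/core-site.xml")"
echo "fs.defaultFS = $fs_uri"
```

The same file would need to be identical on the master and the slave, so that the datanode tries to reach the namenode at the address the namenode actually listens on.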
