Hadoop Cluster - "hadoop" user ssh communication - hadoop

I am setting up a Hadoop 2.7.3 cluster on EC2 servers - 1 NameNode, 1 Secondary NameNode and 2 DataNodes.
Hadoop core uses SSH to communicate with the slaves and launch processes on the slave nodes.
Do we need to have the same SSH keys on all the nodes for the hadoop user?
What is the best practice/ideal way to copy or add the NameNode's SSH credentials to the slave nodes?

Do we need to have the same SSH keys on all the nodes for the hadoop user?
The same public key needs to be on all of the nodes.
What is the best practice/ideal way to copy or add the NameNode's SSH credentials to the slave nodes?
Per documentation:
Namenode: Password-less SSH
Set up password-less SSH between the name nodes and the data nodes. Let us create a public-private key pair for this purpose on the namenode.
namenode> ssh-keygen
Use the default (/home/ubuntu/.ssh/id_rsa) for the key location and
hit enter for an empty passphrase.
Datanodes: Setup Public Key
The public key is saved in /home/ubuntu/.ssh/id_rsa.pub. We need to
copy this file from the namenode to each data node and append the
contents to /home/ubuntu/.ssh/authorized_keys on each data node.
datanode1> cat id_rsa.pub >> ~/.ssh/authorized_keys
datanode2> cat id_rsa.pub >> ~/.ssh/authorized_keys
datanode3> cat id_rsa.pub >> ~/.ssh/authorized_keys
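If you would rather not copy the file around by hand, ssh-copy-id can append the key for you. A minimal sketch, run from the namenode; the host names below are placeholders for your datanodes' EC2 Public DNS entries, and you must still be able to log in with an existing credential (password or your EC2 .pem key) for this first copy:
DATANODES="dnode1.example.com dnode2.example.com dnode3.example.com"   # placeholders
for host in $DATANODES; do
    # appends ~/.ssh/id_rsa.pub to ~/.ssh/authorized_keys on the remote node
    ssh-copy-id -i ~/.ssh/id_rsa.pub "ubuntu@$host"
done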
Namenode: Setup SSH Config
SSH uses a configuration file located at ~/.ssh/config for various parameters. Set it up as shown below. Again, substitute each node's Public DNS for the HostName parameter (for example, replace <nnode> with the EC2 Public DNS of the NameNode).
Host nnode
    HostName <nnode>
    User ubuntu
    IdentityFile ~/.ssh/id_rsa
Host dnode1
    HostName <dnode1>
    User ubuntu
    IdentityFile ~/.ssh/id_rsa
Host dnode2
    HostName <dnode2>
    User ubuntu
    IdentityFile ~/.ssh/id_rsa
Host dnode3
    HostName <dnode3>
    User ubuntu
    IdentityFile ~/.ssh/id_rsa
At this point, verify that password-less operation works on each node as follows (the first time, you will get a warning that the host is unknown, asking whether you want to connect to it; type yes and hit enter. This step is needed only once):
namenode> ssh nnode
namenode> ssh dnode1
namenode> ssh dnode2
namenode> ssh dnode3
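As a shortcut, the same check can be scripted; a small sketch, assuming the host aliases above are in ~/.ssh/config (BatchMode makes ssh fail instead of prompting if the key is not accepted):
for host in nnode dnode1 dnode2 dnode3; do
    ssh -o BatchMode=yes "$host" hostname || echo "passwordless login FAILED for $host"
done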

Related

ssh key setting for hadoop connection in multi-node clusters

I know that an SSH key connection is required for Hadoop to operate.
Suppose that there is a five-node cluster consisting of one namenode and four datanodes.
By setting up the SSH keys, we can connect from the namenode to the datanodes and vice versa.
Note that a two-way connection is required for Hadoop operation; as far as I know, setting up only one direction (namenode to datanode, but not datanode to namenode) is not enough to operate Hadoop.
For the above scenario, if we have 50 or 100 nodes, it is very laborious to configure all the SSH keys by connecting to each machine and typing the same ssh-keygen -t ... commands.
For these reasons, I have tried to script this in shell, but failed to do it in an automatic way.
My code is as below.
list.txt
namenode1
datanode1
datanode2
datanode3
datanode4
datanode5
...
cat list.txt | while read server
do
    ssh $server 'ssh-keygen' < /dev/null
    while read otherserver
    do
        ssh $server 'ssh-copy-id $otherserver' < /dev/null
    done
done
However, it didn't work. As you can see, the code is meant to iterate over all the nodes, generate a key on each, and then copy the generated key to the other servers using the ssh-copy-id command. But the code didn't work.
So my question is: how can I script this so that SSH connections (both ways) are enabled, using a shell script? It has taken me a lot of time, and I cannot find any document describing SSH setup across many nodes that avoids this laborious task.
You only need to create a public/private key pair on the master node, then use ssh-copy-id -i ~/.ssh/id_rsa.pub $server in the loop. The master itself should be included in the loop. There is no need to do this in reverse from the datanodes. The key pair has to belong to, and be installed by, the user that runs the Hadoop cluster. After running the script, you should be able to ssh to all nodes, as the hadoop user, without using a password.
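As an illustration of that advice, a rough sketch of such a loop, run once on the master as the hadoop user (it assumes list.txt holds one hostname per line, the master included, and that you can still authenticate by password for the initial copy):
# generate the key pair on the master only if it does not exist yet
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
# push the single public key to every host in list.txt;
# < /dev/null keeps ssh from consuming the rest of the list
while read -r server; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub "hadoop@$server" < /dev/null
done < list.txt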

Setup SSH keys but server still prompts for password?

ssh localhost
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
I followed all the above steps in my terminal to disable the password prompt when starting the Hadoop services ($ start-all.sh), but it is still asking for a password. Can anyone please help me disable the password prompt?
Please refer to the link below to set up password-less SSH; it has a good example that gives more clarity on the SSH setup:
https://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/
Hope this helps!
I had a world of problems with permissions and the .ssh directory.
I think the permissions had to be 600 too but I can't remember exactly.
Good luck
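For reference, a permission set that commonly satisfies sshd's strict checks (a sketch; the exact requirements depend on your sshd configuration):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# the home directory itself must not be group- or world-writable
chmod go-w ~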
If you are doing a multi-node setup, all the nodes must be able to communicate with one another without a password. On each node, you generate SSH keys, for example using this command:
ssh-keygen -t rsa -b 4096 -C someemail@example.com
Then you replicate the keys to all the nodes:
ssh-copy-id hadoop@master
ssh-copy-id hadoop@slave-01
ssh-copy-id hadoop@slave-02
etc.
This needs to be done on each node (every node should have all the keys).
Hope this helps!
It worked for me.
Use ssh-keygen on the local server to generate the public and private keys.
$ ssh-keygen
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Use ssh-copy-id to copy the public key to the remote host:
ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.200.10
Perform rsync/SCP over SSH without a password
Now, you should be able to ssh to the remote host without entering the password.
ssh 192.168.200.10
Perform the rsync again; it should not ask you for a password this time:
rsync -avz -e ssh /home/Sangita/ sangita@192.168.200.10:/backup/Sangita/
or
scp -r /home/Sangita/ sangita@192.168.200.10:/backup/Sangita

Need a password each time I start Hadoop

I installed Hadoop 2.6.4 on an Ubuntu server, and I use SSH to log in to the Ubuntu server from my Mac. Since an RSA key is used for the login, I don't have to enter any password. But when I run start-dfs.sh to start the services, I do have to enter the password for each service, as below:
jianrui@cloudfoundry:~$ start-dfs.sh
Starting namenodes on [localhost]
Password:
localhost: starting namenode, logging to /home/jianrui/hadoop-2.6.4/logs/hadoop-dingjianrui-namenode-cloudfoundry.out
Password:
localhost: starting datanode, logging to /home/jianrui/hadoop-2.6.4/logs/hadoop-dingjianrui-datanode-cloudfoundry.out
Starting secondary namenodes [0.0.0.0]
Password:
0.0.0.0: starting secondarynamenode, logging to /home/jianrui/hadoop-2.6.4/logs/hadoop-dingjianrui-secondarynamenode-cloudfoundry.out
dingjianrui@cloudfoundry:~$
I was able to resolve the issue using the commands below.
The following commands generate a key pair using SSH, copy the public key from id_rsa.pub to authorized_keys, and give the owner read and write permissions on the authorized_keys file.
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
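Before re-running start-dfs.sh, it can help to confirm that the key is actually accepted for the hosts it connects to (localhost and 0.0.0.0, per the output above); BatchMode makes ssh fail instead of prompting:
ssh -o BatchMode=yes localhost hostname
ssh -o BatchMode=yes 0.0.0.0 hostname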
If you tried all of this and still don't succeed, try the following.
$ ssh-keygen -t rsa -P ""
$ ssh-copy-id -i ~/.ssh/id_rsa [id]@[domain]
I'm using RedHat 7.
If you are using an IP address instead of a domain name and want to use a domain name more easily, edit your /etc/hosts file.
ex> 192.168.0.11 cluster01
The matter is just how to copy the key file to the other machine; it worked easily for me.

How to do passwordless SSH access to slaves when starting Hadoop services in a multi-node cluster

I've installed a multi-node Hadoop cluster. Now I am trying to set up passwordless SSH access to the slaves. That is, my problem is that when I start the services from the master, it asks me for a password to start every service and takes a long time to start. If anyone has a solution, please help me.
You have to generate an RSA key on the Namenode and copy it to all the Datanodes.
user@namenode:~> ssh-keygen -t rsa
Just press Enter when asked for a passphrase.
user@namenode:~> ssh user@datanode mkdir -p .ssh
user@datanode's password:
Finally, append the namenode's new public key to user@datanode:.ssh/authorized_keys and enter the datanode's password one last time:
user@namenode:~> cat .ssh/id_rsa.pub | ssh user@datanode 'cat >> .ssh/authorized_keys'
user@datanode's password:
You can test it with:
user@namenode:~> ssh user@datanode
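With many datanodes, the same append can be wrapped in a loop; a sketch, assuming a hypothetical datanodes.txt file with one hostname per line (-n keeps ssh from consuming the list on stdin):
while read -r dn; do
    ssh -n "user@$dn" mkdir -p .ssh
    cat .ssh/id_rsa.pub | ssh "user@$dn" 'cat >> .ssh/authorized_keys'
done < datanodes.txt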

50 nodes hadoop passphraseless

My question is very simple: I want to set up a 50-node Hadoop cluster. How can I set up passphraseless SSH between the 50 nodes? Doing it manually is very difficult! Thanks in advance!
You don't need to set up SSH between all the nodes; it is sufficient to have it unidirectional between the master and the slaves (so only the master must access the slaves without a password).
The usual approach is to write a bash script that loops over your slaves file and logs into each slave, copying the public key of the master into the authorized keys of that slave.
You can see a small walkthrough on Praveen Sripati's blog.
However, I'm no admin, so I can't tell you if there is a smarter way. Maybe this is better suited to Superuser.com.
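For what it's worth, that loop could look roughly like the sketch below, run on the master as the hadoop user. The slaves file path and the user name are assumptions; adjust them to your installation.
SLAVES_FILE=/usr/local/hadoop/etc/hadoop/slaves   # hypothetical path to your slaves file
while read -r slave; do
    # copies the master's public key into the slave's authorized_keys;
    # < /dev/null keeps ssh from consuming the rest of the list
    ssh-copy-id -i ~/.ssh/id_rsa.pub "hadoop@$slave" < /dev/null
done < "$SLAVES_FILE"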
Maybe this can help:
To work seamlessly, SSH needs to be set up to allow password-less login for the hadoop user from machines in the cluster. The simplest way to achieve this is to generate a public/private key pair, and place it in an NFS location that is shared across the cluster.
First, generate an RSA key pair by typing the following in the hadoop user account:
% ssh-keygen -t rsa -f ~/.ssh/id_rsa
Even though we want password-less logins, keys without passphrases are not considered good practice (it's OK to have an empty passphrase when running a local pseudodistributed cluster, as described in Appendix A), so we specify a passphrase when prompted for one. We shall use ssh-agent to avoid the need to enter a password for each connection.
The private key is in the file specified by the -f option, ~/.ssh/id_rsa, and the public key is stored in a file with the same name with .pub appended, ~/.ssh/id_rsa.pub.
Next we need to make sure that the public key is in the ~/.ssh/authorized_keys file on all the machines in the cluster that we want to connect to. If the hadoop user's home directory is an NFS filesystem, as described earlier, then the keys can be shared across the cluster by typing:
% cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
If the home directory is not shared using NFS, then the public keys will need to be shared by some other means.
Test that you can SSH from the master to a worker machine by making sure ssh-agent is running, and then run ssh-add to store your passphrase. You should be able to ssh to a worker without entering the passphrase again.
Source:
Tom White, Hadoop: The Definitive Guide, page 301
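The ssh-agent workflow described in that passage looks roughly like this (a sketch; the worker host name is just an example):
eval "$(ssh-agent -s)"       # start an agent for this shell
ssh-add ~/.ssh/id_rsa        # prompts for the passphrase once
ssh hadoop@worker1 hostname  # no passphrase prompt from here on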
