ssh key setting for hadoop connection in mutli clusters - hadoop

I know that ssh key connection should be required for the hadoop operation.
Suppose that there are five clusters consisting of one namenode and four data nodes.
By setting the ssh key connection, we can connect from namenode to datanode and vice versa.
Note that two-way connection should be required for hadoop operation, which means that only one side (namenode to datanode, but not connect to from datanode to namenode) is not possible to operate hadoop as far as I know.
For above scenario, if we have 50 nodes or 100 nodes, it is very laborious jobs to configure all the ssh-key command by connecting the machine and typing same commands ssh-keygen -t ...
For these reasons, I have tried to script the shell code and but failed to do it in an automatic way.
my code is as below.
list.txt
namenode1
datanode1
datanode2
datanode3
datanode4
datanode5
...
cat list.txt | while read server
do
ssh $server 'ssh-keygen' < /dev/null
while read otherserver
do
ssh $server 'ssh-copy-id $otherserver' < /dev/null
done
done
However, it didn't work. As you can understand, the code means that it iterates over all the nodes and creates the key and then copy the generated key into other server using the ssh-copy-id command. But the code didn't work.
So my question is that how to script the codes which enables ssh connection (bothways) using shell scripts...It takes a lot of time for me to achieve it and I cannot find any document describing the ssh connection for multi nodes for avoiding laborious tasks.

You only need to create a public/private key pair at the master node, then use ssh-copy-id -i ~/.ssh/id_rsa.pub $server in the loop. And the master should be in the loop. And there is no need to do this in reverse at the namenodes. The keys have to belong and installed by the user that is running the hadoop cluster. After running the script, you should be able to ssh to all namenodes, as the hadoop user, without using a password.

Related

Hadoop Cluster - "hadoop" user ssh communication

I am setting up Hadoop 2.7.3 cluster on EC2 servers - 1 NameNode, 1 Secondary NameNode and 2 DataNodes.
Hadoop core uses SSH for communication with slaves to launch the processes on the slave node.
Do we need to have same SSH keys on all the nodes for the hadoop user?
What is the best practice/ideal way to copy or add the NameNode to Slave nodes SSH credentials?
Do we need to have same SSH keys on all the nodes for the hadoop user?
The same public key needs to be on all of the nodes
What is the best practice/ideal way to copy or add the NameNode to
Slave nodes SSH credentials?
Per documentation:
Namenode: Password Less SSH
Password-less SSH between the name nodes and the data nodes. Let us
create a public-private key pair for this purpose on the namenode.
namenode> ssh-keygen
Use the default (/home/ubuntu/.ssh/id_rsa) for the key location and
hit enter for an empty passphrase.
Datanodes: Setup Public Key
The public key is saved in /home/ubuntu/.ssh/id_rsa.pub. We need to
copy this file from the namenode to each data node and append the
contents to /home/ubuntu/.ssh/authorized_keys on each data node.
datanode1> cat id_rsa.pub >> ~/.ssh/authorized_keys
datanode2> cat id_rsa.pub >> ~/.ssh/authorized_keys
datanode3> cat id_rsa.pub >> ~/.ssh/authorized_keys
Namenode: Setup SSH Config
SSH uses a configuration file located at ~/.ssh/config for various
parameters. Set it up as shown below. Again, substitute each node’s
Public DNS for the HostName parameter (for example, replace
with EC2 Public DNS for NameNode).
Host nnode
HostName <nnode>
User ubuntu
IdentityFile ~/.ssh/id_rsa
Host dnode1
HostName <dnode1>
User ubuntu
IdentityFile ~/.ssh/id_rsa
Host dnode2
HostName <dnode2>
User ubuntu
IdentityFile ~/.ssh/id_rsa
Host dnode3
HostName <dnode3>
User ubuntu
IdentityFile ~/.ssh/id_rsa
At this point, verify that password-less operation works on each node
as follows (the first time, you will get a warning that the host is
unknown and whether you want to connect to it. Type yes and hit enter.
This step is needed once only):
namenode> ssh nnode
namenode> ssh dnode1
namenode> ssh dnode2
namenode> ssh dnode3

Hadoop alternate SSH key

I'm setting up a multinode hadoop cluster and have a shared key to passwordless SSH between nodes. I named the file ~/.ssh/hadoop_rsa and can connect to other hosts using ssh -i ~/.ssh/hadoop_rsa host.
I need some way to tell hadoop to use this alternate SSH key when connecting to other nodes.
It appears that commands are run on each slave using the script:
$HADOOP_HOME/sbin/slaves.sh
That script includes a reference to the environment variable $HADOOP_SSH_OPTS when calling ssh. I was able to tell Hadoop to use a different key file by setting an environment variable like this:
export HADOOP_SSH_OPTS="-i ~/.ssh/hadoop_rsa"
Thanks to Varun on the Hadoop mailing list for pointing me in the right direction

how to do passwordless ssh access to slave while starting services of hadoop in its multinod cluster..

I've installed multinode cluster of hadoop. Now I am trying to do passwordless ssh access to slave. i.e. my problem is that,when I start services from master it asks me password to start every service, and takes much time to start it.If anyone has solution then please help me
You have to generate and copy RSA key from Namenode to all the Datanodes.
user#namenode:~> ssh-keygen -t rsa
Just press 'Enter' for any passphrase
user#namenode:~> ssh user#datanode mkdir -p .ssh
user#datanode's password:
Finally append namenode's new public key to user#datanode:.ssh/authorized_keys and enter datanode's password one last time:
user#namenode:~> cat .ssh/id_rsa.pub | ssh user#datanoe 'cat >> .ssh/authorized_keys'
user#datanode's password:
You can test by
user#namenode:~> ssh user#datanode

Setup Pseudo Distributed / Single Node Setup Apache Hadoop 2.2

I have installed Apache Hadoop 2.2 as Single Node Cluster. When I am trying to execute giraph example, it ends up with error "LocalJobRunner, you cannot run in split master/worker mode since there is only 1 task at a time".
I was going through forums, and I found that I can update mapred-site.xml to have 4 mappers. I tried that but still no help. I came across, one more forum were I can change single node setup to behave as pseudo distributed mode and it resolved the issue.
Can someone please let me know, which config files do I need to change to get single node setup behave as pseudo distributed mode.
Adding to renZzz answer, You also need to check that if you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Following link can help you- https://hadoop.apache.org/docs/current2/hadoop-project-dist/hadoop-common/SingleNodeSetup.html
for my first setup, I followed some manuals, but surely the best one for single node setup, was the pdf Apache Hadoop YARN_sample. I recommond you to use this manual step by step
First, ensure that the number of workers is one. Then, you need to configure Giraph not to split workers and master via:
giraph.SplitMasterWorker=false
You can either set it in giraph-site.xml or pass via command
line option:
-ca giraph.SplitMasterWorker=false
Ref:
https://www.mail-archive.com/user#giraph.apache.org/msg01631.html

50 nodes hadoop passphraseless

My question is very simple, I want to setup a 50 nodes hadoop cluster, how can I setup the passphraseless between the 50 nodes. if manually operating is very difficult! Thanks in advance!
You don't need to setup SSH between the nodes, it is sufficient to have it unidirectional between the master and the slaves. (So only the master must access the slaves without password).
The usual approach is to write a bash script that loops over your slaves file and logs into your slave copying the public key of the master into the authorized keys of the slaves.
You can see a small workthrough on Praveen Sripati's blog.
However, I'm no admin so I can't tell you if there is a smarter way. Maybe this is better suited on Superuser.com
Maybe this can help:
To work seamlessly, SSH needs to be set up to allow password-less
login for the hadoop user from machines in the cluster. The simplest
way to achieve this is to generate a public/private key pair, and
place it in an NFS location that is shared across the cluster.
First,
generate an RSA key pair by typing the following in the hadoop user
account:
% ssh-keygen -t rsa -f ~/.ssh/id_rsa
Even though we want
password-less logins, keys without passphrases are not considered good
practice (it’s OK to have an empty passphrase when running a local
pseudodistributed cluster, as described in Appendix A), so we specify
a passphrase when prompted for one. We shall use ssh-agent to avoid
the need to enter a password for each connection.
The private key is
in the file specified by the -f option, ~/.ssh/id_rsa, and the public
key is stored in a file with the same name with .pub appended,
~/.ssh/id_rsa.pub.
Next we need to make sure that the public key is in
the ~/.ssh/authorized_keys file on all the machines in the cluster
that we want to connect to. If the hadoop user’s home directory is an
NFS filesystem, as described earlier, then the keys can be shared
across the cluster by typing:
% cat ~/.ssh/id_rsa.pub >>
~/.ssh/authorized_keys
If the home directory is not shared using NFS,
then the public keys will need to be shared by some other means.
Test
that you can SSH from the master to a worker machine by making sure
sshagent is running,3 and then run ssh-add to store your passphrase.
You should be able to ssh to a worker without entering the passphrase
again.
Source:
Tom White, Hadoop: The Definitive Guide, page 301
Found it googling here:
https://www.google.rs/url?sa=t&rct=j&q=&esrc=s&source=web&cd=22&cad=rja&ved=0CDYQFjABOBQ&url=http%3A%2F%2Fbigdata.googlecode.com%2Ffiles%2FOreilly.Hadoop.The.Definitive.Guide.3rd.Edition.Jan.2012.pdf&ei=sGzZULb6OfOM0wWhlYDYAw&usg=AFQjCNGvNUZcQBvM_Ucqf_K0JGAlCRxr3A&sig2=Qpa_KZyP1mXXm9yQv0ynRw&bvm=bv.1355534169,d.d2k

Resources