50 nodes hadoop passphraseless - hadoop

My question is very simple: I want to set up a 50-node Hadoop cluster. How can I set up passphraseless SSH between the 50 nodes? Doing it manually is very difficult. Thanks in advance!

You don't need to setup SSH between the nodes, it is sufficient to have it unidirectional between the master and the slaves. (So only the master must access the slaves without password).
The usual approach is to write a bash script that loops over your slaves file and logs into your slave copying the public key of the master into the authorized keys of the slaves.
You can see a small walkthrough on Praveen Sripati's blog.
However, I'm no admin so I can't tell you if there is a smarter way. Maybe this is better suited on Superuser.com
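The loop described above could be sketched roughly as follows. This is only an illustration under assumptions: the slaves file lists one hostname per line, the master's public key lives at ~/.ssh/id_rsa.pub, ssh-copy-id is installed, and the names push_key and DRY_RUN are made up for this example. You will still be asked for each slave's password once while its key is installed.

```shell
# Sketch: copy the master's public key to every host listed in a slaves file.
# push_key and DRY_RUN are hypothetical names, not part of Hadoop or OpenSSH.
push_key() {
    slaves_file=$1
    pubkey=${2:-$HOME/.ssh/id_rsa.pub}
    while read -r host || [ -n "$host" ]; do
        # Skip blank lines and '#' comment lines.
        case $host in ''|'#'*) continue ;; esac
        if [ -n "${DRY_RUN:-}" ]; then
            # Dry run: only show what would be executed.
            echo "ssh-copy-id -i $pubkey $host"
        else
            # </dev/null keeps ssh from consuming the rest of the host list.
            ssh-copy-id -i "$pubkey" "$host" </dev/null
        fi
    done <"$slaves_file"
}
```

Running it with DRY_RUN=1 set first prints the commands without touching any host, which is a cheap way to check the slaves file before the real run.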

Maybe this can help:
To work seamlessly, SSH needs to be set up to allow password-less login for the hadoop user from machines in the cluster. The simplest way to achieve this is to generate a public/private key pair, and place it in an NFS location that is shared across the cluster.
First, generate an RSA key pair by typing the following in the hadoop user account:
% ssh-keygen -t rsa -f ~/.ssh/id_rsa
Even though we want password-less logins, keys without passphrases are not considered good practice (it’s OK to have an empty passphrase when running a local pseudodistributed cluster, as described in Appendix A), so we specify a passphrase when prompted for one. We shall use ssh-agent to avoid the need to enter a password for each connection.
The private key is in the file specified by the -f option, ~/.ssh/id_rsa, and the public key is stored in a file with the same name with .pub appended, ~/.ssh/id_rsa.pub.
Next we need to make sure that the public key is in the ~/.ssh/authorized_keys file on all the machines in the cluster that we want to connect to. If the hadoop user’s home directory is an NFS filesystem, as described earlier, then the keys can be shared across the cluster by typing:
% cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
If the home directory is not shared using NFS, then the public keys will need to be shared by some other means.
Test that you can SSH from the master to a worker machine by making sure ssh-agent is running, and then run ssh-add to store your passphrase. You should be able to ssh to a worker without entering the passphrase again.
Source:
Tom White, Hadoop: The Definitive Guide, page 301
Found it googling here:
https://www.google.rs/url?sa=t&rct=j&q=&esrc=s&source=web&cd=22&cad=rja&ved=0CDYQFjABOBQ&url=http%3A%2F%2Fbigdata.googlecode.com%2Ffiles%2FOreilly.Hadoop.The.Definitive.Guide.3rd.Edition.Jan.2012.pdf&ei=sGzZULb6OfOM0wWhlYDYAw&usg=AFQjCNGvNUZcQBvM_Ucqf_K0JGAlCRxr3A&sig2=Qpa_KZyP1mXXm9yQv0ynRw&bvm=bv.1355534169,d.d2k
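When the home directory is not on NFS, the append step from the excerpt has to happen on each machine. A minimal sketch of that step, written so it can be repeated without piling up duplicate entries, might look like this. The function name add_authorized_key and its arguments are hypothetical and do not come from the book.

```shell
# Sketch: append a public key to authorized_keys only if it is not there yet.
# add_authorized_key is a made-up helper name for illustration.
add_authorized_key() {
    pubkey_file=$1
    auth_file=${2:-$HOME/.ssh/authorized_keys}
    key=$(cat "$pubkey_file") || return 1
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -x matches whole lines, -F treats the key as a fixed string.
    if ! grep -qxF -e "$key" "$auth_file"; then
        printf '%s\n' "$key" >>"$auth_file"
    fi
    # sshd may ignore the file if it is writable by others.
    chmod 600 "$auth_file"
}
```

Because the function checks for an existing line first, running it twice with the same key leaves a single entry in the file.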

Related

ssh key setting for hadoop connection in multi clusters

I know that SSH key-based connections are required for Hadoop to operate.
Suppose there is a cluster of five machines: one namenode and four datanodes.
By setting up the SSH keys, we can connect from the namenode to the datanodes and vice versa.
As far as I know, Hadoop needs the connection to work in both directions; a one-way setup (namenode to datanode, but not datanode to namenode) is not enough.
In this scenario, with 50 or 100 nodes, it is very laborious to configure all the SSH keys by logging in to each machine and typing the same commands (ssh-keygen -t ...).
For these reasons, I tried to script this in shell, but failed to automate it.
My code is below.
list.txt
namenode1
datanode1
datanode2
datanode3
datanode4
datanode5
...
cat list.txt | while read server
do
ssh $server 'ssh-keygen' < /dev/null
while read otherserver
do
ssh $server 'ssh-copy-id $otherserver' < /dev/null
done
done
However, it didn't work. As you can see, the code is meant to iterate over all the nodes, create a key on each one, and then copy the generated key to the other servers using the ssh-copy-id command.
So my question is how to write a shell script that enables SSH connections (both ways). I have spent a lot of time on this, and I cannot find any document describing how to set up SSH connections across many nodes without this laborious work.
You only need to create a public/private key pair on the master node, then use ssh-copy-id -i ~/.ssh/id_rsa.pub $server in the loop. The master itself should be in the loop too. There is no need to do this in reverse from the datanodes. The keys have to belong to, and be installed by, the user that is running the Hadoop cluster. After running the script, you should be able to ssh to all nodes, as the hadoop user, without using a password.
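To confirm the result this answer describes, a check along the following lines could be run from the master as the hadoop user. It is a sketch, assuming list.txt holds one hostname per line; check_hosts and the SSH_CMD override are hypothetical names introduced here so the loop can be tried without real hosts.

```shell
# Sketch: report which hosts accept a key-based login without prompting.
# check_hosts and SSH_CMD are made-up names for this illustration.
check_hosts() {
    ssh_cmd=${SSH_CMD:-ssh}
    failed=0
    while read -r host || [ -n "$host" ]; do
        case $host in ''|'#'*) continue ;; esac
        # BatchMode=yes makes ssh fail instead of asking for a password.
        if $ssh_cmd -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
                </dev/null >/dev/null 2>&1; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done <"$1"
    return "$failed"
}
```

Any host reported as FAIL still needs its key installed, or has a permissions or sshd configuration problem on the receiving side.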

Copy file to multiple hosts from a shared file server with password

I have about 20 Macs on my network that always need fonts installed.
I have a folder where I ask users to put the fonts they need synced to every machine (to save time, I install each font on every machine so that if they move machines, I don't need to do it again).
At the moment I am manually rsyncing the fonts from this server location to all the machines, one by one, using
rsync -avrP /server/fonts/ /Library/Fonts/
This requires me to ssh into every machine.
Is there a way I can script this using a hosts.txt file with the IPs? The password is the same for every machine and I'd rather not type it 20 times. Security isn't an issue.
I would like something that allows me to call the script and point it at a font, i.e.
./install-font font.ttf
I've looked into scp but I don't see any example of specifying a password anywhere in the script.
cscp.sh
#!/bin/bash
while read host; do
scp $1 ${host}:
done
project-prod-web1
project-prod-web2
project-prod-web3
Usage
Copy file to multiple hosts:
cscp.sh file < hosts
But this asks me to type a password every time and doesn't specify the target location on the host.
I don't see any example of specifying a password anywhere in the script.
Use ssh-copy-id command to install your public key to each of these hosts. After that ssh and scp will use public-private key authentication without requiring you to enter the password.
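Once the keys are installed, the script asked for in the question could be sketched as below. This is an assumption-laden illustration: hosts.txt holds one host per line, the remote account is allowed to write to /Library/Fonts/, and install_font and DRY_RUN are names invented for this example.

```shell
# Sketch: copy one font file to /Library/Fonts/ on every host in hosts.txt.
# install_font and DRY_RUN are hypothetical names, not existing tools.
install_font() {
    font=$1
    hosts_file=${2:-hosts.txt}
    while read -r host || [ -n "$host" ]; do
        case $host in ''|'#'*) continue ;; esac
        if [ -n "${DRY_RUN:-}" ]; then
            # Dry run: print the command instead of running it.
            echo "scp $font $host:/Library/Fonts/"
        else
            # With key authentication in place, scp no longer prompts.
            scp "$font" "$host:/Library/Fonts/" </dev/null
        fi
    done <"$hosts_file"
}
```

Unlike the cscp.sh script in the question, the target directory is spelled out after the colon, so the file does not land in the remote home directory.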

Hadoop alternate SSH key

I'm setting up a multinode hadoop cluster and have a shared key to passwordless SSH between nodes. I named the file ~/.ssh/hadoop_rsa and can connect to other hosts using ssh -i ~/.ssh/hadoop_rsa host.
I need some way to tell hadoop to use this alternate SSH key when connecting to other nodes.
It appears that commands are run on each slave using the script:
$HADOOP_HOME/sbin/slaves.sh
That script includes a reference to the environment variable $HADOOP_SSH_OPTS when calling ssh. I was able to tell Hadoop to use a different key file by setting an environment variable like this:
export HADOOP_SSH_OPTS="-i ~/.ssh/hadoop_rsa"
Thanks to Varun on the Hadoop mailing list for pointing me in the right direction

SSH and agent for Ubuntu file transfer automation

I have a script that creates database dumps and transfers the files from an Ubuntu server to a Linux machine. I use scp for the file transfer, and it prompts for a password every time, so I need to automate it. I added the RSA public key of the Linux machine to authorized_keys on the Ubuntu machine, but when I run scp it says Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). I checked the permissions and tried things like turning PasswordAuthentication off, with no luck.
Can I write the password in my script regardless of security? I will give the script 700 permissions so that no one can access it except me, the root user.
This is my script:
export DB_DUMP_DIR=/home/database_dump
export DB_NAME=database_name_$(date '+%Y_%m_%d').sql
mysqldump -u root mysql > ${DB_DUMP_DIR}/${DB_NAME}
if [ $? -eq 0 ];then
scp -i /root/.ssh/id_rsa ${DB_DUMP_DIR}/${DB_NAME} root@192.0.0.0:
else
echo "Error generating database dump"
fi
The first things that come to mind are
Is the server set to allow key authentication? (that's PubkeyAuthentication yes in sshd_config)
Is the server allowing RSA keys? (on older OpenSSH versions this might look like RSAAuthentication no in your sshd_config)
Is root's ~/.ssh directory set to 700? (or tighter)
Is root's ~/.ssh/authorized_keys set to 600? (or tighter)
Is the remote machine allowing you to log in as root? (the PermitRootLogin no option in sshd_config)
Is it really the right key you're sending here? Did you try with a different key you created just to test this?
Lastly, it is never, ever a good idea to write the password down in a script. Just don't do it. Fix the problem you have with key authentication here instead.
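The two permission items in the checklist can be applied with a small helper like the one below, run on the machine that receives the connection. This is a sketch; fix_ssh_perms is a hypothetical name, and it only covers permissions, not the sshd_config settings.

```shell
# Sketch: tighten permissions on an .ssh directory and its authorized_keys.
# fix_ssh_perms is a made-up helper name for illustration.
fix_ssh_perms() {
    ssh_dir=${1:-$HOME/.ssh}
    # The directory must not be accessible by group or others.
    chmod 700 "$ssh_dir" || return 1
    if [ -f "$ssh_dir/authorized_keys" ]; then
        chmod 600 "$ssh_dir/authorized_keys" || return 1
    fi
}
```

If the login still fails afterwards, running the client with ssh -v usually shows which keys were offered and why they were rejected.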

Using a variable's value as password for scp, ssh etc. instead of prompting for user input every time

AFAIK, the commands ssh or scp do not have/take a password parameter. Otherwise I could keep the password in a shell variable and probably get rid of the enter password prompt. If I write an scp command in my shell script, it prompts the user to input the password. I have multiple ssh and scp commands in my script and I do not want the user to enter the password every time. I would prefer to save the password in a shell variable in the beginning (by asking password once), then use it for every ssh or scp.
I read about "public key identification" in this question. Is it related to the solution I am looking for?
Update
I read in How to use ssh command in shell script? why it is unsafe to specify passwords on the commandline. Does using expect also store the password and is world visible (using ps aux)? Is that the security issue with using expect?
Further Explanation
To further make it clear, I am writing this shell script to automate code and database backup, do code upload, run necessary database queries, do all the things that are needed for a new version release of a LAMP project from a developer system to a remote live server. My shell script will be there inside the main codebase of the project in every developer instance.
Requirement
I want all developers (who may be working from different remote systems) who know the SSH/FTP password to be able to use the shell script by entering that password only once at run time. I would prefer the password to be the SSH/FTP password.
Note - I do not want other developers who don't know the SSH password to be able to use it (So I guess public key authentication will not work because it stores the passwords in the systems).
I do not want any command line solution which stores the password in some log in the system and can be world visible using ps aux or something.
Opening Bounty
From all the answers so far and my analysis of those solutions, it looks like everything other than public key authentication is insecure. I am not yet sure if using expect is insecure. I think it is otherwise the correct solution for me. However, I am getting command not found errors while trying to use it, as I already commented on one of the answers.
From http://www.debianadmin.com/sshpass-non-interactive-ssh-password-authentication.html -
First and foremost, users of sshpass
should realize that ssh’s insistance
on only getting the password
interactively is not without reason.
It is close to be impossible to
securely store the password, and users
of sshpass should consider whether
ssh’s public key authentication
provides the same end-user experience,
while involving less hassle and being
more secure.
So, is it not possible to securely run multiple ssh and scp commands by entering the ssh/ftp password only once at runtime? Please read my Requirement section again.
Also, can anyone explain this -
In particular, people writing programs
that are meant to communicate the
password programmatically are encouraged
to use an anonymous pipe and pass the
pipe’s reading end to sshpass using the
-d option.
Does this mean anything is possible?
Indeed, you'll definitely want to look into setting up ssh keys, over saving a password in a bash script. If the key is passwordless, then no user input will be required to ssh/scp. You just set it up to use the key on both ends and voila, secured communication.
However, I'll get downvoted to hell if I don't say this: many consider passwordless ssh keys to be a Bad Idea(TM). If anybody gets their hands on the keys, they have full access. This means that you are relying on other security measures, such as file permissions, to keep your key safe.
Also, look into ssh-agent. It allows you to set it up so that you have a password protected ssh-key, but you only need to type it in once and it will manage the password for the key for you and use it when necessary. On my linux box at home, I have ssh-agent set up to run in my .xinitrc file so that it prompts me once and then starts X. YMMV.
UPDATE:
With regards to your requirements, password protected public key authentication + ssh-agent still seems to fit. Only the developers privy to the SSH/FTP password could start up ssh-agent, type in the password and ssh-agent would manage the passwords for the public keys for the rest of the session, never requiring interaction again.
Of course, how it stores it is another matter entirely. IANASE, but for more information on security concerns of using ssh-agent, I found symantec's article to be pretty informative: http://www.symantec.com/connect/articles/ssh-and-ssh-agent
"The ssh-agent creates a unix domain
socket, and then listens for
connections from /usr/bin/ssh on this
socket. It relies on simple unix
permissions to prevent access to this
socket, which means that any keys you
put into your agent are available to
anyone who can connect to this socket.
[ie. root]" ...
"however, [..] they are only usable
while the agent is running -- root
could use your agent to authenticate
to your accounts on other systems, but
it doesn't provide direct access to
the keys themselves. This means that
the keys can't be taken off the
machine and used from other locations
indefinitely."
Hopefully you're not in a situation where you're trying to use an untrusted root's system.
The right way to do that is as follows:
Ensure that all your users are using ssh-agent (nowadays this is the default on most Linux systems). You can check this by running the following command:
echo $SSH_AUTH_SOCK
If that variable is not empty, it means that the user is using ssh-agent.
Create a pair of authentication keys for every user ensuring they are protected by a non empty passphrase.
Install the public part of the authentication keys on the remote host so that users can log there.
You are done!
Now, the first time a user wants to log into the remote machine from some session, they will have to enter the passphrase for their private key.
In later logins from the same session, ssh-agent will provide the unlocked key for authentication on behalf of the user, who will not be required to enter the passphrase again.
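The agent check from the first step can be wrapped in a small function for use inside a script. This is a sketch: agent_status is a hypothetical name, and on top of the non-empty test described above it also checks that the variable points at an existing socket.

```shell
# Sketch: report whether an ssh-agent appears to be reachable.
# agent_status is a made-up helper name for illustration.
agent_status() {
    # SSH_AUTH_SOCK must be set and must point at an existing socket.
    if [ -n "${SSH_AUTH_SOCK:-}" ] && [ -S "$SSH_AUTH_SOCK" ]; then
        echo "agent available"
    else
        echo "no agent"
    fi
}

# Possible use inside a script (not executed here):
#   if [ "$(agent_status)" = "no agent" ]; then
#       eval "$(ssh-agent -s)"   # start an agent for this session
#       ssh-add                  # asks for the passphrase once
#   fi
```

The commented usage shows one way a release script could start an agent itself, so each developer types the passphrase only once per run.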
Ugh. I hit the man pages hard for this. Here's what I got:
Use this code near the beginning of the script to silently get the ssh password:
read -p "Password: " -s SSHPASS # *MUST* be SSHPASS
export SSHPASS
And then use sshpass for ssh like so:
sshpass -e ssh username@hostname
Hope that helps.
You can use expect to pass a password to ssh (see "Using expect to pass a password to ssh"), or, as already said, use public key authentication instead if that's a viable option.
For password authentication, as you mentioned in your description, you can use "sshpass". On Ubuntu, you can install it with "sudo apt-get install sshpass".
For public/private key-pair based authentication,
First generate keys using "ssh-keygen"
Then copy your key to the remote machine using "ssh-copy-id username@remote-machine"
Once copied, the subsequent logins should not ask for password.
Expect is insecure
It drives an interactive session. If you were to pass a password via expect, it would be no different from typing the password on the command line yourself, except that the expect script would have to retrieve the password from somewhere. It's typically insecure because people will put the password in the script or in a config file.
It's also notoriously brittle because it waits on particular output as the event mechanism for input.
ssh-agent
ssh-agent is a fine solution if this is a script that will always be driven manually. If there is someone who will be logged in to drive the execution of the script, then an agent is a good way to go. It is not a good solution for automation because an agent implies a session. You usually don't initiate a session to automatically kick off a script (e.g. from cron).
ssh command keys
SSH command keys are your best bet for an automated solution. They don't require a session, and the command key restricts what runs on the server to only the command specified in authorized_keys. They are also typically set up without passphrases. This can be a difficult solution to manage if you have thousands of servers. If you only have a few, it's pretty easy to set up and manage.
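A command key is an ordinary authorized_keys line with options in front of the key. The sketch below builds such a line from a public key file; restricted_key_line is a hypothetical helper name, and the forced command path in the usage note is only an example. The options themselves (command=, no-port-forwarding, no-X11-forwarding, no-agent-forwarding, no-pty) are standard authorized_keys options.

```shell
# Sketch: print an authorized_keys entry that forces a single command.
# restricted_key_line is a made-up helper name; the forced command must not
# contain double quotes, since this sketch does no escaping.
restricted_key_line() {
    forced_command=$1
    pubkey_file=$2
    printf 'command="%s",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty %s\n' \
        "$forced_command" "$(cat "$pubkey_file")"
}
```

For example, calling it with a backup script path and a .pub file, then appending the output to authorized_keys on the server, means that logging in with that key runs only the backup script, whatever command the client asks for.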
service ssh accounts
I've also seen setups with password-less service accounts. Instead of the command entry in the authorized_keys file, an alternative mechanism is used to restrict access/commands. These solutions often use sudo or restricted shells. However, I think these are more complicated to manage correctly, and therefore tend to be more insecure.
host to host automatic authentication
You can also set up host-to-host automatic authentication, but there are a lot of things to get right to do this correctly: setting up your network properly, using a bastion host for host key dissemination, proper ssh server configuration, etc. As a result, this is not a solution I recommend unless you know what you're doing and have the capacity and ability to set everything up correctly and maintain it as such.
For those for whom setting up a keypair is not an option and who absolutely need to perform password authentication, use $SSH_ASKPASS:
SSH_ASKPASS - If ssh needs a passphrase, it will read the passphrase from the current terminal if it was run from a terminal. If ssh does not have a terminal associated with it but DISPLAY and SSH_ASKPASS are set, it will execute the program specified by SSH_ASKPASS and open an X11 window to read the passphrase. This is particularly useful when calling ssh from a .xsession or related script. (Note that on some machines it may be necessary to redirect the input from /dev/null to make this work.)
E.g.:
$ cat <<EOF >password.sh
#!/bin/sh
echo 'password'
EOF
$ chmod 500 password.sh
$ echo $(DISPLAY=bogus SSH_ASKPASS=$(pwd)/password.sh setsid ssh user@host id </dev/null)
See also Tell SSH to use a graphical prompt for key passphrase.
Yes, you want pubkey authentication.
Today, the only way I was able to do this in a bash script via crontab was like that:
eval $(keychain --eval --agents ssh id_rsa id_dsa id_ed25519)
source $HOME/.keychain/$HOSTNAME-sh
This assumes the ssh agent is already running; to get to that point, the passphrase had to be entered once.
ssh, ssh-keygen, ssh-agent, ssh-add and a correct configuration in sshd_config on the remote systems are necessary ingredients for securing access to remote systems.
First, a private/public keypair needs to be generated with ssh-keygen. The result of the keygen process is two files: the public key and the private key.
The public key file, usually stored in ~/.ssh/id_dsa.pub (or ~/.ssh/id_rsa.pub, for RSA keys), needs to be copied to each remote system that will be granting remote access to the user.
The private key file should remain on the originating system, or on a portable USB ("thumb") drive that is referenced from the sourcing system.
When generating the key pair, a passphrase is used to protect it from usage by non-authenticated users. When establishing an ssh session for the first time, the private key can only be unlocked with the passphrase. Once unlocked, it is possible for the originating system to remember the unlocked private key with ssh-agent. Some systems (e.g., Mac OS X) will automatically start up ssh-agent as part of the login process, and then do an automatic ssh-add -k that unlocks your private ssh keys using a passphrase previously stored in the keychain file.
Connections to remote systems can be direct, or proxied through ssh gateways. In the former case, the remote system only needs to have the public key corresponding to the available unlocked private keys. In the case of using a gateway, the intermediate system must have the public key as well as the eventual target system. In addition, the original ssh command needs to enable agent forwarding, either by configuration in ~/.ssh/config or by command option -A.
For example, to login to remote system "app1" through an ssh gateway system called "gw", the following can be done:
ssh -At gw ssh -A app1
or the following stanzas placed in the ~/.ssh/config file:
Host app1
ForwardAgent = yes
ProxyCommand = ssh -At gw nc %h %p 2>/dev/null
which runs netcat (nc) on the ssh gateway as a network pipe.
The above setup will allow very simple ssh commands, even through ssh gateways:
ssh app1
Sometimes, even more important than terminal sessions are scp and rsync commands for moving files around securely. For example, I use something like this to synchronize my personal environment to a remote system:
rsync -vaut ~/.env* ~/.bash* app1:
Without the config file and nc proxy command, the rsync would get a little more complicated:
rsync -vaut -e 'ssh -A gw ssh' ~/.env* ~/.bash* app1:
None of this will work correctly unless the remote systems' sshd_config is configured correctly. One such configuration is to remove "root" access via ssh, which improves tracking and accountability when several staff can perform root functions.
In unattended batch scripts, a special ssh key-pair needs to be generated for the non-root userid under which the scripts are run. Just as with ssh session management, the batch user ssh key-pair needs to be deployed similarly, with the public key copied to the remote systems, and the private key residing on the source system.
The private key can be locked with a passphrase or unlocked, as desired by the system managers and/or developers. The way to use the special batch ssh key, even in a script running under root, is to use the "ssh -i ~/.ssh/id_dsa" command options with all remote access commands. For example, to copy a file within a script using the special "batch" user access:
rsync -vaut -e 'ssh -i ~batch/.ssh/id_dsa -A gw' $sourcefiles batch@app2:/Sites/www/
This causes rsync to use a special ssh command as the remote access shell. The special-case ssh command uses the "batch" user's DSA private key as its identity. The rsync command's target remote system will be accessed using the "batch" user.

Resources