Want to keep username@hostname of hadoop slaves different - hadoop

I am setting up a hadoop-2.7.3 multi-node cluster. To add a slave node I edited the slaves file and the /etc/hosts file, and I also copied the SSH key to it. Now, after executing start-dfs.sh, Hadoop connects to user1@myStyle, which is me; everything is fine up to this point. But instead of connecting to the other node as user2@node1, it connects as user1@node1, which does not exist. How can I connect as user2@node1 instead of user1@node1?
OS: Ubuntu 16.04
Hadoop version: 2.7.3

Step-1:
The slaves file must have entries in the form (one machine name per line):
machine_hostname1
machine_hostname2
...
In the above, each line represents the actual name of a machine in the cluster and must be exactly the same as specified in the /etc/hosts file.
Step-2:
Check whether you are manually able to connect to each machine by using the following command:
ssh -i ~/.ssh/<"keyfilename"> <"username">@publicNameOfMachine
Don't type the quotes or angle-brackets in the above command, and replace the components with the names you have chosen.
Step-3:
If you are not able to connect manually, then either your key file is not correct, or it has not been placed in the .ssh directory on the target machine, or it does not have Linux 600 permissions.
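For example, assuming the key file is named hadoop_key (a placeholder, not a name from the question), the permissions can be tightened like this on the machine that holds it:
# hadoop_key is a placeholder for your actual key file name
chmod 600 ~/.ssh/hadoop_key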
Step-4:
You should have a config file on the NameNode under the .ssh directory. That file should have entries like the following 4 lines per machine:
Host <"ShortMachineName">
HostName <"MachinePublicName">
User <"username">
IdentityFile ~/.ssh/<keyfilename>
Don't type the quotes or angle-brackets in the above 4 lines, and replace the components with the names you have chosen. These 4 lines are repeated per machine.
Make sure you are not repeating (cut-and-paste error) the username and/or machine name for each machine. They must match the usernames and machine names you have actually configured.
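As a concrete sketch of such an entry (user2 and node1 come from the question; node1.example.com and hadoop_key are placeholders):
Host node1
HostName node1.example.com
User user2
IdentityFile ~/.ssh/hadoop_key
With an entry like this in place, ssh node1 (and therefore start-dfs.sh) logs in as user2 instead of reusing the local username.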

Related

Get the remote bash shell to use a .bash_history file that is on my local machine

My environment contains clusters with multiple hosts each, and I tend to run similar or equivalent commands on the hosts within a cluster.
Sometimes I am ssh-ed into a cluster host and remember that I ran a certain command on another host in that cluster, but I can't remember which host it was, and I need to run that command again.
Since every host in the cluster has its own .bash_history, I have to log in to each and every one of them and look through the .bash_history file to locate that command.
However, if I could use one .bash_history file for all hosts in the cluster (e.g. named .bash_history.clusterX) then I would be able to search the command in the bash history (with CTRL+R) and execute it.
Is that possible?
In my setup, a shared home directory (via NFS, etc.) is not an option.
Another approach is to leave the relevant commands to execute in an executable file ('ssh_commands') in the home folder of each remote user on each machine.
Those ssh_commands will include the commands you need to execute on each server whenever you open an SSH session.
To call that file on each SSH session:
ssh remoteUser@remoteServer -t "/bin/bash --init-file <(echo 'source ssh_commands')"
That way, you don't have to look for the right commands to execute, locally or remotely: your SSH session opens and immediately executes what you want.
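A minimal sketch of what ssh_commands might contain (everything in it is a placeholder; put in whatever you routinely run on that cluster):
# ~/ssh_commands -- sourced at the start of each SSH session
export HISTFILE=~/.bash_history.clusterX        # give this cluster's sessions their own history file name
alias redeploy='sudo systemctl restart myapp'   # a command you keep needing on these hosts
cd /var/log/myapp                               # start in the directory you usually work in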

Copy file to multiple hosts from a shared file server with password

I have about 20 Macs on my network that always need fonts installed.
I have a folder location where I ask them to put the fonts they need synced to every machine (to save time, I install each font on every machine, so that if someone moves machines I don't need to do it again).
At the moment I am manually rsyncing the fonts from this server location to all the machines one by one using
rsync -avrP /server/fonts/ /Library/Fonts/
This requires me to ssh into every machine.
Is there a way I can script this using a hosts.txt file with the IPs? The password is the same for every machine and I'd rather not type it 20 times. Security isn't an issue.
something that allows me to call the script and point it at a font i.e.
./install-font font.ttf
I've looked into scp but I don't see any example of specifying a password anywhere in the script.
cscp.sh
#!/bin/bash
# read one host per line from stdin and copy the given file to each host's home directory
while read -r host; do
    scp "$1" "${host}:"
done

hosts
project-prod-web1
project-prod-web2
project-prod-web3
Usage
Copy file to multiple hosts:
cscp.sh file < hosts
But this asks me to type a password every time and doesn't specify the target location on the host.
Use the ssh-copy-id command to install your public key on each of these hosts. After that, ssh and scp will use public/private key authentication without requiring you to enter the password.
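A sketch of the one-time key distribution using the same hosts file (ssh-copy-id asks for the password once per host here, and never again afterwards):
# run once; reads one host per line from the hosts file
while read -r host; do
    ssh-copy-id "${host}"
done < hosts
After that, cscp.sh runs without prompts, and the target location on each host can be set by appending a path after the colon in the scp line, e.g. scp "$1" "${host}:/Library/Fonts/".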

Hadoop alternate SSH key

I'm setting up a multinode hadoop cluster and have a shared key to passwordless SSH between nodes. I named the file ~/.ssh/hadoop_rsa and can connect to other hosts using ssh -i ~/.ssh/hadoop_rsa host.
I need some way to tell hadoop to use this alternate SSH key when connecting to other nodes.
It appears that commands are run on each slave using the script:
$HADOOP_HOME/sbin/slaves.sh
That script includes a reference to the environment variable $HADOOP_SSH_OPTS when calling ssh. I was able to tell Hadoop to use a different key file by setting an environment variable like this:
export HADOOP_SSH_OPTS="-i ~/.ssh/hadoop_rsa"
Thanks to Varun on the Hadoop mailing list for pointing me in the right direction.
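To avoid having to export this in every shell, the same line can also be placed in hadoop-env.sh, which the start scripts source (the path below assumes a standard 2.7.x layout; adjust it to your install):
# $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export HADOOP_SSH_OPTS="-i ~/.ssh/hadoop_rsa"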

Bash script to ssh to specific urls of a common format?

All the VMs at work that I need to ssh into follow a common format (stuff014.stuff.com) with differing numbers. Is there a quick way to connect to them without maintaining a big ssh config file and without defining an alias for each one?
(Replace <your_user_name> with your user name.)
#!/bin/bash
ssh <your_user_name>@stuff$1.stuff.com
The $1 is the first parameter given, so if this script was named easyssh.sh and you needed to get to 014, you would run
./easyssh.sh 014
To make this even better add it to a folder on your PATH (or add the directory to your path, whichever suits your needs).
You wouldn't need a big config file. A minimal implementation only requires two lines.
Host stuff*
    HostName %h.stuff.com
Any host you try to connect to is matched against the host patterns in your config file, stopping at the first one that matches. The HostName directive uses the matched host (%h) to construct the actual host name to connect to.
Then you can abbreviate the host name when running ssh:
$ ssh stuff014
# Connects to stuff014.stuff.com

OpenMPI: Simple 2-Node Setup

I'm having trouble running an OpenMPI program using only two nodes (one of the nodes is the same machine that is executing the mpiexec command and the other node is a separate machine).
I'll call the machine that is running mpiexec, master, and the other node slave.
On both master and slave, I've installed OpenMPI in my home directory under ~/mpi.
I have a file called ~/machines.txt on master.
Ideally, ~/machines.txt should contain:
master
slave
However, when I run the following on master:
mpiexec -n 2 --hostfile ~/machines.txt hostname
I get the following error as output:
bash: orted: command not found
But if ~/machines.txt contains only the name of the node that the command is running on, it works.
~/machines.txt:
master
Command:
mpiexec -n 2 --hostfile ~/machines.txt hostname
OUTPUT:
master
master
I've tried running the same command on slave, and changed the machines.txt file to contain only slave, and it worked too. I've made sure that my .bashrc file contains the proper paths for OpenMPI.
What am I doing wrong? In short, there is only a problem when I try to execute a program on a remote machine, but I can run mpiexec perfectly fine on the machine that is executing the command. This makes me believe that it's not a path issue. Am I missing a step in connecting both machines? I have passwordless ssh login capability from master to slave.
This error message means that you either do not have Open MPI installed on the remote machine, or you do not have your PATH set properly on the remote machine for non-interactive logins (i.e., such that it can't find the installation of Open MPI on the remote machine). "orted" is one of the helper executables that Open MPI uses to launch processes on remote nodes -- so if "orted" was not found, then it didn't even get to the point of trying to launch "hostname" on the remote node.
Note that there might be a difference between interactive and non-interactive logins in your shell startup files (e.g., in your .bashrc).
Also note that it is considerably simpler to have Open MPI installed in the same path location on all nodes -- that way, the prefix method described below will automatically add the right PATH and LD_LIBRARY_PATH when executing on the remote nodes, and you don't have to muck with your shell startup files.
Note that there are a bunch of FAQ items about these kinds of topics on the main Open MPI web site.
Either explicitly set the absolute OpenMPI prefix with the --prefix option:
prompt> mpiexec --prefix=$HOME/mpi ...
or invoke mpiexec with the absolute path to it:
prompt> $HOME/mpi/bin/mpiexec ...
The latter option sets the prefix automatically. The prefix is then used to set PATH and LD_LIBRARY_PATH on the remote machines.
This answer comes very late, but for Linux users: it is a bad habit to add environment variables at the end of the ~/.bashrc file, because near the top of that file there is a guard that returns immediately when the shell is non-interactive, which is exactly the mode your program runs in when it is launched through ssh. So put your environment variables at the TOP of the file, before that early return.
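On a stock Ubuntu ~/.bashrc that guard looks roughly like the snippet below; the install path in the exports matches the ~/mpi location from the question and should be adjusted to wherever your OpenMPI actually lives:
# ~/.bashrc (sketch) -- exports placed before the non-interactive guard
export PATH="$HOME/mpi/bin:$PATH"
export LD_LIBRARY_PATH="$HOME/mpi/lib:$LD_LIBRARY_PATH"

# stock guard: non-interactive shells stop reading here
case $- in
    *i*) ;;
      *) return;;
esac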
Try editing the file
/etc/environment
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/home/hadoop/openmpi_install/bin"
LD_LIBRARY_PATH=/home/hadoop/openmpi_install/lib
