Broken sed syntax in Hadoop startup script after reinstalling the JVM - hadoop

I'm trying to run a 3-node Hadoop cluster on the Windows Azure cloud. I've gone through configuration and a test launch, and everything looked fine. However, since I had been using OpenJDK, which according to what I've read is not recommended as a VM for Hadoop, I decided to replace it with the Oracle Server JVM. I removed the old Java installation with yum, along with all Java folders in /usr/lib, installed the most recent version of the Oracle JVM, and updated the PATH and JAVA_HOME variables. However, on launch I now get the following messages:
sed: -e expression #1, char 6: unknown option to `s'
64-Bit: ssh: Could not resolve hostname 64-Bit: Name or service not known
HotSpot(TM): ssh: Could not resolve hostname HotSpot(TM): Name or service not known
Server: ssh: Could not resolve hostname Server: Name or service not known
VM: ssh: Could not resolve hostname VM: Name or service not known
etc. (in total about 20-30 lines containing words that should have nothing to do with hostnames)
To me it looks like parts of some output are being passed as hostnames because of incorrect usage of sed in the startup script:
if [ "$HADOOP_SLAVE_NAMES" != '' ] ; then
SLAVE_NAMES=$HADOOP_SLAVE_NAMES
else
SLAVE_FILE=${HADOOP_SLAVES:-${HADOOP_CONF_DIR}/slaves}
SLAVE_NAMES=$(cat "$SLAVE_FILE" | sed 's/#.*$//;/^$/d')
fi
# start the daemons
for slave in $SLAVE_NAMES ; do
ssh $HADOOP_SSH_OPTS $slave $"${#// /\\ }" \
2>&1 | sed "s/^/$slave: /" &
if [ "$HADOOP_SLAVE_SLEEP" != "" ]; then
sleep $HADOOP_SLAVE_SLEEP
fi
done
This looks unchanged, so the question is: how could changing the JVM affect sed? And how can I fix it?

So I found an answer to this question: my guess was wrong, and everything with sed is fine. The problem, however, was in how the Oracle JVM works with external libraries compared to OpenJDK. It threw an exception where the script was not expecting one, and that ruined the whole sed input.
You can fix it by adding the following environment variables:
HADOOP_COMMON_LIB_NATIVE_DIR, which should point to the lib/native folder of your Hadoop installation, and add -Djava.library.path=/opt/hadoop/lib to whatever options you already have in the HADOOP_OPTS variable (note that /opt/hadoop is my installation folder; you might need to change it for things to work properly).
I personally added the export commands to the hadoop-env.sh script, but adding them to your .bashrc or start-all.sh should work as well.
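Concretely, the additions to hadoop-env.sh would look something like this (a sketch using the /opt/hadoop prefix mentioned above; substitute your own installation path):
export HADOOP_COMMON_LIB_NATIVE_DIR=/opt/hadoop/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/opt/hadoop/lib"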

Related

unable to start a job using spark-submit via ssh (on EC2)

I set up spark on a single EC2 machine and, when I am connected to it, I am able to use spark either with jupyter or spark-submit, without any issue. Unfortunately, though, I am not able to use spark-submit via ssh.
So, to recap:
This works:
ubuntu@ip-198-43-52-121:~$ spark-submit job.py
This does not work:
ssh -i file.pem ubuntu@blablablba.compute.amazon.com "spark-submit job.py"
Initially, I kept getting the following error message over and over:
'java.io.IOException: Cannot run program "python": error=2, No such file or directory'
After having read many articles and posts about this issue, I thought that the problem was due to some variables not having been set properly, so I added the following lines to the machine's .bashrc file:
export SPARK_HOME=/home/ubuntu/spark-3.0.1-bin-hadoop2.7 #(it's where i unzipped the spark file)
export PATH=$SPARK_HOME/bin:$PATH
export PYTHONPATH=/usr/bin/python3
export PYSPARK_PYTHON=python3
(As the error message referenced python, I also tried adding the line "alias python=python3" to .bashrc, but nothing changed)
After all this, if I try to submit the spark job via ssh I get the following error message:
"command spark-submit not found".
As it looks like the system ignores all the environment variables when sending commands via SSH, I decided to source the machine's .bashrc file before trying to run the spark job. As I was not sure about the most appropriate way to send multiple commands via SSH, I tried all the following ways:
ssh -i file.pem ubuntu@blabla.compute.amazon.com "source .bashrc; spark-submit job.file"
ssh -i file.pem ubuntu@blabla.compute.amazon.com << HERE
source .bashrc
spark-submit job.file
HERE
ssh -i file.pem ubuntu@blabla.compute.amazon.com <<- HERE
source .bashrc
spark-submit job.file
HERE
(ssh -i file.pem ubuntu@blabla.compute.amazon.com "source .bashrc; spark-submit job.file")
All attempts worked with other commands like ls or mkdir, but not with source and spark-submit.
I have also tried providing the full path running the following line:
ssh -i file.pem ubuntu@blabla.compute.amazon.com "/home/ubuntu/spark-3.0.1-bin-hadoop2.7/bin/spark-submit job.py"
In this case too I get, once again, the following message:
'java.io.IOException: Cannot run program "python": error=2, No such file or directory'
How can I tell spark which python to use if SSH seems to ignore all environment variables, no matter how many times I set them?
It's worth mentioning that I got into coding and data a bit more than a year ago, so I am really a newbie here and any help would be highly appreciated. The solution may be very simple, but I cannot get my head around it. Please help.
Thanks a lot in advance :)
The problem was indeed with the way I was expecting the shell to work (which was wrong).
My issue was solved by:
Setting my variables in .profile instead of .bashrc
Providing full path to python
Now I can launch spark jobs via ssh.
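Put together, the working invocation ends up looking roughly like this (a sketch using the paths from the question; adjust them to your own layout):
ssh -i file.pem ubuntu@blabla.compute.amazon.com \
  "PYSPARK_PYTHON=/usr/bin/python3 /home/ubuntu/spark-3.0.1-bin-hadoop2.7/bin/spark-submit job.py"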
I found the solution in the answer @VinkoVrsalovic gave to this post:
Why does an SSH remote command get fewer environment variables then when run manually?
Cheers

Hadoop : start-dfs.sh Connection refused

I have a vagrant box on debian/stretch64
I am trying to install Hadoop 3 following the documentation:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
When I run start-dfs.sh, I get this message:
vagrant@stretch:/opt/hadoop$ sudo sbin/start-dfs.sh
Starting namenodes on [localhost]
pdsh@stretch: localhost: connect: Connection refused
Starting datanodes
pdsh@stretch: localhost: connect: Connection refused
Starting secondary namenodes [stretch]
pdsh@stretch: stretch: connect: Connection refused
vagrant@stretch:/opt/hadoop$
Of course I tried updating my hadoop-env.sh with:
export HADOOP_SSH_OPTS="-p 22"
ssh localhost works (without a password).
I have no idea what to change to solve this problem.
There is a problem with the way pdsh works by default (see edit), but Hadoop can work without it. Hadoop checks whether the system has pdsh at /usr/bin/pdsh and uses it if so. An easy way to avoid using pdsh is to edit $HADOOP_HOME/libexec/hadoop-functions.sh:
replace the line
if [[ -e '/usr/bin/pdsh' ]]; then
by
if [[ ! -e '/usr/bin/pdsh' ]]; then
Then Hadoop runs without pdsh and everything works.
EDIT:
A better solution would be to keep using pdsh, but with ssh instead of rsh, as explained here. Replace this line in $HADOOP_HOME/libexec/hadoop-functions.sh:
PDSH_SSH_ARGS_APPEND="${HADOOP_SSH_OPTS}" pdsh \
by
PDSH_RCMD_TYPE=ssh PDSH_SSH_ARGS_APPEND="${HADOOP_SSH_OPTS}" pdsh \
Note: only doing export PDSH_RCMD_TYPE=ssh, as I mentioned in the comment, doesn't work. I don't know why...
I've also opened an issue and submitted a patch for this problem: HADOOP-15219
I fixed this problem for Hadoop 3.1.0 by adding
PDSH_RCMD_TYPE=ssh
to my .bashrc as well as $HADOOP_HOME/etc/hadoop/hadoop-env.sh.
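That is, the single line below goes into both files; it tells pdsh to use ssh instead of its default rsh transport:
export PDSH_RCMD_TYPE=ssh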
Check whether your /etc/hosts file contains mappings for both the hostname stretch and localhost.
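For reference, a minimal mapping might look like the sketch below (hypothetical; loosely based on a default Debian/Vagrant layout):
127.0.0.1   localhost
127.0.1.1   stretch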
Go to your hadoop home directory
~$ cd libexec
~$ nano hadoop-functions.sh
edit this line:
if [[ -e '/usr/bin/pdsh' ]]; then
with:
if [[ ! -e '/usr/bin/pdsh' ]]; then
Additionally, it is recommended that pdsh also be installed for better ssh resource management. —— Hadoop: Setting up a Single Node Cluster
We can remove pdsh to solve this problem.
apt-get remove pdsh
Check if the firewalls are running on your vagrant box
chkconfig iptables off
/etc/init.d/iptables stop
If not, have a look at the underlying logs in /var/log/...
I was dealing with my colleague's problem.
He had configured ssh using the hostname from the hosts file, but specified the IP in the workers file.
After I rewrote the workers file, everything worked.
~/hosts file
10.0.0.1 slave01
#ssh-copy-id hadoop@slave01
~/hadoop/etc/workers
slave01
I added export PDSH_RCMD_TYPE=ssh to my .bashrc file, logged out and back in and it worked.
For some reason simply exporting and running right away did not work for me.

How to test if hbase is correctly running

I just installed HBase on an EC2 server (I also have HDFS installed, and it's working).
My problem is that I don't know how to check whether HBase is correctly installed.
To install HBase I followed this tutorial, in which they say we can check the HBase instance in the web UI at addressOfMyMachine:60010. I also checked port 16010, but this is not working.
I have an error saying this :
Sorry, the page you are looking for is currently unavailable.
Please try again later.
If you are the system administrator of this resource then you should check the error log for details.
I managed to run the hbase shell but I don't know if my installation is working well.
To check whether HBase is running from a shell script, execute the command below.
if echo -e "list" | hbase shell 2>&1 | grep -q "ERROR:" 2>/dev/null ;then echo "Hbase is not running"; fi

How to copy files from one machine to another machine

I want to copy /home/cmind012/m.sh from one system to another (both systems are Linux) using a shell script.
Command:
$ scp /home/cmind012/m.sh cmind013:/home/cmind013/tanu
I am getting the message:
ssh: cmind013: Name or service not known
lost connection
It seems that cmind013 is not being resolved. I would first try
nslookup cmind013
and see why it doesn't resolve.
It seems that you are missing the IP address/domain of the remote host. The format should be user@host:[directory]
You could do the following:
scp -r [directory/files] [remote host]:[destination directory]
ex: scp -r /var/www/html/* root@192.168.1.0:/var/www/html/
Try the following command:
scp /home/cmind012/m.sh denil@172.22.192.105:/home/denil/

How can I automate running commands remotely over SSH to multiple servers in parallel?

I've searched around a bit for similar questions, but other than running one command or perhaps a few commands with items such as:
ssh user@host -t sudo su -
However, what if I essentially need to run a script on (let's say) 15 servers at once? Is this doable in bash? In a perfect world I need to avoid installing applications if at all possible to pull this off. For argument's sake, let's just say that I need to do the following across 10 hosts:
Deploy a new Tomcat container
Deploy an application in the container, and configure it
Configure an Apache vhost
Reload Apache
I have a script that does all of that, but it relies on me logging into all the servers, pulling a script down from a repo, and then running it. If this isn't doable in bash, what alternatives do you suggest? Do I need a bigger hammer, such as Perl (Python might be preferred, since I can guarantee Python is on all boxes in a RHEL environment thanks to yum/up2date)? If anyone can point me to any useful information it'd be greatly appreciated, especially if it's doable in bash. I'll settle for Perl or Python, but I just don't know those as well (working on that). Thanks!
You can run a local script as shown by che and Yang, and/or you can use a Here document:
ssh root@server /bin/sh <<\EOF
wget http://server/warfile # Could use NFS here
cp app.war /location
command 1
command 2
/etc/init.d/httpd restart
EOF
Often, I'll just use the original Tcl version of Expect. You only need to have that on the local machine. If I'm inside a program using Perl, I do this with Net::SSH::Expect. Other languages have similar "expect" tools.
The issue of how to run commands on many servers at once came up on a Perl mailing list the other day and I'll give the same recommendation I gave there, which is to use gsh:
http://outflux.net/unix/software/gsh
gsh is similar to the "for box in box1_name box2_name box3_name" solution already given but I find gsh to be more convenient. You set up a /etc/ghosts file containing your servers in groups such as web, db, RHEL4, x86_64, or whatever (man ghosts) then you use that group when you call gsh.
[pdurbin@beamish ~]$ gsh web "cat /etc/redhat-release; uname -r"
www-2.foo.com: Red Hat Enterprise Linux AS release 4 (Nahant Update 7)
www-2.foo.com: 2.6.9-78.0.1.ELsmp
www-3.foo.com: Red Hat Enterprise Linux AS release 4 (Nahant Update 7)
www-3.foo.com: 2.6.9-78.0.1.ELsmp
www-4.foo.com: Red Hat Enterprise Linux Server release 5.2 (Tikanga)
www-4.foo.com: 2.6.18-92.1.13.el5
www-5.foo.com: Red Hat Enterprise Linux Server release 5.2 (Tikanga)
www-5.foo.com: 2.6.18-92.1.13.el5
[pdurbin@beamish ~]$
You can also combine or split ghost groups, using web+db or web-RHEL4, for example.
I'll also mention that while I have never used shmux, its website contains a list of software (including gsh) that lets you run commands on many servers at once. Capistrano has already been mentioned and (from what I understand) could be on that list as well.
Take a look at Expect (man expect)
I've accomplished similar tasks in the past using Expect.
You can pipe the local script to the remote server and execute it with one command:
ssh -t user@host 'sh' < path_to_script
This can be further automated by using public key authentication and wrapping with scripts to perform parallel execution.
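A rough sketch of such a wrapper, assuming key-based authentication is already in place (the host names are placeholders):
for host in box1_name box2_name box3_name; do
  ssh "$host" 'sh' < path_to_script > "out.$host" 2>&1 &   # run in the background on each host
done
wait   # block until every remote run has finished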
You can try paramiko. It's a pure-python ssh client. You can program your ssh sessions. Nothing to install on remote machines.
See this great article on how to use it.
To give you the structure, without actual code.
Use scp to copy your install/setup script to the target box.
Use ssh to invoke your script on the remote box.
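Filled in with hypothetical names, those two steps might look like:
scp setup.sh user@target:/tmp/setup.sh
ssh user@target 'sh /tmp/setup.sh'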
pssh may be interesting since, unlike most solutions mentioned here, the commands are run in parallel.
(For my own use, I wrote a simpler small script very similar to GavinCattell's; it is documented here - in French.)
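A typical pssh invocation looks something like the sketch below, assuming the parallel-ssh package is installed (on some distros the binary is named parallel-ssh; hosts.txt is a hypothetical file listing one server per line, and -i prints each host's output inline):
pssh -i -h hosts.txt "uptime"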
Have you looked at things like Puppet or Cfengine? They can do what you want and probably much more.
For those that stumble across this question, I'll include an answer that uses Fabric, which solves exactly the problem described above: Running arbitrary commands on multiple hosts over ssh.
Once fabric is installed, you'd create a fabfile.py, and implement tasks that can be run on your remote hosts. For example, a task to Reload Apache might look like this:
from fabric.api import env, run
env.hosts = ['host1@example.com', 'host2@example.com']

def reload():
    """ Reload Apache """
    run("sudo /etc/init.d/apache2 reload")
Then, on your local machine, run fab reload and the sudo /etc/init.d/apache2 reload command would get run on all the hosts specified in env.hosts.
You can do it the same way you did before, just script it instead of doing it manually. The following code remotes to a machine named 'loca' and runs two commands there. What you need to do is simply insert the commands you want to run there.
che@ovecka ~ $ ssh loca 'uname -a; echo something_else'
Linux loca 2.6.25.9 #1 (blahblahblah)
something_else
Then, to iterate through all the machines, do something like:
for box in box1_name box2_name box3_name
do
  ssh $box 'commands_to_run_everywhere'
done
In order to make this ssh thing work without entering passwords all the time, you'll need to set up key authentication. You can read about it at IBM developerworks.
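That setup boils down to two commands, sketched here assuming password login is still enabled on the targets:
ssh-keygen -t rsa            # generate a key pair once, accepting the defaults
ssh-copy-id user@box1_name   # copy the public key over; repeat for each box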
You can run the same command on several servers at once with a tool like cluster ssh. The link is to a discussion of cluster ssh on the Debian package of the day blog.
Well, for steps 1 and 2, isn't there a Tomcat manager web interface? You could script that with curl, or with zsh using the libwww plugin.
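For instance, a deployment through the manager's text interface might look roughly like this (hypothetical credentials and paths; the exact URL depends on your Tomcat version):
curl -u admin:secret "http://server:8080/manager/text/deploy?path=/myapp&war=file:/tmp/myapp.war"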
For SSH you're looking to:
1) not get prompted for a password (use keys)
2) pass the command(s) on SSH's command line; this is similar to rsh in a trusted network.
Other posts have shown you what to do, and I'd probably use sh too, but I'd be tempted to use Perl, like ssh tomcatuser@server perl -e 'do-everything-on-one-line;', or you could do this:
either scp the_package.tbz tomcatuser@server:the_place/.
ssh tomcatuser@server /bin/sh <<\EOF
# define stuff like TOMCAT_WEBAPPS=/usr/local/share/tomcat/webapps
tar xjf the_package.tbz    # or: rsync rsync://repository/the_package_place
mv $TOMCAT_WEBAPPS/old_war $TOMCAT_WEBAPPS/old_war.old
mv $THE_PLACE/new_war $TOMCAT_WEBAPPS/new_war
touch $TOMCAT_WEBAPPS/new_war    # you don't normally have to restart tomcat
mv $THE_PLACE/vhost_file $APACHE_VHOST_DIR/vhost_file
$APACHECTL restart    # might need to log in as the apache user to move that file and restart
EOF
You want DSH or distributed shell, which is used in clusters a lot. Here is the link: dsh
You basically have node groups (a file with lists of nodes in them), and you specify which node group you wish to run commands on; then you use dsh, like you would use ssh, to run commands on them.
dsh -a /path/to/some/command/or/script
It will run the command on all the machines at the same time and return the output prefixed with the hostname. The command or script has to be present on the system, so a shared NFS directory can be useful for these sorts of things.
This creates an ssh wrapper command named after each machine you have accessed.
by Quierati
http://pastebin.com/pddEQWq2
#Use in .bashrc
#Use "HashKnownHosts no" in ~/.ssh/config or /etc/ssh/ssh_config
# If known_hosts is already hashed, delete known_hosts first
[ ! -d ~/bin ] && mkdir ~/bin
for host in `cut -d, -f1 ~/.ssh/known_hosts|cut -f1 -d " "`;
do
[ ! -s ~/bin/$host ] && echo ssh $host '$*' > ~/bin/$host
done
[ -d ~/bin ] && chmod -R 700 ~/bin
export PATH=$PATH:~/bin
Example execution:
$ for i in hostname{1..10}; do $i who; done
There is a tool called FLATT (FLexible Automation and Troubleshooting Tool) that allows you to execute scripts on multiple Unix/Linux hosts with a click of a button. It is a desktop GUI app that runs on Mac and Windows but there is also a command line java client.
You can create batch jobs and reuse on multiple hosts.
Requires Java 1.6 or higher.
Although it's a complex topic, I can highly recommend Capistrano.
I'm not sure if this method will work for everything that you want, but you can try something like this:
$ cat your_script.sh | ssh your_host bash
This will run the script (which resides locally) on the remote server.
I just read a new blog post about using setsid, which requires no further installation/configuration beyond the mainstream kernel. Tested/verified under Ubuntu 14.04.
While the author has a very clear explanation and sample code as well, here's the magic part for a quick glance:
#----------------------------------------------------------------------
# Create a temp script to echo the SSH password, used by SSH_ASKPASS
#----------------------------------------------------------------------
SSH_ASKPASS_SCRIPT=/tmp/ssh-askpass-script
cat > ${SSH_ASKPASS_SCRIPT} <<EOL
#!/bin/bash
echo "${PASS}"
EOL
chmod u+x ${SSH_ASKPASS_SCRIPT}
# Tell SSH to read in the output of the provided script as the password.
# We still have to use setsid to eliminate access to a terminal and thus avoid
# it ignoring this and asking for a password.
export SSH_ASKPASS=${SSH_ASKPASS_SCRIPT}
......
......
# Log in to the remote server and run the above command.
# The use of setsid is a part of the machinations to stop ssh
# prompting for a password.
setsid ssh ${SSH_OPTIONS} ${USER}@${SERVER} "ls -rlt"
The easiest way I found, without installing or configuring much software, is to use plain old tmux. Say you have 9 Linux servers. Pick one box as your main. Start a tmux session:
tmux
Then create 9 split tmux panes by doing this 8 times:
ctrl-b + %
Now SSH into each box in each pane. You'll need to know some tmux shortcuts. To navigate, press:
ctrl+b <arrow-keys>
Once you're logged in to all your boxes, one per pane, turn on pane synchronization, which lets you type the same thing into each box:
ctrl+b :setw synchronize-panes on
Now whatever you type shows up in every pane. To turn it off, just change on to off. To cycle through pane layouts, press ctrl+b <space-bar>.
This works a lot better for me since I need to see each terminal's output, as sometimes servers crash or hang for whatever reason when downloading or upgrading software. Any issues, you can just isolate and resolve them individually.
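If you do this often, the pane setup itself can be scripted; here is a minimal sketch, where the host names are placeholders you replace with your own:
hosts=(web1 web2 web3)
# Start a detached session whose first pane SSHes into the first host.
tmux new-session -d -s multi "ssh ${hosts[0]}"
# Add one pane per remaining host, keeping the layout tiled.
for host in "${hosts[@]:1}"; do
  tmux split-window -t multi "ssh $host"
  tmux select-layout -t multi tiled
done
# Type into every pane at once, then attach.
tmux set-window-option -t multi synchronize-panes on
tmux attach -t multi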

Resources