Scripting (bash currently) - check various process status on a cluster of other hosts quickly - bash

I have a slew of services that run as part of a Hadoop stack and want a simple CLI script that checks the various processes and gives simple output for the end user.
There will be over 50 hosts, with around 10 services to check on each host.
It is currently written in bash. I like the output, but the code is slllloooowww because it checks each process one at a time via passwordless ssh and pgrep.
I am looking for advice or hints on making this faster.
Example output:
Hostname | IP | Ping | SSH | Zookeeper | Namenode | Datanode
localhost | 127.0.0.1 | online | online | _ | _ | _
node1 | 172.30.50.150 | online | online | _ | _ | _
dn1 | 10.142.0.100 | online | online | online | online | online
sample code:
fun_datanode () {
zup=`ssh $1 "ps ax | grep -v grep | grep datanode | wc -l"`
if [ $zup -gt 0 ]; then
dn=online
else
dn="_"
fi
}
#main
#main loop that reads host file
for host in `awk '/^[0-9]/ { print $1 }' /etc/hosts`
do
#ping
fping -c1 -t10 -n $host > /dev/null 2>&1
RETVAL=$?
hname=`getent hosts $host | awk '{print $2 }'`
if [ $RETVAL -eq 0 ]; then
if ssh $host 'pgrep ssh' > /dev/null 2>&1; then
ssh=online
fun_zookeeper $host
fun_namenode $host
fun_datanode $host
fi
fun_print "$hname $host "online" $ssh $zoo $nn $dn"
echo
else
fun_print $hname $host "${red}offline${norm}" "_" "_" "_" "_"
echo
fi
done

You should use Ganglia or Ambari to monitor large clusters. They are free and open source, and they offer monitoring as well as alerting based on thresholds.

There are utilities like pdsh (parallel distributed shell)
https://code.google.com/p/pdsh/wiki/UsingPDSH
This can be used to run process checks in parallel on many nodes.
Parallel SSH was archived (read-only) in Google Code. For more up-to-date releases see https://github.com/pkittenis/parallel-ssh .
Another option is Fabric:
http://www.fabfile.org/

Found a working solution without scope-creeping into a major project.
Instead of going to the well over SSH each time I need a process status on a node, I grab the ps ax output once per node and assign it to a local variable, then interrogate that variable for each process status.
Instead of (number of nodes × number of services) SSH connections, it now makes only (number of nodes) SSH connections.
From there, I may background / fork each SSH...
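fun_grabps () {
    # one SSH round trip per host; keep the full process list locally
    psout=`ssh "$1" "ps ax"`
}
fun_zookeeper () {
    # quote $psout so the line breaks survive and grep sees one process per line
    zup=`echo "$psout" | grep -v grep | grep zoo | wc -l`
    if [ "$zup" -gt 0 ]; then
        zoo=online
    else
        zoo="_"
    fi
}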
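The "background / fork each SSH" idea could be sketched roughly as follows. This is only a sketch under assumptions: fetch_ps, collect_all and svc_status are hypothetical helper names, the per-host files live in a temporary directory, and the ssh options BatchMode and ConnectTimeout are added so that an unreachable host cannot stall the whole run.

```bash
#!/bin/bash
# Sketch: fetch "ps ax" from every host in parallel, one SSH connection each.
# fetch_ps, collect_all and svc_status are hypothetical helper names.

fetch_ps () {
    # BatchMode avoids hanging on a password prompt; ConnectTimeout bounds the wait.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" "ps ax" 2>/dev/null
}

collect_all () {
    # $1 = output directory, remaining arguments = hosts
    local outdir=$1 host
    shift
    for host in "$@"; do
        fetch_ps "$host" > "$outdir/$host.ps" &   # one background job per host
    done
    wait                                          # block until every job is done
}

svc_status () {
    # $1 = file with saved ps output, $2 = pattern to look for
    if grep -v grep "$1" | grep -q -- "$2"; then
        echo online
    else
        echo _
    fi
}
```

A run might then look like: tmp=$(mktemp -d); collect_all "$tmp" node1 dn1; svc_status "$tmp/dn1.ps" datanode. A host that could not be reached leaves an empty file, so every service on it reports "_".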

Related

Execute command when connected to a specified ip

I want to create a script that executes a command when wlan0 is connected with a specific IP. If it is connected with a different IP, it should launch a different command (I have a static IP).
I want to avoid launching this script on public Wi-Fi.
I hope you guys understand. English is not my main language.
Run this script at system startup:
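cal()
{
    # address of wlan0 with the 3-character prefix suffix (e.g. /24) cut off
    a=$(ip addr | grep "wlan0" | sed '1d' | awk '{print $2}' | sed -e 's/\(.*\)...$/\1/')
    echo "$a"
    b=10.98.35.96
    # quoted so the test still works while wlan0 has no address yet
    if [ "$b" = "$a" ]
    then
        echo same
        #give command
        kill -9 $$
    else
        echo notsame
        sleep 3
        cal
    fi
}
cal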
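A variant that cuts the address at the slash instead of dropping a fixed three characters (which would go wrong for a prefix such as /8) might look like the sketch below. iface_ipv4 and action_for_ip are hypothetical names, the mapping in the case statement is only a placeholder, and the input is assumed to be the one-line-per-address format printed by ip -4 -o addr show.

```bash
#!/bin/bash
# Sketch: pick an action based on the IPv4 address assigned to wlan0.
# iface_ipv4 and action_for_ip are hypothetical helper names.

iface_ipv4 () {
    # Reads "ip -4 -o addr show dev IFACE" output on stdin and prints the
    # address without the /prefix suffix, e.g. 10.98.35.96/24 -> 10.98.35.96
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "inet") { sub(/\/.*/, "", $(i+1)); print $(i+1); exit }
    }'
}

action_for_ip () {
    # Placeholder mapping; replace the echo lines with the real commands.
    case "$1" in
        10.98.35.96) echo "home" ;;
        "")          echo "no-address" ;;
        *)           echo "other" ;;
    esac
}
```

It could be called as action_for_ip "$(ip -4 -o addr show dev wlan0 | iface_ipv4)", for example from a loop that sleeps a few seconds between attempts.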

DD-WRT Bash script at startup issue

Hey all I have the following BASH script running at startup on my WRT1900ac linksys:
USER="admin"
PASS="passhere"
PROTOCOL="http"
ROUTER_IP="192.168.1.1"
# Port to connect to which will provide the JSON data.
PORT=9898
while [ 1 ]
do
# Grab connected device MAC addresses through router status page.
MACS=$(curl -s --user $USER:$PASS $PROTOCOL://$ROUTER_IP/Status_Wireless.live.asp)
# clear temp JSON file
echo > temp.log
# Get hostname and IP (just in case there is no hostname).
for MAC in $(echo $MACS | grep -oE "wl_mac::[a-z0-9]{2}:[a-z0-9]{2}:[a-z0-9]{2}:[a-z0-9]{2}:[a-z0-9]{2}:[a-z0-9]{2}" | cut -c 9-);
do
grep 0x /proc/net/arp | awk '{print $1 " " $4}' | while IFS= read -r line
do
IP=$(echo $line | cut -d' ' -f1)
MACTEMP=$(echo $line | cut -d' ' -f2)
HOST=$(arp -a | grep $IP | cut -d' ' -f1)
# if no hostname exists, just use IP.
if [ "$HOST" == "" ]
then
HOST=$IP
fi
if [ "$MAC" == "$MACTEMP" ]
then
JSON="{'hostname' : '$HOST', 'mac_address' : '$MAC'}"
echo $JSON >> temp.log
fi
done
done
# Provide the JSON formatted output on $PORT of router.
# This allows one connection before closing the port (connect, receive data, close).
# Port will reopen every 5 minutes with new data as setup in a cron job.
echo -e "HTTP/1.1 200 OK\n\n $(cat temp.log)" | nc -l -p$PORT >/dev/null
# Wait for 10 seconds and do it all over.
sleep 10
done
And for some reason, when I reboot the router and then visit http://192.168.1.1:9898, it just shows a blank page, even though my Android phone is connected to the router via Wi-Fi and the router shows its MAC address on the status page.
What should be on that page is every wireless MAC address currently connected to the router, displayed as JSON.
Any BASH gurus here who can help spot the problem?
I think it should be
echo -e "HTTP/1.1 200 OK\n\n $(cat temp.log)" | nc -l -p$PORT 0.0.0.0 >/dev/null

Issue with Script

We have a script that checks whether a process has gone down and sends an alert if so. For some reason it does not detect this properly for all users and does not send alerts in all scenarios.
Please suggest what could be the problem.
Environments – uatwrk1, uatwrk2, uatwrk3 ------- uatwrk100
ServerName - myuatserver
Process to be checked - Amc/apache/bin/httpd
Script is :
#!/bin/ksh
i=1
while (( i<=100 ))
do
myuser=uatwrk$i
NoOfProcess=`ps -ef | grep -v grep | grep $myuser | grep "Amc/apache/bin/httpd" | wc -l`
if [[ $NoOfProcess -eq 0 ]]
then
echo "Amc process is down, sending an alert"
# Assume sendAlert.ksh is fine
./sendAlert.ksh
else
echo "Amc process is running fine" >> /dev/null
fi
(( i+=1 ))
done
I think #Mahesh already indicated the problem in a comment.
If you only want one mail, you can count the users running an httpd process. The bracket expression [A] in the following command keeps grep from matching its own command line, so grep -v grep is not needed.
ps -ef | grep "[A]mc/apache/bin/httpd" | cut -d " " -f1 | grep "^uatwrk"| sort -u | wc -l
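One likely cause of missed alerts in the original loop is that grep $myuser is a substring match: uatwrk1 also matches lines that belong to uatwrk10 or uatwrk100, so a stopped uatwrk1 can be hidden by another user's process. A sketch that compares the user column exactly is shown here. users_missing_proc is a hypothetical name, and the input is assumed to be "USER COMMAND..." lines such as those printed by ps -eo user=,args=.

```bash
#!/bin/bash
# Sketch: list the users (uatwrk1..N) that have no matching process.
# Input on stdin: one "USER COMMAND..." line per process,
# e.g. from: ps -eo user=,args=
# users_missing_proc is a hypothetical helper name.

users_missing_proc () {
    # $1 = fixed string expected in the command path, $2 = highest user number
    awk -v pat="$1" -v max="$2" '
        index($2, pat) { seen[$1] = 1 }      # exact user name, command must match
        END {
            for (i = 1; i <= max; i++)
                if (!(("uatwrk" i) in seen)) print "uatwrk" i
        }'
}
```

For example, ps -eo user=,args= | users_missing_proc "Amc/apache/bin/httpd" 100 prints one line per user without the process, and an alert can be sent for each line. Some ps versions shorten long user names, so it is worth checking that names such as uatwrk100 come through intact.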

Bash script checking cpu usage of specific process

First off, I'm new to this. I have some experience with Windows scripting and AppleScript but not much with bash. What I'm trying to do is grab the PID and %CPU of a specific process, then compare the %CPU against a set number and, if it's higher, kill the process. I feel like I'm close, but now I'm getting the following error:
[[: 0.0: syntax error: invalid arithmetic operator (error token is ".0")
What am I doing wrong? Here's my code so far:
#!/bin/bash
declare -i app_pid
declare -i app_cpu
declare -i cpu_limit
app_name="top"
cpu_limit="50"
app_pid=`ps aux | grep $app_name | grep -v grep | awk {'print $2'}`
app_cpu=`ps aux | grep $app_name | grep -v grep | awk {'print $3'}`
if [[ ! $app_cpu -gt $cpu_limit ]]; then
echo "crap"
else
echo "we're good"
fi
Obviously I'm going to replace the echos in the if/then statement, but it's acting as if the statement is true regardless of what the CPU load actually is (I tested this by changing the -gt to -lt and it still echoed "crap").
Thank you for all the help. Oh, and this is on OS X 10.7, if that is important.
I recommend taking a look at the facilities of ps itself to avoid several of the horrible things you are doing.
On my system (ps from procps on Linux, GNU awk) I would do this:
ps -C "$app_name" -o pid=,pcpu= |
awk --assign maxcpu="$cpu_limit" '$2>maxcpu {print "crappy pid",$1}'
The problem is that bash can't handle decimals. You can just multiply them by 100 and work with plain integers instead:
#!/bin/bash
declare -i app_pid
declare -i app_cpu
declare -i cpu_limit
app_name="top"
cpu_limit="5000"
app_pid=`ps aux | grep $app_name | grep -v grep | awk {'print $2'}`
app_cpu=`ps aux | grep $app_name | grep -v grep | awk {'print $3*100'}`
if [[ $app_cpu -gt $cpu_limit ]]; then
echo "crap"
else
echo "we're good"
fi
Keep in mind that CPU percentage is a suboptimal measurement of application health. If you have two processes running infinite loops on a single-core system, no other application of the same priority will ever go over 33%, even if it is thrashing around.
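As an alternative to scaling by 100, the decimal comparison can be handed to awk, which does floating-point arithmetic natively. The following is a minimal sketch; cpu_over_limit is a hypothetical function name.

```bash
#!/bin/bash
# Sketch: compare a decimal %CPU value against a limit without bash arithmetic.
# cpu_over_limit is a hypothetical helper name.

cpu_over_limit () {
    # $1 = measured %CPU (may contain decimals), $2 = limit
    # awk exits with status 0 (success) when the value is above the limit.
    awk -v cpu="$1" -v limit="$2" 'BEGIN { exit !(cpu + 0 > limit + 0) }'
}
```

It can be used directly in a condition, for example: if cpu_over_limit "$app_cpu" "$cpu_limit"; then echo "crap"; fi. The declare -i lines would have to go in that case, because an integer variable cannot hold a value such as 0.0.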
#!/bin/sh
PROCESS="java"
PID=`pgrep $PROCESS | tail -n 1`
CPU=`top -b -p $PID -n 1 | tail -n 1 | awk '{print $9}'`
echo $CPU
I came up with this, using top and bc.
Use it by passing in a name and a limit, e.g. ./script.sh apache2 50 # max 50%
If several PIDs match your program argument, only one will be evaluated, depending on how top lists them. I could have extended the script to catch them all and average the percentages or something, but this will have to do.
You can also pass in a number, ./script.sh 12345 50, which forces it to use an exact PID.
#!/bin/bash
# 1: ['command\ name' or PID number(,s)] 2: MAX_CPU_PERCENT
[[ $# -ne 2 ]] && exit 1
PID_NAMES=$1
# get all PIDS as nn,nn,nn
if [[ ! "$PID_NAMES" =~ ^[0-9,]+$ ]] ; then
PIDS=$(pgrep -d ',' -x $PID_NAMES)
else
PIDS=$PID_NAMES
fi
# echo "$PIDS $MAX_CPU"
MAX_CPU="$2"
MAX_CPU="$(echo "($MAX_CPU+0.5)/1" | bc)"
LOOP=1
while [[ $LOOP -eq 1 ]] ; do
sleep 0.3s
# Depending on your 'top' version and OS you might have
# to change head and tail line-numbers
LINE="$(top -b -d 0 -n 1 -p $PIDS | head -n 8 \
| tail -n 1 | sed -r 's/[ ]+/,/g' | \
sed -r 's/^\,|\,$//')"
# If multiple processes in $PIDS, $LINE will only match\
# the most active process
CURR_PID=$(echo "$LINE" | cut -d ',' -f 1)
# calculate cpu limits
CURR_CPU_FLOAT=$(echo "$LINE"| cut -d ',' -f 9)
CURR_CPU=$(echo "($CURR_CPU_FLOAT+0.5)/1" | bc)
echo "PID $CURR_PID: $CURR_CPU""%"
if [[ $CURR_CPU -ge $MAX_CPU ]] ; then
echo "PID $CURR_PID ($PID_NAMES) went over $MAX_CPU""%"
echo "[[ $CURR_CPU""% -ge $MAX_CPU""% ]]"
LOOP=0
break
fi
done
echo "Stopped"
Erik, I used a modified version of your code to create a new script that does something similar. Hope you don't mind it.
A bash script to get the CPU usage by process
usage:
nohup ./check_proc bwengine 70 &
bwengine is the name of the process to monitor; 70 means log only when the process is using over 70% of the CPU.
Check the logs at: /var/log/check_procs.log
The output should be like:
DATE | TOTAL CPU | CPU USAGE | Process details
Example:
03/12/14 17:11 |20.99|98| ProdPROXY-ProdProxyPA.tra
03/12/14 17:11 |20.99|100| ProdPROXY-ProdProxyPA.tra
Link to the full blog:
http://felipeferreira.net/?p=1453
It is also useful to have app_user information available to test whether the current user has the rights to kill/modify the running process. This information can be obtained along with the needed app_pid and app_cpu by using read, which eliminates the need for awk or any other third-party parser:
read app_user app_pid tmp_cpu stuff <<< \
$( ps aux | grep "$app_name" | grep -v "grep\|defunct\|${0##*/}" )
You can then get your app_cpu * 100 with:
app_cpu=$((${tmp_cpu%.*} * 100))
Note: Including defunct and ${0##*/} in grep -v guards against multiple processes matching $app_name.
I use top to check some details. It provides a few more details like CPU time.
On Linux this would be:
top -b -n 1 | grep $app_name
On Mac, with its BSD version of top:
top -l 1 | grep $app_name

Check if Tomcat is running via shell script

I need to check if Tomcat is running in my system via a shell script. If not I need to catch the process id and kill Tomcat. How shall it be achieved?
in order to get the running process, I've used this command:
ps x | grep [full_path_to_tomcat] | grep -v grep | cut -d ' ' -f 1
You have to be careful, though. It works on my setup, but it may not run everywhere... I have two installations of tomcat, one is /usr/local/tomcat on port 8080 and /usr/local/tomcat_8081 on port 8081. I have to use '/usr/local/tomcat/' (with the final slash) as the full_path because otherwise it would return 2 different pids if tomcat_8081 is running as well.
Here's the explanation of what this command does:
1) ps x gives you a list of running processes with the columns pid, tty, stat, time running and command.
2) Applying grep [full_path_to_tomcat] to it will find the pattern [full_path_to_tomcat] within that list. For instance, running ps x | grep /usr/local/tomcat/ might get you the following:
13277 ? Sl 7:13 /usr/local/java/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties [...] -Dcatalina.home=/usr/local/tomcat [...]
21149 pts/0 S+ 0:00 grep /usr/local/tomcat/
3) Because grep /usr/local/tomcat/ itself matches the pattern, we get 2 entries instead of one, so let's remove the extra one. -v is the invert-match flag for grep, meaning it selects only lines that do not match the pattern. So, in the previous example, ps x | grep /usr/local/tomcat/ | grep -v grep will return:
13277 ? Sl 7:13 /usr/local/java/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties [...] -Dcatalina.home=/usr/local/tomcat [...]
4) Cool, now we have the pid we need. Still, we need to strip all the rest. In order to do that, let's use cut. This command removes sections from a FILE or a standard output. The -d option is the delimiter and the -f is the field you need. Great. So we can use a space (' ') as a delimiter, and get the first field, which corresponds to the pid. Running ps x | grep /usr/local/tomcat/ | grep -v grep | cut -d ' ' -f 1 will return:
13277
Which is what you need. To use it in your script, it's simple:
#replace below with your tomcat path
tomcat_path=/users/tomcat/apache-tomcat-8.0.30
pid=$(ps x | grep "${tomcat_path}" | grep -v grep | cut -d ' ' -f 1)
if [ "${pid}" ]; then
    # left unquoted on purpose: several PIDs become separate arguments
    kill ${pid}
fi
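One caveat with cut -d ' ' -f 1: when ps right-aligns a short PID, the line starts with spaces and the first field comes back empty. awk splits on runs of whitespace, so a sketch like the following is less sensitive to that. tomcat_pids is a hypothetical name, and the input is assumed to be ps x style lines (PID TTY STAT TIME COMMAND).

```bash
#!/bin/bash
# Sketch: print the PIDs of processes whose command line contains a path.
# Input on stdin: "ps x" style lines (PID TTY STAT TIME COMMAND).
# tomcat_pids is a hypothetical helper name.

tomcat_pids () {
    # $1 = Tomcat path to look for (matched as a fixed string)
    awk -v path="$1" '
        $5 == "awk" { next }              # skip this awk process itself
        index($0, path) { print $1 }      # $1 is the PID even when padded
    '
}
```

Usage would be along the lines of pid=$(ps x | tomcat_pids /usr/local/tomcat/). Where pgrep is available, pgrep -f with the same path is another way to get the PIDs.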
One way to check is to request your server address with wget and look at the exit status.
See this link:
http://www.velvettools.com/2013/07/shell-script-to-check-tomcat-status-and.html#.VX_jfVz-X1E
TOMCAT_HOME=/usr/local/tomcat-folder/
is_Running ()
{
    # succeeds (returns 0) only if wget could fetch the page
    if wget -q -O /dev/null http://yourserver.com/; then
        return 0
    else
        return 1
    fi
}
stop_Tomcat ()
{
echo "shutting down......"
$TOMCAT_HOME/bin/shutdown.sh
}
start_Tomcat ()
{
echo "starting......"
$TOMCAT_HOME/bin/startup.sh
}
restart ()
{
    stop_Tomcat
    sleep 10
    # kill_Hanged_Processes is not defined in this excerpt
    kill_Hanged_Processes
    start_Tomcat
    sleep 60
}
The easy way to do that is:
ps -ef | grep tomcat
With this command you'll get:
user [id-to-kill] Date [tomcat-path]
The last step is killing the process:
sudo kill -9 [id-to-kill]
Congratulations, your process was killed lOol
Tomcat's default port is 8080. You can grep for it and use the port status in a comparison.
#!/bin/bash
# The position of the state column can differ between netstat versions and
# options, so look for the word LISTEN instead of picking a column with awk.
STAT=`netstat -na | grep '[.:]8080 ' | grep -c LISTEN`
if [ "$STAT" -gt 0 ]; then
    echo "DEFAULT TOMCAT PORT IS LISTENING, SO ITS OK"
else
    echo "8080 PORT IS NOT IN USE SO TOMCAT IS NOT WORKING"
    ## only if you defined CATALINA_HOME in JAVA ENV ##
    cd $CATALINA_HOME/bin
    ./startup.sh
fi
RESULT=`netstat -na | grep '[.:]8080 ' | grep -c LISTEN`
if [ "$RESULT" -eq 0 ]; then
    echo "TOMCAT PORT STILL NOT LISTENING"
else
    echo "TOMCAT PORT IS LISTENING AND SO TOMCAT WORKING"
fi
This way you can compare the results. You grep for port 8080 if you are using the default Tomcat port. This only checks whether Tomcat is running.
Then you can check which processes are using the port:
lsof -i:8080 # if using port 8080
Then, if you want to free the port by killing the process using it, use this command:
kill 75782 # if, for instance, 75782 is the process using the port
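The port check can also be wrapped in a small function that takes the port as a parameter, which helps when Tomcat does not run on 8080. This is only a sketch: port_is_listening is a hypothetical name, and it assumes netstat -na style lines in which the local address ends in :PORT (Linux) or .PORT (BSD/macOS).

```bash
#!/bin/bash
# Sketch: decide from "netstat -na" style output whether a port is listening.
# port_is_listening is a hypothetical helper name.

port_is_listening () {
    # $1 = port number; netstat output is read from stdin.
    # Accepts both "0.0.0.0:8080" and "*.8080" address styles.
    awk -v port="$1" '
        /LISTEN/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ ("[.:]" port "$")) found = 1
        }
        END { exit !found }'
}
```

For example: if netstat -na | port_is_listening 8080; then echo "listening"; fi. Matching the port at the end of an address field keeps a connection on port 48080 from being mistaken for 8080.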

Resources