I have a bash script that does a pretty decent job of reporting CPU levels above 95%. The issue I am running into is that it also reports short spikes. The script runs every 10 minutes and checks all of my servers. Is there a way to report only if a server stays above 95% for 3 consecutive iterations, i.e. alert after the 3rd run (30 minutes)?
12:00 - 1st report - 98%
12:10 - 2nd report - 99%
12:20 - 3rd report - 98% (now alert the admin)
Here is the relevant section of the script:
for sn in $(cat /tmp/hosts |grep -v "#"); do
cpuuse=$(ssh -qn -o ConnectTimeout=15 -oStrictHostKeyChecking=no -o BatchMode=yes $sn "top -b -n2 -p 1 | fgrep \"Cpu(s)\" | tail -1 | awk -F'id,' -v prefix=\"\$prefix\" '{ split(\$1, vs, \",\"); v=vs[length(vs)]; sub(\"%\", \"\", v); printf \"%s%.1f%%\n\", prefix, 100 - v }' | rev | cut -c 4- | rev")
if [[ "$cpuuse" -ge 95 ]]; then
echo "CPU Alert!! $sn CPU is high - $cpuuse%" | mailx -s "CPU Alert on $sn" admin@sample.com
fi
done
AFAIK there isn't really a bash trick for this. You just need to store a counter somewhere. Something like this could do the trick:
for sn in $(cat /tmp/hosts |grep -v "#"); do
cpuuse=$(ssh -qn -o ConnectTimeout=15 -oStrictHostKeyChecking=no -o BatchMode=yes $sn "top -b -n2 -p 1 | fgrep \"Cpu(s)\" | tail -1 | awk -F'id,' -v prefix=\"\$prefix\" '{ split(\$1, vs, \",\"); v=vs[length(vs)]; sub(\"%\", \"\", v); printf \"%s%.1f%%\n\", prefix, 100 - v }' | rev | cut -c 4- | rev")
counter_file=/tmp/my-counter-file-$sn # separate counter file for each server
if [[ "$cpuuse" -ge 95 ]]; then
date >> "$counter_file" # just add a line to the counter file
# read from stdin so wc prints only the number, not the file name
if [[ $(wc -l < "$counter_file") -ge 3 ]]; then
echo "CPU Alert!! $sn CPU is high - $cpuuse%" | mailx -s "CPU Alert on $sn" admin@sample.com
rm -f "$counter_file" # message was sent, reset counter
fi
else
rm -f "$counter_file" # below limit, reset counter (-f: the file may not exist)
fi
done
The trick here is to store a counter in a file. The number of lines in the file is your counter value.
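As a hedged sketch of the same idea, the counter can also be kept as a single number per host instead of a line count. The function name should_alert, the state_dir location, and the argument order are assumptions made for illustration; they are not part of the original script.

```shell
#!/bin/bash
# Assumed location for the per-host counter files.
state_dir=${STATE_DIR:-/tmp/cpu-counters}

# should_alert HOST VALUE [LIMIT] [NEEDED]
# Returns 0 once NEEDED consecutive samples at or above LIMIT have been
# recorded for HOST; a lower sample, or a sent alert, resets the counter.
should_alert() {
    local host=$1 value=${2:-0} limit=${3:-95} needed=${4:-3}
    local file="$state_dir/$host" count=0
    mkdir -p "$state_dir"
    if (( value >= limit )); then
        if [[ -f $file ]]; then
            count=$(<"$file")
        fi
        count=$((count + 1))
        if (( count >= needed )); then
            rm -f "$file"          # alert fires, start counting again
            return 0
        fi
        echo "$count" > "$file"
    else
        rm -f "$file"              # streak broken, reset
    fi
    return 1
}
```

Inside the loop, a test such as: if should_alert "$sn" "$cpuuse"; then ... mailx ...; fi would then replace the two nested checks.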
I need help completing this. I'm trying to find user sessions that have been sitting idle for more than 15 minutes, which aren't being kicked off by sshd_config, and kill them. This is what I have to pull the sessions; how do I filter for idle times greater than 15 minutes?
#!/bin/bash
IFS=$'\n'
for output in $(w | tr -s " " | cut -d" " -f1,5 | tail -n+3 | awk '{print $2}')
do
echo "$output \> 15:00"
done
If you are using Awk anyway, a shell loop is a clumsy antipattern. Awk already knows how to loop over lines; use it.
A serious complication is that the output from w is system-dependent and typically reformatted for human legibility.
tripleee$ w | head -n 4
8:16 up 37 days, 19:02, 17 users, load averages: 3.49 3.21 3.11
USER TTY FROM LOGIN@ IDLE WHAT
tripleee console - 27Aug18 38days -
tripleee s003 - 27Aug18 38 ssh -t there screen -D -r
If yours looks similar, you can probably select every line where the IDLE field contains non-numeric information (such as 38days) or is a number greater than 15:
w -h | awk '$5 ~ /[^0-9]/ || $5 > 15'
This prints the entire w output line. You might want to extract just the TTY field ({print $2} on my system) and figure out from there which session to kill.
A more fruitful approach on Linux-like systems is probably to examine the /proc filesystem.
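For systems whose w output resembles the sample above, here is a minimal sketch of the awk filter narrowed to print only the TTY column. It assumes IDLE is field 5 and TTY is field 2, and that a lone "-" in IDLE means the session is active. long_idle_ttys is a hypothetical name, and the field numbers will differ on other systems.

```shell
#!/bin/bash
# Reads w-style lines on stdin and prints the TTY of sessions idle > 15 min.
# Assumptions: a bare number in field 5 means minutes; anything containing
# non-digits (e.g. "38days", "1:05") counts as longer than 15 minutes;
# a lone "-" means "not idle" and is skipped.
long_idle_ttys() {
    awk '$5 != "-" && ($5 ~ /[^0-9]/ || $5 + 0 > 15) { print $2 }'
}

# Typical use (not run here):  w -h | long_idle_ttys
```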
You can try something like this …
for i in $(w --no-header | awk '{print $4}')
do
if echo "$i" | grep -q days
then
echo "Greater than 15 mins"
fi
if echo "$i" | grep -q min
then
# strip the unit so only the number is left
mins=$(echo "$i" | sed -e 's/mins//g')
if [ "$mins" -gt 15 ]
then
echo "Greater than 15 mins"
fi
fi
done
The tricky part is going to be figuring out what pid to kill.
I was wondering if anyone could help me work out why this is not triggering properly:
HOSTNAME=`hostname -s`
LOAD=25.00
CAT=/bin/cat
MAILFILE=/home/jboss/monitor.mail
MAILER=/bin/mail
mailto="bob@bob.bob"
CPU_LOAD=`sar -P ALL 1 10 |grep 'Average.*all' |awk -F" " '{print 100.0 -$NF}'`
if [[ $CPU_LOAD > $LOAD ]];
then
PROC=`ps -eo pcpu,pid -o comm= | sort -k1 -n -r | head -1`
echo -e "Please check processes on ${HOSTNAME} the value of cpu load is $CPU_LOAD%.
Highest process is: $PROC" > $MAILFILE
$CAT $MAILFILE | $MAILER -s "CPU Load is on ${HOSTNAME} is $CPU_LOAD %" $mailto
fi
This seems to be working properly for the sar and ps parts; however, I'm still getting alerts emailed for things like CPU Load is 3.18%. Unless I'm missing something, it shouldn't trigger unless load is greater than 25%.
It seems to behave more as if the threshold were 2.5%. Any suggestions?
Thank you
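The problem is this test:
if [[ $CPU_LOAD > $LOAD ]];then
Inside [[ ]], > compares strings lexicographically, so "3.18" sorts after "25.00" and the test succeeds. The numeric operator is -gt:
if [[ $CPU_LOAD -gt $LOAD ]]; then
However, Bash only handles integers, so -gt raises a syntax error on a value such as 3.18. To use higher precision, you could do something like this: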
cpu_limit=25
# read the 5min load-average straight from the special file on /proc
read -r _ load_avg _ </proc/loadavg
# multiply by 100 and truncate to an integer (so cpu_limit=25 means a load of 0.25)
load_avg=$(bc <<<"scale=0; $load_avg * 100 / 1")
# compare numbers with (( )) instead
if (( load_avg > cpu_limit )); then
...
fi
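To see why the original test fires at 3.18, and one hedged way around it, the sketch below compares two decimal values with awk instead of Bash arithmetic. float_gt is a hypothetical helper name, not something from the original script.

```shell
#!/bin/bash
# Succeeds (exit status 0) when $1 is numerically greater than $2.
# awk compares as floating point; "+ 0" forces numeric context.
float_gt() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 > b + 0) }'
}

# [[ a > b ]] is a string comparison: "3.18" sorts after "25.00"
# because the character 3 comes after the character 2.
if [[ 3.18 > 25.00 ]]; then echo "string test: 3.18 > 25.00"; fi
if ! float_gt 3.18 25.00; then echo "numeric test: 3.18 <= 25.00"; fi
```

In the monitoring script the condition would then read: if float_gt "$CPU_LOAD" "$LOAD"; then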
Try this code - (Tested - working fine)
$ cat f.sh
HOSTNAME=$(hostname -s)
LOAD=25.00
MAILFILE=$HOME/a.txt
MAILER=/bin/mailx
mailto="vipinkumarr89#gmail.com"
CPU_LOAD=$(sar -P ALL 1 10 |grep 'Average.*all' |awk -F" " '{print 100.0 -$NF}')
# [[ a > b ]] compares strings; let bc do the numeric comparison (it prints 1 or 0)
if (( $(bc <<<"$CPU_LOAD > $LOAD") )); then
{
PROC=$(ps -eo pcpu,pid -o comm= | sort -k1 -n -r | head -1)
echo -e "Please check processes on ${HOSTNAME} the value of cpu load is $CPU_LOAD%.
Highest process is: $PROC" > $MAILFILE
cat $MAILFILE | $MAILER -s "CPU Load is on ${HOSTNAME} is $CPU_LOAD %" $mailto
}
fi
This is a shell script that returns 2 values: one for the packet loss percentage and another for true or false:
SERVER_IP=$1
checkip=`ping -c 2 -W 2 $SERVER_IP | grep "packet loss" | cut -d " " -f 6 | cut -d "%" -f1`
test1=$?
echo $checkip
if [ $test1 -eq 0 ]; then
echo "1"
else
echo "0"
fi
In Zabbix, when you create an item you enter only one parameter with one value, but I have 2 values: one for packet loss and a second for the ping result (0 or 1).
How can I create two items, one for the packet loss percentage and a second for the ping health check, using only this script? I don't want to create another one.
Thanks to Andre
Try this script; it should guide you to exactly what you want:
#!/bin/bash
case $1 in
packetloss) ping -c2 -W1 -q 8.8.8.8 | grep -oP '\d+(?=% packet loss)' ;;
timeout) ping -c2 -q 8.8.8.8 | grep 'time' | awk -F',' '{ print$4}' | awk '{print $2}' | cut -c 1-4 ;;
*) echo "Use: packetloss , timeout";;
esac
Try it (I'm in zsh, so the brackets are escaped):
zabbix_agentd -t ping.loss\[timeout\]
ping.loss[timeout] [t|1000]
Or on the Zabbix server use zabbix_get (I'm in zsh here too):
zabbix_get -s 172.20.4.49 -k ping.loss\[timeout\]
1001
Now create items with these keys.
UserParameter=key[*],/path_of_script.sh $1
At the GUI:
Key: key[Server_IP]
Another example:
UserParameter=general[*],/usr/local/etc/scripts/general.sh $1 $2 $3 $4 $5 $6 $7 $8 $9
$ cat general.sh
#!/bin/bash
case $1 in
ddate) ddate;;
minute) echo "`date +%M`%2" | bc;;
files) ls -l $2 | grep ^- | wc -l;;
size.dir) du -s $2 | cut -f1;;
script) /bin/bash /usr/local/etc/scripts/script.sh;;
*) echo "Use: ddate, minute, files <parameters>, size.dir <parameters> or script";;
esac
$ zabbix_get -s Server_IP -k general[minute]
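Applied to the original question, the same case-dispatch idea could expose both values from one script. This is a sketch only: loss_from_ping, alive_from_loss, and main are hypothetical names, and the parsing assumes the ping summary contains the phrase "N% packet loss", as common Linux and BSD ping implementations print it.

```shell
#!/bin/bash
# Print the packet-loss percentage found in ping output read from stdin.
loss_from_ping() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Print 1 if at least one reply came back (loss below 100), else 0.
# An empty argument (no summary line found) counts as unreachable.
alive_from_loss() {
    awk -v loss="$1" 'BEGIN { if (loss != "" && loss + 0 < 100) print 1; else print 0 }'
}

# Dispatcher: $1 selects the value to report, $2 is the server IP.
main() {
    local loss
    loss=$(ping -c 2 -W 2 "$2" 2>/dev/null | loss_from_ping)
    case $1 in
        packetloss) echo "${loss:-100}" ;;
        alive)      alive_from_loss "$loss" ;;
        *)          echo "Use: packetloss <ip> | alive <ip>" ;;
    esac
}
# When saved as a script, end it with:  main "$@"
```

With a line such as UserParameter=ping.check[*],/path_of_script.sh $1 $2, two items with the keys ping.check[packetloss,Server_IP] and ping.check[alive,Server_IP] would share the one script. The key name ping.check is made up for this example.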
Create a shell script that relates the speed of transfer of a HOP between the machine and the IP chosen. Use the PING command and express the result in kB/sec.
#!/bin/bash
# I create a temporary file
touch info.txt;
# I take the second line of the PING command's output, stopped after two seconds
# I put the result in the file.
ping -t 2 $1 | head -2 | tail -1 > info.txt;
# I take bytes
cut -c -2 info.txt;
# I take ms
cut -c 53-59 info.txt;
# Now, how to make transformations in KB and in Sec?
# Show result
echo "Result: .....";
# I delete the file.
rm info.txt;
You can do:
bytes=$(cut -c -2 info.txt)
ms=$(cut -c 53-59 info.txt)
echo "KiB: "$(($bytes/1024))
echo "Sec: "$(($ms/1000))
speed=$((1000*$bytes*1000/1024/$ms))
speed=$(echo $speed|sed -r 's/^(.*)(.{3})$/\1.\2/')
echo "Speed: $speed KiB/s"
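This of course assumes 1 KiB = 1024 bytes; KB is often used loosely to mean KiB. Note that Bash arithmetic is integer-only, so the KiB and Sec lines truncate toward zero and $ms must be a whole number.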
RESULT=$(ping -t 2 -c 2 $1 | grep 'time=' | head -1 | sed 's/\([0-9][0-9]\).*\(time=\)\(.*\)\(ms\)/\1:\3/g')
echo "BYTES = ${RESULT%:*}"
echo "SPEED = ${RESULT#*:}"
For the conversion part you might need to use python or perl. In bash it is not possible to calculate fractional numbers.
Thank you all! WORK!
touch info.txt;
ping -t 2 $1 | head -2 | tail -1 > info.txt;
bytes=$(cut -c -2 info.txt);
ms=$(cut -c 53-59 info.txt);
KB=$(echo "scale=5; $bytes /1024" | bc);
Sec=$(echo "scale=5; $ms /1000" | bc);
Speed=$(echo "scale=5; $KB/$Sec" | bc);
echo "Speed of HOP: $Speed KB/sec.";
rm info.txt;
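A hedged variant of the final script that avoids the fixed column positions, which shift with the length of the IP address and the TTL. It assumes reply lines of the form "64 bytes from ...: ... time=12.5 ms". parse_ping_line and speed_kb_per_sec are hypothetical names, and awk is used for the fractional arithmetic in place of bc.

```shell
#!/bin/bash
# Extract "bytes ms" from one ping reply line read on stdin.
parse_ping_line() {
    sed -n 's/^\([0-9][0-9]*\) bytes.*time=\([0-9.]*\).*/\1 \2/p'
}

# KB/sec = (bytes / 1024) / (ms / 1000), printed with 3 decimals.
speed_kb_per_sec() {
    awk -v b="$1" -v ms="$2" 'BEGIN { printf "%.3f\n", (b / 1024) / (ms / 1000) }'
}

# A sample reply line stands in for real ping output here.
line='64 bytes from 8.8.8.8: icmp_seq=0 ttl=117 time=12.5 ms'
read -r bytes ms <<< "$(echo "$line" | parse_ping_line)"
echo "Speed of HOP: $(speed_kb_per_sec "$bytes" "$ms") KB/sec."
```

In real use the sample line would be replaced by the output of ping -t 2 $1 | head -2 | tail -1, and no temporary file is needed.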
First off, I'm new to this. I have some experience with windows scripting and apple script but not much with bash. What I'm trying to do is grab the PID and %CPU of a specific process. then compare the %CPU against a set number, and if it's higher, kill the process. I feel like I'm close, but now I'm getting the following error:
[[: 0.0: syntax error: invalid arithmetic operator (error token is ".0")
what am I doing wrong? here's my code so far:
#!/bin/bash
declare -i app_pid
declare -i app_cpu
declare -i cpu_limit
app_name="top"
cpu_limit="50"
app_pid=`ps aux | grep $app_name | grep -v grep | awk {'print $2'}`
app_cpu=`ps aux | grep $app_name | grep -v grep | awk {'print $3'}`
if [[ ! $app_cpu -gt $cpu_limit ]]; then
echo "crap"
else
echo "we're good"
fi
Obviously I'm going to replace the echos in the if/then statement, but it's acting as if the statement is true regardless of what the cpu load actually is (I tested this by changing the -gt to -lt and it still echoed "crap").
Thank you for all the help. Oh, and this is on a OS X 10.7 if that is important.
I recommend taking a look at the facilities of ps to avoid multiple horrible things you do.
On my system (ps from procps on linux, GNU awk) I would do this:
ps -C "$app_name" -o pid=,pcpu= |
awk --assign maxcpu="$cpu_limit" '$2>maxcpu {print "crappy pid",$1}'
The problem is that bash can't handle decimals. You can just multiply them by 100 and work with plain integers instead:
#!/bin/bash
declare -i app_pid
declare -i app_cpu
declare -i cpu_limit
app_name="top"
cpu_limit="5000"
app_pid=`ps aux | grep $app_name | grep -v grep | awk {'print $2'}`
app_cpu=`ps aux | grep $app_name | grep -v grep | awk {'print $3*100'}`
if [[ $app_cpu -gt $cpu_limit ]]; then
echo "crap"
else
echo "we're good"
fi
Keep in mind that CPU percentage is a suboptimal measurement of application health. If you have two processes running infinite loops on a single core system, no other application of the same priority will ever go over 33%, even if they're thrashing around.
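The multiply-by-100 idea can also be wrapped in a small helper so the limit stays readable as a percentage. A sketch, with to_hundredths and over_limit as hypothetical names: awk does the scaling and Bash does the integer comparison.

```shell
#!/bin/bash
# Convert a decimal such as 12.3 to an integer number of hundredths (1230).
# Adding 0.5 before %d rounds to the nearest hundredth for non-negative input.
to_hundredths() {
    awk -v v="$1" 'BEGIN { printf "%d\n", v * 100 + 0.5 }'
}

# Succeeds when the first value exceeds the second.
over_limit() {
    (( $(to_hundredths "$1") > $(to_hundredths "$2") ))
}

# Same messages as the script above, for a process at 0.0% and a 50% limit.
if over_limit 0.0 50; then echo "crap"; else echo "we're good"; fi
```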
#!/bin/sh
PROCESS="java"
PID=`pgrep $PROCESS | tail -n 1`
CPU=`top -b -p $PID -n 1 | tail -n 1 | awk '{print $9}'`
echo $CPU
I came up with this, using top and bc.
Use it by passing in, for example: ./script.sh apache2 50 # max 50%
If there are many PIDs matching your program argument, only one will be calculated, based on how top lists them. I could have extended the script by catching them all and averaging the percentage or something, but this will have to do.
You can also pass in a number, ./script.sh 12345 50, which will force it to use an exact PID.
#!/bin/bash
# 1: ['command\ name' or PID number(,s)] 2: MAX_CPU_PERCENT
[[ $# -ne 2 ]] && exit 1
PID_NAMES=$1
# get all PIDS as nn,nn,nn
if [[ ! "$PID_NAMES" =~ ^[0-9,]+$ ]] ; then
PIDS=$(pgrep -d ',' -x $PID_NAMES)
else
PIDS=$PID_NAMES
fi
# echo "$PIDS $MAX_CPU"
MAX_CPU="$2"
MAX_CPU="$(echo "($MAX_CPU+0.5)/1" | bc)"
LOOP=1
while [[ $LOOP -eq 1 ]] ; do
sleep 0.3s
# Depending on your 'top' version and OS you might have
# to change head and tail line-numbers
LINE="$(top -b -d 0 -n 1 -p $PIDS | head -n 8 \
| tail -n 1 | sed -r 's/[ ]+/,/g' | \
sed -r 's/^\,|\,$//')"
# If multiple processes in $PIDS, $LINE will only match\
# the most active process
CURR_PID=$(echo "$LINE" | cut -d ',' -f 1)
# calculate cpu limits
CURR_CPU_FLOAT=$(echo "$LINE"| cut -d ',' -f 9)
CURR_CPU=$(echo "($CURR_CPU_FLOAT+0.5)/1" | bc)
echo "PID $CURR_PID: $CURR_CPU""%"
if [[ $CURR_CPU -ge $MAX_CPU ]] ; then
echo "PID $CURR_PID ($PID_NAMES) went over $MAX_CPU""%"
echo "[[ $CURR_CPU""% -ge $MAX_CPU""% ]]"
LOOP=0
break
fi
done
echo "Stopped"
Erik, I used a modified version of your code to create a new script that does something similar. Hope you don't mind it.
A bash script to get the CPU usage by process
usage:
nohup ./check_proc bwengine 70 &
bwengine is the process name we want to monitor; 70 means log only when the process is using over 70% of the CPU.
Check the logs at: /var/log/check_procs.log
The output should be like:
DATE | TOTAL CPU | CPU USAGE | Process details
Example:
03/12/14 17:11 |20.99|98| ProdPROXY-ProdProxyPA.tra
03/12/14 17:11 |20.99|100| ProdPROXY-ProdProxyPA.tra
Link to the full blog:
http://felipeferreira.net/?p=1453
It is also useful to have app_user information available to test whether the current user has the rights to kill/modify the running process. This information can be obtained along with the needed app_pid and app_cpu by using read eliminating the need for awk or any other 3rd party parser:
read app_user app_pid tmp_cpu stuff <<< \
$( ps aux | grep "$app_name" | grep -v "grep\|defunct\|${0##*/}" )
You can then get your app_cpu * 100 with:
app_cpu=$((${tmp_cpu%.*} * 100))
Note: Including defunct and ${0##*/} (the script's own name) in grep -v guards against defunct processes and the script itself also matching $app_name.
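One caveat with ${tmp_cpu%.*} is that it drops the fraction, so 0.8 becomes 0. A hedged alternative is to delete the decimal point and work in tenths of a percent, assuming ps prints %CPU with exactly one decimal place. parse_ps_line is a hypothetical name, and the sample line stands in for real ps aux output.

```shell
#!/bin/bash
# Split one ps aux line into user, pid and CPU in tenths of a percent.
parse_ps_line() {
    local user pid cpu _rest
    read -r user pid cpu _rest <<< "$1"
    # ${cpu/./} turns 12.7 into 127; the 10# prefix stops a value such as
    # 0.8 (which becomes "08") from being rejected as invalid octal.
    echo "$user $pid $(( 10#${cpu/./} ))"
}

parse_ps_line "jboss 4242 12.7 1.2 123456 7890 ?? S 10:00AM 0:01.23 java"
```

A limit of 50% would then be written as 500 in the comparison.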
I use top to check some details. It provides a few more details like CPU time.
On Linux this would be:
top -b -n 1 | grep $app_name
On Mac, with its BSD version of top:
top -l 1 | grep $app_name
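To use one script on both systems, the flags can be chosen from uname -s. A sketch: top_once_args is a hypothetical name, and only the two flag sets quoted above are covered.

```shell
#!/bin/bash
# Print the top flags for a single non-interactive snapshot on the given OS
# (defaults to the current one, as reported by uname -s).
top_once_args() {
    case ${1:-$(uname -s)} in
        Darwin) echo "-l 1" ;;      # BSD top on Mac
        *)      echo "-b -n 1" ;;   # procps top, as on Linux
    esac
}

# Typical use (not run here):
#   top $(top_once_args) | grep "$app_name"
```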