CPU monitoring script not triggering properly - bash

I was wondering if anyone could help with the reasons that this is not triggering properly
HOSTNAME=`hostname -s`
LOAD=25.00
CAT=/bin/cat
MAILFILE=/home/jboss/monitor.mail
MAILER=/bin/mail
mailto="bob#bob.bob"
CPU_LOAD=`sar -P ALL 1 10 |grep 'Average.*all' |awk -F" " '{print 100.0 -$NF}'`
if [[ $CPU_LOAD > $LOAD ]];
then
PROC=`ps -eo pcpu,pid -o comm= | sort -k1 -n -r | head -1`
echo -e "Please check processes on ${HOSTNAME} the value of cpu load is $CPU_LOAD%.
Highest process is: $PROC" > $MAILFILE
$CAT $MAILFILE | $MAILER -s "CPU Load is on ${HOSTNAME} is $CPU_LOAD %" $mailto
fi
This seems to be working properly for the sar and ps however I'm still getting alerts emailed for things like CPU Load is 3.18%. Unless I'm missing something it shouldn't trigger unless load is greater than 25%.
It seems though that it's more doing if load is greater than 2.5% Any suggestions?
Thank you

Instead of using:
if [[ $CPU_LOAD > $LOAD ]];then
you must use
if [[ $CPU_LOAD -gt $LOAD ]]; then

Bash only handles integers, so to use higher precision, you could do something like this:
cpu_limit=25
# read the 5min load-average straight from the special file on /proc
read -r _ load_avg _ </proc/loadavg
# multiply by 100 for precision
load_avg=$(bc <<<"scale=0; $load_avg * 100 / 1")
# compare numbers with (( )) instead
if (( load_avg > cpu_limit )); then
...
fi

Try this code - (Tested - working fine)
$ cat f.sh
HOSTNAME=$(hostname -s)
LOAD=25.00
MAILFILE=$HOME/a.txt
MAILER=/bin/mailx
mailto="vipinkumarr89#gmail.com"
CPU_LOAD=$(sar -P ALL 1 10 |grep 'Average.*all' |awk -F" " '{print 100.0 -$NF}')
if [[ $CPU_LOAD > $LOAD ]];then
{
PROC=$(ps -eo pcpu,pid -o comm= | sort -k1 -n -r | head -1)
echo -e "Please check processes on ${HOSTNAME} the value of cpu load is $CPU_LOAD%.
Highest process is: $PROC" > $MAILFILE
cat $MAILFILE | $MAILER -s "CPU Load is on ${HOSTNAME} is $CPU_LOAD %" $mailto
}
fi

Related

bash send mail when threshold is exceeded in three successive runs

I have a bash script that does a pretty decent job on reporting CPU level above 95%. The issue I am running into is it will report on even "spikes". This script runs every 10 minutes and checks all of my servers. Is there a way to only report if the server reports a level above 95% for 3 iterations? say after the 3rd time it runs, i.e 30 min.
12:00 - 1st report - 98%
12:10 - 2nd report - 99%
12:20 - 3rd report - 98% (now alert the admin)
here is the section of the script:
for sn in $(cat /tmp/hosts |grep -v "#"); do
cpuuse=$(ssh -qn -o ConnectTimeout=15 -oStrictHostKeyChecking=no -o BatchMode=yes $sn "top -b -n2 -p 1 | fgrep \"Cpu(s)\" | tail -1 | awk -F'id,' -v prefix=\"\$prefix\" '{ split(\$1, vs, \",\"); v=vs[length(vs)]; sub(\"%\", \"\", v); printf \"%s%.1f%%\n\", prefix, 100 - v }' | rev | cut -c 4- | rev")
if [[ "$cpuuse" -ge 95 ]]; then
echo "CPU Alert!! $sn CPU is high - $cpuuse%" | mailx -s "CPU Alert on $sn" admin#sample.com
fi
done
AFAIK There isn't really a bash trick. You just need to store a counter somewhere. Something like this could do the trick:
for sn in $(cat /tmp/hosts |grep -v "#"); do
cpuuse=$(ssh -qn -o ConnectTimeout=15 -oStrictHostKeyChecking=no -o BatchMode=yes $sn "top -b -n2 -p 1 | fgrep \"Cpu(s)\" | tail -1 | awk -F'id,' -v prefix=\"\$prefix\" '{ split(\$1, vs, \",\"); v=vs[length(vs)]; sub(\"%\", \"\", v); printf \"%s%.1f%%\n\", prefix, 100 - v }' | rev | cut -c 4- | rev")
counter_file=/tmp/my-counter-file-$sn # separate counter file for each server
if [[ "$cpuuse" -ge 95 ]]; then
date >> $counter_file # just add a line to the counter file
if [[ $(wc -l $counter_file) -ge 3 ]]; then
echo "CPU Alert!! $sn CPU is high - $cpuuse%" | mailx -s "CPU Alert on $sn" admin#sample.com
rm $counter_file # message was sent, reset counter
fi
else
rm $counter_file # below limit, reset counter
fi
done
The trick here is to store a counter in a file. The number of lines in the file is your counter value.

how to speed up checking if file exists in bash

I'm new at Bashing and wrote a code to check my photos files but find it very slow and gets a few empty returns checking 17000+ photos. Is there any way to use all 4 cpus running this script and so speed it up
Please help
#!/bin/bash
readarray -t array < ~/Scripts/ourphotos.txt
totalfiles="${#array[#]}"
echo $totalfiles
i=0
ii=0
check1=""
while :
do
check=${array[$i]}
if [[ ! -r $( echo $check ) ]] ; then
if [ $check = $check1 ]; then
echo "empty "$check
else
unset array[$i]
ii=$((ii + 1 ))
fi
fi
if [ $totalfiles = $i ]; then
break
fi
i=$(( i + 1 ))
done
if [ $ii -gt "1" ]; then
notify-send -u critical $ii" files have been deleted or are unreadable"
fi
It's a filesystem operation so multiple cores will hardly help.
Simplification might:
while read file; do
i=$((i+1)); [ -e "$file" ] || ii=$(ii+1));
done < "$HOME/Scripts/ourphotos.txt"
#...
Two points:
you don't need to keep the whole file in memory (no arrays needed)
$( echo $check ) forks a proces. You generally want to avoid forking and execing in loops.
This is an old question, but a common problem lacking an evidence-based solution.
awk '{print "[ -e "$1" ] && echo "$2}' | parallel # 400 files/s
awk '{print "[ -e "$1" ] && echo "$2}' | bash # 6000 files/s
while read file; do [ -e $file ] && echo $file; done # 12000 files/s
xargs find # 200000 files/s
parallel --xargs find # 250000 files/s
xargs -P2 find # 400000 files/s
xargs -P96 find # 800000 files/s
I tried this on a few different systems and the results were not consistent, but xargs -P (parallel execution) was consistently the fastest. I was surprised that xargs -P was faster than GNU parallel (not reported above, but sometimes much faster), and I was surprised that parallel execution helped so much — I thought that file I/O would be the limiting factor and parallel execution wouldn't matter much.
Also noteworthy is that xargs find is about 20x faster than the accepted solution, and much more concise. For example, here is a rewrite of OP's script:
#!/bin/bash
total=$(wc -l ~/Scripts/ourphotos.txt | awk '{print $1}')
# tr '\n' '\0' | xargs -0 handles spaces and other funny characters in filenames
found=$(cat ~//Scripts/ourphotos.txt | tr '\n' '\0' | xargs -0 -P4 find | wc -l)
if [ $total -ne $found ]; then
ii=$(expr $total - $found)
notify-send -u critical $ii" files have been deleted or are unreadable"
fi

replacing some commands in an existing BASH script

I have the following script which sends the results of an iwlist scan via OSC:
#!/bin/bash
NUM_BANKS=20
while [[ "$input" != "\e" ]] ; do
networks=$(iwlist wlan0 scanning | awk 'BEGIN{ FS="[:=]"; OFS = " " }
/ESSID/{
#gsub(/ /,"\\ ",$2)
#gsub(/\"/,"",$2)
essid[c++]=$2
}
/Address/{
gsub(/.*Address: /,"")
address[a++]=$0
}
/Encryption key/{ encryption[d++]=$2 }
/Quality/{
gsub(/ dBm /,"")
signal[b++]=$3
}
END {
for( c in essid ) { print "/wlan_scan ",essid[c],signal[c],encryption[c] }
}'
)
read -t 0.1 input
echo "$networks" | while read network; do
set $network
hash=` echo "$2" | md5sum | awk '{ print $1 }'| tr '[:lower:]' '[:upper:]'`
bank=`echo "ibase=16;obase=A; $hash%$NUM_BANKS " | bc`
echo "$1$bank $2 $3 $4"
echo "$1$bank $2 $3 $4" | sendOSC -h localhost 9997
done
#echo "$networks" | sendOSC -h localhost 9997
done
An example of the output from this is '/wlan_scan13 BTHomehub757 -85 On', which is then sent via the sendOSC program.
I basically need to replace the iwlist scan data with the results of this tshark scan:
sudo tshark -I -i en1 -T fields -e wlan.sa_resolved -e wlan_mgt.ssid -e radiotap.dbm_antsignal type mgt subtype probe
which similarly outputs two strings and an int, outputting a result like:
'Hewlett-_91:fa:xx EE-BrightBox-mjmxxx -78'.
So eventually I want the script to give me an output in this instance of
'/wlan13 Hewlett-_91:fa:xx EE-BrightBox-mjmxxx -78'.
Both scans constantly generate results in this format at about the same rate, updating as new wifi routers are detected, and these are sent out as soon as they arrive over the sendOSC program.
This is probably a pretty simple edit for an experienced coder, but I've been trying to work this out for days and I figured I should ask for help!
If someone could clarify what needs to stay and what needs to go here I'd really appreciate it.
Many thanks.
Do you really want to replace commands? The sane approach would seem to be to add an option to the script to specify which piece of code to run, and include them both.
# TODO: replace with proper option parsing
case $1 in
--tshark) command=tshark_networks; shift;;
*) command=iwlist_networks;;
esac
tshark_networks () {
sudo tshark -I -i en1 -T fields \
-e wlan.sa_resolved \
-e wlan_mgt.ssid \
-e radiotap.dbm_antsignal type mgt subtype probe
}
iwlist_networks () {
iwlist wlan0 scanning | awk .... long Awk script here ....
}
while [[ "$input" != "\e" ]] ; do
networks=$($command)
read -t 0.1 input
echo "$networks" | while read network; do
: the rest as before, except fix your indentation
This also has the nice side effect that the hideous iwlist command is encapsulated in its own function, outside of the main loop.
... Well, in fact, I might refactor the main loop to
while true; do
$command |
while read a b c d; do
hash=$(echo "$b" | md5sum | awk '{ print toupper($1) }')
bank=$(echo "ibase=16;obase=A; $hash%$NUM_BANKS " | bc)
echo "$a$bank $b $c $d"
echo "$a$bank $b $c $d" | sendOSC -h localhost 9997
done
read -t 0.1 input
case $input in '\e') break;; esac
done

Bash script checking cpu usage of specific process

First off, I'm new to this. I have some experience with windows scripting and apple script but not much with bash. What I'm trying to do is grab the PID and %CPU of a specific process. then compare the %CPU against a set number, and if it's higher, kill the process. I feel like I'm close, but now I'm getting the following error:
[[: 0.0: syntax error: invalid arithmetic operator (error token is ".0")
what am I doing wrong? here's my code so far:
#!/bin/bash
declare -i app_pid
declare -i app_cpu
declare -i cpu_limit
app_name="top"
cpu_limit="50"
app_pid=`ps aux | grep $app_name | grep -v grep | awk {'print $2'}`
app_cpu=`ps aux | grep $app_name | grep -v grep | awk {'print $3'}`
if [[ ! $app_cpu -gt $cpu_limit ]]; then
echo "crap"
else
echo "we're good"
fi
Obviously I'm going to replace the echos in the if/then statement but it's acting as if the statement is true regardless of what the cpu load actually is (I tested this by changing the -gt to -lt and it still echoed "crap"
Thank you for all the help. Oh, and this is on a OS X 10.7 if that is important.
I recommend taking a look at the facilities of ps to avoid multiple horrible things you do.
On my system (ps from procps on linux, GNU awk) I would do this:
ps -C "$app-name" -o pid=,pcpu= |
awk --assign maxcpu="$cpu_limit" '$2>maxcpu {print "crappy pid",$1}'
The problem is that bash can't handle decimals. You can just multiply them by 100 and work with plain integers instead:
#!/bin/bash
declare -i app_pid
declare -i app_cpu
declare -i cpu_limit
app_name="top"
cpu_limit="5000"
app_pid=`ps aux | grep $app_name | grep -v grep | awk {'print $2'}`
app_cpu=`ps aux | grep $app_name | grep -v grep | awk {'print $3*100'}`
if [[ $app_cpu -gt $cpu_limit ]]; then
echo "crap"
else
echo "we're good"
fi
Keep in mind that CPU percentage is a suboptimal measurement of application health. If you have two processes running infinite loops on a single core system, no other application of the same priority will ever go over 33%, even if they're trashing around.
#!/bin/sh
PROCESS="java"
PID=`pgrep $PROCESS | tail -n 1`
CPU=`top -b -p $PID -n 1 | tail -n 1 | awk '{print $9}'`
echo $CPU
I came up with this, using top and bc.
Use it by passing in ex: ./script apache2 50 # max 50%
If there are many PIDs matching your program argument, only one will be calculated, based on how top lists them. I could have extended the script by catching them all and avergaing the percentage or something, but this will have to do.
You can also pass in a number, ./script.sh 12345 50, which will force it to use an exact PID.
#!/bin/bash
# 1: ['command\ name' or PID number(,s)] 2: MAX_CPU_PERCENT
[[ $# -ne 2 ]] && exit 1
PID_NAMES=$1
# get all PIDS as nn,nn,nn
if [[ ! "$PID_NAMES" =~ ^[0-9,]+$ ]] ; then
PIDS=$(pgrep -d ',' -x $PID_NAMES)
else
PIDS=$PID_NAMES
fi
# echo "$PIDS $MAX_CPU"
MAX_CPU="$2"
MAX_CPU="$(echo "($MAX_CPU+0.5)/1" | bc)"
LOOP=1
while [[ $LOOP -eq 1 ]] ; do
sleep 0.3s
# Depending on your 'top' version and OS you might have
# to change head and tail line-numbers
LINE="$(top -b -d 0 -n 1 -p $PIDS | head -n 8 \
| tail -n 1 | sed -r 's/[ ]+/,/g' | \
sed -r 's/^\,|\,$//')"
# If multiple processes in $PIDS, $LINE will only match\
# the most active process
CURR_PID=$(echo "$LINE" | cut -d ',' -f 1)
# calculate cpu limits
CURR_CPU_FLOAT=$(echo "$LINE"| cut -d ',' -f 9)
CURR_CPU=$(echo "($CURR_CPU_FLOAT+0.5)/1" | bc)
echo "PID $CURR_PID: $CURR_CPU""%"
if [[ $CURR_CPU -ge $MAX_CPU ]] ; then
echo "PID $CURR_PID ($PID_NAMES) went over $MAX_CPU""%"
echo "[[ $CURR_CPU""% -ge $MAX_CPU""% ]]"
LOOP=0
break
fi
done
echo "Stopped"
Erik, I used a modified version of your code to create a new script that does something similar. Hope you don't mind it.
A bash script to get the CPU usage by process
usage:
nohup ./check_proc bwengine 70 &
bwegnine is the process name we want to monitor 70 is to log only when the process is using over 70% of the CPU.
Check the logs at: /var/log/check_procs.log
The output should be like:
DATE | TOTAL CPU | CPU USAGE | Process details
Example:
03/12/14 17:11 |20.99|98| ProdPROXY-ProdProxyPA.tra
03/12/14 17:11 |20.99|100| ProdPROXY-ProdProxyPA.tra
Link to the full blog:
http://felipeferreira.net/?p=1453
It is also useful to have app_user information available to test whether the current user has the rights to kill/modify the running process. This information can be obtained along with the needed app_pid and app_cpu by using read eliminating the need for awk or any other 3rd party parser:
read app_user app_pid tmp_cpu stuff <<< \
$( ps aux | grep "$app_name" | grep -v "grep\|defunct\|${0##*/}" )
You can then get your app_cpu * 100 with:
app_cpu=$((${tmp_cpu%.*} * 100))
Note: Including defunct and ${0##*/} in grep -v prevents against multiple processes matching $app_name.
I use top to check some details. It provides a few more details like CPU time.
On Linux this would be:
top -b -n 1 | grep $app_name
On Mac, with its BSD version of top:
top -l 1 | grep $app_name

Bash - Memory usage

I have a problem that I can't solve, so I've come to you.
I need to write a program that will read all processes and a program must sort them by users and for each user it must display how much of a memory is used.
For example:
user1: 120MB
user2: 300MB
user3: 50MB
total: 470MB
I was thinking to do this with ps aux command and then get out pid and user with awk command. Then with pmap I just need to get total memory usage of a process.
it's just a little update, users are automatically selected
#!/bin/bash
function mem_per_user {
# take username as only parameter
local user=$1
# get all pid's of a specific user
# you may elaborate the if statement in awk obey your own rules
pids=`ps aux | awk -v username=$user '{if ($1 == username) {print $2}}'`
local totalmem=0
for pid in $pids
do
mem=`pmap $pid | tail -1 | \
awk '{pos = match($2, /([0-9]*)K/, mem); if (pos > 0) print mem[1]}'`
# when variable properly set
if [ ! -z $mem ]
then
totalmem=$(( totalmem + $mem))
fi
done
echo $totalmem
}
total_mem=0
for username in `ps aux | awk '{ print $1 }' | tail -n +2 | sort | uniq`
do
per_user_memory=0
per_user_memory=$(mem_per_user $username)
if [ "$per_user_memory" -gt 0 ]
then
total_mem=$(( $total_mem + $per_user_memory))
echo "$username: $per_user_memory KB"
fi
done
echo "Total: $total_mem KB"
Try this script, which may solve your problem:
#!/bin/bash
function mem_per_user {
# take username as only parameter
local user=$1
# get all pid's of a specific user
# you may elaborate the if statement in awk obey your own rules
pids=`ps aux | awk -v username=$user '{if ($1 == username) {print $2}}'`
local totalmem=0
for pid in $pids
do
mem=`pmap $pid | tail -1 | \
awk '{pos = match($2, /([0-9]*)K/, mem); if (pos > 0) print mem[1]}'`
# when variable properly set
if [ ! -z $mem ]
then
totalmem=$(( totalmem + $mem))
fi
done
echo $totalmem
}
total_mem=0
for i in `seq 1 $#`
do
per_user_memory=0
eval username=\$$i
per_user_memory=$(mem_per_user $username)
total_mem=$(( $total_mem + $per_user_memory))
echo "$username: $per_user_memory KB"
done
echo "Total: $total_mem KB"
Best regards!
You can access the shell commands in python using the subprocess module. It allows you to spawn subprocesses and connect to the out/in/error. You can execute the ps -aux command and parse the output in python.
check out the docs here
Here is my version. I think that Tim's version is not working correctly, the values in KB are too large. I think the RSS column from pmap -x command should be used to give more accurate value. But do note that you can't always get correct values because processes can share memmory. Read this A way to determine a process's "real" memory usage, i.e. private dirty RSS?
#!/bin/bash
if [ "$(id -u)" != "0" ]; then
echo "WARNING: you have to run as root if you want to see all users"
fi
echo "Printing only users that current memmory usage > 0 Kilobytes "
all=0
for username in `ps aux | awk '{ print $1 }' | tail -n +2 | sort | uniq`
do
pids=`ps aux | grep $username | awk -F" " '{print $2}'`
total_memory=0
for pid in $pids
do
process_mem=`pmap -x $pid | tail -1 | awk -F" " '{print $4}'`
if [ ! -z $process_mem ]
then #don't try to add if string has no length
total_memory=$((total_memory+$process_mem))
fi
done
#print only those that use any memmory
if [ $total_memory -gt 0 ]
then
total_memory=$((total_memory/(1024)))
echo "$username : $total_memory MB"
all=$((all+$total_memory))
fi
done
echo "----------------------------------------"
echo "Total: $all MB"
echo "WARNING: Use at your own risk"

Resources