Determining if process is running using pgrep - bash

I have a script that I only want to be running one time. If the script gets called a second time I'm having it check to see if a lockfile exists. If the lockfile exists then I want to see if the process is actually running.
I've been messing around with pgrep but am not getting the expected results:
#!/bin/bash
COUNT=$(pgrep $(basename $0) | wc -l)
PSTREE=$(pgrep $(basename $0) ; pstree -p $$)
echo "###"
echo $COUNT
echo $PSTREE
echo "###"
echo "$(basename $0) :" `pgrep -d, $(basename $0)`
echo sleeping.....
sleep 10
The results I'm getting are:
$ ./test.sh
###
2
2581 2587 test.sh(2581)---test.sh(2587)---pstree(2591)
###
test.sh : 2581
sleeping.....
I don't understand why I'm getting a "2" when only one process is actually running.
Any ideas? I'm sure it's the way I'm calling it. I've tried a number of different combinations and can't quite seem to figure it out.
SOLUTION:
What I ended up doing was this (portion of my script):
function check_lockfile {
# Check for previous lockfiles
if [ -e $LOCKFILE ]
then
echo "Lockfile $LOCKFILE already exists. Checking to see if process is actually running...." >> $LOGFILE 2>&1
# is it running?
if [ $(ps -elf | grep $(cat $LOCKFILE) | grep $(basename $0) | wc -l) -gt 0 ]
then
abort "ERROR! - Process is already running at PID: $(cat $LOCKFILE). Exiting..."
else
echo "Process is not running. Removing $LOCKFILE" >> $LOGFILE 2>&1
rm -f $LOCKFILE
fi
else
echo "Lockfile $LOCKFILE does not exist." >> $LOGFILE 2>&1
fi
}
function create_lockfile {
# Check for previous lockfile
check_lockfile
#Create lockfile with the contents of the PID
echo "Creating lockfile with PID:" $$ >> $LOGFILE 2>&1
echo -n $$ > $LOCKFILE
echo "" >> $LOGFILE 2>&1
}
# Acquire lock file
create_lockfile >> $LOGFILE 2>&1 \
|| echo "ERROR! - Failed to acquire lock!"

The argument for pgrep is an extended regular expression pattern.
In your case the command pgrep $(basename $0) will evaluate to pgrep test.sh, which will match any process whose name contains test followed by any single character and then sh. So it will match btest8sh, atest_shell, etc.
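To see the regular-expression behaviour without touching real processes, the following sketch applies the same kind of unanchored ERE match with bash's `[[ =~ ]]`. The helpers `matches` and `escape_ere` are hypothetical names introduced here for illustration. With procps `pgrep`, the `-x` option requires the whole process name to match, but the dot would still need escaping to be taken literally.

```bash
#!/usr/bin/env bash
# Illustrative helpers (not part of pgrep): emulate an unanchored ERE match.
matches() {            # matches <pattern> <name>
    [[ $2 =~ $1 ]]
}

# Backslash-escape ERE metacharacters so a name is matched literally.
escape_ere() {         # escape_ere <literal-name>
    printf '%s\n' "$1" | sed 's/[][\.^$*+?(){}|]/\\&/g'
}

for name in test.sh btest8sh atest_shell; do
    if matches 'test.sh' "$name"; then
        echo "unescaped pattern matches $name"
    fi
    if matches "^$(escape_ere 'test.sh')\$" "$name"; then
        echo "escaped and anchored pattern matches $name"
    fi
done
```

Only the exact name survives the escaped and anchored form, which is roughly what `pgrep -x` with an escaped pattern would give.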
You should create a lock file. If the lock file exists, the program should exit.
lock=$(basename $0).lock
if [ -e $lock ]
then
echo Process is already running with PID=`cat $lock`
exit
else
echo $$ > $lock
fi
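The test-then-write sequence above leaves a short window in which two instances can both find no lock file. One common way to close it, sketched here as an assumption rather than as this answer's own method, is to let the shell create the file atomically with `noclobber`. The helper name `acquire_lock` and the example path are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: create the lock file atomically. With noclobber set, ">" fails
# if the file already exists, so checking and creating happen in one step.
acquire_lock() {       # acquire_lock <lockfile>; hypothetical helper
    ( set -o noclobber; echo "$$" > "$1" ) 2>/dev/null
}

lock="${TMPDIR:-/tmp}/lockdemo.$$.lock"    # example path, adjust as needed
if acquire_lock "$lock"; then
    echo "lock acquired by PID $(cat "$lock")"
else
    echo "already locked by PID $(cat "$lock")"
fi
rm -f "$lock"
```

The subshell keeps `noclobber` from leaking into the rest of the script; `$$` still expands to the PID of the main shell there.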

You are already opening a lock file. Use it to make your life easier.
Write the process id to the lock file. When you see the lock file exists, read it to see what process id it is supposedly locking, and check to see if that process is still running.
Then in version 2, you can also write program name, program arguments, program start time, etc. to guard against the case where a new process starts with the same process id.
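The "read the PID and check that it is still running" step might look like the sketch below. It uses `kill -0`, which sends no signal and only reports whether the process can be signalled. The helper name `lock_is_live` and the example path are assumptions made for illustration, and the check does not guard against PID reuse.

```bash
#!/usr/bin/env bash
# Sketch: decide whether the PID recorded in a lock file is still alive.
# kill -0 sends no signal; it only checks that the process exists and that
# we are allowed to signal it (so another user's process reports as absent).
lock_is_live() {       # lock_is_live <lockfile>; hypothetical helper
    local pid
    [ -f "$1" ] || return 1
    pid=$(<"$1")
    [[ $pid =~ ^[0-9]+$ ]] || return 1     # ignore empty or garbled files
    kill -0 "$pid" 2>/dev/null
}

lock="${TMPDIR:-/tmp}/livedemo.$$.lock"    # example path
echo "$$" > "$lock"
if lock_is_live "$lock"; then
    echo "PID $(<"$lock") is running"
fi
rm -f "$lock"
```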

Put this near the top of your script...
pid=$$
script=$(basename $0)
guard="/tmp/$script-$(id -nu).pid"
if test -f $guard ; then
echo >&2 "ERROR: Script already runs... own PID=$pid"
ps auxw | grep $script | grep -v grep >&2
exit 1
fi
trap "rm -f $guard" EXIT
echo $pid >$guard
And yes, there IS a small window for a race condition between the test and echo commands, which can be fixed by appending to the guard file, and then checking that the first line is indeed our own PID. Also, the diagnostic output in the if can be commented out in a production version.
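The append-then-verify idea mentioned above could be sketched like this. The helper name `claim_guard` and the example path are hypothetical, and the sketch assumes that short appends to the same file do not interleave.

```bash
#!/usr/bin/env bash
# Sketch of the append-then-verify idea: every contender appends its PID,
# and only the one whose PID ended up on the first line owns the guard.
claim_guard() {        # claim_guard <guardfile> <pid>; hypothetical helper
    local first
    echo "$2" >> "$1"
    IFS= read -r first < "$1"
    [ "$first" = "$2" ]
}

guard="${TMPDIR:-/tmp}/guarddemo.$$.pid"   # example path
if claim_guard "$guard" "$$"; then
    trap 'rm -f "$guard"' EXIT             # only the owner removes the file
    echo "guard held by $$"
else
    echo "guard already held by $(head -n 1 "$guard")" >&2
fi
```

A contender that loses leaves its PID on a later line, so cleanup of stale guard files still has to be handled separately.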

Related

How to detect a non-rolling log file and pattern match in a shell script which is using tail, while, read, and?

I am monitoring a log file and if PATTERN didn't appear in it within THRESHOLD seconds, the script should print "error", otherwise, it should print "clear". The script is working fine, but only if the log is rolling.
I tried using timeout, but it didn't work.
log_file=/tmp/app.log
threshold=120
tail -Fn0 ${log_file} | \
while read line ; do
echo "${line}" | awk '/PATTERN/ { system("touch pattern.tmp") }'
# code to calculate how long ago pattern.tmp was touched; the result is assigned to diff
if [ ${diff} -gt ${threshold} ]; then
echo "Error"
else
echo "Clear"
fi
done
It is working as expected only when there is 'any' line printed in the app.log.
If the application got hung for any reason and the log stopped rolling, there won't be any output by the script.
Is there a way to detect the 'no output' of tail and do some command at that time?
It looks like the problem you're having is that the timing calculations inside your while loop never get a chance to run when read is blocking on input. In that case, you can pipe the tail output into a while true loop, inside of which you can do if read -t $timeout:
log_file=/tmp/app.log
threshold=120
timeout=10
tail -Fn0 "$log_file" | while true; do
if read -t $timeout line; then
echo "${line}" | awk '/PATTERN/ { system("touch pattern.tmp") }'
fi
# code to calculate how long ago pattern.tmp touched and same is assigned to diff
if [ ${diff} -gt ${threshold} ]; then
echo "Error"
else
echo "Clear"
fi
done
As Ed Morton pointed out, all caps variable names are not a good idea in bash scripts, so I used lowercase variable names.
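The elided step, working out how long ago pattern.tmp was touched, might be sketched as follows. This assumes GNU `stat`, where `-c %Y` prints the modification time in epoch seconds (BSD and macOS `stat` use different options). `file_age` is a hypothetical helper name, and the marker path merely stands in for pattern.tmp.

```bash
#!/usr/bin/env bash
# Sketch: seconds elapsed since a file was last modified (GNU stat assumed).
file_age() {           # file_age <file>; hypothetical helper
    local mtime now
    mtime=$(stat -c %Y "$1" 2>/dev/null) || return 1
    now=$(date +%s)
    echo $(( now - mtime ))
}

threshold=120
marker="${TMPDIR:-/tmp}/pattern-demo.$$.tmp"   # stands in for pattern.tmp
touch "$marker"
diff=$(file_age "$marker")
if [ "$diff" -gt "$threshold" ]; then
    echo "Error"
else
    echo "Clear"
fi
rm -f "$marker"
```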
How about something simple like:
sleep "$threshold"
grep -q 'PATTERN' "$log_file" && { echo "Clear"; exit; }
echo "Error"
If that's not all you need then edit your question to clarify your requirements. Don't use all upper case for non exported shell variable names btw - google it.
To build further on your idea, it might be beneficial to run the awk part in the background and a continuous loop to do the checking.
#!/usr/bin/env bash
log_file="log.txt"
# threshold in seconds
threshold=10
# run the following process in the background
stdbuf -oL tail -fn0 "$log_file" \
| awk '/PATTERN/{system("touch pattern.tmp") }' &
while true; do
match=$(find . -type f -iname "pattern.tmp" -newermt "-${threshold} seconds")
if [[ -z "${match}" ]]; then
echo "Error"
else
echo "Clear"
fi
done
This looks to me like a watchdog timer. I've implemented something like this by forcing a background process to update my log, so I don't have to worry about read -t. Here's a working example:
#!/usr/bin/env bash
threshold=10
grain=2
errorstate=0
while sleep "$grain"; do
date '+[%F %T] watchdog timer' >> log
done &
trap "kill -HUP $!" 0 HUP INT QUIT TRAP ABRT TERM
printf -v lastseen '%(%s)T'
tail -F log | while read line; do
printf -v now '%(%s)T'
if (( now - lastseen > threshold )); then
echo "ERROR"
errorstate=1
else
if (( errorstate )); then
echo "Recovered, yay"
errorstate=0
fi
fi
if [[ $line =~ .*PATTERN.* ]]; then
lastseen=$now
fi
done
Run this in one window, wait $threshold seconds for it to trigger, then in another window echo PATTERN >> log to see the recovery.
While this can be made as granular as you like (I've set it to 2 seconds in the example), it does pollute your log file.
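Oh, and note that the printf '%(%s)T' format requires bash 4.2 or later, and calling it without an argument to mean the current time, as done here, requires bash 4.3 or later.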
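If the script has to run on older shells as well, one option is a small wrapper that picks the builtin where it is available and falls back to `date` otherwise. This is only a sketch: `now_epoch` is a hypothetical helper, and it assumes `%(fmt)T` exists from bash 4.2 on, with an explicit `-1` argument meaning the current time.

```bash
#!/usr/bin/env bash
# Sketch: current time in epoch seconds, using the printf builtin where
# available and falling back to the external date command otherwise.
now_epoch() {          # hypothetical helper
    if (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 2) )); then
        printf '%(%s)T\n' -1      # -1 means "now"
    else
        date +%s                  # %s is a common (GNU/BSD) extension
    fi
}

lastseen=$(now_epoch)
echo "epoch seconds: $lastseen"
```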

How to modify call stack in Bash?

Suppose I want to write a smart logging function log, that would read the line that is immediately after the log invocation and store it and its output in the log file. The function can find, read and execute the line of code that is in question. The problem is, that when the function returns, bash executes the line again.
Everything works fine except that assignment to BASH_LINENO[0] is silently discarded. Reading the http://wiki.bash-hackers.org/syntax/shellvars#bash_lineno I've learned that the variable is not read only.
function log()
{
BASH_LINENO[0]=$((${BASH_LINENO[0]}+1))
file=${BASH_SOURCE[1]##*/}
linenr=$((${BASH_LINENO[0]} + 1 ))
line=`sed "1,$((${linenr}-1)) d;${linenr} s/^ *//; q" $file`
if [ -f /tmp/tmp.txt ]; then
rm /tmp/tmp.txt
fi
exec 3>&1 4>&2 >>/tmp/tmp.txt 2>&1
set -x
eval $line
exitstatus=$?
set +x
exec 1>&3 2>&4 4>&- 3>&-
#Here goes the code that parses the /tmp/tmp.txt and stores it in the log
if [ "$exitstatus" -ne "0" ]; then
exit $exitstatus
fi
}
#Test case:
log
echo "Unfortunately this line gets appended twice" | tee -a bla.txt;
After consulting the wisdom of users on the bug-bash@gnu.org mailing list, it appears that modifying the call stack is not possible after all. Here is an answer I got from Chet Ramey:
BASH_LINENO is a call stack; assignments to it should be (and are)
ignored. That's been the case since at least bash-3.2 (that's where I
quit looking).
There is an indirect way to force bash to not execute the next
command: set the extdebug option and have the DEBUG trap return a
non-zero status.
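The mechanism described in this reply can be seen in isolation with a minimal sketch. The names `guard`, `record`, `skip_next` and `executed` are hypothetical and exist only for this demonstration.

```bash
#!/usr/bin/env bash
# Sketch: with extdebug set, a DEBUG trap that returns non-zero makes bash
# skip the command it was about to run.
shopt -s extdebug

skip_next=0
executed=()

guard() {              # hypothetical trap handler
    if (( skip_next )); then
        skip_next=0
        return 1       # non-zero: the pending command is not executed
    fi
    return 0
}

record() { executed+=("$1"); }

trap 'guard' DEBUG
record first
skip_next=1
record skipped         # never runs: guard returns 1 just before it
record third
trap - DEBUG

printf '%s\n' "${executed[@]}"
```

Because the trap runs before every simple command, the handler has to reset its own flag, otherwise every later command would be skipped too.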
The above technique works very well for my purposes. I am finally able to do a production version of the log function.
#!/bin/bash
shopt -s extdebug
repetition_count=0
_ERR_HDR_FMT="%.8s %s#%s:%s:%s"
_ERR_MSG_FMT="[${_ERR_HDR_FMT}]%s \$ "
msg() {
printf "$_ERR_MSG_FMT" $(date +%T) $USER $HOSTNAME $PWD/${BASH_SOURCE[2]##*/} ${BASH_LINENO[1]}
echo ${@}
}
function rlog()
{
case $- in *x*) USE_X="-x";; *) USE_X=;; esac
set +x
if [ "${BASH_LINENO[0]}" -ne "$myline" ]; then
repetition_count=0
return 0;
fi
if [ "$repetition_count" -gt "0" ]; then
return -1;
fi
if [ -z "$log" ]; then
return 0
fi
file=${BASH_SOURCE[1]##*/}
line=`sed "1,$((${myline}-1)) d;${myline} s/^ *//; q" $file`
if [ -f /tmp/tmp.txt ]; then
rm /tmp/tmp.txt
fi
echo "$line" > /tmp/tmp2.txt
mymsg=`msg`
exec 3>&1 4>&2 >>/tmp/tmp.txt 2>&1
set -x
source /tmp/tmp2.txt
exitstatus=$?
set +x
exec 1>&3 2>&4 4>&- 3>&-
repetition_count=1 #This flag is to prevent multiple execution of the current line of code. This condition gets checked at the beginning of the function
frstline=`sed '1q' /tmp/tmp.txt`
[[ "$frstline" =~ ^(\++)[^+].*$ ]]
# echo "BASH_REMATCH[1]=${BASH_REMATCH[1]}"
eval 'tmp="${BASH_REMATCH[1]}"'
pluscnt=$(( (${#tmp} + 1) *2 ))
pluses="\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+\+"
pluses=${pluses:0:$pluscnt}
commandlines="`awk \" gsub(/^${pluses}\\s/,\\\"\\\")\" /tmp/tmp.txt`"
n=0
#There might be more than one command in the debugged line. The next loop appends each command to the log.
while read -r line; do
if [ "$n" -ne "0" ]; then
echo "+ $line" >>$log
else
echo "${mymsg}$line" >>$log
n=1
fi
done <<< "$commandlines"
#Next line extracts all lines that are prefixed by a sufficient number of "+" (usually 3) and that come immediately after the last line prefixed with $pluses, i.e. after the last command line.
awk "BEGIN {flag=0} /${pluses}/ { flag=1 } /^[^+]/ { if (flag==1) print \$0; }" /tmp/tmp.txt | tee -a $log
if [ "$exitstatus" -ne "0" ]; then
echo "## Exit status: $exitstatus" >>$log
fi
echo >>$log
if [ "$exitstatus" -ne "0" ]; then
exit $exitstatus
fi
if [ -n "$USE_X" ]; then
set -x
fi
return -1
}
log_next_line='eval if [ -n "$log" ]; then myline=$(($LINENO+1)); trap "rlog" DEBUG; fi;'
logoff='trap - DEBUG'
The usage of the file is intended as follows:
#!/bin/bash
log=mylog.log
if [ -f mylog.log ]; then
rm mylog.log
fi
. ./log.sh
a=example
x=a
$log_next_line
echo "KUKU!"
$log_next_line
echo $x
$log_next_line
echo ${!x}
$log_next_line
echo ${!x} > /dev/null
$log_next_line
echo "Proba">/tmp/mtmp.txt
$log_next_line
touch ${!x}.txt
$log_next_line
if [ $(( ${#a} + 6 )) -gt 10 ]; then echo "Too long string"; fi
$log_next_line
echo "\$a and \$x">/dev/null
$log_next_line
echo $x
$log_next_line
ls -l
$log_next_line
mkdir /ddad/adad/dad #Generates an error
The output (mylog.log):
[13:39:51 adam#adam-N56VZ:/home/Adama-docs/Adam/Adam/linux/tmp/log/log-test-case.sh:14] $ echo 'KUKU!'
KUKU!
[13:39:51 adam#adam-N56VZ:/home/Adama-docs/Adam/Adam/linux/tmp/log/log-test-case.sh:16] $ echo a
a
[13:39:51 adam#adam-N56VZ:/home/Adama-docs/Adam/Adam/linux/tmp/log/log-test-case.sh:18] $ echo example
example
[13:39:51 adam#adam-N56VZ:/home/Adama-docs/Adam/Adam/linux/tmp/log/log-test-case.sh:20] $ echo example
[13:39:51 adam#adam-N56VZ:/home/Adama-docs/Adam/Adam/linux/tmp/log/log-test-case.sh:22] $ echo 1,2,3
[13:39:51 adam#adam-N56VZ:/home/Adama-docs/Adam/Adam/linux/tmp/log/log-test-case.sh:24] $ touch example.txt
[13:39:51 adam#adam-N56VZ:/home/Adama-docs/Adam/Adam/linux/tmp/log/log-test-case.sh:26] $ '[' 13 -gt 10 ']'
+ echo 'Too long string'
Too long string
[13:39:51 adam#adam-N56VZ:/home/Adama-docs/Adam/Adam/linux/tmp/log/log-test-case.sh:28] $ echo '$a and $x'
[13:39:51 adam#adam-N56VZ:/home/Adama-docs/Adam/Adam/linux/tmp/log/log-test-case.sh:30] $ echo a
a
[13:39:51 adam#adam-N56VZ:/home/Adama-docs/Adam/Adam/linux/tmp/log/log-test-case.sh:32] $ ls -l
total 12
-rw-rw-r-- 1 adam adam 0 gru 4 13:39 example.txt
lrwxrwxrwx 1 adam adam 66 gru 4 13:29 log.sh -> /home/Adama-docs/Adam/Adam/MyDocs/praca/Puppet/bootstrap/common.sh
-rwxrwxr-x 1 adam adam 520 gru 4 13:29 log-test-case.sh
-rw-rw-r-- 1 adam adam 995 gru 4 13:39 mylog.log
[13:39:51 adam#adam-N56VZ:/home/Adama-docs/Adam/Adam/linux/tmp/log/log-test-case.sh:34] $ mkdir /ddad/adad/dad
mkdir: cannot create directory ‘/ddad/adad/dad’: No such file or directory
## Exit status: 1
The standard output is unchanged.
Limitations
Limitations are serious, unfortunately.
Exit code of logged command gets discarded
First of all, the exit code of the logged command is discarded, so user cannot test for it in the next statement. The current code exits the script if there was an error (which I believe is the best behavior). It is possible to modify the script to test
Limited support for bash tracing
The function honors bash tracing with -x. If it finds that the user traces output, it temporarily disables the output (as it would interfere with the trace anyway), and restores it back at the end. Unfortunately, it also appends a few extra lines to the trace.
Unless user turns off logging (with $logoff) there is a considerable speed penalty for all commands after the first $log_next_line, even if no logging takes place.
In an ideal world the function would disable debug trapping (trap - DEBUG) after each invocation. Unfortunately I don't know how to do that, so from the first $log_next_line macro onwards, interpretation of each line invokes a custom function.
I use this function before every key command in my complex bootstrapping scripts. With it I can see what exactly and when was executed and what was the output, without the need to really understand the logic of the lengthy and sometimes messy scripts.

Bash variable change doesn't persist

I have a short bash script to check to see if a Python program is running. The program writes out a PID file when it runs, so comparing this to the current list of running processes gives me what I need. But I'm having a problem with a variable being changed and then apparently changing back! Here's the script:
#!/bin/bash
# Test whether Home Server is currently running
PIDFILE=/tmp/montSvr.pid
isRunning=0
# does a pid file exist?
if [ -f "$PIDFILE" ]; then
# pid file exists
# now get contents of pid file
cat $PIDFILE | while read PID; do
if [ $PID != "" ]; then
PSGREP=$(ps -A | grep $PID | awk '{print $1}')
if [ -n "$PSGREP" ]; then
isRunning=1
echo "RUNNING: $isRunning"
fi
fi
done
fi
echo "Running: $isRunning"
exit $isRunning
The output I get, when the Python script is running, is:
RUNNING: 1
Running: 0
And the exit value of the bash script is 0. So isRunning is getting changed within all those if statements (ie, the code is performing as expected), but then somehow isRunning reverts to 0 again. Confused...
In bash, by default, each command in a pipeline runs in its own subshell, and that includes a while loop placed after a pipe |. Changes to variable values in a subshell do not propagate to the parent shell.
Solution: change your loop to
while read PID; do
# ...
done < "$PIDFILE"
It's the pipe that is the problem. Using a pipe in this way means that the loop runs in a sub-shell, with its own environment. Kill the cat, use this syntax instead:
while read PID; do
if [ -n "$PID" ]; then
PSGREP=$(ps -A | grep $PID | awk '{print $1}')
if [ -n "$PSGREP" ]; then
isRunning=1
echo "RUNNING: $isRunning"
fi
fi
done < "$PIDFILE"
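The difference is easy to reproduce with a stripped-down sketch that feeds the same loop in both ways. The variable names are hypothetical, and the behaviour shown assumes bash's default configuration (the `lastpipe` option is off).

```bash
#!/usr/bin/env bash
# Sketch: the same loop fed two ways. In bash's default configuration the
# piped loop runs in a subshell, so its assignment is lost when the loop ends.
found_pipe=0
printf '%s\n' 1234 | while read -r pid; do
    found_pipe=1
done

found_redirect=0
while read -r pid; do
    found_redirect=1
done <<EOF
1234
EOF

echo "pipe: $found_pipe, redirect: $found_redirect"
```

The piped variant leaves `found_pipe` at 0, while the redirected variant sets `found_redirect` to 1 in the calling shell.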

Loop shell script until successful log message

I am trying to get a shell script to recognize when an app instance has come up. That way it can continue issuing commands.
I've been thinking it would be something like this:
#!/bin/bash
startApp.sh
while [ `tail -f server.log` -ne 'regex line indicating success' ]
do
sleep 5
done
echo "App up"
But, even if this worked, it wouldn't address some concerns:
What if the app doesn't come up, how long will it wait
What if there is an error when bringing the app up
How can I capture the log line and echo it
Am I close, or is there a better way? I imagine this is something that other admins have had to overcome.
EDIT:
I found this on super user
https://superuser.com/questions/270529/monitoring-a-file-until-a-string-is-found
tail -f logfile.log | while read LOGLINE
do
[[ "${LOGLINE}" == *"Server Started"* ]] && pkill -P $$ tail
done
My only problem with this is that it might never exit. Is there a way to add in a maximum time?
Ok the first answer was close, but didn't account for everything I thought could happen.
I adapted the code from this link:
Ending tail -f started in a shell script
Here's what I came up with:
#!/bin/bash
instanceDir="/usr/username/server.name"
serverLogFile="$instanceDir/server/app/log/server.log"
function stopServer() {
touch ${serverLogFile}
# 3 minute timeout.
sleep 180 &
local timerPid=$!
tail -n0 -F --pid=${timerPid} ${serverLogFile} | while read line
do
if echo ${line} | grep -q "Shutdown complete"; then
echo 'Server Stopped'
# stop the timer..
kill ${timerPid} > /dev/null 2>&1
fi
done &
echo "Stopping Server."
$instanceDir/bin/stopserver.sh > /dev/null 2>&1
# wait for the timer to expire (or be killed)
wait %sleep
}
function startServer() {
touch ${serverLogFile}
# 3 minute timeout.
sleep 180 &
local timerPid=$!
tail -n0 -F --pid=${timerPid} ${serverLogFile} | while read line
do
if echo ${line} | grep -q "server start complete"; then
echo 'Server Started'
# stop the timer..
kill ${timerPid} > /dev/null 2>&1
fi
done &
echo "Starting Server."
$instanceDir/bin/startserver.sh > /dev/null 2>&1 &
# wait for the timer to expire (or be killed)
wait %sleep
}
stopServer
startServer
Well, tail -f won't ever exit, so that's not what you want.
numLines=10
timeToSleep=5
until tail -n $numLines server.log | grep -q "$serverStartedPattern"; do
sleep $timeToSleep
done
Be sure that $numLines is greater than the number of lines that might show up during $timeToSleep when the server has come up.
This will continue forever; if you want to only allow so much time, you could put a cap on the number of loop iterations with something like this:
let maxLoops=60 numLines=10 timeToSleep=5 success=0
for (( try=0; try < maxLoops; ++try )); do
if tail -n $numLines server.log | grep -q "$serverStartedPattern"; then
success=1
break
fi
sleep $timeToSleep
done
if (( success )); then
echo "Server started!"
else
echo "Server never started!"
fi
exit $(( 1-success ))
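A variant that caps the wait by elapsed time rather than by loop count can be sketched with bash's `SECONDS` variable. `wait_for_pattern` and the demo log path are hypothetical, and unlike the loop above this sketch searches the whole file, so it assumes the log contains no stale match from an earlier start.

```bash
#!/usr/bin/env bash
# Sketch: poll a log until a pattern appears or a deadline passes.
# SECONDS is a bash variable counting seconds since the shell started.
wait_for_pattern() {   # wait_for_pattern <file> <pattern> <timeout> [interval]
    local file=$1 pattern=$2 deadline=$(( SECONDS + $3 )) interval=${4:-1}
    while (( SECONDS < deadline )); do
        if grep -q -- "$pattern" "$file" 2>/dev/null; then
            return 0
        fi
        sleep "$interval"
    done
    return 1
}

log="${TMPDIR:-/tmp}/server-demo.$$.log"    # stands in for server.log
echo "server start complete" > "$log"
if wait_for_pattern "$log" "start complete" 10; then
    echo "Server started!"
else
    echo "Server never started!"
fi
rm -f "$log"
```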

How to terminate script's process tree in Cygwin bash from bash script

I have a Cygwin bash script that I need to watch and terminate under certain conditions - specifically, after a certain file has been created. I'm having difficulty figuring out how exactly to terminate the script with the same level of completeness that Ctrl+C does, however.
Here's a simple script (called test1) that does little more than wait around to be terminated.
#!/bin/bash
test -f kill_me && rm kill_me
touch kill_me
tail -f kill_me
If this script is run in the foreground, Ctrl+C will terminate both the tail and the script itself. If the script is run in the background, a kill %1 (assuming it is job 1) will also terminate both tail and the script.
However, when I try to do the same thing from a script, I'm finding that only the bash process running the script is terminated, while tail hangs around disconnected from its parent. Here's one way I tried (test2):
#!/bin/bash
test -f kill_me && rm kill_me
(
touch kill_me
tail -f kill_me
) &
while true; do
sleep 1
test -f kill_me && {
kill %1
exit
}
done
If this is run, the bash subshell running in the background is terminated OK, but tail still hangs around.
If I use an explicitly separate script, like this, it still doesn't work (test3):
#!/bin/bash
test -f kill_me && rm kill_me
# assuming test1 above is included in the same directory
./test1 &
while true; do
sleep 1
test -f kill_me && {
kill %1
exit
}
done
tail is still hanging around after this script is run.
In my actual case, the process creating files is not particularly instrumentable, so I can't get it to terminate of its own accord; by finding out when it has created a particular file, however, I can at that point know that it's OK to terminate it. Unfortunately, I can't use a simple killall or equivalent, as there may be multiple instances running, and I only want to kill the specific instance.
/bin/kill (the program, not the bash builtin) interprets a negative PID as “kill the process group” which will get all the children too.
Changing
kill %1
to
/bin/kill -- -$$
works for me.
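When the goal is to stop one background job without also signalling the calling script, a related approach is to give that job its own process group and signal only that group. The sketch below assumes bash on a Linux-like system and has not been checked under Cygwin; the variable names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: signal a whole background job through its process group.
# "set -m" turns on job control, so each background job becomes the leader
# of its own process group and $! doubles as the group ID.
set -m
( sleep 30 & sleep 30 & wait ) &
job_pgid=$!
set +m

sleep 1                          # give the job a moment to start its children
kill -TERM -- "-$job_pgid"       # negative ID: deliver to every group member
wait "$job_pgid"
status=$?
echo "job ended with status $status"
```

Here the script itself stays in its original group, so only the subshell and the two sleep processes receive the signal.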
Adam's link put me in a direction that will solve the problem, albeit not without some minor caveats.
The script doesn't work unmodified under Cygwin, so I rewrote it, and with a couple more options. Here's my version:
#!/bin/bash
function usage
{
echo "usage: $(basename $0) [-c] [-<sigspec>] <pid>..."
echo "Recursively kill the process tree(s) rooted by <pid>."
echo "Options:"
echo " -c Only kill children; don't kill root"
echo " <sigspec> Arbitrary argument to pass to kill, expected to be signal specification"
exit 1
}
kill_parent=1
sig_spec=-9
function do_kill # <pid>...
{
kill "$sig_spec" "$@"
}
function kill_children # pid
{
local target=$1
local pid=
local ppid=
local i
# Returns alternating ids: first is pid, second is parent
for i in $(ps -f | tail -n +2 | cut -b 10-24); do
if [ ! -n "$pid" ]; then
# first in pair
pid=$i
else
# second in pair
ppid=$i
(( ppid == target && pid != $$ )) && {
kill_children $pid
do_kill $pid
}
# reset pid for next pair
pid=
fi
done
}
test -n "$1" || usage
while [ -n "$1" ]; do
case "$1" in
-c)
kill_parent=0
;;
-*)
sig_spec="$1"
;;
*)
kill_children $1
(( kill_parent )) && do_kill $1
;;
esac
shift
done
The only real downside is the somewhat ugly message that bash prints out when it receives a fatal signal, namely "Terminated", "Killed" or "Interrupted" (depending on what you send). However, I can live with that in batch scripts.
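The tree walk itself can be separated from the column layout of `ps`. The sketch below reads plain "pid ppid" pairs on standard input and prints every descendant of a given PID; `descendants` is a hypothetical helper, and the sample data stands in for real `ps` output.

```bash
#!/usr/bin/env bash
# Sketch: list every descendant of a PID from "pid ppid" pairs on stdin,
# parents before children (breadth-first).
descendants() {        # descendants <root-pid>; hypothetical helper
    local -a pids=() ppids=() queue=("$1")
    local pid ppid parent i
    while read -r pid ppid; do
        pids+=("$pid"); ppids+=("$ppid")
    done
    while (( ${#queue[@]} > 0 )); do
        parent=${queue[0]}
        queue=("${queue[@]:1}")
        for i in "${!pids[@]}"; do
            if [[ ${ppids[i]} == "$parent" && ${pids[i]} != "$parent" ]]; then
                echo "${pids[i]}"
                queue+=("${pids[i]}")
            fi
        done
    done
}

# Static sample data standing in for: ps -e -o pid= -o ppid=
descendants 10 <<'EOF'
10 1
11 10
12 11
20 1
13 10
EOF
```

On systems whose `ps` supports `-o` (an assumption; the Cygwin `ps` used above may not), the pairs could come from `ps -e -o pid= -o ppid=`, and the resulting list could be handed to a single `kill`.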
This script looks like it'll do the job:
#!/bin/bash
# Author: Sunil Alankar
##
# recursive kill. kills the process tree down from the specified pid
#
# foreach child of pid, recursive call dokill
dokill() {
local pid=$1
local itsparent=""
local aprocess=""
local x=""
# next line is a single line
for x in `/bin/ps -f | sed -e '/UID/d;s/[a-zA-Z0-9_-]\{1,\} \{1,\}\([0-9]\{1,\}\) \{1,\}\([0-9]\{1,\}\) .*/\1 \2/g'`
do
if [ "$aprocess" = "" ]; then
aprocess=$x
itsparent=""
continue
else
itsparent=$x
if [ "$itsparent" = "$pid" ]; then
dokill $aprocess
fi
aprocess=""
fi
done
echo "killing $1"
kill -9 $1 > /dev/null 2>&1
}
case $# in
1) PID=$1
;;
*) echo "usage: rekill <top pid to kill>";
exit 1;
;;
esac
dokill $PID

Resources