Loop shell script until successful log message - shell

I am trying to get a shell script to recognize when an app instance has come up. That way it can continue issuing commands.
I've been thinking it would be something like this:
#!/bin/bash
startApp.sh
while [ `tail -f server.log` -ne 'regex line indicating success' ]
do
sleep 5
done
echo "App up"
But, even if this worked, it wouldn't address some concerns:
What if the app doesn't come up? How long will it wait?
What if there is an error when bringing the app up?
How can I capture the log line and echo it?
Am I close, or is there a better way? I imagine this is something that other admins have had to overcome.
EDIT:
I found this on Super User:
https://superuser.com/questions/270529/monitoring-a-file-until-a-string-is-found
tail -f logfile.log | while read LOGLINE
do
[[ "${LOGLINE}" == *"Server Started"* ]] && pkill -P $$ tail
done
My only problem with this is that it might never exit. Is there a way to add in a maximum time?

Ok the first answer was close, but didn't account for everything I thought could happen.
I adapted the code from this link:
Ending tail -f started in a shell script
Here's what I came up with:
#!/bin/bash
instanceDir="/usr/username/server.name"
serverLogFile="$instanceDir/server/app/log/server.log"
function stopServer() {
touch ${serverLogFile}
# 3 minute timeout.
sleep 180 &
local timerPid=$!
tail -n0 -F --pid=${timerPid} ${serverLogFile} | while read line
do
if echo ${line} | grep -q "Shutdown complete"; then
echo 'Server Stopped'
# stop the timer..
kill ${timerPid} > /dev/null 2>&1
fi
done &
echo "Stopping Server."
$instanceDir/bin/stopserver.sh > /dev/null 2>&1
# wait for the timer to expire (or be killed)
wait ${timerPid}
}
function startServer() {
touch ${serverLogFile}
# 3 minute timeout.
sleep 180 &
local timerPid=$!
tail -n0 -F --pid=${timerPid} ${serverLogFile} | while read line
do
if echo ${line} | grep -q "server start complete"; then
echo 'Server Started'
# stop the timer..
kill ${timerPid} > /dev/null 2>&1
fi
done &
echo "Starting Server."
$instanceDir/bin/startserver.sh > /dev/null 2>&1 &
# wait for the timer to expire (or be killed)
wait ${timerPid}
}
stopServer
startServer

Well, tail -f won't ever exit, so that's not what you want.
numLines=10
timeToSleep=5
until tail -n $numLines server.log | grep -q "$serverStartedPattern"; do
sleep $timeToSleep
done
Be sure that $numLines is greater than the number of lines that might show up during $timeToSleep when the server has come up.
This will continue forever; if you want to only allow so much time, you could put a cap on the number of loop iterations with something like this:
let maxLoops=60 numLines=10 timeToSleep=5 success=0
for (( try=0; try < maxLoops; ++try )); do
if tail -n $numLines server.log | grep -q "$serverStartedPattern"; then
success=1
break
fi
sleep $timeToSleep
done
if (( success )); then
echo "Server started!"
else
echo "Server never started!"
fi
exit $(( 1-success ))
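The time limit and the request to echo the matching line can be combined in one helper. This is only a minimal sketch, not taken from the answers above: the function name wait_for_pattern and its arguments are hypothetical, and it assumes a grep that supports -m (GNU and BSD grep do). It rescans the whole file on every pass, so a matching line left over from an earlier run would also count as success.

```shell
#!/usr/bin/env bash
# wait_for_pattern FILE PATTERN TIMEOUT_SECONDS [INTERVAL]   (hypothetical helper)
# Prints the first line of FILE matching the extended regex PATTERN and
# returns 0, or returns 1 once TIMEOUT_SECONDS have elapsed without a match.
wait_for_pattern() {
    local file=$1 pattern=$2 timeout=$3 interval=${4:-1}
    local deadline=$(( SECONDS + timeout ))
    local line
    while (( SECONDS < deadline )); do
        # -m1 stops at the first match; errors (e.g. file not yet created) are ignored
        if line=$(grep -m1 -E -- "$pattern" "$file" 2>/dev/null); then
            printf '%s\n' "$line"
            return 0
        fi
        sleep "$interval"
    done
    return 1
}
```

It could be called as wait_for_pattern server.log 'server start complete' 180 5 || echo "Server never started!", where the log name and pattern are placeholders.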

Related

wait doesn't wait for the processes in the while loop to finish

Here is my code:
count=0
head -n 10 urls.txt | while read LINE; do
curl -o /dev/null -s "$LINE" -w "%{time_total}\n" &
count=$((count+1))
[ 0 -eq $((count % 3)) ] && wait && echo "process wait" # wait for 3 urls
done
echo "before wait"
wait
echo "after wait"
I am expecting the last curl to finish before printing the last echo, but actually it's not the case:
0.595499
0.602349
0.618237
process wait
0.084970
0.084243
0.099969
process wait
0.067999
0.068253
0.081602
process wait
before wait
after wait
➜ Downloads 0.088755 # already exited the script
Does anyone know why it's happening? And how to fix this?
As described in BashFAQ #24, this is caused by your pipeline causing the while loop to be performed in a different shell from the rest of your script.
Consequently, your curls are subprocesses of that subshell, not the outer interpreter; so the outer interpreter cannot wait for them.
This can be resolved by not piping to while read, but instead redirecting its input in a way that doesn't shuffle it into a pipeline element -- as with <(...), a process substitution:
#!/usr/bin/env bash
# ^^^^ - NOT /bin/sh; also, must not start with "sh scriptname"
count=0
while IFS= read -r line; do
curl -o /dev/null -s "$line" -w "%{time_total}\n" &
count=$((count+1))
(( count % 3 == 0 )) && { wait; echo "process wait"; } # wait for 3 urls
done < <(head -n 10 urls.txt)
echo "before wait"
wait
echo "after wait"
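The subshell effect is easy to see with a plain counter instead of curl. The sketch below assumes bash with its default settings (in particular, the lastpipe option is off); the variable names are made up for the illustration.

```shell
#!/usr/bin/env bash
# Piped loop: the while body runs in a subshell, so its changes to
# count are lost when the pipeline ends.
count=0
printf '%s\n' a b c | while IFS= read -r item; do count=$((count + 1)); done
pipeline_count=$count

# Process substitution: the loop runs in the current shell, so count survives.
count=0
while IFS= read -r item; do count=$((count + 1)); done < <(printf '%s\n' a b c)
procsub_count=$count

echo "pipeline: $pipeline_count, process substitution: $procsub_count"
# prints: pipeline: 0, process substitution: 3
```

Background jobs behave like the counter: whatever is started inside the piped loop belongs to the subshell, so only a wait issued inside that same subshell can see it.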
why it's happening?
Because you run the processes in the subshell, the parent process can't wait for them.
$ echo | { echo subshell; sleep 100 & }
$ wait # exits immediately
$
Call wait from the same process the background processes were spawned:
someotherthing | {
while someotherthing; do
something &
done
wait # will wait for something
}
And how to fix this?
I recommend not using a crude while read loop and instead taking a different approach with a dedicated tool. Use GNU xargs with the -P option to run 3 processes concurrently:
head -n 10 urls.txt | xargs -P3 -n1 -d '\n' curl -o /dev/null -w "%{time_total}\n" -s
Alternatively, you could just move wait into the subshell as above, or make the while loop execute in the parent shell.
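If xargs is not an option, a similar limit can be kept in pure bash. The following is a sketch under assumptions: run_batch is a hypothetical helper name, and wait -n needs bash 4.3 or later. Because the final wait is inside the function, it works whether the function is fed through a pipe or through a redirection.

```shell
#!/usr/bin/env bash
# run_batch MAX CMD [ARGS...]   (hypothetical helper)
# Reads one item per line from stdin and runs "CMD ARGS item" in the
# background, keeping at most MAX jobs running at any time.
run_batch() {
    local max=$1; shift
    local running=0 item
    while IFS= read -r item; do
        # </dev/null keeps the job from consuming the item list on stdin
        "$@" "$item" < /dev/null &
        if (( ++running >= max )); then
            wait -n            # returns as soon as any one job finishes
            (( running-- ))
        fi
    done
    wait                       # wait for the remaining jobs
}
```

For the URLs in the question this might be used as head -n 10 urls.txt | run_batch 3 curl -o /dev/null -s -w "%{time_total}\n".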

How to detect a non-rolling log file and pattern match in a shell script which is using tail, while, read, and?

I am monitoring a log file and if PATTERN didn't appear in it within THRESHOLD seconds, the script should print "error", otherwise, it should print "clear". The script is working fine, but only if the log is rolling.
I've tried reading 'timeout' but didn't work.
log_file=/tmp/app.log
threshold=120
tail -Fn0 ${log_file} | \
while read line ; do
echo "${line}" | awk '/PATTERN/ { system("touch pattern.tmp") }'
# code to calculate how long ago pattern.tmp was touched; the result is assigned to diff
if [ ${diff} -gt ${threshold} ]; then
echo "Error"
else
echo "Clear"
fi
done
It is working as expected only when there is 'any' line printed in the app.log.
If the application got hung for any reason and the log stopped rolling, there won't be any output by the script.
Is there a way to detect the 'no output' of tail and do some command at that time?
It looks like the problem you're having is that the timing calculations inside your while loop never get a chance to run when read is blocking on input. In that case, you can pipe the tail output into a while true loop, inside of which you can do if read -t $timeout:
log_file=/tmp/app.log
threshold=120
timeout=10
tail -Fn0 "$log_file" | while true; do
if read -t $timeout line; then
echo "${line}" | awk '/PATTERN/ { system("touch pattern.tmp") }'
fi
# code to calculate how long ago pattern.tmp touched and same is assigned to diff
if [ ${diff} -gt ${threshold} ]; then
echo "Error"
else
echo "Clear"
fi
done
As Ed Morton pointed out, all caps variable names are not a good idea in bash scripts, so I used lowercase variable names.
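The age calculation is left as a placeholder comment in the loop. One possible way to fill it, offered only as a sketch: the helper name file_age_seconds is hypothetical, and stat -c %Y is the GNU coreutils form (BSD and macOS stat use stat -f %m instead).

```shell
#!/usr/bin/env bash
# file_age_seconds FILE   (hypothetical helper)
# Prints how many seconds ago FILE was last modified.
file_age_seconds() {
    local mtime now
    mtime=$(stat -c %Y -- "$1") || return 1   # GNU stat: mtime as epoch seconds
    now=$(date +%s)
    echo $(( now - mtime ))
}
```

Inside the loop this would become diff=$(file_age_seconds pattern.tmp), with a fallback for the case where pattern.tmp does not exist yet.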
How about something simple like:
sleep "$threshold"
grep -q 'PATTERN' "$log_file" && { echo "Clear"; exit; }
echo "Error"
If that's not all you need then edit your question to clarify your requirements. Don't use all upper case for non exported shell variable names btw - google it.
To build further on your idea, it might be beneficial to run the awk part in the background and a continuous loop to do the checking.
#!/usr/bin/env bash
log_file="log.txt"
# threshold in seconds
threshold=10
# run the following process in the background
stdbuf -oL tail -fn0 "$log_file" \
| awk '/PATTERN/ { system("touch pattern.tmp") }' &
while true; do
match=$(find . -type f -iname "pattern.tmp" -newermt "-${threshold} seconds")
if [[ -z "${match}" ]]; then
echo "Error"
else
echo "Clear"
fi
sleep 1 # pause between checks to avoid a busy loop
done
This looks to me like a watchdog timer. I've implemented something like this by forcing a background process to update my log, so I don't have to worry about read -t. Here's a working example:
#!/usr/bin/env bash
threshold=10
grain=2
errorstate=0
while sleep "$grain"; do
date '+[%F %T] watchdog timer' >> log
done &
trap "kill -HUP $!" 0 HUP INT QUIT TRAP ABRT TERM
printf -v lastseen '%(%s)T'
tail -F log | while read line; do
printf -v now '%(%s)T'
if (( now - lastseen > threshold )); then
echo "ERROR"
errorstate=1
else
if (( errorstate )); then
echo "Recovered, yay"
errorstate=0
fi
fi
if [[ $line =~ .*PATTERN.* ]]; then
lastseen=$now
fi
done
Run this in one window, wait $threshold seconds for it to trigger, then in another window echo PATTERN >> log to see the recovery.
While this can be made as granular as you like (I've set it to 2 seconds in the example), it does pollute your log file.
Oh, and note that the printf '%(%s)T' format requires bash version 4.2 or above.

bash sh script nohup executing not complete?

I have a problem; please look at this code (the j_restart.sh file):
#!/bin/bash
printf "Killing j-Chat server script... "
nyret=`pkill -f index.php`
printf "OK !\n"
printf "Wait killing instances."
while : ; do
nyret=`netstat -ap | grep :8008 | wc -l`
if [ "$nyret" == "0" ]; then
printf "OK !\n"
break
fi
printf "."
sleep 3
done
echo "Runing j-Chat server script... "
nyret=`nohup php -q /home/jChat/public_html/index.php < /dev/null &`
echo "OK !"
echo "j-Chat Server Working ON !";
SSH output:
root@server [~]# sh /home/jChat/public_html/j_restart.sh
Killing jChat Server Script... OK !
Wait killing instances................ OK !
Runing jChat Server Script...
nohup: redirecting stderr to stdout
(and it waits here, without moving on to the next line)
I press Ctrl+C manually:
^C
root@server [~]#
How can I fix this problem? Why doesn't the script run to completion? It stops and waits at line 16. How can I make it continue to lines 17 and 18? Please help.
Here's a simpler example reproducing your problem:
nyret=`nohup sleep 30 < /dev/null &`
echo "This doesn't run (until sleep exits)"
The problem is the shell is waiting to capture all output from your command. It runs in the background, but it still keeps the pipe open, so the shell waits.
The solution is to not capture the output, because you don't use it anyways:
nohup sleep 30 < /dev/null &
echo "This runs fine"
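The effect can be timed directly. In this sketch (variable names are made up, and sleep 2 stands in for the server process) the first command substitution blocks until sleep exits, because sleep inherits the write end of the capture pipe. The second returns almost at once, because the background command's output is sent elsewhere.

```shell
#!/usr/bin/env bash
# Capturing output of a backgrounded command: the shell reads the pipe
# until every writer has closed it, so it waits for sleep to finish.
start=$SECONDS
out=$(sleep 2 &)
blocked=$(( SECONDS - start ))

# Same command with stdout and stderr redirected: nothing holds the
# capture pipe open, so the substitution returns without waiting.
start=$SECONDS
out=$(sleep 2 > /dev/null 2>&1 &)
released=$(( SECONDS - start ))

echo "captured: ${blocked}s, redirected: ${released}s"
```

So if the backticks must stay for some reason, redirecting the server's output to a log file or /dev/null inside them has the same unblocking effect as dropping the capture.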

Conditional variables in bash script?

I'm not used to writing code in bash, but I'm teaching myself. I'm trying to create a script that will query info from the process list. I've done that, but I want to take it further and make it so:
The script runs with one set of commands if A OS is present.
The script runs with a different set of commands if B OS is present.
Here's what I have so far. It works on my CentOS distro but won't work on my Ubuntu one. Any help is greatly appreciated.
#!/bin/bash
pid=$(ps -eo pmem,pid | sort -nr -k 1 | cut -d " " -f 2 | head -1)
howmany=$(lsof -l -n -p $pid | wc -l)
nameofprocess=$(ps -eo pmem,fname | sort -nr -k 1 | cut -d " " -f 2 | head -1)
percent=$(ps -eo pmem,pid,fname | sort -k 1 -nr | head -1 | cut -d " " -f 1)
lsof -l -n -p $pid > ~/`date "+%Y-%m-%d-%H%M"`.process.log 2>&1
echo " "
echo "$nameofprocess has $howmany files open, and is using $percent"%" of memory."
echo "-----------------------------------"
echo "A log has been created in your home directory"
echo "-----------------------------------"
echo " "
echo ""$USER", do you want to terminate? (y/n)"
read yn
case $yn in
[yY] | [yY][Ee][Ss] )
kill -15 $pid
;;
[nN] | [n|N][O|o] )
echo "Not killing. Powering down."
echo "......."
sleep 2
;;
*) echo "Does not compute"
;;
esac
Here's my version of your script. It works with Ubuntu and Debian. It's probably safer than yours in some regards (I clearly had a bug in yours when a process takes more than 10% of memory, due to your awkward cut). Moreover, your ps are not "atomic", so things can change between different calls of ps.
#!/bin/bash
read percent pid nameofprocess < <(ps -eo pmem,pid,fname --sort=-pmem h)
mapfile -t openfiles < <(lsof -l -n -p $pid)
howmany=${#openfiles[@]}
printf '%s\n' "${openfiles[@]}" > ~/$(date "+%Y-%m-%d-%H%M.process.log")
cat <<EOF
$nameofprocess has $howmany files open, and is using $percent% of memory.
-----------------------------------
A log has been created in your home directory
-----------------------------------
EOF
read -p "$USER, do you want to terminate? (y/n) "
case $REPLY in
[yY] | [yY][Ee][Ss] )
kill -15 $pid
;;
[nN] | [n|N][O|o] )
echo "Not killing. Powering down."
echo "......."
sleep 2
;;
*) echo "Does not compute"
;;
esac
First, check that your version of ps has the --sort flag and the h option:
--sort=-pmem tells ps to sort wrt decreasing pmem
h tells ps to not show any header
All this is given to the read bash builtin, which reads space-separated fields, here the fields pmem, pid, fname and puts these values in the corresponding variables percent, pid and nameofprocess.
The mapfile command reads standard input (here the output of the lsof command) and puts each line in an array element. The size of this array is computed by the line howmany=${#openfiles[@]}. The output of lsof, as stored in the array openfiles, is written to the corresponding file.
Then, instead of the many echos, we use a cat <<EOF, and then read is used with the -p (prompt) option.
I don't know if this really answers your question, but at least you have a well-written bash script with fewer useless command calls (up to your case statement, you called 16 processes; I only called 4). Moreover, after the first ps call, things can change in your script (even though it's very unlikely to happen), but not in mine.
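The two builtins can be tried in isolation. This is a small sketch with made-up sample data standing in for the ps and lsof output; mapfile needs bash 4 or later.

```shell
#!/usr/bin/env bash
# read splits one input line into whitespace-separated fields;
# leading blanks are dropped.
read -r percent pid name <<< " 12.5  4321 java"
echo "percent=$percent pid=$pid name=$name"
# prints: percent=12.5 pid=4321 name=java

# mapfile -t stores each input line as one array element, without the newline.
mapfile -t lines < <(printf '%s\n' "first line" "second line" "third line")
echo "count=${#lines[@]}"
# prints: count=3
echo "second=${lines[1]}"
# prints: second=second line
```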
You might also like the following which doesn't put the output of lsof in an array, but uses an extra wc command:
#!/bin/bash
read percent pid nameofprocess < <(ps -eo pmem,pid,fname --sort=-pmem h)
logfilename="$HOME/$(date "+%Y-%m-%d-%H%M").process.log"
lsof -l -n -p $pid > "$logfilename"
howmany=$(wc -l < "$logfilename")
cat <<EOF
$nameofprocess has $howmany files open, and is using $percent% of memory.
-----------------------------------
A log has been created in your home directory ($logfilename)
-----------------------------------
EOF
read -p "$USER, do you want to terminate? (y/n) "
case $REPLY in
[yY] | [yY][Ee][Ss] )
kill -15 $pid
;;
[nN] | [n|N][O|o] )
echo "Not killing. Powering down."
echo "......."
sleep 2
;;
*) echo "Does not compute"
;;
esac
You could achieve this for example by (update)
#!/bin/bash
# place distribution independent code here
# dist=$(lsb_release -is)
if [[ -f /etc/redhat-release ]];
then # this is a Red Hat based distribution like centos, fedora, ...
dist="redhat"
elif [[ -f /etc/issue.net ]];
then
# dist=$(cat /etc/issue.net | cut -d' ' -f1) # debian, ubuntu, ...
dist="ubuntu"
else
dist="unknown"
fi
if [[ $dist == "ubuntu" ]];
then
# use your ubuntu command set
: # no-op placeholder; bash does not allow an empty branch
elif [[ $dist == "redhat" ]];
then
# use your centos command set
:
else
# do some magic here
:
fi
# place distribution independent code here
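On distributions that ship /etc/os-release, the ID field gives a more direct answer than probing for release files. This is a sketch under that assumption (older releases may not have the file), and detect_distro is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# detect_distro [FILE]   (hypothetical helper)
# Prints the ID field of an os-release style file, or "unknown".
detect_distro() {
    local release_file=${1:-/etc/os-release}
    if [[ -r $release_file ]]; then
        # Source the file in a subshell so its variables do not leak out.
        ( . "$release_file" && printf '%s\n' "${ID:-unknown}" )
    else
        echo "unknown"
    fi
}

case $(detect_distro) in
    ubuntu|debian)      echo "using the Debian/Ubuntu command set" ;;
    centos|rhel|fedora) echo "using the Red Hat command set" ;;
    *)                  echo "unknown distribution" ;;
esac
```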

Determining if process is running using pgrep

I have a script that I only want to be running one time. If the script gets called a second time I'm having it check to see if a lockfile exists. If the lockfile exists then I want to see if the process is actually running.
I've been messing around with pgrep but am not getting the expected results:
#!/bin/bash
COUNT=$(pgrep $(basename $0) | wc -l)
PSTREE=$(pgrep $(basename $0) ; pstree -p $$)
echo "###"
echo $COUNT
echo $PSTREE
echo "###"
echo "$(basename $0) :" `pgrep -d, $(basename $0)`
echo sleeping.....
sleep 10
The results I'm getting are:
$ ./test.sh
###
2
2581 2587 test.sh(2581)---test.sh(2587)---pstree(2591)
###
test.sh : 2581
sleeping.....
I don't understand why I'm getting a "2" when only one process is actually running.
Any ideas? I'm sure it's the way I'm calling it. I've tried a number of different combinations and can't quite seem to figure it out.
SOLUTION:
What I ended up doing was this (portion of my script):
function check_lockfile {
# Check for previous lockfiles
if [ -e $LOCKFILE ]
then
echo "Lockfile $LOCKFILE already exists. Checking to see if process is actually running...." >> $LOGFILE 2>&1
# is it running?
if [ $(ps -elf | grep $(cat $LOCKFILE) | grep $(basename $0) | wc -l) -gt 0 ]
then
abort "ERROR! - Process is already running at PID: $(cat $LOCKFILE). Exiting..."
else
echo "Process is not running. Removing $LOCKFILE" >> $LOGFILE 2>&1
rm -f $LOCKFILE
fi
else
echo "Lockfile $LOCKFILE does not exist." >> $LOGFILE 2>&1
fi
}
function create_lockfile {
# Check for previous lockfile
check_lockfile
#Create lockfile with the contents of the PID
echo "Creating lockfile with PID:" $$ >> $LOGFILE 2>&1
echo -n $$ > $LOCKFILE
echo "" >> $LOGFILE 2>&1
}
# Acquire lock file
create_lockfile >> $LOGFILE 2>&1 \
|| echo "ERROR! - Failed to acquire lock!"
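The argument for pgrep is an extended regular expression pattern.
In your case the command pgrep $(basename $0) evaluates to pgrep test.sh, which matches any process whose name contains test followed by any single character and then sh. So it will match btest8sh, atest_shell, etc. The count of 2 itself comes from the command substitution: $(...) forks a subshell that is also named test.sh, which is the second test.sh in your pstree output.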
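The looseness of the unescaped dot can be checked without starting any processes. This sketch uses bash's own =~ operator, which also takes an unanchored extended regular expression, as a stand-in for pgrep's matching; the helper name matches and the sample names are only for illustration.

```shell
#!/usr/bin/env bash
# matches PATTERN NAME: succeeds if NAME matches the extended regex PATTERN.
matches() { [[ $2 =~ $1 ]]; }

for name in test.sh btest8sh atest_shell tests; do
    if matches 'test.sh' "$name"; then loose=yes; else loose=no; fi
    if matches '^test\.sh$' "$name"; then strict=yes; else strict=no; fi
    printf '%-12s unescaped=%s anchored=%s\n' "$name" "$loose" "$strict"
done
# The unescaped pattern matches the first three names; the escaped and
# anchored pattern matches only test.sh.
```

pgrep also has an -x option that requires the whole process name to match the pattern, which removes the substring matches.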
You should create a lock file. If the lock file exists program should exit.
lock=$(basename $0).lock
if [ -e $lock ]
then
echo Process is already running with PID=`cat $lock`
exit
else
echo $$ > $lock
fi
You are already opening a lock file. Use it to make your life easier.
Write the process id to the lock file. When you see the lock file exists, read it to see what process id it is supposedly locking, and check to see if that process is still running.
Then in version 2, you can also write program name, program arguments, program start time, etc. to guard against the case where a new process starts with the same process id.
Put this near the top of your script...
pid=$$
script=$(basename $0)
guard="/tmp/$script-$(id -nu).pid"
if test -f $guard ; then
echo >&2 "ERROR: Script already runs... own PID=$pid"
ps auxw | grep $script | grep -v grep >&2
exit 1
fi
trap "rm -f $guard" EXIT
echo $pid >$guard
And yes, there IS a small window for a race condition between the test and echo commands, which can be fixed by appending to the guard file, and then checking that the first line is indeed our own PID. Also, the diagnostic output in the if can be commented out in a production version.
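Another way to close that window, different from the append-and-check approach just described, is to let the shell create the file atomically with noclobber. This is a sketch only: acquire_lock is a hypothetical helper name, kill -0 reports failure for a live process owned by another user, and removing a stale lock still leaves a small race between two late arrivals.

```shell
#!/usr/bin/env bash
# acquire_lock FILE   (hypothetical helper)
# Returns 0 if this script now owns FILE, 1 if a live process already does.
acquire_lock() {
    local lock=$1 owner
    # With noclobber set, ">" fails if the file already exists, so the
    # existence test and the creation happen as a single step.
    if ( set -o noclobber; echo "$$" > "$lock" ) 2>/dev/null; then
        return 0
    fi
    owner=$(cat "$lock" 2>/dev/null)
    # kill -0 sends no signal; it only checks that the PID can be signalled.
    if [[ $owner =~ ^[0-9]+$ ]] && kill -0 "$owner" 2>/dev/null; then
        return 1
    fi
    # Stale lock: the recorded process is gone. Remove it and retry once.
    rm -f "$lock"
    ( set -o noclobber; echo "$$" > "$lock" ) 2>/dev/null
}
```

Near the top of a script it might be used as acquire_lock "$guard" || { echo "already running" >&2; exit 1; } followed by the same trap "rm -f $guard" EXIT as above.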
