Introduce timeout in a bash for-loop - bash

I have a task that works very well inside a bash for loop. The problem is that a few of the iterations never seem to terminate. What I'm looking for is a way to introduce a timeout, so that if the command in an iteration hasn't finished after, e.g., two hours, it is terminated and the loop moves on to the next iteration.
Rough outline:
for somecondition; do
while time-run(command) < 2h do
continue command
done
done

One (tedious) way is to start the process in the background, then start another background process that attempts to kill the first one after a fixed timeout.
timeout=7200 # two hours, in seconds
for somecondition; do
command & command_pid=$!
( sleep $timeout & wait; kill $command_pid 2>/dev/null) & sleep_pid=$!
wait $command_pid
kill $sleep_pid 2>/dev/null # If command completes prior to the timeout
done
The wait command blocks until the original command completes, whether naturally or because it was killed after the sleep completes. The wait immediately after sleep is used in case the user tries to interrupt the process: bash defers handling of trapped signals while it waits for a foreground command such as sleep, whereas the wait builtin returns as soon as a signal arrives.
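If GNU coreutils is available, its timeout utility can do the same job with less bookkeeping. Here is a minimal sketch, assuming sleep as a stand-in for the real command and a one-second limit so the demo finishes quickly (for the question's case the limit would be 2h or 7200):

```shell
# Sketch: bound each iteration with coreutils `timeout`.
# `sleep` stands in for the real command; the 1-second limit is for demonstration only.
limit=1
results=""
for duration in 0 3; do
    if timeout "$limit" sleep "$duration"; then
        results="$results ok"
    else
        status=$?
        if [ "$status" -eq 124 ]; then
            # 124 is the status timeout reports when it had to kill the command
            results="$results timed-out"
        else
            results="$results failed($status)"
        fi
    fi
done
echo "results:$results"
```

The loop carries on to the next iteration either way; checking for status 124 just lets you log which iterations were cut short.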

If I'm understanding your requirement properly, you have a process that needs to run, but you want to make sure that if it gets stuck it moves on, right? I don't know if this will fully help you out, but here is something I wrote a while back to do something similar (I've since improved this a bit, but I only have access to a gist at present, I'll update with the better version later).
#!/bin/bash
######################################################
# Program: logGen.sh
# Date Created: 22 Aug 2012
# Description: parses logs in real time into daily error files
# Date Updated: N/A
# Developer: #DarrellFX
######################################################
#Prefix for pid file
pidPrefix="logGen"
#output directory
outDir="/opt/Redacted/logs/allerrors"
#Simple function to see if running on primary
checkPrime ()
{
if /sbin/ifconfig eth0:0|/bin/grep -wq inet;then isPrime=1;else isPrime=0;fi
}
#function to kill previous instances of this script
killScript ()
{
/usr/bin/find /var/run -name "${pidPrefix}.*.pid" |while read pidFile;do
if [[ "${pidFile}" != "/var/run/${pidPrefix}.${$}.pid" ]];then
/bin/kill -- -$(/bin/cat ${pidFile})
/bin/rm ${pidFile}
fi
done
}
#Check to see if primary
#If so, kill any previous instance and start log parsing
#If not, just kill leftover running processes
checkPrime
if [[ "${isPrime}" -eq 1 ]];then
echo "$$" > /var/run/${pidPrefix}.$$.pid
killScript
commands && commands && commands #Where the actual command to run goes.
else
killScript
exit 0
fi
I then set this script to run from cron every hour. Every time the script runs, it:
creates a lock file, named after a variable that describes the script, containing the PID of that instance of the script;
calls the function killScript, which uses the find command to find all lock files for that version of the script (this lets more than one of these scripts be set to run in cron at once, for different tasks). For each file it finds, it kills the processes named in that lock file and removes the lock file (it checks that it's not killing itself);
starts doing whatever it is I need to run without getting stuck (I've omitted that, as it's hideous bash string manipulation that I've since redone in Python).
If this doesn't get you squared let me know.
A few notes:
the checkPrime function is poorly done, and should either return a status, or just exit the script itself
there are better ways to create lock files and be safe about it, but this has worked for me thus far (famous last words)
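On the first note, a function can report its result through its exit status instead of setting a global variable. A small sketch of that pattern follows; has_inet and the sample strings are made up for illustration and stand in for the ifconfig pipeline used by checkPrime:

```shell
# Hypothetical helper: succeeds (status 0) if the text in $1 contains the word "inet".
# In the real script the text would come from `/sbin/ifconfig eth0:0`.
has_inet() {
    printf '%s\n' "$1" | grep -wq inet
}

# The caller tests the function directly, no isPrime variable needed.
if has_inet "eth0:0  inet 10.0.0.5  netmask 255.255.255.0"; then
    role=primary
else
    role=secondary
fi
echo "$role"
```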

Related

wait command not working on parent process [duplicate]

Context:
Users provide me with their custom scripts to run. These scripts can be of any sort, such as scripts that start multiple GUI programs or backend services. I have no control over how the scripts are written. These scripts can be of the blocking type, i.e. execution waits until all the child processes (programs that are run sequentially) exit
#example of blocking script
echo "START"
first_program
second_program
echo "DONE"
or non blocking type i.e. ones that fork child process in the background and exit something like
#example of non-blocking script
echo "START"
first_program &
second_program &
echo "DONE"
What am I trying to achieve?
User-provided scripts can be of either type, or a mix of both. My job is to run the script, wait until all the processes started by it have exited, and then shut down the node. If it's of the blocking type, the case is simple: get the PID of the script execution process and wait until ps -ef | grep PID has no more entries. Non-blocking scripts are the ones giving me trouble.
Is there a way I can get a list of the PIDs of all child processes spawned by the execution of a script? Any pointers or hints will be highly appreciated.
You can use wait to wait for all the background processes started by userscript to complete. Since wait only works on children of the current shell, you'll need to source their script instead of running it as a separate process.
( source userscript; wait )
Sourcing the script in an explicit subshell should simulate starting a new process closely enough. If not, you can also background the subshell, which forces a new process to be started, then wait for it to complete.
( source userscript; wait ) & wait
ps --ppid $PID will list all child processes of the process with $PID.
You can open a file descriptor that gets inherited by other processes, and then wait until it's no longer in use. This is a low overhead method that usually works fine, though it's possible for processes to work around it if they want:
foo=$(mktemp)
( flock -x 9; theirscript; ) 9> "$foo"
flock -x 0 < "$foo"
rm "$foo"
echo "The script and its subprocesses are done"
You can follow all invoked processes using ptrace, such as with strace. This is easier, but has some associated overhead and may not work when scripts invoke suid binaries:
strace -f -e none theirscript
You can use pgrep -P <parent_pid> to get a list of child processes. Example:
IFS=$'\n' read -ra CHILD_PROCS -d '' < <(exec pgrep -P "$1")
And to get the grand-children, simply do the same procedure on each child process.
Check out my blog Bash functions to list and kill or send signals to process trees.
You can use one of those functions to properly list all processes spawned under one process. Each has its own method or order of sending signals to the processes.
The only limitation is that the processes still have to be connected and not orphaned. If you could somehow find a way to group your processes, then that might be your solution.
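The "same procedure on each child" idea can be written as a short recursive function. This is only a sketch: descendants is a hypothetical helper name, it depends on pgrep -P from procps, and like the approaches above it only sees processes that are still attached to their parent. The sh -c line just creates a small process tree to demonstrate on:

```shell
# Sketch: print every descendant PID of the process given as $1.
# `descendants` is a hypothetical helper; it relies on `pgrep -P` (procps).
descendants() {
    for child in $(pgrep -P "$1"); do
        echo "$child"
        descendants "$child"    # recurse into grandchildren
    done
}

# Demo: a parent shell with two background children.
sh -c 'sleep 5 & sleep 5 & wait' &
parent=$!
sleep 1                         # give the children time to start
pids=$(descendants "$parent")
count=$(echo "$pids" | wc -w)
echo "process $parent has $count descendants"

# Clean up the demo processes.
kill $pids "$parent" 2>/dev/null || true
wait "$parent" 2>/dev/null || true
```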
To simply answer the question that was asked: you could store the process ID of each script you're calling in the same variable:
echo "START"
first_program &
child_process_ids+="$! "
second_program &
child_process_ids+="$! "
echo $child_process_ids
echo "DONE"
$child_process_ids would just be a space-delimited string of process IDs. This answers the question asked, but what I would do is a bit different: I would call each script from a for loop, store its process ID, then wait on each one in a second for loop and inspect each exit code individually. Using the same example, here's what it would look like.
echo "START"
scripts="first_program second_program"
for script in $scripts; do
#Call script and send to background
./$script &
#Store the script's processID that was just sent to the background
child_process_ids+="$! "
done
for child_process_id in $child_process_ids; do
#Pass each processId into the wait command to retrieve its exit
#code and store it in $rc
wait $child_process_id
rc=$?
#Inspect each processes exit code
if [ $rc -ne 0 ]; then
echo "$child_process_id failed with an exit code of $rc"
else
echo "$child_process_id was successful"
fi
done

pause for loop until grep pattern from command meets condition (bash script)

I want to send multiple jobs to a remote computer. Therefore I wrote a for loop which iterates over i jobs which consist of several subcommands. I need to pause the subsequent iteration until a certain subcommand is executed and the job actually runs on the remote computer.
So the idea is to check whether the string "PEND" appears in the output of a command on the remote computer. I want the for loop to continue when "PEND" changes to "RUN". I don't know whether the if statement is the right thing to use here. A fixed waiting time by using sleep wouldn't do the trick as the status change from PEND to RUN is highly irregular.
Additional information: The subcommands comprise compilation of an executable.
Erroneous pseudocode:
for i in {1..10}
do
subcommands
...
if [[ jobs | grep "PEND" == TRUE ]]; then sleep 1
fi
done
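One way to express this is an until loop that polls the status and sleeps briefly between checks, rather than a single if. This is a sketch only: job_status is a hypothetical stand-in that simulates the status query (it reports PEND on the first poll and RUN afterwards); in practice it would wrap whatever command prints the job state on the remote computer.

```shell
polls=0
# Hypothetical stand-in for the real status query.
# Sets $state to PEND on the first poll and to RUN afterwards.
job_status() {
    polls=$((polls + 1))
    if [ "$polls" -lt 2 ]; then state=PEND; else state=RUN; fi
}

for i in 1 2; do
    polls=0
    # the subcommands that submit job $i would go here
    job_status
    until [ "$state" != "PEND" ]; do
        sleep 1             # poll interval; adjust as needed
        job_status
    done
    echo "job $i is $state after $polls polls"
done
```

With a real status command that prints to stdout, the same idea is usually written as a while loop whose condition pipes that command into grep -q PEND.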

How to make bash interpreter stop until a command is finished?

I have a bash script with a loop that calls a heavy calculation routine in every iteration. I use the results from each calculation as input to the next. I need to make bash pause the script until each calculation is finished.
for i in $(cat calculation-list.txt)
do
./calculation
(other commands)
done
I know the sleep program, and I used to use it, but now the duration of the calculations varies greatly.
Thanks for any help you can give.
P.S.
"./calculation" is another program, and it opens a subprocess. The script then passes instantly to the next step, but I get an error in the calculation because the previous one has not finished yet.
If your calculation daemon will work with a precreated empty logfile, then the inotify-tools package might serve:
touch $logfile
inotifywait -qqe close $logfile & ipid=$!
./calculation
wait $ipid
This works if the daemon closes the file just once.
If it's doing an open/write/close loop, perhaps you can modify the daemon process to wrap some other filesystem event around the execution:
#!/bin/sh
# Uglier, but handles logfile being closed multiple times before exit:
# Have the ./calculation start this shell script, perhaps by substituting
# this for the program it's starting
trap 'echo >closed-on-calculation-exit' 0 1 2 3 15
./real-calculation-daemon-program
Well, guys, I've solved my problem with a different approach. When the calculation is finished a logfile is created. I wrote then a simple until loop with a sleep command. Although this is very ugly, it works for me and it's enough.
for i in $(cat calculation-list.txt)
do
(calculations routine)
until [[ -f $logfile ]]; do
sleep 60
done
(other commands)
done
Easy. Get the process ID (PID) via some awk magic and then use wait to wait for that PID to end. Here are the details on wait from the Advanced Bash-Scripting Guide:
Suspend script execution until all jobs running in background have
terminated, or until the job number or process ID specified as an
option terminates. Returns the exit status of waited-for command.
You may use the wait command to prevent a script from exiting before a
background job finishes executing (this would create a dreaded orphan
process).
And using it within your code should work like this:
for i in $(cat calculation-list.txt)
do
./calculation >/dev/null 2>&1 & CALCULATION_PID=(`jobs -l | awk '{print $2}'`);
wait ${CALCULATION_PID}
(other commands)
done
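The PID of the most recent background job is also available directly in $!, which avoids parsing the output of jobs. A minimal sketch, assuming sleep as a stand-in for ./calculation and a literal list in place of calculation-list.txt:

```shell
finished=0
for i in a b c; do
    sleep 0 &                   # stand-in for ./calculation
    calculation_pid=$!          # PID of the job that was just started
    wait "$calculation_pid"     # blocks until that job exits and returns its status
    status=$?
    echo "calculation $i exited with status $status"
    finished=$((finished + 1))
    # (other commands)
done
```

Note that this only helps if the backgrounded program itself stays alive until the work is done. If ./calculation hands off to a separate daemon and returns immediately, as the P.S. in the question suggests, the log-file approaches above are the ones that apply.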

How to switch a sequence of tasks to background?

I'm running two tests on a remote server, here is the command I used several hours ago:
% ./test1.sh; ./test2.sh
The two tests are supposed to run one after the other. If the second runs before the first completes, everything will be ruined, and I'll have to restart the whole procedure.
The dilemma is that these two tasks take many hours to complete, and I'd like to log out of the server and wait for the result, but I don't know how to switch both of them to the background. If I use Ctrl+Z, only the first task is suspended, while the second starts, doing nothing useful and wiping out the current data.
Is it possible to switch both of them to the background while preserving their order? I should have put these two tasks in the same process group, like (./test1.sh; ./test2.sh) &, but sadly the first test has already run for several hours, and it would be a pity to restart the tests.
One option is to kill the second test before it starts, but is there any mechanism to cope with this?
First rename the ./test2.sh to ./test3.sh. Then do [CTRL+Z], followed by bg and disown -h. Then save this script (test4.sh):
while :; do
sleep 5;
pgrep -f test1.sh &> /dev/null
if [ $? -ne 0 ]; then
nohup ./test3.sh &
break
fi
done
Then run nohup ./test4.sh & and you can log out.
First, screen or tmux are your friends here, if you don't already work with them (they make remote machine work an order of magnitude easier).
To use conditional consecutive execution you can write:
./test1.sh && ./test2.sh
which will only execute test2.sh if test1.sh returns with 0 (conventionally meaning: no error). Example:
$ true && echo "first command was successful"
first command was successful
$ ! true && echo "ain't gonna happen"
More on control operators: http://www.humbug.in/docs/the-linux-training-book/ch08s01.html
