Bash function hangs once conditions are met

All,
I am trying to run a bash script that kicks off several sub processes. The processes redirect to their own log files and I must kick them off in parallel. To do this i have written a check_procs procedure, that monitors for the number of processes using the same parent PID. Once the number reaches 1 again, the script should continue. However, it seems to just hang. I am not sure why, but the code is below:
check_procs() {
    while true; do
        mypid=$$
        backup_procs=`ps -eo ppid | grep -w $mypid | wc -w`
        until [ $backup_procs == 1 ]; do
            echo $backup_procs
            sleep 5
            backup_procs=`ps -eo ppid | grep -w $mypid | wc -w`
        done
    done
}
This function is called after the processes are kicked off. I can see it echoing the number of processes, but then the echoing stops (suggesting the inner loop has finished because the process count is back to 1), yet nothing further happens: the script remains in the server's process list and I have to kill it manually. The part where the function is called is below:
for ((i=1; i <= $threads; i++)); do
    <Some trickery here to generate $cmdfile and $logfile>
    nohup rman target / cmdfile=$cmdfile log=$logfile &
    x=$(($x+1))
done
check_procs
$threads is a command-line parameter passed to the script, a small number like 4 or 6. The jobs are started with nohup, as shown. When the until condition in check_procs is satisfied, everything hangs instead of executing the remainder of the script. What's wrong with my function?

Maybe I'm mistaken, but isn't that expected? Your outer while true loop has no exit point. Once the inner until loop finishes, control falls back to the top of the outer loop, which finds the process count still at 1, skips the inner loop, and spins forever, with no delay between iterations, which is not recommended either.
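A minimal sketch of the fix, assuming the outer while true was only meant to wrap the polling: drop it, so the function returns once the count is back down to 1.

check_procs() {
    mypid=$$
    backup_procs=`ps -eo ppid | grep -w $mypid | wc -w`
    until [ $backup_procs == 1 ]; do
        echo $backup_procs
        sleep 5
        backup_procs=`ps -eo ppid | grep -w $mypid | wc -w`
    done
}

Alternatively, since the rman jobs are started by the same shell, the builtin wait would block until they all finish without any polling.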

Related

While loop that exits when the condition is met

I need to write a while loop in a bash script that exits when a process has ended successfully. What I have tried so far is:
VAR=`ps -ef |grep -i tail* |grep -v grep |wc -l`
while true; do
{
    if [ $VAR = 0 ]
    then
        echo "Sending MAils ...."
        exit
    fi
}
done
Use break instead of exit to continue the execution of your script.
Also, there is no need for the { ... } braces.
Your script has numerous errors. Probably try https://shellcheck.net/ before asking for human assistance.
You need to update the value of the variable inside the loop.
You seem to be reinventing pgrep, poorly.
(The regular expression tail* matches tai, tail, taill, tailll, and so on. What do you actually expect it to match?)
To break out of a loop and continue outside, use break.
The braces around your loop are superfluous. This is shell script, not C or Perl.
You are probably looking for something like
while true; do
    if ! pgrep tail; then
        echo "Sending mails ...."
        break
    fi
done
This avoids the use of a variable entirely; if you do need a variable, don't use upper case for your private variables.
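If you do want the variable-based version, a minimal sketch with the count re-read inside the loop and a lower-case name, quoting the pattern so the shell cannot glob-expand tail* (the 5-second interval is an arbitrary choice):

var=`ps -ef | grep -i 'tail' | grep -v grep | wc -l`
while [ "$var" -ne 0 ]; do
    sleep 5
    var=`ps -ef | grep -i 'tail' | grep -v grep | wc -l`
done
echo "Sending mails ...."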
Based on information in comments, if you have a number of processes like
tail -2000f /var/log/log.{1..10}
and no way to check their PIDs any longer, you might want to use fuser to tell you when none of them are running any longer:
while true; do
    fuser /var/log/log.{1..10} || break
    sleep 60
done
echo All processes are gone now.
Unfortunately, fuser does not reliably set its exit code everywhere; test on the command line. (Run tail -f $HOME/.bash_profile in one window and fuser $HOME/.bash_profile && echo yes in another; then quit the tail and run the fuser command again. If it still prints yes, you need something more.)
On MacOS, I found that fuser -u will print parentheses when the files are still open, and not when not:
while true; do
    fuser -u /var/log/log.{1..10} 2>&1 | grep -q '[()]' || break
    sleep 60
done
On Debian, fuser is in the package psmisc, and does set its exit code properly. You will probably also want to use the -s option to make it run quietly.
Notice also the addition of a sleep to avoid checking hundreds or thousands of times per second. You would probably want to make the same change to the original solution. How long to wait between iterations depends on how urgently you need the notification, and how heavy the operation to check the condition is. Even sleep 0.1 would be a significant improvement over the sleepless spin lock.
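Applied to the pgrep loop above, that change might look like this (with pgrep's PID output sent to /dev/null, since only its exit code matters; the 5-second interval is an arbitrary choice):

while true; do
    if ! pgrep tail >/dev/null; then
        echo "Sending mails ...."
        break
    fi
    sleep 5
done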

lazy (non-buffered) processing of shell pipeline

I'm trying to figure out how to perform the laziest possible processing of a standard UNIX shell pipeline. For example, let's say I have a command which does some calculations and outputting along the way, but the calculations get more and more expensive so that the first few lines of output arrive quickly but then subsequent lines get slower. If I'm only interested in the first few lines then I want to obtain those via lazy evaluation, terminating the calculations ASAP before they get too expensive.
This can be achieved with a straight-forward shell pipeline, e.g.:
./expensive | head -n 2
However this does not work optimally. Let's simulate the calculations with a script which gets exponentially slower:
#!/bin/sh
i=1
while true; do
    echo line $i
    sleep $(( i ** 4 ))
    i=$(( i+1 ))
done
Now when I pipe this script through head -n 2, I observe the following:
line 1 is output.
After sleeping one second, line 2 is output.
Despite head -n 2 having already received two (\n-terminated) lines and exiting, expensive carries on running and now waits a further 16 seconds (2 ** 4) before completing, at which point the pipeline also completes.
Obviously this is not as lazy as desired, because ideally expensive would terminate as soon as the head process receives two lines. However, this does not happen; IIUC it actually terminates after trying to write its third line, because at that point it writes to its STDOUT, which is connected through a pipe to the STDIN of the head process, and head has already exited and is no longer reading from the pipe. This causes expensive to receive a SIGPIPE, which causes the bash interpreter running the script to invoke its SIGPIPE handler, which by default terminates the script (although this can be changed via the trap command).
So the question is, how can I make it so that expensive quits immediately when head quits, not just when expensive tries to write its third line to a pipe which no longer has a listener at the other end? Since the pipeline is constructed and managed by the interactive shell process I typed the ./expensive | head -n 2 command into, presumably that interactive shell is the place where any solution for this problem would lie, rather than in any modification of expensive or head? Is there any native trick or extra utility which can construct pipelines with the behaviour I want? Or maybe it's simply impossible to achieve what I want in bash or zsh, and the only way would be to write my own pipeline manager (e.g. in Ruby or Python) which spots when the reader terminates and immediately terminates the writer?
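To illustrate the trap remark above, a minimal sketch of an expensive that ignores SIGPIPE and checks each write itself; note this changes how the script dies, not when, since the failing write still only happens after the sleep:

#!/bin/sh
trap '' PIPE    # ignore SIGPIPE: a write to the closed pipe now fails with EPIPE
i=1
while true; do
    echo line $i || { echo >&2 "reader gone, stopping"; exit 1; }
    sleep $(( i ** 4 ))
    i=$(( i+1 ))
done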
If all you care about is foreground control, you can run expensive in a process substitution. It still blocks until it next tries to write, but head exits immediately (and your script's flow control can continue) after it has received its input:
head -n 2 < <(exec ./expensive)
# expensive still runs 16 seconds in the background, but doesn't block your program
In bash 4.4, these store their PIDs in $! and allow process management in the same manner as other background processes.
# REQUIRES BASH 4.4 OR NEWER
exec {expensive_fd}< <(exec ./expensive); expensive_pid=$!
head -n 2 <&"$expensive_fd" # read the content we want
exec {expensive_fd}<&- # close the descriptor
kill "$expensive_pid" # and kill the process
Another approach is a coprocess, which has the advantage of only requiring bash 4.0:
# magic: store stdin and stdout FDs in an array named "expensive", and PID in expensive_PID
coproc expensive { exec ./expensive; }
# read two lines from input FD...
head -n 2 <&"${expensive[0]}"
# ...and kill the process.
kill "$expensive_PID"
I'll answer with a POSIX shell in mind.
What you can do is use a fifo instead of a pipe and kill the first link the moment the second finishes.
If the expensive process is a leaf process or if it takes care of killing its children, you can use a simple kill. If it's a process-spawning shell script, you should run it in a process group (doable with set -m) and kill it with a process-group kill.
Example code:
#!/bin/sh -e
expensive()
{
    i=1
    while true; do
        echo line $i
        sleep 0.$i #sped it up a little
        echo >&2 slept
        i=$(( i+1 ))
    done
}
echo >&2 NORMAL
expensive | head -n2
#line 1
#slept
#line 2
#slept
echo >&2 SPED-UP
mkfifo pipe
exec 3<>pipe
rm pipe
set -m; expensive >&3 & set +m
<&3 head -n 2
kill -- -$!
#line 1
#slept
#line 2
If you run this, the second run should not print the second slept line, meaning the first link was killed the moment head finished, rather than when it next tried to write after head had finished.

BASH - After 'wait', why does 'jobs -p' sometimes show 'Done' for a background process?

The short version: My bash script has a function.
This function then launches several instances (a maximum of 10) of another function in the background (with &).
I keep a count of how many are still active with jobs -p | wc -w in a do loop. When I'm done with the loop, I break.
I then use wait to ensure that all those processes terminate before continuing.
However, when I check the count (with jobs -p) I sometimes find this:
[10] 9311 Done my_background_function_name $param
How can I get wait to only proceed when all the launched child-processes have completely terminated and the jobs list is empty?
Why are jobs sometimes shown with "Done" and sometimes not?
Clearly, my knowledge of how jobs works is deficient. :)
Thanks.
Inside a bash script, it seems that even when all jobs have ended, jobs -p still lists the last one that finished.
This works for me in bash:
while true; do
    sleep 5
    jobs_running=($(jobs -l | grep Running | awk '{print $2}'))
    if [ ${#jobs_running[@]} -eq 0 ]; then
        break
    fi
    echo "Jobs running: ${jobs_running[@]}"
done
Using the "wait" command you cannot tell when each process ends.
With the previous algorithm you can.
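As a side note, bash 4.3 and newer also offer wait -n, which blocks until any single job finishes, so you can react to each completion without polling. A minimal sketch, with hypothetical sleep jobs standing in for the background functions:

for t in 2 4 6; do sleep $t & done
for n in 1 2 3; do
    wait -n    # returns as soon as any one background job exits
    echo "job $n of 3 finished"
done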

unix wait command for polling

Where I work, I have seen the below snippet in shell scripts to check for the completion of background jobs:
until [[ `ps -ef | grep backgroundjob | grep -v grep | wc -l` -eq 0 ]]; do
    sleep 30
done
Having read the man page of the wait command, I know these 3 lines can be replaced by wait in a shorter and more readable way. My questions are:
1. Are there any disadvantages, or scenarios where the wait command might not work as well as the snippet above?
2. How is the wait command implemented? It seems to return almost immediately, so is it a tight loop? If it is a tight loop, would the above snippet, which sleeps for 30 seconds, go easier on the CPU than wait?
wait only works for child processes of the current shell. This means that, if your process forked to background itself, the original child process will have exited, and the shell won't be able to wait on it.
Some shells' wait builtin will return immediately if there is no such child process; others, like bash, will warn you:
$ wait 1234
bash: wait: pid 1234 is not a child of this shell
wait doesn't impose any CPU load because it uses the waitpid(2) system call, which pauses the process until the nominated process has exited.
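To illustrate, a minimal sketch of the replacement, assuming the background jobs are started by the same shell that waits for them (the arguments are placeholders):

backgroundjob first &
backgroundjob second &
wait    # blocks in waitpid(2) until all children exit; no polling, no CPU spin
echo "all background jobs finished"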

Exit from BASH Infinite loop in a pipeline

I encountered somewhat strange behavior of bash infinite loops whose output is piped to other processes. Namely, I ran these two commands:
(while true; do echo xxx; done) | head -n 1
(while true; do date; done) | head -n 1
The first one exits instantly while the second one does not (and I assume it would run forever without being killed). I also tried an implicit infinite loop:
yes | head -n 1
and it also exits by itself. An appropriate line of output is immediately printed to the screen in each case. I am just curious what determines whether such a command will finish.
When head exits, the standard output of the parenthesized expression is closed. If an external command like date is used, the loop hangs; if a bash builtin like echo is used, the loop exits. For proof, use
(while true; do /bin/echo xxx; done) | head -n 1
and it will hang. If you use
(while true; do date; echo $? 1>&2; sleep 1; done) | head -n 1
you will see that from the second iteration on, the date command returns a non-zero exit code. Bash obviously does not take this as seriously as when a builtin command runs into the same problem. I wonder whether this is intended or rather a bug in bash.
To make sure the loop is exited, this seems to work:
(set -e; while true; do date ; done) | head -n 1
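Equivalently, since the experiment above shows date returning a non-zero status once head is gone, you can break out of the loop explicitly instead of relying on set -e:

(while true; do date || break; done) | head -n 1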
