How do I terminate all the subshell processes? - bash

I have a bash script to test how a server performs under load.
num=1
if [ $# -gt 0 ]; then
num=$1
fi
for i in {1 .. $num}; do
(while true; do
{ time curl --silent 'http://localhost'; } 2>&1 | grep real
done) &
done
wait
When I hit Ctrl-C, the main process exits, but the background loops keep running. How do I make them all exit? Or is there a better way of spawning a configurable number of logic loops executing in parallel?

Here's a simpler solution -- just add the following line at the top of your script:
trap "kill 0" SIGINT
Killing 0 sends the signal to all processes in the current process group.

One way to kill subshells, but not self:
kill $(jobs -p)
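A minimal sketch of that approach, assuming the script has no other background jobs it wants to keep. The `sleep 30` workers are placeholders for the real loop, and `set --` overwrites the positional parameters, so save them first if they are still needed:

```bash
# Start three stand-in workers; sleep 30 is a placeholder for the real loop.
for i in 1 2 3; do
    sleep 30 &
done

# jobs -p prints one PID per background job of this shell.
pids=$(jobs -p)
set -- $pids
started=$#

# Signal every job, then reap them so none lingers as a zombie.
kill $pids
wait

remaining=$(jobs -p)
echo "started=$started remaining=${remaining:-none}"
```

The parent shell itself is not signalled, which is the difference from kill 0.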

Bit of a late answer, but for me solutions like kill 0 or kill $(jobs -p) go too far (kill all child processes).
If you just want to make sure one specific child-process (and its own children) are tidied up then a better solution is to kill by process group (PGID) using the sub-process' PID, like so:
set -m
./some_child_script.sh &
some_pid=$!
kill -- -${some_pid}
Firstly, the set -m command will enable job management (if it isn't already), this is important, as otherwise all commands, sub-shells etc. will be assigned to the same process group as your parent script (unlike when you run the commands manually in a terminal), and kill will just give a "no such process" error. This needs to be called before you run the background command you wish to manage as a group (or just call it at script start if you have several).
Secondly, note that the argument to kill is negative; this indicates that you want to kill an entire process group. By default the process group ID is the same as the PID of the first command in the group, so we can get it by simply adding a minus sign in front of the PID we fetched with $!. If you need to get the process group ID in a more complex case, you will need to use ps -o pgid= ${some_pid}, then add the minus sign to that.
Lastly, note the use of the explicit end of options --, this is important, as otherwise the process group argument will be treated as an option (signal number), and kill will complain it doesn't have enough arguments. You only need this if the process group argument is the first one you wish to terminate.
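The explicit PGID lookup mentioned above can be sketched as follows. This assumes a ps that accepts -o pgid= and -p (procps and the BSD ps both do); the subshell with two sleeps is a hypothetical stand-in for some_child_script.sh:

```bash
set -m    # job control: each background job gets its own process group

# Hypothetical child that spawns children of its own.
( sleep 30 & sleep 30 & wait ) &
child_pid=$!

# Look up the group ID explicitly; tr strips the padding that ps adds.
pgid=$(ps -o pgid= -p "$child_pid" | tr -d ' ')

kill -- "-$pgid"               # negative argument: signal the whole group
wait "$child_pid" 2>/dev/null
status=$?                      # 128 + 15 when the child died from SIGTERM
set +m
echo "pgid=$pgid status=$status"
```

Here the looked-up PGID equals the PID from $!, because the backgrounded subshell is the group leader; the lookup only matters when the PID you hold is not the leader's.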
Here is a simplified example of a background timeout process, and how to cleanup as much as possible:
#!/bin/bash
# Use the overkill method in case we're terminated ourselves
trap 'kill $(jobs -p | xargs)' SIGINT SIGHUP SIGTERM EXIT
# Setup a simple timeout command (an echo)
set -m
{ sleep 3600; echo "Operation took longer than an hour"; } &
timeout_pid=$!
# Run our actual operation here
do_something
# Cancel our timeout
kill -- -${timeout_pid} >/dev/null 2>&1
wait -- -${timeout_pid} >/dev/null 2>&1
printf '' 2>&1
This should cleanly handle cancelling this simplistic timeout in all reasonable cases; the only case that can't be handled is the script being terminated immediately (kill -9), as it won't get a chance to cleanup.
I've also added a wait, followed by a no-op (printf ''), this is to suppress "terminated" messages that can be caused by the kill command, it's a bit of a hack, but is reliable enough in my experience.

You need to use job control, which, unfortunately, is a bit complicated. If these are the only background jobs that you expect will be running, you can run a command like this one:
jobs \
| perl -ne 'print "$1\n" if m/^\[(\d+)\][+-]? +Running/;' \
| while read -r ; do kill %"$REPLY" ; done
jobs prints a list of all active jobs (running jobs, plus recently finished or terminated jobs), in a format like this:
[1] Running sleep 10 &
[2] Running sleep 10 &
[3] Running sleep 10 &
[4] Running sleep 10 &
[5] Running sleep 10 &
[6] Running sleep 10 &
[7] Running sleep 10 &
[8] Running sleep 10 &
[9]- Running sleep 10 &
[10]+ Running sleep 10 &
(Those are jobs that I launched by running for i in {1..10} ; do sleep 10 & done.)
perl -ne ... is me using Perl to extract the job numbers of the running jobs; you can obviously use a different tool if you prefer. You may need to modify this script if your jobs has a different output format; but the above output is also on Cygwin, so it's very likely identical to yours.
read -r reads a "raw" line from standard input, and saves it into the variable $REPLY. kill %"$REPLY" will be something like kill %1, which "kills" (sends a termination signal, SIGTERM by default, to) job number 1. (Not to be confused with kill 1, which would kill process number 1.) Together, while read -r ; do kill %"$REPLY" ; done goes through each job number printed by the Perl script, and kills it.
By the way, your for i in {1 .. $num} won't do what you expect, since brace expansion is handled before parameter expansion, so what you have is equivalent to for i in "{1" .. "$num}". (And you can't have white-space inside the brace expansion, anyway.) Unfortunately, I don't know of a clean alternative; I think you have to do something like for i in $(bash -c "echo {1..$num}"), or else switch to an arithmetic for-loop or whatnot.
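A short sketch of the expansion-order problem and the arithmetic for-loop alternative; num=3 is just an example value:

```bash
num=3

# Brace expansion runs before $num is substituted and finds no valid range,
# so the braces survive and this prints the literal text {1..3}.
echo {1..$num}

# An arithmetic for-loop evaluates num at run time.
count=0
for ((i = 1; i <= num; i++)); do
    count=$((count + 1))
done
echo "iterations: $count"    # prints iterations: 3
```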
Also by the way, you don't need to wrap your while-loop in parentheses; & already causes the job to be run in a subshell.

Here's my eventual solution. I'm keeping track of the subshell process IDs using an array variable, and trapping the Ctrl-C signal to kill them.
declare -a subs #array of subshell pids
function kill_subs() {
for pid in "${subs[@]}"; do
kill $pid
done
exit 0
}
num=1
if [ $# -gt 0 ]; then
num=$1
fi
for ((i=0;i < $num; i++)); do
while true; do
{ time curl --silent 'http://localhost'; } 2>&1 | grep real
done &
subs[$i]=$! #grab the pid of the subshell
done
trap kill_subs 1 2 15
wait

While this is not an answer, I would like to point out something that undermines the selected one: using jobs or kill 0 might have unexpected results. In my case it killed unintended processes, which is not an option for me.
Some of the answers already hint at this, but I am afraid without enough emphasis:
"Bit of a late answer, but for me solutions like kill 0 or kill $(jobs -p) go too far (kill all child processes)."
"If these are the only background jobs that you expect will be running, you can run a command like this one:"

Related

Using sleep and wait -n to implement simple timeout in bash, race condition or not?

If I do this in a bash script:
sleep 10 &
sleep_pid=$!
some_command &
wait -n
cmd_pid=$!
if kill -0 $sleep_pid 2> /dev/null; then
# all ok
kill $sleep_pid
else
# some_command hung
...code to log diagnostics and then kill -9 $cmd_pid...
fi
where some_command is something that should be quick but can hang due to rare errors.
Is there then a risk that some_command can be done and cleaned up before "wait -n" starts, so there is only the sleep to wait for? Or does the '&' after one command guarantee that the shell won't call waitpid() on it until the next line of input has been handled?
It works in interactive shells. If you do:
sleep 10 &
sleep 0 &
wait -n
then the "wait -n" returns right away even if you wait a couple of seconds before running it. But I'm not sure if it can be trusted for non-interactive shells?
EDIT: Clarifying need for diagnostics + some grammar.
I believe you may be able to use the timeout command to do this.
http://man7.org/linux/man-pages/man1/timeout.1.html
timeout 10s command_to_run
You can check the exit status of the timeout command to know if it timed out.
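timeout 2s sleep 10
status=$?
# GNU timeout exits with 124 when the time limit was reached
if [[ $status -eq 124 ]]; then
echo "it timed out"
elif [[ $status -eq 0 ]]; then
echo "It was successful"
else
echo "it failed with status $status"
fi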
By using the $! variable, we avoid relying on interactive job control features. Try this:
...long executing command... &
pid_long=$!
sleep 3 &
pid_sleep=$!
wait -n
kill -KILL $pid_long
The problem here is PID recycling. Unlikely to happen in 3 seconds, though.
In the case when the command finishes earlier than the sleep (and its PID has not been recycled to a new process) kill produces an error message; we could pipe that to /dev/null.
We should probably also kill the sleep in case it is the one that is lingering.
As @CharlesDuffy pointed out in comments, the answer is no, there is no race (provided it is run in a non-interactive shell).
Also there is no need (in non-interactive shells) to make sure the wait comes directly after the command, as non-interactive shells don't do automatic reaping of children.
But I guess one should wrap this in a sub-shell, so "wait -n" won't return early due to some previously started unrelated background job.
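A sketch of the sub-shell wrapping, assuming bash 4.3 or later for wait -n. The helper name run_with_timeout and the exit status 124 (borrowed from the timeout command's convention) are assumptions, not part of the question:

```bash
# run_with_timeout SECONDS COMMAND [ARGS...]  (hypothetical helper name)
# Returns the command's status, or 124 if the limit was reached first.
run_with_timeout() {
    local limit=$1
    shift
    (
        # Inside the subshell these two are the only children wait -n can see.
        "$@" &
        cmd_pid=$!
        sleep "$limit" &
        sleep_pid=$!

        wait -n                  # returns when either child has exited
        status=$?

        if kill -0 "$cmd_pid" 2>/dev/null; then
            # The sleep finished first: the command is presumed hung.
            kill -KILL "$cmd_pid"
            wait "$cmd_pid" 2>/dev/null
            exit 124
        fi

        # The command finished first: drop the timer and pass its status on.
        kill "$sleep_pid" 2>/dev/null
        exit "$status"
    )
}
```

Usage would look like run_with_timeout 10 some_command. If both children exit at almost the same moment, the reported status may belong to the sleep rather than the command, so treat this as a starting point rather than a hardened implementation.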

limit spawned parallel processes and exit all upon failure of any

I'm running some tests in parallel by calling a process from a script. Each process prints only to stdout > a file, and exits 0 iff successful (otherwise -1).
If and when a process exits with -1, I print something to its (or a related) output file (namely, the arguments it was called with), kill all other processes, and exit.
I have written a script using trap "..." CHLD to run some code when a subprocess exits and this works under certain conditions, but I find my script is not very robust. If I send a keyboard interrupt, sometimes the subprocesses keep going, and sometimes the number of subprocesses simply overwhelms the machine(s) and none of them seem to advance.
I am using this on my quad core laptop as well as a cluster of 128 CPUs, over which subprocesses are distributed automatically. How do I run a large number of background subprocesses in a bash script, limited to some number of them running concurrently, and do something + exit if one of them returns with a bad code? I would also like the script to clean up after a keyboard interrupt. Should I use GNU Parallel, and if so, how?
Here is a MWE of my script so far, which spawns subprocesses unhindered, annotated with what I think each part means. I got the idea to use trap from shell - get exit code of background process
$ cat parallel_tests.sh
#!/bin/bash
# some help from https://stackoverflow.com/questions/1570262/shell-get-exit-code-of-background-process
handle_chld() {
#echo pids are ${pids[@]}
local tmp=() ###temporary storage for pids that haven't finished
#for each pid that hadn't finished since the last trap
for((i=0;i<${#pids[@]};++i)); do
#if this pid is still running
if [[ $(ps -p ${pids[i]} -o pid=) ]]
then
tmp+=(${pids[i]}) ### add pid to list of pids that are running
else
wait ${pids[i]} ### put the exit code of this pid into $?
if [ "$?" != "0" ] ### if the exit code $? is non-zero
then
#kill all remaining processes
for((j=0;j<${#pids[@]};++j))
do
if [[ $(ps -p ${pids[j]} -o pid=) ]]
then
echo killing child processes of ${pids[j]}
pkill -P ${pids[j]}
fi
done
cat _tmp${pids[i]}
#print things to the terminal here
echo "FAILED process ${pids[i]} args: `cat _tmpargs${pids[i]}`"
exit 1
else
echo "FINISHED process ${pids[i]} args: `cat _tmpargs${pids[i]}`"
fi
fi
done
#update list of running pids
pids=(${tmp[@]})
}
# set this to monitor SIGCHLD
set -o monitor
# call handle_chld() when SIGCHLD signal is triggered
trap "handle_chld" CHLD
ALL_ARGS="2 32 87" ### ad nauseam
for A in $ALL_ARGS; do
(sleep $A; false) > _tmp$! &
pids+=($!)
echo $A > _tmpargs${pids[${#pids[@]}-1]}
echo "STARTED process ${pids[${#pids[@]}-1]} args: `cat _tmpargs${pids[${#pids[@]}-1]}`"
done
echo "Every process started. Now waiting on PIDS:"
echo ${pids[@]}
wait ${pids[@]} ###wait until every process is finished (or exit in the trap)
The output of this version after 2+epsilon seconds is:
$ ./parallel_tests.sh
STARTED process 66369 args: 2
STARTED process 66374 args: 32
STARTED process 66381 args: 87
Every process started. Now waiting on PIDS:
66369 66374 66381
killing child processes of 66374
./parallel_tests.sh: line 43: 66376 Terminated: 15 sleep $A
killing child processes of 66381
./parallel_tests.sh: line 43: 66383 Terminated: 15 sleep $A
FAILED process 66369 args: 2
Essentially, pid 66369 fails first, and the other two processes are dealt with in the trap. I have simplified the construction of the test processes here, so we can't assume that I'll manually insert waits before spawning new ones. Additionally, some of the test processes can be nearly instant. Essentially, I have a whole mess of test processes, long and short, starting as soon as resources can be allotted.
I'm not sure what's causing the problems I mentioned above, as this script uses several features that are new to me. General pointers are welcomed!
(I have seen this question and it does not answer my question)
cat arguments | parallel --halt now,fail=1 my_prg
Alternatively:
parallel --halt now,fail=1 my_prg ::: $ALL_ARGS
GNU Parallel is designed so it will also kill remote jobs. It does that using process groups and heavy perl scripting on the remote server: https://www.gnu.org/software/parallel/parallel_design.html#The-remote-system-wrapper
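For comparison, a plain-bash sketch of the same idea: cap the number of concurrent jobs and abort on the first failure. This is not how GNU Parallel works internally. It assumes bash 4.3 or later for wait -n, and run_one and run_limited are hypothetical names; run_one stands in for the real test process and fails only for the argument "bad":

```bash
# Hypothetical stand-in for the real test process.
run_one() {
    sleep 0.2
    [ "$1" != "bad" ]
}

# run_limited MAX ARG...: run run_one for each ARG, at most MAX at a time.
run_limited() {
    local max=$1 running=0 arg
    local pids=()
    shift
    for arg in "$@"; do
        if [ "$running" -ge "$max" ]; then
            # Block until any one job exits; abort everything on failure.
            if ! wait -n; then
                kill "${pids[@]}" 2>/dev/null
                wait
                return 1
            fi
            running=$((running - 1))
        fi
        run_one "$arg" &
        pids+=("$!")
        running=$((running + 1))
    done
    # Drain the jobs that are still running.
    while [ "$running" -gt 0 ]; do
        if ! wait -n; then
            kill "${pids[@]}" 2>/dev/null
            wait
            return 1
        fi
        running=$((running - 1))
    done
    return 0
}
```

Killing the recorded PIDs only signals the backgrounded shells themselves; workers that spawn their own children would additionally need process groups, and none of this reaches remote machines the way GNU Parallel does.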

Checking and killing hanged background processes in a bash script

Say I have this pseudocode in bash
#!/bin/bash
things
for i in {1..3}
do
nohup someScript[i] &
done
wait
for i in {4..6}
do
nohup someScript[i] &
done
wait
otherThings
and say this someScript[i] sometimes ends up hanging.
Is there a way I can take the process IDs (with $!)
and check periodically if the process is taking more than a specified amount of time after which I want to kill the hanged processes with kill -9 ?
Unfortunately the answer from @Eugeniu did not work for me; timeout gave an error.
However, I found the following routine useful, so I'll post it here for anyone with the same problem.
Create another script which goes like this
#!/bin/bash
#monitor.sh
pid=$1
counter=10
while ps -p $pid > /dev/null
do
if [[ $counter -eq 0 ]] ; then
kill -9 $pid
#if it's still there then kill it
fi
counter=$((counter-1))
sleep 1
done
Then in the main script you just put
things
for i in {1..3}
do
nohup someScript[i] &
./monitor.sh $! &
done
wait
This way, each someScript gets a parallel monitor process that checks once per interval whether it is still running (up to the maximum time set by the counter), and the monitor exits on its own when the associated process dies (or gets killed).
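The same monitoring idea can be sketched as a function in the main script instead of a separate file. The name watchdog and its return values are assumptions; it uses kill -0 rather than ps -p and relies on bash reaping finished children promptly:

```bash
# watchdog PID LIMIT: poll once per second; after LIMIT seconds, kill -9 PID.
# Returns 0 if the process ended by itself, 1 if it had to be killed.
watchdog() {
    local pid=$1 limit=$2 elapsed=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            kill -9 "$pid"
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

It would be started in the background right after each job, for example watchdog "$!" 3600 &, so that the main script can keep launching work.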
One possible approach:
#!/bin/bash
# things
mypids=()
for i in {1..3}; do
# launch the script with timeout (3600s)
timeout 3600 nohup someScript[i] &
mypids[i]=$! # store the PID
done
wait "${mypids[@]}"

executing bash loop while command is running

I want to build a bash script that executes a command and in the meanwhile performs other stuff, with the possibility of killing the command if the script is killed. Say, executes a cp of a large file and in the meanwhile prints the elapsed time since copy started, but if the script is killed it kills also the copy.
I don't want to use rsync, for 2 reasons: 1) it is slow and 2) I want to learn how to do it, as it could be useful.
I tried this:
until cp SOURCE DEST
do
#evaluates time, stuff, commands, file dimensions, not important now
#and echoes something
done
but it doesn't execute the do - done block, because it waits for the copy to finish. Could you please suggest something?
until is the opposite of while. It's nothing to do with doing stuff while another command runs. For that you need to run your task in the background with &.
cp SOURCE DEST &
pid=$!
# If this script is killed, kill the `cp'.
trap "kill $pid 2> /dev/null" EXIT
# While copy is running...
while kill -0 $pid 2> /dev/null; do
# Do stuff
...
sleep 1
done
# Disable the trap on a normal exit.
trap - EXIT
kill -0 checks if a process is running. Note that it doesn't actually signal the process and kill it, as the name might suggest. Not with signal 0, at least.
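A small sketch of that check in isolation; is_running is a hypothetical helper name and sleep 30 stands in for the real background command:

```bash
# Signal 0 performs only the existence and permission check.
is_running() { kill -0 "$1" 2>/dev/null; }

sleep 30 &
pid=$!

if is_running "$pid"; then before=alive; else before=gone; fi

kill "$pid"                # now really terminate it (SIGTERM)
wait "$pid" 2>/dev/null    # reap it so the PID is released

if is_running "$pid"; then after=alive; else after=gone; fi
echo "before=$before after=$after"
```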
There are three steps involved in solving your problem:
Execute a command in the background, so it will keep running while your script does something else. You can do this by following the command with &. See the section on Job Control in the Bash Reference Manual for more details.
Keep track of that command's status, so you'll know if it is still running. You can do this with the special variable $!, which is set to the PID (process identifier) of the last command you ran in the background, or empty if no background command was started. Linux creates a directory /proc/$PID for every process that is running and deletes it when the process exits, so you can check for the existence of that directory to find out if the background command is still running. You can learn more than you ever wanted to know about /proc from the Linux Documentation Project's File System Hierarchy page or Advanced Bash-Scripting Guide.
Kill the background command if your script is killed. You can do this with the trap command, which is a bash builtin command.
Putting the pieces together:
# Look for the 4 common signals that indicate this script was killed.
# If the background command was started, kill it, too.
trap '[ -z $! ] || kill $!' SIGHUP SIGINT SIGQUIT SIGTERM
cp $SOURCE $DEST & # Copy the file in the background.
# The /proc directory exists while the command runs.
while [ -e /proc/$! ]; do
echo -n "." # Do something while the background command runs.
sleep 1 # Optional: slow the loop so we don't use up all the dots.
done
Note that we check the /proc directory to find out if the background command is still running, because kill -0 will generate an error if it's called when the process no longer exists.
Update to explain the use of trap:
The syntax is trap [arg] [sigspec …], where sigspec … is a list of signals to catch, and arg is a command to execute when any of those signals is raised. In this case, the command is a list:
'[ -z $! ] || kill $!'
This is a common bash idiom that takes advantage of the way || is processed. An expression of the form cmd1 || cmd2 will evaluate as successful if either cmd1 OR cmd2 succeeds. But bash is clever: if cmd1 succeeds, bash knows that the complete expression must also succeed, so it doesn't bother to evaluate cmd2. On the other hand, if cmd1 fails, the result of cmd2 determines the overall result of the expression. So an important feature of || is that it will execute cmd2 only if cmd1 fails. That means it's a shortcut for the (invalid) sequence:
if cmd1; then
# do nothing
else
cmd2
fi
With that in mind, we can see that
trap '[ -z $! ] || kill $!' SIGHUP SIGINT SIGQUIT SIGTERM
will test whether $! is empty (which means the background task was never executed). If that fails, which means the task was executed, it kills the task.
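The short-circuit behaviour is easy to check on its own. In this sketch the variable pid stands in for an empty $!, and log records which right-hand sides actually ran:

```bash
log=""
true  || log="${log}A"    # cmd1 succeeds, so the right-hand side is skipped
false || log="${log}B"    # cmd1 fails, so the right-hand side runs

pid=""                    # stands in for an empty $!
[ -z "$pid" ] || log="${log}C"    # the test succeeds: nothing to kill

echo "$log"               # prints B
```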
Here is the simplest way to do that using ps -p:
[command_1_to_execute] &
pid=$!
while ps -p $pid &>/dev/null; do
[command_2_to_be_executed meanwhile command_1 is running]
sleep 10
done
This runs command_2 every 10 seconds for as long as command_1 is still running in the background.
hope this will help you :)
What you want is to do two things at once in shell. The usual way to do that is with a job. You can start a background job by ending the command with an ampersand.
cp "$SOURCE" "$DEST" &
You can then use the jobs command to check its status.
Read more:
Gnu Bash Job Control

How can I run multiple bash scripts in unison?

I'm learning Bash for a Unix class, and I'm trying to figure out how to run a script, then run a second script while the first is running and have the two interact. To clarify, the scripts look like this:
#!/bin/bash
num = 1
trap exit 0 SIGINT SIGTERM
trap "{ echo &num ; num++; }" SIGUSR1
while :
do
sleep 2
done
and the second one:
#!/bin/bash
if ps | grep "$1" > /dev/null
then
kill -SIGUSR1 $1
else
echo "Process doesn't exist"
fi
exit 0
In case the code isn't correct, the general idea is for the first script to loop until it receives a SIGINT or SIGTERM, and echo and increment a number whenever it receives a SIGUSR1. The second script takes a pid as an argument and checks if it exists, and sends a SIGUSR1 to the given process. The problem is that when I run the first script, I can't do anything unless I move it to the background with ctrl-z, but when it's there it doesn't seem to respond to any signal except a kill signal. Any ideas on how to make this work?
You can use mycommand & to run a script in the background. Ctrl-Z stops the script, but you can then use bg to let it run in the background. In either case, you can use fg to bring it to the foreground again.
Also note that you can't have spaces around the = in assignments, and you can use let num++ to increment num. You should also singlequote the command in trap, to prevent "$num" from expanding.
All in all:
#!/bin/bash
num=1
trap 'exit 0' SIGINT SIGTERM
trap '{ echo $num ; let num++; }' SIGUSR1
while :
do
sleep 2
done
Finally, you can more easily check if a pid exists by just using kill -0 pid, or just attempting to sigusr1 it and check the result, to avoid grep "123" matching the substring of pid "1234" and such.
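A self-contained sketch of both points, with the shell signalling itself so that no second terminal is needed. $BASHPID (bash 4 and later) is used instead of $$ so the signal reaches the current shell even inside a subshell:

```bash
count=0
# Single quotes: the trap body is expanded when the signal arrives,
# not when the trap is installed.
trap 'count=$((count + 1))' USR1

kill -USR1 "$BASHPID"    # the trap runs once this command has finished
kill -USR1 "$BASHPID"

# Signal 0 sends nothing; it only checks that the PID exists.
kill -0 "$BASHPID" && alive=yes

trap - USR1              # restore the default action
echo "count=$count alive=$alive"
```

In the two-script setup, the second script would run kill -0 "$1" in place of the ps | grep test before sending SIGUSR1.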
You need to make the first script run in the background. When you press Ctrl+Z it is suspended. Then you can type "bg" to make it run in the background (it will stop again if it tries to read from standard input, to allow you to switch back to it with the "fg" command).
Another way is to start script1 already in the background like this:
$ ./script1 &
The ampersand starts a job in the background and returns you to the prompt immediately.
Look in the bash man page under "JOB CONTROL" (here's a copy) for more information on how this works. The key commands to deal with jobs from an interactive shell are "jobs", "fg", and "bg".

Resources