Bash: check job is finished

Theoretical question. Can someone explain to me why jobs keeps reporting "Done" when the job is already done?
root#test:~# cat 1.sh
#!/bin/bash
sleep 5 &
while true; do
    echo $(jobs) && sleep 1
done
root#test:~# ./1.sh
[1]+ Running sleep 5 &
[1]+ Running sleep 5 &
[1]+ Running sleep 5 &
[1]+ Running sleep 5 &
[1]+ Running sleep 5 &
[1]+ Done sleep 5
[1]+ Done sleep 5
[1]+ Done sleep 5
[1]+ Done sleep 5
^C
GNU bash, version 5.0.3(1)-release (x86_64-pc-linux-gnu)

Because job control is disabled in scripts, bash does not act on SIGCHLD and is not asynchronously notified about terminating background processes.
Because jobs is executed inside a subshell (the $(jobs) command substitution), the parent shell never learns that the child's exit status has already been collected and reported. Each iteration creates a new subshell whose fresh environment is not aware that the "Done" message was already printed, so it prints another one.
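A minimal sketch of the fix, following the explanation above: call jobs directly instead of through a command substitution, so the parent shell itself reports the job and remembers that it did. The commented-out kill -0 line is an alternative check that avoids the job table entirely.
#!/bin/bash
sleep 5 &
pid=$!
while true; do
    jobs        # runs in the parent shell, so the "Done" line is reported only once
    # kill -0 "$pid" 2>/dev/null || echo "child $pid has finished"   # job-table-free alternative
    sleep 1
done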

Bash: why wait returns prematurely with code 145

This problem is very strange and I cannot find any documentation about it online. In the following code snippet I am merely trying to run a bunch of sub-processes in parallel, printing something when they exit and collecting/printing their exit codes at the end. I find that without catching SIGCHLD things work as I would expect; however, things break when I catch the signal. Here is the code:
#!/bin/bash
#enabling job control
set -m
cmd_array=( "$@" ) #array of commands to run in parallel
cmd_count=$# #number of commands to run
cmd_idx=0 #current index of command
cmd_pids=() #array of child proc pids
trap 'echo "Child job exited"' SIGCHLD #setting up signal handler on SIGCHLD
#running jobs in parallel
while [ $cmd_idx -lt $cmd_count ]; do
    cmd=${cmd_array[$cmd_idx]} #retrieving the job command as a string
    eval "$cmd" &
    cmd_pids[$cmd_idx]=$! #keeping track of the job pid
    echo "Job #$cmd_idx launched [job_command: '$cmd']"
    (( cmd_idx++ ))
done
#all jobs have been launched, collecting exit codes
idx=0
for pid in "${cmd_pids[@]}"; do
    wait $pid
    child_exit_code=$?
    if [ $child_exit_code -ne 0 ]; then
        echo "ERROR: Job #$idx failed with return code $child_exit_code. [job_command: '${cmd_array[$idx]}']"
    fi
    (( idx++ ))
done
You can tell something is wrong when you run the script with the following command:
./parallel_script.sh "sleep 20; echo done_20" "sleep 3; echo done_3"
The interesting thing here is that, as soon as the signal handler is called (when sleep 3 is done), the wait (which is waiting on sleep 20) is interrupted right away with return code 145. I can also tell that the sleep 20 is still running even after the script is done.
I can't find any documentation about such a return code from wait. Can anyone shed some light as to what is going on here?
(By the way, if I wrap the wait in a while loop and keep waiting while the return code is 145, I actually get the result I expect.)
Thanks to @muru, I was able to reproduce the "problem" using much less code, which you can see below:
#!/bin/bash
set -m
trap "echo child_exit" SIGCHLD
function test() {
    sleep $1
    echo "'sleep $1' just returned now"
}
echo sleeping for 6 seconds in the background
test 6 &
pid=$!
echo sleeping for 2 second in the background
test 2 &
echo waiting on the 6 second sleep
wait $pid
echo "wait return code: $?"
If you run this you will get the following output:
linux:~$ sh test2.sh
sleeping for 6 seconds in the background
sleeping for 2 second in the background
waiting on the 6 second sleep
'sleep 2' just returned now
child_exit
wait return code: 145
linux:~$ 'sleep 6' just returned now
Explanation:
As @muru pointed out, "When a command terminates on a fatal signal whose number is N, Bash uses the value 128+N as the exit status" (cf. the Bash manual on Exit Status).
Now what misled me here was the "fatal" signal. I was looking for a command to fail somewhere when none did.
Digging a little deeper in Bash manual on Signals: "When Bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed."
So there you have it, what happens in the script above is the following:
sleep 6 starts in the background
sleep 2 starts in the background
wait starts waiting on sleep 6
sleep 2 terminates and the SIGCHLD trap is fired, interrupting wait, which returns 128 + SIGCHLD (signal 17) = 145
my script exits since it does not wait anymore
the background sleep 6 terminates, hence the "'sleep 6' just returned now" after the script has already exited
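A sketch of the workaround hinted at in the question: treat an exit status of 145 (128 + SIGCHLD) as "wait was interrupted by the trap" and simply wait again until the child's real status comes back.
wait "$pid"
status=$?
# 145 = 128 + SIGCHLD (signal 17): wait was interrupted by the trap,
# the child is still running, so wait for it again.
while [ "$status" -eq 145 ]; do
    wait "$pid"
    status=$?
done
echo "real exit status of $pid: $status"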

restart program if it outputs some string

I want to loop a process in a bash script, it is a process which should run forever but which sometimes fails.
When it fails, it outputs >>747;3R to its last line, but keeps running.
I tried (just for testing)
while [ 1 ]
do
    mono Program.exe
    last_pid=$!
    sleep 3000
    kill $last_pid
done
but it doesn't work at all: the process mono Program.exe just runs forever (until it crashes, and even then my script does nothing).
$! expands to the PID of the last process started in the background. This can be seen with:
~$ cat test
sleep 2
lastpid=$!
echo $lastpid
~$ bash -x test
+ sleep 2
+ lastpid=
+ echo
vs
~$ cat test
sleep 2 &
lastpid=$!
echo $lastpid
~$ bash -x test
+ lastpid=25779
+ sleep 2
+ echo 25779
The fixed version of your script would read:
while true; do
    mono Program.exe &
    last_pid=$!
    sleep 3000
    kill $last_pid
done
Your version was running mono Program.exe in the foreground and then sitting there: it didn't make it to the next line because it was waiting for the process to finish. Your kill command then didn't work because $! was never populated (there was no background process).
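The fix above only addresses the backgrounding. If you also want the restart-on-failure-string behaviour the question asks about, one possible sketch is below; the marker handling, the temp-file logging, and the 5-second polling interval are my assumptions, not part of the answer.
#!/bin/bash
FAIL_MARKER='>>747;3R'    # the string the program prints when it fails
log=$(mktemp)
while true; do
    mono Program.exe > "$log" 2>&1 &
    last_pid=$!
    # Poll the output until the failure marker appears or the program exits on its own.
    while kill -0 "$last_pid" 2>/dev/null; do
        if grep -qF -- "$FAIL_MARKER" "$log"; then
            kill "$last_pid"
            break
        fi
        sleep 5
    done
    wait "$last_pid" 2>/dev/null
    : > "$log"            # truncate the log before restarting
done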

Kill not killing process if exiting properly

I have a simple bash script which I have written to simplify some work I am doing. All it needs to do is start one process, process_1, as a background process then start another, process_2. Once process_2 is finished I then need to terminate process_1.
process_1 starts a program which does not actually stop unless it receives the kill signal, or CTRL+C when I run it myself. The program's output is redirected into a file via {program} {args} > output_file.
process_2 can take an arbitrary amount of time depending on the arguments it is given.
Code:
#!/bin/bash
#Call this on exit to kill all background processes
function killJobs () {
    #Check process is still running before killing
    if kill -0 "$PID"; then
        kill $PID
    fi
}
...Check given arguments are valid...
#Start process_1
eval "./process_1 ${Arg1} ${Arg2} ${Arg3}" &
PID=$!
#Lay a trap to catch any exits from script
trap killJobs TERM INT
#Start process_2 - sleep for 5 seconds before and after
#Need space between process_1 and process_2 starting and stopping
sleep 5
eval "./process_2 ${Arg1} ${Arg2} ${Arg3} ${Arg4} 2> ${output_file}"
sleep 5
#Make sure background job is killed on exit
killJobs
I check that process_1 has been terminated by checking whether its output file is still being updated after my script has ended.
If I run the script and then press CTRL+C, the script is terminated and process_1 is also killed; the output file is no longer updated.
If I let the script run to completion without my intervention, process_2 and the script both terminate, but when I check the output from process_1 it is still being updated.
To check this I put an echo statement just after process_1 is started and another within the if statement of killJobs, so it would only be echoed if kill $PID is called.
Doing this I can see that both ways of exiting start process_1 and then also enter the if statement to kill it. Yet kill does not actually kill the process in the case of normal exit. No error messages are produced either.
You're backgrounding the eval instead of process_1 itself, so $! holds the PID of the backgrounded subshell that runs eval, not the PID of process_1, and killing that subshell does not kill process_1. Change to:
#!/bin/bash
#Call this on exit to kill all background processes
function killJobs () {
    #Check process is still running before killing
    if kill -0 "$PID"; then
        kill $PID
    fi
}
...Check given arguments are valid...
#Start process_1
./process_1 ${Arg1} ${Arg2} ${Arg3} &
PID=$!
#Lay a trap to catch any exits from script
trap killJobs TERM INT
#Start process_2 - sleep for 5 seconds before and after
#Need space between process_1 and process_2 starting and stopping
sleep 5
./process_2 ${Arg1} ${Arg2} ${Arg3} ${Arg4} 2> ${output_file}
sleep 5
#Make sure background job is killed on exit
killJobs
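As a small optional hardening (my assumption, not part of the original answer), you can also trap EXIT so that killJobs runs even if the script stops for some other reason; the kill -0 guard keeps the explicit killJobs call at the end from killing twice:
trap killJobs EXIT TERM INT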

How can I silence the "Terminated" message when my command is killed by timeout?

By referencing "bash: silently kill background function process" and "Timeout a command in bash without unnecessary delay", I wrote my own script to set a timeout for a command, as well as silencing the kill message.
But I still am getting a "Terminated" message when my process gets killed. What's wrong with my code?
#!/bin/bash
silent_kill() {
    kill $1 2>/dev/null
    wait $1 2>/dev/null
}
timeout() {
    limit=$1 #timeout limit
    shift
    command=$* #command to run
    interval=1 #default interval between checks if the process is still alive
    delay=1 #default delay between SIGTERM and SIGKILL
    (
        ((t = limit))
        while ((t > 0)); do
            sleep $interval
            #kill -0 $$ || exit 0
            ((t -= interval))
        done
        silent_kill $$
        #kill -s SIGTERM $$ && kill -0 $$ || exit 0
        sleep $delay
        #kill -s SIGKILL $$
    ) &> /dev/null &
    exec $*
}
timeout 1 sleep 10
There's nothing wrong with your code; that "Terminated" message doesn't come from your script but from the invoking shell (the one you launch your script from).
You can deactivate it by disabling job control:
$ set +m
$ bash <your timeout script>
Perhaps bash has moved on in 4 years. I do know you can avoid getting "Terminated" by disowning a child process. You can no longer job-control it, though. E.g.:
$ sleep 100 &
[1] 15436
$ disown -r
$ kill -9 15436
help disown:
disown [-h] [-ar] [jobspec ...]
Remove jobs from current shell.
Removes each JOBSPEC argument from the table of active jobs. Without
any JOBSPECs, the shell uses its notion of the current job.
-a remove all jobs if JOBSPEC is not supplied
-h mark each JOBSPEC so that SIGHUP is not sent to the job if the shell receives a SIGHUP
-r remove only running jobs
Internally, the shell maintains a list of children it has forked and wait()s for any of them to exit or be killed. When a child's exit status has been collected, the shell prints a message. This is called monitoring in shell parlance.
It seems you want to turn off monitoring. Monitoring is managed with the m option; to turn it on, use set -m (the default at startup). To turn it off, set +m.
Note that turning monitoring off also disables the completion messages for asynchronous jobs, e.g. no more messages like
$ sleep 5 &
[1] 59468
$
[1] + done sleep 5
$
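So, assuming the "Terminated" message really does come from the invoking shell, a typical session using the first answer's suggestion would look like this (the script name is a placeholder):
$ set +m                  # turn monitoring off in the invoking shell
$ ./timeout_script.sh     # no "Terminated" message is printed now
$ set -m                  # turn monitoring back on afterwards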

inconsistent signal behavior? Only works for the first signal?

I am trying to have a script that is able to restart itself with exec (so it can pick up any "upgrade") when given a specific signal (I tried SIGHUP & SIGUSR1).
This seems to work the first time, but not the second, even though the registration (trap) does recur in the exec'ed instance (which is still the same PID).
#!/usr/bin/env bash
set -x
readonly PROGNAME="${0}"
function run_prog()
{
    echo hi
    sleep 2
    echo ho
    sleep 1000 &
    wait $!
}
restart()
{
    sleep 5
    exec "${PROGNAME}"
}
trap restart USR1
echo -e "TRAPS:"
trap
echo
run_prog
This is how I run it:
./tst.sh & TSTPID=$! # Starts ok, see both "hi" & "ho" messages
sleep 10
kill -USR1 ${TSTPID} # Restarts ok, see both "hi" & "ho" messages
sleep 10
kill -USR1 ${TSTPID} # NOTHING HAPPENS
sleep 5
kill ${TSTPID}
Any idea why the second signal is ignored? (some code, like de-registering the trap in the cleanup may just be paranoia)
Maybe because you're exec'ing from inside a signal handler: the handler never returns (exec replaces the process image mid-handler), which may prevent other cleanup code or daisy-chained handlers from executing. In particular, the trapped signal can remain blocked while its handler runs, and the blocked-signal mask is inherited across exec, so the re-exec'ed script plausibly starts with SIGUSR1 still blocked.
Beyond that, who knows what's going on in the black box of the OS signal handling code and bash's own layering over it that might be circumvented by exec. exec is a very draconian measure :-)
Also check out this cool bash site. I'm looking for the bash source code that handles signals. Just curious.
Your solution here is the right approach:
#!/usr/bin/env bash
set -x
readonly PROGNAME="${0}"
DO_RESTART=
function run_prog()
{
    echo hi
    sleep 2
    echo ho
    sleep 1000 &
    SLEEPPID=$!
    #builtin
    wait ${SLEEPPID}
}
trap DO_RESTART=1 SIGUSR1
echo -e "TRAPS:"
trap -p
echo
run_prog
if [ -n "${DO_RESTART}" ]; then
    sleep 5
    kill ${SLEEPPID}
    exec "${PROGNAME}"
fi
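Because the exec now happens in the script's main flow rather than inside the trap handler, repeated signals should keep working. A quick check, mirroring the driver from the question:
./tst.sh & TSTPID=$!
sleep 10
kill -USR1 ${TSTPID}   # first restart
sleep 10
kill -USR1 ${TSTPID}   # second restart now works as well
sleep 5
kill ${TSTPID}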

Resources