inconsistent signal behavior? Only works for the first signal? - bash

I'm trying to write a script that can restart itself with exec (so it can pick up any "upgrade") when it receives a specific signal (I tried SIGHUP and SIGUSR1).
This seems to work the first time but not the second, even though the trap is registered again in the exec'ed instance (which still has the same PID).
#!/usr/bin/env bash
set -x
readonly PROGNAME="${0}"
function run_prog()
{
echo hi
sleep 2
echo ho
sleep 1000 &
wait $!
}
restart()
{
sleep 5
exec "${PROGNAME}"
}
trap restart USR1
echo -e "TRAPS:"
trap
echo
run_prog
This is how I run it:
./tst.sh & TSTPID=$! # Starts ok, see both "hi" & "ho" messages
sleep 10
kill -USR1 ${TSTPID} # Restarts ok, see both "hi" & "ho" messages
sleep 10
kill -USR1 ${TSTPID} # NOTHING HAPPENS
sleep 5
kill ${TSTPID}
Any idea why the second signal is ignored? (some code, like de-registering the trap in the cleanup may just be paranoia)

Maybe it's because you're calling exec from inside a signal handler: the handler never returns, so any cleanup that the signal-handling code (or any daisy-chained handlers) would normally do afterwards never runs.
Who knows what goes on inside the black box of the OS signal-handling code and bash's own layer on top of it, and what exec might circumvent. exec is a very draconian measure :-)
Also check out this cool bash site. I'm looking for the bash source code that handles signals. Just curious.
Your solution here is the right approach:
#!/usr/bin/env bash
set -x
readonly PROGNAME="${0}"
DO_RESTART=
function run_prog()
{
echo hi
sleep 2
echo ho
sleep 1000 &
SLEEPPID=$!
#builtin
wait ${SLEEPPID}
}
trap DO_RESTART=1 SIGUSR1
echo -e "TRAPS:"
trap -p
echo
run_prog
if [ -n "${DO_RESTART}" ]; then
sleep 5
kill ${SLEEPPID}
exec "${PROGNAME}"
fi

Related

Prevent CTRL+C being sent to process called from a Bash script

Here is a simplified version of some code I am working on:
#!/bin/bash
term() {
echo ctrl c pressed!
# perform cleanup - don't exit immediately
}
trap term SIGINT
sleep 100 &
wait $!
As you can see, I would like to trap CTRL+C / SIGINT and handle these with a custom function to perform some cleanup operation, rather than exiting immediately.
However, upon pressing CTRL+C, what actually seems to happen is that, while I see ctrl c pressed! is echoed as expected, the wait command is also killed which I would not like to happen (part of my cleanup operation kills sleep a bit later but first does some other things). Is there a way I can prevent this, i.e. stop CTRL+C input being sent to the wait command?
You can prevent a process called from a Bash script from receiving SIGINT by first ignoring the signal with trap:
#!/bin/bash
# Cannot be interrupted
( trap '' INT; exec sleep 10; )
However, only a parent process can wait for its child, so wait is a shell builtin rather than a separate process, and this trick therefore doesn't apply to it.
Instead, just restart the wait after it gets interrupted:
#!/bin/bash
n=0
term() {
echo "ctrl c pressed!"
n=$((n+1))
}
trap term INT
sleep 100 &
while
wait "$!"
[ "$?" -eq 130 ] # Sigint results in exit code 128+2
do
if [ "$n" -ge 3 ]
then
echo "Jeez, fine"
exit 1
fi
done
I ended up using a modified version of what @thatotherguy suggested:
#!/bin/bash
term() {
echo ctrl c pressed!
# perform cleanup - don't exit immediately
}
trap term SIGINT
sleep 100 &
pid=$!
while ps -p $pid > /dev/null; do
wait $pid
done
This checks if the process is still running and, if so, runs wait again.

Bash: why wait returns prematurely with code 145

This problem is very strange and I cannot find any documentation about it online. In the following code snippet I am merely trying to run a bunch of sub-processes in parallel, print something when they exit, and collect/print their exit codes at the end. Without catching SIGCHLD things work as I would expect; however, things break when I catch the signal. Here is the code:
#!/bin/bash
#enabling job control
set -m
cmd_array=( "$@" ) #array of commands to run in parallel
cmd_count=$# #number of commands to run
cmd_idx=0; #current index of command
cmd_pids=() #array of child proc pids
trap 'echo "Child job exited"' SIGCHLD #setting up signal handler on SIGCHLD
#running jobs in parallel
while [ $cmd_idx -lt $cmd_count ]; do
cmd=${cmd_array[$cmd_idx]} #retrieving the job command as a string
eval "$cmd" &
cmd_pids[$cmd_idx]=$! #keeping track of the job pid
echo "Job #$cmd_idx launched ['$cmd']"
(( cmd_idx++ ))
done
#all jobs have been launched, collecting exit codes
idx=0
for pid in "${cmd_pids[@]}"; do
wait $pid
child_exit_code=$?
if [ $child_exit_code -ne 0 ]; then
echo "ERROR: Job #$idx failed with return code $child_exit_code. [job_command: '${cmd_array[$idx]}']"
fi
(( idx++ ))
done
You can tell something is wrong when you run the following command:
./parallel_script.sh "sleep 20; echo done_20" "sleep 3; echo done_3"
The interesting thing here is that you can tell as soon as the signal handler is called (when sleep 3 is done), the wait (which is waiting on sleep 20) is interrupted right away with a return code 145. I can tell the sleep 20 is still running even after the script is done.
I can't find any documentation about such a return code from wait. Can anyone shed some light as to what is going on here?
(By the way if I add a while loop when I wait and keep on waiting while the return code is 145, I actually get the result I expect)
Thanks to @muru, I was able to reproduce the "problem" using much less code, which you can see below:
#!/bin/bash
set -m
trap "echo child_exit" SIGCHLD
function test() {
sleep $1
echo "'sleep $1' just returned now"
}
echo sleeping for 6 seconds in the background
test 6 &
pid=$!
echo sleeping for 2 second in the background
test 2 &
echo waiting on the 6 second sleep
wait $pid
echo "wait return code: $?"
If you run this you will get the following output:
linux:~$ sh test2.sh
sleeping for 6 seconds in the background
sleeping for 2 second in the background
waiting on the 6 second sleep
'sleep 2' just returned now
child_exit
wait return code: 145
linux:~$ 'sleep 6' just returned now
Explanation:
As @muru pointed out, "When a command terminates on a fatal signal whose number is N, Bash uses the value 128+N as the exit status." (cf. Bash manual on Exit Status).
What misled me here is the word "fatal". I was looking for a command that failed somewhere when nothing did.
Digging a little deeper in Bash manual on Signals: "When Bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed."
So there you have it, what happens in the script above is the following:
sleep 6 starts in the background
sleep 2 starts in the background
wait starts waiting on sleep 6
sleep 2 terminates and the SIGCHLD trap is fired, interrupting wait, which returns 128 + SIGCHLD = 145
my script exits since it is no longer waiting
the background sleep 6 terminates, hence the "'sleep 6' just returned now" printed after the script has already exited
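The sequence above can be reproduced with a minimal sketch. It assumes SIGUSR1 in place of SIGCHLD so that the signal can be sent at a known moment; the variable names and the helper subshell that sends the signal are hypothetical and not part of the original script. The expected status is computed from `kill -l` rather than hard-coded, since signal numbers differ between platforms.

```bash
#!/usr/bin/env bash
# Sketch: a trapped signal interrupts wait, which returns 128 + signal number.
# Assumption: USR1 stands in for SIGCHLD so the timing is under our control.
got_signal=0
trap 'got_signal=1' USR1

sleep 3 &
child=$!

# Hypothetical helper: signal this script one second after wait starts.
( sleep 1; kill -USR1 "$$" ) &

wait "$child"
first_status=$?
expected=$(( 128 + $(kill -l USR1) ))
echo "first wait returned $first_status (expected $expected)"

# Keep waiting while wait was merely interrupted and the child is still alive.
status=$first_status
while [ "$status" -gt 128 ] && kill -0 "$child" 2>/dev/null; do
    wait "$child"
    status=$?
done
echo "child exit status: $status"
```

The retry loop is the same idea as the "keep on waiting while the return code is 145" workaround mentioned in the question, written so that it does not depend on one particular signal number.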

Signalling a bash script running in the background in an infinite loop

I have a bash script (sleeping in an infinite loop) running as a background process. I want to periodically signal this process, to do some processing without killing the script, ie. the script on receiving the signal should run a function and then go back to sleep. How do I signal the background process without killing it?
Here's what the code looks like for the script test.sh:
MY_PID=$$
echo $MY_PID > test.pid
while true
do
sleep 5
done
trap 'run' SIGUSR1
run()
{
: # data processing
}
This is how I am running it and triggering it:
./test.sh &
kill -SIGUSR1 `cat test.pid`
This works for me.
EDITED 2014-01-20: This edit avoids the frequent waking up. Also see this question:
Bash: a sleep in a while loop gets its own pid
#!/bin/bash
MY_PID=$$
echo $MY_PID > test.pid
trap 'kill ${!}; run' SIGUSR1
run()
{
echo "signal!"
}
while true
do
sleep 1000 & wait ${!}
done
Running:
> ./test.sh &
[1] 14094
> kill -SIGUSR1 `cat test.pid`
> signal!
>
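To see why the background-sleep-plus-wait form reacts promptly, here is a small self-contained sketch. The variable names, the 30-second sleep, and the helper subshell that sends the signal are assumptions made for the demonstration; the trap follows the same kill-then-run shape as the script above.

```bash
#!/usr/bin/env bash
# Sketch: with "sleep & wait", a trapped signal is handled right away
# instead of after the sleep has finished.
handled=0
run() { handled=1; }

# Kill the pending sleep, then do the work (same shape as the trap above).
trap 'kill "$sleep_pid" 2>/dev/null; run' USR1

# Hypothetical helper: signal this script after one second.
( sleep 1; kill -USR1 "$$" ) &

start=$SECONDS
sleep 30 &
sleep_pid=$!
wait "$sleep_pid"          # interrupted by USR1 after about one second
elapsed=$(( SECONDS - start ))

echo "handled=$handled after roughly ${elapsed}s"
```

If the script used a plain foreground `sleep 30` instead, the handler would not run until the sleep had completed.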

How to capture Ctrl-C and use it to exit an endless loop properly

I'm trying to run a program inside an endless loop because it sometimes dies for no reason. I would like to be able to hit Ctrl-C to prevent the program being restarted though.
I don't want Ctrl-C to kill the program, just to wait until it dies, then not restart it again.
theprogram is a wine program (utorrent).
Bonus points for telling me how to make it so it will safely exit theprogram just like clicking on the 'x' in the top right of it. When I manually kill it from the command line or hit Ctrl-C, it doesn't get to run its cleanup code. Hence my attempt to just stop it being restarted.
I checked a few of the other questions about trapping SIGINT, but I couldn't work out how to do this.
Can anyone fix this code? My code seems to kill theprogram then exit the loop when Ctrl-C is pressed, without letting theprogram clean up.
#!/bin/bash
EXIT=0
trap exiting SIGINT
exiting() { echo "Ctrl-C trapped, will not restart utorrent" ; EXIT=1;}
while [ $EXIT -eq 0 ] ; do
wine theprogram
echo "theprogram killed or finished"
date
echo "exit code $?"
echo "sleeping for 20 seconds, then restarting theprogram..."
sleep 20
done
echo "out of loop"
Try this:
while true
do
xterm -e wine theprogram || break
sleep 3
done
The trick is done by using another xterm to start the wine. That way the wine has a different controlling tty and won't be affected by the Ctrl-c press.
The drawback is that there will be an additional xterm lingering around on your desktop. You could use the option -iconic to start it iconified.
Well, I ended up not using Ctrl-C as per my question because I couldn't find a good solution, but I used zenity to popup a box that I can click to exit the loop:
#!/bin/bash
zenity --info --title "thewineprogram" --text "Hit OK to disable thewineprogram auto-restart" & # run zenity in the background
zen_pid=$!
while :
do
wine <wineprogramlocation>
EXITCODE=$?
echo "thewineprog killed or finished"
echo "exit code was $EXITCODE"
date
kill -0 $zen_pid > /dev/null 2>&1 # kill -0 just checks if a pid exists
if [ $? -eq 1 ] # process does not exist
then
break
fi
echo "sleeping for 5 seconds, then restarting the wine program..."
sleep 5
done
echo "finished"
Use a monitoring process:
This allows the SIGINT signal to hit the monitor process trap handler without affecting the child.
(this could also be done in perl, python or any language)
#!/bin/bash
cmd() {
trap '' INT
trap 'echo "Signal USR1 received (pid=$BASHPID)"; EXIT=1' USR1
EXIT=0
while [ $EXIT -eq 0 ]
do
echo "Starting (pid=$BASHPID)..."
sleep 5 # represents "wine theprogram"
rc=$? # save the exit code before other commands overwrite $?
echo "theprogram killed or finished"
date
echo "Exit code $rc"
if [ $EXIT -eq 0 ]; then
echo "Sleeping for 2 seconds, then restarting theprogram..."
sleep 2
fi
done
echo "Exiting (pid=$BASHPID)"
}
run() { cmd & PID=$!; echo Started $PID; }
graceful_exit() { kill -s USR1 $PID && echo "$PID signalled to exit (USR1)"; }
shutdown() { kill -0 $PID 2>/dev/null && echo "Unexpected exit, killing $PID" && kill $PID; }
trap 'graceful_exit' INT
trap 'shutdown' EXIT
run
while :
do
wait && break
done
echo "Exiting monitor process"
It appears that a trap on SIGINT always terminates the currently executing sub-command. The only exception appears to be the empty-string handler.
To demonstrate: when Ctrl-C is pressed, (trap "" INT;echo 1;sleep 5;echo 2) does not halt the sleep command, whereas (trap "echo hi" INT;echo 1;sleep 5;echo 2) does. After the trap handler executes, execution continues with the command that follows, here "echo 2". So the empty string as a handler seems to be a special case that does not kill the current sub-command. There seems to be no way to run a handler without also killing the current sub-command.
Why this happens: the shell forks and execs to run each program. On the exec system call, signals that have handlers are reset to their default behavior (the calling process image is overwritten, so the handlers are gone), while ignored signals stay ignored (see "man 2 execve", "man 7 signal" and POSIX.1; http://www.perlmonks.org/?node_id=1198044).
I had a second idea: use 'trap "" INT' to fully disable Ctrl-C and then trap Ctrl-Z as the signal to gracefully exit your program. Trapping Ctrl-Z (TSTP) does not seem to work properly for me, though. When I run '(trap "echo test" TSTP;sleep 5)' and press Ctrl-Z, my shell hangs: sleep never completes after 5 seconds and, oddly, Ctrl-C no longer works. I don't know of any hotkey signals other than Ctrl-C and Ctrl-Z. This is known behavior: see Bash script: can not properly handle SIGTSTP.
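The difference between a handled and an ignored signal across exec can be checked with a short sketch. It assumes SIGUSR1 instead of SIGINT, because a non-interactive bash may start background commands with SIGINT already ignored, which would blur the comparison; the variable names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: what happens to a signal disposition when a subshell execs a program.
# Assumption: USR1 stands in for the signal of interest.

# Child 1: USR1 is ignored before exec, so sleep inherits "ignore".
( trap '' USR1; exec sleep 4 ) &
ignored_pid=$!

# Child 2: USR1 has a handler before exec; exec resets it to the default
# action, which terminates sleep.
( trap 'echo "handler ran"' USR1; exec sleep 4 ) &
handled_pid=$!

sleep 1                                  # let both subshells reach exec
kill -USR1 "$ignored_pid" "$handled_pid"

wait "$handled_pid"                      # returns promptly: the child was killed
handled_status=$?
expected=$(( 128 + $(kill -l USR1) ))

if kill -0 "$ignored_pid" 2>/dev/null; then
    ignored_alive=yes
else
    ignored_alive=no
fi

echo "child that had a handler: wait status $handled_status (expected $expected)"
echo "child that ignored USR1:  still running? $ignored_alive"

kill "$ignored_pid" 2>/dev/null          # clean up the surviving child
wait "$ignored_pid" 2>/dev/null || true
```

The handler text "handler ran" is never printed, because the handler no longer exists once sleep has replaced the subshell.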

Catch SIGINT in bash, handle AND ignore

Is it possible in bash to intercept a SIGINT, do something, and then ignore it (keep bash running)?
I know that I can ignore the SIGINT with
trap '' SIGINT
And I can also do something on the sigint with
trap handler SIGINT
But that will still stop the script after the handler executes. E.g.
#!/bin/bash
handler()
{
kill -s SIGINT $PID
}
program &
PID=$!
trap handler SIGINT
wait $PID
#do some other cleanup with results from program
When I press ctrl+c, the SIGINT to program will be sent, but bash will skip the wait BEFORE program was properly shut down and created its output in its signal handler.
Using @suspectus's answer I can change the wait $PID to:
while kill -0 $PID > /dev/null 2>&1
do
wait $PID
done
This actually works for me; I am just not 100% sure whether it is 'clean' or a 'dirty workaround'.
Bash does return from the trap handler and carries on with the script, but the handler only runs after the command that was executing when the signal arrived has finished.
So the solution is a little clumsy, but I think it does what is required. trap handler INT will also work.
trap 'echo "Be patient"' INT
for ((n=20; n; n--))
do
sleep 1
done
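The ordering can be made visible with a minimal sketch. It assumes SIGUSR1 sent with kill to the script's own PID (so that, unlike Ctrl-C from a terminal, the foreground child does not receive the signal as well); the log file, variable names, and helper subshell are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: a trap set in the script runs only after the current foreground
# command has finished.
log=$(mktemp)
trap 'echo "trap ran" >> "$log"' USR1

# Hypothetical helper: signal this script while the foreground command runs.
( sleep 1; kill -USR1 "$$" ) &

# Foreground command: the signal arrives after about 1s, but the trap is
# deferred until this command returns after about 2s.
bash -c 'sleep 2; echo "foreground finished"' >> "$log"

wait                                     # reap the helper subshell
order=$(paste -sd, "$log")
echo "$order"                            # → foreground finished,trap ran
rm -f "$log"
```

This is also why the loop above uses many short sleep 1 commands: the handler gets a chance to run between iterations instead of waiting for one long command.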
The short answer:
SIGINT in bash can be caught, handled and then ignored, assuming that "ignored" here means that bash continues to run the script.
The wanted actions of the handler can even be postponed to build a kind of "transaction", so that SIGINT is fired (or "ignored") AFTER a group of statements has done its work.
But since the above example touches many aspects of bash (foreground vs. background behavior, trap and wait) AND 8 years have passed since then, the solution discussed here may not immediately work on all systems without further fine-tuning.
The solution discussed here was successfully tested on a "Linux mint-mate 5.4.0-73-generic x86_64" system with "GNU bash, Version 4.4.20(1)-release":
The wait shell builtin command IS DESIGNED to be interruptible. One can, however, examine the exit status of wait, which is 128 + signal number = 130 (in the case of SIGINT).
So if you want to work around this and wait until the background process has really finished, you can do something like this:
wait ${programPID}
while [ $? -ge 128 ]; do
# 1st opportunity to place your **handler actions** is here
wait ${programPID}
done
But let it also be said that we ran into a bug/feature while testing all of this: wait kept returning 130 even after the background process was gone. The documentation says that wait returns 127 for an unknown process ID, but this did not happen in our tests.
If you run into this problem too, remember to check that the background process still exists before running wait inside the while loop.
Assume that the following script is your program, which simply counts down from 5 to 0 and also tees its output to a file named program.out. The while loop here is treated as a "transaction" which must not be disturbed by SIGINT. One last comment: this code does NOT ignore SIGINT after doing the postponed actions; instead it restores the old SIGINT handler and raises a SIGINT:
#!/bin/bash
rm -f program.out
# Will be set to 1 by the SIGINT ignoring/postponing handler
declare -ig SIGINT_RECEIVED=0
# On <CTRL>+C or "kill -s SIGINT $$" set flag for [later|postponed] examination
function _set_SIGINT_RECEIVED {
SIGINT_RECEIVED=1
}
# Remember current SIGINT handler
old_SIGINT_handler=$(trap -p SIGINT)
# Prepare for later restoration via ${old_SIGINT_handler}
old_SIGINT_handler=${old_SIGINT_handler:-trap - SIGINT}
# Start your "transaction", which should NOT be disturbed by SIGINT
trap -- '_set_SIGINT_RECEIVED' SIGINT
count=5
echo $count | tee -a program.out
while (( count-- )); do
sleep 1
echo $count | tee -a program.out
done
# End of your "transaction"
# Look whether SIGINT was received
if [ ${SIGINT_RECEIVED} -eq 1 ]; then
# Your **handler actions** are here
echo "SIGINT was received during transaction..." | tee -a program.out
echo "... doing postponed work now..." | tee -a program.out
echo "... restoring old SIGINT handler and sending SIGINT" | tee -a program.out
echo "program finished after SIGINT postponed." | tee -a program.out
${old_SIGINT_handler}
kill -s SIGINT $$
fi
echo "program finished without having received SIGINT." | tee -a program.out
But let it also be said that we ran into problems after sending program into the background. The problem was that program inherited a trap '' SIGINT, which means that SIGINT was ignored altogether and program was NOT able to set another handler via trap -- '_set_SIGINT_RECEIVED' SIGINT.
We solved this problem by putting program in a subshell and sending this subshell into the background, as you will see in the MAIN script example below, which runs in the foreground. One last comment: in this script you can decide via the variable ignore_SIGINT_after_handling whether to finally ignore SIGINT and continue running the script OR to execute the default SIGINT behavior after your handler action has finished its work:
#!/bin/bash
# Will be set to 1 by the SIGINT ignoring/postponing handler
declare -ig SIGINT_RECEIVED=0
# On <CTRL>+C or "kill -s SIGINT $$" set flag for later examination
function _set_SIGINT_RECEIVED {
SIGINT_RECEIVED=1
}
# Set to 1 if you want to keep bash running after handling SIGINT in a particular way
# or to 0 (or any other value) to run original SIGINT action after postponing SIGINT
ignore_SIGINT_after_handling=1
# Remember current SIGINT handler
old_SIGINT_handler=$(trap -p SIGINT)
# Prepare for later restoration via ${old_SIGINT_handler}
old_SIGINT_handler=${old_SIGINT_handler:-trap - SIGINT}
# Start your "transaction", which should NOT be disturbed by SIGINT
trap -- '_set_SIGINT_RECEIVED' SIGINT
# Do your work, for example
(./program) &
programPID=$!
wait ${programPID}
while [ $? -ge 128 ]; do
# 1st opportunity to place a part of your **handler actions** is here
# i.e. send SIGINT to ${programPID} and make sure that it is only sent once
# even if MAIN receives more SIGINT's during this loop
wait ${programPID}
done
# End of your "transaction"
# Look whether SIGINT was received
if [ ${SIGINT_RECEIVED} -eq 1 ]; then
# Your postponed **handler actions** are here
echo -e "\nMAIN is doing postponed work now..."
if [ ${ignore_SIGINT_after_handling} -eq 1 ]; then
echo "... and continuing with normal program execution..."
else
echo "... and restoring old SIGINT handler and sending SIGINT via 'kill -s SIGINT \$\$'"
${old_SIGINT_handler}
kill -s SIGINT $$
fi
fi
# Restore "old" SIGINT behaviour
${old_SIGINT_handler}
# Prepare for next "transaction"
SIGINT_RECEIVED=0
echo ""
echo "This message has to be shown in the case of normal program execution"
echo "as well as after a caught and handled and then ignored SIGINT"
echo "End of MAIN script reached"
Hope this helps a bit.
Have a good time, everybody.
I had the same problem: my script was exiting after my SIGINT handler.
I solved this by recursion:
#! /bin/sh
# devloop.sh
# run command in infinite loop
# wait before restarting, to allow stopping the loop
# license: MIT, author: milahu
# https://stackoverflow.com/questions/15785522/catch-sigint-in-bash-handle-and-ignore
restart_delay=2
command="$1" # TODO use all args: $@
# example: drop cache, run vite
#command="rm -rf node_modules/.vite/ ; npx vite --clearScreen false"
if [ -z "$command" ]
then
command="( set -x; sleep 5 ); false # example command: sleep 5 seconds, set rc=1"
fi
loop_next() {
echo
echo "starting command. hit Ctrl+C to restart"
echo " $command"
(eval "$command") &
command_pid=$!
#echo "main pid: $$"; echo "cmd pid: $command_pid" # debug
restart_command() {
echo
echo "restarting command in $restart_delay seconds. hit Ctrl+C to stop"
sleep $restart_delay
loop_next # recursion
}
stop_command() {
echo
echo "got Ctrl+C -> stopping command"
kill $command_pid
trap exit SIGINT # handle second Ctrl+C
restart_command
}
trap stop_command SIGINT # handle first Ctrl+C
wait $command_pid # this is blocking
echo "command stopped. return code: $?"
restart_command
}
echo starting loop
loop_next

Resources