How to restart a BASH script from itself with a signal?

For example I have script with an infinite loop printing something to stdout. I need to trap a signal (for example SIGHUP) so it will restart the script with different PID and the loop will start itself again from 0. Killing and starting doesn't work as expected:
function traphup(){
kill $0
exec $0
}
trap traphup HUP
Maybe I should place something in background or use nohup, but I am not familiar with this command.

In your function:
traphup(){
$0 "$@" &
exit 0
}
This starts a new process in the background with the original command name and arguments (vary arguments to suit your requirements) with a new process ID. The original shell then exits. Don't forget to sort out the PID file if your daemon uses one to identify itself - but the restart may do that anyway.
Note that using nohup would be the wrong direction; the first time you launched the daemon, it would respond to the HUP signal, but the one launched with nohup would ignore the signal, not restarting again - unless you explicitly overrode the 'ignore' status, which is a bad idea for various reasons.
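One caveat when forwarding arguments from a handler function: inside a shell function, "$@" expands to the function's own arguments, not the script's. The sketch below illustrates this with a hypothetical stand-in function, count_args (not part of the answer above), and shows that the arguments have to be passed in explicitly, for example with a trap line such as trap 'traphup "$@"' HUP.

```shell
#!/bin/sh
# Sketch only: count_args is a hypothetical stand-in for the handler function.
count_args() { echo "$#"; }

set -- alpha beta               # pretend the script was started with two arguments

bare=$(count_args)              # called with no arguments, as `trap count_args HUP` would
forwarded=$(count_args "$@")    # called the way `trap 'count_args "$@"' HUP` would

echo "bare call sees $bare argument(s), forwarded call sees $forwarded"
```

The single quotes around the trap action matter: they delay expansion of "$@" until the trap actually runs.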
Answering comment
I'm not quite sure what the trouble is.
When I run the following script, I only see one copy of the script in ps output, regardless of whether I start it as ./xx.sh or as ./xx.sh &.
#!/bin/bash
traphup()
{
$0 "$$" &
exit 0
}
trap traphup HUP
echo
sleep 1
i=1
while [ $i -lt 1000 ]
do
echo "${1:-<none>}: $$: $i"
sleep 1
: $(( i++ ))
done
The output contains lines such as:
<none>: 1155: 21
<none>: 1155: 22
<none>: 1155: 23
1155: 1649: 1
1155: 1649: 2
1155: 1649: 3
1155: 1649: 4
The ones with '<none>' are the original process; the second set are the child process (1649) reporting its parent (1155). This output made it easy to track which process to send HUP signals to. (The initial echo and sleep get the command line prompt out of the way of the output.)
My suspicion is that what you are seeing depends on the content of your script - in my case, the body of the loop is simple. But if I had a pipeline or something in there, then I might see a second process with the same name. But I don't think that would change depending on whether the original script is run in foreground or background.

Related

Running Child Process In Sequential Statement Before Exiting Parent?

I'm trying to write a Bash script that, when it receives a SIGINT signal, creates a copy of itself before exiting. So, when a user tries to kill this script using a SIGINT signal, a copy of the process reappears.
trap "echo Exiting...?; ./ghoul.sh; exit 1" SIGINT
while :
do
echo Process Number $$, with PPID $PPID!
sleep 1
done
However, whenever I suspend the process and check ps -f, there are multiple processes of the script (children and children of children). The exit command never seems to run since it's waiting for the children to exit. I want to find a way to run the script in the trap statement and exit afterward while maintaining the resulting child process. Is there any way to do this besides creating the child as a background process?

I find it much simpler to put exit code into a function. For example, your unquoted echo contains a bare ? which is a glob (file expansion) character. To avoid the parent killing the child you can use disown, and yes, you need to run it in background.
Try this:
f_exit() {
echo 'Exiting...?'
./ghoul.sh &
disown -h %1
exit 1
}
trap "f_exit" SIGINT
while :
do
echo "Process Number $$, with PPID $PPID!"
sleep 1
done

Prevent SIGINT from interrupting current task while still passing information about SIGINT (and preserve the exit code)

I have a quite long shell script and I'm trying to add signal handling to it.
The main task of the script is to run various programs and then clean up their temporary files.
I want to trap SIGINT.
When the signal is caught, the script should wait for the current program to finish execution, then do the cleanup and exit.
Here is an MCVE:
#!/bin/sh
stop_this=0
trap 'stop_this=1' 2
while true ; do
result="$(sleep 2 ; echo success)" # run some program
echo "result: '$result'"
echo "Cleaning up..." # clean up temporary files
if [ $stop_this -ne 0 ] ; then
echo 'OK, time to stop this.'
break
fi
done
exit 0
The expected result:
Cleaning up...
result: 'success'
Cleaning up...
^Cresult: 'success'
Cleaning up...
OK, time to stop this.
The actual result:
Cleaning up...
result: 'success'
Cleaning up...
^Cresult: ''
Cleaning up...
OK, time to stop this.
The problem is that the currently running instruction (result="$(sleep 2 ; echo success)" in this case) is interrupted.
What can I do so that it behaves more as if I had set trap '' 2?
I'm looking for either a POSIX solution or one that is supported by most of shell interpreters (BusyBox, dash, Cygwin...)
I already saw the answers to "Prevent SIGINT from closing child process in bash script", but they don't really work for me. All of those solutions require modifying each line that shouldn't be interrupted. My real script is quite long and much more complicated than the example. I would have to modify hundreds of lines.

You need to prevent the SIGINT from going to the echo in the first place (or rewrite the cmd that you are running in the variable assignment to ignore SIGINT). Also, you need to allow the variable assignment to happen, and it appears that the shell is aborting the assignment when it receives the SIGINT. If you're only worried about user generated SIGINT from the tty, you need to disassociate that command from the tty (eg, get it out of the foreground process group) and prevent the SIGINT from aborting the assignment. You can (almost) accomplish both of those with:
#!/bin/sh
stop_this=0
while true ; do
trap 'stop_this=1' INT
{ sleep 1; echo success > tmpfile; } & # run some program
while ! wait; do : ; done
trap : INT
result=$(cat tmpfile& wait)
echo "result: '$result'"
echo "Cleaning up..." # clean up temporary files
if [ $stop_this -ne 0 ] ; then
echo 'OK, time to stop this.'
break
fi
done
exit 0
If you're worried about SIGINT from another source, you'll have to re-implement sleep (or whatever command sleep is a proxy for) to handle SIGINT the way you want. The key here is to run the command in the background and wait for it, which prevents the SIGINT from going to it and terminating it early. Note that we've opened at least two new cans of worms here. By waiting in a loop, we're effectively ignoring any errors that the subcommand might raise (we're doing this to emulate SA_RESTART-style restart behavior), so the script may hang. Also, if the SIGINT arrives during the cat, we have attempted to prevent the cat from aborting by running it in the background, but the variable assignment will still be terminated and you'll get your original behavior. Signal handling is not clean in the shell! But this gets you closer to your desired goal.
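The effect of a trapped signal on wait can be seen in isolation. The following is a minimal sketch, assuming USR1 as a stand-in for SIGINT (so the demo can signal itself) and sleep as a stand-in for the real program: the first wait is interrupted and returns a status above 128, and waiting again returns the program's real exit status.

```shell
#!/bin/sh
# Sketch only: USR1 stands in for the signal of interest.
got_signal=0
trap 'got_signal=1' USR1

sleep 2 &                          # stand-in for the long-running program
worker=$!
( sleep 1; kill -USR1 "$$" ) &     # simulate a signal arriving mid-run

first=0
wait "$worker" || first=$?         # interrupted by the trapped signal: status > 128
second=0
wait "$worker" || second=$?        # waiting again yields the worker's real exit status

trap - USR1
echo "first wait: $first, second wait: $second, got_signal: $got_signal"
```

The exact value of the first status depends on the signal number, so only the "greater than 128" property should be relied on.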

Signal handling in shell scripts can get clumsy. It's pretty much impossible to do it "right" without the support of C.
The problem with:
result="$(sleep 2 ; echo success)" # run some program
is that $() creates a subshell, and in subshells, signals that are not ignored are reset to their default dispositions (trap '' SIGNAL is how you ignore SIGNAL). For SIGINT, the default disposition is to terminate the process. $( ) gets its own process, though it will receive the signal too, because a terminal-generated SIGINT is targeted at the whole process group.
To prevent this, you could do something like:
result="$(
trap '' INT #ignore; could get killed right before the trap command
sleep 2; echo success)"
or
result="$( trap : INT; #no-op handler; same problem
sleep 2; while ! echo success; do :; done)"
but as noted, there will be a small race-condition window between the start of the
subshell and the registration of the signal handler during which
the subshell could get killed by the reset-to-default SIGINT signal.
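The difference between a handler and an ignored signal in a subshell can be checked with a small experiment. This is a sketch under stated assumptions: it uses TERM rather than INT, because some shells give SIGINT extra special handling while a foreground child is running, and probe is a hypothetical helper that makes the command-substitution subshell receive the signal from its own child.

```shell
#!/bin/sh
# Sketch only: probe is a hypothetical helper; TERM stands in for INT.
# The child `sh -c` signals its parent, which is the $(...) subshell.
probe() { sh -c 'kill -TERM "$PPID"'; echo survived; }

trap 'echo handler ran' TERM     # a real handler: reset to default inside $(...)
with_handler=$(probe) || true    # subshell is terminated before it can echo

trap '' TERM                     # ignored: the subshell inherits the ignore
with_ignore=$(probe) || true     # subshell keeps running and echoes

trap - TERM
echo "with handler: '$with_handler', with ignore: '$with_ignore'"
```

This relies on the shell forking a separate process for $( ), which is what bash and dash do.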

The answers from @PSkocik and @WilliamPursell both helped me get on the right track.
I have a fully working solution. It isn't pretty, because it needs an external file to indicate that the signal has not occurred, but besides that it should work reliably.
#!/bin/sh
touch ./continue
trap 'rm -f ./continue' 2
( # the whole main body of the script is in a separate background process
trap '' 2 # ignore SIGINT
while true ; do
result="$(sleep 2 ; echo success)" # run some program
echo "result: '$result'"
echo "Cleaning up..." # clean up temporary files
if [ ! -e ./continue ] ; then # exit the loop if file "./continue" is deleted
echo 'OK, time to stop this.'
break
fi
done
) & # end of the main body of the script
while ! wait ; do : ; done # wait for the background process to end (ignore signals)
wait $! # wait again to get the exit code
result=$? # exit code of the background process
rm -f ./continue # clean up if the background process ended without a signal
exit $result
EDIT: There are some problems with this code in Cygwin.
The main functionality regarding signals works.
However, it seems that the finished background process doesn't stay in the system as a zombie, so wait $! does not work and the script exits with an incorrect code of 127.
One solution is to remove the lines wait $! and result=$?, so the script always returns 0.
It should also be possible to keep the proper exit code by using another layer of subshell and temporarily storing the exit code in a file.
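A minimal sketch of the "store the exit code in a file" idea, assuming mktemp is available; worker is a hypothetical stand-in for the main body of the script and simply fails with status 3:

```shell
#!/bin/sh
# Sketch only: worker is a hypothetical stand-in for the script's main body.
worker() { return 3; }

status_file=$(mktemp)

(
    rc=0
    worker || rc=$?              # run the body and capture its exit code
    echo "$rc" > "$status_file"  # hand the code to the parent via a file
) &
wait                             # does not depend on the exit status reported by wait

result=$(cat "$status_file")
rm -f "$status_file"
echo "background body exited with $result"
```

Because the status travels through the file, it no longer matters whether wait can still report on the finished process.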

For disallowing interrupting the program:
trap "" ERR HUP INT QUIT TERM TSTP TTIN TTOU
But if a sub-command handles traps by itself, and that command must really complete, you need to prevent passing signals to it.
For people on Linux that don't mind installing extra commands, you can just use:
waitFor [command]
Alternatively you can adapt the latest source code of waitFor into your program as needed, or use the code from Gilles' answer. Although that has the disadvantage of not benefiting from updates upstream.
Just mind that other terminals and the service manager can still terminate "command". If you want the service manager to be unable to close "command", it shall be run as a service with the appropriate kill mode and kill signal set.

You may want to adapt the following:
#!/bin/sh
tmpfile=".tmpfile"
rm -f $tmpfile
trap : INT
# put the action that should not be interrupted in the innermost brackets
#         |                                 |
( set -m; (sleep 10; echo success > $tmpfile) & wait ) &
wait # wait will be interrupted by Ctrl+c
while [ ! -r $tmpfile ]; do
echo "waiting for $tmpfile"
sleep 1
done
result=`cat $tmpfile`
echo "result: '$result'"
This seems also to work with programs that install their own SIGINT handler like mpirun and mpiexec and so on.

Introduce timeout in a bash for-loop

I have a task that fits well inside a bash for loop. The problem is that a few of the iterations never seem to terminate. What I'm looking for is a way to introduce a timeout so that if an iteration of the command hasn't finished after, say, two hours, it is terminated and the loop moves on to the next iteration.
Rough outline:
for somecondition; do
while time-run(command) < 2h do
continue command
done
done

One (tedious) way is to start the process in the background, then start another background process that attempts to kill the first one after a fixed timeout.
timeout=7200 # two hours, in seconds
for somecondition; do
command & command_pid=$!
( sleep $timeout & wait; kill $command_pid 2>/dev/null) & sleep_pid=$!
wait $command_pid
kill $sleep_pid 2>/dev/null # If command completes prior to the timeout
done
The wait command blocks until the original command completes, whether naturally or because it was killed after the sleep completes. The wait immediately after sleep is used in case the user tries to interrupt the process, since sleep ignores most signals, but wait is interruptible.
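Where GNU coreutils is available, the timeout utility packages the same idea into a single command and exits with status 124 when the time limit is hit. The sketch below assumes GNU timeout and uses sleep with short durations as a stand-in for the real command (two hours would be written as 2h):

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils `timeout`; sleep stands in for the real command.
results=""
for seconds in 1 3; do
    rc=0
    timeout 2 sleep "$seconds" || rc=$?   # limit each iteration to 2 seconds
    if [ "$rc" -eq 124 ]; then
        outcome="timed-out"               # 124 is timeout's "time limit reached" status
    else
        outcome="done($rc)"
    fi
    results="${results:+$results }$outcome"
done
echo "$results"
```

The first iteration finishes within the limit; the second is cut off and the loop carries on.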

If I'm understanding your requirement properly, you have a process that needs to run, but you want to make sure that if it gets stuck it moves on, right? I don't know if this will fully help you out, but here is something I wrote a while back to do something similar (I've since improved this a bit, but I only have access to a gist at present, I'll update with the better version later).
#!/bin/bash
######################################################
# Program: logGen.sh
# Date Created: 22 Aug 2012
# Description: parses logs in real time into daily error files
# Date Updated: N/A
# Developer: @DarrellFX
######################################################
#Prefix for pid file
pidPrefix="logGen"
#output directory
outDir="/opt/Redacted/logs/allerrors"
#Simple function to see if running on primary
checkPrime ()
{
if /sbin/ifconfig eth0:0|/bin/grep -wq inet;then isPrime=1;else isPrime=0;fi
}
#function to kill previous instances of this script
killScript ()
{
/usr/bin/find /var/run -name "${pidPrefix}.*.pid" |while read pidFile;do
if [[ "${pidFile}" != "/var/run/${pidPrefix}.${$}.pid" ]];then
/bin/kill -- -$(/bin/cat ${pidFile})
/bin/rm ${pidFile}
fi
done
}
#Check to see if primary
#If so, kill any previous instance and start log parsing
#If not, just kill leftover running processes
checkPrime
if [[ "${isPrime}" -eq 1 ]];then
echo "$$" > /var/run/${pidPrefix}.$$.pid
killScript
commands && commands && commands #Where the actual command to run goes.
else
killScript
exit 0
fi
I then set this script to run from cron every hour. Every time the script runs, it:
creates a lock file, named after a variable that describes the script, containing the PID of that instance of the script;
calls the function killScript, which uses the find command to find all lock files for that version of the script (this lets more than one of these scripts be set to run in cron at once, for different tasks). For each file it finds, it kills the processes recorded in that lock file and removes the lock file (it checks that it's not killing itself);
starts doing whatever it is I need to run and not get stuck (I've omitted that, as it's hideous bash string manipulation that I've since redone in Python).
If this doesn't get you squared let me know.
A few notes:
the checkPrime function is poorly done, and should either return a status, or just exit the script itself
there are better ways to create lock files and be safe about it, but this has worked for me thus far (famous last words)
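As one example of a safer lock, mkdir can be used, since creating a directory either succeeds or fails atomically. This is a sketch, not the script's actual mechanism: acquire_lock and the lock path are hypothetical, and the $$ suffix is only there to keep the demo from colliding with other runs (a real lock would use a fixed name).

```shell
#!/bin/sh
# Sketch only: acquire_lock and lock_dir are hypothetical names.
lock_dir="${TMPDIR:-/tmp}/logGen.lock.$$"

acquire_lock() { mkdir "$1" 2>/dev/null; }   # succeeds for exactly one caller

if acquire_lock "$lock_dir"; then first=acquired; else first=busy; fi
if acquire_lock "$lock_dir"; then second=acquired; else second=busy; fi

rmdir "$lock_dir"                            # release the lock
echo "first attempt: $first, second attempt: $second"
```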

start and monitoring a process inside shell script for completion

I have a simple shell script, shown below:
#!/usr/bin/sh
echo "starting the process which is a c++ process which does some database action for around 30 minutes"
#this below process should be run in the background
<binary name> <arg1> <arg2>
exit
Now what I want is to monitor and display the status information of the process.
I don't want to go deep into its functionality. Since I know that the process will complete in 30 minutes, I want to show to the user that 3.3% is completed for every 1 min and also check whether the process is running in the background and finally if the process is completed I want to display that it is completed.
could anybody please help me?

The best thing you could do is to put some kind of instrumentation in your application,
and let it report the actual progress in terms of work items processed / total amount of work.
Failing that, you can indeed refer to the time that the thing has been running.
Here's a sample of what I've used in the past. Works in ksh93 and bash.
#! /bin/ksh
set -u
prog_under_test="sleep"
args_for_prog=30
max=30 interval=1 n=0
main() {
($prog_under_test $args_for_prog) & pid=$! t0=$SECONDS
while is_running $pid; do
sleep $interval
(( delta_t = SECONDS-t0 ))
(( percent=100*delta_t/max ))
report_progress $percent
done
echo
}
is_running() { (kill -0 "${1:?is_running: missing process ID}") 2>/dev/null; }
function report_progress { typeset percent=$1
printf "\r%5.1f %% complete (est.) " $(( percent ))
}
main

If your process involves a pipe, then http://www.ivarch.com/programs/quickref/pv.shtml would be an excellent solution; an alternative is http://clpbar.sourceforge.net/ . But these are essentially like "cat" with a progress bar and need something to pipe through them.
There is also a small program, http://www.dreamincode.net/code/snippet3062.htm , that you could compile, run as a background process, and kill when things finish up. That would probably work if you just want to display something for 30 minutes and then print "almost done" on the console if your process runs long, but you would have to modify it.
It might be better just to create another shell script that displays a character every few seconds in a loop and checks whether the previous process is still running. I believe you can get the PID of a background process from the $! variable (the script's own PID is in $$), and then check whether it still exists under /proc/<pid>.
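A portable way to make that "is it still running" check, without reading /proc, is kill -0, which sends no signal and only reports whether the process can be signalled. A minimal sketch, with sleep standing in for the real background process:

```shell
#!/bin/sh
# Sketch only: sleep stands in for the real background process.
is_running() { kill -0 "$1" 2>/dev/null; }   # signal 0: existence check only

sleep 1 &
pid=$!                                       # PID of the background process

if is_running "$pid"; then before=running; else before=gone; fi
wait "$pid"                                  # let it finish and be reaped
if is_running "$pid"; then after=running; else after=gone; fi

echo "before wait: $before, after wait: $after"
```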

You really should let the command output statistics, but for simplicity's sake you can do something like this to simply increment a counter while your process runs:
#!/bin/sh
cmd & # execute a command
pid=$! # Record the pid of the command
i=0
while sleep 60; do
: $(( i += 1 ))
e=$( echo $i 3.3 \* p | dc ) # compute percent completed
printf "$e percent complete\r" # report completion
done & # reporter is running in the background
pid2=$! # record reporter's pid
# Wait for the original command to finish
if wait $pid; then
echo cmd completed successfully
else
echo cmd failed
fi
kill $pid2 # Kill the status reporter
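If dc is not installed, the percentage can also be computed with plain shell integer arithmetic by working in tenths of a percent. This is a sketch; percent_done and total_minutes are hypothetical names, and the 30-minute total comes from the question.

```shell
#!/bin/sh
# Sketch only: percent_done and total_minutes are hypothetical names.
total_minutes=30

percent_done() {
    tenths=$(( $1 * 1000 / total_minutes ))          # progress in tenths of a percent
    printf '%d.%d\n' $(( tenths / 10 )) $(( tenths % 10 ))
}

after_1=$(percent_done 1)
after_15=$(percent_done 15)
after_30=$(percent_done 30)
echo "$after_1 $after_15 $after_30"
```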

Terminate running commands when shell script is killed

For testing purposes I have this shell script
#!/bin/bash
echo $$
find / >/dev/null 2>&1
Running this from an interactive terminal, ctrl+c will terminate bash, and the find command.
$ ./test-k.sh
13227
<Ctrl+C>
$ ps -ef |grep find
$
Running it in the background, and killing the shell only will orphan the commands running in the script.
$ ./test-k.sh &
[1] 13231
13231
$ kill 13231
$ ps -ef |grep find
nos 13232 1 3 17:09 pts/5 00:00:00 find /
$
I want this shell script to terminate all its child processes when it exits regardless of how it's called. It'll eventually be started from a python and java application - and some form of cleanup is needed when the script exits - any options I should look into or any way to rewrite the script to clean itself up on exit?

I would do something like this:
#!/bin/bash
trap : SIGTERM SIGINT
echo $$
find / >/dev/null 2>&1 &
FIND_PID=$!
wait $FIND_PID
if [[ $? -gt 128 ]]
then
kill $FIND_PID
fi
Some explanation is in order, I guess. Out the gate, we need to change some of the default signal handling. : is a no-op command, since passing an empty string causes the shell to ignore the signal instead of doing something about it (the opposite of what we want to do).
Then, the find command is run in the background (from the script's perspective) and we call the wait builtin for it to finish. Since we gave a real command to trap above, when a signal is handled, wait will exit with a status greater than 128. If the process waited for completes, wait will return the exit status of that process.
Last, if the wait returns that error status, we want to kill the child process. Luckily we saved its PID. The advantage of this approach is that you can log some error message or otherwise identify that a signal caused the script to exit.
As others have mentioned, putting kill -- -$$ as your argument to trap is another option if you don't care about leaving any information around post-exit.
For trap to work the way you want, you do need to pair it up with wait - the bash man page says "If bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes." wait is the way around this hiccup.
You can extend it to more child processes if you want, as well. I didn't really exhaustively test this one out, but it seems to work here.
$ ./test-k.sh &
[1] 12810
12810
$ kill 12810
$ ps -ef | grep find
$

Was looking for an elegant solution to this issue and found the following solution elsewhere.
trap 'kill -HUP 0' EXIT
My own man pages say nothing about what 0 means, but from digging around, it seems to mean the current process group. Since the script gets its own process group, this ends up sending SIGHUP to all the script's children, foreground and background.

Send a signal to the group.
So instead of kill 13231 do:
kill -- -13231
If you're starting from python then have a look at:
http://www.pixelbeat.org/libs/subProcess.py
which shows how to mimic the shell in starting
and killing a group

@Patrick's answer almost did the trick, but it doesn't work if the parent process of your current shell is in the same group (it kills the parent too).
I found this to be better:
trap 'pkill -P $$' EXIT
See here for more info.

Just add a line like this to your script:
trap "kill $$" SIGINT
You might need to change 'SIGINT' to 'INT' on your setup, but this will basically kill your process and all child processes when you hit Ctrl-C.

The thing you would need to do is trap the kill signal, kill the find command and exit.
