How can a sequence of commands in a shell script be executed without interruption by any other processes?
You mean, without being preempted? No way. The kernel scheduler is free to choose which task to execute at any time.
However, on Linux, you can set a "real-time" priority (i.e., SCHED_FIFO or SCHED_RR) to be sure that the script won't be interrupted to execute lower-priority tasks.
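For example (a sketch, not part of the original answer), the chrt utility from util-linux can start the script under SCHED_FIFO; this normally requires root or the CAP_SYS_NICE capability:
sudo chrt -f 10 ./myscript.sh   # run myscript.sh with SCHED_FIFO, real-time priority 10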
If you are asking how to make a shell process ignore interrupts, or other signals, the answer is via the trap command.
trap "" 2
or:
trap "" INT
To cancel that behaviour, omit the action string entirely (trap - INT is the portable spelling):
trap 2
trap INT
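Putting that together, a minimal sketch (the command names are placeholders) of a script that runs a sequence of commands while ignoring Ctrl+C, then restores the default handling:
#!/bin/sh
trap "" INT         # ignore Ctrl+C; child processes inherit the ignore
critical_step_one   # placeholder commands for the uninterruptible section
critical_step_two
trap - INT          # restore default SIGINT handling for the rest of the script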
If you need to remove temporary files on interrupt (and related signals), you can use something like:
tmp=$(mktemp ${TMPDIR:-/tmp}/name.XXXXXX)
trap "rm -f $tmp; exit 1" 0 1 2 3 13 15 # EXIT HUP INT QUIT PIPE TERM
...do operations using temporary files...
rm -f $tmp
trap 0 # Cancel the exit trap
The set of signals shown is a pretty comprehensive set, covering most normal events. If you get sent SIGKILL (kill -9), the temporary file will be left around; there is nothing you can do about that. The mktemp command creates the file safely (see Why do we need mktemp?), but there is still a tiny window between when the file is created and when the trap is set, during which the script could be interrupted and leave the temporary file lying around.
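One way to shrink that window, shown here as a sketch rather than a drop-in replacement for the snippet above, is to install the trap before creating the file:
tmp=
trap '[ -n "$tmp" ] && rm -f "$tmp"; exit 1' 0 1 2 3 13 15  # EXIT HUP INT QUIT PIPE TERM
tmp=$(mktemp "${TMPDIR:-/tmp}/name.XXXXXX") || exit 1
...do operations using temporary files...
rm -f "$tmp"
trap 0 # Cancel the exit trap
The single quotes matter here: $tmp is expanded when the trap fires rather than when it is set, so the trap can be installed while $tmp is still empty.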
I have this code:
#!/bin/bash
pids=()
for i in $(seq 1 999); do
    sleep 1 &
    pids+=( "$!" )
done
for pid in "${pids[@]}"; do
    wait "$pid"
done
I expect the following behavior:
spin through the first loop
wait about a second on the first pid
spin through the second loop
Instead, I get this error:
./foo.sh: line 8: wait: pid 24752 is not a child of this shell
(repeated 171 times with different pids)
If I run the script with a shorter loop (50 iterations instead of 999), I get no errors.
What's going on?
Edit: I am using GNU bash 4.4.23 on Windows.
POSIX says:
The implementation need not retain more than the {CHILD_MAX} most recent entries in its list of known process IDs in the current shell execution environment.
{CHILD_MAX} here refers to the maximum number of simultaneous processes allowed per user. You can get the value of this limit using the getconf utility:
$ getconf CHILD_MAX
13195
Bash stores the statuses of at most twice that many exited background processes in a circular buffer, and says not a child of this shell when you call wait on the PID of an old one that has been overwritten. You can see how it's implemented here.
The way you might reasonably expect this to work, as it would if you wrote a similar program in most other languages, is:
sleep is executed in the background via a fork+exec.
At some point, sleep exits leaving behind a zombie.
That zombie remains in place, holding its PID, until its parent calls wait to retrieve its exit code.
However, shells such as bash actually do this a little differently. They proactively reap their zombie children and store their exit codes in memory so that they can deallocate the system resources those processes were using. Then when you wait the shell just hands you whatever value is stored in memory, but the zombie could be long gone by then.
Now, because all of these exit statuses are being stored in memory, there is a practical limit to how many background processes can exit without you calling wait before you've filled up all the memory you have available for this in the shell. I expect that you're hitting this limit somewhere in the several hundreds of processes in your environment, while other users manage to make it into the several thousands in theirs. Regardless, the outcome is the same - eventually there's nowhere to store information about your children and so that information is lost.
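A sketch of one way to sidestep the saved-status buffer entirely, assuming you only care that all the jobs have finished rather than about each individual exit status: call wait with no arguments, which blocks until every background job has exited and never needs to look up a saved status by PID:
#!/bin/bash
for i in $(seq 1 999); do
    sleep 1 &
done
wait   # waits for all remaining background jobs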
I can reproduce this on ArchLinux with docker run -ti --rm bash:5.0.18 bash -c 'pids=; for ((i=1;i<550;++i)); do true & pids+=" $!"; done; wait $pids' and with any earlier version. I can't reproduce it with bash:5.1.0.
What's going on?
It looks like a bug in your version of Bash. There were a couple of improvements in jobs.c and wait.def in Bash 5.1, and "Make sure SIGCHLD is blocked in all cases where waitchld() is not called from a signal handler" is mentioned in the changelog. From the look of it, it's an issue with handling a SIGCHLD signal while already handling another SIGCHLD signal.
Minimized test case for the problem:
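The original minimized test case is not reproduced here; as a rough stand-in in the same spirit as the docker command above (a sketch, not the author's actual reproduction), something like this exercises the same rapid-SIGCHLD/wait path:
#!/bin/bash
# spawn many short-lived children so SIGCHLD deliveries pile up,
# then wait on each saved PID
pids=
for ((i = 1; i < 550; ++i)); do
    true &
    pids+=" $!"
done
wait $pids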
I have the following Makefile:
test:
	bash test.sh || true
	echo OK
and the test.sh contains
#!/bin/bash
while read -p "Enter some text or press Ctrl+C to exit > " input
do
echo "Your input was: $input"
done
When I run make test and press Ctrl+C to exit the bash read loop, make will emit
Makefile:2: recipe for target 'test' failed
make: *** [test] Interrupt
How can I tell make to ignore the exit status of the script? I already have || true after the script, which is usually enough to get make to keep going, but for some reason the SIGINT interrupting the read causes make to behave differently in this case.
I'm looking for a generic answer that works for processes other than while read loop in bash, too.
This has nothing to do with the exit status of the script. When you press ^C you're sending an interrupt signal to the make program, not just to your script. That causes the make program to stop, just like ^C always does.
There's no way to have make ignore ^C operations; whenever you press ^C at the terminal, make will stop.
ctrl+c sends a signal (SIGINT) to the foreground program to tell it to stop. What you want is ctrl+d, which sends an EOT (end of transmission) character that the terminal treats as end of input, so read simply fails and the loop ends cleanly. You will need to press ctrl+d twice unless you are at the beginning of a line.
some text<c-d><c-d>
or
some text<return>
<c-d>
I found a way to make this work. It's a bit tricky so I'll explain the solution first. The important thing to understand is that Ctrl+C is handled by your terminal and not by the currently running process in the terminal, as I previously thought. When the terminal catches your Ctrl+C it will check the foreground process group and then send SIGINT to all processes in that group immediately. When you run something via a Makefile and press Ctrl+C, the SIGINT is sent immediately to make and to all the processes it started, because they all belong to the foreground process group. GNU Make handles SIGINT by waiting for the currently executing child process to stop and then exiting with the message
Makefile:<line number>: recipe for target '<target>' failed
make: *** [<target>] Interrupt
if the child exited with a non-zero exit status. If the child handled the SIGINT by itself and exited with status 0, GNU Make exits silently. Many programs exit with status code 130 on SIGINT, but this is not required. In addition, the kernel and the wait() C API can differentiate between a plain exit with status 130 and actually being killed by SIGINT, so if make wanted to behave differently in these cases it could, regardless of the exit code. bash, however, doesn't let you test whether a child died of SIGINT; it only exposes exit status codes.
The solution is to set up the processes so that the foreground process group does not include GNU Make while you want to handle Ctrl+C specially. However, as POSIX doesn't define a tool for creating process groups, we have to use a bash-specific trick: enable bash job control so that bash creates a new process group. Be warned that this causes some side effects (e.g. stdin and stdout behave slightly differently), but at least for my case it was good enough.
Here's how to do it:
I have the following Makefile (as usual, the recipe lines must be indented with a TAB instead of spaces):
test:
	bash -c 'set -m; bash ./test.sh'
	echo OK
and the test.sh contains
#!/bin/bash
int_handler()
{
    printf "\nReceived SIGINT, quitting...\n" 1>&2
    exit 0
}
trap int_handler INT
while read -p "Enter some text or press Ctrl+C to exit > " input
do
    echo "Your input was: $input"
done
The set -m causes bash to run the child in a new process group, which becomes the foreground process group, and int_handler takes care of returning a successful exit code on exit. Of course, if you want some exit code other than zero on Ctrl+C, feel free to use any suitable value. If you want something shorter, the child script only needs trap 'exit 0' INT instead of a separate function and the setup for it, as shown below.
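That shorter variant of test.sh would look like this:
#!/bin/bash
trap 'exit 0' INT
while read -p "Enter some text or press Ctrl+C to exit > " input
do
    echo "Your input was: $input"
done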
For additional information:
https://unix.stackexchange.com/a/99134/20336
https://unix.stackexchange.com/a/386856/20336
https://stackoverflow.com/a/18479195/334451
https://www.cons.org/cracauer/sigint.html
Say I have a bash script like this:
#!/bin/bash
exec-program zero
exec-program one
The script issues a run command to exec-program with the arg "zero", right? Say, for instance, that the first command is currently running. I know that Ctrl-C will halt the process and stop the rest of the script from executing.
Instead, is there a keypress that will let the currently running command finish and then stop the script (i.e. not execute "exec-program one"), without modifying the script directly? In this example it would let "exec-program zero" run to completion and then return to the shell, rather than halting "exec-program zero" immediately.
TL;DR: something at runtime similar to Ctrl-C, but lazier/more graceful?
In the man page, under the SIGNALS section, it reads:
If bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes.
This is exactly what you're asking for. You need to set a trap on SIGINT that exits, then run exec-program in a subshell where SIGINT is ignored, so that it inherits the SIG_IGN handler and Ctrl+C won't kill it. Below is an implementation of this concept.
#!/bin/bash -
trap exit INT
foo() (
    trap '' INT
    exec "$@"
)
foo sleep 5
echo alive
If you hit Ctrl+C while sleep 5 is running, bash will wait for it to complete and then exit; you will not see alive on the terminal.
exec is for avoiding another fork() btw.
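Applied to the script from the question, the same pattern might look like this (a sketch; exec-program is the hypothetical command from the question):
#!/bin/bash -
trap exit INT      # on Ctrl+C, exit once the current command finishes
run() (
    trap '' INT    # the child ignores SIGINT, so Ctrl+C doesn't kill it
    exec "$@"
)
run exec-program zero
run exec-program one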
/bin/sh -version
GNU sh, version 1.14.7(1)
exitfn () {
    # Restore signal handling for SIGINT
    echo "exiting with trap" >> /tmp/logfile
    rm -f /var/run/lockfile.pid # Growl at user,
    exit                        # then exit script.
}
trap 'exitfn; exit' SIGINT SIGQUIT SIGTERM SIGKILL SIGHUP
The above is my function in a shell script.
I want to call it in some special conditions...like
when:
"kill -9" fires on pid of this script
"ctrl + z" press while it is running on -x mode
server reboots while script is executing ..
In short, with any kind of interrupt in script, should do some action
eg. rm -f /var/run/lockfile.pid
but my above function is not working properly; it works only for terminal close or "ctrl + c"
Kindly don't suggest to upgrade "bash / sh" version.
SIGKILL cannot be trapped by the trap command, or by any process. It is a guaranteed kill signal which by definition cannot be caught. Thus upgrading your sh/bash will not help anyway.
You can't trap kill -9; that's the whole point of it: to violently destroy processes that don't respond to other signals (there's a workaround for this, see below).
The server reboot should first deliver a signal to your script, which should be caught by what you have.
As to CTRL-Z, that also gives you a signal, SIGTSTP (terminal stop), so you may want to add that. Though that wouldn't normally be a reason to shut down your process, since it may then be put into the background and resumed (with bg).
As to what to do for those situations where your process dies without a catchable signal (like the -9 case), the program should check for that on startup.
By that, I mean lockfile.pid should store the actual PID of the process that created it (by using echo $$ >/var/run/myprog_lockfile.pid for example) and, if you try to start your program, it should check for the existence of that process.
If the process doesn't exist, or it exists but isn't the right one (based on name usually), your new process should delete the pidfile and carry on as if it was never there. If the old process both exists and is the right one, your new process should log a message and exit.
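A minimal sketch of that startup check (the pidfile path and the myprog name are placeholders; a real script should also think about locking to avoid races):
#!/bin/bash
pidfile=/var/run/myprog_lockfile.pid
if [ -f "$pidfile" ]; then
    oldpid=$(cat "$pidfile")
    # is that PID still alive, and does it still look like our program?
    if kill -0 "$oldpid" 2>/dev/null && ps -p "$oldpid" -o comm= | grep -q myprog; then
        echo "myprog is already running as PID $oldpid" >&2
        exit 1
    fi
    rm -f "$pidfile"   # stale pidfile: the old process died without cleaning up
fi
echo $$ > "$pidfile"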
I've seen monitoring programs, either shell scripts that periodically check process status using ps or service status (on Linux), or C/C++ programs that fork and wait on the process...
I wonder if it is possible to use bash with trap and restart the sub-process when SIGCHLD is received?
I have tested a basic script on RedHat Linux with the following idea (and it certainly didn't work...)
#!/bin/bash
set -o monitor # can someone explain this? discussion on Internet say this is needed
trap startProcess SIGCHLD
startProcess() {
    /path/to/another/bash/script.sh & # the one to restart
    while [ 1 ]
    do
        sleep 60
    done
}
startProcess
The bash script being started just sleeps for a few seconds and exits, for now.
several issues observed:
when the shell starts in the foreground, SIGCHLD is handled only once. Does trap reset the signal handling, like signal()?
the script and its child seem to be immune to SIGINT, which means they cannot be stopped by ^C
since it cannot be stopped with ^C, I closed the terminal. The script seems to get SIGHUP and many zombie children are left behind.
when run in the background, the script caused the terminal to die
... anyway, this does not work at all. I have to say I know too little about this topic.
Can someone suggest or give some working examples?
Are there scripts for such use?
How about using wait in bash, then?
Thanks
I can try to answer some of your questions but not all, based on what I know.
The line set -o monitor (or equivalently, set -m) turns on job control, which is only on by default for interactive shells. This seems to be required for SIGCHLD to be sent. However, job control is more of an interactive feature and not really meant to be used in shell scripts (see also this question).
Also keep in mind this is probably not what you intended to do, because once you enable job control, SIGCHLD will be sent for every external command that exits (e.g. every time you run ls or grep or anything, a SIGCHLD will fire when that command completes and your trap will run).
I suspect the reason the SIGCHLD trap only appears to run once is because your trap handler contains a foreground infinite loop, so your script gets stuck in the trap handler. There doesn't seem to be a point to that loop anyways, so you could simply remove it.
The script's "immunity" to SIGINT seems to be an effect of enabling
job control (the monitor part). My hunch is with job control turned on,
the sub-instance of bash that runs your script no longer terminates
itself in response to a SIGINT but instead passes the SIGINT through to
its foreground child process. In your script, the ^C i.e. SIGINT
simply acts like a continue statement in other programming languages
case, since SIGINT will just kill the currently running sleep 60,
whereupon the while loop will immediately run a new sleep 60.
When I tried running your script and then killing it (from another terminal), all I ended up with were two stray sleep processes.
Backgrounding that script also kills my shell for me, although the behavior is not terribly consistent (sometimes it happens immediately, other times not at all). It seems typing any keys other than enter causes an EOF to get sent somehow. Even after the terminal exits, the script continues to run in the background. I have no idea what is going on here.
Being more specific about what you want to accomplish would help. If you just want a command to run continuously for the lifetime of your script, you could run an infinite loop in the background, like
while true; do
    some-command
    echo some-command finished
    echo restarting some-command ...
done &
Note the & after the done.
For other tasks, wait is probably a better idea than using job control in a shell script. Again, it would depend on what exactly you are trying to do.
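For example, a minimal wait-based supervisor, as a sketch that reuses the /path/to/another/bash/script.sh from the question, could look like:
#!/bin/bash
# restart the child whenever it exits; no job control or SIGCHLD trap needed
while true; do
    /path/to/another/bash/script.sh &
    wait $!                 # block until this particular child exits
    echo "child exited with status $?, restarting..." >&2
    sleep 1                 # avoid a tight restart loop if the child keeps crashing
done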