Un*x shell script: what is the correct way to run a script for at most x milliseconds? - bash

I'm not a scripting expert and I was wondering what was an acceptable way to run a script for at most x milliseconds (and yet finish before x milliseconds if the script is done before the timeout).
I solved that problem using Bash in a way that I think is very hacky and I wonder if there's a better way to do it.
Basically I've got one shell script called sleep_kill.sh that takes a PID as the first argument and a timeout as its second argument and that does this:
sleep $2
kill -9 $1 2> /dev/null 1> /dev/null
So if the PID corresponds to a script that finishes before timing out, nothing is going to be killed (I assume the OS will not have had time to reuse this PID for another [unrelated] process, since it cycles through all the process IDs before starting to reuse them).
Anyway, then I call my script that may "hang" or timeout:
command_that_may_hang.sh &
PID=$!
sleep_kill.sh $PID .3 &
wait $PID > /dev/null 2>&1
And I'll be waiting at most 300 ms for command_that_may_hang.sh. Yet if command_that_may_hang.sh took only 10 ms to execute, I won't be "stuck" for 300 ms.
It would be great if some shell expert could explain the drawbacks of this approach and what should be done instead.

Have a look at this script: http://www.pixelbeat.org/scripts/timeout
Note that timeouts of less than one second are pretty much nonsensical on most systems due to scheduling delays etc. Note also that newer coreutils has the timeout command included and it has a resolution of 1 second.
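For comparison, here is a minimal sketch of the watchdog idea from the question folded into one bash function. run_with_timeout is a hypothetical helper name, not a standard command. It assumes a sleep that accepts fractional seconds (as the question's .3 does), and it cancels the watchdog as soon as the command finishes, so no kill is sent to a PID that may have been reused.

```shell
#!/usr/bin/env bash
# run_with_timeout LIMIT COMMAND [ARGS...]   (hypothetical helper, not a standard tool)
# Returns COMMAND's exit status, or 137 (128 + SIGKILL) if the watchdog killed it.
run_with_timeout() {
    local limit=$1
    shift
    "$@" &
    local cmd_pid=$!
    # Watchdog: sleep, then kill the command if it is still running.
    ( sleep "$limit"; kill -9 "$cmd_pid" ) >/dev/null 2>&1 &
    local dog_pid=$!
    local status=0
    wait "$cmd_pid" 2>/dev/null || status=$?
    # Cancel the watchdog so it cannot signal a recycled PID later.
    kill "$dog_pid" 2>/dev/null || true
    wait "$dog_pid" 2>/dev/null || true
    return "$status"
}
```

Usage would look like run_with_timeout .3 ./command_that_may_hang.sh, followed by a check of $? to tell a normal exit from a kill.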


Why does bash "forget" about my background processes?

I have this code:
for i in $(seq 1 999); do
    sleep 1 &
    pids+=( "$!" )
done
for pid in "${pids[@]}"; do
    wait "$pid"
done
I expect the following behavior:
spin through the first loop
wait about a second on the first pid
spin through the second loop
Instead, I get this error:
./foo.sh: line 8: wait: pid 24752 is not a child of this shell
(repeated 171 times with different pids)
If I run the script with shorter loop (50 instead of 999), then I get no errors.
What's going on?
Edit: I am using GNU bash 4.4.23 on Windows.
POSIX says:
The implementation need not retain more than the {CHILD_MAX} most recent entries in its list of known process IDs in the current shell execution environment.
{CHILD_MAX} here refers to the maximum number of simultaneous processes allowed per user. You can get the value of this limit using the getconf utility:
$ getconf CHILD_MAX
Bash stores the statuses of at most twice that many exited background processes in a circular buffer, and says not a child of this shell when you call wait on the PID of an old one that's been overwritten. You can see how it's implemented here.
The way you might reasonably expect this to work, as it would if you wrote a similar program in most other languages, is:
sleep is executed in the background via a fork+exec.
At some point, sleep exits leaving behind a zombie.
That zombie remains in place, holding its PID, until its parent calls wait to retrieve its exit code.
However, shells such as bash actually do this a little differently. They proactively reap their zombie children and store their exit codes in memory so that they can deallocate the system resources those processes were using. Then when you wait the shell just hands you whatever value is stored in memory, but the zombie could be long gone by then.
Now, because all of these exit statuses are being stored in memory, there is a practical limit to how many background processes can exit without you calling wait before you've filled up all the memory you have available for this in the shell. I expect that you're hitting this limit somewhere in the several hundreds of processes in your environment, while other users manage to make it into the several thousands in theirs. Regardless, the outcome is the same - eventually there's nowhere to store information about your children and so that information is lost.
I can reproduce on ArchLinux with docker run -ti --rm bash:5.0.18 bash -c 'pids=; for ((i=1;i<550;++i)); do true & pids+=" $!"; done; wait $pids' and any earlier. I can't reproduce with bash:5.1.0 .
What's going on?
It looks like a bug in your version of Bash. There were a couple of improvements in jobs.c and wait.def in Bash 5.1, and "Make sure SIGCHLD is blocked in all cases where waitchld() is not called from a signal handler" is mentioned in the changelog. From the look of it, the issue lies in handling a SIGCHLD signal while already handling another SIGCHLD signal.
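Whatever the root cause, one way to stay clear of the limit is to call wait in small batches, so the shell never has to remember many exited children at once. A minimal sketch, assuming a batch size of 10 (an illustrative value, not a documented threshold) and using the hypothetical helper name reap_batch:

```shell
#!/usr/bin/env bash
# Wait in small batches so few exit statuses are ever pending at once.
batch_size=10   # illustrative value; keep it well below `getconf CHILD_MAX`
pids=()
failures=0

# Collect the status of every PID in the current batch, then empty the batch.
reap_batch() {
    local pid
    for pid in "${pids[@]}"; do
        wait "$pid" || failures=$((failures + 1))
    done
    pids=()
}

for i in $(seq 1 30); do
    sleep 0.1 &
    pids+=("$!")
    if (( ${#pids[@]} >= batch_size )); then
        reap_batch
    fi
done
if (( ${#pids[@]} > 0 )); then reap_batch; fi
echo "failures: $failures"
```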

How to create an anonymous pipe between 2 child processes and know their pids (while not using files/named pipes)?

Please note that this question was edited after a couple of comments I received. Initially I wanted to split my goal into smaller pieces to make it simpler (and perhaps expand my knowledge on various fronts), but it seems I went too far with the simplicity :). So, here I am asking the big question.
Using bash, is there a way one can actually create an anonymous pipe between two child processes and know their pids?
The reason I'm asking is when you use the classic pipeline, e.g.
cmd1 | cmd2 &
you lose the ability to send signals to cmd1. In my case the actual commands I am running are these
./my_web_server | ./my_log_parser &
my_web_server is a basic web server that dumps a lot of logging information to its stdout
my_log_parser is a log parser that I wrote that reads through all the logging information it receives from my_web_server and it basically selects only certain values from the log (in reality it actually stores the whole log as it received it, but additionally it creates an extra csv file with the values it finds).
The issue I am having is that my_web_server actually never stops by itself (it is a web server, you don't want that from a web server :)). So after I am done, I need to stop it myself. I would like for the bash script to do this when I stop it (the bash script), either via SIGINT or SIGTERM.
For something like this, traps are the way to go. In essence I would create a trap for INT and TERM and the function it would call would kill my_web_server, but... I don't have the pid and even though I know I could look for it via ps, I am looking for a pretty solution :).
Some of you might say: "Well, why don't you just kill my_log_parser and let my_web_server die on its own with SIGPIPE?". The reason why I don't want to kill it is when you kill a process that's at the end of the pipeline, the output buffer of the process before it, is not flushed. Ergo, you lose stuff.
I've seen several solutions here and in other places that suggested to store the pid of my_web_server in a file. This is a solution that works. It is possible to write the pipeline by fiddling with the filedescriptors a bit. I, however don't like this solution, because I have to generate files. I don't like the idea of creating arbitrary files just to store a 5-character PID :).
What I ended up doing for now is this:
trap " " HUP
fifo="$( mktemp -u "$( basename "${0}" ).XXXXXX" )"
mkfifo "${fifo}"
<"${fifo}" ./my_log_parser &
parser_pid="$!"
>"${fifo}" ./my_web_server &
server_pid="$!"
rm "${fifo}"
trap '2>/dev/null kill -TERM '"${server_pid}"'' INT TERM
while true; do
    wait "${parser_pid}" && break
done
This solves the issue with me not being able to terminate my_web_server when the script receives SIGINT or SIGTERM. It seems more readable than any hackery fiddling with file descriptors in order to eventually use a file to store my_web_server's pid, which I think is good, because it improves the readability.
But it still uses a file (named pipe). Even though I know it uses the file (named pipe) for my_web_server and my_log_parser to talk (which is a pretty good reason) and the file gets wiped from the disk very shortly after it's created, it's still a file :).
Would any of you guys know of a way to do this task without using any files (named pipes)?
From the Bash man pages:
! Expands to the process ID of the most recently executed back-
ground (asynchronous) command.
You are not running a background command, you are running process substitution to read to file descriptor 3.
The following works, but I'm not sure if it is what you are trying to achieve:
sleep 120 &
child_pid="$!"
wait "${child_pid}"
sleep 120
Comment was: I know I can pretty much do this the silly 'while read i; do blah blah; done < <( ./my_proxy_server )'-way, but I don't particularly like the fact that when a script using this approach receives INT or TERM, it simply dies without telling ./my_proxy_server to bugger off too :)
So, it seems like your problem stems from the fact that it is not so easy to get the PID of the proxy server. So, how about using your own named pipe, with the trap command:
# "$pipe" is assumed to hold the path chosen for the named pipe
mkfifo "$pipe"
./my_proxy_server > "$pipe" &
child_pid=$!
echo "child pid is $child_pid"
# Tell the proxy server to bugger-off
trap 'kill $child_pid' INT TERM
while read
do
    echo $REPLY
    # blah blah blah
done < "$pipe"
rm "$pipe"
You could probably also use kill %1 instead of using $child_pid.
YAE (Yet Another Edit):
You ask how to get the PIDS from:
./my_web_server | ./my_log_parser &
Simples, sort of. To test I used sleep, just like your original.
sleep 400 | sleep 500 &
jobs -l
[1]+ 8419 Running sleep 400
8420 Running | sleep 500 &
So it's just a question of extracting those PIDS:
pid1=$(jobs -l|awk 'NR==1{print $2}')
pid2=$(jobs -l|awk 'NR==2{print $1}')
I hate calling awk twice here, but anything else is just jumping through hoops.
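If the double awk call bothers you, both PIDs can be pulled out in one pass. The sketch below feeds awk the sample listing from this answer rather than a live jobs -l call, so it only illustrates the parsing; the variable name sample_jobs is made up for the demo, and the column layout is assumed to match the listing shown here.

```shell
#!/usr/bin/env bash
# Sample text copied from the `jobs -l` listing above; in a real script,
# replace the printf with a live `jobs -l` call.
sample_jobs='[1]+ 8419 Running sleep 400
8420 Running | sleep 500 &'

# One awk pass prints both PIDs on a single line; read splits them.
read -r pid1 pid2 < <(
    printf '%s\n' "$sample_jobs" |
        awk 'NR==1 {printf "%s ", $2} NR==2 {print $1}'
)
echo "pid1=$pid1 pid2=$pid2"   # prints: pid1=8419 pid2=8420
```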

How to make bash interpreter stop until a command is finished?

I have a bash script with a loop that calls a hard calculation routine every iteration. I use the results from every calculation as input to the next. I need to make bash stop reading the script until every calculation is finished.
for i in $(cat calculation-list.txt)
do
    ./calculation
    (other commands)
done
I know the sleep program, and I used to use it, but now the time of the calculations varies greatly.
Thanks for any help you can give.
The "./calculation" is another program, and a subprocess is opened. Then the script passes instantly to next step, but I get an error in the calculation because the last is not finished yet.
If your calculation daemon will work with a precreated empty logfile, then the inotify-tools package might serve:
touch $logfile
inotifywait -qqe close $logfile & ipid=$!
./calculation
wait $ipid
(edit: stripped a stray semicolon)
if it closes the file just once.
If it's doing an open/write/close loop, perhaps you can mod the daemon process to wrap some other filesystem event around the execution?
# Uglier, but handles logfile being closed multiple times before exit:
# Have the ./calculation start this shell script, perhaps by substituting
# this for the program it's starting
trap 'echo >closed-on-calculation-exit' 0 1 2 3 15
Well, guys, I've solved my problem with a different approach. When the calculation is finished a logfile is created. I wrote then a simple until loop with a sleep command. Although this is very ugly, it works for me and it's enough.
for i in $(cat calculation-list.txt)
do
    (calculations routine)
    until [[ -f $logfile ]]; do
        sleep 60
    done
    (other commands)
done
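A polling loop like this can hang forever if the logfile never appears, so it may be worth bounding the number of attempts. The following is only a sketch: wait_for_file is a hypothetical helper, and the short intervals and the stand-in job that writes the logfile exist only to make the demo quick.

```shell
#!/usr/bin/env bash
# wait_for_file PATH MAX_TRIES INTERVAL   (hypothetical helper)
# Polls until PATH exists; returns 1 if it has not appeared after MAX_TRIES sleeps.
wait_for_file() {
    local path=$1 tries=$2 interval=$3
    until [[ -f $path ]]; do
        (( tries-- > 0 )) || return 1
        sleep "$interval"
    done
    return 0
}

# Demo: a stand-in "calculation" writes its logfile after a short delay.
workdir=$(mktemp -d)
logfile=$workdir/calculation.log
( sleep 0.3; : > "$logfile" ) &
if wait_for_file "$logfile" 50 0.1; then
    result="finished"
else
    result="timed out"
fi
wait
rm -r "$workdir"
echo "$result"
```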
Easy. Get the process ID (PID) via some awk magic and then use wait to wait for that PID to end. Here are the details on wait from the advanced Bash scripting guide:
Suspend script execution until all jobs running in background have
terminated, or until the job number or process ID specified as an
option terminates. Returns the exit status of waited-for command.
You may use the wait command to prevent a script from exiting before a
background job finishes executing (this would create a dreaded orphan process).
And using it within your code should work like this:
for i in $(cat calculation-list.txt)
do
    ./calculation >/dev/null 2>&1 & CALCULATION_PID=(`jobs -l | awk '{print $2}'`);
    wait ${CALCULATION_PID}
    (other commands)
done
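The PID can also be taken straight from $!, which avoids parsing jobs output. A minimal sketch, where the calculation function is a stand-in for the real ./calculation program and the exit codes are invented for the demo:

```shell
#!/usr/bin/env bash
# Stand-in for ./calculation: takes a moment, then exits with the given status.
calculation() {
    sleep 0.1
    return "$1"
}

statuses=()
for code in 0 3 0; do
    calculation "$code" >/dev/null 2>&1 &
    calc_pid=$!            # PID of the job just started; no jobs/awk parsing needed
    rc=0
    wait "$calc_pid" || rc=$?
    statuses+=("$rc")
    # (other commands) would run here, after the calculation has finished
done
echo "${statuses[*]}"      # prints: 0 3 0
```

wait returns the exit status of the job it waited for, so the loop can also react to a failed calculation before starting the next one.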

Is there a way to create a bash script that will only run for X hours?

Is there a way to create a bash script that will only run for X hours? I'm currently setting up a cron job to initiate a script every night. This script essentially runs until a certain condition is met, exporting its status to a holding variable to keep track of 'where it is' after each iteration. The intention is to start-up the process every night, run for a few hours, and then stop, holding the status until the process starts up the next night.
Short of somehow collecting the start time, and checking it against the current time in each iteration of the loop, is there an easier way to do this? Bash scripting is not my forte (I know enough to get things done and be dangerous) and I have not done something like this before. Any help would be appreciated. Thanks.
Use GNU Coreutils
GNU coreutils contains an actual timeout binary, usually invoked like this:
# timeout after 5 seconds when sleeping for 30
/usr/bin/timeout 5s /bin/sleep 30
In your case, you'd want to specify hours instead of seconds, so to timeout in 2 hours use something like 2h instead of 5s. See timeout(1) or info coreutils 'timeout invocation' for additional options.
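A script can tell a timeout apart from a normal exit by checking the status: GNU timeout exits with 124 when the limit is reached. A small sketch, assuming GNU coreutils timeout is installed and using a short limit so the demo finishes quickly:

```shell
#!/usr/bin/env bash
# GNU timeout exits with status 124 when the time limit is reached.
status=0
timeout 1s sleep 5 || status=$?
if [ "$status" -eq 124 ]; then
    outcome="timed out"
else
    outcome="exited with status $status"
fi
echo "$outcome"
```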
Hacks and Workarounds
Native timeouts or the GNU timeout command are really the best options. However, see the following for some ideas if you decide to roll your own:
How do I run a command, and have it abort (timeout) after N seconds?
The TMOUT variable using read and process or command substitution.
Do it as you described - it is the cleanest way.
But if for some strange reason you want to kill the process after a set time, you can use the following:
./long_runner &
(sleep 5; kill $!; wait; exit 0) &
will kill the long_runner after 5 secs.
By using the SIGALRM facility you can rig a signal to be sent after a certain time, but traditionally, this was not easily accessible from shell scripts (people would write small custom C or Perl programs for this). These days, GNU coreutils ships with a timeout command which does this by wrapping your command:
timeout 4h yourprogram
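If the loop should finish its current iteration and stop cleanly rather than be killed, the check-the-clock approach from the question is short in bash thanks to the SECONDS variable. A minimal sketch: run_for is a hypothetical function name, and the sleep stands in for one iteration of real work.

```shell
#!/usr/bin/env bash
# run_for LIMIT_SECONDS: repeat a unit of work until the time budget is spent.
run_for() {
    local limit=$1 start=$SECONDS iterations=0
    while (( SECONDS - start < limit )); do
        sleep 0.2                      # stand-in for one iteration of real work
        iterations=$((iterations + 1))
    done
    echo "$iterations"
}

completed=$(run_for 2)
echo "completed $completed iterations"
```

For a nightly job the budget would be given in seconds, for example run_for $((3 * 3600)) for three hours.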

Why can't I use job control in a bash script?

In this answer to another question, I was told that
in scripts you don't have job control
(and trying to turn it on is stupid)
This is the first time I've heard this, and I've pored over the bash.info section on Job Control (chapter 7), finding no mention of either of these assertions. [Update: The man page is a little better, mentioning 'typical' use, default settings, and terminal I/O, but no real reason why job control is particularly ill-advised for scripts.]
So why doesn't script-based job-control work, and what makes it a bad practice (aka 'stupid')?
Edit: The script in question starts a background process, starts a second background process, then attempts to put the first process back into the foreground so that it has normal terminal I/O (as if run directly), which can then be redirected from outside the script. Can't do that to a background process.
As noted by the accepted answer to the other question, there exist other scripts that solve that particular problem without attempting job control. Fine. And the lambasted script uses a hard-coded job number — Obviously bad. But I'm trying to understand whether job control is a fundamentally doomed approach. It still seems like maybe it could work...
What he meant is that job control is by default turned off in non-interactive mode (i.e. in a script.)
From the bash man page:
Job control refers to the ability to selectively stop (suspend)
the execution of processes and continue (resume) their execution at a
later point.
A user typically employs this facility via an interactive interface
supplied jointly by the system’s terminal driver and bash.
set [--abefhkmnptuvxBCHP] [-o option] [arg ...]
-m Monitor mode. Job control is enabled. This option is on by
default for interactive shells on systems that support it (see
JOB CONTROL above). Background processes run in a separate
process group and a line containing their exit status is
printed upon their completion.
When he said "is stupid" he meant that not only:
is job control meant mostly for facilitating interactive control (whereas a script can work directly with the pid's), but also
I quote his original answer, "... relies on the fact that you didn't start any other jobs previously in the script which is a bad assumption to make". Which is quite correct.
In answer to your comment: yes, nobody will stop you from using job control in your bash script -- there is no hard case for forcefully disabling set -m (i.e. yes, job control from the script will work if you want it to.) Remember that in the end, especially in scripting, there is always more than one way to skin a cat, but some ways are more portable, more reliable, make it simpler to handle error cases, parse the output, etc.
Your particular circumstances may or may not warrant a way different from what lhunath (and other users) deem "best practices".
Job control with bg and fg is useful only in interactive shells. But & in conjunction with wait is useful in scripts too.
On multiprocessor systems spawning background jobs can greatly improve the script's performance, e.g. in build scripts where you want to start at least one compiler per CPU, or process images using ImageMagick tools in parallel etc.
The following example runs up to 8 parallel gcc's to compile all source files in an array:
for ((i = 0, end=${#sourcefiles[@]}; i < end;)); do
    for ((cpu_num = 0; cpu_num < 8; cpu_num++, i++)); do
        if ((i < end)); then gcc ${sourcefiles[$i]} & fi
    done
    wait
done
There is nothing "stupid" about this. But you'll require the wait command, which waits for all background jobs before the script continues. The PID of the last background job is stored in the $! variable, so you may also wait ${!}. Note also the nice command.
Sometimes such code is useful in makefiles:
for cpp_file in *.cpp; do gcc -c $$cpp_file & done; wait
This gives much finer control than make -j.
Note that & is a line terminator like ; (write command& not command&;).
Hope this helps.
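The same batch-and-wait pattern can be tried without a compiler. In this sketch a short sleep plus a marker file stands in for each gcc run; the names max_jobs, workdir and items are made up for the demo.

```shell
#!/usr/bin/env bash
max_jobs=4                       # at most this many background jobs per batch
workdir=$(mktemp -d)
items=(a b c d e f g h i j)

for ((i = 0, end = ${#items[@]}; i < end; )); do
    # Start one batch of up to max_jobs background jobs...
    for ((slot = 0; slot < max_jobs && i < end; slot++, i++)); do
        ( sleep 0.1; : > "$workdir/${items[i]}.done" ) &
    done
    # ...and wait for the whole batch before starting the next one.
    wait
done

done_files=("$workdir"/*.done)
completed=${#done_files[@]}
rm -r "$workdir"
echo "completed $completed of ${#items[@]} items"   # prints: completed 10 of 10 items
```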
Job control is useful only when you are running an interactive shell, i.e., you know that stdin and stdout are connected to a terminal device (/dev/pts/* on Linux). Then, it makes sense to have something on foreground, something else on background, etc.
Scripts, on the other hand, don't have such a guarantee. Scripts can be made executable, and run without any terminal attached. It doesn't make sense to have foreground or background processes in this case.
You can, however, run other commands non-interactively in the background (appending "&" to the command line) and capture their PIDs with $!. Then you use kill to kill or suspend them (simulating Ctrl-C or Ctrl-Z on the terminal, if the shell were interactive). You can also use wait (instead of fg) to wait for the background process to finish.
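As a rough illustration of that workflow, the sketch below suspends, resumes and then terminates a background sleep, using only $!, kill and wait. The 30-second sleep is just a placeholder for a real long-running command.

```shell
#!/usr/bin/env bash
sleep 30 &
bg_pid=$!                 # PID of the background command

kill -STOP "$bg_pid"      # suspend it, much as Ctrl-Z would for a foreground job
kill -CONT "$bg_pid"      # let it run again
kill -TERM "$bg_pid"      # ask it to terminate

status=0
wait "$bg_pid" || status=$?   # plays the role of fg: block until it is gone
echo "exit status: $status"   # 143, i.e. 128 + 15 (SIGTERM)
```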
It could be useful to turn on job control in a script to set traps on
SIGCHLD. The JOB CONTROL section in the manual says:
The shell learns immediately whenever a job changes state. Normally,
bash waits until it is about to print a prompt before reporting
changes in a job's status so as to not interrupt any other output. If
the -b option to the set builtin command is enabled, bash reports
such changes immediately. Any trap on SIGCHLD is executed for each
child that exits.
(emphasis is mine)
Take the following script, as an example:
dualbus@debian:~$ cat children.bash
#!/bin/bash
set -m
count=0 limit=3
trap 'counter && { job & }' CHLD
job() {
  local amount=$((RANDOM % 8))
  echo "sleeping $amount seconds"
  sleep "$amount"
}
counter() {
  ((count++ < limit))
}
counter && { job & }
wait
dualbus@debian:~$ chmod +x children.bash
dualbus@debian:~$ ./children.bash
sleeping 6 seconds
sleeping 0 seconds
sleeping 7 seconds
Note: CHLD trapping seems to be broken as of bash 4.3
In bash 4.3, you could use 'wait -n' to achieve the same thing,
dualbus@debian:~$ cat waitn.bash
#!/bin/bash
count=0 limit=3
trap 'kill "$pid"; exit' INT
job() {
  local amount=$((RANDOM % 8))
  echo "sleeping $amount seconds"
  sleep "$amount"
}
for ((i=0; i<limit; i++)); do
  ((i>0)) && wait -n; job & pid=$!
done
dualbus@debian:~$ chmod +x waitn.bash
dualbus@debian:~$ ./waitn.bash
sleeping 3 seconds
sleeping 0 seconds
sleeping 5 seconds
You could argue that there are other ways to do this in a more
portable way, that is, without CHLD or wait -n:
dualbus@debian:~$ cat portable.sh
#!/bin/sh
count=0 limit=3
trap 'counter && { brand; job & }; wait' USR1
unset RANDOM; rseed=123459876$$
brand() {
  [ "$rseed" -eq 0 ] && rseed=123459876
  h=$((rseed / 127773))
  l=$((rseed % 127773))
  rseed=$((16807 * l - 2836 * h))
  RANDOM=$((rseed & 32767))
}
job() {
  amount=$((RANDOM % 8))
  echo "sleeping $amount seconds"
  sleep "$amount"
  kill -USR1 "$$"
}
counter() {
  [ "$count" -lt "$limit" ]; ret=$?
  count=$((count + 1))
  return "$ret"
}
counter && { brand; job & }
wait
dualbus@debian:~$ chmod +x portable.sh
dualbus@debian:~$ ./portable.sh
sleeping 2 seconds
sleeping 5 seconds
sleeping 6 seconds
So, in conclusion, set -m is not that useful in scripts, since
the only interesting feature it brings to scripts is being able to
work with SIGCHLD. And there are other ways to achieve the same thing
either shorter (wait -n) or more portable (sending signals yourself).
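To show how wait -n scales beyond one job at a time, here is a sketch of a small pool that keeps up to max_jobs background jobs running. It assumes bash 4.3 or later; the counters and the sleep durations are invented for the demo.

```shell
#!/usr/bin/env bash
max_jobs=3      # upper bound on concurrently running jobs
running=0
finished=0

for n in 1 2 3 4 5 6 7; do
    if (( running >= max_jobs )); then
        wait -n || true            # block until any one background job exits
        running=$((running - 1))
        finished=$((finished + 1))
    fi
    sleep "0.$n" &                 # stand-in for a real job
    running=$((running + 1))
done

# Drain the jobs that are still running.
while (( running > 0 )); do
    wait -n || true
    running=$((running - 1))
    finished=$((finished + 1))
done
echo "finished $finished jobs"
```

Unlike the batch approach, a new job starts as soon as any slot frees up, so one slow job does not hold back the rest.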
Bash does support job control, as you say. In shell script writing, there is often an assumption that you can't rely on the fact that you have bash, but that you have the vanilla Bourne shell (sh), which historically did not have job control.
I'm hard-pressed these days to imagine a system in which you are honestly restricted to the real Bourne shell. Most systems' /bin/sh will be linked to bash. Still, it's possible. One thing you can do is instead of specifying
#!/bin/sh
You can do:
#!/bin/bash
That, and your documentation, would make it clear your script needs bash.
Possibly o/t but I quite often use nohup when I ssh into a server for a long-running job so that if I get logged out the job still completes.
I wonder if people are confusing stopping and starting from a master interactive shell and spawning background processes? The wait command allows you to spawn a lot of things and then wait for them all to complete, and like I said I use nohup all the time. It's more complex than this and very underused - sh supports this mode too. Have a look at the manual.
You've also got
kill -STOP pid
I quite often do that if I want to suspend the currently running sudo, as in:
kill -STOP $$
But woe betide you if you've jumped out to the shell from an editor - it will all just sit there.
I tend to use mnemonic -KILL etc. because there's a danger of typing
kill - 9 pid # note the space
and in the old days you could sometimes bring the machine down because it would kill init!
jobs DO work in bash scripts
BUT, you ... NEED to watch for the spawned stuff
ls -1 /usr/share/doc/ | while read -r doc ; do ... done
jobs will have different context on each side of the |
bypassing this may be using for instead of while:
for doc in `ls -1 /usr/share/doc` ; do ... done
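The "different context" is a subshell: by default bash runs each side of a pipeline in its own subshell, so anything started or assigned inside the piped while loop is invisible to the rest of the script. The sketch below uses a counter variable as a stand-in for job state to make the effect visible; it assumes bash's default settings (the lastpipe option off).

```shell
#!/usr/bin/env bash
# Piped loop: the body runs in a subshell, so the parent's counter never changes.
count_pipe=0
printf 'a\nb\nc\n' | while read -r line; do
    count_pipe=$((count_pipe + 1))
done

# Process substitution: the loop runs in the current shell and the change sticks.
count_procsub=0
while read -r line; do
    count_procsub=$((count_procsub + 1))
done < <(printf 'a\nb\nc\n')

echo "$count_pipe $count_procsub"   # prints: 0 3
```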
this should demonstrate how to use jobs in a script ...
with the mention that my commented note is ... REAL (dunno why that behaviour)
for i in `seq 7` ; do ( sleep 100 ) & done
while [ `jobs | wc -l` -ne 0 ] ; do
    for jobnr in `jobs | awk '{print $1}' | cut -d\[ -f2- |cut -d\] -f1` ; do
        kill %$jobnr
    done
    #this is REALLY ODD ... but while won't exit without this ... dunno why
    jobs >/dev/null 2>/dev/null
done
sleep 1
