wait not working in shell script? - bash

I am running a for loop in which a command is run in the background using &. At the end I want the return values of all the commands.
Here is the code I tried:
for((i=0 ;i<3;i++)) {
// curl command which returns a value &
}
wait
// next piece of code
I want to get all three returned values and then proceed. But the wait command does not wait for the background processes to complete, and the next part of the code runs. I need the returned values to proceed.

Shell builtins have documentation accessible with help BUILTIN_NAME.
help wait yields:
wait: wait [-n] [id ...]
Wait for job completion and return exit status.
Waits for each process identified by an ID, which may be a process ID or a
job specification, and reports its termination status. If ID is not
given, waits for all currently active child processes, and the return
status is zero. If ID is a a job specification, waits for all processes
in that job's pipeline.
If the -n option is supplied, waits for the next job to terminate and
returns its exit status.
Exit Status:
Returns the status of the last ID; fails if ID is invalid or an invalid
option is given.
which implies that to get the return statuses, you need to save the pid and then wait on each pid, using wait $THE_PID.
Example:
sl() { sleep $1; echo $1; return $(($1+42)); }
pids=(); for((i=0;i<3;i++)); do sl $i & pids+=($!); done;
for pid in ${pids[@]}; do wait $pid; echo ret=$?; done
Example output:
0
ret=42
1
ret=43
2
ret=44
Edit:
With curl, don't forget to pass -f (--fail) to make sure the process will fail if the HTTP request did:
CURL Example:
#!/bin/bash
URIs=(
https://pastebin.com/raw/w36QWU3D
https://pastebin.com/raw/NONEXISTENT
https://pastebin.com/raw/M9znaBB2
)
pids=(); for((i=0;i<3;i++)); do
curl -fL "${URIs[$i]}" &>/dev/null &
pids+=($!)
done
for pid in "${pids[@]}"; do
wait $pid
echo ret=$?
done
CURL Example output:
ret=0
ret=22
ret=0
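Since the question asks for the values the commands produce and not only their exit statuses, one option is to send each job's output to its own temporary file and read the files back after waiting. The following is a minimal sketch under stated assumptions, not the method from the answer above: fetch is a hypothetical stand-in for the real curl call, and the PIDs are kept in a plain string so the loop also works in shells without arrays.

```bash
# Hypothetical stand-in for the curl call: prints a value, fails for id 1.
fetch() { echo "value-$1"; [ "$1" -ne 1 ]; }

tmpdir=$(mktemp -d)
pids=""
for i in 0 1 2; do
    fetch "$i" > "$tmpdir/$i.out" &   # each job writes to its own file
    pids="$pids $!"
done

results=""
i=0
for pid in $pids; do
    status=0
    wait "$pid" || status=$?          # exit status of this particular job
    results="$results$(cat "$tmpdir/$i.out"):$status "
    i=$((i + 1))
done
rm -rf "$tmpdir"
echo "$results"
```

Each entry in results pairs a job's output with its exit status, in the order the jobs were started.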

GNU Parallel is a great way to do high-latency things like curl in parallel.
parallel curl --head {} ::: www.google.com www.hp.com www.ibm.com
Or, filtering results:
parallel curl --head -s {} ::: www.google.com www.hp.com www.ibm.com | grep '^HTTP'
HTTP/1.1 302 Found
HTTP/1.1 301 Moved Permanently
HTTP/1.1 301 Moved Permanently
Here is another example:
parallel -k 'echo -n Starting {} ...; sleep 5; echo done.' ::: 1 2 3 4
Starting 1 ...done.
Starting 2 ...done.
Starting 3 ...done.
Starting 4 ...done.

Related

Parallel subshells doing work and report status

I am trying to do work in all subfolders in parallel and describe a status per folder once it is done in bash.
Suppose I have a work function which can return one of several statuses:
#param #1 is the folder
# can return 1 on fail, 2 on success, 3 on nothing happened
work(){
cd $1
# some update thing
return 1, 2, 3
}
Now I call this in my wrapper function:
do_work(){
while read -r folder; do
tput cup "${row}" 20
echo -n "${folder}"
(
ret=$(work "${folder}")
tput cup "${row}" 0
[[ $ret -eq 1 ]] && echo " \e[0;31mupdate failed \uf00d\e[0m"
[[ $ret -eq 2 ]] && echo " \e[0;32mupdated \uf00c\e[0m"
[[ $ret -eq 3 ]] && echo " \e[0;32malready up to date \uf00c\e[0m"
) &>/dev/null
pids+=("${!}")
((++row))
done < <(find . -maxdepth 1 -mindepth 1 -type d -printf "%f\n" | sort)
echo "waiting for pids ${pids[*]}"
wait "${pids[@]}"
}
What I want is for it to print all the folders, one per line, update them independently of each other in parallel, and write each folder's status on its line once it is done.
However, I am unsure which subshell is writing what, which output I need to capture, and how.
My attempt above currently neither writes correctly nor runs in parallel.
If I get it to work in parallel, I get those [1] <PID> things and [1] + 3156389 done ... messing up my screen.
If I put the work itself in a subshell, I don't have anything to wait for.
If I then collect the pids, I don't get the response code needed to print the status text.
I did have a look at GNU Parallel but I think I cannot have that behaviour. (I think I could hack it that the finished jobs are printed, but I want all 'running' jobs are printed, and the finished ones get amended).
Assumptions/understandings:
a separate child process is spawned for each folder to be processed
the child process generates messages as work progresses
messages from child processes are to be displayed in the console in real time, with each child's latest message being displayed on a different line
The general idea is to set up a means of interprocess communications (IC) ... named pipe, normal file, queuing/messaging system, sockets (plenty of ideas available via a web search on bash interprocess communications); the children write to this system while the parent reads from the system and issues the appropriate tput commands.
One very simple example using a normal file:
> status.msgs # initialize our IC file
child_func () {
# Usage: child_func <unique_id> <other> ... <args>
local i
for ((i=1;i<=10;i++))
do
sleep $1
# each message should include the child's <unique_id> ($1 in this case);
# parent/monitoring process uses this <unique_id> to control tput output
echo "$1:message - $1.$i" >> status.msgs
done
}
clear
( child_func 3 & )
( child_func 5 & )
( child_func 2 & )
while IFS=: read -r child msg
do
tput cup $child 10
echo "$msg"
done < <(tail -f status.msgs)
NOTES:
the (child_func 3 &) construct is one way to eliminate the OS message re: 'background process completed' from showing up in stdout (there may be other ways but I'm drawing a blank at the moment)
when using a file (normal, pipe) OP will want to look at a locking method (flock?) to ensure messages from multiple children don't stomp each other
OP can get creative with the format of the messages printed to status.msgs in conjunction with parsing logic in the parent's while loop
assuming variable width messages OP may want to look at appending a tput el on the end of each printed message in order to 'erase' any characters leftover from a previous/longer message
exiting the loop could be as simple as keeping count of the number of child processes that send a message <id>:done, or keeping track of the number of children still running in the background, or ...
Running this at my command line generates 3 separate lines of output that are updated at various times (based on the sleep $1):
# no output to line #1
message - 2.10 # messages change from 2.1 to 2.2 to ... to 2.10
message - 3.10 # messages change from 3.1 to 3.2 to ... to 3.10
# no output to line #4
message - 5.10 # messages change from 5.1 to 5.2 to ... to 5.10
NOTE: comments not actually displayed in console
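The same idea can be sketched with a named pipe instead of a regular file, which also gives a natural way to leave the loop: the read loop sees end-of-file once every child has finished and closed the pipe. This is only an illustration under assumptions. The child function and its messages are hypothetical, and the tput calls are left out so the snippet runs without a terminal; the parent would issue them inside the loop. Short lines written to a pipe are written atomically (up to PIPE_BUF bytes), so these messages do not interleave.

```bash
dir=$(mktemp -d)
fifo="$dir/status.fifo"
mkfifo "$fifo"

# Hypothetical worker: reports progress as "<id>:<message>" lines.
child() {
    echo "$1:working"
    echo "$1:done"
}

# All children share one write end of the pipe; it closes when the last one exits.
{
    child 1 &
    child 2 &
    child 3 &
    wait
} > "$fifo" &

finished=0
lines=0
while IFS=: read -r id msg; do
    lines=$((lines + 1))              # a real parent would call tput cup "$id" ... here
    if [ "$msg" = "done" ]; then
        finished=$((finished + 1))
    fi
done < "$fifo"                        # loop ends at EOF, i.e. when all children are done
wait
rm -rf "$dir"
echo "received $lines messages, $finished children finished"
```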
Based on @markp-fuso's answer:
printer() {
while IFS=$'\t' read -r child msg
do
tput cup $child 10
echo "$child $msg"
done
}
clear
parallel --lb --tagstring "{%}\t{}" work ::: folder1 folder2 folder3 | printer
echo
You can't control exit statuses like that. Try this instead, rework your work function to echo status:
work(){
cd $1
# some update thing &> /dev/null without output
echo "${1}_$status" #status=1, 2, 3
}
And then set up data collection from all folders like so:
data=$(
while read -r folder; do
work "$folder" &
done < <(find . -maxdepth 1 -mindepth 1 -type d -printf "%f\n" | sort)
wait
)
echo "$data"

Stop a bash script when awk command returns with failure

I have the following command
ads2 cls create
This command might return two outputs, a reasonable one that looks like:
kernel with pid 7148 (port 9011) killed
kernel with pid 9360 (port 9011) killed
probing service daemon # http://fdt-c-vm-0093.fdtech.intern:9010
starting kernel FDT-C-VM-0093 # http://fdt-c-yy-0093.ssbt.intern:9011 name=FDT-C-VM-0093 max_consec_timeouts=10 clustermode=Standard hostname=FDT-C-VM-0093 framerate=20000 schedmode=Standard rtaddr=fdt-c-vm-0093.fdtech.ssbt tickrole=Local tickmaster=local max_total_timeouts=1000
kernel FDT-C-VM-0093 running
probing service daemon # http://172.16.xx.xx:9010
starting kernel FDT-C-AGX-0004 # http://172.16.xx.xx:9011 name=FDT-C-AGX-0004 max_consec_timeouts=10 clustermode=Standard hostname=FDT-C-AGX-0004 framerate=20000 schedmode=Standard rtaddr=172.16.xx.xx tickrole=Local tickmaster=local max_total_timeouts=1000
kernel Fxx-x-xxx-xxx4 running
>>> start cluster establish ...
>>> cluster established ...
nodes {
node {
name = "FDT-C-VM-xxxx";
address = "http://fxx-x-xx-0093.xxx.intern:xxxx/";
state = "3";
}
node {
name = "xxx-x-xxx-xxx";
address = "http://1xx.16.xx.xx:9011/";
state = "3";
}
}
and an unreasonable one that would be:
kernel with pid 8588 (port 9011) killed
failed to probe service daemon # http://xxx-c-agx-0002.xxxx.intern:90xx
In both cases, I'm passing this output to awk to check the state of the nodes when a reasonable output is returned; otherwise it should exit the whole script (line 28).
ads2 cls create | awk -F [\"] ' BEGIN{code=1} # Set the field delimiter to a double quote
/^>>> cluster established .../ {
strt=1 # If the line starts with ">>> cluster established ...", set a variable strt to 1
}
strt!=1 {
next # If strt is not equal to 1, skip to the next line
}
$1 ~ "name" {
cnt++; # If the first field contains name, increment a cnt variable
nam[cnt]=$2 # Use the cnt variable as the index of an array called nam with the second field the value
}
$1 ~ "state" {
stat[cnt]=$2; # When the first field contains "state", set up another array called stat
print "Node "nam[cnt]" has state "$2 # Print the node name as well as the state
}
END {
if (stat[1]=="3" && stat[2]=="3") {
print "\033[32m" "Success" "\033[37m" # At the end of processing, the array is used to determine whether there is a success or failure.
}
28 else {
29 print "\033[31m" "Failed. Check Nodes in devices.dev file" "\033[37m"
30 exit code
}
}'
some other commands...
Note that this code block is a part of a bash script.
All I'm trying to do is stop the whole script (the commands that follow) from executing when it reaches line 29, where the exit code statement on line 30 should do the job. However, it's not working: it prints the statement Failed. Check Nodes in devices.dev file, but then continues executing the next commands, whereas I expect the script to stop because the exit command on line 30 should also have been executed.
I suspect your subject, "Stop a bash script from inside an awk command", is what's getting you downvotes. Controlling what the calling shell does from inside the awk script is something you can't and shouldn't try to do: it would be a bad case of Inversion Of Control, like calling a function in C to do something and having that function exit the whole program instead of just returning a failure status so the calling code can decide what to do upon that failure (e.g. perform recovery actions and then call that function again).
You seem to be confusing exiting your awk script with exiting your shell script. If you want to exit your shell script when the awk script exits with a failure status then you need to write the shell code to tell the shell to do so, e.g.:
whatever | awk 'script' || exit 1
or to get fancy about it:
whatever | awk 'script' || { ret="$?"; printf 'awk exited with status %d\n' "$ret" >&2; exit "$ret"; }
For example:
$ cat tst.sh
#!/usr/bin/env bash
date | awk '{exit 1}' || { ret="$?"; printf 'awk exited with status %d\n' "$ret" >&2; exit 1; }
echo "we should not get here"
$ ./tst.sh
awk exited with status 1
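Applied to a check like the one in the question, the pattern might look as follows. This is a sketch under assumptions: check_nodes is a hypothetical stand-in that feeds sample state lines to a much reduced awk script, and the script body runs in a subshell function so that exit ends only the demo.

```bash
# Hypothetical stand-in for "ads2 cls create | awk ...": awk exits 1
# unless every reported node state is "3".
check_nodes() {
    printf 'state = "%s";\n' "$@" |
        awk -F'"' '$2 != "3" { bad = 1 } END { exit bad + 0 }'
}

# The pipeline's exit status is awk's, so "|| exit 1" stops the script body.
script() (
    check_nodes "$@" || exit 1
    echo "continuing with the other commands"
)

ok_rc=0;  ok_out=$(script 3 3)  || ok_rc=$?
bad_rc=0; bad_out=$(script 3 1) || bad_rc=$?
echo "all nodes up: rc=$ok_rc out=$ok_out"
echo "one node down: rc=$bad_rc out=$bad_out"
```

In the failing case nothing after the check runs, which is the behaviour the question was after.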

Bash - Log all commands and exit codes in a script

I have a long (~2,000 lines) script that I'm trying to log for future debugging. Right now I have:
function log_with_time()
{
while read a; do
echo `date +'%H:%M:%S.%4N '` " $a" >> $LOGFILE
done
}
exec 7> >(log_with_time)
BASH_XTRACEFD=7
PS4=' exit($?)ln:$LINENO: '
set -x
echo "helloWorld 1"
which gives me very nice logging for any and all commands that are run:
15:18:03.6359 exit(0)ln:28: echo 'helloWorld 1'
The issue that I'm running into is that xtrace seems to be asynchronous. With longer scripts, the log times fall behind the actual time the commands are called, and the exit code doesn't match the logged command.
There has to be a better way to do this but I'd be happy if I could just synchronize xtrace.
...
tldr: How can I generally log the time, command and exit code for all commands in a script?
...
(First time posting, feedback appreciated)
UPDATE:
exec {BASH_XTRACEFD}>>$LOGFILE
PS4=' time:$(date +%H:%M:%S.%4N) ln:$LINENO: '
set -x
fail()
{
echo "fail" >> $LOGFILE
return 1
}
trap 'echo exit:$? >> $LOGFILE' DEBUG
fail
solves all of my synchronization issues. exit codes and timestamps are working beautifully. My only issue now is one of formatting: the trap itself is getting reported by xtrace.
time:18:30:07.6080 ln:27: fail
time:18:30:07.6089 ln:12: echo fail
fail
time:18:30:07.6126 ln:13: return 1
time:18:30:07.6134 ln:28: echo exit:1
exit:1
I've tried setting +x in the trap but then set +x gets logged. If I could find a way to omit one line from xtrace, this log would be perfect.
The async behavior is coming from the process substitution -- anything in >(...) is running in its own subshell on the other end of a FIFO. Since it's a separate process, it's inherently unsynchronized.
You don't need log_with_time here at all, though, and so you don't need BASH_XTRACEFD redirecting to a process substitution in the first place. Consider:
# aside: $(date ...) has a *huge* amount of performance overhead here. Personally, I'd
# advise against using it, unless you really need all that precision; $SECONDS will
# be orders-of-magnitude cheaper.
PS4=' prior-exit:$? time:$(date +%H:%M:%S.%4N) ln:$LINENO: '
...thereafter:
$ true
prior-exit:0 time:16:01:17.2509 ln:28: true
$ false
prior-exit:0 time:16:01:18.4242 ln:29: false
$ false
prior-exit:1 time:16:01:19.2963 ln:30: false
$ true
prior-exit:1 time:16:01:20.2159 ln:31: true
$ true
prior-exit:0 time:16:01:20.8650 ln:32: true
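A stripped-down way to try this out is to send the trace straight to a file, with no process substitution in between. The sketch below is an assumption-laden variant rather than the exact setup above: it leaves out the timestamp, redirects stderr inside a subshell instead of using BASH_XTRACEFD, and uses a temporary file as the log.

```bash
log=$(mktemp)
(
    set +e                  # keep going after a failing command
    exec 2>>"$log"          # xtrace is written to stderr, so append stderr to the log
    PS4='+ prior-exit:$? ln:$LINENO: '
    set -x
    true
    false
    true                    # traced with prior-exit:1, the status of the preceding false
)
prior1=$(grep -c 'prior-exit:1' "$log")
cat "$log"
rm -f "$log"
```

Because the shell writes each trace line itself before running the command, the status shown always belongs to the command traced on the previous line.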
Per conversation with Charles Duffy in the comments to whom all credit is given:
Process substitution >(...) is asynchronous, allowing the log writing to fall behind and out of sync with the xtrace.
Instead use:
exec {BASH_XTRACEFD}>>$LOGFILE
PS4=' time:$(date +%H:%M:%S.%4N) ln:$LINENO: '
for synchronously logging the time and line.
Furthermore, xtrace is triggered before running the command, making it a bad candidate for capturing exit codes. Instead use:
trap 'echo exit:$? >> $LOGFILE' DEBUG
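to log the exit code of each command. The DEBUG trap runs just before each command, at which point $? still holds the exit status of the command that finished before it, so each logged code belongs to the preceding command. Note that, unlike xtrace, the DEBUG trap is not inherited by functions by default, so this won't report on every step inside a function call.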
No solution yet for omitting the trap from xtrace, but it's good enough:
LOGFILE="SomeFile.log"
exec {BASH_XTRACEFD}>>$LOGFILE
PS4=' time:$(date +%H:%M:%S.%4N) ln:$LINENO: '
set -x
fail() # test function that returns 1
{
echo "fail" >> $LOGFILE
return 1
}
success() # test function that returns 0
{
echo "success" >> $LOGFILE
return 0
}
trap 'echo $? >> $LOGFILE' DEBUG
fail
success
echo "complete"
yields:
time:14:10:22.2686 ln:21: trap 'echo $? >> $LOGFILE' DEBUG
time:14:10:22.2693 ln:23: echo 0
0
time:14:10:22.2736 ln:23: fail
time:14:10:22.2741 ln:12: echo fail
fail
time:14:10:22.2775 ln:13: return 1
time:14:10:22.2782 ln:24: echo 1
1
time:14:10:22.2830 ln:24: success
time:14:10:22.2836 ln:17: echo success
success
time:14:10:22.2873 ln:18: return 0
time:14:10:22.2881 ln:26: echo 0
0
time:14:10:22.2912 ln:26: echo complete

Running several bash commands and killing them after some time [duplicate]

I'd like to automatically kill a command after a certain amount of time. I have in mind an interface like this:
% constrain 300 ./foo args
Which would run "./foo" with "args" but automatically kill it if it's still running after 5 minutes.
It might be useful to generalize the idea to other constraints, such as autokilling a process if it uses too much memory.
Are there any existing tools that do that, or has anyone written such a thing?
ADDED: Jonathan's solution is precisely what I had in mind and it works like a charm on linux, but I can't get it to work on Mac OSX. I got rid of the SIGRTMIN which lets it compile fine, but the signal just doesn't get sent to the child process. Anyone know how to make this work on Mac?
[Added: Note that an update is available from Jonathan that works on Mac and elsewhere.]
GNU Coreutils includes the timeout command, installed by default on many systems.
https://www.gnu.org/software/coreutils/manual/html_node/timeout-invocation.html
To watch free -m for one minute, then kill it by sending a TERM signal:
timeout 1m watch free -m
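A script usually also needs to know whether the limit was hit. GNU timeout signals this through its exit status: it exits with 124 when the command timed out (unless --preserve-status is given). A short sketch, with sleep 5 as a stand-in for the slow command:

```bash
status=0
timeout 1 sleep 5 || status=$?    # sleep 5 stands in for the slow command
if [ "$status" -eq 124 ]; then
    echo "command timed out"
else
    echo "command finished with status $status"
fi
```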
Maybe I'm not understanding the question, but this sounds doable directly, at least in bash:
( /path/to/slow command with options ) & sleep 5 ; kill $!
This runs the first command, inside the parentheses, for five seconds, and then kills it. The entire operation runs synchronously, i.e. you won't be able to use your shell while it is busy waiting for the slow command. If that is not what you wanted, it should be possible to add another &.
The $! variable is a special shell parameter that holds the process ID of the most recently started background job. It is important not to put the & inside the parentheses; doing so loses the process ID.
I've arrived rather late to this party, but I don't see my favorite trick listed in the answers.
Under *NIX, an alarm(2) is inherited across an execve(2) and SIGALRM is fatal by default. So, you can often simply:
$ doalarm () { perl -e 'alarm shift; exec @ARGV' "$@"; } # define a helper function
$ doalarm 300 ./foo.sh args
or install a trivial C wrapper to do that for you.
Advantages: Only one PID is involved, and the mechanism is simple. You won't kill the wrong process if, for example, ./foo.sh exited "too quickly" and its PID was re-used. You don't need several shell subprocesses working in concert, which can be done correctly but is rather race-prone.
Disadvantages: The time-constrained process cannot manipulate its alarm clock (e.g., alarm(2), ualarm(2), setitimer(2)), since this would likely clear the inherited alarm. Obviously, neither can it block or ignore SIGALRM, though the same can be said of SIGINT, SIGTERM, etc. for some other approaches.
Some (very old, I think) systems implement sleep(2) in terms of alarm(2), and, even today, some programmers use alarm(2) as a crude internal timeout mechanism for I/O and other operations. In my experience, however, this technique is applicable to the vast majority of processes you want to time limit.
There is also ulimit, which can be used to limit the execution time available to sub-processes.
ulimit -t 10
Limits the process to 10 seconds of CPU time.
To actually use it to limit a new process, rather than the current process, you may wish to use a wrapper script:
#! /usr/bin/env python
import os
os.system("ulimit -t 10; other-command-here")
other-command can be any tool. I was running a Java, Python, C and Scheme versions of different sorting algorithms, and logging how long they took, whilst limiting execution time to 30 seconds. A Cocoa-Python application generated the various command lines - including the arguments - and collated the times into a CSV file, but it was really just fluff on top of the command provided above.
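From a shell, the same effect can be had without a wrapper script by setting the limit inside a subshell, so it applies to the command started there and not to the calling shell. A rough sketch, where the counting loop is a hypothetical CPU-bound stand-in for other-command-here and the limit is lowered to 1 second to keep the demo short:

```bash
status=0
(
    ulimit -t 1     # CPU-time limit in seconds; affects only this subshell
    i=0
    # Hypothetical CPU-bound work; it needs far more than 1 second of CPU time.
    while [ "$i" -lt 50000000 ]; do
        i=$((i + 1))
    done
) || status=$?
echo "subshell ended with status $status"
```

A status above 128 means the subshell was terminated by a signal when it ran into the limit. Note that the limit counts CPU time, not wall-clock time, so a process that mostly sleeps or waits for I/O will not be stopped by it.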
I have a program called timeout that does that - written in C, originally in 1989 but updated periodically since then.
Update: this code fails to compile on MacOS X because SIGRTMIN is not defined, and fails to timeout when run on MacOS X because the `signal()` function there resumes the `wait()` after the alarm times out - which is not the required behaviour. I have a new version of `timeout.c` which deals with both these problems (using `sigaction()` instead of `signal()`). As before, contact me for a 10K gzipped tar file with the source code and a manual page (see my profile).
/*
@(#)File: $RCSfile: timeout.c,v $
@(#)Version: $Revision: 4.6 $
@(#)Last changed: $Date: 2007/03/01 22:23:02 $
@(#)Purpose: Run command with timeout monitor
@(#)Author: J Leffler
@(#)Copyright: (C) JLSS 1989,1997,2003,2005-07
*/
#define _POSIX_SOURCE /* Enable kill() in <unistd.h> on Solaris 7 */
#define _XOPEN_SOURCE 500
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include "stderr.h"
#define CHILD 0
#define FORKFAIL -1
static const char usestr[] = "[-vV] -t time [-s signal] cmd [arg ...]";
#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
const char jlss_id_timeout_c[] = "@(#)$Id: timeout.c,v 4.6 2007/03/01 22:23:02 jleffler Exp $";
#endif /* lint */
static void catcher(int signum)
{
return;
}
int main(int argc, char **argv)
{
pid_t pid;
int tm_out;
int kill_signal;
pid_t corpse;
int status;
int opt;
int vflag = 0;
err_setarg0(argv[0]);
opterr = 0;
tm_out = 0;
kill_signal = SIGTERM;
while ((opt = getopt(argc, argv, "vVt:s:")) != -1)
{
switch(opt)
{
case 'V':
err_version("TIMEOUT", &"@(#)$Revision: 4.6 $ ($Date: 2007/03/01 22:23:02 $)"[4]);
break;
case 's':
kill_signal = atoi(optarg);
if (kill_signal <= 0 || kill_signal >= SIGRTMIN)
err_error("signal number must be between 1 and %d\n", SIGRTMIN - 1);
break;
case 't':
tm_out = atoi(optarg);
if (tm_out <= 0)
err_error("time must be greater than zero (%s)\n", optarg);
break;
case 'v':
vflag = 1;
break;
default:
err_usage(usestr);
break;
}
}
if (optind >= argc || tm_out == 0)
err_usage(usestr);
if ((pid = fork()) == FORKFAIL)
err_syserr("failed to fork\n");
else if (pid == CHILD)
{
execvp(argv[optind], &argv[optind]);
err_syserr("failed to exec command %s\n", argv[optind]);
}
/* Must be parent -- wait for child to die */
if (vflag)
err_remark("time %d, signal %d, child PID %u\n", tm_out, kill_signal, (unsigned)pid);
signal(SIGALRM, catcher);
alarm((unsigned int)tm_out);
while ((corpse = wait(&status)) != pid && errno != ECHILD)
{
if (errno == EINTR)
{
/* Timed out -- kill child */
if (vflag)
err_remark("timed out - send signal %d to process %d\n", (int)kill_signal, (int)pid);
if (kill(pid, kill_signal) != 0)
err_syserr("sending signal %d to PID %d - ", kill_signal, pid);
corpse = wait(&status);
break;
}
}
alarm(0);
if (vflag)
{
if (corpse == (pid_t) -1)
err_syserr("no valid PID from waiting - ");
else
err_remark("child PID %u status 0x%04X\n", (unsigned)corpse, (unsigned)status);
}
if (corpse != pid)
status = 2; /* I don't know what happened! */
else if (WIFEXITED(status))
status = WEXITSTATUS(status);
else if (WIFSIGNALED(status))
status = WTERMSIG(status);
else
status = 2; /* I don't know what happened! */
return(status);
}
If you want the 'official' code for 'stderr.h' and 'stderr.c', contact me (see my profile).
Perl one liner, just for kicks:
perl -e '$s = shift; $SIG{ALRM} = sub { print STDERR "Timeout!\n"; kill INT => $p }; exec(@ARGV) unless $p = fork; alarm $s; waitpid $p, 0' 10 yes foo
This prints 'foo' for ten seconds, then times out. Replace '10' with any number of seconds, and 'yes foo' with any command.
The timeout command from Ubuntu/Debian works on the Mac when compiled from source (Darwin 10.4.*).
http://packages.ubuntu.com/lucid/timeout
My variation on the perl one-liner gives you the exit status without mucking with fork() and wait() and without the risk of killing the wrong process:
#!/bin/sh
# Usage: timelimit.sh secs cmd [ arg ... ]
exec perl -MPOSIX -e '$SIG{ALRM} = sub { print "timeout: @ARGV\n"; kill(SIGTERM, -$$); }; alarm shift; $exit = system @ARGV; exit(WIFEXITED($exit) ? WEXITSTATUS($exit) : WTERMSIG($exit));' "$@"
Basically the fork() and wait() are hidden inside system(). The SIGALRM is delivered to the parent process which then kills itself and its child by sending SIGTERM to the whole process group (-$$). In the unlikely event that the child exits and the child's pid gets reused before the kill() occurs, this will NOT kill the wrong process because the new process with the old child's pid will not be in the same process group of the parent perl process.
As an added benefit, the script also exits with what is probably the correct exit status.
#!/bin/sh
( some_slow_task ) & pid=$!
( sleep $TIMEOUT && kill -HUP $pid ) 2>/dev/null & watcher=$!
wait $pid 2>/dev/null && pkill -HUP -P $watcher
The watcher kills the slow task after given timeout; the script waits for the slow task and terminates the watcher.
Examples:
The slow task ran for more than 2 seconds and was terminated:
Slow task interrupted
( sleep 20 ) & pid=$!
( sleep 2 && kill -HUP $pid ) 2>/dev/null & watcher=$!
if wait $pid 2>/dev/null; then
echo "Slow task finished"
pkill -HUP -P $watcher
wait $watcher
else
echo "Slow task interrupted"
fi
This slow task finished before the given timeout:
Slow task finished
( sleep 2 ) & pid=$!
( sleep 20 && kill -HUP $pid ) 2>/dev/null & watcher=$!
if wait $pid 2>/dev/null; then
echo "Slow task finished"
pkill -HUP -P $watcher
wait $watcher
else
echo "Slow task interrupted"
fi
Try something like:
# This function is called with a timeout (in seconds) and a pid.
# After the timeout expires, if the process still exists, it attempts
# to kill it.
function timeout() {
sleep $1
# kill -0 tests whether the process exists
if kill -0 $2 > /dev/null 2>&1 ; then
echo "killing process $2"
kill $2 > /dev/null 2>&1
else
echo "process $2 already completed"
fi
}
<your command> &
cpid=$!
timeout 3 $cpid
wait $cpid > /dev/null 2>&1
exit $?
It has the downside that if your process's PID is reused within the timeout, it may kill the wrong process. This is highly unlikely unless you are starting 20000+ processes per second. This could be fixed.
How about using the expect tool?
## run a command, aborting if timeout exceeded, e.g. timed-run 20 CMD ARGS ...
timed-run() {
# timeout in seconds
local tmout="$1"
shift
env CMD_TIMEOUT="$tmout" expect -f - "$@" <<"EOF"
# expect script follows
eval spawn -noecho $argv
set timeout $env(CMD_TIMEOUT)
expect {
timeout {
send_error "error: operation timed out\n"
exit 1
}
eof
}
EOF
}
pure bash:
#!/bin/bash
if (( $# < 2 )); then
echo "Usage: $0 timeout cmd [options]"
exit 1
fi
TIMEOUT="$1"
shift
BOSSPID=$$
(
sleep $TIMEOUT
kill -9 -$BOSSPID
)&
TIMERPID=$!
trap "kill -9 $TIMERPID" EXIT
eval "$@"
I use "timelimit", which is a package available in the debian repository.
http://devel.ringlet.net/sysutils/timelimit/
A slight modification of the perl one-liner will get the exit status right.
perl -e '$s = shift; $SIG{ALRM} = sub { print STDERR "Timeout!\n"; kill INT => $p; exit 77 }; exec(@ARGV) unless $p = fork; alarm $s; waitpid $p, 0; exit ($? >> 8)' 10 yes foo
Basically, exit ($? >> 8) will forward the exit status of the subprocess. I just chose 77 as the exit status for a timeout.
Isn't there a way to set a specific time with "at" to do this?
$ at 05:00 PM kill -9 $pid
Seems a lot simpler.
If you don't know what the pid number is going to be, I assume there's a way to script reading it with ps aux and grep, but not sure how to implement that.
$ ps aux | grep someprogram
tony 11585 0.0 0.0 3116 720 pts/1 S+ 11:39 0:00 grep someprogram
tony 22532 0.0 0.9 27344 14136 ? S Aug25 1:23 someprogram
Your script would have to read the pid and assign it a variable.
I'm not overly skilled, but assume this is doable.

Bash's wait command returns 0 even if a child exits non-zero

It seems that bash's wait doesn't honor set -e as I would expect. Or it somehow loses track of the child process exiting with an error. Consider the example.
set -e # exit immediately on error
function child()
{
if [ $1 -eq 3 ]; then
echo "child $1 performing error"
# exit 1 ## I also tried this
false
else
echo "child $1 performing successful"
true
fi
echo "child $1 exiting normally"
}
# parent
child 1 & # succeeds
child 2 & # succeeds
child 3 & # fails
wait # why doesn't wait indicate an error?
echo "Launch nukes!" # don't want this to execute if a child failed
I want the set -e semantics, but wait doesn't seem to honor them.
A parent launches three children. One of them chokes and exits with an error (honoring the set -e). The problem is that the parent process silently blunders on as if nothing bad had happened. That is, I want to propagate the error.
Is there a way to enable this behavior? I.e. have wait return non-zero if any child exits non-zero.
If you are using bash 4.3 or later, you can use wait -n in a loop to wait for each child in turn. You won't know which child failed, but whenever one does fail, the exit status of wait will be non-zero.
child 1 & # succeeds
child 2 & # succeeds
child 3 & # fails
for i in 1 2 3; do
wait -n
done
According to the Man page, that's what should happen (emphasis mine):
If n is not given, all currently active child processes are waited
for, and the return status is zero.
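You want the second form, where you specify the IDs: wait {pid1} {pid2} {...}. Note that when several IDs are given, wait returns the exit status of the last one only, so to catch every failure you need to wait on each PID separately.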
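Waiting on each PID in turn could be sketched like this. The child function is a simplified, hypothetical version of the one in the question (child 3 fails), and the PIDs are kept in a plain string so the loop does not depend on arrays.

```bash
# Hypothetical child: succeeds unless called with 3.
child() { [ "$1" -ne 3 ]; }

pids=""
for n in 1 2 3; do
    child "$n" &
    pids="$pids $!"
done

failed=0
for pid in $pids; do
    # "wait PID" returns that child's exit status; "||" keeps set -e from aborting here
    wait "$pid" || failed=$((failed + 1))
done

if [ "$failed" -gt 0 ]; then
    echo "$failed child(ren) failed, not proceeding"
else
    echo "all children succeeded"
fi
```

The parent can then exit non-zero itself when failed is greater than zero, which gives the propagation the question asks for.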
