In the following perl script, i intend to timeout the execution of script child_script.pl if it goes beyond 1 hour. However, the logic doesn't seem to be working as the script is still running after the specified time limit.
Any guesses what am I doing wrong here?
I'm referring to follwing documentation for implementing the timeout in perl:
https://docstore.mik.ua/orelly/perl4/cook/ch16_22.htm
It works fine for a standalone command. Is it possible that system command has an issue ?
MY_CODE (parent_script.pl)
#!/usr/bin/perl
my $sys_cmd = " perl child_script.pl 2>&1 | tee logfile.txt \n";
print "INFO: Enter alarm..\n";
eval {
local $SIG{ALRM} = sub { die "alarm clock restart" };
alarm 3600; # schedule alarm in 1 hours
eval {
print "INFO: Run script.. \n";
system ($sys_cmd);
};
alarm 0; # cancel the alarm
};
alarm 0; # race condition protection
die if $# && $# !~ /alarm clock restart/; # reraise
print "INFO: Exit alarm..\n";
You don't time out the script which you have started; you set the alarm for the current (parent) script. You would have to kill the child process (the one you have started using system) from your alarm-sub.
EDIT:
If you have /usr/bin/timeout (as it would be the case if you are running on Linux), it would perhaps be more convenient to use this command for handling the timeout, instead of re-implementing the logic in Perl.
The SIGALRM signal may or may not be sent while your system call is running, so alarm may be flaky. This is a good use case for the poor man's alarm.
my $sys_cmd = "perl child_script.pl 2>&1 | tee logfile.txt";
# step 1. Start your long running command and capture it's process id
my $pid = fork();
if ($pid == 0) {
exec($sys_cmd);
}
# step 2. Start another subprocess for the poor man's alarm.
my $time = 3600;
if (fork() == 0) {
exec("$^X","-e","sleep 1,kill(0,$pid)||exit for 1..$time;kill -9,$pid");
}
# step 3. wait for first process to finish or be killed
my $c = waitpid $pid, 0;
if ($c & 128 == 9) {
print "Process timed out and was killed by the poor man's alarm\n";
} else {
print "Process finished without timing out.\n";
}
The poor man's alarm runs in a separate process with two parameters: a $pid to monitor and a $time to wait. It periodically checks to see if the process being monitored is still alive. If the process is no longer alive, then the poor man's alarm also exits without doing anything. After $time seconds have passed, and the monitored process is still hanging around, the poor man's alarm sends a kill signal to the process, which should terminate it.
I'd like to automatically kill a command after a certain amount of time. I have in mind an interface like this:
% constrain 300 ./foo args
Which would run "./foo" with "args" but automatically kill it if it's still running after 5 minutes.
It might be useful to generalize the idea to other constraints, such as autokilling a process if it uses too much memory.
Are there any existing tools that do that, or has anyone written such a thing?
ADDED: Jonathan's solution is precisely what I had in mind and it works like a charm on linux, but I can't get it to work on Mac OSX. I got rid of the SIGRTMIN which lets it compile fine, but the signal just doesn't get sent to the child process. Anyone know how to make this work on Mac?
[Added: Note that an update is available from Jonathan that works on Mac and elsewhere.]
GNU Coreutils includes the timeout command, installed by default on many systems.
https://www.gnu.org/software/coreutils/manual/html_node/timeout-invocation.html
To watch free -m for one minute, then kill it by sending a TERM signal:
timeout 1m watch free -m
Maybe I'm not understanding the question, but this sounds doable directly, at least in bash:
( /path/to/slow command with options ) & sleep 5 ; kill $!
This runs the first command, inside the parenthesis, for five seconds, and then kills it. The entire operation runs synchronously, i.e. you won't be able to use your shell while it is busy waiting for the slow command. If that is not what you wanted, it should be possible to add another &.
The $! variable is a Bash builtin that contains the process ID of the most recently started subshell. It is important to not have the & inside the parenthesis, doing it that way loses the process ID.
I've arrived rather late to this party, but I don't see my favorite trick listed in the answers.
Under *NIX, an alarm(2) is inherited across an execve(2) and SIGALRM is fatal by default. So, you can often simply:
$ doalarm () { perl -e 'alarm shift; exec #ARGV' "$#"; } # define a helper function
$ doalarm 300 ./foo.sh args
or install a trivial C wrapper to do that for you.
Advantages Only one PID is involved, and the mechanism is simple. You won't kill the wrong process if, for example, ./foo.sh exited "too quickly" and its PID was re-used. You don't need several shell subprocesses working in concert, which can be done correctly but is rather race-prone.
Disadvantages The time-constrained process cannot manipulate its alarm clock (e.g., alarm(2), ualarm(2), setitimer(2)), since this would likely clear the inherited alarm. Obviously, neither can it block or ignore SIGALRM, though the same can be said of SIGINT, SIGTERM, etc. for some other approaches.
Some (very old, I think) systems implement sleep(2) in terms of alarm(2), and, even today, some programmers use alarm(2) as a crude internal timeout mechanism for I/O and other operations. In my experience, however, this technique is applicable to the vast majority of processes you want to time limit.
There is also ulimit, which can be used to limit the execution time available to sub-processes.
ulimit -t 10
Limits the process to 10 seconds of CPU time.
To actually use it to limit a new process, rather than the current process, you may wish to use a wrapper script:
#! /usr/bin/env python
import os
os.system("ulimit -t 10; other-command-here")
other-command can be any tool. I was running a Java, Python, C and Scheme versions of different sorting algorithms, and logging how long they took, whilst limiting execution time to 30 seconds. A Cocoa-Python application generated the various command lines - including the arguments - and collated the times into a CSV file, but it was really just fluff on top of the command provided above.
I have a program called timeout that does that - written in C, originally in 1989 but updated periodically since then.
Update: this code fails to compile on MacOS X because SIGRTMIN is not defined, and fails to timeout when run on MacOS X because the `signal()` function there resumes the `wait()` after the alarm times out - which is not the required behaviour. I have a new version of `timeout.c` which deals with both these problems (using `sigaction()` instead of `signal()`). As before, contact me for a 10K gzipped tar file with the source code and a manual page (see my profile).
/*
#(#)File: $RCSfile: timeout.c,v $
#(#)Version: $Revision: 4.6 $
#(#)Last changed: $Date: 2007/03/01 22:23:02 $
#(#)Purpose: Run command with timeout monitor
#(#)Author: J Leffler
#(#)Copyright: (C) JLSS 1989,1997,2003,2005-07
*/
#define _POSIX_SOURCE /* Enable kill() in <unistd.h> on Solaris 7 */
#define _XOPEN_SOURCE 500
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include "stderr.h"
#define CHILD 0
#define FORKFAIL -1
static const char usestr[] = "[-vV] -t time [-s signal] cmd [arg ...]";
#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
const char jlss_id_timeout_c[] = "#(#)$Id: timeout.c,v 4.6 2007/03/01 22:23:02 jleffler Exp $";
#endif /* lint */
static void catcher(int signum)
{
return;
}
int main(int argc, char **argv)
{
pid_t pid;
int tm_out;
int kill_signal;
pid_t corpse;
int status;
int opt;
int vflag = 0;
err_setarg0(argv[0]);
opterr = 0;
tm_out = 0;
kill_signal = SIGTERM;
while ((opt = getopt(argc, argv, "vVt:s:")) != -1)
{
switch(opt)
{
case 'V':
err_version("TIMEOUT", &"#(#)$Revision: 4.6 $ ($Date: 2007/03/01 22:23:02 $)"[4]);
break;
case 's':
kill_signal = atoi(optarg);
if (kill_signal <= 0 || kill_signal >= SIGRTMIN)
err_error("signal number must be between 1 and %d\n", SIGRTMIN - 1);
break;
case 't':
tm_out = atoi(optarg);
if (tm_out <= 0)
err_error("time must be greater than zero (%s)\n", optarg);
break;
case 'v':
vflag = 1;
break;
default:
err_usage(usestr);
break;
}
}
if (optind >= argc || tm_out == 0)
err_usage(usestr);
if ((pid = fork()) == FORKFAIL)
err_syserr("failed to fork\n");
else if (pid == CHILD)
{
execvp(argv[optind], &argv[optind]);
err_syserr("failed to exec command %s\n", argv[optind]);
}
/* Must be parent -- wait for child to die */
if (vflag)
err_remark("time %d, signal %d, child PID %u\n", tm_out, kill_signal, (unsigned)pid);
signal(SIGALRM, catcher);
alarm((unsigned int)tm_out);
while ((corpse = wait(&status)) != pid && errno != ECHILD)
{
if (errno == EINTR)
{
/* Timed out -- kill child */
if (vflag)
err_remark("timed out - send signal %d to process %d\n", (int)kill_signal, (int)pid);
if (kill(pid, kill_signal) != 0)
err_syserr("sending signal %d to PID %d - ", kill_signal, pid);
corpse = wait(&status);
break;
}
}
alarm(0);
if (vflag)
{
if (corpse == (pid_t) -1)
err_syserr("no valid PID from waiting - ");
else
err_remark("child PID %u status 0x%04X\n", (unsigned)corpse, (unsigned)status);
}
if (corpse != pid)
status = 2; /* I don't know what happened! */
else if (WIFEXITED(status))
status = WEXITSTATUS(status);
else if (WIFSIGNALED(status))
status = WTERMSIG(status);
else
status = 2; /* I don't know what happened! */
return(status);
}
If you want the 'official' code for 'stderr.h' and 'stderr.c', contact me (see my profile).
Perl one liner, just for kicks:
perl -e '$s = shift; $SIG{ALRM} = sub { print STDERR "Timeout!\n"; kill INT => $p }; exec(#ARGV) unless $p = fork; alarm $s; waitpid $p, 0' 10 yes foo
This prints 'foo' for ten seconds, then times out. Replace '10' with any number of seconds, and 'yes foo' with any command.
The timeout command from Ubuntu/Debian when compiled from source to work on the Mac. Darwin
10.4.*
http://packages.ubuntu.com/lucid/timeout
My variation on the perl one-liner gives you the exit status without mucking with fork() and wait() and without the risk of killing the wrong process:
#!/bin/sh
# Usage: timelimit.sh secs cmd [ arg ... ]
exec perl -MPOSIX -e '$SIG{ALRM} = sub { print "timeout: #ARGV\n"; kill(SIGTERM, -$$); }; alarm shift; $exit = system #ARGV; exit(WIFEXITED($exit) ? WEXITSTATUS($exit) : WTERMSIG($exit));' "$#"
Basically the fork() and wait() are hidden inside system(). The SIGALRM is delivered to the parent process which then kills itself and its child by sending SIGTERM to the whole process group (-$$). In the unlikely event that the child exits and the child's pid gets reused before the kill() occurs, this will NOT kill the wrong process because the new process with the old child's pid will not be in the same process group of the parent perl process.
As an added benefit, the script also exits with what is probably the correct exit status.
#!/bin/sh
( some_slow_task ) & pid=$!
( sleep $TIMEOUT && kill -HUP $pid ) 2>/dev/null & watcher=$!
wait $pid 2>/dev/null && pkill -HUP -P $watcher
The watcher kills the slow task after given timeout; the script waits for the slow task and terminates the watcher.
Examples:
The slow task run more than 2 sec and was terminated
Slow task interrupted
( sleep 20 ) & pid=$!
( sleep 2 && kill -HUP $pid ) 2>/dev/null & watcher=$!
if wait $pid 2>/dev/null; then
echo "Slow task finished"
pkill -HUP -P $watcher
wait $watcher
else
echo "Slow task interrupted"
fi
This slow task finished before the given timeout
Slow task finished
( sleep 2 ) & pid=$!
( sleep 20 && kill -HUP $pid ) 2>/dev/null & watcher=$!
if wait $pid 2>/dev/null; then
echo "Slow task finished"
pkill -HUP -P $watcher
wait $watcher
else
echo "Slow task interrupted"
fi
Try something like:
# This function is called with a timeout (in seconds) and a pid.
# After the timeout expires, if the process still exists, it attempts
# to kill it.
function timeout() {
sleep $1
# kill -0 tests whether the process exists
if kill -0 $2 > /dev/null 2>&1 ; then
echo "killing process $2"
kill $2 > /dev/null 2>&1
else
echo "process $2 already completed"
fi
}
<your command> &
cpid=$!
timeout 3 $cpid
wait $cpid > /dev/null 2>&
exit $?
It has the downside that if your process' pid is reused within the timeout, it may kill the wrong process. This is highly unlikely, but you may be starting 20000+ processes per second. This could be fixed.
How about using the expect tool?
## run a command, aborting if timeout exceeded, e.g. timed-run 20 CMD ARGS ...
timed-run() {
# timeout in seconds
local tmout="$1"
shift
env CMD_TIMEOUT="$tmout" expect -f - "$#" <<"EOF"
# expect script follows
eval spawn -noecho $argv
set timeout $env(CMD_TIMEOUT)
expect {
timeout {
send_error "error: operation timed out\n"
exit 1
}
eof
}
EOF
}
pure bash:
#!/bin/bash
if [[ $# < 2 ]]; then
echo "Usage: $0 timeout cmd [options]"
exit 1
fi
TIMEOUT="$1"
shift
BOSSPID=$$
(
sleep $TIMEOUT
kill -9 -$BOSSPID
)&
TIMERPID=$!
trap "kill -9 $TIMERPID" EXIT
eval "$#"
I use "timelimit", which is a package available in the debian repository.
http://devel.ringlet.net/sysutils/timelimit/
A slight modification of the perl one-liner will get the exit status right.
perl -e '$s = shift; $SIG{ALRM} = sub { print STDERR "Timeout!\n"; kill INT => $p; exit 77 }; exec(#ARGV) unless $p = fork; alarm $s; waitpid $p, 0; exit ($? >> 8)' 10 yes foo
Basically, exit ($? >> 8) will forward the exit status of the subprocess. I just chose 77 at the exit status for timeout.
Isn't there a way to set a specific time with "at" to do this?
$ at 05:00 PM kill -9 $pid
Seems a lot simpler.
If you don't know what the pid number is going to be, I assume there's a way to script reading it with ps aux and grep, but not sure how to implement that.
$ | grep someprogram
tony 11585 0.0 0.0 3116 720 pts/1 S+ 11:39 0:00 grep someprogram
tony 22532 0.0 0.9 27344 14136 ? S Aug25 1:23 someprogram
Your script would have to read the pid and assign it a variable.
I'm not overly skilled, but assume this is doable.
I am experiencing some strange return values from system() when a child process receives a SIGINT from the terminal. To explain, from a Perl script parent.pl I used system() to run another Perl script as a child process, but I also needed to run the child through the shell, so I used the system 'sh', '-c', ... form.. So the parent of the child became the sh process and the parent of the sh process became parent.pl. Also, to avoid having the sh process receiving the SIGINT signal, I trapped it.
For example, parent.pl:
use feature qw(say);
use strict;
use warnings;
for (1..3) {
my $res = system 'sh', '-c', "trap '' INT; child$_.pl";
say "Parent received return value: " . ($res >> 8);
}
where child1.pl:
local $SIG{INT} = "DEFAULT";
sleep 10;
say "Child timed out..";
exit 1;
child2.pl:
local $SIG{INT} = sub { die };
sleep 10;
say "Child timed out..";
exit 1;
and child3.pl is:
eval {
local $SIG{INT} = sub { die };
sleep 10;
};
if ( $# ) {
print $#;
exit 2;
}
say "Child timed out..";
exit 0;
If I run parent.pl (from the command line) and press CTRL-C to abort each child process, the output is:
^CParent received return value: 130
^CDied at ./child2.pl line 7.
Parent received return value: 4
^CDied at ./child3.pl line 8.
Parent received return value: 2
Now, I would like to know why I get a return value of 130 for case 1, and a return value of 4 for case 2.
Also, it would be nice to know exactly what the "DEFAULT" signal handler does in this case.
Note: the same values are returned if I replace sh with bash ( and trap SIGINT instead of INT in bash ).
See also:
Propagation of signal to parent when using system
perlipc
Chapter 15, in Programming Perl, 4th Edition
This question is very similar to Propagation of signal to parent when using system that you asked earlier.
From my bash docs:
When a command terminates on a fatal signal N, bash uses the value of 128+N as the exit status.
SIGINT is typically 2, so 128 + 2 give you 130.
Perl's die figures out its exit code by inspecting $! or $? for an uncaught exception (so, not the case where you use eval):
exit $! if $!; # errno
exit $? >> 8 if $? >> 8; # child exit status
exit 255; # last resort
Notice that in this case, Perl exits with the value as is, not shifted up 8 bits.
The errno value happens to be 4 (see errno.h). The $! variable is a dualvar with different string and numeric values. Use it numerically (like adding zero) to get the number side:
use v5.10;
local $SIG{INT}=sub{
say "numeric errno is ", $!+0;
die
};
sleep 10;
print q(timed out);
exit 1;
This prints:
$ bash -c "perl errno.pl"
^Cnumeric errno is 4
Died at errno.pl line 6.
$ echo $?
4
Taking your questions out of order:
Also, it would be nice to know exactly what the "DEFAULT" signal handler does in this case.
Setting the handler for a given signal to "DEFAULT" affirms or restores the default signal handler for the given signal, whose action depends on the signal. Details are available from the signal(7) manual page. The default handler for SIGINT terminates the process.
Now, I would like to know why I get a return value of 130 for case 1, and a return value of 4 for case 2.
Your child1 explicitly sets the default handler for SIGINT, so that signal causes it to terminate abnormally. Such a process has no exit code in the conventional sense. The shell also receives the SIGINT, but it traps and ignores it. The exit status it reports for the child process (and therefore for itself) reflects the signal (number 2) that killed the child.
Your other two child processes, on the other hand, catch SIGINT and terminate normally in response. These do produce exit codes, which the shell passes on to you (after trapping and ignoring the SIGINT). The documentation for die() describes how the exit code is determined in this case, but the bottom line is that if you want to exit with a specific code then you should use exit instead of die.
I ran the external python script by system(run_command)
But I want to get the pid of the running python script,
So I tried to use fork and get the pid,
But the returned pid was the pid of the fork's block, not the python process.
How could I get the pid of the python process, thanks.
arguments=[
"-f #{File.join(#public_path, #streaming_verification.excel.to_s)}",
"-duration 30",
"-output TEST_#{#streaming_verification.id}"
]
cmd = [ "python",
#automation[:bin],
arguments.join(' ')
]
run_command = cmd.join(' ').strip()
task_pid = fork do
system(run_command)
end
(Update)
I tried to use the spawn method.
The retuned pid was still not the pid of the running python process.
I got the pid 5177 , but the actually pid,I wanted, is 5179
run.sh
./main.py -f ../tests/test_setting.xls -o testing_`date +%s` -duration 5
sample.rb
cmd = './run.sh'
pid = Process.spawn(cmd)
print pid
Process.wait(pid)
According to Kernel#system:
Executes command… in a subshell. command… is one of following forms.
You will get pid of subshell, not the command.
How about using Process#Spwan? It returns the pid of the subprocess.
run_command = '...'
pid = Process.spawn(cmd)
Process.wait(pid)
I am experimenting with multiple processes. I am trapping SIGCLD to execute something when the child is done. It is working on IRB but not when I execute as a ruby script.
pid = fork {sleep 2; puts 'hello'}
trap('CLD') { puts "pid: #{pid} exited with code"}
When I run the above from IRB, I both lines are printed but when I run it as a ruby script, the line within the trap procedure does not show up.
IRB gives you an outer loop, which means that the ruby process doesn't exit until you decide to kill it. The problem with your ruby script is that the main process is finishing and killing your child (yikes) before it has the chance to trap the signal.
My guess is that this is a test script, and the chances are that your desired program won't have the case where the parent finishes before the child. To see your trap working in a plain ruby script, add a sleep at the end:
pid = fork {sleep 2; puts 'hello'}
trap('CLD') { puts "pid: #{pid} exited with code"}
sleep 3
To populate the $? global variable, you should explicitly wait for the child process to exit:
pid = fork {sleep 2; puts 'hello'}
trap('CLD') { puts "pid: #{pid} exited with code #{$? >> 8}" }
Process.wait
If you do want the child to run after the parent process has died, you want a daemon (double fork).
When you run your code in IRB, the main thread belongs to IRB so that all the stuff you’ve called is living within virtually infinite time loop.
In a case of script execution, the main thread is your own and it dies before trapping. Try this:
pid = fork {sleep 2; puts 'hello'}
trap('CLD') { puts "pid: #{pid} exited with code"}
sleep 5 # this is needed to prevent main thread to die ASAP
Hope it helps.