Implementation of timeout in perl - shell

In the following Perl script, I intend to time out the execution of child_script.pl if it runs for more than 1 hour. However, the logic doesn't seem to work: the script is still running after the specified time limit.
What am I doing wrong here?
I'm referring to the following documentation for implementing the timeout in Perl:
https://docstore.mik.ua/orelly/perl4/cook/ch16_22.htm
It works fine for a standalone command. Is it possible that the system command is the issue?
MY_CODE (parent_script.pl)
#!/usr/bin/perl
my $sys_cmd = " perl child_script.pl 2>&1 | tee logfile.txt \n";
print "INFO: Enter alarm..\n";
eval {
local $SIG{ALRM} = sub { die "alarm clock restart" };
alarm 3600; # schedule alarm in 1 hour
eval {
print "INFO: Run script.. \n";
system ($sys_cmd);
};
alarm 0; # cancel the alarm
};
alarm 0; # race condition protection
die if $@ && $@ !~ /alarm clock restart/; # reraise
print "INFO: Exit alarm..\n";

You don't time out the script which you have started; you set the alarm for the current (parent) script. You would have to kill the child process (the one you have started using system) from your alarm-sub.
EDIT:
If you have /usr/bin/timeout (as you would on most Linux systems), it would perhaps be more convenient to use this command for handling the timeout, instead of re-implementing the logic in Perl.
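For comparison, here is a minimal Python sketch of the point made above: the timeout has to act on the child process, not only on the parent. It is not the Perl fix itself. `run_with_timeout` is a hypothetical helper name, and the sketch relies on `subprocess.run` killing and reaping the child when its `timeout` expires.

```python
import subprocess
import sys


def run_with_timeout(argv, seconds):
    """Run argv; return its exit code, or None if it was killed on timeout."""
    try:
        # On timeout, subprocess.run kills the child and waits for it
        # before raising TimeoutExpired.
        return subprocess.run(argv, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        return None
```

Note that this only signals the direct child. If the command is a shell pipeline, the pipeline members may outlive the shell unless the whole process group is signalled.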

The SIGALRM signal may or may not be sent while your system call is running, so alarm may be flaky. This is a good use case for the poor man's alarm.
my $sys_cmd = "perl child_script.pl 2>&1 | tee logfile.txt";
# step 1. Start your long running command and capture its process id
my $pid = fork();
if ($pid == 0) {
exec($sys_cmd);
}
# step 2. Start another subprocess for the poor man's alarm.
my $time = 3600;
if (fork() == 0) {
# note: in Perl a negative signal number targets a process group, so use 9, not -9
exec("$^X","-e","sleep 1,kill(0,$pid)||exit for 1..$time;kill 9,$pid");
}
# step 3. wait for first process to finish or be killed
# (waitpid returns the pid; the wait status is left in $?)
my $c = waitpid $pid, 0;
if (($? & 127) == 9) {
print "Process timed out and was killed by the poor man's alarm\n";
} else {
print "Process finished without timing out.\n";
}
The poor man's alarm runs in a separate process with two parameters: a $pid to monitor and a $time to wait. It periodically checks to see if the process being monitored is still alive. If the process is no longer alive, then the poor man's alarm also exits without doing anything. After $time seconds have passed, and the monitored process is still hanging around, the poor man's alarm sends a kill signal to the process, which should terminate it.
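The same watchdog logic can be sketched in Python. This is an illustration under assumptions, not a port of the Perl above: `poor_mans_alarm` is a hypothetical name, and the watcher runs in the calling process instead of a separate one.

```python
import signal
import subprocess
import sys
import time


def poor_mans_alarm(proc, limit, poll=0.1):
    """Watch proc; kill it if it is still alive after `limit` seconds.

    Returns True if the process had to be killed, False if it finished
    on its own before the limit.
    """
    deadline = time.monotonic() + limit
    while time.monotonic() < deadline:
        if proc.poll() is not None:   # monitored process is gone: nothing to do
            return False
        time.sleep(poll)
    if proc.poll() is not None:
        return False
    proc.kill()                       # SIGKILL on POSIX
    proc.wait()                       # reap it so no zombie is left behind
    return True
```

After a kill, `proc.returncode` is the negated signal number on POSIX, which plays the role of the `$? & 127` check in the Perl version.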

Related

How to wait for grandchild process (`bash` retval becomes -1 in Perl due to SIG CHLD)

I have a Perl script (snippet below) that runs in cron to perform system checks. I fork a child as a timeout and reap it with SIG{CHLD}. Perl makes several system calls to Bash scripts and checks their exit status. One Bash script fails about 5% of the time with no error: the Bash script exits with 0, yet Perl sees $? as -1 and $! as "No child processes".
This bash script tests compiler licenses, and Intel icc is left around after the Bash script completes (ps output below). I think the icc zombie completes, forcing Perl into SIG{CHLD} handler, which blows away the $? status before I'm able to read it.
Compile status -1; No child processes
#!/usr/bin/perl
use strict;
use POSIX ':sys_wait_h';
my $GLOBAL_TIMEOUT = 1200;
### Timer to notify if this program hangs
my $timer_pid;
$SIG{CHLD} = sub {
local ($!, $?);
while((my $pid = waitpid(-1, WNOHANG)) > 0)
{
if($pid == $timer_pid)
{
die "Timeout\n";
}
}
};
die "Unable to fork\n" unless(defined($timer_pid = fork));
if($timer_pid == 0) # child
{
sleep($GLOBAL_TIMEOUT);
exit;
}
### End Timer
### Compile test
my @compile = `./compile_test.sh 2>&1`;
my $status = $?;
print "Compile status $status; $!\n";
if($status != 0)
{
print "@compile\n";
}
END # Timer cleanup
{
if($timer_pid != 0)
{
$SIG{CHLD} = 'IGNORE';
kill(15, $timer_pid);
}
}
exit(0);
#!/bin/sh
cc compile_test.c
if [ $? -ne 0 ]; then
echo "Cray compiler failure"
exit 1
fi
module swap PrgEnv-cray PrgEnv-intel
cc compile_test.c
if [ $? -ne 0 ]; then
echo "Intel compiler failure"
exit 1
fi
wait
ps
exit 0
The wait doesn't really wait because cc calls icc which creates a zombie grandchild process that wait (or wait PID) doesn't block for. (wait `pidof icc`, 31589 in this case, gives "not a child of this shell")
user 31589 1 0 12:47 pts/15 00:00:00 icc
I just don't know how to fix this in Bash or Perl.
Thanks, Chris
Isn't this a use case for alarm? Toss out your SIGCHLD handler and say
local $? = -1;
eval {
local $SIG{ALRM} = sub { die "Timeout\n" };
alarm($GLOBAL_TIMEOUT);
@compile = `./compile_test.sh 2>&1`;
alarm(0);
};
my $status = $?;
instead.
I thought the quickest solution would be to add sleep of a second or two at the bottom of the bash script to wait for the zombie icc to complete. But that didn't work.
If I didn't already have a SIG ALRM (in the real program), I agree the best choice would be to wrap the whole thing in an eval, even though that would be pretty ugly for a 500 line program.
Without the local($?), every `system` call gets $? = -1. The $? I need in this case is the one set by waitpid inside the handler, but it is unfortunately reset to -1 after the sig handler exits. So I find this works. New lines are shown with ###
my $timer_pid;
my $chld_status; ###
$SIG{CHLD} = sub {
local($!, $?);
while((my $pid = waitpid(-1, WNOHANG)) > 0)
{
$chld_status = $?; ###
if($pid == $timer_pid)
{
die "Timeout\n";
}
}
};
...
my @compile = `./compile_test.sh 2>&1`;
my $status = ($? == -1) ? $chld_status : $?; ###
...
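One plausible reading of the -1 / "No child processes" symptom is that the handler's waitpid(-1, ...) collects the backtick child first, so the later wait finds nothing left. The Python sketch below only illustrates that mechanism (a second wait on an already reaped child fails with ECHILD); `reap_twice` is a hypothetical name and this is not the asker's code.

```python
import errno
import os


def reap_twice():
    """Fork a child that exits with 7, then try to wait for it twice."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)                      # child: exit immediately
    _, status = os.waitpid(pid, 0)       # first wait collects the status
    try:
        os.waitpid(pid, 0)               # second wait: nothing left to collect
    except ChildProcessError as exc:
        return status >> 8, exc.errno    # exit code from the first wait, ECHILD
    return status >> 8, None
```

Whoever waits first gets the real status, which is why saving $? inside the handler recovers it.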
We had a similar issue; here is our solution: leak the write side of a pipe into the grandchild and read() from the read side, which blocks until the grandchild exits.
See also: wait for children and grand-children
use Fcntl;
# OCF scripts invoked by Pacemaker will be killed by Pacemaker with
# a SIGKILL if the script exceeds the configured resource timeout. In
# addition to killing the script, Pacemaker also kills all of the children
# invoked by that script. Because it is a kill, the scripts cannot trap
# the signal and clean up; because all of the children are killed as well,
# we cannot simply fork and have the parent wait on the child. In order
# to work around that, we need the child not to have a parent process
# of the OCF script---and the only way to do that is to grandchild the
# process. However, we still want the parent to wait for the grandchild
# process to exit so that the OCF script exits when the grandchild is
# done and not before. This is done by leaking the write file descriptor
# from pipe() into the grandchild and then the parent reads the read file
# descriptor, thus blocking until it gets IO or the grandchild exits. Since
# the file descriptor is never written to by the grandchild, the parent
# blocks until the grandchild exits.
sub grandchild_wait_exit
{
# We use "our" instead of "my" for the write side of the pipe. If
# we did not, then when the sub exits and $w goes out of scope,
# the file descriptor will close and the parent will exit.
pipe(my $r, our $w);
# Enable leaking the file descriptor into the children
my $flags = fcntl($w, F_GETFD, 0) or warn $!;
fcntl($w, F_SETFD, $flags & (~FD_CLOEXEC)) or die "Can't set flags: $!\n";
# Fork the child
my $child = fork();
if ($child) {
# We are the parent, waitpid for the child and
# then read to wait for the grandchild.
close($w);
waitpid($child, 0);
<$r>;
exit;
}
# Otherwise we are the child, so close the read side of the pipe.
close($r);
# Fork a grandchild, exit the child.
if (fork()) {
exit;
}
# Turn off leaking of the file descriptor in the grandchild so
# that no other process can write to the open file descriptor
# that would prematurely exit the parent.
$flags = fcntl($w, F_GETFD, 0) or warn $!;
fcntl($w, F_SETFD, $flags | FD_CLOEXEC) or die "Can't set flags: $!\n";
}
grandchild_wait_exit();
sleep 1;
print getppid() . "\n";
print "$$: gc\n";
sleep 30;
exit;
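A rough Python sketch of the pipe trick, for readers who find the Perl dense. It differs from the code above in one respect, which is an assumption of the sketch: the grandchild runs a callable (`work`, hypothetical) and the parent returns instead of exiting. The blocking read is the same idea.

```python
import os
import time


def run_and_wait_for_grandchild(work):
    """Double-fork, run work() in the grandchild, block until it exits."""
    r, w = os.pipe()
    child = os.fork()
    if child == 0:
        os.close(r)
        if os.fork() == 0:
            # Grandchild: it keeps the write end open simply by staying alive.
            try:
                work()
            finally:
                os._exit(0)
        os._exit(0)            # intermediate child exits straight away
    os.close(w)                # parent must drop its own copy of the write end
    os.waitpid(child, 0)       # reap the intermediate child
    data = os.read(r, 1)       # blocks until every write end is closed
    os.close(r)
    return data                # empty bytes: end of file, nobody wrote
```

The read returns only once the last holder of the write end is gone, so it acts as a wait on a process that waitpid cannot reach.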

Running several bash commands and killing them after some time [duplicate]

I'd like to automatically kill a command after a certain amount of time. I have in mind an interface like this:
% constrain 300 ./foo args
Which would run "./foo" with "args" but automatically kill it if it's still running after 5 minutes.
It might be useful to generalize the idea to other constraints, such as autokilling a process if it uses too much memory.
Are there any existing tools that do that, or has anyone written such a thing?
ADDED: Jonathan's solution is precisely what I had in mind and it works like a charm on linux, but I can't get it to work on Mac OSX. I got rid of the SIGRTMIN which lets it compile fine, but the signal just doesn't get sent to the child process. Anyone know how to make this work on Mac?
[Added: Note that an update is available from Jonathan that works on Mac and elsewhere.]
GNU Coreutils includes the timeout command, installed by default on many systems.
https://www.gnu.org/software/coreutils/manual/html_node/timeout-invocation.html
To watch free -m for one minute, then kill it by sending a TERM signal:
timeout 1m watch free -m
Maybe I'm not understanding the question, but this sounds doable directly, at least in bash:
( /path/to/slow command with options ) & sleep 5 ; kill $!
This runs the first command, inside the parentheses, for five seconds, and then kills it. The entire operation runs synchronously, i.e. you won't be able to use your shell while it is busy waiting for the slow command. If that is not what you wanted, it should be possible to add another &.
The $! variable is a shell special parameter that contains the process ID of the most recently started background job. It is important not to have the & inside the parentheses; doing it that way loses the process ID.
I've arrived rather late to this party, but I don't see my favorite trick listed in the answers.
Under *NIX, an alarm(2) is inherited across an execve(2) and SIGALRM is fatal by default. So, you can often simply:
$ doalarm () { perl -e 'alarm shift; exec @ARGV' "$@"; } # define a helper function
$ doalarm 300 ./foo.sh args
or install a trivial C wrapper to do that for you.
Advantages: Only one PID is involved, and the mechanism is simple. You won't kill the wrong process if, for example, ./foo.sh exited "too quickly" and its PID was re-used. You don't need several shell subprocesses working in concert, which can be done correctly but is rather race-prone.
Disadvantages: The time-constrained process cannot manipulate its alarm clock (e.g., alarm(2), ualarm(2), setitimer(2)), since this would likely clear the inherited alarm. Obviously, neither can it block or ignore SIGALRM, though the same can be said of SIGINT, SIGTERM, etc. for some other approaches.
Some (very old, I think) systems implement sleep(2) in terms of alarm(2), and, even today, some programmers use alarm(2) as a crude internal timeout mechanism for I/O and other operations. In my experience, however, this technique is applicable to the vast majority of processes you want to time limit.
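The inherited-alarm trick can be sketched in Python as well. The helper names (`exec_with_alarm`, `run_limited`) are hypothetical, and the fork in `run_limited` exists only so the result can be observed; the trick itself is the two lines in `exec_with_alarm`.

```python
import os
import signal
import sys


def exec_with_alarm(seconds, argv):
    """Arm an alarm, then replace this process with argv."""
    signal.signal(signal.SIGALRM, signal.SIG_DFL)  # make sure SIGALRM is fatal
    signal.alarm(seconds)                          # pending alarm survives exec
    os.execvp(argv[0], argv)


def run_limited(seconds, argv):
    """Run argv in a child under the inherited alarm; return the wait status."""
    pid = os.fork()
    if pid == 0:
        try:
            exec_with_alarm(seconds, argv)
        finally:
            os._exit(127)      # only reached if exec failed
    _, status = os.waitpid(pid, 0)
    return status
```

If the limit is hit, `os.WTERMSIG(status)` on the returned status is SIGALRM. `signal.alarm` takes whole seconds, like alarm(2).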
There is also ulimit, which can be used to limit the execution time available to sub-processes.
ulimit -t 10
Limits the process to 10 seconds of CPU time.
To actually use it to limit a new process, rather than the current process, you may wish to use a wrapper script:
#! /usr/bin/env python
import os
os.system("ulimit -t 10; other-command-here")
other-command can be any tool. I was running a Java, Python, C and Scheme versions of different sorting algorithms, and logging how long they took, whilst limiting execution time to 30 seconds. A Cocoa-Python application generated the various command lines - including the arguments - and collated the times into a CSV file, but it was really just fluff on top of the command provided above.
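Instead of shelling out to `ulimit`, a Python wrapper can set the same limit directly with the `resource` module. This is a sketch under assumptions: `run_with_cpu_limit` is a hypothetical name, and only the soft limit is changed.

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(argv, cpu_seconds):
    """Run argv with a soft CPU-time limit, like `ulimit -t` before the command."""
    def limit_cpu():
        # Runs in the child between fork and exec; keep the existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, hard))

    return subprocess.run(argv, preexec_fn=limit_cpu,
                          capture_output=True, text=True)
```

As with `ulimit -t`, this counts CPU time, not wall-clock time: a CPU-bound child receives SIGXCPU at the soft limit, while a child that mostly sleeps is barely affected.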
I have a program called timeout that does that - written in C, originally in 1989 but updated periodically since then.
Update: this code fails to compile on MacOS X because SIGRTMIN is not defined, and fails to timeout when run on MacOS X because the `signal()` function there resumes the `wait()` after the alarm times out - which is not the required behaviour. I have a new version of `timeout.c` which deals with both these problems (using `sigaction()` instead of `signal()`). As before, contact me for a 10K gzipped tar file with the source code and a manual page (see my profile).
/*
@(#)File: $RCSfile: timeout.c,v $
@(#)Version: $Revision: 4.6 $
@(#)Last changed: $Date: 2007/03/01 22:23:02 $
@(#)Purpose: Run command with timeout monitor
@(#)Author: J Leffler
@(#)Copyright: (C) JLSS 1989,1997,2003,2005-07
*/
#define _POSIX_SOURCE /* Enable kill() in <unistd.h> on Solaris 7 */
#define _XOPEN_SOURCE 500
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include "stderr.h"
#define CHILD 0
#define FORKFAIL -1
static const char usestr[] = "[-vV] -t time [-s signal] cmd [arg ...]";
#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
const char jlss_id_timeout_c[] = "@(#)$Id: timeout.c,v 4.6 2007/03/01 22:23:02 jleffler Exp $";
#endif /* lint */
static void catcher(int signum)
{
return;
}
int main(int argc, char **argv)
{
pid_t pid;
int tm_out;
int kill_signal;
pid_t corpse;
int status;
int opt;
int vflag = 0;
err_setarg0(argv[0]);
opterr = 0;
tm_out = 0;
kill_signal = SIGTERM;
while ((opt = getopt(argc, argv, "vVt:s:")) != -1)
{
switch(opt)
{
case 'V':
err_version("TIMEOUT", &"@(#)$Revision: 4.6 $ ($Date: 2007/03/01 22:23:02 $)"[4]);
break;
case 's':
kill_signal = atoi(optarg);
if (kill_signal <= 0 || kill_signal >= SIGRTMIN)
err_error("signal number must be between 1 and %d\n", SIGRTMIN - 1);
break;
case 't':
tm_out = atoi(optarg);
if (tm_out <= 0)
err_error("time must be greater than zero (%s)\n", optarg);
break;
case 'v':
vflag = 1;
break;
default:
err_usage(usestr);
break;
}
}
if (optind >= argc || tm_out == 0)
err_usage(usestr);
if ((pid = fork()) == FORKFAIL)
err_syserr("failed to fork\n");
else if (pid == CHILD)
{
execvp(argv[optind], &argv[optind]);
err_syserr("failed to exec command %s\n", argv[optind]);
}
/* Must be parent -- wait for child to die */
if (vflag)
err_remark("time %d, signal %d, child PID %u\n", tm_out, kill_signal, (unsigned)pid);
signal(SIGALRM, catcher);
alarm((unsigned int)tm_out);
while ((corpse = wait(&status)) != pid && errno != ECHILD)
{
if (errno == EINTR)
{
/* Timed out -- kill child */
if (vflag)
err_remark("timed out - send signal %d to process %d\n", (int)kill_signal, (int)pid);
if (kill(pid, kill_signal) != 0)
err_syserr("sending signal %d to PID %d - ", kill_signal, pid);
corpse = wait(&status);
break;
}
}
alarm(0);
if (vflag)
{
if (corpse == (pid_t) -1)
err_syserr("no valid PID from waiting - ");
else
err_remark("child PID %u status 0x%04X\n", (unsigned)corpse, (unsigned)status);
}
if (corpse != pid)
status = 2; /* I don't know what happened! */
else if (WIFEXITED(status))
status = WEXITSTATUS(status);
else if (WIFSIGNALED(status))
status = WTERMSIG(status);
else
status = 2; /* I don't know what happened! */
return(status);
}
If you want the 'official' code for 'stderr.h' and 'stderr.c', contact me (see my profile).
Perl one liner, just for kicks:
perl -e '$s = shift; $SIG{ALRM} = sub { print STDERR "Timeout!\n"; kill INT => $p }; exec(@ARGV) unless $p = fork; alarm $s; waitpid $p, 0' 10 yes foo
This prints 'foo' for ten seconds, then times out. Replace '10' with any number of seconds, and 'yes foo' with any command.
The timeout command from Ubuntu/Debian also works on the Mac (Darwin 10.4.*) when compiled from source.
http://packages.ubuntu.com/lucid/timeout
My variation on the perl one-liner gives you the exit status without mucking with fork() and wait() and without the risk of killing the wrong process:
#!/bin/sh
# Usage: timelimit.sh secs cmd [ arg ... ]
exec perl -MPOSIX -e '$SIG{ALRM} = sub { print "timeout: @ARGV\n"; kill(SIGTERM, -$$); }; alarm shift; $exit = system @ARGV; exit(WIFEXITED($exit) ? WEXITSTATUS($exit) : WTERMSIG($exit));' "$@"
Basically the fork() and wait() are hidden inside system(). The SIGALRM is delivered to the parent process which then kills itself and its child by sending SIGTERM to the whole process group (-$$). In the unlikely event that the child exits and the child's pid gets reused before the kill() occurs, this will NOT kill the wrong process because the new process with the old child's pid will not be in the same process group of the parent perl process.
As an added benefit, the script also exits with what is probably the correct exit status.
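A Python variation on the process-group idea, as a sketch only. Unlike the one-liner above, which signals its own group, this puts the command in a new session so that only the command and its descendants are signalled and the caller survives. `run_group_with_timeout` is a hypothetical name.

```python
import os
import signal
import subprocess


def run_group_with_timeout(cmd, seconds):
    """Run a shell command in its own process group; SIGTERM the group on timeout."""
    # start_new_session=True makes the child a session and group leader,
    # so its process group ID equals proc.pid.
    proc = subprocess.Popen(cmd, shell=True, start_new_session=True)
    try:
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGTERM)   # hits the shell and the pipeline
        return proc.wait()
```

Because the signal goes to a group rather than to a remembered PID, every member of a pipeline such as `sleep 30 | cat` is terminated together.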
#!/bin/sh
( some_slow_task ) & pid=$!
( sleep $TIMEOUT && kill -HUP $pid ) 2>/dev/null & watcher=$!
wait $pid 2>/dev/null && pkill -HUP -P $watcher
The watcher kills the slow task after given timeout; the script waits for the slow task and terminates the watcher.
Examples:
The slow task ran for more than 2 seconds and was terminated
Slow task interrupted
( sleep 20 ) & pid=$!
( sleep 2 && kill -HUP $pid ) 2>/dev/null & watcher=$!
if wait $pid 2>/dev/null; then
echo "Slow task finished"
pkill -HUP -P $watcher
wait $watcher
else
echo "Slow task interrupted"
fi
This slow task finished before the given timeout
Slow task finished
( sleep 2 ) & pid=$!
( sleep 20 && kill -HUP $pid ) 2>/dev/null & watcher=$!
if wait $pid 2>/dev/null; then
echo "Slow task finished"
pkill -HUP -P $watcher
wait $watcher
else
echo "Slow task interrupted"
fi
Try something like:
# This function is called with a timeout (in seconds) and a pid.
# After the timeout expires, if the process still exists, it attempts
# to kill it.
function timeout() {
sleep $1
# kill -0 tests whether the process exists
if kill -0 $2 > /dev/null 2>&1 ; then
echo "killing process $2"
kill $2 > /dev/null 2>&1
else
echo "process $2 already completed"
fi
}
<your command> &
cpid=$!
timeout 3 $cpid
wait $cpid > /dev/null 2>&1
exit $?
It has the downside that if your process' pid is reused within the timeout, it may kill the wrong process. This is highly unlikely unless you are starting 20000+ processes per second. This could be fixed.
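The `kill -0` probe used in the function above has a direct Python counterpart. A small sketch, with `pid_exists` as a hypothetical name:

```python
import os
import subprocess
import sys


def pid_exists(pid):
    """Probe a PID with signal 0, which checks for existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False        # no such process
    except PermissionError:
        return True         # it exists, but belongs to someone else
    return True
```

The PID reuse caveat applies here too: a True result says that some process has this PID now, not that it is the one you started.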
How about using the expect tool?
## run a command, aborting if timeout exceeded, e.g. timed-run 20 CMD ARGS ...
timed-run() {
# timeout in seconds
local tmout="$1"
shift
env CMD_TIMEOUT="$tmout" expect -f - "$@" <<"EOF"
# expect script follows
eval spawn -noecho $argv
set timeout $env(CMD_TIMEOUT)
expect {
timeout {
send_error "error: operation timed out\n"
exit 1
}
eof
}
EOF
}
pure bash:
#!/bin/bash
if [[ $# -lt 2 ]]; then
echo "Usage: $0 timeout cmd [options]"
exit 1
fi
TIMEOUT="$1"
shift
BOSSPID=$$
(
sleep $TIMEOUT
kill -9 -$BOSSPID
)&
TIMERPID=$!
trap "kill -9 $TIMERPID" EXIT
eval "$@"
I use "timelimit", which is a package available in the debian repository.
http://devel.ringlet.net/sysutils/timelimit/
A slight modification of the perl one-liner will get the exit status right.
perl -e '$s = shift; $SIG{ALRM} = sub { print STDERR "Timeout!\n"; kill INT => $p; exit 77 }; exec(@ARGV) unless $p = fork; alarm $s; waitpid $p, 0; exit ($? >> 8)' 10 yes foo
Basically, exit ($? >> 8) will forward the exit status of the subprocess. I just chose 77 as the exit status for timeout.
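To make the bit layout behind `$? >> 8` concrete, here is a small Python sketch that splits a 16-bit wait status the way Perl's $? presents it. `decode_wait_status` is a hypothetical helper written for this illustration.

```python
def decode_wait_status(status):
    """Split a 16-bit wait status into how the child ended and the value.

    The low 7 bits hold the terminating signal (0 if none), bit 7 is the
    core-dump flag, and the high byte holds the exit code.
    """
    signal_number = status & 0x7F
    if signal_number:
        return ("signal", signal_number)
    return ("exit", status >> 8)
```

One consequence: when the child dies from a signal, `$? >> 8` is 0, so forwarding only the high byte reports success in that case.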
Isn't there a way to set a specific time with "at" to do this?
$ echo "kill -9 $pid" | at 05:00 PM
Seems a lot simpler.
If you don't know what the pid number is going to be, I assume there's a way to script reading it with ps aux and grep, but not sure how to implement that.
$ ps aux | grep someprogram
tony 11585 0.0 0.0 3116 720 pts/1 S+ 11:39 0:00 grep someprogram
tony 22532 0.0 0.9 27344 14136 ? S Aug25 1:23 someprogram
Your script would have to read the pid and assign it a variable.
I'm not overly skilled, but assume this is doable.

Return value from system() when using SIGINT default handler

I am experiencing some strange return values from system() when a child process receives a SIGINT from the terminal. To explain, from a Perl script parent.pl I used system() to run another Perl script as a child process, but I also needed to run the child through the shell, so I used the system 'sh', '-c', ... form. So the parent of the child became the sh process and the parent of the sh process became parent.pl. Also, to avoid having the sh process receive the SIGINT signal, I trapped it.
For example, parent.pl:
use feature qw(say);
use strict;
use warnings;
for (1..3) {
my $res = system 'sh', '-c', "trap '' INT; child$_.pl";
say "Parent received return value: " . ($res >> 8);
}
where child1.pl:
local $SIG{INT} = "DEFAULT";
sleep 10;
say "Child timed out..";
exit 1;
child2.pl:
local $SIG{INT} = sub { die };
sleep 10;
say "Child timed out..";
exit 1;
and child3.pl is:
eval {
local $SIG{INT} = sub { die };
sleep 10;
};
if ( $@ ) {
print $@;
exit 2;
}
say "Child timed out..";
exit 0;
If I run parent.pl (from the command line) and press CTRL-C to abort each child process, the output is:
^CParent received return value: 130
^CDied at ./child2.pl line 7.
Parent received return value: 4
^CDied at ./child3.pl line 8.
Parent received return value: 2
Now, I would like to know why I get a return value of 130 for case 1, and a return value of 4 for case 2.
Also, it would be nice to know exactly what the "DEFAULT" signal handler does in this case.
Note: the same values are returned if I replace sh with bash ( and trap SIGINT instead of INT in bash ).
See also:
Propagation of signal to parent when using system
perlipc
Chapter 15, in Programming Perl, 4th Edition
This question is very similar to Propagation of signal to parent when using system that you asked earlier.
From my bash docs:
When a command terminates on a fatal signal N, bash uses the value of 128+N as the exit status.
SIGINT is typically 2, so 128 + 2 gives you 130.
Perl's die figures out its exit code by inspecting $! or $? for an uncaught exception (so, not the case where you use eval):
exit $! if $!; # errno
exit $? >> 8 if $? >> 8; # child exit status
exit 255; # last resort
Notice that in this case, Perl exits with the value as is, not shifted up 8 bits.
The errno value happens to be 4 (see errno.h). The $! variable is a dualvar with different string and numeric values. Use it numerically (like adding zero) to get the number side:
use v5.10;
local $SIG{INT}=sub{
say "numeric errno is ", $!+0;
die
};
sleep 10;
print q(timed out);
exit 1;
This prints:
$ bash -c "perl errno.pl"
^Cnumeric errno is 4
Died at errno.pl line 6.
$ echo $?
4
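The two rules quoted in this answer (the shell's 128+N convention and the exit code chosen by an uncaught die) can be written out as plain arithmetic. A Python sketch with hypothetical function names, mirroring the three-line Perl rule above:

```python
import errno


def shell_status_for_signal(signum):
    """Exit status a shell reports for a command killed by signal signum."""
    return 128 + signum


def perl_die_exit_code(errno_value, child_status):
    """Exit code of an uncaught die, following the rule quoted above."""
    if errno_value:                 # exit $! if $!
        return errno_value
    if child_status >> 8:           # exit $? >> 8 if $? >> 8
        return child_status >> 8
    return 255                      # last resort
```

With SIGINT as signal 2 the first function gives 130 (case 1), and with errno set to EINTR by the interrupted sleep the second gives the EINTR value, which is 4 on Linux (case 2).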
Taking your questions out of order:
Also, it would be nice to know exactly what the "DEFAULT" signal handler does in this case.
Setting the handler for a given signal to "DEFAULT" affirms or restores the default signal handler for the given signal, whose action depends on the signal. Details are available from the signal(7) manual page. The default handler for SIGINT terminates the process.
Now, I would like to know why I get a return value of 130 for case 1, and a return value of 4 for case 2.
Your child1 explicitly sets the default handler for SIGINT, so that signal causes it to terminate abnormally. Such a process has no exit code in the conventional sense. The shell also receives the SIGINT, but it traps and ignores it. The exit status it reports for the child process (and therefore for itself) reflects the signal (number 2) that killed the child.
Your other two child processes, on the other hand, catch SIGINT and terminate normally in response. These do produce exit codes, which the shell passes on to you (after trapping and ignoring the SIGINT). The documentation for die() describes how the exit code is determined in this case, but the bottom line is that if you want to exit with a specific code then you should use exit instead of die.

why doesn't this timer work?

I'm trying to make a script to start a second counter. [but later I want to add minutes too] but so far, it just keeps echoing 0, 0, 0, 0, over and over. :\
#!/bin/bash
seconds=0;
count()
{
export seconds=$[seconds + 1]
sleep 1;
count
}
count&
N=$!
trap "kill $N; exit 0;" 2
while true; do
echo $seconds
sleep 1;
done
The & makes it run in a subshell, which means that it has its own set of environment variables independent of the current script. Find another way (or another language) to do this.
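The same effect is easy to reproduce in another language. In this Python sketch (hypothetical name `fork_counter_demo`), the child increments its own copy of the counter and has to send the value back through a pipe, because the parent's copy never changes.

```python
import os


def fork_counter_demo():
    """Show that a forked child's variable updates are invisible to the parent."""
    seconds = 0
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(r)
        seconds += 1                          # changes only the child's copy
        os.write(w, str(seconds).encode())    # explicit channel back to parent
        os._exit(0)
    os.close(w)
    child_value = int(os.read(r, 16).decode())
    os.close(r)
    os.waitpid(pid, 0)
    return seconds, child_value               # parent's copy, child's copy
```

The function returns (0, 1): the parent still sees 0 while the child counted to 1. Any counter kept in a background job needs some channel like this (a pipe, a signal, a file) to reach the parent.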
Ignacio's answer explains that your subshell's environment is not visible to your parent process.
One way to create slaves like this is co-processes (with coproc in zsh and newer bash or with special syntax in ksh). Your bash probably doesn't support this yet.
Here's a variation on your idea that uses signals to send the updates to the parent. I've retained your basic structure where it doesn't conflict:
count() {
parent=$1
kill -ALRM $parent
sleep 1
count $parent
}
trap 'seconds=$[$seconds + 1]' ALRM
count $$ &
trap "kill $!; exit 0" INT
while true
do
echo $seconds
done

Catch PHP Exits in CLI via sh

Alright, I am trying to figure this problem out. I have a class that loops indefinitely until I either restart it manually or it runs out of available RAM. I've written the code to work with both CLI and normal web-based execution. The only difference is that with web-based execution the script lasts about 12 hours before it crashes due to memory issues, whereas in CLI it runs far longer (on average 4-5 days before a crash due to memory).
The script is an IRC bot that is heavily customized for what I need it to do. I don't know enough of C++, Ruby, Python or other languages to make something that is cross-platform compliant. My dev machine is Windows and my production server is Ubuntu. Right now I have the script successfully forking off and detaching from the terminal window so I can close that without ending the script.
But what I am trying to figure out is how to catch errors and restart the script automatically since it tends to fail at random times and not always when I am at the IRC channel to catch the failure. One last positive would be a way to catch if I requested a restart from the channel and have the bot restart as I am constantly adding in new code functions or just general bug fixes.
Here is my CLI start php script
#!/usr/bin/php
<?php
include_once ("./config/base_conf.php");
include_once ("./libs/irc_base.php");
if ($config ['database'] == true) {
include_once ("./config/db_conf.php");
}
$server = getopt ( 's', array ("server::" ) );
if (! $server) {
$SER = 'default_server';
} elseif ($server ['server'] == 'raelgun') {
$SER = 'server_a';
} else {
$SER = 'default_server';
}
declare ( ticks = 1 )
;
$pid = pcntl_fork ();
if ($pid == - 1) {
die ( "could not fork" );
} else if ($pid) {
exit (); // we are the parent
} else {
// we are the child
}
// detach from the controlling terminal
if (posix_setsid () == - 1) {
die ( "could not detach from terminal" );
}
$posid = posix_getpid ();
$PID_FILE = "/var/run/bot_process_".$SER.".pid";
$fp = fopen ($PID_FILE , "w" ) or die("File Exists Process Running");
fwrite ( $fp, $posid );
fclose ( $fp );
// setup signal handlers
pcntl_signal ( SIGTERM, "sig_handler" );
pcntl_signal ( SIGHUP, "sig_handler" );
// loop forever performing tasks
$bot = new IRC_BOT ( $config, $SER );
function sig_handler($signo) {
global $bot, $PID_FILE; // the handler needs the variables defined at file scope
switch ($signo) {
case SIGTERM :
$bot->machineKill();
unlink($PID_FILE);
exit ();
break;
case SIGHUP :
$bot->machineKill();
unlink($PID_FILE);
break;
default :
// handle all other signals
}
}
Depending on the server I connect to since it connects to a maximum of 2 servers I run the following in the terminal to get the script running
php bot_start_shell.php --server="servernamehere" > /dev/null
So what I am trying to do is get a shell file coded correctly to monitor that script, and if it exits due to error or requested restart to restart the script.
I've used this technique for a while, where a shell script runs a PHP script, monitors the exit value and restarts.
Here's a test script that uses exit() to return a value to the shell script. The values 95, 96 and 100 are treated as other 'unplanned restarts', handled at the bottom of the shell script.
#!/usr/bin/php
<?php
// cli-script.php
// for testing of the BASH script
exit (rand(95, 100));
/* normally we would return one of
# 97 - planned pause/restart
# 98 - planned restart
# 99 - planned stop, exit.
# anything else is an unplanned restart
*/
I prefer to wait a few seconds before I restart the script, to avoid wasting CPU if the script being called instantly fails, and so would be immediately restarted.
#!/bin/bash
# runPHP-Worker.sh
# a shell script that keeps looping until an exit code is given
# if it does an exit(0), restart after a second - or if it's a declared error
# if we've restarted in a planned fashion, we don't bother with any pause
# and for one particular code, we can exit the script entirely.
# The numbers 97, 98, 99 must match what is returned from the PHP script
nice php -q -f ./cli-script.php -- "$@"
ERR=$?
## Possibilities
# 97 - planned pause/restart
# 98 - planned restart
# 99 - planned stop, exit.
# 0 - unplanned restart (as returned by "exit;")
# - Anything else is also unplanned paused/restart
if [ $ERR -eq 97 ]
then
# a planned pause, then restart
echo "97: PLANNED_PAUSE - wait 1";
sleep 1;
exec $0 "$@";
fi
if [ $ERR -eq 98 ]
then
# a planned restart - instantly
echo "98: PLANNED_RESTART, no pause";
exec $0 "$@";
fi
if [ $ERR -eq 99 ]
then
# planned complete exit
echo "99: PLANNED_SHUTDOWN";
exit 0;
fi
# unplanned exit, pause, and then restart
echo "unplanned restart: err:" $ERR;
echo "sleeping for 1 sec"
sleep 1
exec $0 "$@"
If you don't want to do different things for each value, it really just comes down to
#!/bin/bash
php -q -f ./cli-script.php -- "$@"
exec $0 "$@";
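The restart policy in the longer shell script boils down to a small decision table. Here is a Python sketch of it, with hypothetical names (`next_action`, `supervise`) and the worker passed in as a callable so the loop can be tried without a real PHP process.

```python
import time


def next_action(exit_code):
    """Map a worker exit code to (action, pause in seconds)."""
    if exit_code == 99:
        return ("stop", 0)       # planned stop
    if exit_code == 98:
        return ("restart", 0)    # planned restart, no pause
    return ("restart", 1)        # 97 (planned pause) or any unplanned exit


def supervise(run_once, sleep=time.sleep):
    """Run the worker repeatedly until it asks to stop; return the run count."""
    runs = 0
    while True:
        exit_code = run_once()
        runs += 1
        action, pause = next_action(exit_code)
        if action == "stop":
            return runs
        if pause:
            sleep(pause)         # avoid a tight loop if the worker fails at once
```

In real use `run_once` would launch the PHP script and return its exit status, for example with `subprocess.run([...]).returncode`. The loop replaces the `exec $0` re-execution used by the shell version.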
