Bash script lingers after exiting (issues with named pipe I/O) - bash

Summary
I have worked out a solution to the issue of this question.
Basically, the callee (wallpaper) was not itself exiting because it was waiting on another process to finish.
Over the course of 52 days, this problematic side effect had snowballed until 10,000+ lingering processes were consuming 10+ gigabytes of RAM, almost crashing my system.
The offending process turned out to be a call to printf from a function called log that I had sent into the background and forgotten about, because it was writing to a pipe and hanging.
As it turns out, a process writing to a named pipe will block until another process comes along and reads from it.
This, in turn, changed the requirements of the question from "I need a way to stop these processes from building up" to "I need a better way of getting around FIFO I/O than throwing it to the background".
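(The blocking behavior is easy to reproduce in isolation; /tmp/demo.fifo below is just an example path:)
mkfifo /tmp/demo.fifo
printf 'hello\n' > /tmp/demo.fifo &   # blocks inside open() until a reader appears
cat /tmp/demo.fifo                    # attaching a reader prints "hello" and lets the writer exit
rm /tmp/demo.fifo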
Note that while the question has been solved, I'm more than happy to accept an answer that goes into detail on the technical level. For example, the unsolved mystery of why the caller script's (wallpaper-run's) process was being duplicated as well, even though it was only called once, or how to read a pipe's state information properly, rather than relying on open's failure when called with O_NONBLOCK.
The original question follows.
The Question
I have two bash scripts meant to run in a loop. The first, wallpaper-run, runs in an infinite loop and calls the second, wallpaper.
They are part of my "desktop", which is a bunch of hacked together shell scripts augmenting the dwm window manager.
wallpaper-run:
log "starting wallpaper runner"
while true; do
log "..."
$scr/wallpaper
sleep 900 # 15 minutes
done &
wallpaper:
log "changing wallpaper"
# several utility functions ...
if [[ $1 ]]; then
parse_arg $1
else
load_random
fi
Some notes:
log is an exported function from init, which, as its name suggests, logs a message.
init calls wallpaper-run (among other things) in its foreground (hence the while loop being in the background)
$scr is also defined by init; it is the directory where so-called "init-scripts" go
parse_arg and load_random are local to wallpaper
in particular, images are loaded into the background via the program feh
The manner in which wallpaper-run is loaded is as such: $mod/wallpaper-run
init is called directly by startx, and starts dwm before it runs wallpaper-run (and the other "modules")
Now on to the problem, which is that for some reason, both wallpaper-run and wallpaper "linger" in memory. That is to say that after each iteration of the loop, two new instances of wallpaper and wallpaper-run are created, while the "old" ones don't get cleaned up and get stuck in sleep status. It's like a memory leak, but with lingering processes instead of bad memory management.
I found out about this "process leak" after having my system up for 52 days when everything broke ( something like bash: cannot fork: resource temporarily unavailable spammed the terminal whenever I tried to run a command ) because the system ran out of memory. I had to kill over 10,000 instances of wallpaper/run to bring my system back to working order.
I have absolutely no idea why this is the case. I see no reason for these scripts to linger in memory because a script exiting should mean that its process gets cleaned up.
Why are they lingering and eating up resources?
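(For anyone hitting something similar: a quick way to check whether copies are piling up, assuming procps-style pgrep and ps are available, is along these lines:)
pgrep -fc wallpaper                               # count processes whose command line mentions "wallpaper"
ps -eo pid,etime,stat,args | grep '[w]allpaper'   # list them with age and state to spot the stuck ones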
Update 1
With some help from the comments (much thanks to I'L'I), I've traced the problem to the function log, which makes background calls to printf (though why I chose to do that, I don't recall). Here is the function as it appears in init:
log(){
local pipe=$pipe_front
if ! [[ -p $pipe ]]; then
mkfifo $pipe
fi
printf ... >> $initlog
printf ... > $pipe &
printf ... &
[[ $2 == "-g" ]] && notify-send "[DWM Init] $1"
sleep 0.001
}
As you can see, the function is very poorly written. I hacked it together to make it work, not to make it robust.
The second and third printf are sent to the background. I don't recall why I did this, but it's presumably because the first printf must have been making log hang.
The printf lines have been abridged to "...", because they are fairly complex and not relevant to the issue at hand (and also because I have better things to do with 40 minutes of my time than fight with Android's garbage text input interface). In particular, things like the current time, the name of the calling process, and the passed message are printed, depending on which printf we're talking about. The first has the most detail because it's saved to a file where immediate context is lost, while the notify-send line has the least detail because it's going to be displayed on the desktop.
The whole pipe debacle is for interfacing directly with init via a rudimentary shell that I wrote for it.
The third printf is intentional; it prints to the tty that I log into at the beginning of a session. This is so that if init suddenly crashes on me, I can see a log of what went wrong, or at least of what was happening just before the crash.
I'm including this in the question because this is the root cause of the "leak". If I can fix this function, the issue will be resolved.
The function needs to log the messages to their respective sources and halt until each call to printf finishes, but it also must finish within a timely manner; hanging for an indefinite period of time and/or failing to log the messages is unacceptable behavior.
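For what it's worth, one way to satisfy that without backgrounding anything is to bound only the FIFO write, for example with GNU coreutils' timeout (assuming it is installed). This is just a sketch of the idea, with $msg standing in for the formatted message; the updates below end up taking a different route:
# inside log: give the FIFO write a short deadline instead of sending it to the background
timeout 0.1 bash -c 'printf "%s\n" "$2" > "$1"' _ "$pipe" "$msg" ||
printf 'log: no reader on %s, message dropped\n' "$pipe" >> "$initlog"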
Update 2
After isolating the log function (see update 1) into a test script and setting up a mock environment, I've boiled it down to printf.
The printf call which is redirected into a pipe,
printf "..." > $pipe
hangs if nothing is listening to it, because it's waiting for a second process to pick up the read end of the pipe and consume the data. This is probably why I had initially forced them into the background, so that a process could, at some point, read the data from the pipe while, in the immediate case, the system could move on and do other things.
The call to sleep, then, was a not-well-thought-out hack to work around data race problems resulting from one reader trying to read from multiple writers simultaneously. The theory was that if each writer had to wait for 0.001 seconds (despite the fact that the printf in the background has nothing to do with the sleep following it), somehow, that would make the data appear in order and fix the bug. Of course, looking back, that really does nothing useful.
The end result is several background processes hanging on to the pipe, waiting for something to read from it.
The answer to "Prevent hanging of "echo STRING > fifo" when nothing..." presents the same "solution" that caused the bug that spawned this question. Obviously incorrect. However, an interesting comment by user R.. mentioned something about fifos containing state which includes information such as what processes are reading the pipe.
Storing state? You mean the absence/presence of a reader? That's part of the state of the fifo; any attempt to store it outside would be bogus and would be subject to race conditions.
Obtaining this information and refusing to write if there is no reader is the key to solving this.
However, no matter what I search for on Google, I can't seem to find anything about reading the state of a pipe, even in C. I am perfectly willing to use C if need be, but a bash solution (or an existing core util) would be preferred.
So now the question becomes: how in the heck do I read the state information of a FIFO, particularly the process(es) who has (have) the pipe open for reading and/or writing?
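(A partial shell-level answer, assuming lsof is installed: it can list the processes that currently hold a FIFO open, and whether each has it open for reading or writing. Though, as the quoted comment says, any check made separately from the write itself is racy, which is exactly what the O_NONBLOCK approach that follows avoids.)
lsof /path/to/pipe      # one line per process holding the FIFO open; the FD column ends in r, w or u
lsof -t /path/to/pipe   # PIDs only, convenient in a script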

https://stackoverflow.com/a/20694422
The above linked answer shows a C program attempting to open a file with O_NONBLOCK. So I tried writing a program whose job is to return 0 (success) if open returns a valid file descriptor, and 1 (fail) if open returns -1.
#include <fcntl.h>
#include <unistd.h>
int
main(int argc, char **argv)
{
int fd = open(argv[1], O_WRONLY | O_NONBLOCK);
if(fd == -1)
return 1;
close(fd);
return 0;
}
I didn't bother checking if argv[1] is null or if open failed because the file doesn't exist because I only plan to utilize this program from a shell script where it is guaranteed to be given the correct arguments.
That said, the program does its job:
$ gcc pipe-open.c
$ ./a.out ./pipe && echo "pipe has a reader" || echo "pipe has no reader"
$ ./a.out ./pipe && echo "pipe has a reader" || echo "pipe has no reader"
Assuming the existence of pipe and that between the first and second invocations, another process opens the pipe (cat pipe), the output looks like this:
pipe has no reader
pipe has a reader
The program also behaves correctly if the pipe already has another writer (i.e. it will still fail, because there is no reader).
The only problem is that after the program closes the file, the reader sees end-of-file and closes its end of the pipe as well. And removing the call to close won't do any good, because all open file descriptors are automatically closed when main returns (control passes to exit, which walks the list of open file descriptors and closes them one by one). Not good!
This means that the only window to actually write to the pipe is before it is closed, i.e. from within the C program itself.
#include <fcntl.h>
#include <unistd.h>
int
write_to_pipe(int fd)
{
char buf[1024];
ssize_t nread;
int nsuccess = 0;
while((nread = read(0, buf, 1024)) > 0 && ++nsuccess)
write(fd, buf, nread);
close(fd);
return nsuccess > 0 ? 0 : 2;
}
int
main(int argc, char **argv)
{
int fd = open(argv[1], O_WRONLY | O_NONBLOCK);
if(fd == -1)
return 1;
return write_to_pipe(fd);
}
Invocation:
$ echo hello world | ./a.out pipe
$ ret=$?
$ if [[ $ret == 1 ]]; then echo no reader
> elif [[ $ret == 2 ]]; then echo an error occurred trying to write to the pipe
> else echo success
> fi
Output with same conditions as before (1st call has no reader; 2nd call does):
no reader
success
Additionally, the text "hello world" can be seen in the terminal reading the pipe.
And finally, the problem is solved. I have a program which acts as a middle man between a writer and a pipe, which exits immediately with a failure code if no reader is attached to the pipe at the time of invocation, or if there is, attempts to write to the pipe and communicates failure if nothing is written.
That last part is new. I thought it might be useful in the future to know if nothing got written.
I'll probably add more error detection in the future, but since log checks for the existence of the pipe before trying to write to it, this is fine for now.
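For completeness, here is roughly how log can now hand the FIFO write to the helper. The binary name pipe-open and the error message are illustrative, not part of the original script:
log(){
local pipe=$pipe_front
[[ -p $pipe ]] || mkfifo "$pipe"
printf ... >> $initlog                    # file log, unchanged
if ! printf ... | pipe-open "$pipe"; then
printf 'log: could not deliver message to %s (no reader or write error)\n' "$pipe" >> "$initlog"
fi
printf ...                                # tty log, unchanged
[[ $2 == "-g" ]] && notify-send "[DWM Init] $1"
}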

The issue is that you are starting the wallpaper process without checking if the previous run finished or not. So, in 52 days, potentially 4 * 24 * 52 = ~5000 instances could be running (not sure how you found 10000, though)! Is it possible to use flock to make sure there is only one instance of wallpaper running at a time?
See this post: Quick-and-dirty way to ensure only one instance of a shell script is running at a time
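A minimal sketch of that idea, assuming util-linux flock is available (the lock file path is arbitrary):
# wallpaper-run, with each run guarded by a lock so overlapping instances are skipped
while true; do
log "..."
flock -n /tmp/wallpaper.lock "$scr/wallpaper" || log "previous wallpaper still running, skipping this cycle"
sleep 900 # 15 minutes
done &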

Related

Why does bash "forget" about my background processes?

I have this code:
#!/bin/bash
pids=()
for i in $(seq 1 999); do
sleep 1 &
pids+=( "$!" )
done
for pid in "${pids[@]}"; do
wait "$pid"
done
I expect the following behavior:
spin through the first loop
wait about a second on the first pid
spin through the second loop
Instead, I get this error:
./foo.sh: line 8: wait: pid 24752 is not a child of this shell
(repeated 171 times with different pids)
If I run the script with shorter loop (50 instead of 999), then I get no errors.
What's going on?
Edit: I am using GNU bash 4.4.23 on Windows.
POSIX says:
The implementation need not retain more than the {CHILD_MAX} most recent entries in its list of known process IDs in the current shell execution environment.
{CHILD_MAX} here refers to the maximum number of simultaneous processes allowed per user. You can get the value of this limit using the getconf utility:
$ getconf CHILD_MAX
13195
Bash stores the statuses of at most twice that many exited background processes in a circular buffer, and says not a child of this shell when you call wait on the PID of an old one that has been overwritten. You can see how it's implemented in jobs.c in the bash source.
The way you might reasonably expect this to work, as it would if you wrote a similar program in most other languages, is:
sleep is executed in the background via a fork+exec.
At some point, sleep exits leaving behind a zombie.
That zombie remains in place, holding its PID, until its parent calls wait to retrieve its exit code.
However, shells such as bash actually do this a little differently. They proactively reap their zombie children and store their exit codes in memory so that they can deallocate the system resources those processes were using. Then when you wait the shell just hands you whatever value is stored in memory, but the zombie could be long gone by then.
Now, because all of these exit statuses are being stored in memory, there is a practical limit to how many background processes can exit without you calling wait before you've filled up all the memory you have available for this in the shell. I expect that you're hitting this limit somewhere in the several hundreds of processes in your environment, while other users manage to make it into the several thousands in theirs. Regardless, the outcome is the same - eventually there's nowhere to store information about your children and so that information is lost.
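(If the individual exit statuses aren't actually needed, the script in the question can sidestep the stored-status limit entirely; a minimal sketch:)
#!/bin/bash
for i in $(seq 1 999); do
sleep 1 &
done
wait   # with no arguments, wait blocks until every child has exited; no per-PID bookkeeping needed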
I can reproduce this on Arch Linux with docker run -ti --rm bash:5.0.18 bash -c 'pids=; for ((i=1;i<550;++i)); do true & pids+=" $!"; done; wait $pids' and any earlier version. I can't reproduce it with bash:5.1.0.
What's going on?
It looks like a bug in your version of Bash. There were a couple of improvements to jobs.c and wait.def in Bash 5.1, and "Make sure SIGCHLD is blocked in all cases where waitchld() is not called from a signal handler" is mentioned in the changelog. From the look of it, it seems to be an issue with handling a SIGCHLD signal while already handling another SIGCHLD signal.

Perl: Child subprocesses are not being killed when child is being killed

This is being done on Windows.
I am getting the error: The process cannot access the file because it is being used by another process. It seems that even after the child exits (exit 0) and the parent waits for the child to complete (waitpid($lkpid, 0)), the child's subprocesses are not being killed. Hence, when the next iteration (test case) runs, it finds the process already running and gives the error message above.
Code Snippet ($bashexe and $bePath are defined):
my $MSROO = "/home/abc";
if (my $fpid = fork()) {
for (my $i=1; $i<=1200; $i++) {
sleep 1;
if (-e "$MSROO/logs/Complete") {
last;
}
}
}
elsif (defined ($fpid)) {
&runAndMonitor (\@ForRun, "$MSROO/logs/Test.log"); ### @ForRun has the list of test cases
system("touch $MSROO/logs/Complete");
exit 0;
}
sub runAndMonitor {
my @ForRunPerProduct = @{$_[0]};
my $logFile = $_[1];
foreach my $TestVar (@ForRunPerProduct) {
my $TestVarDirName = $TestVar;
$TestVarDirName = dirname ($TestVarDirName);
my $lkpid;
my $filehandle;
if ( !($pid = open( $filehandle, "-|" , " $bashexe -c \" echo abc \; perl.exe reg_script.pl $TestVarDirName -t wint\" >> $logFile "))) {
die( "Failed to start process: $!" );
}
else {
print "$pid is pid of shell running: $TestVar\n"; ### Issue (error message above) is coming here after piped open is launched for a new test
my $taskInfo=`tasklist | grep "$pid"`;
chomp ($taskInfo);
print "$taskInfo is taskInfo\n";
}
if ($lkpid = fork()) {
sleep 1;
chomp ($lkpid);
LabelToCheck:
my $pidExistingOrNotInParent = kill 0, $pid;
if ($pidExistingOrNotInParent) {
sleep 10;
goto LabelToCheck;
}
}
elsif (defined ($lkpid)) {
sleep 12;
my $pidExistingOrNot = kill 0, $pid;
if ($pidExistingOrNot){
print "$pid still exists\n";
my $taskInfoVar1 =`tasklist | grep "$pid"`;
chomp ($taskInfoVar1);
my $killPID = kill 15, $pid;
print "$killPID is the value of PID\n"; ### Here, I am getting output 1 (value of $killPID). Also, I tried with signal 9, and seeing same behavior
my $taskInfoVar2 =`tasklist | grep "$pid"`;
sleep 10;
exit 0;
}
}
system("TASKKILL /F /T /PID $lkpid") if ($lkpid); ### Here, child pid is not being killed . Saying "ERROR: The process "-1472" not found"
sleep 2;
print "$lkpid is lkpid\n"; ## Here, though I am getting message "-1472 is lkpid"
#waitpid($lkpid, 0);
}
return;
}
Why is it that even after "exit 0 in child" and then "waitpid in parent", the child's subprocesses are not being killed? What can be done to fully clean up the child process and its subprocesses?
The exit doesn't touch child processes; it's not meant to. It just exits the process. In order to shut down its child processes as well you'd need to signal them.†
However, since this is Windows, where fork is merely emulated, here is what perlfork says:
Behavior of other Perl features in forked pseudo-processes
...
kill() "kill('KILL', ...)" can be used to terminate a pseudo-process by passing it the ID returned by fork(). The outcome of kill on a
pseudo-process is unpredictable and it should not be used except under dire circumstances, because the operating system may not
guarantee integrity of the process resources when a running thread is terminated
...
exit() exit() always exits just the executing pseudo-process, after automatically wait()-ing for any outstanding child pseudo-processes. Note
that this means that the process as a whole will not exit unless all running pseudo-processes have exited. See below for some
limitations with open filehandles.
So don't do kill, while exit behaves nearly opposite to what you need.
But the Windows command TASKKILL can terminate a process and its tree
system("TASKKILL /F /T /PID $pid");
This should terminate a process with $pid and its children processes. (The command can use a process's name instead, TASKKILL /F /T /IM $name, but using names on a busy modern system, with a lot going on, can be tricky.) See taskkill on MS docs.
A more reliable way about this, altogether, is probably to use dedicated modules for Windows process management.
A few other comments
I also notice that you use pipe-open, while perlfork says for that
Forking pipe open() not yet implemented
The open(FOO, "|-") and open(BAR, "-|") constructs are not yet implemented.
So I am confused: does that pipe-open work in your code? But perlfork continues with
This limitation can be easily worked around in new code by creating a pipe explicitly. The following example shows how to write to a forked child: [full code follows]
That C-style loop, for (my $i=1; $i<=1200; $i++), is better written as
for my $i (1..1200) { ... }
(or foreach, which is a synonym). A C-style loop is very rarely needed in Perl.
† A kill with a negative signal (name or number) OR process-id generally terminates the whole tree under the signaled process. This is on Linux.
So one way would be to signal that child from its parent when ready, instead of exit-ing from it. (Then the child would have to signal the parent in some way when it's ready.)
Or, the child can send a negative terminate signal to all its direct child processes, then exit.
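(On Linux, from a shell, that negative-PID form looks roughly like the sketch below, assuming job control is enabled so the background job gets its own process group; some_server is a stand-in name.)
set -m                     # job control: each background job becomes its own process group
some_server &              # stand-in for a command that spawns further children
pgid=$!                    # for a simple command, the job leader's PID is also the job's PGID
kill -TERM -- -"$pgid"     # negative target: signal the leader and everything still in its group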
You didn't say which perl you are using. On Windows with Strawberry Perl (and presumably ActiveState), fork() emulation is ... very problematic (maybe just "broken"), as @zdim mentioned. If you want a longer explanation, see Proc::Background::Win32 - Perl Fork Limitations
Meanwhile, if you use Cygwin's Perl, fork works perfectly. This is because Cygwin does a full emulation of Unix fork() semantics, so anything built against cygwin works just like it does on Unix. The downside is that file paths show up weird, like /cygdrive/c/Program Files. This may or may not trip up code you've already written.
But, you might also have confusion about process trees. Even on Unix, killing a parent process does not kill the child processes. In practice the children often do die along with the parent, for various reasons, but nothing enforces it. For example, most child processes have a pipe open to the parent, and when the parent exits that pipe closes; reading or writing the pipe then raises SIGPIPE, which kills the child. In other cases, the parent catches SIGTERM and re-broadcasts it to its children before exiting gracefully. In other cases, monitors like systemd or Docker create a container inherited by all children of the main process, and when the main process exits the monitor kills off everything else in the container.
Since it looks like you're writing your own task monitor, I'll give some advice from one that I wrote for Windows (and is running along happily years later). I ended up with a design using Proc::Background where the parent starts a task that writes to a file as STDOUT/STDERR. Then it opens that same log file and wakes up every few seconds to try reading more of the log file to see what the task is doing, and check with the Proc::Background object to see if the task exited. When the task exits, it appends the exit code and timestamp to the log file. The monitor has a timeout setting that if the child exceeds, it just un-gracefully runs TerminateProcess. (you could improve on that by leaving STDIN open as a pipe between monitor and worker, and then have the worker check STDIN every now and then, but on Windows that will block, so you have to use PeekNamedPipe, which gets messy)
Meanwhile, the monitor parses any new lines of the log file to read status information and send updates to the database. The other parts of the system can watch the database to see the status of background tasks, including a web admin interface that can also open and read the log file. If the monitor sees that a child has run for too long, it can use TerminateProcess to stop it. Missing from this design is any way for the monitor to know when it's being asked to exit, and clean up, which is a notable deficiency, and one you're probably looking for. However, there actually isn't any way to intercept a TerminateProcess aimed at the parent! Windows does have some Message Queue API stuff where you can set up to receive notifications about termination, but I never chased down the full details there. If you do, please come back and drop a comment for me :-)

How to create an anonymous pipe between 2 child processes and know their pids (while not using files/named pipes)?

Please note that this question was edited after a couple of comments I received. Initially I wanted to split my goal into smaller pieces to make it simpler (and perhaps expand my knowledge on various fronts), but it seems I went too far with the simplicity :). So, here I am asking the big question.
Using bash, is there a way one can actually create an anonymous pipe between two child processes and know their pids?
The reason I'm asking is when you use the classic pipeline, e.g.
cmd1 | cmd2 &
you lose the ability to send signals to cmd1. In my case the actual commands I am running are these
./my_web_server | ./my_log_parser &
my_web_server is a basic web server that dumps a lot of logging information to its stdout
my_log_parser is a log parser that I wrote that reads through all the logging information it receives from my_web_server and basically selects only certain values from the log (in reality it stores the whole log as received, but additionally creates an extra CSV file with the values it finds).
The issue I am having is that my_web_server actually never stops by itself (it is a web server, you don't want that from a web server :)). So after I am done, I need to stop it myself. I would like for the bash script to do this when I stop it (the bash script), either via SIGINT or SIGTERM.
For something like this, traps are the way to go. In essence I would create a trap for INT and TERM and the function it would call would kill my_web_server, but... I don't have the pid and even though I know I could look for it via ps, I am looking for a pretty solution :).
Some of you might say: "Well, why don't you just kill my_log_parser and let my_web_server die on its own with SIGPIPE?". The reason why I don't want to kill it is when you kill a process that's at the end of the pipeline, the output buffer of the process before it, is not flushed. Ergo, you lose stuff.
I've seen several solutions here and in other places that suggest storing the pid of my_web_server in a file. This is a solution that works. It is possible to write the pipeline by fiddling with the file descriptors a bit. I, however, don't like this solution, because I have to generate files. I don't like the idea of creating arbitrary files just to store a 5-character PID :).
What I ended up doing for now is this:
#!/bin/bash
trap " " HUP
fifo="$( mktemp -u "$( basename "${0}" ).XXXXXX" )"
mkfifo "${fifo}"
<"${fifo}" ./my_log_parser &
parser_pid="$!"
>"${fifo}" ./my_web_server &
server_pid="$!"
rm "${fifo}"
trap '2>/dev/null kill -TERM '"${server_pid}"'' INT TERM
while true; do
wait "${parser_pid}" && break
done
This solves the issue with me not being able to terminate my_web_server when the script receives SIGINT or SIGTERM. It seems more readable than any hackery fiddling with file descriptors in order to eventually use a file to store my_web_server's pid, which I think is good, because it improves the readability.
But it still uses a file (named pipe). Even though I know it uses the file (named pipe) for my_web_server and my_log_parser to talk (which is a pretty good reason) and the file gets wiped from the disk very shortly after it's created, it's still a file :).
Would any of you guys know of a way to do this task without using any files (named pipes)?
From the Bash man pages:
!   Expands to the process ID of the most recently executed background (asynchronous) command.
You are not running a background command, you are running process substitution to read to file descriptor 3.
The following works, but I'm not sure if it is what you are trying to achieve:
sleep 120 &
child_pid="$!"
wait "${child_pid}"
sleep 120
Edit:
Comment was: I know I can pretty much do this the silly 'while read i; do blah blah; done < <( ./my_proxy_server )'-way, but I don't particularly like the fact that when a script using this approach receives INT or TERM, it simply dies without telling ./my_proxy_server to bugger off too :)
So, it seems like your problem stems from the fact that it is not so easy to get the PID of the proxy server. So, how about using your own named pipe, with the trap command:
pipe='/tmp/mypipe'
mkfifo "$pipe"
./my_proxy_server > "$pipe" &
child_pid="$!"
echo "child pid is $child_pid"
# Tell the proxy server to bugger-off
trap 'kill $child_pid' INT TERM
while read
do
echo $REPLY
# blah blah blah
done < "$pipe"
rm "$pipe"
You could probably also use kill %1 instead of using $child_pid.
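Another option, if bash 4's coproc builtin is available, avoids the named pipe entirely: coproc creates an anonymous pipe and exposes the coprocess's PID. A rough sketch under that assumption, reusing the names from the question (note the bash caveat that the coproc's file descriptors are not usable inside subshells, hence the exec duplication):
coproc PARSER { ./my_log_parser; }    # anonymous pipe into the parser; its PID is in $PARSER_PID
exec 4>&"${PARSER[1]}"                # duplicate the write end onto an ordinary fd
./my_web_server >&4 &                 # the server's stdout feeds the parser's stdin
server_pid=$!
trap 'kill -TERM "$server_pid" 2>/dev/null' INT TERM
wait "$server_pid"
exec 4>&-                             # drop our copy; the parser drains and sees EOF once this shell exits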
YAE (Yet Another Edit):
You ask how to get the PIDS from:
./my_web_server | ./my_log_parser &
Simples, sort of. To test I used sleep, just like your original.
sleep 400 | sleep 500 &
jobs -l
Gives:
[1]+ 8419 Running sleep 400
8420 Running | sleep 500 &
So it's just a question of extracting those PIDs:
pid1=$(jobs -l|awk 'NR==1{print $2}')
pid2=$(jobs -l|awk 'NR==2{print $1}')
I hate calling awk twice here, but anything else is just jumping through hoops.
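(For what it's worth, a single awk pass can pull both at once, assuming the two-line jobs -l output shown above:)
read -r pid1 pid2 < <(jobs -l | awk 'NR==1{a=$2} NR==2{b=$1} END{print a, b}')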

Introduce timeout in a bash for-loop

I have a task that fits very well inside a bash for loop. The trouble is that a few of the iterations seem to never terminate. What I'm looking for is a way to introduce a timeout, so that if an iteration of the command hasn't terminated after e.g. two hours it is killed and the loop moves on to the next iteration.
Rough outline:
for somecondition; do
while time-run(command) < 2h do
continue command
done
done
One (tedious) way is to start the process in the background, then start another background process that attempts to kill the first one after a fixed timeout.
timeout=7200 # two hours, in seconds
for somecondition; do
command & command_pid=$!
( sleep $timeout & wait; kill $command_pid 2>/dev/null) & sleep_pid=$!
wait $command_pid
kill $sleep_pid 2>/dev/null # If command completes prior to the timeout
done
The wait command blocks until the original command completes, whether naturally or because it was killed after the sleep completes. The wait immediately after sleep is there in case someone tries to interrupt the process: a shell waiting on a foreground sleep doesn't react to the signal until the sleep finishes, whereas the wait builtin returns as soon as the signal arrives.
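(If GNU coreutils' timeout utility is available, the same effect needs far less plumbing; a sketch using the placeholder names from the question, where timeout exits with status 124 when it had to kill the command:)
for somecondition; do
timeout 2h command
[[ $? -eq 124 ]] && echo "command timed out, moving on" >&2
done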
If I'm understanding your requirement properly, you have a process that needs to run, but you want to make sure that if it gets stuck it moves on, right? I don't know if this will fully help you out, but here is something I wrote a while back to do something similar (I've since improved this a bit, but I only have access to a gist at present, I'll update with the better version later).
#!/bin/bash
######################################################
# Program: logGen.sh
# Date Created: 22 Aug 2012
# Description: parses logs in real time into daily error files
# Date Updated: N/A
# Developer: #DarrellFX
######################################################
#Prefix for pid file
pidPrefix="logGen"
#output direcory
outDir="/opt/Redacted/logs/allerrors"
#Simple function to see if running on primary
checkPrime ()
{
if /sbin/ifconfig eth0:0|/bin/grep -wq inet;then isPrime=1;else isPrime=0;fi
}
#function to kill previous instances of this script
killScript ()
{
/usr/bin/find /var/run -name "${pidPrefix}.*.pid" |while read pidFile;do
if [[ "${pidFile}" != "/var/run/${pidPrefix}.${$}.pid" ]];then
/bin/kill -- -$(/bin/cat ${pidFile})
/bin/rm ${pidFile}
fi
done
}
#Check to see if primary
#If so, kill any previous instance and start log parsing
#If not, just kill leftover running processes
checkPrime
if [[ "${isPrime}" -eq 1 ]];then
echo "$$" > /var/run/${pidPrefix}.$$.pid
killScript
commands && commands && commands #Where the actual command to run goes.
else
killScript
exit 0
fi
I then set this script to run on cron every hour. Every time the script is run, it
creates a lock file named after a variable that describes the script that contains the pid of that instance of the script
calls the function killScript which:
uses the find command to find all lock files for that version of the script (this lets more than one of these scripts be set to run in cron at once, for different tasks). For each file it finds, it kills the processes of that lock file and removes the lock file (it automatically checks that it's not killing itself)
Starts doing whatever it is I need to run and not get stuck (I've omitted that as it's hideous bash string manipulation that I've since redone in python).
If this doesn't get you squared let me know.
A few notes:
the checkPrime function is poorly done, and should either return a status, or just exit the script itself
there are better ways to create lock files and be safe about it, but this has worked for me thus far (famous last words)

Running multiple binaries that do not terminate by themselves in a shell script

I have several binaries in the same folder that I want to run in a sequence.
Each binary does not terminate by itself; it waits for data on a socket interface. Also, I need to decide whether to run the next binary based on the output of the previous one. I am thinking of running them in the background, redirecting the output of each binary to a file, and grepping for a keyword. However, unless I use wait, I can't capture all the output I want from the previous binary. But if I use wait, I can't get control back, because the binary is listening on the socket and won't return.
What can I do here?
a sample code here:
/home/test_1 > test_1_log &
test_1_id=$!
wait
===> I also want to grep "Success" in test_1_log here.
===> can't get here because of wait.
/home/test_2 > test_2_log &
test_2_id=$!
wait
Thanks
Can you use sleep instead of wait?
The problem is that you can't wait for it to return, because it won't. At the same time, you have to wait for some output. If you know that "Success" or something will be output, then you can loop until that line appears with a sleep.
RC=1
while [ $RC != 0 ]
do
sleep 1
grep -q 'Success' test_1_log
RC=$?
done
That also allows you to stop waiting after, say, 10 iterations, making sure your script eventually exits.
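For example, a bounded variant of that loop might look like this (a sketch: ten one-second attempts, then give up):
tries=0
until grep -q 'Success' test_1_log; do
(( ++tries >= 10 )) && { echo "test_1 never reported Success" >&2; break; }
sleep 1
done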
