Multiprocess Queue in Bash

What's a good implementation of a multiprocess queue in Bash?
I was considering a FIFO, with each line representing an element in the queue:
mkfifo fifo
ls > fifo
In a different process:
read element < fifo
The expected result is that the reader process reads one line (i.e. one element) and stores it in the variable $element, leaving the rest of the queue untouched so that other reader processes can get elements (lines) as well.
Unfortunately this does not work. The read statement opens the FIFO, which lets the writer (ls) complete at once; closing the FIFO then seems to cause the remaining data to be dropped, and the other elements cannot be read by another process (in fact, the next read < fifo hangs until another writer appears and writes into the FIFO).
I also considered touching files in a special directory (as a writer) and moving the files away (as a reader), but this seems tedious and obviously is not feasible for millions of queue entries.
Can I get the FIFO variant to work somehow?
Is there a different way of implementing a shell queue, having several writers and several readers, all working on the same queue?

You just need to keep the PIPE open.
$ mkfifo PIPE
$ cat > PIPE &
The pipe is now open indefinitely until you kill the cat.
$ ls > PIPE &
$ read Line < PIPE
$ echo $Line
file1
You can now write and read to your heart's content.
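For reference, here is a script-friendly sketch of the same idea (the queue name is illustrative, not from the answer above). It relies on the fact that, on Linux, opening a FIFO read-write does not block, so a single exec can play the role of the long-lived cat:
mkfifo queue
exec 3<> queue           # hold the FIFO open so readers never see EOF
ls > queue &             # a writer enqueues elements, one per line
read -r element < queue  # a reader pops exactly one element
echo "got: $element"
exec 3>&-                # release the holder once the queue is retired
rm queue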

I may have found an answer myself. I'm not using FIFOs but a minimalistic TCP server accepting input from one port and writing output line by line to another.
To set up the TCP server, I use this script:
nc -k -l 4444 | while read a      # -k: keep accepting new writer connections
do
    echo "$a" | nc -l 4445        # hand exactly one element to the next reader connection
done
(Append & to run this in the background, of course.)
Then the writers can do something like this:
for ((i=0; i<10000; i++))
do
    printf "x%02d\n" "$i"
done >/dev/tcp/127.0.0.1/4444
and the readers can do something like this:
while ! { read a < /dev/tcp/localhost/4445; } 2>/dev/null
do
    sleep 2 # we poll; if there is nothing, we sleep between polls
done
echo "$a"
This script fetches one element (line) and processes it (echo "$a"). Do this in a loop if you want to drain the queue.
I'm not entirely happy with the polling solution, but tests show that it works reliably with two writers and two readers (and I don't see why more readers and writers should pose a problem).
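For completeness, a sketch of such a drain loop, built from the one-element fetch above (stopping after a few consecutive empty polls is an arbitrary choice of mine, not part of the original setup):
empty=0
while (( empty < 5 )); do
    if { read a < /dev/tcp/localhost/4445; } 2>/dev/null; then
        echo "$a"       # process one element
        empty=0
    else
        (( empty++ ))
        sleep 2         # nothing queued right now; poll again
    fi
done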

Related

How do I stop the cat command when it's reading from a named pipe?

I have a named pipe that gets data very slowly but endlessly, and I want to copy the contents of the named pipe to date-formatted files as the data arrives.
I have something like this
while true
do
    cat /tmp/big_file > `printf '%(%Y/%m/%d)T' -1`.output &
    sleep 3590
    kill $!
    sleep 10
done
Is it safe to just kill cat? Or could I lose some data in its buffer? How do I tell cat it's time to stop?
If you do not want to use while read, you can also send signals with kill, as in https://unix.stackexchange.com/questions/2107/how-to-suspend-and-resume-processes. By default kill sends SIGTERM. You can also send signals from C or shell code, as in https://www.thegeekstuff.com/2011/02/send-signal-to-process/. SIGTSTP suspends a process and SIGCONT resumes it; a list of all signals is in http://man7.org/linux/man-pages/man7/signal.7.html
There are some pitfalls described in https://unix.stackexchange.com/questions/149741/why-is-sigint-not-propagated-to-child-process-when-sent-to-its-parent-process
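For what it's worth, a sketch of the same rotation driven by timeout (GNU coreutils) instead of a separate sleep/kill pair; timeout delivers SIGTERM by default when the limit expires:
while true
do
    # cat writes out each chunk it has read before reading the next, so at
    # most the chunk caught between read() and write() is at risk when the
    # signal arrives.
    timeout 3590 cat /tmp/big_file > `printf '%(%Y/%m/%d)T' -1`.output
    sleep 10
done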

Bash script to read ids from file and spawn process in a controlled manner

I need to read hundreds of ids from a file and pass those to another shell script as parameters to be spawned as separate child requests. [Done]
But we cannot spawn more than 6 child requests, i.e., no more than 6 requests can be running at any given point in time.
I have gone through this site (references given below) and others and learned that you can get the PID of a spawned process using $!, but I am far from implementing it because I do not know how to store the PIDs in an array and remove them once the spawned process is complete.
Forking / Multi-Threaded Processes | Bash
How to wait in bash for several subprocesses to finish and return exit code !=0 when any subprocess ends with code !=0?
#!/bin/bash
file="/usr/share/nginx/html/cron/userids.txt" //file containing the userids that needs to be spawned
MAXCOUNT=6 //maximum number of child request that can be spawned
while IFS= read -r line
do
#submit a background job here
sh fetch.sh $line & //this is the shell script that needs to be submitted
//check if the spawned request count is less than MAXCOUNT
//If it is then wait for one or more request to finish
//If the number of child requests is less than MAXCOUNT then spawn another request
//if all the lines are read then wait till all the child process completes and then exit
done <"$file"
Please be aware that I am a newbie and do not know much about shell processes.
I will appreciate any directions and feedback.
Thanks
You can use GNU Parallel for this:
parallel -j6 -a "$file" fetch.sh
It has lots of options for handling failures, progress bars, logging etc.
You can use xargs to spawn a maximum number of processes in parallel, passing the arguments read from stdin:
xargs -n 1 -P 6 fetch.sh < "$file"
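If you would rather stay in plain bash (closer to what the question was aiming for), here is a minimal sketch using wait -n, which requires bash 4.3 or newer:
#!/bin/bash
file="/usr/share/nginx/html/cron/userids.txt"
MAXCOUNT=6

while IFS= read -r line; do
    sh fetch.sh "$line" &                    # spawn one child request
    while (( $(jobs -rp | wc -l) >= MAXCOUNT )); do
        wait -n                              # block until any one child exits
    done
done < "$file"
wait                                         # let the remaining children finish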

Bash script lingers after exiting (issues with named pipe I/O)

Summary
I have worked out a solution to the issue of this question.
Basically, the callee (wallpaper) was not itself exiting because it was waiting on another process to finish.
Over the course of 52 days, this problematic side effect had snowballed until 10,000+ lingering processes were consuming 10+ gigabytes of RAM, almost crashing my system.
The offending process turned out to be a call to printf from a function called log that I had sent into the background and forgotten about, because it was writing to a pipe and hanging.
As it turns out, a process writing to a named pipe will block until another process comes along and reads from it.
This, in turn, changed the requirements of the question from "I need a way to stop these processes from building up" to "I need a better way of getting around FIFO I/O than throwing it to the background".
Note that while the question has been solved, I'm more than happy to accept an answer that goes into detail on the technical level: for example, the unsolved mystery of why the caller script's (wallpaper-run) process was being duplicated as well, even though it was only called once, or how to read a pipe's state information properly rather than relying on open's failure when called with O_NONBLOCK.
The original question follows.
The Question
I have two bash scripts meant to run in a loop. The first, wallpaper-run, runs in an infinite loop and calls the second, wallpaper.
They are part of my "desktop", which is a bunch of hacked together shell scripts augmenting the dwm window manager.
wallpaper-run:
log "starting wallpaper runner"
while true; do
log "..."
$scr/wallpaper
sleep 900 # 15 minutes
done &
wallpaper:
log "changing wallpaper"
# several utility functions ...
if [[ $1 ]]; then
parse_arg $1
else
load_random
fi
Some notes:
log is an exported function from init, which, as its name suggests, logs a message.
init calls wallpaper-run (among other things) in its foreground (hence the while loop being in the background)
$scr is also defined by init; it is the directory where so-called "init-scripts" go
parse_arg and load_random are local to wallpaper
in particular, images are loaded into the background via the program feh
The manner in which wallpaper-run is loaded is as such: $mod/wallpaper-run
init is called directly by startx, and starts dwm before it runs wallpaper-run (and the other "modules")
Now on to the problem, which is that for some reason, both wallpaper-run and wallpaper "linger" in memory. That is to say that after each iteration of the loop, two new instances of wallpaper and wallpaper-run are created, while the "old" ones don't get cleaned up and get stuck in sleep status. It's like a memory leak, but with lingering processes instead of bad memory management.
I found out about this "process leak" after having my system up for 52 days, when everything broke (something like bash: cannot fork: resource temporarily unavailable spammed the terminal whenever I tried to run a command) because the system ran out of memory. I had to kill over 10,000 instances of wallpaper/run to bring my system back to working order.
I have absolutely no idea why this is the case. I see no reason for these scripts to linger in memory because a script exiting should mean that its process gets cleaned up.
Why are they lingering and eating up resources?
Update 1
With some help from the comments (much thanks to I'L'I), I've traced the problem to the function log, which makes background calls to printf (though why I chose to do that, I don't recall). Here is the function as it appears in init:
log(){
    local pipe=$pipe_front
    if ! [[ -p $pipe ]]; then
        mkfifo $pipe
    fi
    printf ... >> $initlog
    printf ... > $pipe &
    printf ... &
    [[ $2 == "-g" ]] && notify-send "[DWM Init] $1"
    sleep 0.001
}
As you can see, the function is very poorly written. I hacked it together to make it work, not to make it robust.
The second and third printf are sent to the background. I don't recall why I did this, but it's presumably because the first printf must have been making log hang.
The printf lines have been abridged to "...", because they are fairly complex and not relevant to the issue at hand (And also I have better things to do with 40 minutes of my time than fighting with Android's garbage text input interface). In particular, things like the current time, name of the calling process, and the passed message are printed, depending on which printf we're talking about. The first has the most detail because it's saved to a file where immediate context is lost, while the notify-send line has the least amount of detail because it's going to be displayed on the desktop.
The whole pipe debacle is for interfacing directly with init via a rudimentary shell that I wrote for it.
The third printf is intentional; it prints to the tty that I log into at the beginning of a session. This is so that if init suddenly crashes on me, I can see a log of what went wrong. Or at least what was happening before it crashed
I'm including this in the question because this is the root cause of the "leak". If I can fix this function, the issue will be resolved.
The function needs to log the messages to their respective sources and halt until each call to printf finishes, but it also must finish within a timely manner; hanging for an indefinite period of time and/or failing to log the messages is unacceptable behavior.
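For what it's worth, a minimal sketch of one way to meet that requirement without the background calls (assuming GNU coreutils' timeout; the printf bodies are placeholders for the abridged ones above). Like the eventual C helper further down, it simply drops the FIFO copy of the message when no reader is attached:
log(){
    local pipe=$pipe_front
    [[ -p $pipe ]] || mkfifo "$pipe"
    printf '%s\n' "$1" >> "$initlog"     # file log: always safe
    # FIFO log: write in the foreground, but give up after one second
    # if no reader ever turns up, instead of hanging forever.
    timeout 1 bash -c 'printf "%s\n" "$2" > "$1"' _ "$pipe" "$1" || true
    [[ $2 == "-g" ]] && notify-send "[DWM Init] $1"
    return 0
}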
Update 2
After isolating the log function (see update 1) into a test script and setting up a mock environment, I've boiled it down to printf.
The printf call which is redirected into a pipe,
printf "..." > $pipe
hangs if nothing is listening to it, because it's waiting for a second process to pick up the read end of the pipe and consume the data. This is probably why I had initially forced them into the background, so that a process could, at some point, read the data from the pipe while, in the immediate case, the system could move on and do other things.
The call to sleep, then, was a not-well-thought-out hack to work around data race problems resulting from one reader trying to read from multiple writers simultaneously. The theory was that if each writer had to wait for 0.001 seconds (despite the fact that the printf in the background has nothing to do with the sleep following it), somehow, that would make the data appear in order and fix the bug. Of course, looking back, that really does nothing useful.
The end result is several background processes hanging on to the pipe, waiting for something to read from it.
The answer to "Prevent hanging of "echo STRING > fifo" when nothing..." presents the same "solution" that caused the bug that spawned this question. Obviously incorrect. However, an interesting comment by user R.. mentioned something about fifos containing state which includes information such as what processes are reading the pipe.
Storing state? You mean the absence/presence of a reader? That's part of the state of the fifo; any attempt to store it outside would be bogus and would be subject to race conditions.
Obtaining this information and refusing to write if there is no reader is the key to solving this.
However, no matter what I search for on Google, I can't seem to find anything about reading the state of a pipe, even in C. I am perfectly willing to use C if need be, but a bash solution (or an existing core util) would be preferred.
So now the question becomes: how in the heck do I read the state information of a FIFO, particularly the process(es) who has (have) the pipe open for reading and/or writing?
https://stackoverflow.com/a/20694422
The above linked answer shows a C program attempting to open a file with O_NONBLOCK. So I tried writing a program whose job is to return 0 (success) if open returns a valid file descriptor, and 1 (fail) if open returns -1.
#include <fcntl.h>
#include <unistd.h>

/* Exit 0 if the FIFO named by argv[1] currently has a reader,
   1 otherwise: open(O_WRONLY | O_NONBLOCK) fails with ENXIO when
   no process has the FIFO open for reading. */
int
main(int argc, char **argv)
{
    int fd = open(argv[1], O_WRONLY | O_NONBLOCK);
    if (fd == -1)
        return 1;
    close(fd);
    return 0;
}
I didn't bother checking whether argv[1] is null, or whether open failed because the file doesn't exist, since I only plan to use this program from a shell script where it is guaranteed to be given the correct arguments.
That said, the program does its job:
$ gcc pipe-open.c
$ ./a.out ./pipe && echo "pipe has a reader" || echo "pipe has no reader"
$ ./a.out ./pipe && echo "pipe has a reader" || echo "pipe has no reader"
Assuming the existence of pipe and that between the first and second invocations, another process opens the pipe (cat pipe), the output looks like this:
pipe has no reader
pipe has a reader
The program also works if the pipe has a second writer (i.e. it still fails when there is no reader, even if another writer already has the pipe open).
The only problem is that after closing the file, the reader closes its end of the pipe as well. And removing the call to close won't do any good because all open file descriptors are automatically closed after main returns (control goes to exit, which walks the list of open file descriptors and closes them one by one). Not good!
This means that the only window to actually write to the pipe is before it is closed, i.e. from within the C program itself.
#include <fcntl.h>
#include <unistd.h>

/* Copy stdin to the already-opened pipe descriptor.
   Returns 0 if at least one chunk was written, 2 if nothing was read. */
int
write_to_pipe(int fd)
{
    char buf[1024];
    ssize_t nread;
    int nsuccess = 0;
    while ((nread = read(0, buf, 1024)) > 0 && ++nsuccess)
        write(fd, buf, nread);
    close(fd);
    return nsuccess > 0 ? 0 : 2;
}

int
main(int argc, char **argv)
{
    /* As before: O_NONBLOCK makes the open fail (exit 1) if the pipe has no reader. */
    int fd = open(argv[1], O_WRONLY | O_NONBLOCK);
    if (fd == -1)
        return 1;
    return write_to_pipe(fd);
}
Invocation:
$ echo hello world | ./a.out pipe
$ ret=$?
$ if [[ $ret == 1 ]]; then echo no reader
> elif [[ $ret == 2 ]]; then echo an error occurred trying to write to the pipe
> else echo success
> fi
Output with same conditions as before (1st call has no reader; 2nd call does):
no reader
success
Additionally, the text "hello world" can be seen in the terminal reading the pipe.
And finally, the problem is solved. I have a program which acts as a middle man between a writer and a pipe, which exits immediately with a failure code if no reader is attached to the pipe at the time of invocation, or if there is, attempts to write to the pipe and communicates failure if nothing is written.
That last part is new. I thought it might be useful in the future to know if nothing got written.
I'll probably add more error detection in the future, but since log checks for the existence of the pipe before trying to write to it, this is fine for now.
The issue is that you are starting the wallpaper process without checking if the previous run finished or not. So, in 52 days, potentially 4 * 24 * 52 = ~5000 instances could be running (not sure how you found 10000, though)! Is it possible to use flock to make sure there is only one instance of wallpaper running at a time?
See this post: Quick-and-dirty way to ensure only one instance of a shell script is running at a time
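A minimal sketch of the flock idea (the lock file path is illustrative), placed at the top of wallpaper so a new run bails out while the previous one is still alive:
#!/bin/bash
exec 200> /tmp/wallpaper.lock
flock -n 200 || exit 0    # previous instance still running; skip this cycle
# ... rest of wallpaper ...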

How to create an anonymous pipe between 2 child processes and know their pids (while not using files/named pipes)?

Please note that this question was edited after a couple of comments I received. Initially I wanted to split my goal into smaller pieces to make it simpler (and perhaps expand my knowledge on various fronts), but it seems I went too far with the simplicity :). So, here I am asking the big question.
Using bash, is there a way one can actually create an anonymous pipe between two child processes and know their pids?
The reason I'm asking is when you use the classic pipeline, e.g.
cmd1 | cmd2 &
you lose the ability to send signals to cmd1. In my case the actual commands I am running are these
./my_web_server | ./my_log_parser &
my_web_server is a basic web server that dumps a lot of logging information to its stdout
my_log_parser is a log parser I wrote that reads through all the logging information it receives from my_web_server and basically selects only certain values from the log (in reality it stores the whole log as it received it, but additionally creates an extra CSV file with the values it finds).
The issue I am having is that my_web_server actually never stops by itself (it is a web server, you don't want that from a web server :)). So after I am done, I need to stop it myself. I would like for the bash script to do this when I stop it (the bash script), either via SIGINT or SIGTERM.
For something like this, traps are the way to go. In essence I would create a trap for INT and TERM and the function it would call would kill my_web_server, but... I don't have the pid and even though I know I could look for it via ps, I am looking for a pretty solution :).
Some of you might say: "Well, why don't you just kill my_log_parser and let my_web_server die on its own with SIGPIPE?". The reason why I don't want to kill it is that when you kill a process that's at the end of the pipeline, the output buffer of the process before it is not flushed. Ergo, you lose stuff.
I've seen several solutions here and in other places that suggest storing the pid of my_web_server in a file. This is a solution that works. It is possible to write the pipeline by fiddling with the file descriptors a bit. I, however, don't like this solution, because I have to generate files. I don't like the idea of creating arbitrary files just to store a 5-character PID :).
What I ended up doing for now is this:
#!/bin/bash
trap " " HUP
fifo="$( mktemp -u "$( basename "${0}" ).XXXXXX" )"
mkfifo "${fifo}"
<"${fifo}" ./my_log_parser &
parser_pid="$!"
>"${fifo}" ./my_web_server &
server_pid="$!"
rm "${fifo}"
trap '2>/dev/null kill -TERM '"${server_pid}"'' INT TERM
while true; do
wait "${parser_pid}" && break
done
This solves the issue with me not being able to terminate my_web_server when the script receives SIGINT or SIGTERM. It seems more readable than any hackery fiddling with file descriptors in order to eventually use a file to store my_web_server's pid, which I think is good, because it improves the readability.
But it still uses a file (named pipe). Even though I know it uses the file (named pipe) for my_web_server and my_log_parser to talk (which is a pretty good reason) and the file gets wiped from the disk very shortly after it's created, it's still a file :).
Would any of you guys know of a way to do this task without using any files (named pipes)?
From the Bash man pages:
!   Expands to the process ID of the most recently executed background (asynchronous) command.
You are not running a background command, you are running process substitution to read to file descriptor 3.
The following works, but I'm not sure if it is what you are trying to achieve:
sleep 120 &
child_pid="$!"
wait "${child_pid}"
sleep 120
Edit:
Comment was: I know I can pretty much do this the silly 'while read i; do blah blah; done < <( ./my_proxy_server )'-way, but I don't particularly like the fact that when a script using this approach receives INT or TERM, it simply dies without telling ./my_proxy_server to bugger off too :)
It seems your problem stems from the fact that it is not so easy to get the PID of the proxy server. So, how about using your own named pipe, with the trap command:
pipe='/tmp/mypipe'
mkfifo "$pipe"
./my_proxy_server > "$pipe" &
child_pid="$!"
echo "child pid is $child_pid"
# Tell the proxy server to bugger-off
trap 'kill $child_pid' INT TERM
while read
do
echo $REPLY
# blah blah blah
done < "$pipe"
rm "$pipe"
You could probably also use kill %1 instead of using $child_pid.
YAE (Yet Another Edit):
You ask how to get the PIDS from:
./my_web_server | ./my_log_parser &
Simples, sort of. To test I used sleep, just like your original.
sleep 400 | sleep 500 &
jobs -l
Gives:
[1]+ 8419 Running sleep 400
8420 Running | sleep 500 &
So it's just a question of extracting those PIDs:
pid1=$(jobs -l|awk 'NR==1{print $2}')
pid2=$(jobs -l|awk 'NR==2{print $1}')
I hate calling awk twice here, but anything else is just jumping through hoops.
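If calling awk twice bothers you, here is a sketch of a single pass that fills both variables, relying on the same two-line jobs -l layout shown above:
pids=( $(jobs -l | awk 'NR==1{print $2} NR==2{print $1}') )
pid1=${pids[0]}
pid2=${pids[1]}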

Bash: Linking I/O of two processes

I have two programs A and B (in my case a C program and a Java program) that are supposed to communicate with each other. The invocation of those programs inside a bash script looks like this:
mkfifo fifo1
mkfifo fifo2
A < fifo1 > fifo2 &
java B < fifo2 > fifo1
I know that I could do it with one fifo, but I also want to be able to show the communication on the console. The following works fine:
mkfifo fifo1
mkfifo fifo2
A < fifo1 | tee fifo2 &
java B < fifo2 | tee fifo1
My question is: Why does the second script work while the first one just hangs?
Side question: While the second version works, as soon as I redirect the output of the script to a file, the communication is no longer interleaved but ordered by process. Is there a way to prevent this?
Why does the second script work while the first one just hangs?
man open:
When opening a FIFO with O_RDONLY or O_WRONLY set:
If O_NONBLOCK is set, an open() for reading-only
shall return without delay. An open() for writing-only shall
return an error if no process currently has the file open for
reading.
If O_NONBLOCK is clear, an open() for reading-only
shall block the calling thread until a thread opens the file for
writing. An open() for writing-only shall block the calling
thread until a thread opens the file for reading.
In the first script, A opens fifo1 and B opens fifo2, both with O_RDONLY; A blocks until B would open fifo1 for writing, while B blocks until A would open fifo2 for writing… a circular wait situation. (Actually, shells open the fifos, but the resulting circular waiting is the same.)
In the second script, A opens fifo1 and B opens fifo2, both with O_RDONLY - so far the same as above. But in parallel, the first tee opens fifo2 and the second tee opens fifo1 for writing, thus unblocking A and B.
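To make that concrete, here is a minimal sketch of the first script with the console tee left out: swapping the redirection order on one side lets the two open() calls on each FIFO rendezvous, so nothing hangs.
mkfifo fifo1 fifo2
A < fifo1 > fifo2 &
java B > fifo1 < fifo2   # fifo1 is opened for writing before fifo2 is opened for reading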
While the second version works, as soon as I redirect the output of the script to a file, the communication is no longer interleaved but ordered by process. Is there a way to prevent this?
This may be due to stdout buffering; try … stdbuf -oL tee … or post your input and output.
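A sketch of how that suggestion could be slotted into the second script (assuming GNU coreutils' stdbuf); whether it actually restores the interleaving depends on where the buffering really happens, which is why the advice above is phrased as a guess:
mkfifo fifo1 fifo2
A < fifo1 | stdbuf -oL tee fifo2 &
java B < fifo2 | stdbuf -oL tee fifo1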
