Waiting for named pipe to be opened in subscript in BASH?

I've got two scripts: one takes a couple of filenames as input and writes data to the pipes (really, it passes the pipes as arguments to a program I wrote). The other one calls the first script with some named pipes as inputs, and then calls some other programs to process the data from the pipes.
My problem is that my pipes are stalling. What I think is happening is that the first bash script is called in the background from the second script, which then goes on to immediately start up the consumer processes, so the readers are being opened before the writers (in the subscript), which I suspect can cause a stall.
Is there a way to synchronize on a named pipe and wait for it to be opened in bash?

I don't think that's your problem.
If the producer starts later than the consumer, no big deal.
Example:
Window 1
$ mkfifo foo.pipe
$ cat foo.pipe
(hangs)
Window 2
$ echo 'something' > foo.pipe
Window 1
something
(exits)
Perhaps your problem is that one process is consuming the output of the fifo, then the producer quits, then you're trying to read from the fifo again.
In that case, it would hang indefinitely.
e.g. after the above sequence:
Window 1
$ cat foo.pipe
hangs until you run another echo something > foo.pipe.
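To see why starting the reader first is harmless, here is a small self-contained sketch (the file name is just illustrative): opening a FIFO for reading simply blocks until a writer shows up.
mkfifo /tmp/demo.pipe
( sleep 2; echo 'payload' > /tmp/demo.pipe ) &   # writer opens the FIFO two seconds later
cat /tmp/demo.pipe                               # blocks until the writer appears, then prints 'payload'
rm /tmp/demo.pipe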

Make bash file wait for system calls made in Fortran

My bash script looks something like this
mpiexec ./fortran_bin |& tee text_file
wait
./process_output_files
My MPI-based Fortran program makes several asynchronous system calls with call exec_cmd(cmd, wait=.false.).
My problem is that process_output_files only waits for fortran_bin to finish, but some of the system commands (cmd) are not yet done, and this messes up my output files.
How do I make process_output_files wait for cmd to finish?
NOTES
I'm not sure where best to solve this problem (if there is a solution):
within Fortran, with MPI, within Bash ...
cmd is of the form cat out_{1..n} > out && rm -f out_{1..n}.
I would like it to keep running asynchronously (wait=.false.), because cmd can be time-consuming and is unrelated to the rest of the Fortran program.
The wait line in the bash script seems to have no effect.
I suppose you could ask the equivalent question for a C/C++ program that calls system(some_script).
But I can only find questions about waiting within a C/C++ program, where the same program needs the result of the called command (e.g., here and here).
From the notes above, it looks like the sub-commands inherit the stdout/stderr of the calling process, and do not leave any detached background processes behind.
If those assumptions are true, you can impose a wait until there is no more output coming from fortran_bin and its children by piping the output into cat (or similar). The cat program will not terminate until all of fortran_bin's children (that did not redirect stderr) have finished:
mpiexec ./fortran_bin 2>&1 3>&1 | cat
It is possible to use tee (or another similar program) instead of cat.
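Applied to the script from the question, a minimal sketch of that idea (keeping the tee from the original, and assuming the commands spawned by the Fortran program inherit its stdout/stderr) would be:
mpiexec ./fortran_bin 2>&1 | tee text_file   # tee keeps reading until every writer of the pipe has exited
./process_output_files                       # starts only once fortran_bin and its children are done
Because tee holds the read end of the pipe open until the last process holding the write end (including the cat/rm children started by cmd) has exited, the explicit wait line is no longer needed.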

Input redirection after program has started running

Is there a way to use input redirection after the program has started?
For example, I want to run a program, scrape some data from its output, then push that data plus some static data (from a file) to its standard input:
1 ./Binary
2 Hello the open machine is: computer2
3 Which computer:command do you want to use:
4 <<< "computer2:RunWaterPlants"
I want to supply line 4 as input, using some of the program's output from line 2.
I've tried Keeping a bash script running along the program it has started, and sending the program input, but it will just continue with the program execution without waiting for my input.
I can't edit ./Binary.
I found Write to stdin of a running process using pipe and it works for what I'm asking, but I can't see the stdout when I run it with pipe.
I figured it out from Writing to stdin of a process. Essentially, I created a FIFO (named pipe), had the program listen to it for input, and then wrote to it.
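A minimal sketch of that approach (the paths and the scraping pattern are illustrative guesses, since Binary's exact output isn't known):
mkfifo /tmp/input.pipe
./Binary < /tmp/input.pipe > output.log &                        # Binary reads its stdin from the FIFO; stdout goes to a log
exec 3> /tmp/input.pipe                                          # hold a writer open so Binary does not see EOF prematurely
until grep -q 'open machine is' output.log; do sleep 0.1; done   # wait for the prompt to appear in the output
machine=$(grep -o 'computer[0-9]*' output.log | head -n 1)       # scrape the machine name
echo "${machine}:RunWaterPlants" >&3                             # send the composed command to Binary
exec 3>&-                                                        # close the write end when finished
wait                                                             # let Binary run to completion
rm /tmp/input.pipe
If you also want to watch the output live on the terminal, redirect stdout through tee instead of a plain file, e.g. > >(tee output.log).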

Bash: file descriptors

I am a Bash beginner but I am trying to learn this tool to have a job in computers one of these days.
I am trying to teach myself about file descriptors now. Let me share some of my experiments:
#!/bin/bash
# Some dummy multi-line content
read -d '' colours <<- 'EOF'
red
green
blue
EOF
# File descriptor 3 produces colours
exec 3< <(echo "$colours")
# File descriptor 4 filters colours
exec 4> >(grep --color=never green)
# File descriptor 5 is an unlimited supply of violet
exec 5< <(yes violet)
echo Reading colours from file descriptor 3...
cat <&3
echo ... done.
echo Reading colours from file descriptor 3 again...
cat <&3
echo ... done.
echo Filtering colours through file descriptor 4...
echo "$colours" >&4
echo ... done. # Race condition?
echo Dipping into some violet...
head <&5
echo ... done.
echo Dipping into some more violet...
head <&5
echo ... done.
Some questions spring to mind as I see the output coming from the above:
fd3 seems to get "depleted" after "consumption", is it also automatically closed after first use?
how is fd3 different from a named pipe? (something I have looked at already)
when exactly does the command yes start executing? upon fd declaration? later?
does yes stop (CTRL-Z or other) and restart when more violet is needed?
how can I get the PID of yes?
can I get a list of "active" fds?
very interesting race condition on filtering through fd4, can it be avoided?
will yes only stop when I exec 5>&-?
does it matter whether I close with >&- or <&-?
I'll stop here, for now.
Thanks!
PS: partial (numbered) answers are fine.. I'll put together the different bits and pieces myself.. (although a comprehensive answer from a single person would be impressive!)
fd3 seems to get "depleted" after "consumption", is it also automatically closed after first use?
No, it is not closed. This is due to the way exec works. In the mode in which you have used exec (with redirections but no command), its function is to arrange the shell's own file descriptors as requested by the I/O redirections, and then leave them that way until the script terminates or they are changed again later.
Later, cat receives a copy of this file descriptor 3 on its standard input (file descriptor 0). cat's standard input is implicitly closed when cat exits (or perhaps, though unlikely, cat closes it before it exits, but that doesn't matter). The original copy of this descriptor, which is the shell's file descriptor 3, remains open, although the underlying stream has reached EOF and nothing further will be read from it.
how is fd3 different from a named pipe? (something I have looked at already)
The shell's <(some command) syntax (which is not standard Bourne shell syntax and, as far as I know, is only available in bash, zsh and ksh) might actually be implemented using named pipes. It probably isn't under Linux, because there's a better way (using /dev/fd), but it probably is on other operating systems.
So in that sense, this syntax may or may not be a helper for setting up named pipes.
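One quick way to see which mechanism your shell uses is to print the name the construct expands to (the exact output varies by system):
echo <(true)   # prints something like /dev/fd/63 on Linux, or a named-pipe path under /tmp elsewhere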
when exactly does the command yes start executing? upon fd declaration? later?
As soon as the <(yes violet) construct is evaluated (which happens when the exec 5< <(yes violet) is evaluated).
does yes stop (CTRL-Z or other) and restart when more violet is needed?
No, it does not stop. However, it will block soon enough when it starts producing more output than anything reading the other end of the pipe is consuming. In other words, the pipe buffer will become full.
how can I get the PID of yes?
Good question! $! appears to contain it immediately after yes is executed. However there seems to be an intermediate subshell and you actually get the pid of that subshell. Try <(exec yes violet) to avoid the intermediate process.
can I get a list of "active" fds?
Not from the shell. But if you're using an operating system like Linux that has /proc, you can just consult /proc/self/fd.
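For example, on Linux (the exact descriptor numbers will vary):
ls -l /proc/$$/fd     # descriptors currently held by the shell running the script
ls -l /proc/self/fd   # note: from ls, 'self' refers to the ls process itself, not the shell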
very interesting race condition on filtering through fd4, can it be avoided?
To avoid it, you presumably want to wait for the grep process to complete before proceeding through the script. If you obtain the process ID of that process (as above), I think you should be able to wait for it.
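A minimal sketch of that idea, relying on bash setting $! to the PID of the most recent process substitution (and on wait accepting that PID, which recent bash versions allow):
exec 4> >(grep --color=never green)
grep_pid=$!              # PID of the process substitution
echo "$colours" >&4      # send the data through fd 4
exec 4>&-                # close fd 4 so grep sees EOF and can finish
wait "$grep_pid"         # block until grep has produced all of its output
echo ... done.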
will yes only stop when I exec 5>&-?
Yes. What will happen then is that yes will keep trying to produce output forever, but once the other end of the file descriptor is closed, its next write will either fail with an error (EPIPE) or deliver a signal (SIGPIPE), which is fatal by default.
does it matter whether I close with >&- or <&-?
No. Both syntaxes are available for consistency's sake.

How does shell execute piped commands?

I want to understand how the shell executes piped commands, e.g. cat | more. I am aware that to execute a normal command the shell forks, the child executes it, and then the child returns. But how does the shell internally handle the execution of piped commands?
Considering for example cat | grep, the shell first forks itself to start cat, and then forks itself once more to start grep.
Before calling one of the exec* family of functions in the two newly created processes to start the two programs, the tricky part is setting up the pipe and redirecting the descriptors. The pipe(2) system call is used in the shell process before forking to return a pair of descriptors which both children inherit - a reading end and a writing end.
The reading end will be closed in the first process (cat), and stdout will be redirected to the writing end using the dup2(2) system call. Similarly, the writing end in the second process (grep) will be closed and stdin will be redirected to the reading end again using dup2(2).
This way both programs are unaware of a pipe because they just work with the standard input/output.
It sets up a pipe using the pipe system call, forks two processes instead of one and attaches one end of the pipe to the first process' stdout and the other end to the second process' stdin.
The same, except that the stdout of one application becomes the stdin of the next. http://unixwiz.net/techtips/remap-pipe-fds.html

When the input is from a pipe, does STDIN.read run until EOF is reached?

Sorry if this is a naïve question, but let's say I have a Ruby program called processor.rb that begins with data = STDIN.read. If I invoke this program like this
cat textfile.txt | processor.rb
Does STDIN.read wait for cat to pipe the entire textfile.txt in? Or does it assign some indeterminate portion of textfile.txt to the data variable?
I'm asking this because I recently saw a strange bug in one of my programs that suggests that the latter is the case.
The read method should import the entire file, as-is, and return only when the process producing the output has finished, as indicated by end-of-file on the pipe. It should also be the case that, after cat has finished, a subsequent call to read returns zero bytes.
In simple terms, a process is allowed to append to its output at any time, as is the case with things like tail -f, so you can't be assured that you have read all the data from STDIN without actually checking.
Your OS may implement cat or shell pipes slightly differently, though. I'm not familiar with what POSIX dictates for behavior here.
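A quick shell analogue of that behaviour: the reader below only gets its data back once the writer has closed its end of the pipe, not as soon as the first chunk arrives.
{ printf 'first\n'; sleep 2; printf 'second\n'; } | { data=$(cat); echo "got: $data"; }
# prints both lines together, only after the two-second writer on the left has finished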
It is probably line buffered and reads until it encounters a newline or EOF.
