Named and Unnamed Pipes - bash

Ok, here's something that I cannot wrap my head around. I bumped into this while working on a rather complex script. Managed to simplify this to the bare minimum, but it still doesn't make sense.
Let's say, I have a fifo:
mkfifo foo.fifo
Running the command below in one terminal, and then writing things into the pipe (echo "abc" > foo.fifo) in another, seems to work fine:
while true; do read LINE <foo.fifo; echo "LINE=$LINE"; done
LINE=abc
However, if I change the command ever so slightly, the read command fails to wait for the next line after it has read the first one:
cat foo.fifo | while true; do read LINE; echo "LINE=$LINE"; done
LINE=abc
LINE=
LINE=
LINE=
[...] # And this keeps repeating endlessly
The really disturbing part is that it waits for the first line, but then it just reads an empty string into $LINE and fails to block. (Funnily enough, this is one of the few times I actually want an I/O operation to block :))
I thought I really understood how I/O redirection and such things work, but now I am rather confused.
So, what's the solution, what am I missing? Can anyone explain this phenomenon?
UPDATE: For a short answer, and a quick solution see William's answer. For a more in-depth, and complete insight, you'd want to go with rici's explanation!

Really, the two command lines in the question are very similar once we eliminate the UUOC (useless use of cat):
while true; do read LINE <foo.fifo; echo "LINE=$LINE"; done
and
while true; do read LINE; echo "LINE=$LINE"; done <foo.fifo
They act in slightly different ways, but the important point is that neither of them is correct.
The first one opens the fifo, reads from it, and then closes it every time through the loop. The second one opens the fifo once, and then attempts to read from it every time through the loop.
A fifo is a slightly complicated state machine, and it's important to understand the various transitions.
Opening a fifo for reading or writing will block until some process has it open in the other direction. That makes it possible to start a reader and a writer independently; the open calls will return at the same time.
A read from a fifo succeeds if there is data in the fifo buffer. It blocks if there is no data in the fifo buffer but there is at least one writer which holds the fifo open. It returns EOF if there is no data in the fifo buffer and no writer.
A write to a fifo succeeds if there is space in the fifo buffer and there is at least one reader which has the fifo open. It blocks if there is no space in the fifo buffer, but at least one reader has the fifo open. And it triggers SIGPIPE (and then fails with EPIPE if that signal is being ignored) if there is no reader.
Once both ends of the fifo are closed, any data left in the fifo buffer is discarded.
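Those rules are easy to see with a throwaway fifo (a small experiment; the names are arbitrary):

mkfifo demo.fifo
cat demo.fifo &            # blocks in open(): nobody has the fifo open for writing yet
echo "hello" > demo.fifo   # both opens now complete; the write succeeds
wait                       # cat prints "hello", sees EOF when the writer closes, and exits
rm demo.fifo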
Now, based on that, let's consider the first scenario, where the fifo is redirected to the read. We have two processes:
   reader                         writer
   --------------                 --------------
1. OPEN blocks
2. OPEN succeeds                  OPEN succeeds immediately
3. READ blocks
4.                                WRITE
5. READ succeeds
6. CLOSE                          CLOSE
(The writer could equally well have started first, in which case it would block at line 1 instead of the reader. But the result is the same. The CLOSE operations at line 6 are not synchronized. See below.)
At line 6, the fifo no longer has any readers or writers, so any data left in its buffer is discarded. Consequently, if the writer had written two lines instead of one, the second line would be tossed into the bit bucket before the loop continues.
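You can watch that happen with the commands from the question (a small illustration; only the writer side changes):

# terminal 1: the first loop from the question
while true; do read LINE <foo.fifo; echo "LINE=$LINE"; done

# terminal 2: write two lines through a single open/write/close
printf 'one\ntwo\n' > foo.fifo

Typically only LINE=one appears: read stops at the first newline, the loop closes the fifo, and the buffered second line is discarded before the next open.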
Let's contrast that with the second scenario, in which the reader is the while loop and not just the read:
    reader                        writer
    ---------                     ---------
 1. OPEN blocks
 2. OPEN succeeds                 OPEN succeeds immediately
 3. READ blocks
 4.                               WRITE
 5. READ succeeds
 6.                               CLOSE
    --loop--
 7. READ returns EOF
 8. READ returns EOF
    ... and again
42. ... and again                 OPEN succeeds immediately
43. ... and again                 WRITE
44. READ succeeds
Here, the reader will continue to read lines until it runs out. If no writer has appeared by then, the reader will start getting EOFs. If it ignores them (e.g. while true; do read ...), then it will get a lot of them, as indicated.
Finally, let's return for a moment to the first scenario, and consider the possibilities when both processes loop. In the description above, I assumed that both CLOSE operations would succeed before either OPEN operation was attempted. That would be the common case, but nothing guarantees it. Suppose instead that the writer succeeds in doing both a CLOSE and an OPEN before the reader manages to do its CLOSE. Now we have the sequence:
   reader                         writer
   --------------                 --------------
1. OPEN blocks
2. OPEN succeeds                  OPEN succeeds immediately
3. READ blocks
4.                                WRITE
5.                                CLOSE
5. READ succeeds                  OPEN
6. CLOSE
7.                                WRITE  !! SIGPIPE !!
In short, the first invocation will skip lines, and has a race condition in which the writer will occasionally receive a spurious error. The second invocation will read everything written, and the writer will be safe, but the reader will continuously receive EOF indications instead of blocking until data is available.
So what is the correct solution?
Aside from the race condition, the optimal strategy for the reader is to read until EOF, and then close and reopen the fifo. The second open will block if there is no writer. That can be achieved with a nested loop:
while :; do
    while read line; do
        echo "LINE=$line"
    done < fifo
done
Unfortunately, the race condition which generates SIGPIPE is still possible, although it is going to be extremely rare [See note 1]. All the same, a writer would have to be prepared for its write to fail.
A simpler and more robust solution is available on Linux, because Linux allows fifos to be opened for reading and writing. Such an open always succeeds immediately. And since there is always a process which holds the fifo open for writing, the reads will block, as expected:
while read line; do
    echo "LINE=$line"
done <> fifo
(Note that in bash, the "redirect both ways" operator <> still only redirects stdin -- or fd n in the n<> form -- so the above does not mean "redirect stdin and stdout to fifo".)
Notes
The fact that a race condition is extremely rare is not a reason to ignore it. Murphy's law states that it will happen at the most critical moment; for example, when the correct functioning was necessary in order to create a backup just before a critical file was corrupted. But in order to trigger the race condition, the writer process needs to arrange for its actions to happen in some extremely tight time bands:
   reader                         writer
   --------------                 --------------
   fifo is open                   fifo is open
1. READ blocks
2.                                CLOSE
3. READ returns EOF
4.                                OPEN
5. CLOSE
6.                                WRITE  !! SIGPIPE !!
7. OPEN
In other words, the writer needs to perform its OPEN in the brief interval between the moment the reader receives an EOF and responds by closing the fifo. (That's the only way the writer's OPEN won't block.) And then it needs to do the write in the (different) brief interval between the moment that the reader closes the fifo, and the subsequent reopen. (The reopen wouldn't block because now the writer has the fifo open.)
That's one of those once in a hundred million race conditions that, as I said, only pops up at the most inopportune moment, possibly years after the code was written. But that doesn't mean you can ignore it. Make sure that the writer is prepared to handle SIGPIPE and retry a write which fails with EPIPE.
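In bash, one way to arrange that (a sketch, reusing the plain fifo name from the loops above; the retry interval is arbitrary) is to perform each write in a subshell, so a SIGPIPE kills only the subshell and the loop simply reopens the fifo and tries again:

write_line() {
    # a failed write (SIGPIPE/EPIPE) makes the subshell exit non-zero,
    # so we pause briefly and retry, which re-opens the fifo
    until ( printf '%s\n' "$1" > fifo ) 2>/dev/null; do
        sleep 0.1
    done
}

write_line "some data"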

When you do
cat foo.fifo | while true; do read LINE; echo "LINE=$LINE"; done
which, incidentally, ought to be written:
while true; do read LINE; echo "LINE=$LINE"; done < foo.fifo
that script will block until someone opens the fifo for writing. As soon as that happens, the while loop begins. If the writer (the echo "abc" > foo.fifo you ran in another shell) terminates and there is no one else with the pipe open for writing, then the read returns immediately, because the pipe is empty and there are no processes that have the other end open. Try this:
in one shell:
while true; do date; read LINE; echo "LINE=$LINE"; done < foo.fifo
in a second shell:
cat > foo.fifo
in a third shell
echo hello > foo.fifo
echo world > foo.fifo
By keeping the cat running in the second shell, the read in the while loop blocks instead of returning.
I guess the key insight is that when you do the redirection inside the loop, the shell re-opens the pipe on every iteration, so each read waits until someone opens the pipe for writing. When you redirect the whole while loop, the shell opens the pipe only once, and so it only blocks before it starts the loop.

Related

Read from n pipes from one process in parallel

I faced a concurrency problem when writing to the same named pipe, created with mkfifo, from multiple processes at the same time: some writes got lost. Since the number of writing processes is limited, I want to switch from "writing to 1 pipe from n processes and reading from 1 separate process" to "writing to n pipes from n processes and reading from 1 separate process".
Currently I'm reading via read line <"$pipe" in a loop until a condition is met. read blocks here until a line has been read.
How can I read from multiple pipes ($pipe1, $pipe2 … $pipeN) in one loop until a condition is met, while honouring newly written lines on all pipes equally?
One way to deal with the initially described problem of multiple children writing to a single FIFO is to have a process open the FIFO for reading but never actually read it. This will allow writers to write unless the FIFO is full. I don't think there's a standard program that simply goes to sleep forever until signalled. I use a home-brew program pause, which is a pretty minimal C program:
#include <unistd.h>

int main(void)
{
    pause();    /* sleep until a signal arrives */
}
It never exits until signalled. In your shell script, first launch the children, telling them to write to $FIFO, then run:
pause <$FIFO &
pid=$!
Note that the pause-like command will not be launched into the background until the redirection completes, and the open of the FIFO won't complete until there is a process to write to the FIFO — so at least one child needs to be launched into the background before the pause-like process is executed. Alternatively, write a variant of pause (I call mine sleepon) which opens the files named in its argument list. Then the command line is similar to sleepon $FIFO &, the backgrounding operation completes immediately, and the pause-like program blocks until it is able to open the FIFO (which will be when one of the children opens the FIFO for writing), then goes to sleep indefinitely. But the code for sleepon is a lot more complex than the code for pause.
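If you would rather not compile anything, roughly the same effect can be had from the shell itself (a sketch; sleep infinity is a GNU coreutils extension, and any sufficiently long sleep works equally well):

sleep infinity < "$FIFO" &   # holds the FIFO open for reading without ever reading from it
pid=$!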
Once the children and the pause-like process are launched, the parent can continue with the main processing loop.
while read line
do
    …
done < $FIFO
The main thing to be aware of is that the parent loop will exit whenever the FIFO is emptied. You need to know when it should terminate, if ever. At the point where it does terminate, it should kill the pause process: kill $pid. You may need to wrap a while true; do … done loop around the line-reading loop — but you may need something cleverer than that. It depends, in part, on what your "until a condition is met" requirement is.
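For example, if the writers can be made to emit a sentinel line when everything is done, the parent can keep re-opening the FIFO until it sees it (a sketch; the sentinel value DONE and the process_line helper are made-up names):

finished=false
while ! $finished; do
    while read -r line; do
        if [ "$line" = "DONE" ]; then
            finished=true
            break
        fi
        process_line "$line"   # hypothetical per-line handler
    done < "$FIFO"
done
kill "$pid"                    # release the pause-like holder of the FIFO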
Your requirement to 'read from multiple FIFOs, all of which may intermittently have data on them' is not easy to meet. It's not particularly trivial even in C; I don't think there's a standard (POSIX) shell command to assist with that. In C, you'd end up using POSIX select() or poll() or one of their many variants — some of which are platform-specific. There might be a platform-specific command that will help; I have my doubts, though.
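If you do stay in the shell, one workaround (a sketch, not a tested recipe) is to funnel every per-writer FIFO into one extra merged FIFO, with a background cat per source, and have the reading loop consume only the merged FIFO. Provided each writer emits complete lines no longer than PIPE_BUF, the lines arrive intact, although their relative order across pipes is not guaranteed:

mkfifo merged
for p in "$pipe1" "$pipe2" "$pipe3"; do
    cat "$p" > merged &      # one relay per source pipe; each blocks until its writer opens
done
while read -r line; do
    …                        # handle lines coming from any of the writers
done < merged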

How can I exit reader.ReadString from waiting for user input?

I am making it so that it stops asking for input upon CTRL-C.
What I have currently is that a separate go-routine, upon receiving a CTRL-C, changes the value of a variable so it won't ask for another line. However, I can't seem to find a way around the current line.
i.e. I still have to press enter once to get out of the current iteration of reading for \n.
Is there perhaps a way to push a "\n" into stdin for the reader.ReadString to read? Or a way to stop its execution altogether?
The only decent mechanism that Go gives you to proceed when either of two things happens is select, and select only selects on channel reads, so your only option is to change your signal-handler goroutine to write to a channel, and add another goroutine that handles stdin and passes lines of input to a channel, then select on the two channels.
However, that still leaves your question half-unanswered: your main program can stop waiting for input on a Ctrl-C, but the goroutine that's reading input will still be waiting for input. In some cases that might be okay... if you will never need stdin again, or if you will go right back to processing lines in the same exact way. But if you want to do something other than ReadString from that reader, you're stuck... literally. The only solution I see would be to write your own state machine around Read or ReadByte that is capable of changing its behavior in response to external conditions, but that can easily get horribly complicated.
Basically, this looks like a case where Go simplifies things compared to the underlying system (not exposing anything like EINTR, not allowing select on filehandles), but ends up providing less power to the programmer.

Asynchronously consuming pipe with bash

I have a bash script like this
data_generator_that_never_quits | while read data
do
    an_expensive_process_with "$data"
done
The first process continuously generates events (at irregular intervals) which need to be processed as they become available. A problem with this script is that read consumes only a single line of the output; and as the processing is very expensive, I'd want it to consume all the data that is currently available. On the other hand, the processing must start immediately if new data becomes available. In a nutshell, I want to do something like this
data_generator_that_never_quits | while read_all_available data
do
    an_expensive_process_with "$data"
done
where the command read_all_available will wait if no data is available for consumption, or else copy all the currently available data to the variable. It is perfectly fine if the data does not consist of full lines. Basically, I am looking for an analog of read which would read the entire pipe buffer instead of reading just a single line from the pipe.
For the curious among you, the background of the question is that I have a build script which needs to trigger a rebuild on a source file change. I want to avoid triggering rebuilds too often. Please do not suggest using grunt, gulp or other available build systems; they do not work well for my purpose.
Thanks!
I think I have found the solution after getting better insight into how subshells work. This script appears to do what I need:
data_generator_that_never_quits | while true
do
    # wait until the next element becomes available
    read LINE
    # consume any remaining elements — a small timeout ensures that
    # rapidly fired events are batched together
    while read -t 1 LINE; do true; done
    # the data buffer is empty, launch the process
    an_expensive_process
done
It would be possible to collect all the read lines into a single batch, but I don't really care about their contents at this point, so I didn't bother figuring that part out :)
Added on 25.09.2014
Here is a final subroutine, in case it could be useful for someone one day:
flushpipe() {
    # wait until the next line becomes available
    read -d "" buffer
    # consume any remaining elements — a small timeout ensures that
    # rapidly fired events are batched together
    while read -d "" -t 1 line; do buffer="$buffer"$'\n'"$line"; done
    printf '%s\n' "$buffer"
}
To be used like this:
data_generator_that_never_quits | while true
do
    # wait until data becomes available
    data=$(flushpipe)
    # the data buffer is empty, launch the process
    an_expensive_process_with "$data"
done
Something like read -N 4096 -t 1 might do the trick, or perhaps read -t 0 with additional logic. See the Bash reference manual for details. Otherwise, you might have to move from Bash to e.g. Perl.
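As a rough illustration of the read -t 0 idea (a sketch; it assumes bash 4, where -t 0 merely reports whether input is already available without consuming any):

read -r first                      # block until the first line arrives
batch=$first
while read -t 0 && read -r next; do
    batch="$batch"$'\n'"$next"     # drain whatever is already buffered without waiting
done
# now hand "$batch" to the expensive process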

Controlling an interactive command-line utility from a Cocoa app - trouble with ptys

What I'm trying to do
My Cocoa app needs to run a bunch of command-line programs. Most of these are non-interactive, so I launch them with some command-line arguments, they do their thing, output something and quit. One of the programs is interactive, so it outputs some text and a prompt to stdout and then expects input on stdin and this keeps going until you send it a quit command.
What works
The non-interactive programs, which just dump a load of data to stdout and then terminate, are comparatively trivial:
Create NSPipes for stdout/stdin/stderr
Launch NSTask with those pipes
Then, either
get the NSFileHandle for the other end of the pipe to read all data until the end of the stream and process it in one go when the task ends
or
Get the -fileDescriptors from the NSFileHandle of the other end of the output pipes.
Set the file descriptor to use non-blocking mode
Create a GCD dispatch source with each of those file descriptors using dispatch_source_create(DISPATCH_SOURCE_TYPE_READ, ...
Resume the dispatch source and handle the data it throws at you using read()
Keep going until the task ends and the pipe file descriptor reports EOF (read() reports 0 bytes read)
What doesn't work
Either approach completely breaks down for interactive tools. Obviously I can't wait until the program exits because it's sitting at a command prompt and never will exit unless I tell it to. On the other hand, NSPipe buffers the data, so you receive it in buffer-sized chunks, unless the CLI program happens to flush the pipe explicitly, which the one in my case does not. The initial command prompt is much smaller than the buffer size, so I don't receive anything, and it just sits there. So NSPipe is also a no-go.
After some research, I determined that I needed to use a pseudo-terminal (pty) in place of the NSPipe. Unfortunately, I've had nothing but trouble getting it working.
What I've tried
Instead of the stdout pipe, I create a pty like so:
struct termios termp;
bzero(&termp, sizeof(termp));
int res = openpty(&masterFD, &slaveFD, NULL, &termp, NULL);
This gives me two file descriptors; I hand the slaveFD over to an NSFileHandle, which gets passed to the NSTask for either just stdout or both stdout and stdin. Then I try to do the usual asynchronous reading from the master side.
If I run the program I'm controlling in a Terminal window, it starts off by outputting 2 lines of text, one 18 bytes long including the newline, one 22 bytes and with no newline for the command prompt. After those 40 bytes it waits for input.
If I just use the pty for stdout, I receive 18 bytes of output (exactly one line, ending in newline) from the controlled program, and no more. Everything just sits there after the initial 18 bytes, no more events - the GCD event source's handler doesn't get called.
If I also use the pty for stdin, I usually receive 19 bytes of output (the aforementioned line plus one character from the next line) and then the controlled program dies immediately. If I wait a little before attempting to read the data (or scheduling noise causes a small pause), I actually get the whole 40 bytes before the program again dies instantly.
An additional dead end
At one point I was wondering if my async reading code was flawed, so I re-did everything using NSFileHandles and its -readInBackgroundAndNotify method. This behaved the same as when using GCD. (I originally picked GCD over the NSFileHandle API as there doesn't appear to be any async writing support in NSFileHandle.)
Questions
Having arrived at this point after well over a day of futile attempts, I could do with some kind of help. Is there some fundamental problem with what I'm trying to do? Why does hooking up stdin to the pty terminate the program? I'm not closing the master end of the pty, so it shouldn't be receiving EOF. Leaving aside stdin, why am I only getting one line's worth of output? Is there a problem with the way I'm performing I/O on the pty's file descriptor? Am I using the master and slave ends correctly - master in the controlling process, slave in the NSTask?
What I haven't tried
I so far have only performed non-blocking (asynchronous) I/O on pipes and ptys. The only thing I can think of is that the pty simply doesn't support that. (if so, why does fcntl(fd, F_SETFL, O_NONBLOCK); succeed though?) I can try doing blocking I/O on background threads instead and send messages to the main thread. I was hoping to avoid having to deal with multithreading, but considering how broken all these APIs seem to be, it can't be any more time consuming than trying yet another permutation of async I/O. Still, I'd love to know what exactly I'm doing wrong.
The problem is likely that the stdio library inside the command-line program is buffering output. The output will only appear in the read pipe when the command-line program flushes it, either because it writes a "\n" via the stdio library, or fflush()es, or the buffer gets full, or it exits (which causes the stdio library to automatically flush any output still buffered), or possibly some other conditions. If those printf strings were "\n"-terminated, then you MIGHT get the output quicker. That's because there are three output buffering styles -- unbuffered, line-buffered (\n causes a flush), and block buffered (when the output buffer gets full, it's auto-flushed).
Buffering of stdout is line-buffered by default if the output file descriptor is a tty (or pty); otherwise, block buffered. stderr is by default unbuffered. The setvbuf() function is used to change the buffering mode. These are all standard BSD UNIX (and maybe general UNIX) things I've described here.
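The classic shell demonstration of those defaults (assuming a log file that is being appended to, such as /var/log/syslog) is grep in the middle of a pipeline:

tail -f /var/log/syslog | grep foo          # matches show up promptly: grep's stdout is a
                                            # terminal, so stdio line-buffers it
tail -f /var/log/syslog | grep foo | cat    # matches show up in ~4 KiB bursts: grep's stdout
                                            # is now a pipe, so stdio block-buffers it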
NSTask does not do any setting up of ttys/ptys for you. It wouldn't help in this case anyway since the printfs aren't printing out \n.
Now, the problem is that the setvbuf() needs to be executed inside the command-line program. Unless (1) you have the source to the command-line program and can modify it and use that modified program, or (2) the command-line program has a feature that allows you to tell it to not buffer its output [ie, call setvbuf() itself], there's no way to change this, that I know of. The parent simply cannot affect the subprocess in this way, either to force flushing at certain points or change the stdio buffering behavior, unless the command-line utility has those features built into it (which would be rare).
Source: Re: NSTask, NSPipe's and interactive UNIX command

create a rolling buffer in bash

I want to use curl to get a stream from a remote server and write it to a buffer. So far so good: I just do curl http://the.stream > /path/to/thebuffer. The thing is, I don't want this file to get too large, so I want to be able to delete the first bytes of the file as I simultaneously add to the last bytes. Is there a way of doing this?
Alternatively, if I could write n bytes to buffer1, then switch to buffer2, buffer3... and when buffer x was reached, delete buffer1 and start again - without losing the data coming in from curl (it's a live stream, so I can't stop curl). I've been reading the man pages for curl, cat and read, but can't see anything promising.
There isn't any particularly easy way to do what you are seeking to do.
The nearest approach is probably to create a FIFO and redirect the output of curl to the FIFO. You then have a program such as split or csplit reading the FIFO and writing to different files. If you decide that the split programs are not the right tool, you may need to write your own variation on them. You can then decide how to process the files that are created, and when to remove them.
Note that curl will hang until there is a process reading from the FIFO. When the process reading the FIFO exits, curl will get either a SIGPIPE signal or a write error, either of which should stop it.
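A minimal sketch of that arrangement (the chunk size, the file names, and the numeric-suffix option -d, which is a GNU extension, are all illustrative):

mkfifo stream.fifo
split -b 10M -d stream.fifo chunk_ &        # writes chunk_00, chunk_01, ... as data arrives
curl -s http://the.stream > stream.fifo     # blocks until split has opened the FIFO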

Resources