A pipe in Bash takes the standard output of one process and passes it
as standard input into another process. Bash scripts support
positional arguments that can be passed in at the command line.
I am working with Linux commands and pipes have always come up. What does it mean to pipe the standard output of one process? I understand that the output will be sent to the next command in the pipeline but how? How exactly does it happen? Why does variables created in a sub shell not be piped?
What does it mean to pipe the standard output of one process?
It means to execute pipe system call. Then, fork one process, execute dup2 system call to duplicate file descriptor representing standard output stream to one of the file descriptors returned by pipe. Then, fork the second process, do the same dup2 but with the standard output file descriptor and the other file descriptor returned by pipe.
but how? How exactly does it happen?
Calling write system call from one process on one of the file descriptors returned by pipe system call, it executes a handler for the pipe, a kernel function, that takes the content of the buffer from userspace process and places it in a circular hold buffer.
When the other process executes read system call on the other file descriptors returned by pipe system call, the kernel function reads the element from the pipe circular hold buffer and copies data to the buffer given by the user, and advances the hold buffer.
Can I pipe contents of a variable? Can you include this in your answer?
You can execute a process that writes the content of the variable to standard output and connect the process to a pipe. I.e. echo "$var" |.
Related
In Linux/Unix when a process in the background mode tries to read from stdin it gets terminated. What is causing this termination ? There is no core file. So it doesn't look like termination is by a signal that generates core.
One reason for termination could be a signal.
When process are not connected directly to a tty device, stdin/stdout are typically handled with pipes.
The pipe(7) man page says:
If all file descriptors referring to the read end of a pipe have been closed, then a write(2) will cause a SIGPIPE signal to be generated for the calling process.
However, this applies only to writing.
For reading,
if all file descriptors referring to the write end of a pipe have been closed, then an attempt to read(2) from the pipe will see end-of-file (read(2) will return 0).
It is quite possible that the program, when it cannot read anything, decides to terminate. (What else could it do?)
Suppose I have two processes which may write the same content into the same file:
echo "large content" > aFileName
cat aFileName
Does it need synchronization/locks? Could I be sure that after such command the file will have the content not mutilated by race conditions in all processes?
Each process will be using its own file pointer so it should be safe in the normal case.
The only problem I can see is:
process A truncates the file
process A writes some data
process B truncates the file
process B writes less than what process A has written in step #2
process B terminates abnormally without writing the whole file
Now some of the data written in step #2 is lost, even if process A then continues to write the rest.
You could write into a temporary file that's atomically renamed after all the content has been written. Make sure the temporary file is on the same file system as the output, and use some unique identifier (such as the process id) in its name. Also, set up a trap handler to delete the file.
Downside to this solution is that it takes up more storage because multiple copies of the temporary file may be written concurrently, and also some of said temporary files may stay behind as garbage if the process dies without the trap handler being able to run.
I'm writing a program that communicate with its child process using anonymous pipe. And I want to know if there are contents in the pipe for me to read. Because calling ReadFile for an empty pipe would result in halting.
The child process is another software so I can't change it. I can only redirect its i/o.
You can use the PeekNamedPipe function to examine the state of the pipe without blocking.
What I'm trying to do
My Cocoa app needs to run a bunch of command-line programs. Most of these are non-interactive, so I launch them with some command-line arguments, they do their thing, output something and quit. One of the programs is interactive, so it outputs some text and a prompt to stdout and then expects input on stdin and this keeps going until you send it a quit command.
What works
The non-interactive programs, which just dump a load of data to stdout and then terminate, are comparatively trivial:
Create NSPipes for stdout/stdin/stderr
Launch NSTask with those pipes
Then, either
get the NSFileHandle for the other end of the pipe to read all data until the end of the stream and process it in one go when the task ends
or
Get the -fileDescriptors from the NSFileHandle of the other end of the output pipes.
Set the file descriptor to use non-blocking mode
Create a GCD dispatch source with each of those file descriptors using dispatch_source_create(DISPATCH_SOURCE_TYPE_READ, ...
Resume the dispatch source and handle the data it throws at you using read()
Keep going until the task ends and the pipe file descriptor reports EOF (read() reports 0 bytes read)
What doesn't work
Either approach completely breaks down for interactive tools. Obviously I can't wait until the program exits because it's sitting at a command prompt and never will exit unless I tell it to. On the other hand, NSPipe buffers the data, so you receive it in buffer-sized chunks, unless the CLI program happens to flush the pipe explicitly, which the one in my case does not. The initial command prompt is much smaller than the buffer size, so I don't receive anything, and it just sits there. So NSPipe is also a no-go.
After some research, I determined that I needed to use a pseudo-terminal (pty) in place of the NSPipe. Unfortunately, I've had nothing but trouble getting it working.
What I've tried
Instead of the stdout pipe, I create a pty like so:
struct termios termp;
bzero(&termp, sizeof(termp));
int res = openpty(&masterFD, &slaveFD, NULL, &termp, NULL);
This gives me two file descriptors; I hand the slaveFD over to an NSFileHandle, which gets passed to the NSTask for either just stdout or both stdout and stdin. Then I try to do the usual asynchronous reading from the master side.
If I run the program I'm controlling in a Terminal window, it starts off by outputting 2 lines of text, one 18 bytes long including the newline, one 22 bytes and with no newline for the command prompt. After those 40 bytes it waits for input.
If I just use the pty for stdout, I receive 18 bytes of output (exactly one line, ending in newline) from the controlled program, and no more. Everything just sits there after the initial 18 bytes, no more events - the GCD event source's handler doesn't get called.
If I also use the pty for stdin, I usually receive 19 bytes of output (the aforementioned line plus one character from the next line) and then the controlled program dies immediately. If I wait a little before attempting to read the data (or scheduling noise causes a small pause), I actually get the whole 40 bytes before the program again dies instantly.
An additional dead end
At one point I was wondering if my async reading code was flawed, so I re-did everything using NSFileHandles and its -readInBackgroundAndNotify method. This behaved the same as when using GCD. (I originally picked GCD over the NSFileHandle API as there doesn't appear to be any async writing support in NSFileHandle)
Questions
Having arrived at this point after well over a day of futile attempts, I could do with some kind of help. Is there some fundamental problem with what I'm trying to do? Why does hooking up stdin to the pty terminate the program? I'm not closing the master end of the pty, so it shouldn't be receiving EOF. Leaving aside stdin, why am I only getting one line's worth of output? Is there a problem with the way I'm performing I/O on the pty's file descriptor? Am I using the master and slave ends correctly - master in the controlling process, slave in the NSTask?
What I haven't tried
I so far have only performed non-blocking (asynchronous) I/O on pipes and ptys. The only thing I can think of is that the pty simply doesn't support that. (if so, why does fcntl(fd, F_SETFL, O_NONBLOCK); succeed though?) I can try doing blocking I/O on background threads instead and send messages to the main thread. I was hoping to avoid having to deal with multithreading, but considering how broken all these APIs seem to be, it can't be any more time consuming than trying yet another permutation of async I/O. Still, I'd love to know what exactly I'm doing wrong.
The problem is likely that the stdio library inside is buffering output. The output will only appear in the read pipe when the command-line program flushes it, either because it writes a "\n" via the stdio library, or fflush()s, or the buffer gets full, or exits (which causes the stdio library to automatically flush any output still buffered), or possibly some other conditions. If those printf strings were "\n"-terminated, then you MIGHT the output quicker. That's because there are three output buffering styles -- unbuffered, line-buffered (\n causes a flush), and block buffered (when the output buffer gets full, it's auto-flushed).
Buffering of stdout is line-buffered by default if the output file descriptor is a tty (or pty); otherwise, block buffered. stderr is by default unbuffered. The setvbuf() function is used to change the buffering mode. These are all standard BSD UNIX (and maybe general UNIX) things I've described here.
NSTask does not do any setting up of ttys/ptys for you. It wouldn't help in this case anyway since the printfs aren't printing out \n.
Now, the problem is that the setvbuf() needs to be executed inside the command-line program. Unless (1) you have the source to the command-line program and can modify it and use that modified program, or (2) the command-line program has a feature that allows you to tell it to not buffer its output [ie, call setvbuf() itself], there's no way to change this, that I know of. The parent simply cannot affect the subprocess in this way, either to force flushing at certain points or change the stdio buffering behavior, unless the command-line utility has those features built into it (which would be rare).
Source: Re: NSTask, NSPipe's and interactive UNIX command
I have some basic questions about pipes I am unsure about.
a) What is the standard behavior if a process writing to a pipe gets killed (ie. SIGKILL SIGINT) Does it close the pipe? Does it flush the pipe? Or is the behavior undefined?
b) What is the standard behavior if a process returns normally? Is it guaranteed to flush the pipe and close the pipe? (without explicitly doing so of course).
I would like these answers to be as general as possible, but in reality if it depends entirely on the OS specs I can accept that! However, if there is a Posix standard or a current defined Windows behavior I would be very grateful to know.
Thanks.
a. What is the standard behavior if a process writing to a pipe gets killed (ie. SIGKILL SIGINT) Does it close the pipe? Does it flush the pipe? Or is the behavior undefined?
SIGKILL never allows any cleanup - the process dies, dead. With SIGINT, it depends on whether the process handles the signal. If so, it is likely to exit via exit(2), which flushes standard I/O file handles. The question is - was the pipe connected to standard output or via popen()? If so, outstanding buffered data may be flushed; if not, there is no buffered data so flushing is immaterial.
If there is unread data in the pipe, that data remains in the pipe, ready for the reader to collect - assuming there is a reader.
b. What is the standard behavior if a process returns normally? Is it guaranteed to flush the pipe and close the pipe? (without explicitly doing so of course).
It depends on whether the pipe was connected via standard I/O or not. If not, there is nothing pending. If so, then yes, any material in the buffers will be flushed as the standard I/O stream is closed.
c. Thanks for the info on signals and the unread data, but I'm a little confused about the standard I/O pipe connection. After you mentioned popen() I looked it up and the man page says its return value identical to an I/O stream and the streams are fully buffered by default. I'm just not clear on the difference between the two nor do I understand where the difference comes from.
The basic system call for creating pipes is pipe(2). It creates two file descriptors, one for the read end of the pipe, one for the write end. If you do nothing else with them, then they remain as file descriptors, with unbuffered output (via write(2) and related system calls). If the process terminates, there is no buffering in the application; the pipe is closed.
If you use popen(3), then it does a whole lot more work for you. It still invokes pipe(2) to create the pipes, but it then does a fork(2). The child arranges the correct configuration of the pipes and launches the child process. The parent also closes the unused end of the pipe, and uses fdopen(3) to create a standard I/O file stream for the calling process to use.
With the file stream, if there is data in the I/O buffer, then a close or equivalent will ensure that the outstanding data is flushed and the file descriptor is closed.
The normal behaviour is that all file descriptors are closed when a process terminates. This means that a pipe, like any other open file descriptor, is closed normally.
One interesting thing about pipes, though: in POSIX, if a process writes to a pipe that has been closed, the writer will get a signal, SIGPIPE.
Edit:
A caveat: The difference between s SIGx termination and a normal termination is that, like any other file write, you may loose data that has been buffered (via a FILE write) and not yet written to the file descriptor.