Unwanted buffering when filtering console output in Win32 - winapi

My question is related to "Turn off buffering in pipe" albeit concerning Windows rather than Unix.
I'm writing a Make clone, and to stop parallel processes from thrashing each other's console output I've redirected the output to pipes (as described here), on which I can do any filtering I want. Unfortunately, long-running processes now buffer their output rather than sending it in real time as they would on a console.
From peeking at the MSVCRT sources it seems the root cause is that GetFileType() is used to check whether the standard I/O handles are attached to a console, which then sets an internal flag and ends up disabling buffering.
Apparently a separate array of inheritable file handles and flags can also be passed through the undocumented lpReserved2 member of the STARTUPINFO structure when creating the process. About the only working solution I've found is to use this list and simply lie about the device type when setting the flags for stdout/stderr.
Now then... Is there any sane way of solving this problem?

There is not. Yes, GetFileType() tells it that stdout is no longer a char device, _isatty() returns false, and so the CRT switches the output stream to buffered mode. That is important for reasonable throughput: flushing output one character at a time is only acceptable when a human is watching it.
You would have to relink the programs you are trying to redirect with a customized version of the CRT. I suspect that if that were possible, you wouldn't be messing with this in the first place. Patching GetFileType() is another un-sane solution.
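To make the rule concrete, here is a POSIX sketch of the start-up decision: if isatty() is true the stream is treated as interactive, and anything else (a pipe, a file) gets full buffering. The C standard only requires stdout to be fully buffered when it can be determined not to refer to an interactive device; choosing line buffering for terminals follows glibc's behavior, whereas the MSVCRT (per the question) effectively leaves console output unbuffered. default_stdout_mode() and apply_default_mode() are hypothetical names, not CRT functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative version of the rule a C runtime applies to stdout at
   start-up: an interactive device (isatty() true) gets line buffering,
   anything else (pipe, regular file) gets full buffering.
   Real runtimes differ in the details. */
static int default_stdout_mode(int fd)
{
    return isatty(fd) ? _IOLBF : _IOFBF;
}

/* Apply the rule to a stream explicitly. setvbuf() must be called
   before any other operation on the stream. Returns 0 on success. */
static int apply_default_mode(FILE *stream)
{
    return setvbuf(stream, NULL, default_stdout_mode(fileno(stream)), BUFSIZ);
}
```

Because this decision runs inside the child's own runtime, nothing the parent does with the pipe changes the outcome, which is the point of the answer above.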

Related

Making STDIN unbuffered under Windows in Perl

I am trying to do input processing (from the console) in Perl asynchronously. My first approach was to use IO::Select but that does not work under Windows.
I then came across the post Non-buffered processor in Perl which roughly suggests this:
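use IO::Handle;    # provides blocking() and autoflush() on filehandles

binmode STDIN;
binmode STDOUT;
STDIN->blocking(0) or warn $!;
STDOUT->autoflush(1);
while (1) {
    my $buffer;
    my $read_count = sysread(STDIN, $buffer, 4096);
    if (not defined($read_count)) {
        next;       # no data available yet (e.g. EAGAIN); try again
    } elsif (0 == $read_count) {
        exit 0;     # EOF on standard input
    }
    # process $buffer here
}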
That works as expected on Unix systems but not on Windows, where sysread does block. I have tested this on Windows 10 with 64-bit Strawberry Perl 5.32.1.
When you check the return value of blocking() (as the code above does), it turns out that the call fails with the funny error message "An operation was attempted on something that is not a socket".
Edit: My application is a chess engine that theoretically can be run interactively in a terminal but usually communicates via pipes with a GUI. Therefore, Win32::Console does not help.
Has something changed since the blog post was published? The author explicitly claims that this approach works on Windows. Is there any other option I can go with, maybe some module from the Win32:: namespace?
The solution I now implemented in https://github.com/gflohr/Chess-Plisco/blob/main/lib/Chess/Plisco/Engine.pm (search for the method __msDosSocket()) can be outlined as follows:
If Windows is detected as the operating system, create a Unix domain socket on a temporary file path with IO::Socket::UNIX, to be used for writing.
Call fork(), which in Perl for Windows actually creates a thread, because the system has no real fork().
In the "parent", create another instance of IO::Socket::UNIX with the same path for reading.
In the "child", read from standard input with getline(). This blocks, of course. Every line read is echoed to the write end of the socket.
The "parent" uses the read-end of the socket as a replacement for standard input and puts it into non-blocking mode. That works even under Windows because it is a socket.
From here on, everything is working the same as under Unix: All input is read in non-blocking mode with IO::Select.
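The steps above can be sketched in C with POSIX threads. This is a swapped-in variant, not the linked Perl code: socketpair() replaces the Unix domain socket on a temporary path (so there is no file to clean up), and a real pthread replaces Perl's fork() emulation. Winsock has no socketpair(), so this illustrates the structure rather than being a drop-in for Windows; struct relay, relay_thread() and start_input_relay() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical arguments for the relay thread. */
struct relay {
    int in_fd;   /* descriptor that only supports blocking reads, e.g. stdin */
    int out_fd;  /* write end of the socket pair */
};

/* Thread body: block on in_fd and copy everything into the socket.
   Closing out_fd on EOF or error gives the reader an EOF as well. */
static void *relay_thread(void *arg)
{
    struct relay *r = arg;
    char buf[4096];
    ssize_t n;

    while ((n = read(r->in_fd, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        for (ssize_t off = 0; off < n;) {
            ssize_t w = write(r->out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                goto out;
            }
            off += w;
        }
    }
out:
    close(r->out_fd);
    return NULL;
}

/* Start the relay and return the read end of the socket pair,
   already in non-blocking mode, or -1 on failure. */
static int start_input_relay(int in_fd, struct relay *r, pthread_t *tid)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    int flags = fcntl(sv[0], F_GETFL);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    r->in_fd = in_fd;
    r->out_fd = sv[1];
    if (pthread_create(tid, NULL, relay_thread, r) != 0)
        goto fail;
    return sv[0];

fail:
    close(sv[0]);
    close(sv[1]);
    return -1;
}
```

The main loop can then wait on the returned descriptor with poll() or select(), as the Perl version does with IO::Select: read() on it fails with EAGAIN when nothing is pending and returns 0 once the relay thread has seen EOF and closed its end.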
Instead of a Unix domain socket it is probably wiser to route the communication through the loopback interface because under Windows it is hard to guarantee that a temporary file gets deleted when the process terminates since you cannot unlink it while it is in use. It is also stated in the comments that IO::Socket::UNIX may not work under older Windows versions, and so inet sockets are probably more portable to use.
I also had trouble terminating both threads. A call to kill() does not seem to work. In my case, the protocol the program implements specifies that the command "quit" read from standard input terminates the program. The child thread therefore checks whether the line read was "quit" and terminates with exit in that case. A proper solution would find a better way for the parent to kill the child.
I did not bother to ignore SIGCHLD (because it doesn't exist under Windows) or call wait*() because fork does not spawn a new process image under Windows but only a new thread.
This approach is close to the one suggested in one of the comments to the question, only that the thread comes in disguise as a child process created by fork().
The other suggestion was to use the module Win32::Console. This does not work for two reasons:
As the name suggests, it only works for the console. But my software is a backend for a GUI frontend and rarely runs in a console.
The underlying API is for keyboard and mouse events. It works fine for keystrokes and most mouse events, but polling an event blocks as soon as the user has selected something with the mouse. So even for a real console application, this approach would not work. A solution built on Win32::Console must also handle events like pressing the CTRL, ALT or Shift key, because such events do not mean that input can be read immediately from the tty.
It is somewhat surprising that a task as trivial as non-blocking I/O on a file descriptor is so hard to implement in a portable way in Perl because Windows actually has a similar concept called "overlapped" I/O. I tried to understand that concept, failed at it, and concluded that it is true to the Windows maxim "make easy things hard, and hard things impossible". Therefore I just cannot blame the Perl developers for not using it as an emulation of non-blocking I/O. Maybe it is simply not possible.

Recovering control of a closed input descriptor process

While doing some tests in scm (a Scheme interpreter), I intentionally closed the current-input-port (equivalent to the standard input file descriptor). Since the program works as a REPL, things went haywire, and it now prints an error message over and over. My question is: how can I regain control of the process, that is, how can I re-establish the input file descriptor of such a process?
Searching for "changing file descriptor of a running process" or something similar, I couldn't find a helpful article.
Thanks in advance
System information: Debian 10.
You almost certainly can't, although this does slightly depend on how the language-level ports are mapped to the underlying OS-level I/O system.
If what you do is close the OS-level standard input then all is lost:
the REPL tries to read from standard input, gets an error as it's closed;
it tries to raise some error which will involve prompting the user for input ...
... from standard input, which is closed, so it gets an error;
game over.
The only way to survive this is for one of two things to be true:
either you've wrapped an error handler around the code which is already prepared to deal with this;
or the implementation is smart enough to recognise that it's getting closed-port errors in its closed-port error handler and gives up in some smart way.
Basically, once the OS-level standard input is gone, anything that needs to get input from it is doomed: you can't put it back without OS-level surgery on the process.
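At the OS level, the only cheap way to make a closed standard input recoverable is to keep a duplicate of it before it is closed. That has to be done by the program (or by the interpreter's implementation) ahead of time; it cannot be applied retroactively to an scm process that has already closed its input. A minimal POSIX sketch; save_fd() and restore_fd() are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Keep a private duplicate of fd (e.g. STDIN_FILENO) that survives a
   later close(fd). The copy starts at 3 so it never lands in a
   standard-descriptor slot, and it is close-on-exec so child programs
   do not inherit it. Returns the copy or -1. */
static int save_fd(int fd)
{
    return fcntl(fd, F_DUPFD_CLOEXEC, 3);
}

/* Put the saved copy back on target (e.g. STDIN_FILENO) and release
   the copy. dup2() clears close-on-exec on the target. */
static int restore_fd(int saved, int target)
{
    if (saved == target)
        return 0;
    if (dup2(saved, target) < 0)
        return -1;
    return close(saved);
}
```

A REPL built this way could call restore_fd(saved, STDIN_FILENO) from its closed-port error handler instead of looping on errors.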
However, it's possible that the implementation maps a single OS-level I/O stream to multiple language-level streams, and closing only one of these streams would leave the system with some other stream-of-last-resort to which it can still talk, and which still refers to the OS-level standard input. Common Lisp is an example of a system which can (depending on configuration) do this. It has, for instance, *standard-input*, *error-output*, *query-io*, *terminal-io* and other streams, and it's very possible to be in a situation where, for instance, *standard-input* has been closed causing read errors, but *query-io* still points somewhere with a human on the end of it.
I don't know if scm does that.

Closing all pipes of a process

I am working on making a program that will act in a similar way as a shell, but supports only foreground processes and pipes. I have multiple processes writing to the same pipe and some other properties that differ from the normal usage of pipes. Anyhow, my question is,
Is there any easy (automatic) way to close all file descriptors of a process except the three basic ones?
I am asking this question since I have a lot of difficulty keeping track of all file descriptors for every process, and sometimes they behave in ways I find unpredictable. That could also be because I don't have a very thorough understanding of them.
Is there any easy (automatic) way to close all file descriptors of a process except the three basic ones?
The normal way to do this is to simply iterate over all of them and close them:
for (i = getdtablesize(); i > 3;) close(--i);
That's already a one-liner. It doesn't get any more "automatic" than that.
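The same loop as a reusable function might look like the sketch below. On Linux it walks /proc/self/fd, which lists only descriptors that are actually open and so avoids millions of useless close() calls when the descriptor limit is very high; elsewhere it falls back to sysconf(_SC_OPEN_MAX), the POSIX way to ask for the table size (getdtablesize() is the older BSD call). The name close_from() is hypothetical, chosen to avoid clashing with the closefrom() library function mentioned below, and the 1024 fallback is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Close every descriptor >= lowfd, ignoring errors as closefrom() does. */
static void close_from(int lowfd)
{
    DIR *d = opendir("/proc/self/fd");       /* Linux-specific */
    if (d != NULL) {
        int own = dirfd(d);                  /* closed by closedir() below */
        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            if (e->d_name[0] == '.')         /* skip "." and ".." */
                continue;
            int fd = atoi(e->d_name);
            if (fd >= lowfd && fd != own)
                close(fd);
        }
        closedir(d);
        return;
    }

    long max = sysconf(_SC_OPEN_MAX);
    if (max < 0)                             /* indeterminate: assumed default */
        max = 1024;
    for (long fd = max - 1; fd >= lowfd; fd--)
        close((int)fd);
}
```

In a shell-like program this would typically run in the child after fork() and after the dup2() calls that install its stdin/stdout, just before exec().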
I am asking this question since I have a lot of difficulty keeping track of all file descriptors for every process.
It will be worth your time to think about the life cycle of each file descriptor you open, when it gets duplicated (e.g. dup2() and fork()), how it gets used, and make sure you account for how each one is going to get closed when it is no longer needed. Papering over a problem of leaked file descriptors by indiscriminately closing them all is not going to be sustainable.
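One way to keep that accounting manageable in a shell-like program is to create every pipe close-on-exec, so a descriptor only survives exec() if it has been deliberately dup2()'d onto 0, 1 or 2 in the child. A minimal sketch; set_cloexec() and make_cloexec_pipe() are hypothetical helpers. Where pipe2(fds, O_CLOEXEC) is available it does the same in one call, which also closes the window in which another thread could fork() between pipe() and fcntl().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Set FD_CLOEXEC on fd, preserving any other descriptor flags. */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Create a pipe whose ends are closed automatically by exec().
   In the child, dup2(fds[1], STDOUT_FILENO) produces a copy WITHOUT
   FD_CLOEXEC, so only the deliberately installed end is inherited. */
static int make_cloexec_pipe(int fds[2])
{
    if (pipe(fds) < 0)
        return -1;
    if (set_cloexec(fds[0]) < 0 || set_cloexec(fds[1]) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return 0;
}
```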
I have multiple processes writing to the same pipe
If you do this, then you need to be aware that the order in which data arrive at the other end of the pipe is going to be unpredictable. It will be difficult to avoid corrupting the data stream.
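POSIX does give one guarantee that helps here: a write() of at most PIPE_BUF bytes to a pipe is atomic, so it is never interleaved with data from other writers. Writers that emit each record with a single write() of at most PIPE_BUF bytes therefore keep records intact, although the order of records from different writers remains unpredictable. A sketch; write_record() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <unistd.h>

#ifndef PIPE_BUF
#define PIPE_BUF 512   /* _POSIX_PIPE_BUF, the minimum POSIX allows */
#endif

/* Write one complete record with a single write() call. For a pipe,
   writes of at most PIPE_BUF bytes are atomic, so concurrent writers
   cannot interleave inside the record. Returns 0 on success. */
static int write_record(int fd, const char *rec, size_t len)
{
    if (len > PIPE_BUF) {
        errno = EMSGSIZE;   /* too large to be written atomically */
        return -1;
    }
    ssize_t n;
    do {
        n = write(fd, rec, len);
    } while (n < 0 && errno == EINTR);
    return n == (ssize_t)len ? 0 : -1;
}
```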
Use the closefrom(3) C library function.
From the manpage:
The closefrom() system call deletes all open file descriptors greater
than or equal to lowfd from the per-process object reference table.
Any errors encountered while closing file descriptors are ignored.
Example usage:
#include <stdio.h>
#include <unistd.h>

int main(void) {
    // Close everything except stdin, stdout and stderr
    closefrom(3); // 3 is the lowest file descriptor you wish to close
    printf("Clear of all, but the three basic file descriptors!\n");
    return 0;
}
This works on the BSDs and Solaris. On Linux, glibc provides closefrom() since version 2.34; older systems need the libbsd support library.

Ruby file handle management (too many open files)

I am performing very rapid file access in Ruby (2.0.0 p39474) and keep getting the exception Too many open files.
Having looked at this thread, here, and various other sources, I'm well aware of the OS limits (set to 1024 on my system).
The part of my code that performs this file access is mutexed, and takes the form:
File.open( filename, 'w'){|f| Marshal.dump(value, f) }
where filename is subject to rapid change, depending on the thread calling the section. It's my understanding that this form relinquishes its file handle after the block.
I can verify the number of File objects that are open using ObjectSpace.each_object(File). This reports that there are up to 100 resident in memory, but only one is ever open, as expected.
Moreover, the exception itself is thrown at a time when ObjectSpace reports only 10-40 File objects. Manually running the garbage collector does not improve any of these counts, and neither does slowing the script down by inserting sleep calls.
My question is, therefore:
Am I fundamentally misunderstanding the nature of the OS limit? Does it cover the whole lifetime of a process?
If so, how do web servers avoid crashing out after accessing over ulimit -n files?
Is ruby retaining its file handles outside of its object system, or is the kernel simply very slow at counting 'concurrent' access?
Edit 20130417:
strace indicates that ruby doesn't write all of its data to the file, returning and releasing the mutex before doing so. As such, the file handles stack up until the OS limit.
In an attempt to fix this, I have used syswrite/sysread, synchronous mode, and called flush before close. None of these methods worked.
My question is thus revised to:
Why is ruby failing to close its file handles, and how can I force it to do so?
Use dtrace or strace or whatever equivalent is on your system, and find out exactly what files are being opened.
Note that these could be sockets.
I agree that the code you have pasted does not seem to be capable of causing this problem, at least, not without a rather strange concurrency bug as well.

Controlling an interactive command-line utility from a Cocoa app - trouble with ptys

What I'm trying to do
My Cocoa app needs to run a bunch of command-line programs. Most of these are non-interactive, so I launch them with some command-line arguments, they do their thing, output something and quit. One of the programs is interactive, so it outputs some text and a prompt to stdout and then expects input on stdin and this keeps going until you send it a quit command.
What works
The non-interactive programs, which just dump a load of data to stdout and then terminate, are comparatively trivial:
Create NSPipes for stdout/stdin/stderr
Launch NSTask with those pipes
Then, either
get the NSFileHandle for the other end of the pipe to read all data until the end of the stream and process it in one go when the task ends
or
Get the -fileDescriptors from the NSFileHandle of the other end of the output pipes.
Set the file descriptor to use non-blocking mode
Create a GCD dispatch source with each of those file descriptors using dispatch_source_create(DISPATCH_SOURCE_TYPE_READ, ...
Resume the dispatch source and handle the data it throws at you using read()
Keep going until the task ends and the pipe file descriptor reports EOF (read() reports 0 bytes read)
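Without the Cocoa and GCD layers, the second recipe comes down to two POSIX calls: fcntl() to set O_NONBLOCK, and a read() loop that drains whatever is available each time the event source fires. A sketch in C; set_nonblocking(), drain_fd() and the count_bytes() example callback are hypothetical names, and in the Cocoa app the drain_fd() call would sit inside the dispatch source's event handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Put fd into non-blocking mode, keeping its other status flags. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Read everything currently available on a non-blocking fd and pass it
   to on_data. Returns 1 if more data may follow, 0 at EOF (read()
   returned 0) and -1 on a real error. */
static int drain_fd(int fd, void (*on_data)(const char *, size_t, void *),
                    void *ctx)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0)
            on_data(buf, (size_t)n, ctx);
        else if (n == 0)
            return 0;
        else if (errno == EINTR)
            continue;
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 1;
        else
            return -1;
    }
}

/* Example callback: accumulate the number of bytes received. */
static void count_bytes(const char *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
}
```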
What doesn't work
Either approach completely breaks down for interactive tools. Obviously I can't wait until the program exits because it's sitting at a command prompt and never will exit unless I tell it to. On the other hand, NSPipe buffers the data, so you receive it in buffer-sized chunks, unless the CLI program happens to flush the pipe explicitly, which the one in my case does not. The initial command prompt is much smaller than the buffer size, so I don't receive anything, and it just sits there. So NSPipe is also a no-go.
After some research, I determined that I needed to use a pseudo-terminal (pty) in place of the NSPipe. Unfortunately, I've had nothing but trouble getting it working.
What I've tried
Instead of the stdout pipe, I create a pty like so:
struct termios termp;
bzero(&termp, sizeof(termp));
int res = openpty(&masterFD, &slaveFD, NULL, &termp, NULL);
This gives me two file descriptors; I hand the slaveFD over to an NSFileHandle, which gets passed to the NSTask for either just stdout or both stdout and stdin. Then I try to do the usual asynchronous reading from the master side.
If I run the program I'm controlling in a Terminal window, it starts off by outputting 2 lines of text, one 18 bytes long including the newline, one 22 bytes and with no newline for the command prompt. After those 40 bytes it waits for input.
If I just use the pty for stdout, I receive 18 bytes of output (exactly one line, ending in newline) from the controlled program, and no more. Everything just sits there after the initial 18 bytes, no more events - the GCD event source's handler doesn't get called.
If I also use the pty for stdin, I usually receive 19 bytes of output (the aforementioned line plus one character from the next line) and then the controlled program dies immediately. If I wait a little before attempting to read the data (or scheduling noise causes a small pause), I actually get the whole 40 bytes before the program again dies instantly.
An additional dead end
At one point I was wondering if my async reading code was flawed, so I re-did everything using NSFileHandles and its -readInBackgroundAndNotify method. This behaved the same as when using GCD. (I originally picked GCD over the NSFileHandle API as there doesn't appear to be any async writing support in NSFileHandle)
Questions
Having arrived at this point after well over a day of futile attempts, I could do with some kind of help. Is there some fundamental problem with what I'm trying to do? Why does hooking up stdin to the pty terminate the program? I'm not closing the master end of the pty, so it shouldn't be receiving EOF. Leaving aside stdin, why am I only getting one line's worth of output? Is there a problem with the way I'm performing I/O on the pty's file descriptor? Am I using the master and slave ends correctly - master in the controlling process, slave in the NSTask?
What I haven't tried
I so far have only performed non-blocking (asynchronous) I/O on pipes and ptys. The only thing I can think of is that the pty simply doesn't support that. (if so, why does fcntl(fd, F_SETFL, O_NONBLOCK); succeed though?) I can try doing blocking I/O on background threads instead and send messages to the main thread. I was hoping to avoid having to deal with multithreading, but considering how broken all these APIs seem to be, it can't be any more time consuming than trying yet another permutation of async I/O. Still, I'd love to know what exactly I'm doing wrong.
The problem is likely that the stdio library inside the command-line program is buffering its output. The output will only appear in the read pipe when the program flushes it: because it writes a "\n" via the stdio library (in line-buffered mode), calls fflush(), fills the buffer, or exits (which causes the stdio library to flush any output still buffered), or possibly under some other conditions. If those printf strings were "\n"-terminated, you MIGHT get the output sooner. That's because there are three output buffering styles: unbuffered, line-buffered (\n causes a flush), and block-buffered (the output buffer is flushed automatically when it gets full).
Buffering of stdout is line-buffered by default if the output file descriptor is a tty (or pty); otherwise, block buffered. stderr is by default unbuffered. The setvbuf() function is used to change the buffering mode. These are all standard BSD UNIX (and maybe general UNIX) things I've described here.
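The tty check behind this default can be observed directly. The sketch below uses the POSIX posix_openpt()/grantpt()/unlockpt()/ptsname() calls (instead of the openpty() call from the question) to start a child whose stdout is the slave side of a pseudo-terminal; the child reports what isatty(STDOUT_FILENO) returns, and with a pipe in its place the same check would return false. open_pty_pair() and child_sees_tty() are hypothetical helpers, and error handling is minimal.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Open a pseudo-terminal pair. Returns 0 and fills *master and *slave,
   or -1 on failure. */
static int open_pty_pair(int *master, int *slave)
{
    int m = posix_openpt(O_RDWR | O_NOCTTY);
    if (m < 0)
        return -1;
    if (grantpt(m) < 0 || unlockpt(m) < 0) {
        close(m);
        return -1;
    }
    const char *name = ptsname(m);
    int s = name != NULL ? open(name, O_RDWR | O_NOCTTY) : -1;
    if (s < 0) {
        close(m);
        return -1;
    }
    *master = m;
    *slave = s;
    return 0;
}

/* Fork a child whose stdout is the pty slave. The child writes '1' if
   isatty(STDOUT_FILENO) is true and '0' otherwise; the parent reads that
   byte from the master side. Returns the byte, or -1 on error. */
static int child_sees_tty(void)
{
    int m, s;
    if (open_pty_pair(&m, &s) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(m);
        close(s);
        return -1;
    }
    if (pid == 0) {                     /* child */
        close(m);
        if (dup2(s, STDOUT_FILENO) < 0)
            _exit(1);
        char c = isatty(STDOUT_FILENO) ? '1' : '0';
        _exit(write(STDOUT_FILENO, &c, 1) == 1 ? 0 : 1);
    }

    /* Parent: it still holds s, so the master never reports EOF;
       wait at most 5 seconds for the child's byte instead. */
    char c = 0;
    ssize_t n = -1;
    struct pollfd pfd = { .fd = m, .events = POLLIN };
    if (poll(&pfd, 1, 5000) == 1)
        n = read(m, &c, 1);
    waitpid(pid, NULL, 0);
    close(s);
    close(m);
    return n == 1 ? c : -1;
}
```

This only changes the child's choice between line and block buffering; a prompt without a trailing newline can still sit in a line buffer.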
NSTask does not do any setting up of ttys/ptys for you. It wouldn't help in this case anyway since the printfs aren't printing out \n.
Now, the problem is that the setvbuf() needs to be executed inside the command-line program. Unless (1) you have the source to the command-line program and can modify it and use that modified program, or (2) the command-line program has a feature that allows you to tell it to not buffer its output [ie, call setvbuf() itself], there's no way to change this, that I know of. The parent simply cannot affect the subprocess in this way, either to force flushing at certain points or change the stdio buffering behavior, unless the command-line utility has those features built into it (which would be rare).
Source: Re: NSTask, NSPipe's and interactive UNIX command
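For case (1), the modification is a single setvbuf() call at the top of the command-line program's main(), before anything is written to stdout. Because the prompt in the question has no trailing newline, line buffering would not be enough, so this hypothetical start-up hook makes stdout unbuffered; flushing explicitly after each prompt is the alternative that keeps buffering for bulk output. Both function names are made up for the illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical start-up hook for the command-line program: make stdout
   unbuffered so that output reaches a pipe immediately. setvbuf() must
   run before any other operation on the stream. */
static int make_stdout_unbuffered(void)
{
    return setvbuf(stdout, NULL, _IONBF, 0);
}

/* Alternative that keeps buffering for throughput: flush explicitly
   after writing a prompt that the peer is waiting for. */
static int print_prompt(FILE *out, const char *prompt)
{
    if (fputs(prompt, out) == EOF)
        return -1;
    return fflush(out);
}
```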
