fprintf vs WriteFile to write to pipe: Cannot read from all pipes - winapi

I'm trying to run a console app and read/write its standard I/O. The situation is this: when the app writes its output via WriteFile(GetStdHandle(...)), I can read it successfully with ReadFile on the pipe.
When the target app uses fprintf instead, ReadFile blocks until the target app exits, at which point it returns the entire output at once. When the target app blocks (say, in fgets()), ReadFile blocks as well.
I'm using the standard pipe redirection technique: http://msdn.microsoft.com/en-us/library/windows/desktop/ms682499(v=vs.85).aspx.
Why this strange behaviour, and how do I get around it?

It is likely due to the fact that fprintf is buffered while WriteFile is not. Can you call fflush after fprintf in the target app and try the same?
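For illustration, here is what that would look like on the target app's side (a minimal sketch, assuming you can change it); either flushing after each fprintf or disabling stdio buffering at startup avoids the delay:
#include <stdio.h>

int main(void)
{
    /* Option 1: flush explicitly after each write */
    fprintf(stdout, "hello from the child\n");
    fflush(stdout);

    /* Option 2: disable stdio buffering once at startup */
    setvbuf(stdout, NULL, _IONBF, 0);
    fprintf(stdout, "this now reaches the pipe immediately\n");
    return 0;
}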

Related

Is writing to a Unix file through a shell script synchronized?

I have a requirement where many threads will call the same shell script to perform some work and then write their output (a single text line of data) to a common text file.
Since many threads will try to write to the same file, my question is whether Unix provides a default locking mechanism so that they cannot all write at the same time.
Performing a short single write to a file opened for append is mostly atomic; you can get away with it most of the time (depending on your filesystem), but if you want to be guaranteed that your writes won't interrupt each other, or to write arbitrarily long strings, or to be able to perform multiple writes, or to perform a block of writes and be assured that their contents will be next to each other in the resulting file, then you'll want to lock.
While not part of POSIX (unlike the C library call for which it's named), the flock tool provides the ability to perform advisory locking ("advisory" -- as opposed to "mandatory" -- meaning that other potential writers need to voluntarily participate):
(
flock -x 99 || exit # lock the file descriptor
echo "content" >&99 # write content to that locked FD
) 99>>/path/to/shared-file
The use of file descriptor #99 is completely arbitrary -- any unused FD number can be chosen. Similarly, one can safely put the lock on a different file than the one to which content is written while the lock is held.
The advantage of this approach over several conventional mechanisms (such as using exclusive creation of a file or directory) is automatic unlock: If the subshell holding the file descriptor on which the lock is held exits for any reason, including a power failure or unexpected reboot, the lock will be automatically released.
my question is whether Unix provides a default locking mechanism so that they cannot all write at the same time.
In general, no. At least not something that's guaranteed to work. But there are other ways to solve your problem, such as lockfile, if you have it available:
Examples
Suppose you want to make sure that access to the file "important" is
serialised, i.e., no more than one program or shell script should be
allowed to access it. For simplicity's sake, let's suppose that it is
a shell script. In this case you could solve it like this:
...
lockfile important.lock
...
access_"important"_to_your_hearts_content
...
rm -f important.lock
...
Now if all the scripts that access "important" follow this guideline,
you will be assured that at most one script will be executing between
the 'lockfile' and the 'rm' commands.
But there's actually a better way, if you can use C or C++: use the low-level open() call to open the file in append mode, and call write() to write your data. No locking is necessary. Per the write() man page:
If the O_APPEND flag of the file status flags is set, the file offset
shall be set to the end of the file prior to each write and no
intervening file modification operation shall occur between changing
the file offset and the write operation.
Like this:
#include <fcntl.h>   // open, O_WRONLY, O_APPEND, O_CREAT
#include <string.h>  // strlen
#include <unistd.h>  // write

// process-wide global file descriptor
int outputFD = open( fileName, O_WRONLY | O_APPEND | O_CREAT, 0600 );
.
.
.
// write a string to the file
ssize_t writeToFile( const char *data )
{
    return( write( outputFD, data, strlen( data ) ) );
}
In practice, you can write anything to the file - it doesn't have to be a NUL-terminated character string.
That's supposed to be atomic on writes up to PIPE_BUF bytes, which is usually something like 512, 4096, or 5120. Some Linux filesystems apparently don't implement that properly, so you may in practice be limited to about 1K on those file systems.

Spawned process limit in MacOS X

I am running into an issue spawning a large number of processes (200) under MacOS X Mountain Lion (though I'm sure this issue is not version specific). What I am doing is launching processes (in my test it is /bin/cat) which have their STDIN, STDOUT, and STDERR connected to pipes -- the other end of which is the spawning (parent) process.
The parent process writes data into the STDIN of the processes, which is piped to the [/bin/cat] child processes, which in turn spit the data back out on STDOUT, where it is read by the parent process. /bin/cat is just used for testing.
I am actually using kqueue to be notified when there is space available in the STDIN pipe. When kqueue notifies you with a EVFILT_WRITE event that space is available, the event also tells you exactly how many bytes can be written without blocking.
This all works well, and I can easily spawn 100 child (/bin/cat) processes, and everything flows through the pipes without blocking (all day long). However, when I crank up the number of processes to 200 everything grinds to a halt when the single kqueue service thread blocks on a write() call to one of the STDIN pipes. The event says that there is 16384 bytes available (basically an empty pipe) but when the program tries to write exactly 16384 bytes into the pipe, the write() blocks anyway.
Initially I thought I was running into a max. open files issue, but I've jacked up the ulimit for open files to 8192, so that is not the issue. What I have discovered from some googling is that on OS X, STDIN/STDOUT/STDERR are not in fact "files" (or "pipes") but are actually devices. When the process is hung, running lsof on the command-line also hangs with a warning about not being able to stat() the file system:
lsof: WARNING: can't stat() hfs file system /
Output information may be incomplete.
assuming "dev=1000008" from mount table
As soon as I kill the process, lsof completes. This reinforces the notion that STDIN/OUT/ERR are in fact devices and I'm running into some kind of limit.
Does anyone have an idea of what limit I am running into? For example, is there a limit on the number of "devices" that can be created? Can this be increased somehow?
Just to answer my own question here. This appears to be related to how MacOS X will dynamically expand a pipe from 16K to 32K to 64K (and so on). My basic workaround was to prevent the pipe from expanding. It appears that whenever you fill the pipe completely the OS will expand it. So, when the kqueue triggers that I can write into the pipe, and indicates that I have 16384 bytes available to write, I simply write 16384 - 1 bytes. Basically, whatever it tells me I have available, I write at most (available - 1) bytes. This prevents the pipe from expanding, and is preventing my code from encountering the condition where a write() to the pipe would block (even though the pipe is non-blocking).
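A rough sketch of that workaround (function and variable names are made up; error handling omitted): register EVFILT_WRITE for the pipe's write end, and when the event fires, write at most one byte less than the space kqueue reports so the pipe never fills completely:
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: 'kq' is an existing kqueue, 'fd' the write end of a
   child's stdin pipe, 'buf'/'len' the pending data. */
static ssize_t write_without_expanding(int kq, int fd, const char *buf, size_t len)
{
    struct kevent change, event;
    EV_SET(&change, fd, EVFILT_WRITE, EV_ADD | EV_ONESHOT, 0, 0, NULL);
    kevent(kq, &change, 1, NULL, 0, NULL);          /* register interest */

    if (kevent(kq, NULL, 0, &event, 1, NULL) == 1)  /* wait until writable */
    {
        size_t avail = (size_t)event.data;          /* bytes writable without blocking */
        size_t chunk = avail > 0 ? avail - 1 : 0;   /* leave 1 byte free so the pipe
                                                       is never filled and never expands */
        if (chunk > len)
            chunk = len;
        return write(fd, buf, chunk);
    }
    return -1;
}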

Using stdout in a Win32 GUI application: crashes if I don't have a redirect to file in arguments

I'm building a Win32 GUI app. Inside that app, I'm using a DLL that was intended to be used in a command line app.
Suppose Foo.exe is my GUI app, and bar() is a function in the DLL that prints "hello" to stdout. Foo.exe calls bar().
If I run Foo.exe from the command line with a redirect (>) (i.e. Foo.exe > out.txt) it writes "hello" to out.txt and exits normally (as expected).
However, if I run Foo.exe without a redirect (either from cmd.exe or by double-clicking in Windows Explorer), it crashes when bar() is called.
If I run Foo.exe inside the debugger with the redirect in the command line (set through VS's properties for the project) and call "GetStdHandle(STD_OUTPUT_HANDLE)", I get a reasonable address for a handle. If I call it without the redirect in the command line, I get 0.
Do I need something to "initialize" standard out? Is there a way that I can set up this redirect in the application startup? (Redirecting to a file would be ideal. But just throwing out the data printed by the DLL would be okay, too.)
Finally, I suspect that the DLL is writing to stdout through the CRT POSIX-like API, because it is a cross-platform DLL. I don't know if this matters.
I've tried creating a file with CreateFile and calling SetStdHandle, but that doesn't seem to work. I may be creating the file incorrectly, however. See code below.
HANDLE hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
// hStdOut is zero
HANDLE hFile;
hFile = CreateFile(TEXT("something.txt"), // name of the file to write
                   GENERIC_WRITE,          // open for writing
                   0,                      // do not share
                   NULL,                   // default security
                   CREATE_NEW,             // create new file only
                   FILE_ATTRIBUTE_NORMAL,  // normal file
                   NULL);                  // no attribute template
BOOL r = SetStdHandle(STD_OUTPUT_HANDLE, hFile);
hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
// hStdOut is now equal to hFile, and r is 1
bar();
// crashes if there isn't a redirect in the program arguments
UPDATE: I just found this article: http://support.microsoft.com/kb/105305. It states "Note that this code does not correct problems with handles 0, 1, and 2. In fact, due to other complications, it is not possible to correct this, and therefore it is necessary to use stream I/O instead of low-level I/O."
My DLL definitely uses file handles 0,1 and 2. So, there may be no good solution to this problem.
I'm working on a solution that checks for this case, and re-launches the exe appropriately using CreateProcess. I'll post here when I'm done.
The solution that I've found is the following:
1. Obtain a valid file HANDLE in some way to direct the standard output to. Let's call the file handle "fh". (Please note that on Windows a file HANDLE is not the same thing as a file descriptor.)
2. Associate a file descriptor with the file handle using _open_osfhandle (see http://msdn.microsoft.com/en-us/library/kdfaxaay.aspx for details). Let's call the new file descriptor "fd", an int value.
3. Call dup2 to associate STDOUT_FILENO with the given file descriptor: dup2(fd, STDOUT_FILENO)
4. Create a file stream associated with the stdout file descriptor: FILE* f = _fdopen(STDOUT_FILENO, "w");
5. Assign the contents of f to stdout: *stdout = *f
6. Call SetStdHandle on the original file handle: SetStdHandle(STD_OUTPUT_HANDLE, fh);
Please note that I've not tested exactly this sequence but something slightly different.
I don't know whether some steps are redundant.
In any case, the following article explains very well the concepts of file handle, file descriptor, and file stream:
http://dslweb.nwnexus.com/~ast/dload/guicon.htm
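Put together, the steps look roughly like this (an untested sketch of the sequence above; RedirectStdoutToFile is just an illustrative name, and on newer CRTs such as VS2015+ the FILE structure is opaque, so *stdout = *f no longer works and freopen() has to be used instead):
#include <windows.h>
#include <io.h>
#include <fcntl.h>
#include <stdio.h>

static void RedirectStdoutToFile(const wchar_t *path)
{
    // 1. Obtain a Win32 file HANDLE for the target file.
    HANDLE fh = CreateFileW(path, GENERIC_WRITE, FILE_SHARE_READ, NULL,
                            CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (fh == INVALID_HANDLE_VALUE)
        return;

    // 2. Wrap the HANDLE in a CRT file descriptor.
    int fd = _open_osfhandle((intptr_t)fh, _O_TEXT);

    // 3. Make descriptor 1 (stdout) refer to it.
    _dup2(fd, 1);

    // 4. Rebuild the stdout stream on top of descriptor 1.
    FILE *f = _fdopen(1, "w");
    *stdout = *f;                      // as in the steps above; pre-VS2015 CRTs only
    setvbuf(stdout, NULL, _IONBF, 0);  // optional: write through immediately

    // 5. Let Win32 code that calls GetStdHandle see the same handle.
    SetStdHandle(STD_OUTPUT_HANDLE, fh);
}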
You must build foo.exe as a console application, using the /SUBSYSTEM:CONSOLE linker switch. Windows will then set up standard output for your application automatically, which can be:
The current console
A redirection to a file
A pipe to another program's STDIN
If you build foo.exe as a GUI application, the console is not allocated by default, which explains the crash.
If you must use the GUI subsystem, it can still be done with AllocConsole. This old WDJ article has sample code to help you.
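For example, a GUI-subsystem application can allocate its own console and rebind the CRT streams to it with a few lines (a minimal sketch; AttachDebugConsole is just an illustrative name, and the article linked above does essentially this with more care):
#include <windows.h>
#include <stdio.h>

// Call once at startup, before the DLL writes anything.
void AttachDebugConsole(void)
{
    if (AllocConsole())                   // create a console for this GUI process
    {
        freopen("CONOUT$", "w", stdout);  // rebind the CRT streams to the console
        freopen("CONOUT$", "w", stderr);
        freopen("CONIN$",  "r", stdin);
    }
}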
Can you tell me which library you use? This problem has a good solution. Write a small stub launcher EXE (in GUI mode but with NO windows!) that has your icon and that all shortcuts launch. Have this stub EXE CreateProcess() the real EXE with output redirected to "NUL" or "CON", or CreateProcess() it suspended and take its STDOUT without doing anything with it. This way, your original EXE should work without a visible console, but will actually have somewhere to write - handles 0, 1 and 2 are held open by the invisible parent stub EXE. Note that killing the parent EXE may make the child lose its handles - and crash.
You may end up with two processes in Task Manager. So you can try making these 2 processes a job like Google Chrome does.
On your question Do I need something to "initialize" standard out? - only your parent / launcher can pre-initialize your STDOUT for handles 0,1 and 2 "properly".
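A rough sketch of such a stub launcher, assuming the real binary is renamed to Foo-real.exe (that name and the exact layout are illustrative, not from the original answer): open NUL with an inheritable handle and pass it as the child's handles 0, 1 and 2, so the CRT in the DLL always has somewhere to write.
#include <windows.h>

int WINAPI wWinMain(HINSTANCE hInst, HINSTANCE hPrev, PWSTR cmdLine, int nShow)
{
    SECURITY_ATTRIBUTES sa = { sizeof(sa), NULL, TRUE };   // inheritable handle
    HANDLE hNul = CreateFileW(L"NUL", GENERIC_READ | GENERIC_WRITE,
                              FILE_SHARE_READ | FILE_SHARE_WRITE, &sa,
                              OPEN_EXISTING, 0, NULL);

    STARTUPINFOW si = { sizeof(si) };
    si.dwFlags = STARTF_USESTDHANDLES;
    si.hStdInput = si.hStdOutput = si.hStdError = hNul;

    PROCESS_INFORMATION pi;
    wchar_t cmd[] = L"Foo-real.exe";           // hypothetical name of the real EXE
    if (CreateProcessW(NULL, cmd, NULL, NULL, TRUE, 0, NULL, NULL, &si, &pi))
    {
        WaitForSingleObject(pi.hProcess, INFINITE);  // keep the handles alive
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
    }
    CloseHandle(hNul);
    return 0;
}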

FFmpeg progress tracking in Visual C++

In my main process, I create an ffmpeg child process using CreateProcess(...).
I need to track the status of the conversion progress to update a progress bar. To do this, I read text from ffmpeg's output and extract the progress status from it.
I made a sample program like this:
SECURITY_ATTRIBUTES secattr;
ZeroMemory(&secattr, sizeof(secattr));
secattr.nLength = sizeof(secattr);
secattr.bInheritHandle = TRUE;                  // child must inherit the write end

HANDLE rPipe, wPipe;
CreatePipe(&rPipe, &wPipe, &secattr, 0);
SetHandleInformation(rPipe, HANDLE_FLAG_INHERIT, 0);  // keep the read end out of the child

STARTUPINFO sInfo;
ZeroMemory(&sInfo, sizeof(sInfo));
PROCESS_INFORMATION pInfo;
ZeroMemory(&pInfo, sizeof(pInfo));
sInfo.cb = sizeof(sInfo);
sInfo.dwFlags = STARTF_USESTDHANDLES;
sInfo.hStdInput = NULL;
sInfo.hStdOutput = wPipe;
sInfo.hStdError = wPipe;

// pStr contains the ffmpeg command line
CreateProcess(0, (LPTSTR)pStr, 0, 0, TRUE,
              NORMAL_PRIORITY_CLASS | CREATE_NO_WINDOW, 0, 0, &sInfo, &pInfo);
CloseHandle(wPipe);                             // parent no longer needs the write end

const DWORD bufsize = 100;
char buf[bufsize + 1];
DWORD reDword;
std::string result;
BOOL ok;
do
{
    memset(buf, 0, sizeof(buf));
    ok = ::ReadFile(rPipe, buf, bufsize, &reDword, 0);
    result += buf;
} while (ok);
But I couldn't get "result" updated interactively. My app hangs during the conversion, and the "result" string is updated only after ffmpeg's process finishes.
How can I have my main process and ffmpeg's run simultaneously, and interactively read from/write to the ffmpeg process's output/input?
Thanks for your time!
LRs
If ffmpeg just writes to stdout without explicitly flushing the output, then it may not get sent to the calling process until it exits.
Child processes that use such C run-time functions as printf() and
fprintf() can behave poorly when redirected. The C run-time functions
maintain separate IO buffers. When redirected, these buffers might not
be flushed immediately after each IO call. As a result, the output to
the redirection pipe of a printf() call or the input from a getch()
call is not flushed immediately and delays, sometimes-infinite delays
occur. This problem is avoided if the child process flushes the IO
buffers after each call to a C run-time IO function. Only the child
process can flush its C run-time IO buffers. A process can flush its C
run-time IO buffers by calling the fflush() function.
http://support.microsoft.com/kb/190351
In order to track the progress of your child process while it is running (and after its completion), you need to check the status of this child process.
After the process has been launched, check its status periodically using the following code.
pi is the PROCESS_INFORMATION:
PROCESS_INFORMATION pi;
and the code:
DWORD exitCode = 0;
BOOL success = GetExitCodeProcess(pi.hProcess, &exitCode);
exitCode will hold the value STILL_ACTIVE if the process is still running.
If the function succeeds, the return value of success is nonzero.
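For example, a simple polling loop might look like this (a sketch; the interval is arbitrary):
DWORD exitCode = STILL_ACTIVE;
while (GetExitCodeProcess(pi.hProcess, &exitCode) && exitCode == STILL_ACTIVE)
{
    // ... read progress output from the pipe and update the progress bar ...
    Sleep(500);   // poll every 500 ms (arbitrary)
}
// here the child has exited; exitCode holds its exit code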

Handling streamed data via pipes

A Win32 application (the "server") is sending a continuous stream of data over a named pipe. GetNamedPipeInfo() tells me that input and output buffer sizes are automatically allocated as needed. The pipe is operating in byte mode (although it is sending data units that are bigger than 1 byte (doubles, to be precise)).
Now, my question is this: Can I somehow verify that my application (the "client") is not missing any data when reading from the pipe? I know that those read/write operations are buffered, but I suppose the buffers will not grow indefinitely if the client doesn't fetch the data quickly enough. How do I know if I missed something? Does the server (or the pipe?) silently discard data that is not read in time by the client?
BTW, can I rely on proper alignment of the data the client reads using ReadFile()? As far as I understand, ReadFile() may return with fewer bytes read than specified, i.e. NumberOfBytesRead <= NumberOfBytesToRead. Do I have to check every time that NumberOfBytesRead is a multiple of sizeof(double)?
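To illustrate the short-read handling (a sketch, not the server's actual protocol): buffer incoming bytes and only consume complete doubles, copying them out with memcpy so alignment is never an issue:
#include <windows.h>
#include <string.h>

void ReadDoubles(HANDLE hPipe)
{
    unsigned char buf[4096];
    DWORD have = 0, got = 0;

    while (ReadFile(hPipe, buf + have, sizeof(buf) - have, &got, NULL) && got > 0)
    {
        have += got;
        DWORD whole = have - (have % sizeof(double));   // complete doubles only
        for (DWORD i = 0; i + sizeof(double) <= whole; i += sizeof(double))
        {
            double value;
            memcpy(&value, buf + i, sizeof(value));     // safe regardless of alignment
            /* ... process value ... */
        }
        memmove(buf, buf + whole, have - whole);        // keep the leftover tail bytes
        have -= whole;
    }
}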
The write operation will block if there is no more room in the pipe's buffers. This is from my (old) copy of the SDK manual:
When an application uses the WriteFile
function to write to a pipe, the write
operation may not finish if the pipe
buffer is full. The write operation is
completed when a read operation (using
the ReadFile function) makes more
buffer space available.
Sorry, didn't find out how to comment on your post, Neil.
The write operation will block if there is no more room in the pipe's buffers.
I just discovered that Sysinternals' FileMon can also monitor pipe operations. For testing purposes I connected the client to the named pipe and did no read operations, just waiting. The server writes a few hundred kB to the pipe every 4-5 seconds, even though nobody is fetching the data from the pipe on the client side. No blocking write operation... and so far no buffer-size limit seems to have been reached.
This is either a very big buffer... or the server does some magic in addition to just using WriteFile() and waiting for the client to read.
