Freepascal flushes stdout on every output under Windows?

Look at the following four programs. Build them with FreePascal under Windows, run each with output redirected to a file, and note how long it takes.
My results: all four programs run for approximately the same time (about 6 seconds), although the fourth produces 100 times more bytes of output. This means that the fourth program runs much faster per byte of output than the three others.
For the second program the reason for the slowness is obvious: the flush call. For the third program the reason is less obvious, but it is reasonable to assume that each writeln to stdout flushes the output buffer implicitly.
However, it is not clear why the first program is so much slower than the fourth. The fact that adding flush(output); (see program 2) does not change the timing much suggests that FPC flushes the output buffer after every write as well, which would explain all of the behaviour. This happens only when output goes to stdout, even redirected: if I explicitly write to a particular file using assign/rewrite, the program without flush runs much faster than the program with flush, as one would expect.
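(By the assign/rewrite variant I mean something along these lines, a sketch of program 1 where 'out.txt' is a placeholder file name:)
var
  f: text;
  i, j: integer;
begin
  assign(f, 'out.txt'); { placeholder name }
  rewrite(f);
  for j := 1 to 1000 do
    for i := 1 to 10 do
      write(f, '!');
  close(f);
end.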
Under Linux the running times are 0.01s, 0.65s, 0.01s and 0.30s (with the output being 100 times bigger). There flush() clearly slows the program down, so under Linux FPC does not seem to flush stdout each time.
I have tried to google whether FPC really flushes the stdout buffer on every output (be it write or writeln), but have found no information except the comment in the example program in the flush function documentation at http://www.freepascal.org/docs-html/rtl/system/flush.html, which mentions that a writeln to 'output' always causes a flush (as opposed to write). However, the example there does not produce the intended output either under Windows or under Linux. In fact, the output seems to be flushed after every write and writeln under Windows, redirected or not, and under Linux too when the output is not redirected. Under Linux with redirected output there seems to be no implicit flushing at all.
So, my questions are:
Is it true that FPC flushes the output buffer after every output statement, be it write or writeln, on Windows, whether the output is redirected to a file or not?
If yes, is there any way to turn this off (some compiler directive, or a workaround)? I still need to keep the output going to stdout, so that if I start the program without any redirection it will print to the console. (I understand that I might see the text appear at strange times as a result of buffering; that is not a problem.)
If not, why does the first program run much slower than the fourth?
My system is Windows XP with FPC 2.6.4 under VirtualBox on Kubuntu 14.04, and Kubuntu 14.04 itself with FPC 2.6.2. I have not had a chance to run it on a real Windows machine, but I have some reason to believe that the situation is the same there.
The programs:
{ Program 1: 10000 one-character write() calls }
var
  i, j: integer;
begin
  for j := 1 to 1000 do begin
    for i := 1 to 10 do
      write('!');
  end;
end.
{ Program 2: as program 1, but with an explicit flush after every write }
var
  i, j: integer;
begin
  for j := 1 to 1000 do begin
    for i := 1 to 10 do begin
      write('!');
      flush(output);
    end;
  end;
end.
{ Program 3: as program 1, but using writeln instead of write }
var
  i, j: integer;
begin
  for j := 1 to 1000 do begin
    for i := 1 to 10 do
      writeln('!');
  end;
end.
{ Program 4: 10000 writes of a 100-character string (100 times more output) }
var
  i, j: integer;
  s: string;
begin
  for j := 1 to 10000 do begin
    s := '';
    for i := 1 to 100 do
      s := s + '!';
    write(s);
  end;
end.

To prevent flushing of Stdout, insert the following code snippets into your program:
// TextRec is defined in SysUtils.
uses
  sysutils;
// ...
// Disable flushing after each write(ln) on Output; do this at the start of the program.
TextRec(Output).FlushFunc := nil;
But be aware that this means that writelns might not be completed before the program ends.
You can even accelerate the program further by increasing the output buffer of stdout:
// Define the buffer.
var
  buf: array[1..100000] of byte;
// ...
// Install the buffer; do this at the start of the program.
SetTextBuf(Output, buf, sizeof(buf));
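Putting both snippets together, a complete program could look like the following minimal sketch (the program name and the loop are placeholders; the explicit Flush at the end guards against losing buffered output, per the caveat above):
program NoFlushDemo;

uses
  sysutils;

var
  buf: array[1..100000] of byte;
  i: integer;

begin
  // Install the large buffer first, then disable the per-write flush.
  SetTextBuf(Output, buf, sizeof(buf));
  TextRec(Output).FlushFunc := nil;

  for i := 1 to 10000 do
    write('!');

  // Flush once explicitly so no buffered output is lost.
  Flush(Output);
end.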

Related

Virtual Pascal RunTime Error (or no output) if including Printer unit?

I'm using Virtual Pascal on Windows and have a weird problem whereby an attempt to Reset() any file that doesn't exist causes WriteLn() to CRT to fail. Example:
{$I+}
program TestIt;

uses
  Printer; { attempts to open LPT1 as Lst, but LPT1 doesn't exist on this system }

begin
  { Generates a RunTime Error with $I+ and no output with $I- }
  writeln('Test It');
end.
This also fails:
program TestIt;

var
  SomeFile: text;

begin
  {$I-}
  Assign(SomeFile, 'a:\filepath_not_exist');
  { Without $I- the Reset generates a RunTime Error as expected }
  Reset(SomeFile);
  {$I+}
  { Generates a RunTime Error with $I+ and no output with $I- }
  writeln('Test It');
end.
This works as expected:
{$I+}
program TestIt;

begin
  writeln('Test It');
end.
Any ideas why this may be happening and how to fix it? Is the source available for WriteLn() or Assign()? I'm able to change the Printer unit not to Reset() as a workaround until I need printing to work, but I don't think a failure to open a file should cause screen output to stop working.
Probably the error state is kept in some variable (the one queried by IOResult) and can cause an error to be raised in the next I/O operation. If you don't use automatic error checking ($I+), i.e. you compile with $I-, you must call IOResult after every operation.
Such an implementation is not ideal (errors from opening SomeFile spill into a writeln to a second file, output, as happens in the second example), but it happens, especially in older compilers and especially on targets that are less DOS-like.
Also, errors for non-existing drives are much harder to catch than those for non-existing files; keep those cases fundamentally apart.
You can test this theory by querying IOResult just before the failing operation, in the hope that this resets the error state.
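A minimal sketch of that test, applied to the second example (standard Turbo-style IOResult semantics assumed; untested under Virtual Pascal):
program TestIt;

var
  SomeFile: text;
  err: integer;

begin
  {$I-}
  Assign(SomeFile, 'a:\filepath_not_exist');
  Reset(SomeFile);  { fails and latches the error state }
  err := IOResult;  { reading IOResult should clear the latched error }
  {$I+}
  if err <> 0 then
    writeln('Reset failed with code ', err);
  writeln('Test It'); { should now work if the theory holds }
end.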
AFAIK VP has been dead for almost a decade now. What you see (read: download) is what you get.

Cygwin 64-bit C compiler caching funny (and ending early)

We've been using Cygwin (/usr/bin/x86_64-w64-mingw32-gcc) to generate Windows 64-bit executables, and it had been working fine through yesterday. Today it stopped working in a bizarre way: it "caches" standard output until the program ends. I wrote a six-line example that does the same thing. Since we use the code in batch, I wouldn't worry, except that when I run a test case, the now-strangely-caching executable opens the output files, ends early, and does not fill them with data. (The same code on Linux works fine, but these guys are using Windows.) I know it's not gorgeous code, but it demonstrates my problem, printing the numbers "1 2 3 4 5 6 7 8 9 10" only after I press a key.
#include <stdio.h>

int main(void)
{
    char q[256];
    int i;
    for (i = 1; i <= 10; i++)
        printf("%d ", i);
    gets(q);
    printf("\n");
    return 0;
}
Does anybody know enough Cygwin to help me out here? What do I try? (I don't know how to get version numbers; I did try.) I found a 64-bit cygwin1.dll in /cygdrive/c/cygwin64/bin and that didn't help a bit. The 32-bit gcc compilation works fine, but I need 64-bit to work. Any suggestions will be appreciated.
Edit: we found and corrected an unexpected error in the original code that caused the program not to populate the output files. At this point, the remaining problem is that Cygwin won't show the output of the program.
For months, the 64-bit executable properly generated the expected output, just as the 32-bit version did. Just today it started exhibiting the "caching" behavior described above. The program sends many hundreds of lines, with many newline characters, through stdout. Now, when the 64-bit executable is created as above, none of these lines are shown until the program completes, and then the entire output is printed at once. Can anybody provide insight into this problem?
This is quite normal. printf outputs to stdout, which is a FILE* and is normally line buffered when connected to a terminal. This means you will not see any output until you write a newline or until the internal buffer of the stdout FILE* is full (a common buffer size is 4096 bytes).
If you write to a file or pipe, output may instead be fully buffered, in which case output is flushed when the internal buffer is full, not when you write a newline.
In all cases the buffers of a FILE* are flushed when you call fflush(), when you call fclose(), or when the program ends normally.
Your program will behave the same on Windows/Cygwin as on Linux.
You can add a call to fflush(stdout) to see the output immediately:
for (i = 1; i <= 10; i++) {
    printf("%d ", i);
    fflush(stdout);
}
Also, do not use the gets() function.
If your real program "ends early" and does not write the data to the text files that it's supposed to, it may be crashing due to a bug of yours before it finishes, in which case the buffered output will not be flushed out. Or, less likely, you are calling the _exit() function, which terminates the program without flushing the FILE* buffers (in contrast to the exit() function).

Spawned process limit in MacOS X

I am running into an issue spawning a large number of processes (200) under MacOS X Mountain Lion (though I'm sure this issue is not version specific). What I am doing is launching processes (in my test, /bin/cat) which have their STDIN, STDOUT, and STDERR connected to pipes, the other end of which is held by the spawning (parent) process.
The parent process writes data into the STDIN of the processes; it is piped to the /bin/cat child processes, which in turn spit the data back out of STDOUT, where it is read by the parent process. /bin/cat is just used for testing.
I am actually using kqueue to be notified when there is space available in the STDIN pipe. When kqueue notifies you with an EVFILT_WRITE event that space is available, the event also tells you exactly how many bytes can be written without blocking.
This all works well, and I can easily spawn 100 child (/bin/cat) processes, with everything flowing through the pipes without blocking (all day long). However, when I crank the number of processes up to 200, everything grinds to a halt when the single kqueue service thread blocks on a write() call to one of the STDIN pipes. The event says that there are 16384 bytes available (basically an empty pipe), but when the program tries to write exactly 16384 bytes into the pipe, the write() blocks anyway.
Initially I thought I was running into a max. open files issue, but I've jacked up the ulimit for open files to 8192, so that is not the issue. What I have discovered from some googling is that on OS X, STDIN/STDOUT/STDERR are not in fact "files" (or "pipes") but are actually devices. When the process is hung, running lsof on the command-line also hangs with a warning about not being able to stat() the file system:
lsof: WARNING: can't stat() hfs file system /
Output information may be incomplete.
assuming "dev=1000008" from mount table
As soon as I kill the process, lsof completes. This reinforces the notion that STDIN/OUT/ERR are in fact devices and I'm running into some kind of limit.
Does anyone have an idea of what limit I am running into? For example, is there a limit on the number of "devices" that can be created? Can this be increased somehow?
Just to answer my own question here: this appears to be related to how MacOS X dynamically expands a pipe from 16K to 32K to 64K (and so on). My basic workaround was to prevent the pipe from expanding, since it appears that whenever you fill the pipe completely the OS expands it. So, when kqueue signals that I can write into the pipe and indicates that I have 16384 bytes available, I simply write 16384 - 1 bytes: whatever it tells me is available, I write at most (available - 1) bytes. This prevents the pipe from expanding, and it prevents my code from encountering the condition where a write() to the pipe would block (even though the pipe is non-blocking).

Pascal WriteLn failing

I'm working on a small project in Pascal for school.
I'm using Lazarus 1.0.2.
I have a problem with the WriteLn function when writing to a file.
After some time it just stops writing to the file.
Take, for example, this program:
var
  oFile: Text;
  i: LongWord;
begin
  Assign(oFile, 'test.txt');
  ReWrite(oFile);
  for i := 1 to 4096 do
    WriteLn(oFile, 'ThisIsTest');
  CloseFile(oFile); // Added as suggested
end.
This is the output:
...
4072 ThisIsTest
4073 ThisIsTest
4074 ThisIsTest
4075 ThisIsTe
As you can see, it just stops in the middle of a line and does not write everything.
It all depends on how long a single WriteLn instruction is and how many times it is called.
How do I fix it?
I tried to use the WinAPI function WriteFile from the "Windows" unit, but I failed to pass the last 3 arguments to it.
BIG UPDATE
Thanks. That works (closing the file) in that example. But I have a slightly more complex program where I'm passing an opened file handle to functions that write to it via "var" parameters, and even closing that file at the end does nothing. It is strange.
You should Close(oFile) at the end of your program to be sure the output is flushed.
It's also possible to update a file without closing it by adding (in this example)
Flush(oFile);
after a Writeln
This is useful where you might have a long file and want to make sure it's updated regularly. Of course, you should still close the file when finished.
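Putting both suggestions into the example from the question (the Flush inside the loop is optional and only needed if you want the file updated while it is being written):
var
  oFile: Text;
  i: LongWord;
begin
  Assign(oFile, 'test.txt');
  ReWrite(oFile);
  for i := 1 to 4096 do
  begin
    WriteLn(oFile, 'ThisIsTest');
    Flush(oFile); { optional: pushes each line out immediately }
  end;
  Close(oFile);   { flushes the remaining buffer; without it the tail of the file is lost }
end.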

Pixel modifying code runs quick in main app, really slow in Delphi 6 DirectShow filter with other problems

I have a Delphi 6 application that sends bitmaps to a DirectShow DLL in real-time, 25 frames a second. The DirectShow DLL is my code too and is also written in Delphi 6 using the DSPACK DirectShow component suite. I have a simple block of code that goes through each pixel in the bitmap modifying the brightness and contrast of the image if a certain flag is set; otherwise the bitmap is pushed out of the DirectShow DLL unmodified (it is a push source video filter). The code used to be in the main application and then I just moved it into the DirectShow DLL. When it was in the main application it ran fine. I could see the changes in the bitmap as expected. However, now that the code resides in the DirectShow DLL it has the following problems:
When the code block below is active the DirectShow DLL is really slow. I have a quad core i5 and it's really slow. I can also see a big spike in the CPU consumption. In contrast, the very same code running in the main application ran fine on an old single core P4. It did hit the CPU noticeably on that old machine but the video was smooth and there were no problems. The images are only 352 x 288 pixels in size.
I don't see the expected changes to the visible bitmap. I can trace the code in the DirectShow DLL and see the numerical values of each pixel properly altered by the code, but the viewable image in the Graph Edit ActiveMovie window looks completely unchanged.
If I deactivate the code, which I can do in real-time, the ActiveMovie window shows video that is as smooth as glass, perfectly rendered with the CPU barely touched. If I reactivate the code the video is now really choppy, probably showing only 1 to 2 frames a second with a long delay before the first frame is shown, and the CPU spikes. Not completely, but a lot more than I would expect.
I tried compiling the DirectShow DLL with everything on including range checking, overflow checking, etc. and there were no warnings or errors during run-time. I then tried compiling for fastest speed and it still had the exact same problems listed above. Something is really wrong and I can't figure out what. Note, I do indeed lock the canvas before modifying the bitmap and unlock it after I'm done. If it weren't for the "everything on" compilation run I noted above I'd say it felt like an FPU Exception was being raised and silently swallowed with every pixel computation, but as I said, no errors or Exceptions are occurring.
UPDATE: I am putting this here so that the solution, which is embedded in one of Roman R's comments, is plainly visible. The problem was that I was not setting the PixelFormat property to pf24bit before accessing the ScanLine property. As Roman suggested, not doing this must make the TBitmap code create a temporary copy of the bitmap. As soon as I added the line of code below, the problems went away, both the changes not being visible and the soft page faults. It's an insidious problem because the only object affected is the pointer you use to access the ScanLine property, since (assumption) it points into a temporary copy of the bitmap. That must be why the subsequent TextOut() call still worked: it operated on the original copy of the bitmap.
clip.PixelFormat := pf24bit; // The missing code line that fixes the problem.
Here's the code block I've been referring to:
function IntToByte(i: Integer): Byte;
begin
  if i > 255 then
    Result := 255
  else if i < 0 then
    Result := 0
  else
    Result := i;
end;

// ---------------------------------------------------------------

procedure brightnessTurboBoost(var clip: TBitmap; rangeExpansionPowerOf2: integer; shiftValue: Byte);
var
  p0: PByte;
  x, y: Integer;
begin
  if (rangeExpansionPowerOf2 = 0) and (shiftValue = 0) then
    exit; // These parameter settings will not change the pixel values.
  for y := 0 to clip.Height - 1 do
  begin
    p0 := clip.ScanLine[y];
    // Can't just do the whole buffer as a big block of bytes since the
    // individual scan lines may be padded for CPU alignment.
    for x := 0 to (clip.Width - 1) * 3 do
    begin
      if rangeExpansionPowerOf2 >= 1 then
        p0^ := IntToByte((p0^ shl rangeExpansionPowerOf2) + shiftValue)
      else
        p0^ := IntToByte(p0^ + shiftValue);
      Inc(p0);
    end;
  end;
end;
There are a few things to say about this code snippet.
First of all, you are using the ScanLine property of the TBitmap class. I have not been dealing with Delphi for many years, so I might be wrong about this, but I am under the impression that ScanLine is not actually a thin accessor. It might internally hide things which can dramatically affect performance, such as "if he wants to access the bits of the image, then we have to convert it to a DIB first before returning pointers". So a thing that looks so simple might turn out to be a killer.
"if rangeExpansionPowerOf2 >= 1 then" in the inner loop body? You don't really want to evaluate that comparison on every byte. Either make two separate functions, or duplicate the whole loop in two versions, one for zero and one for non-zero rangeExpansionPowerOf2, and do the if only once (see the sketch below).
"for ... to (clip.Width - 1) * 3 do": I am not really sure that Delphi optimizes the upper-boundary evaluation so that it happens only once. You might be performing that multiplication three times for every pixel, while you could do it just once for the whole image.
For top performance, IntToByte would be implemented with MMX to avoid the ifs and process multiple bytes at once.
Still, as you say the images are only 352x288, I would suspect that #1 is what is ruining the performance.
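A sketch of the second and third points applied to the original procedure (the name brightnessTurboBoost2 is made up, the pixel math is unchanged, and the PixelFormat line reflects the fix from the UPDATE above):
procedure brightnessTurboBoost2(var clip: TBitmap; rangeExpansionPowerOf2: integer; shiftValue: Byte);
var
  p0: PByte;
  x, y, lastX: Integer;
begin
  if (rangeExpansionPowerOf2 = 0) and (shiftValue = 0) then
    exit; // These parameter settings will not change the pixel values.
  clip.PixelFormat := pf24bit;   // avoid the hidden temporary copy (see the UPDATE above)
  lastX := (clip.Width - 1) * 3; // evaluate the loop bound once per call, not per iteration
  for y := 0 to clip.Height - 1 do
  begin
    p0 := clip.ScanLine[y];
    // The rangeExpansionPowerOf2 test is now taken once per row instead of once per byte.
    if rangeExpansionPowerOf2 >= 1 then
      for x := 0 to lastX do
      begin
        p0^ := IntToByte((p0^ shl rangeExpansionPowerOf2) + shiftValue);
        Inc(p0);
      end
    else
      for x := 0 to lastX do
      begin
        p0^ := IntToByte(p0^ + shiftValue);
        Inc(p0);
      end;
  end;
end;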
