Delphi optimisation : constant loop - performance

I just noticed something quite interesting in a program I'm writing. I have a simple procedure that populates a TStringlist with objects of type x.
I added a breakpoint while tracing an issue and stumbled across this. I was hoping someone might be able to explain why it is happening, or link to a relevant document, as I couldn't find anything.
My loop goes from 0 to 11. The loop counter was initialised by the loop itself (for nPtr := 0), but when the program was run the nPtr var went from 12 down to 1. I then also initialised the var before the loop, as shown in the code snippet, but the same thing happened. The variable is used nowhere else in the unit.
I asked one of the guys I worked with who said it was due to Delphi optimisation but I'd like to know why and how it decides which loop should be affected.
Thanks for any help.
Code:
procedure TUnit.ProcedureName;
var
nPtr : Integer;
obj : TObject;
begin
nPtr:=0;//added later
for nPtr := 0 to 11 do
begin
obj := TObject.Create(Self);
slPeriodList.AddObject('X', obj);
end;
end;

The optimization is only possible if the loop body does not refer to the loop variable. In that case, if the lower bound of the loop is zero, then the compiler will reverse the loop.
If the loop variable is never referenced by the loop body then the compiler is justified in implementing the loop however it pleases. All it is required to do is execute the loop body as many times as is mandated by the loop bounds. Indeed, the compiler would be perfectly justified in optimizing away the loop variable.
Consider this program:
{$APPTYPE CONSOLE}
procedure Test1;
var
i: Integer;
begin
for i := 0 to 11 do
Writeln(0);
end;
procedure Test2;
var
i: Integer;
begin
for i := 0 to 11 do
Writeln(i);
end;
begin
Test1;
Test2;
end.
The body of Test1 is compiled to this code by XE7, 32 bit Windows compiler, with release options:
Project1.dpr.9: for i := 0 to 11 do
00405249 BB0C000000 mov ebx,$0000000c
Project1.dpr.10: Writeln(0);
0040524E A114784000 mov eax,[$00407814]
00405253 33D2 xor edx,edx
00405255 E8FAE4FFFF call @Write0Long
0040525A E8D5E7FFFF call @WriteLn
0040525F E800DBFFFF call @_IOTest
Project1.dpr.9: for i := 0 to 11 do
00405264 4B dec ebx
00405265 75E7 jnz $0040524e
The compiler is running the loop downwards, as can be seen by the use of dec. Notice that the test for loop termination is performed with jnz with no need for a cmp. That is because dec performs an implicit compare against zero.
The documentation for dec says the following:
Flags Affected
The CF flag is not affected. The OF, SF, ZF, AF, and PF flags are set
according to the result.
The ZF flag is set if and only if the result of the dec instruction is zero. And the ZF is what determines whether or not jnz branches.
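To make the mov/dec/jnz shape easier to follow, here is a minimal source-level sketch of the same idea. RunCountingDown and RunCountingUp are hypothetical names invented for this illustration, not anything the compiler emits; the point is only that a body which never reads the counter runs the same number of times either way.

```pascal
program ReversedLoopSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

{ Hypothetical helper: runs a stand-in loop body Count times by counting
  DOWN to zero, mirroring the mov/dec/jnz shape. Count must be >= 1. }
function RunCountingDown(Count: Integer): Integer;
var
  Remaining: Integer;
begin
  Result := 0;
  Remaining := Count;      { mov ebx, Count }
  repeat
    Inc(Result);           { stands in for the loop body }
    Dec(Remaining);        { dec ebx: ZF is set when Remaining reaches 0 }
  until Remaining = 0;     { jnz: go round again while ZF is clear }
end;

{ The loop as written in the source, for comparison. }
function RunCountingUp(Count: Integer): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to Count - 1 do
    Inc(Result);
end;

begin
  WriteLn(RunCountingDown(12));  { prints 12 }
  WriteLn(RunCountingUp(12));    { prints 12 }
end.
```

Both functions report 12 iterations, which is all the language requires when the body does not use the loop variable.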
The code emitted for Test2 is:
Project1.dpr.17: for i := 0 to 11 do
0040526D 33DB xor ebx,ebx
Project1.dpr.18: Writeln(i);
0040526F A114784000 mov eax,[$00407814]
00405274 8BD3 mov edx,ebx
00405276 E8D9E4FFFF call @Write0Long
0040527B E8B4E7FFFF call @WriteLn
00405280 E8DFDAFFFF call @_IOTest
00405285 43 inc ebx
Project1.dpr.17: for i := 0 to 11 do
00405286 83FB0C cmp ebx,$0c
00405289 75E4 jnz $0040526f
Note that the loop variable is increasing, and we now have an extra cmp instruction, executed on every loop iteration.
It is perhaps interesting to note that the 64 bit Windows compiler does not include this optimization. For Test1 it produces this:
Project1.dpr.9: for i := 0 to 11 do
00000000004083A5 4833DB xor rbx,rbx
Project1.dpr.10: Writeln(0);
00000000004083A8 488B0D01220000 mov rcx,[rel $00002201]
00000000004083AF 4833D2 xor rdx,rdx
00000000004083B2 E839C3FFFF call @Write0Long
00000000004083B7 4889C1 mov rcx,rax
00000000004083BA E851C7FFFF call @WriteLn
00000000004083BF E86CB4FFFF call @_IOTest
00000000004083C4 83C301 add ebx,$01
Project1.dpr.9: for i := 0 to 11 do
00000000004083C7 83FB0C cmp ebx,$0c
00000000004083CA 75DC jnz Test1 + $8
I'm not sure why this optimization has not been implemented in the 64 bit compiler. My guess would be that the optimization has negligible effect in real world cases and the designers chose not to expend effort implementing it for the 64 bit compiler.

Related

There is an invalid floating point operation, but where?

I'm currently writing code that takes a given number and prints all the prime numbers that fit the format 4n+1. This is what I have so far. The problem is that it gives me runtime error 207, which I think means invalid floating point operation, but I can't see how it ended up doing an invalid floating point operation. The only place the code should be dealing with negative numbers is the line "if num-(iter*iter)>0 then".
program TwoSquares;
var
num, numSqrt, iter, bigSqr,smallSqr: integer;
begin
num:=29;
while num>4 do
begin
numSqrt:=trunc(sqrt(num));
for iter:=2 to numSqrt do
begin
if num mod iter = 0 then
num:=num - 1;
continue;
end;
if (num-1) mod 4 = 0 then
begin
iter:=(num-1) div 4;
while iter>0 do
begin
if num-(iter*iter)>0 then
bigSqr:=iter;
break;
iter:=iter-1;
end;
smallSqr:=trunc(sqrt(num-(iter*iter)));
writeln(num,' ', smallSqr,' ',bigSqr);
num:=num - 1;
end;
end;
end.
Your check is not directly before the place where its result is used. Think: is the break; statement after that num-(iter*iter) if-then check the only way that while loop can terminate?
Try to single step your program; this can also verify whether the block structure works as you think. The formatting doesn't seem very consistent, with begin..end in some places and only indentation in others.
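As a rough illustration of the block-structure point, the sketch below isolates the inner while loop from the question. The function names are made up for this example. Without begin..end, only the statement directly after then belongs to the if, so break runs on the first pass regardless of the test. With num = 29 the first candidate is iter = 7, and 29 - 49 is negative, which is the kind of value that would then reach sqrt.

```pascal
program BlockStructureSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

{ Same shape as the question: indentation suggests a block, but only the
  assignment is guarded by the if. Returns 0 when nothing was found. }
function BigSquareRootBroken(num: Integer): Integer;
var
  iter: Integer;
begin
  Result := 0;
  iter := (num - 1) div 4;
  while iter > 0 do
  begin
    if num - iter * iter > 0 then
      Result := iter;
    break;              { NOT part of the if: always leaves on the first pass }
    iter := iter - 1;   { never reached }
  end;
end;

{ One possible repair: group the guarded statements with begin..end. }
function BigSquareRootFixed(num: Integer): Integer;
var
  iter: Integer;
begin
  Result := 0;
  iter := (num - 1) div 4;
  while iter > 0 do
  begin
    if num - iter * iter > 0 then
    begin
      Result := iter;
      break;
    end;
    iter := iter - 1;
  end;
end;

begin
  WriteLn(BigSquareRootBroken(29));  { prints 0: the loop exits before finding anything }
  WriteLn(BigSquareRootFixed(29));   { prints 5, since 29 - 25 = 4 > 0 }
end.
```

Some compilers may warn about the unreachable statement in the broken version, which is itself a useful hint.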

how to fix exitcode 201?

I have a task to write a program in Pascal. When I run the program, the result was exitcode 201.
I don't know how to fix this error.
program convertTime;
uses crt;
Type
Jam = record
hh:integer ;
mm:integer ;
ss:integer;
end;
var
J : Jam;
P,totaldetik,sisa : integer;
begin
J.hh:= 16;
J.mm:= 10;
J.ss:= 34;
write('masukkan waktu(menit): ');read(p);
totaldetik:= (J.hh*3600) + (J.mm*60) + J.ss + (p*60);
J.hh:= totaldetik div 3600;
sisa:= totaldetik mod 3600 ;
J.mm:= sisa div 60;
J.ss:= sisa mod 60;
writeln('total the time: ',J.hh,' Hour ',J.mm,' Minute ',J.ss,' second');
readln;
end.
As seen in other questions, error code 201 is a range check error. Put simply, a value is being stored where it doesn't fit.
If, as in the linked question, you're using the Free Pascal Compiler, integer variables are 16-bit values in its default mode, so they can't go higher than 32,767.
Your totaldetik variable looks like it would often be higher than the limit for an integer value, so you'll need a larger variable to store it in. Try making totaldetik a longint instead.
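A minimal sketch of that change, assuming the same start time as in the question (16:10:34) and an input of 10 minutes; TotalSeconds is a hypothetical helper introduced only for this illustration.

```pascal
program AddMinutesSketch;

{ LongInt is 32 bits wide, so the total comfortably fits. }
function TotalSeconds(hh, mm, ss, extraMinutes: LongInt): LongInt;
begin
  TotalSeconds := hh * 3600 + mm * 60 + ss + extraMinutes * 60;
end;

var
  total, rest: LongInt;
begin
  total := TotalSeconds(16, 10, 34, 10);  { 58834, above the 16-bit limit of 32767 }
  rest := total mod 3600;
  WriteLn(total div 3600, ':', rest div 60, ':', rest mod 60);  { prints 16:20:34 }
end.
```

Only the variables that hold the total need to be wider; the hour, minute and second fields stay well inside the 16-bit range.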

Delphi - Obtain Full Stack Trace on OSX

I have an application which can log a stacktrace, which can be later used for debugging.
On Windows, I've gotten by using the excellent JCLDebug unit provided by the JEDI project.
Now that my application is running on OSX, I've hit a bit of a hitch - I don't know how to obtain the correct stacktrace when an exception occurs.
I have got the basics down -
1) I can get a stacktrace using 'backtrace' (found in libSystem.dylib)
2) The resulting backtrace can be converted into line numbers using the .map file provided by Delphi's linker
The issue I'm left with is - I don't know where to call backtrace from. I know that Delphi uses Mach exceptions (on a separate thread), and that I cannot use posix signals, but that's all that I've managed to sort out.
I can get a backtrace in the 'try...except' block, but unfortunately, by that point the stack has already wound down.
How can I install a proper exception logger which will run right after the exception occurs?
Update:
As per 'Honza R's suggestion, I've taken a look at the 'GetExceptionStackInfoProc' procedure.
This function does get me 'inside' of the exception handling process, but unfortunately leaves me with some of the same issues I had previously.
First of all - on desktop platforms, this function 'GetExceptionStackInfoProc' is just a function pointer, which you can assign with your own exception info handler. So out of the box, Delphi doesn't provide any stack information provider.
If I assign a function to 'GetExceptionStackInfoProc' and then run a 'backtrace' inside of it, I receive a stacktrace, but that trace is relative to the exception handler, not the thread which caused the exception.
'GetExceptionStackInfoProc' does contain a pointer to a 'TExceptionRecord', but there's very limited documentation available on this.
I might be going beyond my depth, but how can I get a stacktrace from the correct thread? Would it be possible for me to inject my own 'backtrace' function into the exception handler and then return to the standard exception handler from there?
Update 2
Some more details. One thing to clear up - this question is about exceptions that are handled by MACH messages, not software exceptions that are handled entirely within the RTL.
Embarcadero has laid out some comments along with these functions -
System.Internal.MachExceptions.pas -> catch_exception_raise_state_identity
{
Now we set up the thread state for the faulting thread so that when we
return, control will be passed to the exception dispatcher on that thread,
and this POSIX thread will continue watching for Mach exception messages.
See the documentation at <code>DispatchMachException()</code> for more
detail on the parameters loaded in EAX, EDX, and ECX.
}
System.Internal.ExcUtils.pas -> SignalConverter
{
Here's the tricky part. We arrived here directly by virtue of our
signal handler tweaking the execution context with our address. That
means there's no return address on the stack. The unwinder needs to
have a return address so that it can unwind past this function when
we raise the Delphi exception. We will use the faulting instruction
pointer as a fake return address. Because of the fencepost conditions
in the Delphi unwinder, we need to have an address that is strictly
greater than the actual faulting instruction, so we increment that
address by one. This may be in the middle of an instruction, but we
don't care, because we will never be returning to that address.
Finally, the way that we get this address onto the stack is important.
The compiler will generate unwind information for SignalConverter that
will attempt to undo any stack modifications that are made by this
function when unwinding past it. In this particular case, we don't want
that to happen, so we use some assembly language tricks to get around
the compiler noticing the stack modification.
}
Which seem to be responsible for the issue I'm having.
When I do a stacktrace after this exception system has handed control over to the RTL, it looks like this - (bearing in mind, the stack unwinder has been superseded by a backtrace routine. The backtrace will hand control over to the unwinder once it is completed)
0: MyExceptionBacktracer
1: initunwinder in System.pas
2: RaiseSignalException in System.Internal.ExcUtils.pas
Since RaiseSignalException is called by SignalConverter, I'm led to believe that the backtrace function provided by libc is not compatible with the modifications made to the stack. So, it's incapable of reading the stack beyond that point, but the stack is still present underneath.
Does anyone know what to do about that (or whether my hypothesis is correct)?
Update 3
I've finally managed to get proper stacktraces on OSX. Huge thanks to both Honza and Sebastian. By combining both of their techniques, I found something that works.
For anyone else who could benefit from this, here's the basic source. Bear in mind that I'm not quite sure if it's 100% correct; if you can suggest improvements, go ahead. This technique hooks onto an exception right before Delphi unwinds the stack on the faulting thread, and compensates for any stack frame corruption that might have taken place beforehand.
unit MyExceptionHandler;
interface
implementation
uses
SysUtils;
var
PrevRaiseException: function(Exc: Pointer): LongBool; cdecl;
function backtrace2(base : NativeUInt; buffer : PPointer; size : Integer) : Integer;
var SPMin : NativeUInt;
begin
SPMin:=base;
Result:=0;
while (size > 0) and (base >= SPMin) and (base <> 0) do begin
buffer^:=PPointer(base + 4)^;
base:=PNativeInt(base)^;
//uncomment to test stacktrace
//WriteLn(inttohex(NativeUInt(buffer^), 8));
Inc(Result);
Inc(buffer);
Dec(size);
end;
if (size > 0) then buffer^:=nil;
end;
procedure UnInstallExceptionHandler; forward;
var
InRaiseException: Boolean;
function RaiseException(Exc: Pointer): LongBool; cdecl;
var b : NativeUInt;
c : Integer;
buff : array[0..7] of Pointer;
begin
InRaiseException := True;
asm
mov b, ebp
end;
c:=backtrace2(b - $4 {this is the compiler dependent value}, @buff, Length(buff));
//... do whatever you want to do with the stacktrace
Result := PrevRaiseException(Exc);
InRaiseException := False;
end;
procedure InstallExceptionHandler;
var
U: TUnwinder;
begin
GetUnwinder(U);
Assert(Assigned(U.RaiseException));
PrevRaiseException := U.RaiseException;
U.RaiseException := RaiseException;
SetUnwinder(U);
end;
procedure UnInstallExceptionHandler;
var
U: TUnwinder;
begin
GetUnwinder(U);
U.RaiseException := PrevRaiseException;
SetUnwinder(U);
end;
initialization
InstallExceptionHandler;
end.
You can use GetExceptionStackInfoProc, CleanUpStackInfoProc and GetStackInfoStringProc in the Exception class. You need to save the stack trace in GetExceptionStackInfoProc and then retrieve it with GetStackInfoStringProc, which will get called by the RTL if you use the StackTrace property of the Exception. You could also take a look at https://bitbucket.org/shadow_cs/delphi-arm-backtrace, which demonstrates this on Android.
To do this properly on Mac OS X, the libc backtrace function cannot be used, because Delphi will corrupt the stack frame when calling GetExceptionStackInfoProc from Exception.RaisingException. You must use your own implementation, one that is capable of walking the stack from a different base address which can be corrected by hand.
Your GetExceptionStackInfoProc would then look like this (I used XE5 for this example; the offset applied to EBP below may differ depending on which compiler you use. This example was only tested on Mac OS X; a Windows implementation may or may not differ):
var b : NativeUInt;
c : Integer;
buff : array[0..7] of Pointer;
begin
asm
mov b, ebp
end;
c:=backtrace2(b - $14 {this is the compiler dependent value}, @buff, Length(buff));
//... do whatever you want to do with the stacktrace
end;
And the backtrace2 function would look like this (note that stop conditions and other validations are missing in the implementation to ensure that AVs are not caused during stack walking):
function backtrace2(base : NativeUInt; buffer : PPointer; size : Integer) : Integer;
var SPMin : NativeUInt;
begin
SPMin:=base;
Result:=0;
while (size > 0) and (base >= SPMin) and (base <> 0) do begin
buffer^:=PPointer(base + 4)^;
base:=PNativeInt(base)^;
Inc(Result);
Inc(buffer);
Dec(size);
end;
if (size > 0) then buffer^:=nil;
end;
You could hook yourself into the Exception Unwinder. Then you can call backtrace where the exception happens. Here's an example. The unit SBMapFiles is what I use for reading the mapfiles. It is not required to get the exception call stack.
unit MyExceptionHandler;
interface
implementation
uses
Posix.Base, SysUtils, SBMapFiles;
function backtrace(result: PNativeUInt; size: Integer): Integer; cdecl; external libc name '_backtrace';
function _NSGetExecutablePath(buf: PAnsiChar; BufSize: PCardinal): Integer; cdecl; external libc name '__NSGetExecutablePath';
var
PrevRaiseException: function(Exc: Pointer): LongBool; cdecl;
MapFile: TSBMapFile;
const
MaxDepth = 20;
SkipFrames = 3;
procedure ShowCurrentStack;
var
StackLog: PNativeUInt; //array[0..10] of Pointer;
Cnt: Integer;
I: Integer;
begin
{$POINTERMATH ON}
GetMem(StackLog, SizeOf(Pointer) * MaxDepth);
try
Cnt := backtrace(StackLog, MaxDepth);
for I := SkipFrames to Cnt - 1 do
begin
if StackLog[I] = $BE00EF00 then
begin
WriteLn('---');
Break;
end;
WriteLn(IntToHex(StackLog[I], 8), ' ', MapFile.GetFunctionName(StackLog[I]));
end;
finally
FreeMem(StackLog);
end;
{$POINTERMATH OFF}
end;
procedure InstallExceptionHandler; forward;
procedure UnInstallExceptionHandler; forward;
var
InRaiseException: Boolean;
function RaiseException(Exc: Pointer): LongBool; cdecl;
begin
InRaiseException := True;
ShowCurrentStack;
Result := PrevRaiseException(Exc);
InRaiseException := False;
end;
procedure InstallExceptionHandler;
var
U: TUnwinder;
begin
GetUnwinder(U);
Assert(Assigned(U.RaiseException));
PrevRaiseException := U.RaiseException;
U.RaiseException := RaiseException;
SetUnwinder(U);
end;
procedure UnInstallExceptionHandler;
var
U: TUnwinder;
begin
GetUnwinder(U);
U.RaiseException := PrevRaiseException;
SetUnwinder(U);
end;
procedure LoadMapFile;
var
FileName: array[0..255] of AnsiChar;
Len: Integer;
begin
if MapFile = nil then
begin
MapFile := TSBMapFile.Create;
Len := Length(FileName);
_NSGetExecutablePath(@FileName[0], @Len);
if FileExists(ChangeFileExt(FileName, '.map')) then
MapFile.LoadFromFile(ChangeFileExt(FileName, '.map'));
end;
end;
initialization
LoadMapFile;
InstallExceptionHandler;
end.

What tips are there for rewriting stream code so it doesn't use any units?

I am trying to port some xor-encryption code so it doesn't use any other units. I want to use just the commands, variables, and types that are supported natively by the compiler.
For example, here's some of the original code:
[...]
while (StreamIn.Position < StreamIn.Size) and
((StreamIn.Size -StreamIn.Position) >= szBuffer) do begin
(* read 4 bytes at a time into a local integer variable *)
StreamIn.ReadBuffer(buffer, szBuffer);
(* the XOR encryption/decryption *)
buffer := buffer xor theKey;
buffer := buffer xor $E0F;
(* write data to output stream *)
StreamOut.WriteBuffer(buffer, szBuffer);
end;
[...]
This is my code:
function __NativeEncrypt (const Key, Source : String) : String;
// this function should not be used directly
// use EncryptText and DecryptText
const
szBuffer = SizeOf(Integer); (* 4 bytes *)
szByteBuffer = SizeOf(Byte); (* 1 byte *)
var
byteBuffer,
buffer,
index,
theKey: Integer;
StreamIn : String;
StreamOut : String;
i : Integer;
begin
theKey := hashKey(Key);
StreamIn := Source;
StreamOut := '';
for i := 1 to Length (StreamIn) do begin
buffer := Integer(StreamIn[i]);
buffer := buffer xor thekey;
buffer := buffer xor $E0F;
StreamOut := StreamOut + char(Buffer);
end;
result := StreamOut; // wrong results.
// to continue...
end;
What tips are there for this task?
The only reason not to use library-provided units is as a learning exercise. I see no other reason to intentionally cripple yourself by refusing to use built-in features of your tools. Any answer to your general request for tips would rob you of the learning experience.
Most developers end up rewriting something from scratch at some point in their careers. However, unless it was imposed by a supervisor who suffers from extreme not-invented-here syndrome, it's nearly always a personal experience. You won't profit from their experience the same way you will from doing the work yourself. Doing it yourself will give you an understanding of what jobs the built-in tools do, and may give you some insight into why they're designed the way they are. Although you might be able to get those explanations from other people, unless you've actually tried to do it yourself, you won't really appreciate the explanations anyway.
My tip to you is to proceed with your project. I hope you find it interesting, and I wish you luck. If you eventually find yourself unable to make further progress, then identify the specific problem you're stuck on, and then ask others for help with that roadblock.
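For readers comparing the two snippets in the question: the stream version XORs four bytes at a time with the full key, while the string version XORs one character at a time and then keeps only the low bits when casting back to char, which would explain output that differs from the stream version. The sketch below shows one way to process complete 4-byte blocks using only built-in routines. XorBlocks, RoundTrips and TByteBuf are hypothetical names, and the key type (LongWord) is an assumption of this sketch.

```pascal
program XorBlocksSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

type
  TByteBuf = array of Byte;

{ Hypothetical helper: XORs every complete 4-byte block of Data in place.
  Any trailing 1..3 bytes are left untouched in this sketch. }
procedure XorBlocks(var Data: TByteBuf; Key: LongWord);
var
  Offset: Integer;
  Block: LongWord;
begin
  Offset := 0;
  while Length(Data) - Offset >= SizeOf(Block) do
  begin
    Move(Data[Offset], Block, SizeOf(Block));   { read 4 bytes }
    Block := (Block xor Key) xor $E0F;          { same two XOR steps as the question }
    Move(Block, Data[Offset], SizeOf(Block));   { write them back }
    Inc(Offset, SizeOf(Block));
  end;
end;

{ Applying the same XOR twice should restore the original bytes. }
function RoundTrips(Key: LongWord): Boolean;
var
  Buf: TByteBuf;
  i: Integer;
begin
  SetLength(Buf, 6);
  for i := 0 to High(Buf) do
    Buf[i] := 65 + i;
  XorBlocks(Buf, Key);
  XorBlocks(Buf, Key);
  Result := True;
  for i := 0 to High(Buf) do
    if Buf[i] <> 65 + i then
      Result := False;
end;

begin
  WriteLn(RoundTrips($12345678));  { prints TRUE }
end.
```

Move lives in the System unit, so no uses clause is needed. How the leftover bytes should be handled depends on the elided parts of the original code.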

Looking for second opinion on the validity of findings drawn from this simple localized performance test under any other different setting

My setting:
OS: Windows 7 SP1 (32-bit)
RAM: 4 GB
Processor: Intel Pentium D 3.00 GHz
Delphi XE
My simple test:
I performed a test running the following program:
program TestAssign;
{$APPTYPE CONSOLE}
uses
SysUtils,
Diagnostics;
type
TTestClazz = class
private
FIntProp: Integer;
FStringProp: string;
protected
procedure SetIntProp(const Value: Integer);
procedure SetStringProp(const Value: string);
public
property IntProp: Integer read FIntProp write SetIntProp;
property StringProp: string read FStringProp write SetStringProp;
end;
{ TTestClazz }
procedure TTestClazz.SetIntProp(const Value: Integer);
begin
if FIntProp <> Value then
FIntProp := Value;
end;
procedure TTestClazz.SetStringProp(const Value: string);
begin
if FStringProp <> Value then
FStringProp := Value;
end;
var
i, j: Integer;
stopw1, stopw2 : TStopwatch;
TestObj: TTestClazz;
begin
ReportMemoryLeaksOnShutdown := True;
//
try
TestObj := TTestClazz.Create;
//
try
j := 10000;
while j <= 100000 do
begin
///
/// assignment
///
stopw1 := TStopwatch.StartNew;
for i := 0 to j do
begin
TestObj.FIntProp := 666;
TestObj.FStringProp := 'Hello';
end;
stopw1.Stop;
///
/// property assignment using Setter
///
stopw2 := TStopwatch.StartNew;
for i := 0 to j do
begin
TestObj.IntProp := 666;
TestObj.StringProp := 'Hello';
end;
stopw2.Stop;
///
/// Log results
///
Writeln(Format('Elapsed time for %6.d loops: %5.d %5.d', [j, stopw1.ElapsedMilliseconds, stopw2.ElapsedMilliseconds]));
//
Inc(j, 5000);
end;
//
Writeln('');
Write('Press Return to Quit...');
Readln;
finally
TestObj.Free
end
except
on E: Exception do
Writeln(E.ClassName, ': ', E.Message);
end;
end.
My (provisional) conclusion:
It seems that:
It's worth using a setter with a property under some conditions.
The overhead of calling a method and performing a conditional test takes less time than an assignment.
My question:
Are those findings valid under other settings, or are they specific to my setup (an exception)?
I would make the following observations:
The decision as to whether or not to use a setter should be based on factors like code maintenance, correctness, readability rather than performance.
Your benchmark is wholly unreasonable since the if statements evaluate to False every time. Real world code that sets properties would be likely to modify the properties a reasonable proportion of the time that the setter runs.
I would expect that for many real world examples, the setter would run faster without the equality test. If that test were to evaluate to True every time then clearly the code would be quicker without it.
The integer assignment is practically free, and for the integer property the setter is in fact slower than direct field access.
The time is spent in the string property. Here there is some real performance benefit due to the optimisation of the if test which avoids string assignment code if possible.
The setters would be faster if you inlined them, but not by a significant amount.
My belief is that any real world code would never be able to detect any of these performance differences. In reality the bottleneck will be obtaining the values passed to the setters rather than time spent in the setters.
The main situation where such if protection is valuable is where the property modification is expensive. For example, perhaps it involves sending a Windows message, or hitting a database. For a property backed by a field you can probably take it or leave it.
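A small sketch of the guarded-setter pattern for the expensive case. TGuarded and its Notifications counter are hypothetical: the counter stands in for whatever costly side effect (a message, a database write) the setter would trigger.

```pascal
program GuardedSetterSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

type
  { Hypothetical class: Notifications counts simulated expensive side effects. }
  TGuarded = class
  private
    FValue: Integer;
    FNotifications: Integer;
    procedure SetValue(const AValue: Integer);
  public
    property Value: Integer read FValue write SetValue;
    property Notifications: Integer read FNotifications;
  end;

procedure TGuarded.SetValue(const AValue: Integer);
begin
  if FValue <> AValue then
  begin
    FValue := AValue;
    Inc(FNotifications);   { the "expensive" work happens only on a real change }
  end;
end;

{ Sets the same value Times times and reports how often the side effect ran. }
function NotificationsAfterRepeatedSets(Times: Integer): Integer;
var
  Obj: TGuarded;
  i: Integer;
begin
  Obj := TGuarded.Create;
  try
    for i := 1 to Times do
      Obj.Value := 666;
    Result := Obj.Notifications;
  finally
    Obj.Free;
  end;
end;

begin
  WriteLn(NotificationsAfterRepeatedSets(1000));  { prints 1 }
end.
```

Here the guard saves 999 side effects, which is where the pattern earns its keep; for a plain field write there is nothing comparable to save.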
In the chatter in the comments Premature Optimization wonders why the comparison if FStringProp <> Value is quicker than the assignment FStringProp := Value. I investigated a little further and it wasn't quite as I had originally thought.
It turns out that if FStringProp <> Value is dominated by a call to System._UStrEqual. The two strings passed are not in fact the same reference and so each character has to be compared. However, this code is highly optimised and crucially there are only 5 characters to compare.
The call to FStringProp := Value goes to System._UStrAsg and since Value is a literal with negative reference count, a brand new string has to be made. The Pascal version of the code looks like this:
procedure _UStrAsg(var Dest: UnicodeString; const Source: UnicodeString); // globals (need copy)
var
S, D: Pointer;
P: PStrRec;
Len: LongInt;
begin
S := Pointer(Source);
if S <> nil then
begin
if __StringRefCnt(Source) < 0 then // make copy of string literal
begin
Len := __StringLength(Source);
S := _NewUnicodeString(Len);
Move(Pointer(Source)^, S^, Len * SizeOf(WideChar));
end else
begin
P := PStrRec(PByte(S) - SizeOf(StrRec));
InterlockedIncrement(P.refCnt);
end;
end;
D := Pointer(Dest);
Pointer(Dest) := S;
_UStrClr(D);
end;
The key part of this is the call to _NewUnicodeString which of course calls GetMem. I am not at all surprised that heap allocation is significantly slower than comparison of 5 characters.
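The other branch of that routine, the one that only bumps the reference count, can be observed with a small sketch. It assumes that calling UniqueString on a literal-backed string yields a heap copy with an ordinary reference count; the function name is invented for this example, and other RTLs (Free Pascal, for instance) may treat literals differently from the Delphi code shown above.

```pascal
program SharedStringSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

{ Returns True when assigning one string variable to another shares the
  underlying buffer instead of allocating a new one. }
function AssignmentSharesBuffer: Boolean;
var
  Source, Dest: string;
begin
  Source := 'Hello';
  UniqueString(Source);   { assumption: forces a heap-allocated, reference-counted copy }
  Dest := Source;         { expected to increment the reference count only }
  Result := Pointer(Dest) = Pointer(Source);
end;

begin
  WriteLn(AssignmentSharesBuffer);  { prints TRUE }
end.
```

When the source is already a reference-counted heap string, no GetMem call is needed, so a benchmark fed from such a variable measures something different from one fed from a literal.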
Put the 'Hello' constant into a variable, use that variable for the assignments, and then run the test again.
