Clock drift on Windows

I've developed a Windows service which tracks business events. It uses the Windows clock to timestamp events. However, the underlying clock can drift quite dramatically (e.g. losing a few seconds per minute), particularly when the CPUs are working hard. Our servers use the Windows Time Service to stay in sync with domain controllers, which uses NTP under the hood, but the sync frequency is controlled by domain policy, and in any case even syncing every minute would still allow significant drift. Are there any techniques we can use to keep the clock more stable, other than using hardware clocks?

Clock ticks should be predictable, but on most PC hardware - which isn't designed for real-time use - other I/O device interrupts have priority over the clock tick interrupt, and some drivers do extensive processing in the interrupt service routine rather than deferring it to a deferred procedure call (DPC). As a result, the system may not be able to service the clock tick interrupt until (sometimes) long after it was signalled.
Other factors include bus-mastering I/O controllers which steal many memory bus cycles from the CPU, causing it to be starved of memory bus bandwidth for significant periods.
As others have said, the clock-generation hardware may also vary its frequency as component values change with temperature.
Windows does allow the number of ticks added to the real-time clock on every interrupt to be adjusted: see SetSystemTimeAdjustment. This only works if you have a predictable clock skew, however. If the clock is only slightly off, the SNTP client (the "Windows Time" service) will adjust for the skew, making the clock tick slightly faster or slower so that it trends towards the correct time.
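For illustration, here is a minimal C sketch of that approach, assuming you have already measured a roughly constant skew (the parts-per-million figure is a placeholder, and the process needs the SE_SYSTEMTIME_NAME privilege):

// Sketch only: compensate for a measured, roughly constant skew via SetSystemTimeAdjustment.
#include <windows.h>

BOOL ApplySkewCorrection(LONG skewPartsPerMillion)   // e.g. +50 if the clock loses 50 ppm
{
    DWORD adjustment = 0, increment = 0;
    BOOL  disabled   = FALSE;

    // How many 100-ns units are normally added to the clock per tick interrupt?
    if (!GetSystemTimeAdjustment(&adjustment, &increment, &disabled))
        return FALSE;

    // Add (or remove) the measured fraction on every tick.
    LONGLONG corrected = increment + ((LONGLONG)increment * skewPartsPerMillion) / 1000000;

    // FALSE = enable the adjustment; the kernel now adds 'corrected' units per tick.
    return SetSystemTimeAdjustment((DWORD)corrected, FALSE);
}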

I don't know if this applies, but ...
There's an issue on Windows where, if you change the timer resolution with timeBeginPeriod() a lot, the clock will drift.
Actually, there is a bug in the Windows implementation of Java's Thread wait() (and the os::sleep() function) that causes this behaviour. It always sets the timer resolution to 1 ms before waiting, in order to be accurate (regardless of sleep length), and restores it immediately upon completion unless other threads are still sleeping. This set/reset then confuses the Windows clock, which expects the Windows time quantum to be fairly constant.
Sun has actually known about this since 2006, and hasn't fixed it, AFAICT!
We actually had the clock running twice as fast because of this! A simple Java program that sleeps for 1 millisecond in a loop shows this behaviour.
The solution is to set the timer resolution yourself, to something low, and keep it there as long as possible. Use timeBeginPeriod() to control that. (We set it to 1 ms without any adverse effects.)
For those coding in Java, the easiest way to fix this is by creating a thread that sleeps for as long as the app lives.
Note that this fixes the issue globally on the machine, regardless of which application is the actual culprit.
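If you would rather not rely on a sleeping Java thread, the same effect can be had from native code by pinning the resolution once for the life of the process; a minimal sketch:

// Sketch: hold the timer resolution at 1 ms for the whole process lifetime.
#include <windows.h>
#include <mmsystem.h>            // timeBeginPeriod / timeEndPeriod
#pragma comment(lib, "winmm.lib")

int main(void)
{
    timeBeginPeriod(1);          // request 1 ms resolution once, up front

    // ... run the application ...

    timeEndPeriod(1);            // matching call on the way out
    return 0;
}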

You could run "w32tm /resync" in a scheduled task .bat file. This works on Windows Server 2003.

Other than resyncing the clock more frequently, I don't think there is much you can do, short of getting a new motherboard, as your clock signal doesn't seem to be running at the right frequency.

http://www.codinghorror.com/blog/2007/01/keeping-time-on-the-pc.html
PC clocks should typically be accurate to within a few seconds per day. If you're experiencing massive clock drift-- on the order of minutes per day-- the first thing to check is your source of AC power. I've personally observed systems with a UPS plugged into another UPS (this is a no-no, by the way) that gained minutes per day. Removing the unnecessary UPS from the chain fixed the time problem. I am no hardware engineer, but I'm guessing that some timing signal in the power is used by the real-time clock chip on the motherboard.

As already mentioned, Java programs can cause this issue.
Another solution that does not require code modification is adding the VM argument -XX:+ForceTimeHighResolution (found on the NTP support page).
9.2.3. Windows and Sun's Java Virtual Machine
Sun's Java Virtual Machine needs to be started with the -XX:+ForceTimeHighResolution parameter to avoid losing interrupts.
See http://www.macromedia.com/support/coldfusion/ts/documents/createuuid_clock_speed.htm for more information.
From the referenced link (via the Wayback machine - original link is gone):
ColdFusion MX: CreateUUID Increases the Windows System Clock Speed
Calling the createUUID function multiple times under load in Macromedia ColdFusion MX and higher can cause the Windows system clock to accelerate. This is an issue with the Java Virtual Machine (JVM) in which Thread.sleep calls less than 10 milliseconds (ms) causes the Windows system clock to run faster. This behavior was originally filed as Sun Java Bug 4500388 (developer.java.sun.com/developer/bugParade/bugs/4500388.html) and has been confirmed for the 1.3.x and 1.4.x JVMs.

In ColdFusion MX, the createUUID function has an internal Thread.sleep call of 1 millisecond. When createUUID is heavily utilized, the Windows system clock will gain several seconds per minute. The rate of acceleration is proportional to the number of createUUID calls and the load on the ColdFusion MX server. Macromedia has observed this behavior in ColdFusion MX and higher on Windows XP, 2000, and 2003 systems.

Increase the frequency of the re-sync.
If the syncs are with your own main server on your own network, there's no reason not to sync every minute.

Sync more often. Look at the Registry entries for the W32Time service, especially "Period". "SpecialSkew" sounds like it would help you.

Clock drift may be a consequence of temperature; maybe you could try to keep the temperature more constant - using better cooling, perhaps? You're never going to lose the drift entirely, though.
Using an external clock (a GPS receiver etc.) and a statistical method to relate CPU time to absolute time is what we use here to sync events in distributed systems.

Since it sounds like you have a big business:
Take an old laptop or something which isn't good for much else, but seems to have a more or less reliable clock, and call it the Timekeeper. The Timekeeper's only job is to, once every (say) 2 minutes, send a message to the servers telling them the time. Instead of using the Windows clock for their timestamps, the servers record the time from the Timekeeper's last signal plus the elapsed time since that signal (see the sketch below). Check the Timekeeper's clock against your wristwatch once or twice a week. This should suffice.
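A rough C sketch of that timestamping scheme, assuming the Timekeeper's message delivers a millisecond time-of-day value (the transport and message format are up to you):

// Sketch: stamp events as "last Timekeeper time + locally measured elapsed time".
#include <windows.h>

static ULONGLONG g_timekeeperMs;   // time from the last Timekeeper message, in ms
static LONGLONG  g_qpcAtSignal;    // local performance counter at that moment
static LONGLONG  g_qpcFrequency;   // performance counter ticks per second

void OnTimekeeperSignal(ULONGLONG timekeeperMs)
{
    LARGE_INTEGER now, freq;
    QueryPerformanceCounter(&now);
    QueryPerformanceFrequency(&freq);
    g_timekeeperMs = timekeeperMs;
    g_qpcAtSignal  = now.QuadPart;
    g_qpcFrequency = freq.QuadPart;
}

ULONGLONG EventTimestampMs(void)
{
    LARGE_INTEGER now;
    QueryPerformanceCounter(&now);
    ULONGLONG elapsedMs = (ULONGLONG)(now.QuadPart - g_qpcAtSignal) * 1000 / (ULONGLONG)g_qpcFrequency;
    return g_timekeeperMs + elapsedMs;
}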

What servers are you running? On desktops, the times I've come across this have been with Spread Spectrum FSB enabled, which causes some issues with the interrupt timing that drives the clock tick. You may want to see if this is an option in the BIOS on one of those servers and turn it off if it's enabled.
Another option you have is to edit the time polling interval and make it much shorter, using the following registry key; most likely you'll have to add it (note this is a DWORD value and the value is in seconds, e.g. 600 for 10 min):
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\NtpClient\SpecialPollInterval
Here's a full workup on it: KB816042
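If you'd rather set it from code than through regedit, something along these lines should do it (a sketch; it needs admin rights, and the change typically isn't picked up until the Windows Time service is restarted):

// Sketch: write SpecialPollInterval (in seconds) under the NtpClient provider key.
#include <windows.h>

LONG SetNtpPollIntervalSeconds(DWORD seconds)   // e.g. 600 for 10 minutes
{
    HKEY key;
    LONG rc = RegOpenKeyExW(HKEY_LOCAL_MACHINE,
        L"SYSTEM\\CurrentControlSet\\Services\\W32Time\\TimeProviders\\NtpClient",
        0, KEY_SET_VALUE, &key);
    if (rc != ERROR_SUCCESS)
        return rc;

    rc = RegSetValueExW(key, L"SpecialPollInterval", 0, REG_DWORD,
                        (const BYTE*)&seconds, sizeof(seconds));
    RegCloseKey(key);
    return rc;
}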

I once wrote a Delphi class to handle time resyncs. It is pasted below. Now that I see the "w32tm" command mentioned by Larry Silverman, I suspect I wasted my time.
unit TimeHandler;

interface

type
  TTimeHandler = class
  private
    FServerName : widestring;
  public
    constructor Create(servername : widestring);
    function RemoteSystemTime : TDateTime;
    procedure SetLocalSystemTime(settotime : TDateTime);
  end;

implementation

uses
  Windows, SysUtils, Messages;

function NetRemoteTOD(ServerName : PWideChar; var buffer : pointer) : integer; stdcall; external 'netapi32.dll';
function NetApiBufferFree(buffer : Pointer) : integer; stdcall; external 'netapi32.dll';

type
  //See MSDN documentation on the TIME_OF_DAY_INFO structure.
  PTime_Of_Day_Info = ^TTime_Of_Day_Info;
  TTime_Of_Day_Info = record
    ElapsedDate : integer;
    Milliseconds : integer;
    Hours : integer;
    Minutes : integer;
    Seconds : integer;
    HundredthsOfSeconds : integer;
    TimeZone : LongInt;
    TimeInterval : integer;
    Day : integer;
    Month : integer;
    Year : integer;
    DayOfWeek : integer;
  end;

constructor TTimeHandler.Create(servername: widestring);
begin
  inherited Create;
  FServerName := servername;
end;

function TTimeHandler.RemoteSystemTime: TDateTime;
var
  Buffer : pointer;
  Rek : PTime_Of_Day_Info;
  DateOnly, TimeOnly : TDateTime;
  timezone : integer;
begin
  //if the call is successful...
  if 0 = NetRemoteTOD(PWideChar(FServerName), Buffer) then begin
    try
      //store the time of day info in our special buffer structure
      Rek := PTime_Of_Day_Info(Buffer);
      //windows time is in GMT, so we adjust for our current time zone
      if Rek.TimeZone <> -1 then
        timezone := Rek.TimeZone div 60
      else
        timezone := 0;
      //decode the date from integers into TDateTimes
      //assume zero milliseconds
      try
        DateOnly := EncodeDate(Rek.Year, Rek.Month, Rek.Day);
        TimeOnly := EncodeTime(Rek.Hours, Rek.Minutes, Rek.Seconds, 0);
      except
        on e : Exception do
          raise Exception.Create(
            'Date retrieved from server, but it was invalid!' + #13#10 + e.Message);
      end;
      //translate the time into a TDateTime
      //apply any time zone adjustment and return the result
      Result := DateOnly + TimeOnly - (timezone / 24);
    finally
      //free the buffer allocated by NetRemoteTOD, even if decoding raised an exception
      NetApiBufferFree(Buffer);
    end;
  end //if call was successful
  else begin
    raise Exception.Create('Time retrieval failed from "' + FServerName + '"');
  end;
end;

procedure TTimeHandler.SetLocalSystemTime(settotime: TDateTime);
var
  SystemTime : TSystemTime;
begin
  DateTimeToSystemTime(settotime, SystemTime);
  SetLocalTime(SystemTime);
  //tell windows that the time changed
  PostMessage(HWND_BROADCAST, WM_TIMECHANGE, 0, 0);
end;

end.

I believe Windows Time Service only implements SNTP, which is a simplified version of NTP. A full NTP implementation takes into account the stability of your clock in deciding how often to sync.
You can get the full NTP server for Windows here.

Related

Did WH_CALLWNDPROC hook performance dramatically decrease with Win10 (compared to Win7)?

We are in the process of upgrading our workstations from Win7 to Win10. While investigating reports of performance degradation, I came to the conclusion that it was caused by a WH_CALLWNDPROC hook installed by a third party.
I came to this conclusion based on the results of the following test application (done in Delphi 10 Seattle):
procedure TForm3.Button1Click(Sender: TObject);
var
  I: Integer;
  SW: TStopWatch;
begin
  SW := TStopWatch.StartNew;
  for I := 0 to 1000000 do
  begin
    if Combobox1.ItemIndex > 0 then
      Exit;
  end;
  SW.Stop;
  ShowMessage(SW.ElapsedMilliseconds.ToString);
end;
(For those unfamiliar with Delphi, TStopwatch uses the QueryPerformanceFrequency/QueryPerformanceCounter APIs to get elapsed time.)
The execution time for this method is:
Win10 : 1485 msecs
Win7 : 4996 msecs
(Note: both machines are on wildly different hardware and can't really be compared to each other.)
Now, if I add a hook before executing the same code:
function MySystemWndProcHook(Code: Integer; wParam: WParam; lParam: LParam): LRESULT; stdcall;
begin
  Result := CallNextHookEx(FHook, Code, wParam, lParam);
end;

procedure TForm3.FormCreate(Sender: TObject);
begin
  FHook := SetWindowsHookEx(WH_CALLWNDPROC, @MySystemWndProcHook, 0, GetCurrentThreadId);
end;
The execution time now becomes:
Win10 : 19552 msecs (about 1300% longer)
Win7 : 8682 msecs (about 75% longer)
Now, as I mentioned, both workstations are on different hardware, but I don't believe that alone could explain the difference. The Win10 machine has an i7 CPU while the Win7 machine has an i3. If anything, I'd expect the i3 to take a bigger hit (less cache, fewer resources...).
So, did WH_CALLWNDPROC hooks get that much slower since Win7?
A quick Google search didn't seem to reveal any other reports of this issue. Can anybody reproduce my results?
If it can't be reproduced, does anyone have any idea what settings/conflicting application could be causing this? (I already tried disabling Windows Defender real-time scanning and it didn't affect performance.)
EDIT : This was tested under Win10 1803 64 bits. The test application itself was 32 bits.
EDIT2 : The same application compiled in 64 bits gives the following execution times (without hook / with hook):
Win10 : 780 msecs / 10201 msecs
Win7 : 6419 msecs / 9201 msecs
EDIT3 : Interestingly enough, when running the application (32bits) as admin :
Win10 : 12693 msecs / 18028 msecs
Also, on yet another workstation, running as a different user makes a difference:
Win10(1809) / "standard user" : 9430 / 17440 msecs
Win10(1809) / System : 5220 / 10160 msecs (started remotely through PsExec)
EDIT4 : If run as admin, the application will run faster from a USB key than from a hard disk. (Note: so far I have only tested on systems with a single drive. At this point, I wouldn't exclude that only the OS drive is slower.)
EDIT5 : I found out quite a few more things about this situation.
First, running "As Admin" (Win10) causes a WH_CALLWNDPROCRET hook to be installed in the application. I haven't found where it is coming from (OS, Delphi's framework, another app?). It is definitely not there when simply running the app.
The performance hit doesn't seem to be so much on the hook itself, but on its effect on SendMessage.
We are in contact with Microsoft's support; they have reproduced similar results (on a 100k loop instead of 1M):
Windows 7 - Without hook 0.018396 seconds.
Windows 10 - Without hook 0.025102 seconds.
Windows 7 - With hook 0.167941 seconds.
Windows 10 - With hook 1.105929 seconds.
(Investigation still on-going so still no conclusions thus far)
Those results also suggest many of our workstations perform far worse than they should even when no hooks are involved.
So, WH_CALLWNDPROC and WH_CALLWNDPROCRET hooks do degrade performance quite a bit. And quite a bit more so in Win10 than it did in Win7.
Some of the performance hit is coming from the mitigation code for Spectre and Meltdown. Early reports from Microsoft suggest the rest is apparently from lock contention in the window manager (win32k*.sys).
As for the weird results I got in my investigation:
Running "As Admin" caused an additional hook to be installed in my application, which explains the massive slowdown I witnessed.
Many of the tests I did were on test machines accessed through a remote admin tool... which happens to install global WH_CALLWNDPROC/WH_CALLWNDPROCRET hooks itself, which made my test results flawed. Running locally "fixed" the results. It took me a while to find out about it since my application is 32 bits and the hooks were 64 bits, so my application wasn't notified of them (but still incurred the performance hit).
2020-02-04 : I just received an update from Microsoft. Their engineer identified a few issues that contribute to the performance degradation. Current estimate for a Windows Insider version containing fixes is 2020H1, early 2020H2

Delphi gettimeofday for OSX (equivalent for timeGetTime under win)

I am converting a threaded timer pool unit for cross platform use.
The current unit uses timeGetTime to ensure high accuracy and to report the actual elapsed interval when the timer event is called.
I have used gettimeofday on OSX before to get a high-resolution timer but cannot find any reference to it for use in Delphi XE3.
Looking for help on how I can call this in Delphi, or an alternative cross-platform way to get a high-res timer. I want ms accuracy (I know it's OS dependent) for this.
Thanks in advance, Martin
A better option, multi-platform ready, may be to use the TStopWatch record from the System.Diagnostics unit.
TStopWatch uses a true high-resolution timer if one is available, and in that case has close to nanosecond precision (depending on the OS and hardware); if a high-resolution timer is not available (on Windows), it falls back to standard timer functions that provide millisecond precision.
If you want only millisecond precision, use the ElapsedMilliseconds property, like this:
var
  sw : TStopWatch;
  ElapsedMilliseconds : Int64;
begin
  // TStopWatch is a record, so there is nothing to free afterwards
  sw := TStopWatch.Create;
  sw.Start;
  Whatever();
  sw.Stop;
  ElapsedMilliseconds := sw.ElapsedMilliseconds;
end;
TStopWatch relies on the QueryPerformanceFrequency/QueryPerformanceCounter functions on Windows and on mach_absolute_time on OS X.

How can I get a pulse in win32 Assembler (specifically nasm)?

I'm planning on making a clock. An actual clock, not something for Windows. However, I would like to be able to write most of the code now. I'll be using a PIC16F628A to drive the clock, and it has a timer I can access (actually, it has 3, in addition to the clock it has built in). Windows, however, does not appear to have this function. Which makes making a clock a bit hard, since I need to know how long it's been so I can update the current time. So I need to know how I can get a pulse (1Hz, 1KHz, doesn't really matter as long as I know how fast it is) in Windows.
There are many timer objects available in Windows. Probably the easiest to use for your purposes would be the Multimedia Timer, but that's been deprecated. It would still work, but Microsoft recommends using one of the new timer types.
I'd recommend using a threadpool timer if you know your application will be running under Windows Vista, Server 2008, or later. If you have to support Windows XP, use a Timer Queue timer.
There's a lot to those APIs, but general use is pretty simple. I showed how to use them (in C#) in my article Using the Windows Timer Queue API. The code is mostly API calls, so I figure you won't have trouble understanding and converting it.
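Converted to C, the general shape of the Timer Queue version looks roughly like this (a sketch, not the article's code; the callback body is a placeholder):

// Sketch: fire a callback once per second using the default timer queue.
#include <windows.h>
#include <stdio.h>

VOID CALLBACK OnTick(PVOID lpParameter, BOOLEAN timerOrWaitFired)
{
    printf("tick\n");   // update the clock display, toggle an output, etc.
}

int main(void)
{
    HANDLE timer = NULL;
    // NULL = default timer queue; first fire after 1000 ms, then every 1000 ms.
    if (!CreateTimerQueueTimer(&timer, NULL, OnTick, NULL, 1000, 1000, WT_EXECUTEDEFAULT))
        return 1;

    Sleep(10000);   // let it tick for a while
    DeleteTimerQueueTimer(NULL, timer, INVALID_HANDLE_VALUE);   // wait for callbacks to finish
    return 0;
}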
The LARGE_INTEGER is just an 8-byte block of memory that's split into a high part and a low part. In assembly, you can define it as:
MyLargeInt equ $
MyLargeIntLow dd 0
MyLargeIntHigh dd 0
If you're looking to learn ASM, just do a Google search for [x86 assembly language tutorial]. That'll get you a whole lot of good information.
You could use a waitable timer object. Since Windows is not a real-time OS, you'll need to make sure you set the period long enough that you won't miss pulses. A tenth of a second should be safe most of the time.
Additional:
The const LARGE_INTEGER you need to pass to SetWaitableTimer is easy to implement in NASM; it's just an eight-byte constant:
period: dq -1000000 ; 100 ms, expressed in 100-nanosecond units (negative = relative to now)
Pass the address of period as the second argument to SetWaitableTimer.
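In C terms, the call sequence the NASM code needs to mirror is roughly this (a sketch; the 100 ms period is arbitrary):

// Sketch: a 100 ms "pulse" from a periodic waitable timer.
#include <windows.h>

int main(void)
{
    HANDLE timer = CreateWaitableTimer(NULL, FALSE, NULL);   // auto-reset timer
    if (!timer)
        return 1;

    LARGE_INTEGER due;
    due.QuadPart = -1000000LL;   // first fire in 100 ms (100-ns units, negative = relative)

    // Third argument is the repeat period, in milliseconds.
    SetWaitableTimer(timer, &due, 100, NULL, NULL, FALSE);

    for (;;)
    {
        WaitForSingleObject(timer, INFINITE);   // returns once per pulse
        // ... advance the clock here ...
    }
}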

How do I increase windows interrupt latency to stress test a driver?

I have a driver & device that seem to misbehave when the user does any number of complex things (opening large word documents, opening lots of files at once, etc.) -- but does not reliably go wrong when any one thing is repeated. I believe it's because it does not handle high interrupt latency situations gracefully.
Is there a reliable way to increase interrupt latency on Windows XP to test this theory?
I'd prefer to write my test program in Python, but C++ & WinAPI is also fine...
My apologies for not having a concrete answer, but an idea to explore would be to use either C++ or Cython to hook into the timer interrupt (the clock tick one) and waste time in there. This would effectively increase latency.
I don't know if there's an existing solution, but you can create your own.
On Windows all interrupts are prioritized, so if driver code is running at a high IRQL, your driver won't be able to service its own interrupt if its level is lower. At least it won't be able to run on the same processor.
I'd do the following:
Configure your driver to run on a single processor (I don't remember how to do this, but such an option definitely exists).
Add an I/O control code to your driver.
In your driver's Dispatch routine do a busy wait on a high IRQL (more about this later)
Call your driver (via DeviceIoControl) to simulate a stress.
The busy wait may look something like this:
KIRQL oldIrql;
__int64 t1, t2;

KeRaiseIrql(31, &oldIrql);
KeQuerySystemTime((LARGE_INTEGER*) &t1);
while (1)
{
    KeQuerySystemTime((LARGE_INTEGER*) &t2);
    // system time is reported in 100-nanosecond units
    if (t2 - t1 > /* put the needed time interval */)
        break;
}
KeLowerIrql(oldIrql);
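On the user-mode side, the stress could then be triggered with something like the following (the device name and IOCTL code are hypothetical placeholders for whatever your driver actually exposes):

// Sketch: open the test device and send the "busy-wait at high IRQL" IOCTL.
#include <windows.h>
#include <winioctl.h>

#define MY_DEVICE_NAME       L"\\\\.\\MyTestDevice"   // placeholder
#define IOCTL_STRESS_LATENCY CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

int main(void)
{
    HANDLE device = CreateFileW(MY_DEVICE_NAME, GENERIC_READ | GENERIC_WRITE,
                                0, NULL, OPEN_EXISTING, 0, NULL);
    if (device == INVALID_HANDLE_VALUE)
        return 1;

    DWORD busyWaitMs = 50;        // how long the driver should spin at high IRQL
    DWORD bytesReturned = 0;
    DeviceIoControl(device, IOCTL_STRESS_LATENCY,
                    &busyWaitMs, sizeof(busyWaitMs), NULL, 0,
                    &bytesReturned, NULL);

    CloseHandle(device);
    return 0;
}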

MSG::time is later than timeGetTime

After noticing some timing discrepancies with events in my code, I boiled the problem all the way down to my Windows message loop.
Basically, unless I'm doing something strange, I'm experiencing this behaviour:
MSG message;
while (PeekMessage(&message, _applicationWindow.Handle, 0, 0, PM_REMOVE))
{
    int timestamp = timeGetTime();
    bool strange = message.time > timestamp; //strange == true!!!
    TranslateMessage(&message);
    DispatchMessage(&message);
}
The only rational conclusion I can draw is that MSG::time uses a different timing mechanism than timeGetTime() and is therefore free to produce differing results.
Is this the case, or am I missing something fundamental?
Could this be a signed/unsigned issue? You are comparing a signed int (timestamp) to an unsigned DWORD (msg.time).
Also, the tick count wraps roughly every 49 days - when that happens, strange could well be true.
As an aside, if you don't have a great reason to use timeGetTime, you can use GetTickCount here - it saves you bringing in winmm.
The code below shows how you should go about using times - you should never compare the times directly, because clock wrapping messes that up. Instead you should always subtract the start time from the current time and look at the interval.
// This is roughly equivalent code; however, strange should never be true
// in this code
DWORD timestamp = GetTickCount();
// subtract first, then reinterpret as signed, so wrap-around is handled correctly
bool strange = ((int)(timestamp - msg.time) < 0);
I don't think it's advisable to expect or rely on any particular relationship between the absolute values of timestamps returned from different sources. For one thing, the multimedia timer may have a different resolution from the system timer. For another, the multimedia timer runs in a separate thread, so you may encounter synchronisation issues. (I don't know if each CPU maintains its own independent tick count.) Furthermore, if you are running any sort of time synchronisation service, it may be making its own adjustments to your local clock and affecting the timestamps you are seeing.
Are you by any chance running an AMD dual core? There is an issue where since each core has a separate timer and can run at different speeds, the timers can diverge from each other. This can manifest itself in negative ping times, for example.
I had similar issues when measuring timeouts in different threads using GetTickCount().
Install this driver (IIRC) to resolve the issue.
MSG.time is based on GetTickCount(), and timeGetTime() uses the multimedia timer, which is completely independent of GetTickCount(). I would not be surprised to see that one timer has 'ticked' before the other.
