How precise is WinAPI's Sleep()?

And I don't mean precision to one millisecond; I'm asking about a situation where I want a delay of an hour using Sleep(60 * 60 * 1000). Will it be an hour and not, say, 55 or 70 minutes? Is the thread guaranteed to wake up and not sleep forever?

Over an hour, the accuracy of a Sleep() call is not that bad (and it's fairly easy to test as well). A Sleep() call will return sufficiently close to the hour that it is not possible to detect any error with a manual stopwatch (tried it on XP; no reason for it to be any different now, AFAIK).
Errors with respect to wall time will, of course, accumulate if consecutive calls to Sleep(3600*1000) are made, especially if the operations performed at the end of each interval are themselves lengthy and/or the box is seriously overloaded (i.e. many more ready threads than cores).
Why would the thread sleep forever if you ask it to sleep for an hour? If you call Sleep(3600*1000), it will become ready after that time. If it does not, the OS is stuffed anyway and you're on your way to a reboot.
The reason why such a Sleep() call might be preferred over a timer is that it's a one-liner and will work anywhere on the caller's stack: no need for a message handler and/or state machine to handle the timer callback.
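
The accumulation of error over consecutive sleeps can be avoided by sleeping until an absolute deadline instead of for a relative interval. The following is only a minimal sketch of that idea, written with portable std::chrono rather than the Win32 Sleep() call; the function name run_periodically and its parameters are made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Runs `work` `count` times, one period apart, without accumulating drift.
// Each deadline is computed from the original start time rather than from
// "now", so a slow iteration or a late wake-up does not push later ones back.
// Returns the total elapsed time.
std::chrono::steady_clock::duration run_periodically(
    std::chrono::milliseconds period, int count,
    const std::function<void()>& work)
{
    using clock = std::chrono::steady_clock;
    assert(period.count() > 0);
    const auto start = clock::now();
    for (int i = 1; i <= count; ++i) {
        work();
        // Absolute deadline for this iteration: start + i * period.
        std::this_thread::sleep_until(start + i * period);
    }
    return clock::now() - start;
}
```

With a relative sleep of one period after each piece of work, the total would grow by the duration of the work on every iteration. Here an individual wake-up can still be late, but the lateness does not add up as long as the work fits inside the period.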

Related

Faster Than 1ms But Slower Than a Loop

I need to execute some lines of code multiple times (say, 300 times) in rapid succession, each time incrementing some variable and then using it to do a task (let's assume it's a task that requires negligible time to complete).
I tried doing it with a timer set to 1 ms, but it runs too slowly. I then tried doing it with a While loop, but that was much too fast. I could use Thread.Sleep but I really hate using that, not to mention it can only sleep as short as 1 ms anyway. I also thought of using Environment.TickCount but I believe that counts in milliseconds as well.
While this program isn't important to me, it got me wondering if such a thing was possible. A loop that could run with "faster than 1 ms intervals," but slower than "as fast as the program can execute it."

One thing that comes to mind is calculating the wait for every iteration yourself with a high-precision clock like Java's System.nanoTime().
However, the call itself is relatively costly and will not give you nanosecond-precision waiting. But for waits shorter than 1 ms and longer than, e.g., 1 ns, this might help.
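
As a rough illustration of that suggestion, the sketch below polls a monotonic clock until a sub-millisecond interval has passed. It uses C++ std::chrono::steady_clock in place of System.nanoTime(); the names spin_for and paced_loop are hypothetical, and the approach keeps one core busy for the whole wait.

```cpp
#include <chrono>

// Busy-waits until `interval` has elapsed on the monotonic clock.
// Burns CPU while waiting, so it is only suitable for very short delays.
inline void spin_for(std::chrono::nanoseconds interval)
{
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + interval;
    while (clock::now() < deadline) {
        // Intentionally empty: polling the clock is the wait.
    }
}

// Runs `iterations` steps spaced at least `interval` apart and returns the
// final value of the counter.
inline int paced_loop(int iterations, std::chrono::nanoseconds interval)
{
    int counter = 0;
    for (int i = 0; i < iterations; ++i) {
        ++counter;           // stands in for the "negligible" task
        spin_for(interval);
    }
    return counter;
}
```

For example, paced_loop(300, std::chrono::microseconds(100)) takes at least 30 ms in total. The thread can still be preempted in the middle of a spin, so individual intervals may be longer than requested.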

forced preemption on windows (occurs or not here)

Sorry for my weak English; by preemption I mean a forced context (process) switch applied to my process.
My question is:
If I write and run my own game program so that it does 20 milliseconds of work, then sleeps for 5 milliseconds, and then pumps Windows messages (PeekMessage/DispatchMessage), in a loop again and again, is it ever forcibly preempted by Windows, or does this preemption not occur?
I suppose this preemption would occur if I did not voluntarily give control back to the system via sleep or peek/dispatch for a longer period of time. Here, will it occur or not?

The short answer is: Yes, it can be, and it will be preempted.
Not only can driver events (interrupts) preempt your thread at any time; preemption may also happen due to a temporary priority boost, for example when a waitable object on which a thread is blocked is signalled, or when another window becomes the topmost window. Or another process might simply adjust its priority class.
There is no way (short of giving your process realtime priority, and this is a very bad idea -- forget about it immediately) to guarantee that no "normal" thread will preempt you, and even then hardware interrupts will preempt you, and certain threads such as the ones handling disk I/O and the mouse will compete with you for time quanta. So, even if you run with realtime priority (which is not truly "realtime"), you still have no guarantee, but you seriously interfere with important system services.
On top of that, sleeping for 5 milliseconds is imprecise at best, and unreliable otherwise.
Sleeping will make your thread ready (ready does not mean "it will run", it merely means that it may run -- if and only if a time slice becomes available and no other ready thread is first in line) on the next scheduler tick. This effectively means that the amount of time you sleep is rounded to the granularity of the system timer resolution (see timeBeginPeriod function), plus some unknown time.
By default, the timer resolution is 15.6 ms, so your 5 ms will be 7.8 ms on average (assuming the best, uncontended case), but possibly a lot more. If you adjust the system timer resolution to 1 ms (which is often the lowest possible, though some systems allow 0.5 ms), it's somewhat better, but still not precise or reliable. Plus, making the scheduler run more often burns a considerable amount of CPU cycles in interrupts, and power. Therefore, it is not something that is generally advisable.
To make things even worse, you cannot even rely on Sleep's rounding mode, since Windows 2000/XP round differently from Windows Vista/7/8.
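
Since the rounding behaviour differs between systems, it is easier to measure than to predict. The following is a small sketch, assuming portable std::this_thread::sleep_for as a stand-in for Sleep() (on Windows it typically shows the same tick granularity); SleepStats and measure_sleep are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>
#include <thread>

struct SleepStats {
    double min_ms;
    double avg_ms;
    double max_ms;
};

// Requests the same sleep `samples` times and records how long each request
// actually took, measured on the monotonic clock.
SleepStats measure_sleep(std::chrono::milliseconds requested, int samples)
{
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;
    assert(samples > 0);
    SleepStats s{std::numeric_limits<double>::max(), 0.0, 0.0};
    for (int i = 0; i < samples; ++i) {
        const auto t0 = clock::now();
        std::this_thread::sleep_for(requested);
        const double took = ms(clock::now() - t0).count();
        s.min_ms = std::min(s.min_ms, took);
        s.max_ms = std::max(s.max_ms, took);
        s.avg_ms += took / samples;
    }
    return s;
}
```

Calling measure_sleep(std::chrono::milliseconds(5), 100) and printing the three fields gives a quick picture of how far a 5 ms request overshoots on a given machine and under a given load.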

It can be interrupted by a driver at any time. The driver may signal another thread and then ask the OS to schedule/dispatch. The newly-ready thread may well run instead of yours.
Desktop OSes like Windows do not provide any real-time guarantees; they were not designed to provide them.

Is 16 milliseconds an unusually long length of time for an unblocked thread running on Windows to be waiting for execution?

Recently I was doing some deep timing checks on a DirectShow application I have in Delphi 6, using the DSPACK components. As part of my diagnostics, I created a Critical Section class that adds a time-out feature to the usual Critical Section object found in most Windows programming languages. If the time duration between the first Acquire() and the last matching Release() is more than X milliseconds, an Exception is thrown.
Initially I set the time-out at 10 milliseconds. The code I have wrapped in Critical Sections is pretty fast using mostly memory moves and fills for most of the operations contained in the protected areas. Much to my surprise I got fairly frequent time-outs in seemingly random parts of the code. Sometimes it happened in a code block that iterates a buffer list and does certain quick operations in sequence, other times in tiny sections of protected code that only did a clearing of a flag between the Acquire() and Release() calls. The only pattern I noticed is that the durations found when the time-out occurred were centered on a median value of about 16 milliseconds. Obviously that's a huge amount of time for a flag to be set in the latter example of an occurrence I mentioned above.
So my questions are:
1) Is it possible for the Windows thread management code, on a fairly frequent basis (about once every few seconds), to switch out an unblocked thread and not return to it for 16 milliseconds or longer?
2) If that is a reasonable scenario, what steps can I take to lessen that occurrence and should I consider elevating my thread priorities?
3) If it is not a reasonable scenario, what else should I look at or try as an analysis technique to diagnose the real problem?
Note: I am running on Windows XP on an Intel i5 Quad Core with 3 GB of memory. Also, the reason why I need to be fast in this code is due to the size of the buffer in milliseconds I have chosen in my DirectShow filter graphs. To keep latency at a minimum audio buffers in my graph are delivered every 50 milliseconds. Therefore, any operation that takes a significant percentage of that time duration is troubling.

Thread priorities determine when ready threads are run. There is, however, a starvation prevention mechanism. The so-called Balance Set Manager wakes up every second and looks for ready threads that haven't run for about 3 or 4 seconds, and if there is one, it boosts its priority to 15 and gives it double the normal quantum. It does this for no more than 10 threads at a time (per second) and scans no more than 16 threads at each priority level at a time. At the end of the quantum, the boosted priority drops back to its base value. You can find out more in the Windows Internals book(s).
So what you observe is pretty normal behavior; threads may not be run for seconds.
You may need to elevate priorities or otherwise take into account the other threads that are competing for CPU time.

Sounds like normal Windows behaviour with respect to timer resolution, unless you explicitly go for some of the high-precision timers. Some details in this MSDN link

First of all, I am not sure if Delphi's Now is a good choice for millisecond-precision measurements. The GetTickCount and QueryPerformanceCounter APIs would be a better choice.
When there is no contention on the critical section, everything runs pretty fast. However, if you try to enter a critical section that is currently locked by another thread, you eventually hit a wait operation on an internal kernel object (mutex or event), which involves yielding control of the thread and waiting for the scheduler to give control back later.
The "later" above depends on a few things, including the priorities mentioned above, and there is one important thing you omitted from your test: the overall CPU load at the time of testing. The higher the load, the lower the chance that the thread continues execution soon. A 16 ms delay still looks to be within reasonable tolerance, and all in all it might depend on your actual implementation.
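
To separate the time spent holding a lock from the time spent waiting for it, the hold time can be measured from the moment the lock is actually acquired. This is only a sketch in portable C++, with std::mutex standing in for a Windows critical section and std::chrono::steady_clock standing in for QueryPerformanceCounter; the class name TimedLock is hypothetical, and it records the duration instead of throwing an exception.

```cpp
#include <chrono>
#include <mutex>

// Holds `m` for the lifetime of the object and, just before releasing it,
// stores how long the lock was held into `held_out`. Time spent waiting to
// acquire the lock is deliberately excluded from the measurement.
class TimedLock {
public:
    using clock = std::chrono::steady_clock;

    TimedLock(std::mutex& m, clock::duration& held_out)
        : lock_(m),                 // blocks here until the mutex is acquired
          acquired_(clock::now()),  // timestamp taken after acquisition
          held_out_(held_out) {}

    // The destructor body runs before lock_ is destroyed, so the duration
    // is recorded while the mutex is still held.
    ~TimedLock() { held_out_ = clock::now() - acquired_; }

    TimedLock(const TimedLock&) = delete;
    TimedLock& operator=(const TimedLock&) = delete;

private:
    std::lock_guard<std::mutex> lock_;
    clock::time_point acquired_;
    clock::duration& held_out_;
};
```

A recorded hold time far longer than the protected code could plausibly take suggests the thread was switched out while holding the lock, rather than that the code itself was slow.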

Why does the sleep time of Sleep(1) seem to be variable in Windows?

Last week I needed to test some different algorithmic functions, and to make it easy for myself I added some artificial sleeps and simply measured the clock time. Something like this:
start = clock();
for (int i=0;i<10000;++i)
{
...
Sleep(1);
...
}
end = clock();
Since the argument of Sleep is expressed in milliseconds, I expected a total wall clock time of about 10 seconds (a bit higher because of the algorithms, but that's not important now), and that was indeed my result.
This morning I had to reboot my PC because of new Microsoft Windows hotfixes, and to my surprise Sleep(1) didn't take 1 millisecond anymore, but about 0.0156 seconds.
So my test results were completely screwed up, since the total time grew from 10 seconds to about 156 seconds.
We tested this on several PCs, and apparently on some PCs the result of one Sleep was indeed 1 ms. On other PCs it was 0.0156 seconds.
Then, suddenly, after a while, the time of Sleep dropped to 0.01 second, and then an hour later back to 0.001 second (1 ms).
Is this normal behavior in Windows?
Is Windows 'sleepy' the first hours after reboot and then gradually gets a higher sleep-granularity after a while?
Or are there any other aspects that might explain the change in behavior?
In all my tests no other application was running at the same time (or: at least not taking any CPU).
Any ideas?
OS is Windows 7.

I've not heard about the resolution jumping around like that on its own, but in general the resolution of Sleep follows the clock tick of the task scheduler. So by default it's usually 10 or 15 ms, depending on the edition of Windows. You can set it manually to 1 ms by issuing a timeBeginPeriod.
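
A request made with timeBeginPeriod should be paired with a matching timeEndPeriod call, which a small RAII wrapper can take care of. The sketch below assumes a Windows build linked against winmm.lib; on other platforms it compiles to a no-op so that the surrounding code stays portable. The class name TimerResolution is made up for this example.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>  // timeBeginPeriod / timeEndPeriod; link with winmm.lib
#endif

// Requests a minimum timer resolution for the lifetime of the object and
// withdraws the request on destruction.
class TimerResolution {
public:
    explicit TimerResolution(unsigned period_ms) : period_ms_(period_ms) {
#ifdef _WIN32
        active_ = (timeBeginPeriod(period_ms_) == TIMERR_NOERROR);
#endif
    }

    ~TimerResolution() {
#ifdef _WIN32
        // Must be called with the same value that was passed to
        // timeBeginPeriod.
        if (active_) timeEndPeriod(period_ms_);
#endif
    }

    TimerResolution(const TimerResolution&) = delete;
    TimerResolution& operator=(const TimerResolution&) = delete;

    bool active() const { return active_; }          // true if the request succeeded
    unsigned period_ms() const { return period_ms_; }

private:
    unsigned period_ms_;
    bool active_ = false;
};
```

Declaring TimerResolution res(1); at the top of a timing-sensitive scope keeps the 1 ms request in place only for as long as it is needed.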

I'd guess it's the scheduler. Each OS has a certain amount of granularity. If you ask it to do something lower than that, the results aren't perfect. By asking to sleep 1ms (especially very often) the scheduler may decide you're not important and have you sleep longer, or your sleeps may run up against the end of your time slice.
The sleep call is an advisory call. It tells the OS you want to sleep for amount of time X. It can be less than X (due to a signal or something else), or it can be more (as you are seeing).
Another Stack Overflow question has a way to do it, but you have to use winsock.

When you call Sleep, the OS suspends that thread until it can resume at a time >= the requested sleep time. Sometimes, due to thread priority (which in some cases causes Sleep(0) to make your program hang indefinitely), your program may resume at a later time because more processor cycles were allocated to another thread to do work (mainly OS threads, which have higher priority).

I just wrote some words about the Sleep() function in the thread Sleep Less Than One Millisecond. The characteristics of the Sleep() function depend on the underlying hardware and on the setting of the multimedia timer interface.
A Windows update may change the behavior, e.g. Windows 7 treats things differently compared to Vista. See my comment in that thread and its links to learn more about the Sleep() function.

Most likely the sleep timer has not enough resolution.
What kind of resolution do you get when you call the timeGetDevCaps function as explained in the documentation of the Sleep function?

Windows Sleep granularity is normally 16 ms; you get this unless your program or some other program changes it. If you got 1 ms granularity on some days and 16 ms on others, some other program probably changed the timer resolution (which affects your program too). I think that my LabVIEW, for example, does this.

How do I obtain CPU cycle count in Win32?

In Win32, is there any way to get a unique cpu cycle count or something similar that would be uniform for multiple processes/languages/systems/etc.
I'm creating some log files, but have to produce multiple logfiles because we're hosting the .NET runtime, and I'd like to avoid calling from one to the other to log. As such, I was thinking I'd just produce two files, combine them, and then sort them, to get a coherent timeline involving cross-world calls.
However, GetTickCount does not increase for every call, so that's not reliable. Is there a better number, so that I get the calls in the right order when sorting?
Edit: Thanks to @Greg, who put me on the track to QueryPerformanceCounter, which did the trick.

Here's an interesting article! It says not to use RDTSC, but to use QueryPerformanceCounter instead.
Conclusion:
Using regular old timeGetTime() to do timing is not reliable on many Windows-based operating systems because the granularity of the system timer can be as high as 10-15 milliseconds, meaning that timeGetTime() is only accurate to 10-15 milliseconds. [Note that the high granularities occur on NT-based operating systems like Windows NT, 2000, and XP. Windows 95 and 98 tend to have much better granularity, around 1-5 ms.]
However, if you call timeBeginPeriod(1) at the beginning of your program (and timeEndPeriod(1) at the end), timeGetTime() will usually become accurate to 1-2 milliseconds, and will provide you with extremely accurate timing information.
Sleep() behaves similarly; the length of time that Sleep() actually sleeps for goes hand-in-hand with the granularity of timeGetTime(), so after calling timeBeginPeriod(1) once, Sleep(1) will actually sleep for 1-2 milliseconds, Sleep(2) for 2-3, and so on (instead of sleeping in increments as high as 10-15 ms).
For higher precision timing (sub-millisecond accuracy), you'll probably want to avoid using the assembly mnemonic RDTSC because it is hard to calibrate; instead, use QueryPerformanceFrequency and QueryPerformanceCounter, which are accurate to less than 10 microseconds (0.00001 seconds).
For simple timing, both timeGetTime and QueryPerformanceCounter work well, and QueryPerformanceCounter is obviously more accurate. However, if you need to do any kind of "timed pauses" (such as those necessary for framerate limiting), you need to be careful of sitting in a loop calling QueryPerformanceCounter, waiting for it to reach a certain value; this will eat up 100% of your processor. Instead, consider a hybrid scheme, where you call Sleep(1) (don't forget timeBeginPeriod(1) first!) whenever you need to pass more than 1 ms of time, and then only enter the QueryPerformanceCounter 100%-busy loop to finish off the last < 1/1000th of a second of the delay you need. This will give you ultra-accurate delays (accurate to 10 microseconds), with very minimal CPU usage. See the code above.
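
The code the quoted article refers to is not reproduced here. As a stand-in, the hybrid scheme it describes could look roughly like the sketch below, which uses portable std::chrono in place of Sleep(1) and QueryPerformanceCounter. The function name hybrid_wait_until and the 2 ms default margin are assumptions; the margin needs to be larger than the worst overshoot of a 1 ms sleep on the target system, which on Windows depends on timeBeginPeriod(1) having been called.

```cpp
#include <chrono>
#include <thread>

// Waits until `deadline`. While more than `spin_margin` remains, the thread
// sleeps in 1 ms steps to keep CPU usage low; the final stretch is covered
// by a busy loop on the monotonic clock for precision.
void hybrid_wait_until(
    std::chrono::steady_clock::time_point deadline,
    std::chrono::microseconds spin_margin = std::chrono::milliseconds(2))
{
    using clock = std::chrono::steady_clock;

    // Coarse phase: cheap, but each sleep may overshoot.
    while (clock::now() + spin_margin < deadline) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }

    // Fine phase: burns CPU, but only for roughly `spin_margin`.
    while (clock::now() < deadline) {
        // Intentionally empty.
    }
}
```

For frame limiting, the caller would compute the next frame's deadline from the previous deadline rather than from the current time, so that a late frame does not delay all following ones.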

You can use the RDTSC CPU instruction (assuming x86). This instruction gives the CPU cycle counter, but be aware that it will increase very quickly to its maximum value, and then reset to 0. As the Wikipedia article mentions, you might be better off using the QueryPerformanceCounter function.

System.Diagnostics.Stopwatch.GetTimestamp() returns the number of CPU cycles since a time origin (maybe when the computer started, but I'm not sure), and I've never seen it fail to increase between 2 calls.
The CPU cycle count is specific to each computer, so you can't use it to merge log files from 2 computers.

RDTSC output may depend on the current core's clock frequency, which for modern CPUs is neither constant nor, in a multicore machine, consistent.
Use the system time, and if dealing with feeds from multiple systems use an NTP time source. You can get reliable, consistent time readings that way; if the overhead is too much for your purposes, using the HPET to work out time elapsed since the last known reliable time reading is better than using the HPET alone.

Use the GetTickCount and add another counter as you merge the log files. Won't give you perfect sequence between the different log files, but it will at least keep all logs from each file in the correct order.
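
One way to read this suggestion: tag each entry with its position in its own file, then merge on the coarse tick value with a stable merge so that entries sharing a tick keep their per-file order. The sketch below assumes each file's entries are already in write order with non-decreasing ticks; the types LogEntry and MergedEntry and the function merge_logs are invented for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

struct LogEntry {
    std::uint64_t tick;  // coarse timestamp, e.g. a GetTickCount-style value
    std::string text;
};

struct MergedEntry {
    std::uint64_t tick;
    int source;          // 0 = first file, 1 = second file
    std::size_t seq;     // position of the entry within its own file
    std::string text;
};

// Merges two logs by tick. std::merge is stable, so entries with equal ticks
// keep their original order within each file, and entries from `a` come
// before entries from `b` on a tie.
std::vector<MergedEntry> merge_logs(const std::vector<LogEntry>& a,
                                    const std::vector<LogEntry>& b)
{
    auto tag = [](const std::vector<LogEntry>& in, int source) {
        std::vector<MergedEntry> out;
        out.reserve(in.size());
        for (std::size_t i = 0; i < in.size(); ++i) {
            out.push_back(MergedEntry{in[i].tick, source, i, in[i].text});
        }
        return out;
    };
    const std::vector<MergedEntry> ta = tag(a, 0);
    const std::vector<MergedEntry> tb = tag(b, 1);

    std::vector<MergedEntry> merged;
    merged.reserve(ta.size() + tb.size());
    std::merge(ta.begin(), ta.end(), tb.begin(), tb.end(),
               std::back_inserter(merged),
               [](const MergedEntry& x, const MergedEntry& y) {
                   return x.tick < y.tick;
               });
    return merged;
}
```

As the answer says, this does not recover the true interleaving of two entries from different files that share a tick; it only guarantees that each file's own order survives the merge.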
