Is there a Windows equivalent of nanosleep? - winapi

Unix has a variety of sleep APIs (sleep, usleep, nanosleep). The only Win32 function I know of for sleep is Sleep(), which is in units of milliseconds.
I've seen that most sleeps, even on Unix, get rounded up significantly (i.e., typically to about 10 ms). I've also seen that on Solaris, if you run as root, you can get sub-10 ms sleeps, and I know this is possible on HP-UX as well, provided the fine-grained timers kernel parameter is enabled. Are finer-granularity timers available on Windows, and if so, what are the APIs?

The sad truth is that there is no good answer to this. Multimedia timers are probably the closest you can get -- they only let you set periods down to 1 ms, but (thanks to timeBeginPeriod) they do actually provide precision around 1 ms, where most of the others do only about 10-15 ms as a rule.
There are a lot of other candidates. At first glance, CreateWaitableTimer and SetWaitableTimer probably seem like the closest equivalent, since they're set in 100 ns intervals. Unfortunately, you can't really depend on anything close to that resolution, at least in my testing. In the long term, they probably offer the best possibility, since they at least let you specify a time of less than 1 ms, even though you can't currently depend on the implementation to provide (anywhere close to) that resolution.
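For reference, a minimal sketch of the waitable-timer approach using plain Win32 calls (the 1 ms request below is purely illustrative; as noted, the actual delay is usually longer):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Create an unnamed, auto-reset waitable timer. */
    HANDLE timer = CreateWaitableTimer(NULL, FALSE, NULL);
    if (!timer) return 1;

    /* A negative due time is a relative interval in 100-ns units:
       -10000 = 1 ms. You can request less, but don't expect to get it. */
    LARGE_INTEGER due;
    due.QuadPart = -10000;

    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);

    SetWaitableTimer(timer, &due, 0, NULL, NULL, FALSE);
    WaitForSingleObject(timer, INFINITE);

    QueryPerformanceCounter(&t1);
    printf("Slept for %lld microseconds\n",
           (t1.QuadPart - t0.QuadPart) * 1000000 / freq.QuadPart);

    CloseHandle(timer);
    return 0;
}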
NtDelayExecution seems to be roughly the same as SetWaitableTimer, except that it's undocumented. Unless you're set on using/testing undocumented functions, it seems to me that CreateWaitableTimer/SetWaitableTimer is the better choice just on the basis of being documented.
If you're using thread pools, you could try using CreateThreadPoolTimer and SetThreadPoolTimer instead. I haven't tested them enough to have any certainty about the resolution they really provide, but I'm not particularly optimistic.
Timer queues (CreateTimerQueue, CreateTimerQueueTimer, etc.) are what MS recommends as the replacement for multimedia timers, but (at least in my testing) they don't really provide much better resolution than Sleep.
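If you do want to try the timer-queue route, a minimal sketch looks like this (a one-shot timer whose callback signals an event; 1 ms is the smallest due time you can request, and in my experience the callback often arrives noticeably later):

#include <windows.h>
#include <stdio.h>

static HANDLE g_done;

/* Runs on a thread-pool thread when the timer fires. */
static VOID CALLBACK TimerCallback(PVOID param, BOOLEAN timerOrWaitFired)
{
    SetEvent(g_done);
}

int main(void)
{
    g_done = CreateEvent(NULL, TRUE, FALSE, NULL);

    HANDLE timer = NULL;
    /* Due time and period are in milliseconds; period 0 makes it one-shot. */
    CreateTimerQueueTimer(&timer, NULL, TimerCallback, NULL,
                          1, 0, WT_EXECUTEDEFAULT);

    WaitForSingleObject(g_done, INFINITE);

    /* INVALID_HANDLE_VALUE makes the delete wait for outstanding callbacks. */
    DeleteTimerQueueTimer(NULL, timer, INVALID_HANDLE_VALUE);
    CloseHandle(g_done);
    return 0;
}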

If you merely want resolution in the nanoseconds range, there's NtDelayExecution in ntdll.dll:
NTSYSAPI NTSTATUS NTAPI NtDelayExecution(BOOLEAN Alertable, PLARGE_INTEGER DelayInterval);
It measures time in 100-nanosecond intervals.
HOWEVER, this probably isn't what you want:
It can delay for much longer than that—as long as a thread time slice (0.5 - 15ms) or two.
Here's code you can use to observe this:
#ifdef __cplusplus
extern "C" {
#endif
#ifdef _M_X64
typedef long long intptr_t;
#else
typedef int intptr_t;
#endif
int __cdecl printf(char const *, ...);
int __cdecl _unloaddll(intptr_t);
intptr_t __cdecl _loaddll(char *);
int (__cdecl * __cdecl _getdllprocaddr(intptr_t, char *, intptr_t))(void);
typedef union _LARGE_INTEGER *PLARGE_INTEGER;
typedef long NTSTATUS;
typedef NTSTATUS __stdcall NtDelayExecution_t(unsigned char Alertable, PLARGE_INTEGER Interval); NtDelayExecution_t *NtDelayExecution = 0;
typedef NTSTATUS __stdcall NtQueryPerformanceCounter_t(PLARGE_INTEGER PerformanceCounter, PLARGE_INTEGER PerformanceFrequency); NtQueryPerformanceCounter_t *NtQueryPerformanceCounter = 0;
#ifdef __cplusplus
}
#endif
int main(int argc, char *argv[]) {
    long long delay = 1 * -(1000 / 100) /* relative 100-ns intervals */, counts_per_sec = 0;
    long long counters[2];
    intptr_t ntdll = _loaddll("ntdll.dll");
    NtDelayExecution = (NtDelayExecution_t *)_getdllprocaddr(ntdll, "NtDelayExecution", -1);
    NtQueryPerformanceCounter = (NtQueryPerformanceCounter_t *)_getdllprocaddr(ntdll, "NtQueryPerformanceCounter", -1);
    for (int i = 0; i < 10; i++) {
        NtQueryPerformanceCounter((PLARGE_INTEGER)&counters[0], (PLARGE_INTEGER)&counts_per_sec);
        NtDelayExecution(0, (PLARGE_INTEGER)&delay);
        NtQueryPerformanceCounter((PLARGE_INTEGER)&counters[1], (PLARGE_INTEGER)&counts_per_sec);
        printf("Slept for %lld microseconds\n", (counters[1] - counters[0]) * 1000000 / counts_per_sec);
    }
    return 0;
}
My output:
Slept for 9455 microseconds
Slept for 15538 microseconds
Slept for 15401 microseconds
Slept for 15708 microseconds
Slept for 15510 microseconds
Slept for 15520 microseconds
Slept for 1248 microseconds
Slept for 996 microseconds
Slept for 984 microseconds
Slept for 1010 microseconds

The MinGW answer in long form:
MinGW and Cygwin provide a nanosleep() implementation under <pthread.h>. Source code:
In Cygwin and MSYS2: signal.cc and cygwait.cc (LGPLv3+; with linking exception)
This is based on NtCreateTimer and WaitForMultipleObjects.
In MinGW-W64: nanosleep.c and thread.c (Zope Public License)
This is based on WaitForSingleObject and Sleep.
In addition, gnulib (GPLv3+) has a higher-precision implementation in nanosleep.c. This performs a busy-loop over QueryPerformanceCounter for short (<1s) intervals and Sleep for longer intervals.
The usual timeBeginPeriod trick that ethanpil linked to works with all the underlying NT timers.
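Usage is the same as on Unix; a minimal sketch, assuming a MinGW-w64 or Cygwin toolchain where <pthread.h> declares nanosleep():

#include <pthread.h>   /* pulls in nanosleep() on MinGW-w64 / Cygwin */
#include <time.h>
#include <stdio.h>

int main(void)
{
    /* Request 100 microseconds; expect the usual rounding up to the
       underlying timer resolution unless timeBeginPeriod is in effect. */
    struct timespec req = { 0, 100 * 1000 };   /* seconds, nanoseconds */
    if (nanosleep(&req, NULL) != 0)
        perror("nanosleep");
    return 0;
}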

Windows provides Multimedia timers that are higher resolution than Sleep(). The actual resolution supported by the OS can be obtained at runtime.
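For example, timeGetDevCaps reports the minimum and maximum periods the multimedia timer supports, and timeBeginPeriod/timeEndPeriod request a specific period (link with winmm.lib):

#include <windows.h>
#include <stdio.h>
#pragma comment(lib, "winmm.lib")   /* timeGetDevCaps, timeBeginPeriod */

int main(void)
{
    TIMECAPS tc;
    if (timeGetDevCaps(&tc, sizeof(tc)) == TIMERR_NOERROR) {
        printf("Supported timer period: %u ms (min) to %u ms (max)\n",
               tc.wPeriodMin, tc.wPeriodMax);

        /* Request the finest period for the duration of the timing work. */
        timeBeginPeriod(tc.wPeriodMin);
        Sleep(1);   /* now typically wakes after ~1 ms instead of ~15 ms */
        timeEndPeriod(tc.wPeriodMin);
    }
    return 0;
}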

You may want to look into
timeBeginPeriod / timeEndPeriod
and/or
QueryPerformanceCounter
See here for more information: http://www.geisswerks.com/ryan/FAQS/timing.html
particularly the section towards the bottom: High-precision 'Sleeps'
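The technique described there amounts to sleeping for the bulk of the interval and spinning on QueryPerformanceCounter for the remainder. A rough sketch (the 2 ms margin is an arbitrary safety value; link with winmm.lib for timeBeginPeriod):

#include <windows.h>
#pragma comment(lib, "winmm.lib")

/* Sleep for approximately the requested number of microseconds:
   a coarse Sleep() for most of the interval, then a busy-wait on
   QueryPerformanceCounter for the tail end. */
static void sleep_us(long long microseconds)
{
    LARGE_INTEGER freq, start, now;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);
    long long target = start.QuadPart + microseconds * freq.QuadPart / 1000000;

    timeBeginPeriod(1);                        /* ask for ~1 ms scheduler granularity */
    long long ms = microseconds / 1000 - 2;    /* leave ~2 ms of margin to spin away */
    if (ms > 0)
        Sleep((DWORD)ms);
    timeEndPeriod(1);

    do {
        QueryPerformanceCounter(&now);
    } while (now.QuadPart < target);
}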

Yeah, there is: the MinGW compiler provides it under <pthread.h>.

Related

How do I instantiate a boost::date_time::time_duration with nanoseconds?

I feel like I'm missing something obvious here. I can easily make a time_duration with milliseconds, microseconds, or seconds by doing:
time_duration t1 = seconds(1);
time_duration t2 = milliseconds(1000);
time_duration t3 = microseconds(1000000);
But there's no function for nanoseconds. What's the trick to converting a plain integer nanoseconds value to a time_duration?
I'm on amd64 architecture on Debian Linux. Boost version 1.55.
boost::posix_time::microseconds is actually subsecond_duration<boost::posix_time::time_duration, 1000000>. So...
#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>

using nanoseconds = boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000000000>;

int main() {
    boost::posix_time::time_duration t = nanoseconds(1000000000);
    std::cout << t << "\n";
}
Prints
00:00:01
UPDATE
Indeed, in the Compile Options for the Boost DateTime library you can see that there's an option to select nanosecond resolution:
By default the posix_time system uses a single 64 bit integer
internally to provide a microsecond level resolution. As an
alternative, a combination of a 64 bit integer and a 32 bit integer
(96 bit resolution) can be used to provide nano-second level
resolutions. The default implementation may provide better performance
and more compact memory usage for many applications that do not
require nano-second resolutions.
To use the alternate resolution (96 bit nanosecond) the variable
BOOST_DATE_TIME_POSIX_TIME_STD_CONFIG must be defined in the library
users project files (ie Makefile, Jamfile, etc). This macro is not
used by the Gregorian system and therefore has no effect when building
the library.
Indeed, you can check it using:
Live On Coliru
#define BOOST_DATE_TIME_POSIX_TIME_STD_CONFIG
#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>

int main() {
    using namespace boost::posix_time;
    std::cout << nanoseconds(1000000000) << "\n";
}

Slow performance with injected library OS X DYLD_INSERT_LIBRARIES

I wrote a small library that overrides the system library function gettimeofday() with a slightly different one that returns the time minus a hardcoded offset.
I inject it with DYLD_INSERT_LIBRARIES into the application (an old 3D tool that crashes on dates > 2014). However, with the library injected, performance is roughly half of what it is without it.
I don't really know a lot about how OSX and dynamic libraries work, I just got that working after reading a lot of tutorials and trial and error.
I can't believe the performance drop is caused by just adding that tiny code change but I read it might have something to do with namespace flattening, which I don't understand.
Is there anything I can do to get rid of the performance overhead?
#include <sys/time.h>
//compile:
//clang -c interpose.c
//clang -dynamiclib -o interpose.dylib -install_name interpose.dylib interpose.o
#define DYLD_INTERPOSE(_replacment,_replacee) \
__attribute__((used)) static struct{ const void* replacment; const void* replacee; } _interpose_##_replacee \
__attribute__ ((section ("__DATA,__interpose"))) = { (const void*)(unsigned long)&_replacment, (const void*)(unsigned long) &_replacee };
int mygettimeofday(struct timeval *restrict tp, void *restrict tzp);
DYLD_INTERPOSE(mygettimeofday, gettimeofday)
int mygettimeofday(struct timeval *restrict tp, void *restrict tzp)
{
    int ret = gettimeofday(tp, tzp);
    tp->tv_sec -= 123123;
    return ret;
}
Edit:
I also wrote a tiny test tool that does nothing but call gettimeofday() 100000000 times, and ran it with and without the lib. Without it consistently takes 2.9 seconds, with library injected it takes 3.0 seconds, so the extra code in mygettimeofday should not be the issue.

Why can't get process id that more than 65535 by 'ntQuerySystemInformation' in Win7 64bit?

I used NtQuerySystemInformation to get all the handle information, like this:
NtQuerySystemInformation(SystemHandleInformation, pHandleInfor, ulSize,NULL);//SystemHandleInformation = 16
The struct pointed to by pHandleInfor is:
typedef struct _SYSTEM_HANDLE_INFORMATION
{
    ULONG ProcessId;
    UCHAR ObjectTypeNumber;
    UCHAR Flags;
    USHORT Handle;
    PVOID Object;
    ACCESS_MASK GrantedAccess;
} SYSTEM_HANDLE_INFORMATION, *PSYSTEM_HANDLE_INFORMATION;
It works well on XP 32-bit, but on Win7 64-bit it only returns correct PIDs for values less than 65535. The type of ProcessId in this struct is ULONG, so I would expect it to hold values greater than 65535. What's wrong? Is there another API I can use instead?
There are two enum values for NtQuerySystemInformation to get handle info:
CNST_SYSTEM_HANDLE_INFORMATION = 16
CNST_SYSTEM_EXTENDED_HANDLE_INFORMATION = 64
And correspondingly two structs: SYSTEM_HANDLE_INFORMATION and SYSTEM_HANDLE_INFORMATION_EX.
The definitions for these structs are:
struct SYSTEM_HANDLE_INFORMATION
{
    short UniqueProcessId;
    short CreatorBackTraceIndex;
    char ObjectTypeIndex;
    char HandleAttributes;    // 0x01 = PROTECT_FROM_CLOSE, 0x02 = INHERIT
    short HandleValue;
    size_t Object;
    int GrantedAccess;
};
struct SYSTEM_HANDLE_INFORMATION_EX
{
    size_t Object;
    size_t UniqueProcessId;
    size_t HandleValue;
    int GrantedAccess;
    short CreatorBackTraceIndex;
    short ObjectTypeIndex;
    int HandleAttributes;
    int Reserved;
};
As you can see, the first struct really can only hold 16-bit process IDs...
See for example ProcessExplorer project's source file ntexapi.h for more information.
Note also that the field widths for SYSTEM_HANDLE_INFORMATION_EX in my struct definitions might be different from theirs (that is, in my definition some field widths vary depending on the bitness), but I think I tested the code both under 32-bit and 64-bit and found it to be correct.
Please recheck if necessary and let us know if you have additional info.
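A minimal sketch of querying the extended class, based on the definitions above. The helper type names, the growing-buffer loop, and in particular the buffer header layout (a handle count followed by the entry array) are my assumptions from commonly published reverse-engineered headers; verify them against something like ntexapi.h before relying on this:

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {            /* per-entry layout as in SYSTEM_HANDLE_INFORMATION_EX above */
    PVOID     Object;
    ULONG_PTR UniqueProcessId;
    ULONG_PTR HandleValue;
    ULONG     GrantedAccess;
    USHORT    CreatorBackTraceIndex;
    USHORT    ObjectTypeIndex;
    ULONG     HandleAttributes;
    ULONG     Reserved;
} HANDLE_ENTRY_EX;

typedef struct {            /* assumed buffer header: count, padding, then the entries */
    ULONG_PTR       NumberOfHandles;
    ULONG_PTR       Reserved;
    HANDLE_ENTRY_EX Handles[1];
} HANDLE_INFO_EX;

typedef LONG (NTAPI *NtQuerySystemInformation_t)(ULONG, PVOID, ULONG, PULONG);

int main(void)
{
    NtQuerySystemInformation_t query = (NtQuerySystemInformation_t)
        GetProcAddress(GetModuleHandleW(L"ntdll.dll"), "NtQuerySystemInformation");

    ULONG size = 1 << 20;
    HANDLE_INFO_EX *info = NULL;
    LONG status;
    do {                                        /* grow the buffer until it fits */
        free(info);
        info = (HANDLE_INFO_EX *)malloc(size);
        status = query(64 /* CNST_SYSTEM_EXTENDED_HANDLE_INFORMATION */, info, size, NULL);
        size *= 2;
    } while (status == (LONG)0xC0000004L);      /* STATUS_INFO_LENGTH_MISMATCH */

    if (status == 0) {
        for (ULONG_PTR i = 0; i < info->NumberOfHandles; i++)
            printf("pid %llu handle 0x%llx\n",
                   (unsigned long long)info->Handles[i].UniqueProcessId,
                   (unsigned long long)info->Handles[i].HandleValue);
    }
    free(info);
    return 0;
}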
From Raymond Chen's article Processes, commit, RAM, threads, and how high can you go?:
I later learned that the Windows NT folks do try to keep the numerical values of process ID from getting too big. Earlier this century, the kernel team experimented with letting the numbers get really huge, in order to reduce the rate at which process IDs get reused, but they had to go back to small numbers, not for any technical reasons, but because people complained that the large process IDs looked ugly in Task Manager. (One customer even asked if something was wrong with his computer.)

When (if ever) does QueryPerformanceCounter() get reset to zero?

I can't find any clear indication of if/when the 64-bit value returned by QueryPerformanceCounter() gets reset, or overflows and resets back to zero. Hopefully it never overflows because the 64 bits gives space for decades worth of counting at gigahertz rates. However... is there anything other than a computer restart that will reset it?
Empirically, QPC is reset at system startup.
Note that you should not depend on this behavior, since Microsoft do not explicitly state what the "zero point" is for QPC, merely that it is a monotonically increasing value (mod 2^64) that can be used for high precision timing.
Hence they are quite within their rights to modify its behavior at any time. They could, for example, make it return values that match the FILETIME values produced by a call to GetSystemTimeAsFileTime(), with the same resolution (a 100 ns tick rate). Under those circumstances, it would never reset. At least not in your or my lifetimes.
That said, the following program when run on Windows 10 [Version 6.3.16299] produces pairs of identical values that are the system uptime in seconds.
#include <windows.h>
#include <iostream>
#pragma comment(lib, "winmm.lib")   // timeGetTime()

int main()
{
    LARGE_INTEGER performanceCount;
    LARGE_INTEGER performanceFrequency;
    QueryPerformanceFrequency(&performanceFrequency);
    for (;;)
    {
        QueryPerformanceCounter(&performanceCount);
        DWORD const systemTicks = timeGetTime();
        DWORD const systemSeconds = systemTicks / 1000;
        __int64 const performanceSeconds = performanceCount.QuadPart / performanceFrequency.QuadPart;
        std::cout << systemSeconds << " " << performanceSeconds << std::endl;
        Sleep(1000);
    }
    return 0;
}
Standard disclaimers apply, your actual mileage may vary, etc. etc. etc.
It seems that some Windows running inside VirtualBox may reset QueryPerformanceCounter every 20 minutes or so: see here.
QPC has become more reliable over time, but for better portability a lower-precision timer such as GetTickCount64 should be used.

Retrieving boot time using GetTickCount64

I'm trying to extract the boot time by getting the current time in a SYSTEMTIME structure, converting it to a FILETIME, converting that to a ULARGE_INTEGER, subtracting GetTickCount64() from it, and then converting everything back to a SYSTEMTIME.
I'm comparing this function to 'NET STATISTICS WORKSTATION', and for some reason my output is off by several hours, which doesn't seem to match any timezone difference.
Here's visual studio example code:
#include "stdafx.h"
#include <windows.h>
#include <tchar.h>
#include <strsafe.h>
#define KILOBYTE 1024
#define BUFF KILOBYTE
int _tmain(int argc, _TCHAR* argv[])
{
    ULARGE_INTEGER ticks, ftime;
    SYSTEMTIME current, final;
    FILETIME ft, fout;
    OSVERSIONINFOEX osvi;
    char output[BUFF];
    int retval = 0;
    ZeroMemory(&osvi, sizeof(OSVERSIONINFOEX));
    ZeroMemory(&final, sizeof(SYSTEMTIME));
    GetVersionEx((OSVERSIONINFO *) &osvi);
    if (osvi.dwBuildNumber >= 6000) ticks.QuadPart = GetTickCount64();
    else ticks.QuadPart = GetTickCount();
    //Convert milliseconds to 100-nanosecond time intervals
    ticks.QuadPart = ticks.QuadPart * 10000;
    //GetLocalTime(&current); -- //doesn't really fix the problem
    GetSystemTime(&current);
    SystemTimeToFileTime(&current, &ft);
    printf("INITIAL: Filetime lowdatetime %u, highdatetime %u\r\n", ft.dwLowDateTime, ft.dwHighDateTime);
    ftime.LowPart = ft.dwLowDateTime;
    ftime.HighPart = ft.dwHighDateTime;
    //subtract boot time interval from current time
    ftime.QuadPart = ftime.QuadPart - ticks.QuadPart;
    //Convert ULARGE_INT back to FILETIME
    fout.dwLowDateTime = ftime.LowPart;
    fout.dwHighDateTime = ftime.HighPart;
    printf("FINAL: Filetime lowdatetime %u, highdatetime %u\r\n", fout.dwLowDateTime, fout.dwHighDateTime);
    //Convert FILETIME back to system time
    retval = FileTimeToSystemTime(&fout, &final);
    printf("Return value is %d\r\n", retval);
    printf("Current time %d-%.2d-%.2d %.2d:%.2d:%.2d\r\n", current.wYear, current.wMonth, current.wDay, current.wHour, current.wMinute, current.wSecond);
    printf("Return time %d-%.2d-%.2d %.2d:%.2d:%.2d\r\n", final.wYear, final.wMonth, final.wDay, final.wHour, final.wMinute, final.wSecond);
    return 0;
}
I ran it and found that it works correctly when using GetLocalTime as opposed to GetSystemTime, which is expressed in UTC. So it would make sense that GetSystemTime would not necessarily match the "clock" on the PC.
Other than that, though, the issue could possibly be the call to GetVersionEx. As written, I think it will always return zeros for all values. You need this line prior to calling it:
osvi.dwOSVersionInfoSize = sizeof( osvi );
Otherwise that dwBuildNumber will be zero and it will call GetTickCount, which is only good for 49 days or so. On the other hand, if that were the case, I think you would get a result with a much larger difference.
I'm not completely sure that (as written) the check is necessary to choose which tick count function to call. If GetTickCount64 doesn't exist, the app would not load due to the missing entrypoint (unless perhaps delay loading was used ... I'm not sure in that case). I believe that it would be necessary to use LoadLibrary and GetProcAddress to make the decision dynamically between those two functions and have it work on an older platform.
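A sketch of that dynamic approach, resolving GetTickCount64 at runtime and falling back to GetTickCount where it isn't exported:

#include <windows.h>

typedef ULONGLONG (WINAPI *GetTickCount64_t)(void);

/* Returns the tick count in milliseconds, using GetTickCount64 when the
   running OS exports it (Vista and later) and GetTickCount otherwise. */
static ULONGLONG GetTicksCompat(void)
{
    GetTickCount64_t pGetTickCount64 = (GetTickCount64_t)
        GetProcAddress(GetModuleHandleW(L"kernel32.dll"), "GetTickCount64");
    if (pGetTickCount64)
        return pGetTickCount64();
    return GetTickCount();   /* 32-bit; wraps after about 49.7 days */
}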
