How do I create a monotonic clock on Windows that doesn't tick during suspend?


I'm looking for a way to obtain a guaranteed-monotonic clock which excludes time spent during suspend, just like POSIX CLOCK_MONOTONIC.
Solutions requiring Windows 7 (or later) are acceptable.
Here's an example of something that doesn't work:
LONGLONG suspendTime, uiTime1, uiTime2;
do {
// Estimate suspend time as tick count (ms converted to 100 ns units) minus
// unbiased interrupt time; retry if the unbiased time changed in between.
QueryUnbiasedInterruptTime((ULONGLONG*)&uiTime1);
suspendTime = GetTickCount64()*10000 - uiTime1;
QueryUnbiasedInterruptTime((ULONGLONG*)&uiTime2);
} while (uiTime1 != uiTime2);
static LONGLONG firstSuspend = suspendTime;
static LONGLONG lastSuspend = suspendTime;
assert(suspendTime >= lastSuspend); // fires: the estimate sometimes goes backwards
lastSuspend = suspendTime;
LARGE_INTEGER now;
QueryPerformanceCounter(&now);
static LONGLONG firstQpc = now.QuadPart;
// qpcFreqNumer/qpcFreqDenom (defined elsewhere) convert QPC ticks to 100 ns units
return (now.QuadPart - firstQpc)*qpcFreqNumer/qpcFreqDenom -
(suspendTime - firstSuspend);
The problem with this (my first attempt) is that GetTickCount64 only ticks every 15ms, whereas QueryUnbiasedInterruptTime seems to tick a little more often, so every now and then my method observes the suspend time going back by a little.
I've also tried using CallNtPowerInformation, but it's not clear how to use those values either to get a nice, race-free measure of suspend time.
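A crude workaround, not a fix for the underlying race, is to clamp the value the function finally returns so it can never be smaller than the previous result. The sketch below is portable C with hypothetical names (`monotonic_clamp`, `clamp_monotonic`); on Windows it would wrap the final `return` expression above. It hides small backward steps but does not make the suspend-time estimate itself any more accurate.

```c
#include <stdint.h>

/* Hypothetical helper state: remembers the largest value returned so far.
   Not thread-safe; use one instance per thread or add locking. */
typedef struct {
    int64_t last;      /* largest value returned so far */
    int     has_last;  /* 0 until the first call */
} monotonic_clamp;

/* Returns value, or the previous result if value would go backwards. */
static int64_t clamp_monotonic(monotonic_clamp *state, int64_t value)
{
    if (state->has_last && value < state->last)
        value = state->last;     /* hide the backward step */
    state->last = value;
    state->has_last = 1;
    return value;
}
```

Clamping the final result rather than the suspend-time estimate matters: a clamped suspend estimate can still jump up and make the returned time step backwards.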

The suspend bias time is available in kernel mode (_KUSER_SHARED_DATA.QpcBias in ntddk.h). A read-only copy is available in user mode.
#include <nt.h>
#include <ntrtl.h>
#include <nturtl.h>
LONGLONG suspendTime, uiTime1, uiTime2;
QueryUnbiasedInterruptTime((ULONGLONG*)&uiTime1);
uiTime1 -= USER_SHARED_DATA->QpcBias; // subtract off the suspend bias

The full procedure for calculating monotonic time, which does not tick during suspend, is as follows:
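#include <windows.h>
#include <stdint.h>
#include <algorithm> // std::max
#include <utility>   // std::pair
typedef struct _KSYSTEM_TIME {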
ULONG LowPart;
LONG High1Time;
LONG High2Time;
} KSYSTEM_TIME;
#define KUSER_SHARED_DATA 0x7ffe0000
#define InterruptTime ((KSYSTEM_TIME volatile*)(KUSER_SHARED_DATA + 0x08))
#define InterruptTimeBias ((ULONGLONG volatile*)(KUSER_SHARED_DATA + 0x3b0))
static LONGLONG readInterruptTime() {
// Reading the InterruptTime from KUSER_SHARED_DATA is much better than
// using GetTickCount() because it doesn't wrap, and is even a little quicker.
// This works on all Windows NT versions (NT4 and up).
LONG timeHigh;
ULONG timeLow;
do {
timeHigh = InterruptTime->High1Time;
timeLow = InterruptTime->LowPart;
} while (timeHigh != InterruptTime->High2Time);
LONGLONG now = ((LONGLONG)timeHigh << 32) + timeLow;
static LONGLONG d = now;
return now - d;
}
static LONGLONG scaleQpc(LONGLONG qpc) {
// We do the actual scaling in fixed-point rather than floating, to make sure
// that we don't violate monotonicity due to rounding errors. There's no
// need to cache QueryPerformanceFrequency().
LARGE_INTEGER frequency;
QueryPerformanceFrequency(&frequency);
double fraction = 10000000/double(frequency.QuadPart);
LONGLONG denom = 1024;
LONGLONG numer = std::max(1LL, (LONGLONG)(fraction*denom + 0.5));
return qpc * numer / denom;
}
static ULONGLONG readUnbiasedQpc() {
// We remove the suspend bias added to QueryPerformanceCounter results by
// subtracting the interrupt time bias, which is not strictly speaking legal,
// but the units are correct and I think it's impossible for the resulting
// "unbiased QPC" value to go backwards.
LONGLONG interruptTimeBias, qpc;
do {
interruptTimeBias = *InterruptTimeBias;
LARGE_INTEGER counter;
QueryPerformanceCounter(&counter);
qpc = counter.QuadPart;
} while (interruptTimeBias != *InterruptTimeBias);
static std::pair<LONGLONG,LONGLONG> d(qpc, interruptTimeBias);
return scaleQpc(qpc - d.first) - (interruptTimeBias - d.second);
}
/// getMonotonicTime() returns the time elapsed since the application's first
/// call to getMonotonicTime(), in 100ns units. The values returned are
/// guaranteed to be monotonic. The time ticks in 15ms resolution and advances
/// during suspend on XP and Vista, but we manage to avoid this on Windows 7
/// and 8, which also use a high-precision timer. The time does not wrap after
/// 49 days.
uint64_t getMonotonicTime()
{
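OSVERSIONINFOEX ver = { sizeof(OSVERSIONINFOEX), };
// GetVersionEx takes an OSVERSIONINFO*, so the EX structure needs a cast. It is
// deprecated, and on Windows 8.1+ it may report 6.2, which still counts as 7 or later.
GetVersionEx((OSVERSIONINFO*)&ver);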
bool win7OrLater = (ver.dwMajorVersion > 6 ||
(ver.dwMajorVersion == 6 && ver.dwMinorVersion >= 1));
// On Windows XP and earlier, QueryPerformanceCounter is not monotonic so we
// steer well clear of it; on Vista, it's just a bit slow.
return win7OrLater ? readUnbiasedQpc() : readInterruptTime();
}
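A swapped-in alternative to the 1024-denominator approximation in scaleQpc, sketched under the assumption that an exact integer conversion is wanted: split the counter into whole seconds and a remainder. `counter_to_100ns` is a hypothetical name; on Windows `counter` and `frequency` would come from QueryPerformanceCounter and QueryPerformanceFrequency, but here they are plain integers so the arithmetic can be checked on any platform.

```c
#include <stdint.h>

/* Converts a performance-counter reading to 100 ns units using only integer
   arithmetic. Splitting into whole seconds and a remainder keeps the
   intermediate product small: remainder * 10000000 < frequency * 10000000,
   which fits in 64 bits for any frequency below about 1.8e12 Hz. */
static uint64_t counter_to_100ns(uint64_t counter, uint64_t frequency)
{
    const uint64_t units_per_second = 10000000u; /* 100 ns units per second */
    uint64_t whole_seconds = counter / frequency;
    uint64_t remainder     = counter % frequency;
    return whole_seconds * units_per_second
         + remainder * units_per_second / frequency;
}
```

The result is exactly floor(counter * 10000000 / frequency), so there is no rounding error to accumulate, and it never decreases as `counter` increases, including across whole-second boundaries.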

Related

Why does clock() return -1 in C?

I'm trying to implement an error handler using the clock() function from the "time.h" library. The code runs inside an embedded system (Colibri IMX7 - M4 Processor). The function is used to monitor a current value within a specific range; if the value of the current isn't correct, the function should return an error message.
The function checks whether the error is occurring. On the first run it saves the first appearance of the error in a clock_t as a reference, and on the next runs, if the error is still there, it compares the current time from clock() with the previous reference to see whether more than a specific time has passed.
The problem is that the function clock() always returns -1. What should I do to avoid that? Also, why can't I declare a clock_t variable as static (e.g. static clock_t start_t = clock();)?
Please see below the function:
bool CrossLink_check_error_LED_UV_current_clock(int current_state, int current_at_LED_UV)
{
bool has_LED_UV_current_deviated = false;
static int current_number_of_errors_Current_LED_CANNON = 0;
clock_t startTimeError = clock();
const int maximum_operational_current_when_on = 2000;
const int minimum_turned_on_LED_UV_current = 45;
if( (current_at_LED_UV > maximum_operational_current_when_on)
||(current_state!=STATE_EMITTING && (current_at_LED_UV > minimum_turned_on_LED_UV_current))
||(current_state==STATE_EMITTING && (current_at_LED_UV < minimum_turned_on_LED_UV_current)) ){
current_number_of_errors_Current_LED_CANNON++;
if(current_number_of_errors_Current_LED_CANNON > 1) {
if (clock() - startTimeError > 50000){ // 50ms
has_LED_UV_current_deviated = true;
PRINTF("current_at_LED_UV: %d", current_at_LED_UV);
if(current_state==STATE_EMITTING){
PRINTF(" at state emitting");
}
PRINTF("\n\r");
}
}else{
if(startTimeError == -1){
startTimeError = clock();
}
}
}else{
startTimeError = 0;
current_number_of_errors_Current_LED_CANNON = 0;
}
return has_LED_UV_current_deviated;
}
Edit: I forgot to mention this before, but we are using the GCC 9.3.1 arm-none-eabi compiler with CMake to build the executable. We have an embedded system (Colibri IMX7 made by Toradex) that consists of 2 A7 processors that run our Linux (the more visual interface), and the program that controls our device runs on an M4 processor without an OS, just pure bare metal.

For a lot of provided functions in the c standard library, if you have the documentation installed (usually it gets installed with the compiler), you can view documentation using the man command in the shell. With man clock, it tells me that:
NAME
clock - determine processor time
SYNOPSIS
#include <time.h>
clock_t clock(void);
DESCRIPTION
The clock() function returns an approximation of processor time used by the program.
RETURN VALUE
The value returned is the CPU time used so far as a clock_t; to get the number of seconds used, divide by
CLOCKS_PER_SEC. If the processor time used is not available or its value cannot be represented, the function
returns the value (clock_t) -1.
etc.
This tells us that -1 means that the processor time (CLOCK_PROCESS_CPUTIME_ID) is unavailable. On a system that provides POSIX clocks, the solution is to use CLOCK_MONOTONIC instead, which we can select with clock_gettime.
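#include <stdio.h>
#include <stdlib.h>
#include <time.h>
struct timespec clock_time;
if (clock_gettime(CLOCK_MONOTONIC, &clock_time) != 0) {
printf("CLOCK_MONOTONIC is unavailable!\n");
exit(1);
}
// tv_sec is a time_t, so cast it to a type printf knows
printf("Seconds: %lld Nanoseconds: %ld\n", (long long)clock_time.tv_sec, clock_time.tv_nsec);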
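To apply this to the 50 ms check in the question, one could record a CLOCK_MONOTONIC reading when the error first appears and compare later readings against it. A minimal sketch, assuming a platform that provides POSIX clock_gettime (a bare-metal target may not); `elapsed_ms` is a hypothetical helper name:

```c
#include <stdint.h>
#include <time.h>

/* Returns end - start in whole milliseconds (truncated). Both readings are
   assumed to come from clock_gettime(CLOCK_MONOTONIC, ...), so end >= start. */
static int64_t elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    /* Work in nanoseconds first so a negative tv_nsec difference is handled. */
    int64_t total_ns = ((int64_t)end->tv_sec - (int64_t)start->tv_sec) * 1000000000LL
                     + ((int64_t)end->tv_nsec - (int64_t)start->tv_nsec);
    return total_ns / 1000000;
}
```

Usage would look like `if (elapsed_ms(&error_start, &now) > 50) { ... }`, with `error_start` kept in a static variable.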

To answer the second part of your question:
static clock_t start_time = clock();
is not allowed because the return value of the function clock() is not known until runtime, but in C the initializer of a static variable must be a compile-time constant.
You can write:
static clock_t start_time = 0;
if (start_time == 0)
{
start_time = clock();
}
But this may or may not be suitable to use in this case, depending on whether zero is a legitimate return value of the function. If it could be, you would need something like:
static bool start_time_initialized = false;
static clock_t start_time;
if (!start_time_initialized)
{
start_time_initialized = true;
start_time = clock();
}
The above is reliable only if you cannot have two copies of this function running at once (it is not re-entrant).
If you have a POSIX library available, you could use a pthread_once_t to do the same as the above bool but in a thread-safe way. See man pthread_once for details.
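A minimal sketch of that pthread_once version, assuming a POSIX threads implementation is available (on the bare-metal M4 from the question it is not); the names `start_time_once`, `init_start_time`, and `get_start_time` are hypothetical:

```c
#include <pthread.h>
#include <time.h>

static pthread_once_t start_time_once = PTHREAD_ONCE_INIT;
static clock_t start_time;

/* Runs exactly once, no matter how many threads call get_start_time(). */
static void init_start_time(void)
{
    start_time = clock();
}

/* Returns the clock() value captured by the first call. */
static clock_t get_start_time(void)
{
    pthread_once(&start_time_once, init_start_time);
    return start_time;
}
```

Every call to get_start_time() returns the value recorded by the first call, and no boolean flag is needed.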
Note that C++ allows more complicated options in this area, but you have asked about C.
Note also that abbreviating "start time" as start_t is a very bad idea, because the suffix _t means "type" and should only be used for type names.

In the end, the problem was that since we are running our code on bare metal, the clock() function wasn't working. We ended up using an internal timer that we found on the M4 processor, so now everything is fine. Thanks for the answers.

Cannot get OpenAL to play sound

I've searched the net, and I've searched here. I've found code that I could compile and that works fine, but for some reason my code won't produce any sound. I'm porting an old game to the PC (Windows), and I'm trying to make it as authentic as possible, so I want to use generated waveforms. I've pretty much copied and pasted the working code (only adding multiple voices), and it still won't work (even though the exact same code for a single voice works fine). I know I'm missing something obvious, but I just cannot figure out what. Any help would be appreciated. Thank you.
First, some notes. I was looking for something that would allow me to use the original methodology. The original system used paired bytes for music (sound effects, of which there were only 2, were handled in code): a time byte that counted down every time the routine was called, and a note byte that was played until the time reached zero. This was done by patching into the interrupt vector. Windows doesn't allow that, so I set up a timer routine that accomplishes the same thing. The timer kicks in, updates the display, and then runs the music sequence. I set this up with a defined time so that I only have one place to adjust the timing (to get it as close as possible to the original sequence).
The music is a generated waveform (I've double-checked the math, and even examined the generated data in debug mode), and it looks good. The sequence looks good, but doesn't actually produce sound.
I tried SDL2 first, and its method of only playing one sound doesn't work for me. Also, unless I make the sample duration extremely short (and the sound produced this way is awful), I can't match the timing (it plays the entire sample through its own interrupt without letting me make adjustments). Blending the 3 voices together (when they all run with different timings) is also a mess. Most of the other engines I examined work in much the same way: they want to use their own callback interrupt and won't let me tweak it appropriately. This is why I started working with OpenAL. It allows multiple voices (sources) and lets me set the timings myself. On advice from several forums, I set it up so that the sample lengths are all multiples of full cycles.
Anyway, here's the code.
int main(int argc, char* argv[])
{
FreeConsole(); //Get rid of the DOS console, don't need it
if (InitLog() < 0) return -1; //Start logging
UINT_PTR tim = NULL;
SDL_Event event;
InitVideo(false); //Set to window for now, will put options in later
curmusic = 5;
InitAudio();
SetTimer(NULL,tim,_FREQ_,TimerProc);
SDL_PollEvent(&event);
while (event.type != SDL_KEYDOWN) SDL_PollEvent(&event);
SDL_Quit();
return 0;
}
void CALLBACK TimerProc(HWND hWind, UINT Msg, UINT_PTR idEvent, DWORD dwTime)
{
RenderOutput();
PlayMusic();
//UpdateTimer();
//RotateGate();
return;
}
void InitAudio(void)
{
ALCdevice *dev;
ALCcontext *cxt;
Log("Initializing OpenAL Audio\r\n");
dev = alcOpenDevice(NULL);
if (!dev) {
Log("Failed to open an audio device\r\n");
exit(-1);
}
cxt = alcCreateContext(dev, NULL);
alcMakeContextCurrent(cxt);
if(!cxt) {
Log("Failed to create audio context\r\n");
exit(-1);
}
alGenBuffers(4,Buffer);
if (alGetError() != AL_NO_ERROR) {
Log("Error during buffer creation\r\n");
exit(-1);
}
alGenSources(4, Source);
if (alGetError() != AL_NO_ERROR) {
Log("Error during source creation\r\n");
exit(-1);
}
return;
}
void PlayMusic()
{
static int oldsong, ofset, mtime[4];
double freq;
ALuint srate = 44100;
ALuint voice, i, note, len, hold;
short buf[4][_BUFFSIZE_];
bool test[4] = {false, false, false, false};
if (curmusic != oldsong) {
oldsong = (int)curmusic;
if (curmusic > 0)
ofset = moffset[(curmusic - 1)];
for (voice = 1; voice < 4; voice++)
alSourceStop(Source[voice]);
mtime[voice] = 0;
return;
}
if (curmusic == 0) return;
//Only 3 voices for music, but have
for (voice = 0; voice < 3; voice ++) { // 4 set asside for eventual sound effects
if (mtime[voice] == 0) { //is note finished
alSourceStop(Source[voice]); //It is, so stop the channel (source)
mtime[voice] = music[ofset++]; //Get the next duration
if (mtime[voice] == 0) {oldsong = 0; return;} //zero marks end, so restart
note = music[ofset++]; //Get the next note
if (note > 127) { //Old HW data was designed for could only
if (note == 255) note = 127; //use values 128 - 255 (255 = 127)
freq = (15980 / (voice + (int)(voice / 3))) / (256 - note); //freq of note
len = (ALuint)(srate / freq); //A single cycle of that freq.
hold = len;
while (len < (srate / (1000 / _FREQ_))) len += hold; //Multiply till 1 interrup cycle
while (len > _BUFFSIZE_) len -= hold; //Don't overload buffer
if (len == 0) len = _BUFFSIZE_; //Just to be safe
for (i = 0; i < len; i++) //calculate sine wave and put in buffer
buf[voice][i] = (short)((32760 * sin((2 * M_PI * i * freq) / srate)));
alBufferData(Buffer[voice], AL_FORMAT_MONO16, buf[voice], len, srate);
alSourcei(openAL.Source[i], AL_LOOPING, AL_TRUE);
alSourcei(Source[i], AL_BUFFER, Buffer[i]);
alSourcePlay(Source[voice]);
}
} else --mtime[voice];
}
}

Well, it turns out there were 3 problems with my code. First, you have to fill the AL-generated buffer with the wave data (alBufferData) before you attach the buffer to the source:
alBufferData(buffer, AL_FORMAT_MONO16, &wave_sample, sample_length * sizeof(short), frequency);
alSourcei(source, AL_BUFFER, buffer);
Second, in the above example I multiplied sample_length by the number of bytes in each sample (in this case sizeof(short)), because alBufferData expects the size in bytes, not in samples.
The final problem was that you need to detach a buffer from the source before you change the buffer data:
alSourcei(source, AL_BUFFER, 0); // 0 (AL_NONE) detaches the current buffer
The music would play, but not correctly until I added that line to the note change code.
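The two size-related points above can be checked without an audio device. The sketch below is hypothetical C that reproduces only the length logic from the question's PlayMusic (whole cycles, at least a given minimum, capped at the buffer capacity) and converts the sample count into the byte count that alBufferData expects for AL_FORMAT_MONO16 data; `note_samples` and `note_bytes` are illustrative names, not OpenAL functions.

```c
#include <stddef.h>

/* Number of samples for one note: whole (truncated) cycles, at least
   min_samples long but never more than capacity. Returns 0 if the
   frequency is invalid or one cycle does not fit. */
static size_t note_samples(double freq, unsigned srate,
                           size_t min_samples, size_t capacity)
{
    size_t cycle, len;
    if (freq <= 0.0)
        return 0;
    cycle = (size_t)(srate / freq);   /* samples in one cycle */
    if (cycle == 0 || cycle > capacity)
        return 0;
    len = cycle;
    while (len < min_samples && len + cycle <= capacity)
        len += cycle;                 /* grow by whole cycles only */
    return len;
}

/* alBufferData takes a size in BYTES; AL_FORMAT_MONO16 uses 16-bit samples. */
static size_t note_bytes(size_t samples)
{
    return samples * sizeof(short);
}
```

With a 44100 Hz sample rate, a 441 Hz note has a 100-sample cycle, so a 1050-sample minimum is rounded up to 1100 samples, which is 2200 bytes with 2-byte samples.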

Thread-safe GetTickCount64 implementation for Windows XP

I'm targeting Windows XP, and I need a function similar to GetTickCount64, that does not overflow.
I couldn't find a decent solution that is correct and thread safe, so I tried to roll my own.
Here's what I came up with:
ULONGLONG MyGetTickCount64(void)
{
static volatile DWORD dwHigh = 0;
static volatile DWORD dwLastLow = 0;
DWORD dwTickCount;
dwTickCount = GetTickCount();
if(dwTickCount < (DWORD)InterlockedExchange(&dwLastLow, dwTickCount))
{
InterlockedIncrement(&dwHigh);
}
return (ULONGLONG)dwTickCount | (ULONGLONG)dwHigh << 32;
}
Is it really thread safe?
Thread safety is difficult to check for correctness, so I'm not sure whether it's really correct in all cases.

On Windows, the timer overflow problem is usually solved (in games) by using the QueryPerformanceCounter() function instead of GetTickCount():
double GetCycles() const
{
LARGE_INTEGER T1;
QueryPerformanceCounter( &T1 );
return static_cast<double>( T1.QuadPart );
}
Then you can multiply this number by the reciprocal of the number of cycles per second to convert cycles to seconds:
void Initialize()
{
LARGE_INTEGER Freq;
QueryPerformanceFrequency( &Freq );
double CyclesPerSecond = static_cast<double>( Freq.QuadPart );
RecipCyclesPerSecond = 1.0 / CyclesPerSecond;
}
After initialization, this code is thread safe:
double GetSeconds() const
{
return GetCycles() * RecipCyclesPerSecond;
}
You can also check out the full source code (portable between Windows and many other platforms) from our open-source Linderdaum Engine: http://www.linderdaum.com
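Going back to the original question, one way to avoid combining a high half and a low half that were read at different moments is to keep the whole 64-bit value in one variable and publish it with a single compare-and-swap. The sketch below uses C11 atomics rather than the Win32 Interlocked functions, and `extend_tick_count` and `last_tick64` are hypothetical names; on Windows, `now32` would be the value returned by GetTickCount(). It assumes the function is called at least once every 2^31 ms (about 24.8 days) and only counts wraps observed after the first call.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Last 64-bit tick value handed out; 0 means "no call yet". */
static _Atomic uint64_t last_tick64 = 0;

/* Extends a wrapping 32-bit millisecond reading to 64 bits.
   Safe to call from several threads at once. */
static uint64_t extend_tick_count(uint32_t now32)
{
    uint64_t old = atomic_load(&last_tick64);
    for (;;) {
        uint64_t candidate;
        if (old == 0) {
            candidate = now32;   /* first call: start from the raw reading */
        } else {
            /* Forward distance from the published value, modulo 2^32. */
            uint32_t delta = now32 - (uint32_t)old;
            if (delta >= 0x80000000u)
                return old;      /* stale reading: a newer value is already published */
            candidate = old + delta;   /* carries into the high half on wrap */
        }
        /* Publish only if nobody changed last_tick64 since we loaded it;
           on failure, old is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(&last_tick64, &old, candidate))
            return candidate;
    }
}
```

A reading that is older than the value another thread already published (because that thread called GetTickCount later but got here first) is answered with the newer published value, so results never go backwards.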

How to put my structure variable into CPU caches to eliminate main memory page access time?

It's clear that there is no explicit way or system call that lets programmers put a variable into the CPU cache.
But I think that a certain programming style or a well-designed algorithm can increase the likelihood that the variable gets cached in the CPU caches.
Here is my example:
I want to append an 8-byte structure at the end of an array of structures of the same type, declared in the global main memory region.
This process is repeated continuously for 4 million operations. It takes 6 seconds, or 1.5 us per operation. I think this result means that the two memory areas have not been cached.
I got some clues from cache-oblivious algorithms, so I tried several ways to improve this. So far, there has been no improvement.
I think some clever code could reduce the elapsed time by 10 to 100 times. Please show me the way.
-------------------------------------------------------------------------
Appended (2011-04-01)
Damon, thank you for your comment!
After reading your comment, I analyzed my code again and found several things that I had missed. The code attached below is an abbreviated version of my original code.
To measure each operation's execution time accurately (the original code has several different types of operations), I inserted timing code using the clock_gettime() function. I thought that if I measured each operation's execution time and accumulated the results, I could exclude the additional cost of the main loop.
In the original code, the timing code was hidden inside a macro, so I had completely forgotten about it.
The running time of this code is almost 6 seconds. But if I remove the timing calls from the main loop, it drops to 0.1 seconds.
Since the clock_gettime() function supports very high precision (up to 1 nanosecond), runs on an independent thread, and also requires a very big structure, I think the function caused the main memory area where the consecutive insertions are performed to be evicted from the cache.
Thank you again for your comment. Any suggestion for further improvement would be very helpful for optimizing my code.
I think the hierarchically defined structure variable might add unnecessary time cost, but first I want to know how much it costs before I change it to more C-style code.
typedef struct t_ptr {
uint32 isleaf :1, isNextLeaf :1, ptr :30;
t_ptr(void) {
isleaf = false;
isNextLeaf = false;
ptr = NIL;
}
} PTR;
typedef struct t_key {
uint32 op :1, key :31;
t_key(void) {
op = OP_INS;
key = 0;
}
} KEY;
typedef struct t_key_pair {
KEY key;
PTR ptr;
t_key_pair() {
}
t_key_pair(KEY k, PTR p) {
key = k;
ptr = p;
}
} KeyPair;
typedef struct t_op {
KeyPair keyPair;
uint seq;
t_op() {
seq = 0;
}
} OP;
#define MAX_OP_LEN 4000000
typedef struct t_opq {
OP ops[MAX_OP_LEN];
int freeOffset;
int globalSeq;
bool queueOp(KeyPair keyPair); // 'register' is removed in C++17
} OpQueue;
bool OpQueue::queueOp(KeyPair keyPair) {
bool isFull = false;
if (freeOffset == (int) (MAX_OP_LEN - 1)) {
isFull = true;
}
ops[freeOffset].keyPair = keyPair;
ops[freeOffset].seq = globalSeq++;
freeOffset++;
return isFull; // a bool function must return a value
}
OpQueue opQueue;
#include <stdio.h>
#include <time.h> // clock_gettime() is declared in <time.h>, not <sys/time.h>
int main() {
struct timespec startTime, endTime, totalTime = {0, 0}; // the running total must start at zero
for(int i = 0; i < 4000000; i++) {
clock_gettime(CLOCK_REALTIME, &startTime);
opQueue.queueOp(KeyPair());
clock_gettime(CLOCK_REALTIME, &endTime);
totalTime.tv_sec += (endTime.tv_sec - startTime.tv_sec);
totalTime.tv_nsec += (endTime.tv_nsec - startTime.tv_nsec);
}
printf("\n elapsed time (us): %lld\n", (long long)totalTime.tv_sec * 1000000LL + totalTime.tv_nsec / 1000L);
}

YOU don't put the structure into any cache. The CPU does that automatically for you. The CPU is even more clever than that; if you access sequential memory, it will start putting things from memory into the cache before you read them.
And really, it should be common sense that for a simple bit of code like this, the time you spend on measuring is ten times more than the time to perform the code (apparently 60 times in your case).
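To illustrate the point, here is a sketch that times the whole loop with a single pair of clock_gettime calls and divides by the number of appends, so the clock's own cost is paid twice rather than 8 million times. The 8-byte `record` type and the other names are hypothetical stand-ins for the question's structures, and CLOCK_MONOTONIC is used instead of CLOCK_REALTIME because it cannot jump.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical 8-byte record standing in for the question's OP/KeyPair. */
typedef struct { uint32_t key; uint32_t ptr; } record;

#define MAX_RECORDS 4000000
static record queue[MAX_RECORDS];
static size_t queue_len;

/* Appends n records and returns the average cost of one append in ns,
   timed with a single pair of clock_gettime calls around the whole loop. */
static double ns_per_append(size_t n)
{
    struct timespec t0, t1;
    if (n > MAX_RECORDS)
        n = MAX_RECORDS;
    queue_len = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n; i++) {
        queue[queue_len].key = (uint32_t)i;
        queue[queue_len].ptr = 0;
        queue_len++;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double total_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                    + (double)(t1.tv_nsec - t0.tv_nsec);
    return n > 0 ? total_ns / (double)n : 0.0;
}
```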
Since you put so much confidence in clock_gettime (): I suggest you call it five times in a row and store the results, then print the differences. There's resolution, there's precision, and there's how long it takes to return the current time, which is pretty damned long.
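A sketch of that experiment in C, assuming POSIX clock_gettime; `clock_call_gaps` is a hypothetical name. It only stores the readings inside the timing loop and computes the differences afterwards, so printing does not distort them:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#define N_READINGS 5

/* Takes N_READINGS back-to-back CLOCK_MONOTONIC readings, then stores the
   differences between consecutive readings (in ns) in gaps[0..N_READINGS-2]. */
static void clock_call_gaps(int64_t gaps[N_READINGS - 1])
{
    struct timespec t[N_READINGS];
    for (int i = 0; i < N_READINGS; i++)
        clock_gettime(CLOCK_MONOTONIC, &t[i]);
    for (int i = 1; i < N_READINGS; i++)
        gaps[i - 1] = ((int64_t)t[i].tv_sec - (int64_t)t[i - 1].tv_sec) * 1000000000LL
                    + ((int64_t)t[i].tv_nsec - (int64_t)t[i - 1].tv_nsec);
}
```

Printing `gaps[0]` to `gaps[3]` afterwards shows roughly what one call costs on the machine at hand; clock_getres(CLOCK_MONOTONIC, ...) reports the clock's resolution, which is a different quantity.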

I have been unable to force caching, but you can force memory to be uncacheable. If you have other large data structures, you might exclude them so that they will not pollute your caches. On Windows this can be done by adding PAGE_NOCACHE to the page protection (e.g. PAGE_READWRITE | PAGE_NOCACHE) passed to the VirtualAlloc family of functions.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366786(v=vs.85).aspx

How can I get the Windows system time with millisecond resolution?

If the above is not possible, then how can I get the operating system start time? I would like to use this value together with timeGetTime() in order to compute a system time with millisecond resolution.

Try this article from MSDN Magazine. It's actually quite complicated.
Implement a Continuously Updating, High-Resolution Time Provider for Windows
(archive link)

This is an elaboration of the above comments to explain some of the whys.
First, the GetSystemTime* calls are the only Win32 APIs providing the system's time. This time has a fairly coarse granularity, as most applications do not need the overhead required to maintain a higher resolution. Windows reports it as a 64-bit count of 100 ns intervals (a FILETIME), but the value only advances at each timer tick. timeGetTime is a separate counter: it returns the number of milliseconds since Windows was started, as a 32-bit value that wraps around after about 49.7 days. Calling GetSystemTime, etc. asks Windows to convert the system time into years, months, days, and so on.
There are two time sources in a machine: the CPU's clock and an on-board clock (e.g., real-time clock (RTC), Programmable Interval Timer (PIT), and High Precision Event Timer (HPET)). The first has a resolution of around 0.5 ns (at 2 GHz), and the second is generally programmable down to a period of 1 ms (though newer chips (HPET) have higher resolution). Windows uses these periodic ticks to perform certain operations, including updating the system time.
Applications can change this period via timeBeginPeriod; however, this affects the entire system. The OS will check and update regular events at the requested frequency. Under low CPU loads / frequencies, there are idle periods for power savings. At high frequencies, there isn't time to put the processor into low-power states. See Timer Resolution for further details. Finally, each tick has some overhead, and increasing the frequency consumes more CPU cycles.
For higher-resolution time, the system time is not maintained to this accuracy, any more than Big Ben has a second hand. Using QueryPerformanceCounter (QPC) or the CPU's ticks (rdtsc) can provide the resolution between system time ticks. Such an approach was used in the MSDN Magazine article Kevin cited. These approaches may drift, however (e.g., due to frequency scaling), and therefore need to be synced to the system time.

In Windows, the base of all time is a function called GetSystemTimeAsFileTime.
It returns a structure that is capable of holding a time with 100ns resolution.
It is kept in UTC.
The FILETIME structure records the number of 100ns intervals since January 1, 1601, meaning its resolution is limited to 100ns.
This forms our first function:
A 64-bit number of 100ns ticks since January 1, 1601 is somewhat unwieldy. Windows provides a handy helper function, FileTimeToSystemTime, that can decode this 64-bit integer into useful parts:
record SYSTEMTIME {
wYear: Word;
wMonth: Word;
wDayOfWeek: Word;
wDay: Word;
wHour: Word;
wMinute: Word;
wSecond: Word;
wMilliseconds: Word;
}
Notice that SYSTEMTIME has a built-in resolution limitation of 1ms
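To make the 1 ms limit concrete, here is an illustrative C sketch that decodes only the time-of-day part of a FILETIME-style tick count with integer arithmetic (`decode_time_of_day` and `time_of_day` are hypothetical names; FileTimeToSystemTime also computes the date, which this does not). Because January 1, 1601 starts at midnight, the remainder modulo one day is the UTC time of day, and the sub-millisecond part of the tick count is simply dropped:

```c
#include <stdint.h>

/* 100 ns intervals per unit of time. */
#define TICKS_PER_MS   10000ull
#define TICKS_PER_SEC  10000000ull
#define TICKS_PER_DAY  (86400ull * TICKS_PER_SEC)

typedef struct { unsigned hour, minute, second, millisecond; } time_of_day;

/* Decodes the UTC time of day from ticks since 1601-01-01T00:00:00Z.
   ticks % TICKS_PER_MS (the sub-millisecond part) is discarded. */
static time_of_day decode_time_of_day(uint64_t ticks)
{
    uint64_t in_day = ticks % TICKS_PER_DAY;
    time_of_day t;
    t.hour        = (unsigned)(in_day / (3600ull * TICKS_PER_SEC));
    t.minute      = (unsigned)(in_day / (60ull * TICKS_PER_SEC) % 60);
    t.second      = (unsigned)(in_day / TICKS_PER_SEC % 60);
    t.millisecond = (unsigned)(in_day / TICKS_PER_MS % 1000);
    return t;
}
```

For example, 116444736000000000 ticks is 1970-01-01 00:00:00 UTC; adding 3723 s, 456 ms, and 700 ns decodes as 01:02:03.456, and the 700 ns are lost.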
Now we have a way to go from FILETIME to SYSTEMTIME:
We could write the function to get the current system time as a SYSTEMTIME structure:
SYSTEMTIME MyGetSystemTime(void)
{
//Get the current system time (UTC) in its native 100ns FILETIME structure
FILETIME ftNow;
GetSystemTimeAsFileTime(&ftNow);
//Decode the 100ns intervals into a 1ms resolution SYSTEMTIME for us
SYSTEMTIME stNow;
FileTimeToSystemTime(&ftNow, &stNow);
return stNow;
}
Except Windows already wrote such a function for you: GetSystemTime
Local, rather than UTC
Now what if you don't want the current time in UTC. What if you want it in your local time? Windows provides a function to convert a FILETIME that is in UTC into your local time: FileTimeToLocalFileTime
You could write a function that returns you a FILETIME in local time already:
FILETIME GetLocalTimeAsFileTime(void)
{
FILETIME ftNow;
GetSystemTimeAsFileTime(&ftNow);
//convert to local
FILETIME ftNowLocal;
FileTimeToLocalFileTime(&ftNow, &ftNowLocal);
return ftNowLocal;
}
And let's say you want to decode the local FILETIME into a SYSTEMTIME. That's no problem; you can use FileTimeToSystemTime again.
Fortunately, Windows already provides a function that returns this value directly: GetLocalTime.
Precise
There is another consideration. Before Windows 8, the clock had a resolution of around 15ms. In Windows 8 they improved the clock to 100ns (matching the resolution of FILETIME).
GetSystemTimeAsFileTime (legacy, 15ms resolution)
GetSystemTimePreciseAsFileTime (Windows 8, 100ns resolution)
This means we should prefer the new function whenever it is available.
You asked for the time
You asked for the time; but you have some choices.
The timezone:
UTC (system native)
Local timezone
The format:
FILETIME (system native, 100ns resolution)
SYSTEMTIME (decoded, 1ms resolution)
Summary
100ns resolution: FILETIME
UTC: GetSystemTimePreciseAsFileTime (or GetSystemTimeAsFileTime)
Local: (roll your own)
1ms resolution: SYSTEMTIME
UTC: GetSystemTime
Local: GetLocalTime

GetTickCount will not get it done for you.
Look into QueryPerformanceFrequency / QueryPerformanceCounter. The only gotcha here is CPU scaling though, so do your research.

Starting with Windows 8, Microsoft introduced the new API function GetSystemTimePreciseAsFileTime.
Unfortunately you can't use that if you create software which must also run on older operating systems.
My current solution is as follows, but be aware: The determined time is not exact, it is only near to the real time. The result should always be smaller or equal to the real time, but with a fixed error (unless the computer went to standby). The result has a millisecond resolution. For my purpose it is exact enough.
void GetHighResolutionSystemTime(SYSTEMTIME* pst)
{
static LARGE_INTEGER uFrequency = { 0 };
static LARGE_INTEGER uInitialCount;
static LARGE_INTEGER uInitialTime;
static bool bNoHighResolution = false;
if(!bNoHighResolution && uFrequency.QuadPart == 0)
{
// Initialize performance counter to system time mapping
bNoHighResolution = !QueryPerformanceFrequency(&uFrequency);
if(!bNoHighResolution)
{
FILETIME ftOld, ftInitial;
GetSystemTimeAsFileTime(&ftOld);
do
{
GetSystemTimeAsFileTime(&ftInitial);
QueryPerformanceCounter(&uInitialCount);
} while(ftOld.dwHighDateTime == ftInitial.dwHighDateTime && ftOld.dwLowDateTime == ftInitial.dwLowDateTime);
uInitialTime.LowPart = ftInitial.dwLowDateTime;
uInitialTime.HighPart = ftInitial.dwHighDateTime;
}
}
if(bNoHighResolution)
{
GetSystemTime(pst);
}
else
{
LARGE_INTEGER uNow, uSystemTime;
{
FILETIME ftTemp;
GetSystemTimeAsFileTime(&ftTemp);
uSystemTime.LowPart = ftTemp.dwLowDateTime;
uSystemTime.HighPart = ftTemp.dwHighDateTime;
}
QueryPerformanceCounter(&uNow);
LARGE_INTEGER uCurrentTime;
uCurrentTime.QuadPart = uInitialTime.QuadPart + (uNow.QuadPart - uInitialCount.QuadPart) * 10000000 / uFrequency.QuadPart;
if(uCurrentTime.QuadPart < uSystemTime.QuadPart || abs(uSystemTime.QuadPart - uCurrentTime.QuadPart) > 1000000)
{
// The performance counter has been frozen (e. g. after standby on laptops)
// -> Use current system time and determine the high performance time the next time we need it
uFrequency.QuadPart = 0;
uCurrentTime = uSystemTime;
}
FILETIME ftCurrent;
ftCurrent.dwLowDateTime = uCurrentTime.LowPart;
ftCurrent.dwHighDateTime = uCurrentTime.HighPart;
FileTimeToSystemTime(&ftCurrent, pst);
}
}

GetSystemTimeAsFileTime gives the best precision of any Win32 function for absolute time. QPF/QPC as Joel Clark suggested will give better relative time.

Since we all come here for quick snippets instead of boring explanations, I'll write one:
FILETIME t;
GetSystemTimeAsFileTime(&t); // unusable as is
ULARGE_INTEGER i;
i.LowPart = t.dwLowDateTime;
i.HighPart = t.dwHighDateTime;
int64_t ticks_since_1601 = i.QuadPart; // now usable
// Use integer division: multiplying by 1e-1, 1e-4, ... goes through double,
// which cannot hold today's tick counts exactly
int64_t us_since_1601 = i.QuadPart / 10;
int64_t ms_since_1601 = i.QuadPart / 10000;
int64_t sec_since_1601 = i.QuadPart / 10000000;
// unix epoch
int64_t unix_us = (int64_t)(i.QuadPart / 10) - 11644473600LL * 1000000;
int64_t unix_ms = (int64_t)(i.QuadPart / 10000) - 11644473600LL * 1000;
double unix_sec = (double)((int64_t)i.QuadPart - 11644473600LL * 10000000) / 1e7;
// i.QuadPart is # of 100ns ticks since 1601-01-01T00:00:00Z
// difference to Unix Epoch is 11644473600 seconds (attention to units!)
I don't know why the drifting answers based on the performance counter got upvoted; don't introduce slippage bugs, guys.

QueryPerformanceCounter() is built for fine-grained timer resolution.
It is the highest-resolution timer the system has to offer, and you can use it in your application code to identify performance bottlenecks.
Here is a simple implementation for C# devs:
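// Requires: using System.Runtime.InteropServices;
// Both functions return a Win32 BOOL, which marshals to bool (not short)
[DllImport("kernel32.dll")]
extern static bool QueryPerformanceCounter(ref long x);
[DllImport("kernel32.dll")]
extern static bool QueryPerformanceFrequency(ref long x);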
private long m_endTime;
private long m_startTime;
private long m_frequency;
public Form1()
{
InitializeComponent();
}
public void Begin()
{
QueryPerformanceCounter(ref m_startTime);
}
public void End()
{
QueryPerformanceCounter(ref m_endTime);
}
private void button1_Click(object sender, EventArgs e)
{
QueryPerformanceFrequency(ref m_frequency);
Begin();
for (long i = 0; i < 1000; i++) ;
End();
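// Counter ticks divided by ticks per second gives seconds; * 1000 gives milliseconds
MessageBox.Show(((m_endTime - m_startTime) * 1000.0 / m_frequency).ToString() + " ms");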
}
If you are a C/C++ dev, then take a look here: How to use the QueryPerformanceCounter function to time code in Visual C++

Well, this one is very old, but there is another useful function in the Windows C runtime library, _ftime, which fills a structure with the current time as a time_t (seconds since January 1, 1970, UTC), the milliseconds, the time zone offset, and a daylight saving time flag.

In C11 and above (or C++17 and above) you can use timespec_get() to get time with higher precision portably
#include <stdio.h>
#include <time.h>
int main(void)
{
struct timespec ts;
timespec_get(&ts, TIME_UTC);
char buff[100];
strftime(buff, sizeof buff, "%D %T", gmtime(&ts.tv_sec));
printf("Current time: %s.%09ld UTC\n", buff, ts.tv_nsec);
}
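Since the question asks for millisecond resolution, here is a sketch of formatting a timespec (for example one filled by timespec_get(&ts, TIME_UTC)) down to milliseconds. `format_utc_ms` is a hypothetical name; it truncates rather than rounds, and it uses gmtime(), which is not thread-safe:

```c
#include <stdio.h>
#include <time.h>

/* Writes ts as "YYYY-MM-DD HH:MM:SS.mmm" (UTC, milliseconds truncated) into
   out. Returns 0 on success, -1 on conversion failure or a too-small buffer. */
static int format_utc_ms(const struct timespec *ts, char *out, size_t size)
{
    char date[32];
    struct tm *parts = gmtime(&ts->tv_sec);   /* not thread-safe */
    if (parts == NULL || strftime(date, sizeof date, "%Y-%m-%d %H:%M:%S", parts) == 0)
        return -1;
    int n = snprintf(out, size, "%s.%03ld", date, (long)(ts->tv_nsec / 1000000L));
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For a timespec of 1 s and 234567890 ns it produces "1970-01-01 00:00:01.234".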
If you're using C++, then since C++11 you can use std::chrono::high_resolution_clock, std::chrono::system_clock (wall clock), or std::chrono::steady_clock (monotonic clock) from the <chrono> header. No need to use Windows-specific APIs anymore.
#include <chrono>
#include <iostream>
auto start1 = std::chrono::high_resolution_clock::now();
auto start2 = std::chrono::system_clock::now();
auto start3 = std::chrono::steady_clock::now();
// do some work
auto end1 = std::chrono::high_resolution_clock::now();
auto end2 = std::chrono::system_clock::now();
auto end3 = std::chrono::steady_clock::now();
// Converting to an integer millisecond count loses precision, so it needs duration_cast
auto diff1 = std::chrono::duration_cast<std::chrono::duration<long long, std::milli>>(end1 - start1);
// A floating-point millisecond duration can be assigned directly
std::chrono::duration<double, std::milli> diff2 = end2 - start2;
auto diff3 = std::chrono::duration_cast<std::chrono::milliseconds>(end3 - start3);
std::cout << diff1.count() << ' ' << diff2.count() << ' ' << diff3.count() << '\n';

Develop Reference
