Go - System Time in Picoseconds

Is it possible in Go to get the system time at a resolution finer than a nanosecond, i.e. in picoseconds or thereabouts? I want to measure the gap between two consecutive events, which I can't capture at nanosecond resolution on our fast system.

The cost of calling profiling functions/instructions on modern hardware is larger (and more variable and prone to deviation) than the interval you're going to measure, so even if you try, you'll get erroneous results.
Consider tracking the time lapse for 100 events instead, if that's at all possible.

The time resolution is hardware and operating system dependent. The Go time.Nanoseconds function provides up to nanosecond time resolution. On a PC, it's usually 1000 nanoseconds (1 microsecond) or 100 nanoseconds at best.
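A minimal sketch of that batching idea in Go, using only the standard time package. The event() function here is a hypothetical stand-in for whatever operation is too short to time individually; the loop amortizes the timer-call overhead over many events and reports the average gap, which can come out as a fraction of a nanosecond even though each clock reading is only nanosecond-granular.

package main

import (
	"fmt"
	"time"
)

// event is a hypothetical stand-in for the operation whose duration
// is too short to measure on its own.
func event() {
	_ = 1 + 1
}

func main() {
	const n = 100000

	start := time.Now()
	for i := 0; i < n; i++ {
		event()
	}
	elapsed := time.Since(start)

	// Average gap per event. Dividing the total by n is what recovers
	// sub-nanosecond detail from a nanosecond-resolution clock.
	fmt.Printf("total: %v  per event: %.3f ns\n",
		elapsed, float64(elapsed.Nanoseconds())/float64(n))
}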

Related

CPU cycles vs. total CPU time

On Windows, GetProcessTimes() and QueryProcessCycleTime() can be used to get totals for all threads of an app. I expected (apparently naively) to find a proportional relationship between the total number of cycles and the total processor time (user + kernel). When converted to the same units (seconds) and expressed as a percentage of the app's running time, they're not even close, and the ratio between them varies greatly.
Right after an app starts, they're fairly close.
3.6353% CPU cycles
5.2000% CPU time
0.79 Ratio
But this ratio increases as an app remains idle (below, after 11 hours, mostly idle).
0.0474% CPU cycles
0.0039% CPU time
12.16 Ratio
Apparently, cycles are counted that don't contribute to user or kernel time. I'm curious about how it works. Please enlighten me.
The GetProcessTimes and QueryProcessCycleTime values are calculated in different ways: GetProcessTimes/GetThreadTimes are updated in response to timer interrupts, while the QueryProcessCycleTime values are based on tracking actual thread execution time. These different ways of measuring can produce vastly different results when the two APIs are compared, especially since GetThreadTimes only charges fully completed time slices to its thread counters (see http://blog.kalmbachnet.de/?postid=28), which usually results in incorrect timings.
Since GetProcessTimes will in general report less time than was actually spent (due to not always completing its time slice), it makes sense that its CPU-time percentage decreases over time relative to the cycle-based percentage.
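A Windows-only Go sketch that reads both counters for the current process, so the two numbers can be printed side by side. GetProcessTimes is available through the standard syscall package; QueryProcessCycleTime has no standard wrapper here, so it is loaded from kernel32 by hand. This is an illustration of the comparison, not a benchmarking tool.

package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

var (
	kernel32                  = syscall.NewLazyDLL("kernel32.dll")
	procQueryProcessCycleTime = kernel32.NewProc("QueryProcessCycleTime")
)

func main() {
	self, err := syscall.GetCurrentProcess()
	if err != nil {
		panic(err)
	}

	// Interrupt-sampled CPU time, split into kernel and user portions.
	var creation, exit, kernel, user syscall.Filetime
	if err := syscall.GetProcessTimes(self, &creation, &exit, &kernel, &user); err != nil {
		panic(err)
	}

	// Directly tracked CPU cycles for the same process.
	var cycles uint64
	r, _, callErr := procQueryProcessCycleTime.Call(
		uintptr(self), uintptr(unsafe.Pointer(&cycles)))
	if r == 0 {
		panic(callErr)
	}

	fmt.Printf("CPU time (kernel+user): %d ns\n",
		kernel.Nanoseconds()+user.Nanoseconds())
	fmt.Printf("CPU cycles:             %d\n", cycles)
}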

How does the clock work in Windows 7?

I have read this answer somewhere but I don't understand it exactly:
I understand Windows increments the clock every curTimeIncrement (156,001 units of 100 nanoseconds) with the value of curTimeAdjustment (156,001 +/- N). But when the clock is read using GetSystemTime, does the routine interpolate within the 156,001 x 100-nanosecond interval to produce the precision indicated?
Can someone try to explain it to me?
What is curTimeIncrement, curTimeAdjustment and how can Windows do this?
What effect does this have on getting the accurate time?
Is that true just for Windows 7, or also for other operating systems (Windows 8, Linux, etc.)?
It refers to the values returned by GetSystemTimeAdjustment() on Windows. It tells you how the clock is being adjusted to catch up with, or slow down to match, the real time. "Real" being the time kept by an institution like NIST in the USA; they have an atomic clock whose accuracy is far, far higher than the clock built into your machine.
The Real Time Clock (RTC) in your machine has limited accuracy, a side-effect of keeping the hardware affordable; it tends to be off by a few seconds each month. So periodically the operating system contacts a time server through the Internet, time.windows.com being the common selection on Windows, which tells it the current real time according to the atomic clock.
The inaccuracy of the RTC is not the only source of drift; sometimes the real time is changed intentionally by adding a leap second to resynchronize clocks with the true rotation of the Earth. The current day (24 x 60 x 60 seconds) is a bit too short: the Earth's rotation is slowing down by ~1.5 msec every century and is in general irregular due to large storms and earthquakes. The inserted leap second makes up for that. The most recent one was added on June 30th of this year at 23:59:60 UTC. 60 is not a typo :)
The one previous to that was on June 30th, 2012. A bit notorious, the insertion of the leap second crashed a lot of Linux servers. Google "Linux leap second bug" to learn more about it.
That is in general what the machinery underneath GetSystemTimeAdjustment() is trying to avoid: instantly changing the time to the value obtained from the time server is very dangerous. Software often has a hard assumption that time progresses steadily and misbehaves when it doesn't, like observing the same time twice when the clock is set back, or observing a fake time like 23:59:60 UTC due to the leap-second insertion.
So it doesn't do that. The clock is updated 64 times per second, at the clock-tick interrupt; in other words, 1 / 64 = 0.015625 seconds between ticks, or 156,250 in 100-nanosecond units. If a clock adjustment needs to be made, it doesn't just add 156,250 but slightly more or less, slowly resynchronizing the clock to the true time and avoiding upsetting software.
This of course has an unpleasant side-effect on software that paints a clock. An obvious way to do it is to use a one-second timer, but sometimes that is not a second; it won't be when a time adjustment is in progress. Then Nyquist's sampling theorem comes into play: sometimes a timer tick does not update the clock at all, or it skips a second. Note that this is not the only reason why it is hard to keep a painted clock accurate; the timer notification itself is always delayed as well, a side-effect of software not being able to execute instantly. That is in fact the much more likely source of trouble; the clock adjustment is just icing on the cake that's easier to understand.
Awkward problem, Mr. Nyquist has taught us that you have to sample more frequently to eliminate undesirable aliasing effects. So a workaround is to just set the timer at a small interval, like 15 or 31 milliseconds, short enough for the user to no longer observe the missing update.
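To see the increment and adjustment values the answer above is describing, here is a small, Windows-only Go sketch that loads GetSystemTimeAdjustment from kernel32 and prints them; the comments follow the API documentation rather than anything in the question.

package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

var (
	kernel32                    = syscall.NewLazyDLL("kernel32.dll")
	procGetSystemTimeAdjustment = kernel32.NewProc("GetSystemTimeAdjustment")
)

func main() {
	var adjustment, increment uint32 // both in 100-nanosecond units
	var disabled int32               // Win32 BOOL: nonzero means no adjustment is in progress

	r, _, err := procGetSystemTimeAdjustment.Call(
		uintptr(unsafe.Pointer(&adjustment)),
		uintptr(unsafe.Pointer(&increment)),
		uintptr(unsafe.Pointer(&disabled)))
	if r == 0 {
		panic(err)
	}

	// increment is what a clock tick normally adds (156,250 on a 64 Hz tick);
	// adjustment is what actually gets added while the clock is being nudged
	// toward the time-server time.
	fmt.Printf("increment:           %d (x100 ns)\n", increment)
	fmt.Printf("adjustment:          %d (x100 ns)\n", adjustment)
	fmt.Printf("adjustment disabled: %v\n", disabled != 0)
}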

What is the clock source for the count returned by QueryPerformanceCounter

I was under the impression that QueryPerformanceCounter was actually accessing the counter that feeds the HPET (High Performance Event Timer)---the difference of course being that HPET is a timer which sends an interrupt when the counter value matches the desired interval, whereas to make a timer "out of" QueryPerformanceCounter you have to write your own loop in software.
The only reason I had assumed the hardware behind the two was the same is because somewhere I had read that QueryPerformanceCounter was accessing a counter on the chipset.
http://www.gamedev.net/reference/programming/features/timing/ claims that QueryPerformanceCounter use chipset timers which apparently have a specified clock rate. However, I can verify that QueryPerformanceFrequency returns wildly different numbers on different machines, and in fact, the number can change slightly from boot to boot.
The numbers returned can sometimes be totally ridiculous---implying ticks in the nanosecond range. Of course when put together it all works; that is, writing timer software using QueryPerformanceCounter/QueryPerformanceFrequency allows you to get proper timing and latency is pretty low.
A software timer using these functions can be pretty good. For example, with an interval of 1 millisecond, over 30 seconds it's easy to get nearly 100% of ticks to fall within 10% of the intended interval. With an interval of 100 microseconds, you still get a high success rate (99.7%), but the worst ticks can be way off (200 microseconds).
I'm wondering if the clock behind the HPET is the same. Supposedly HPET should still increase accuracy since it is a hardware timer, but as of yet I don't know how to access it in Windows.
Sounds like Microsoft has made these functions use "whatever best timer there is":
http://www.microsoft.com/whdc/system/sysinternals/mm-timer.mspx
Did you try updating your CPU driver for your AMD multicore system? Did you check whether your motherboard chipset is on the "bad" list? Did you try setting the CPU affinity?
One can also use the RTC-based time functions and/or a skip-detecting heuristic to eliminate trouble with QPC.
This has some hints: CPU clock frequency and thus QueryPerformanceCounter wrong?
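To make the ticks-versus-frequency relationship concrete, here is a Windows-only Go sketch that calls QueryPerformanceCounter and QueryPerformanceFrequency through kernel32 and converts a tick delta into microseconds. Whatever frequency the machine reports is the only thing that gives the raw counts meaning, which is why the numbers look so different from machine to machine.

package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

var (
	kernel32 = syscall.NewLazyDLL("kernel32.dll")
	procQPC  = kernel32.NewProc("QueryPerformanceCounter")
	procQPF  = kernel32.NewProc("QueryPerformanceFrequency")
)

// qpc returns the raw performance-counter value.
func qpc() int64 {
	var v int64
	procQPC.Call(uintptr(unsafe.Pointer(&v)))
	return v
}

func main() {
	var freq int64 // ticks per second; varies between machines
	procQPF.Call(uintptr(unsafe.Pointer(&freq)))

	start := qpc()
	// ... the code being timed would go here ...
	end := qpc()

	fmt.Printf("frequency: %d ticks/s\n", freq)
	fmt.Printf("elapsed:   %.3f microseconds\n",
		float64(end-start)/float64(freq)*1e6)
}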

How do I get repeatable CPU-bound benchmark runtimes on Windows?

We sometimes have to run some CPU-bound tests where we want to measure runtime. The tests last in the order of a minute. The problem is that from run to run the runtime varies by quite a lot (+/- 5%). We suspect that the variation is caused by activity from other applications/services on the system, eg:
Applications doing housekeeping in their idle time (e.g. Visual Studio updating IntelliSense)
Filesystem indexers
etc..
What tips are there to make our benchmark timings more stable?
Currently we minimize all other applications, run the tests at "Above Normal" priority, and don't touch the machine while it runs the test.
The usual approach is to perform lots of repetitions and then discard outliers. So, if distractions such as the disk indexer only crop up once every hour or so, and you do 5-minute runs repeated for 24 hours, you'll have plenty of results where nothing got in the way. It is a good idea to plot the probability density function to make sure you understand what is going on. Also, if you are not interested in startup effects such as getting everything into the processor caches, then make sure the experiment runs long enough to make them insignificant.
First of all, if it's just about benchmarking the application itself, you should use CPU time, not wall-clock time, as the measure. That is then (almost) free of influence from whatever the other processes or the system are doing. Secondly, as Dickon Reed pointed out, more repetitions increase confidence.
Quote from VC++ team blog, how they do performance tests:
To reduce noise on the benchmarking machines, we take several steps:
Stop as many services and processes as possible.
Disable the network driver: this will turn off the interrupts from the NIC caused by broadcast packets.
Set the test’s processor affinity to run on one processor/core only.
Set the run to high priority which will decrease the number of context switches.
Run the test for several iterations.
I do the following:
Call the method x times and measure the time
Do this n times and calculate the mean and standard deviation of those measurements
Try to get x to a point where each measurement takes more than 1 second. This will reduce the noise a bit.
The mean will tell you the average performance of your test and the standard deviation the stability of your test/measurements.
I also set my application to a very high priority, and when I test a single-threaded algorithm I pin it to one CPU core to make sure there is no scheduling overhead.
This code demonstrates how to do this in .NET:
// Raise thread and process priority so the test is preempted as little as possible.
// (RealTime priority normally requires elevated privileges; otherwise Windows
// silently downgrades the request to High.)
Thread.CurrentThread.Priority = ThreadPriority.Highest;
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.RealTime;

// Pin the process to the last core so a single-threaded test is not
// migrated between cores by the scheduler.
if (Environment.ProcessorCount > 1)
{
    Process.GetCurrentProcess().ProcessorAffinity =
        new IntPtr(1 << (Environment.ProcessorCount - 1));
}
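The repeat-and-aggregate part of the recipe above, sketched in Go; workload() is a hypothetical stand-in for the method under test, and x and n are the knobs described in the answer (calls per measurement, number of measurements).

package main

import (
	"fmt"
	"math"
	"time"
)

// workload is a hypothetical stand-in for the method being benchmarked.
func workload() {
	s := 0
	for i := 0; i < 1000000; i++ {
		s += i
	}
	_ = s
}

func main() {
	const (
		x = 200 // calls per measurement; raise it until one measurement takes >1 s
		n = 10  // number of measurements to aggregate
	)

	samples := make([]float64, n)
	for i := range samples {
		start := time.Now()
		for j := 0; j < x; j++ {
			workload()
		}
		samples[i] = time.Since(start).Seconds()
	}

	// The mean tells you the average performance, the standard deviation the stability.
	var sum float64
	for _, s := range samples {
		sum += s
	}
	mean := sum / n

	var sq float64
	for _, s := range samples {
		sq += (s - mean) * (s - mean)
	}
	stddev := math.Sqrt(sq / n)

	fmt.Printf("mean: %.4f s  stddev: %.4f s\n", mean, stddev)
}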

How do I obtain CPU cycle count in Win32?

In Win32, is there any way to get a unique CPU cycle count or something similar that would be uniform across multiple processes/languages/systems/etc.?
I'm creating some log files, but have to produce multiple logfiles because we're hosting the .NET runtime, and I'd like to avoid calling from one to the other to log. As such, I was thinking I'd just produce two files, combine them, and then sort them, to get a coherent timeline involving cross-world calls.
However, GetTickCount does not increase for every call, so that's not reliable. Is there a better number, so that I get the calls in the right order when sorting?
Edit: Thanks to @Greg, who put me on the track to QueryPerformanceCounter, which did the trick.
Here's an interesting article! It says not to use RDTSC, but to instead use QueryPerformanceCounter.
Conclusion:
Using regular old timeGetTime() to do timing is not reliable on many Windows-based operating systems because the granularity of the system timer can be as high as 10-15 milliseconds, meaning that timeGetTime() is only accurate to 10-15 milliseconds. [Note that the high granularities occur on NT-based operating systems like Windows NT, 2000, and XP. Windows 95 and 98 tend to have much better granularity, around 1-5 ms.]
However, if you call timeBeginPeriod(1) at the beginning of your program (and timeEndPeriod(1) at the end), timeGetTime() will usually become accurate to 1-2 milliseconds, and will provide you with extremely accurate timing information.
Sleep() behaves similarly; the length of time that Sleep() actually sleeps for goes hand-in-hand with the granularity of timeGetTime(), so after calling timeBeginPeriod(1) once, Sleep(1) will actually sleep for 1-2 milliseconds, Sleep(2) for 2-3, and so on (instead of sleeping in increments as high as 10-15 ms).
For higher precision timing (sub-millisecond accuracy), you'll probably want to avoid using the assembly mnemonic RDTSC because it is hard to calibrate; instead, use QueryPerformanceFrequency and QueryPerformanceCounter, which are accurate to less than 10 microseconds (0.00001 seconds).
For simple timing, both timeGetTime and QueryPerformanceCounter work well, and QueryPerformanceCounter is obviously more accurate. However, if you need to do any kind of "timed pauses" (such as those necessary for framerate limiting), you need to be careful of sitting in a loop calling QueryPerformanceCounter, waiting for it to reach a certain value; this will eat up 100% of your processor. Instead, consider a hybrid scheme, where you call Sleep(1) (don't forget timeBeginPeriod(1) first!) whenever you need to pass more than 1 ms of time, and then only enter the QueryPerformanceCounter 100%-busy loop to finish off the last < 1/1000th of a second of the delay you need. This will give you ultra-accurate delays (accurate to 10 microseconds), with very minimal CPU usage. See the code above.
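A hedged Go translation of that hybrid scheme (Windows-only): timeBeginPeriod/timeEndPeriod come from winmm.dll and QueryPerformanceCounter/QueryPerformanceFrequency from kernel32, loaded by hand; the coarse part of the wait uses Go's own time.Sleep, so how much it benefits from timeBeginPeriod depends on the Go runtime version. This is a sketch of the idea, not production code.

package main

import (
	"fmt"
	"syscall"
	"time"
	"unsafe"
)

var (
	winmm    = syscall.NewLazyDLL("winmm.dll")
	kernel32 = syscall.NewLazyDLL("kernel32.dll")

	procTimeBeginPeriod = winmm.NewProc("timeBeginPeriod")
	procTimeEndPeriod   = winmm.NewProc("timeEndPeriod")
	procQPC             = kernel32.NewProc("QueryPerformanceCounter")
	procQPF             = kernel32.NewProc("QueryPerformanceFrequency")
)

func qpc() int64 {
	var v int64
	procQPC.Call(uintptr(unsafe.Pointer(&v)))
	return v
}

// preciseWait blocks for roughly d: it sleeps while a couple of milliseconds
// or more remain, then busy-polls QueryPerformanceCounter for the tail.
func preciseWait(d time.Duration) {
	var freq int64
	procQPF.Call(uintptr(unsafe.Pointer(&freq)))

	deadline := qpc() + int64(d.Seconds()*float64(freq))
	for {
		remaining := deadline - qpc()
		if remaining <= 0 {
			return
		}
		if float64(remaining)/float64(freq) > 0.002 {
			time.Sleep(time.Millisecond) // cheap wait for the bulk of the delay
		}
		// otherwise spin on the counter for the final slice
	}
}

func main() {
	procTimeBeginPeriod.Call(1) // ask for 1 ms timer resolution while we run
	defer procTimeEndPeriod.Call(1)

	start := time.Now()
	preciseWait(2500 * time.Microsecond)
	fmt.Println("waited:", time.Since(start))
}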
You can use the RDTSC CPU instruction (assuming x86). This instruction gives the CPU cycle counter, but be aware that it will increase very quickly to its maximum value, and then reset to 0. As the Wikipedia article mentions, you might be better off using the QueryPerformanceCounter function.
System.Diagnostics.Stopwatch.GetTimestamp() returns the number of timer ticks since a time origin (maybe when the computer starts, but I'm not sure), and I've never seen it fail to increase between two calls.
The counts are specific to each computer, so you can't use them to merge log files between two computers.
RDTSC output may depend on the current core's clock frequency, which for modern CPUs is neither constant nor, in a multicore machine, consistent.
Use the system time, and if dealing with feeds from multiple systems use an NTP time source. You can get reliable, consistent time readings that way; if the overhead is too much for your purposes, using the HPET to work out time elapsed since the last known reliable time reading is better than using the HPET alone.
Use GetTickCount and add another counter as you merge the log files. It won't give you a perfect sequence between the different log files, but it will at least keep all entries from each file in the correct order.
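A minimal Go sketch of that tick-plus-counter idea (Windows-only; GetTickCount is loaded from kernel32, and the sequence counter is per process, so it only disambiguates entries that came from the same log file):

package main

import (
	"fmt"
	"sync/atomic"
	"syscall"
)

var (
	kernel32         = syscall.NewLazyDLL("kernel32.dll")
	procGetTickCount = kernel32.NewProc("GetTickCount")

	seq uint64 // per-process sequence number
)

// stamp returns a "tick.sequence" prefix for a log line: the tick count
// orders entries coarsely across files, and the sequence number keeps
// entries from the same process in order when they share a tick value.
func stamp() string {
	ticks, _, _ := procGetTickCount.Call()
	n := atomic.AddUint64(&seq, 1)
	return fmt.Sprintf("%d.%06d", uint32(ticks), n)
}

func main() {
	fmt.Println(stamp(), "first event")
	fmt.Println(stamp(), "second event")
}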
