How to measure execution time of function in golang, excluding waiting time - performance

I need to measure the execution time (CPU cost) of plugins in Go. We can treat plugins as functions, and there may be many goroutines running at the same time. More precisely, the execution time should exclude idle time (goroutine waiting time) and count only the CPU time acquired by the current goroutine.
It's like:
go func() {
    // this func is a plugin
    // ** start recording CPU time of the current func/plugin/goroutine **
    // ** run code **
    // ** stop recording CPU time of the current func/plugin/goroutine **
    log.Debugf("This function was busy for %d millisecs.", cpuAcquireTime)
    // ** report cpuAcquireTime to monitor **
}()
In my circumstance it's hard to write a unit test to measure the function; the code is hard to decouple.
I've searched Google and Stack Overflow and found no clue. Is there any solution that satisfies this requirement, and does it take too many resources?

There is no built-in way in Go to measure CPU time, but you can do it in a platform-specific way.
For example, on POSIX systems (e.g. Linux) use clock_gettime with CLOCK_THREAD_CPUTIME_ID as the parameter.
Similarly you can use CLOCK_PROCESS_CPUTIME_ID to measure process CPU time and CLOCK_MONOTONIC for elapsed time.
Example:
package main

/*
#include <pthread.h>
#include <time.h>
#include <stdio.h>

static long long getThreadCpuTimeNs() {
    struct timespec t;
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t)) {
        perror("clock_gettime");
        return 0;
    }
    return t.tv_sec * 1000000000LL + t.tv_nsec;
}
*/
import "C"

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    // Pin this goroutine to its OS thread; otherwise it may migrate
    // between threads and the thread CPU clock would mix in other
    // goroutines' work.
    runtime.LockOSThread()

    cputime1 := C.getThreadCpuTimeNs()
    doWork()
    cputime2 := C.getThreadCpuTimeNs()
    fmt.Printf("CPU time = %d ns\n", cputime2-cputime1)
}

func doWork() {
    x := 1
    for i := 0; i < 100000000; i++ {
        x *= 11111
    }
    time.Sleep(time.Second)
}
Output:
CPU time = 31250000 ns
Note the output is in nanoseconds. So here CPU time is 0.03 sec.

For people who stumble on this later like I did, you can actually use the built-in syscall.Getrusage instead of using Cgo. An example of this looks like
func GetCPU() int64 {
    usage := new(syscall.Rusage)
    syscall.Getrusage(syscall.RUSAGE_SELF, usage)
    return usage.Utime.Nano() + usage.Stime.Nano()
}
where I have added up the Utime (user CPU time) and Stime (system CPU time) of the calling process (RUSAGE_SELF) after converting them both to nanoseconds. man 2 getrusage has a bit more information on this system call.
The documentation for syscall.Timeval suggests that Nano() returns the time in nanoseconds since the Unix epoch, but in my tests, and from reading the implementation, it actually returns just the CPU time in nanoseconds, not nanoseconds since the Unix epoch.


Go: Unexpected Results from time.Sleep

Running this code (as a built executable, not with the debugger):
package main

import (
    "fmt"
    "time"
)

func main() {
    startTime := time.Now()
    for i := 0; i < 1000; i++ {
        time.Sleep(1 * time.Millisecond)
    }
    fmt.Printf("%d\n", time.Since(startTime).Milliseconds())
}
I get the output:
15467
This seems like a major overhead for calling the time.Sleep() function; it's essentially taking 15ms per loop iteration, even though it's only sleeping for 1ms in each loop. This suggests that there's a 14ms overhead for running an iteration of the loop and initiating a sleep.
If we adjust the sleep duration:
package main

import (
    "fmt"
    "time"
)

func main() {
    startTime := time.Now()
    for i := 0; i < 1000; i++ {
        time.Sleep(10 * time.Millisecond)
    }
    fmt.Printf("%d\n", time.Since(startTime).Milliseconds())
}
I get the output:
15611
This is essentially the same duration, even though it should be sleeping for 10x as long. This kills the idea that there's a 14ms overhead for the loop iteration and initiating the sleep, because if that were the case, it would be (14+10)*1000 = 24000ms total, which it is not.
What am I missing? Why would this code take the same duration to execute, whether the sleep duration is 1ms or 10ms?
Note that I've tried running this in the Go playground but get different results; I think it handles sleeping differently. These results are consistent on my laptop, which is running an i7-10510.
It is probably related to the frequency of the system's timer. For example, on Windows the clock ticks every 15 milliseconds (source):
For example, for Windows running on an x86 processor, the default interval between system clock ticks is typically about 15 milliseconds, and the minimum interval between system clock ticks is about 1 millisecond. Thus, the expiration time of a default-resolution timer (which ExAllocateTimer creates if the EX_TIMER_HIGH_RESOLUTION flag is not set) can be controlled only to within about 15 milliseconds, but the expiration time of a high-resolution timer can be controlled to within a millisecond.
If you need a higher precision timer you probably need to find a way to use High-Resolution Timers.
More information can be found in the threads below:
https://github.com/golang/go/issues/44343
https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted/

What is the most time efficient way to guarantee at least one nanosecond has elapsed in Go? time.Sleep(time.Nanosecond) can take milliseconds

I have two function calls that I would like to separate by at least a nanosecond. But I want the delay to be as small as possible.
The code below shows an empty for loop is much more efficient at this than using time.Sleep(time.Nanosecond)
Is there an even more efficient way to guarantee at least one nanosecond has elapsed?
func TimeWaster() {
    start := uint64(time.Now().UnixNano())
    stop := uint64(time.Now().UnixNano())
    fmt.Println(time.Duration(stop - start)) // 0s
    // no nanoseconds pass

    start = uint64(time.Now().UnixNano())
    time.Sleep(time.Nanosecond)
    stop = uint64(time.Now().UnixNano())
    fmt.Println(time.Duration(stop - start)) // 6.9482ms
    // much *much* more than one nanosecond passes

    start = uint64(time.Now().UnixNano())
    for uint64(time.Now().UnixNano()) == start {
        // intentionally empty loop
    }
    stop = uint64(time.Now().UnixNano())
    fmt.Println(time.Duration(stop - start)) // 59.3µs
    // much quicker than time.Sleep(time.Nanosecond), but still much slower than 1 nanosecond
}
The package you're using strangely enforces uniqueness of values by time, so all you need to do is loop until the time package stops reporting the same value for the current nanosecond. This doesn't happen after 1 nanosecond; in fact, the resolution of UnixNano() is about 100 nanoseconds on my machine, and it only updates about every 0.5 milliseconds.
package main

import (
    "fmt"
    "time"
)

func main() {
    fmt.Println(time.Now().UnixNano())
    smallWait()
    fmt.Println(time.Now().UnixNano())
}

func smallWait() {
    for start := time.Now().UnixNano(); time.Now().UnixNano() == start; {
    }
}
The loop is pretty self-explanatory: just repeat until UnixNano() returns a different value.

Run a benchmark in parallel, i.e. simulate simultaneous requests

I'm testing a database procedure invoked from an API. When it runs sequentially, it consistently completes within ~3 s. However, we've noticed that when several requests come in at the same time, it can take much longer, causing timeouts. I am trying to reproduce the "several requests at one time" case as a Go test.
I tried the -parallel 10 go test flag, but the timings were the same at ~28s.
Is there something wrong with my benchmark function?
func Benchmark_RealCreate(b *testing.B) {
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        name := randomdata.SillyName()
        r := gofight.New()
        u := []unit{{MefeUnitID: name, MefeCreatorUserID: "user", BzfeCreatorUserID: 55, ClassificationID: 2, UnitName: name, UnitDescriptionDetails: "Up on the hills and testing"}}
        uJSON, _ := json.Marshal(u)
        r.POST("/create").
            SetBody(string(uJSON)).
            Run(h.BasicEngine(), func(r gofight.HTTPResponse, rq gofight.HTTPRequest) {
                assert.Contains(b, r.Body.String(), name)
                assert.Equal(b, http.StatusOK, r.Code)
            })
    }
}
Else how I can achieve what I am after?
The -parallel flag is not for running the same test or benchmark in parallel, in multiple instances.
Quoting from Command go: Testing flags:
-parallel n
Allow parallel execution of test functions that call t.Parallel.
The value of this flag is the maximum number of tests to run
simultaneously; by default, it is set to the value of GOMAXPROCS.
Note that -parallel only applies within a single test binary.
The 'go test' command may run tests for different packages
in parallel as well, according to the setting of the -p flag
(see 'go help build').
So basically, if your tests allow it, you can use -parallel to run multiple distinct test or benchmark functions in parallel, but not the same one in multiple instances.
In general, running multiple benchmark functions in parallel defeats the purpose of benchmarking a function, because parallel execution usually distorts the measurements.
However, in your case code efficiency is not what you want to measure; you want to measure an external service. So Go's built-in testing and benchmarking facilities are not really suitable.
Of course we could still use the convenience of having this "benchmark" run automatically when our other tests and benchmarks run, but you should not force this into the conventional benchmarking framework.
First thing that comes to mind is to use a for loop to launch n goroutines which all attempt to call the testable service. One problem with this is that this only ensures n concurrent goroutines at the start, because as the calls start to complete, there will be less and less concurrency for the remaining ones.
To overcome this and truly test n concurrent calls, you should have a worker pool with n workers, and continuously feed jobs to this worker pool, making sure there will be n concurrent service calls at all times. For a worker pool implementation, see Is this an idiomatic worker thread pool in Go?
So all in all, fire up a worker pool with n workers, have a goroutine send jobs to it for an arbitrary time (e.g. for 30 seconds or 1 minute), and measure (count) the completed jobs. The benchmark result will be a simple division.
Also note that for solely testing purposes, a worker pool might not even be needed. You can just use a loop to launch n goroutines, but make sure each started goroutine keeps calling the service and not return after a single call.
I'm new to Go, but why don't you try making a function and running it using the standard parallel benchmark helper?
func Benchmark_YourFunc(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            YourFunc( /* your args */ )
        }
    })
}
Your example code mixes several things. Why are you using assert there? This is not a test, it is a benchmark; if the assert methods are slow, your benchmark will be too.
You also moved the parallel execution out of your code and into the test command. You should instead create parallel requests using concurrency. Here is one possibility for how to start:
func executeRoutines(routines int) {
    wg := &sync.WaitGroup{}
    wg.Add(routines)

    starter := make(chan struct{})
    for i := 0; i < routines; i++ {
        go func() {
            <-starter
            // your request here
            wg.Done()
        }()
    }

    close(starter)
    wg.Wait()
}
https://play.golang.org/p/ZFjUodniDHr
We start some goroutines here, which wait until starter is closed, so you can place your request right after that line. To make the function wait until all the requests are done, we use a WaitGroup.
BUT IMPORTANT: Go just supports concurrency. So if your system does not have 10 cores, the 10 goroutines will not run in parallel. Make sure you have enough cores available.
With this start you can play a little bit. You could start to call this function inside your benchmark. You could also play around with the numbers of goroutines.
As the documentation indicates, the parallel flag allows multiple different tests to run in parallel. You generally do not want to run benchmarks in parallel, because that would run different benchmarks at the same time and throw off the results for all of them. If you want to benchmark parallel traffic, you need to write parallel traffic generation into your test. You also need to decide how this should interact with b.N, which is your work factor; I would probably use it as the total request count, and write a benchmark (or several) testing different concurrent load levels, e.g.:
func Benchmark_RealCreate(b *testing.B) {
    concurrencyLevels := []int{5, 10, 20, 50}
    for _, clients := range concurrencyLevels {
        b.Run(fmt.Sprintf("%d_clients", clients), func(b *testing.B) {
            sem := make(chan struct{}, clients)
            wg := sync.WaitGroup{}
            for n := 0; n < b.N; n++ {
                wg.Add(1)
                go func() {
                    name := randomdata.SillyName()
                    r := gofight.New()
                    u := []unit{{MefeUnitID: name, MefeCreatorUserID: "user", BzfeCreatorUserID: 55, ClassificationID: 2, UnitName: name, UnitDescriptionDetails: "Up on the hills and testing"}}
                    uJSON, _ := json.Marshal(u)
                    sem <- struct{}{}
                    r.POST("/create").
                        SetBody(string(uJSON)).
                        Run(h.BasicEngine(), func(r gofight.HTTPResponse, rq gofight.HTTPRequest) {})
                    <-sem
                    wg.Done()
                }()
            }
            wg.Wait()
        })
    }
}
Note here I removed the initial ResetTimer; the timer doesn't start until your benchmark function is called, so calling it as the first op in your function is pointless. It's intended for cases where you have time-consuming setup prior to the benchmark loop that you don't want included in the benchmark results. I've also removed the assertions, because this is a benchmark, not a test; assertions are for validity checking in tests and only serve to throw off timing results in benchmarks.
Benchmarking (measuring the time code takes to run) is one thing; load/stress testing is another.
The -parallel flag, as stated above, allows a set of tests to execute in parallel so that the test set finishes faster, not to execute some test N times in parallel.
But it is simple to achieve what you want (executing the same test N times). Below is a very simple (quick and dirty) example, just to clarify/demonstrate the important points for this very specific situation:
You define a test and mark it to be executed in parallel => TestAverage with a call to t.Parallel.
You then define another test and use t.Run to launch the number of instances of the test (TestAverage) you want.
The code to test:
package math

import (
    "fmt"
    "time"
)

func Average(xs []float64) float64 {
    total := float64(0)
    for _, x := range xs {
        total += x
    }
    fmt.Printf("Current Unix Time: %v\n", time.Now().Unix())
    time.Sleep(10 * time.Second)
    fmt.Printf("Current Unix Time: %v\n", time.Now().Unix())
    return total / float64(len(xs))
}
The testing funcs:
package math

import "testing"

func TestAverage(t *testing.T) {
    t.Parallel()
    var v float64
    v = Average([]float64{1, 2})
    if v != 1.5 {
        t.Error("Expected 1.5, got ", v)
    }
}

func TestTeardownParallel(t *testing.T) {
    // This Run will not return until the parallel tests finish.
    t.Run("group", func(t *testing.T) {
        t.Run("Test1", TestAverage)
        t.Run("Test2", TestAverage)
        t.Run("Test3", TestAverage)
    })
    // <tear-down code>
}
Then just do a go test and you should see:
X:\>go test
Current Unix Time: 1556717363
Current Unix Time: 1556717363
Current Unix Time: 1556717363
And 10 secs after that
...
Current Unix Time: 1556717373
Current Unix Time: 1556717373
Current Unix Time: 1556717373
Current Unix Time: 1556717373
Current Unix Time: 1556717383
PASS
ok _/X_/y 20.259s
The two extra lines at the end are there because TestAverage itself is also executed.
The interesting point here: if you remove t.Parallel() from TestAverage, everything executes sequentially:
X:> go test
Current Unix Time: 1556717564
Current Unix Time: 1556717574
Current Unix Time: 1556717574
Current Unix Time: 1556717584
Current Unix Time: 1556717584
Current Unix Time: 1556717594
Current Unix Time: 1556717594
Current Unix Time: 1556717604
PASS
ok _/X_/y 40.270s
This can of course be made more complex and extensible...

How to benchmark init() function

I was playing with the following Go code, which calculates the population count using a lookup table:
package population

import (
    "fmt"
)

var pc [256]byte

func init() {
    for i := range pc {
        pc[i] = pc[i/2] + byte(i&1)
    }
}

func countPopulation() {
    var x uint64 = 65535
    populationCount := int(pc[byte(x>>(0*8))] +
        pc[byte(x>>(1*8))] +
        pc[byte(x>>(2*8))] +
        pc[byte(x>>(3*8))] +
        pc[byte(x>>(4*8))] +
        pc[byte(x>>(5*8))] +
        pc[byte(x>>(6*8))] +
        pc[byte(x>>(7*8))])
    fmt.Printf("Population count: %d\n", populationCount)
}
I wrote the following benchmark code to check the performance of the above code block:
package population

import "testing"

func BenchmarkCountPopulation(b *testing.B) {
    for i := 0; i < b.N; i++ {
        countPopulation()
    }
}
Which gave me following result:
100000 18760 ns/op
PASS
ok gopl.io/ch2 2.055s
Then I moved the code from the init() function into countPopulation(), as below:
func countPopulation() {
    var pc [256]byte
    for i := range pc {
        pc[i] = pc[i/2] + byte(i&1)
    }

    var x uint64 = 65535
    populationCount := int(pc[byte(x>>(0*8))] +
        pc[byte(x>>(1*8))] +
        pc[byte(x>>(2*8))] +
        pc[byte(x>>(3*8))] +
        pc[byte(x>>(4*8))] +
        pc[byte(x>>(5*8))] +
        pc[byte(x>>(6*8))] +
        pc[byte(x>>(7*8))])
    fmt.Printf("Population count: %d\n", populationCount)
}
and once again ran the same benchmark code, which gave me following result:
100000 20565 ns/op
PASS
ok gopl.io/ch2 2.303s
After observing both results, it is clear that the init() function is not within the scope of the benchmark function; that's why the first benchmark took less time than the second.
Now I have another question which I am looking to get an answer for.
If I need to benchmark only the init() function, considering there can be multiple init() functions in a package, how is that done in Go?
Thanks in advance.
Yes, there can be multiple init() functions in a package; in fact, you can have multiple init() functions in a single file. More information about init can be found here. Remember that init() is automatically called once, before your program's main() is even started.
The benchmark framework runs your code multiple times (in your case 100000). This allows it to measure very short functions as well as very long ones. It doesn't make sense for a benchmark to include the time for init(). The problem here is a misunderstanding of the purpose of benchmarking: benchmarking lets you compare two or more implementations to determine which is faster (you can also compare the performance of the same function on different inputs). It does not tell you where your program spends its time.
What you are basically doing is known as Premature Optimization. It's where you start optimizing code trying to make it as fast as possible, without knowing where your program actually spends most of its time. Profiling is the process of measuring the time and space complexity of a program. In practice, it allows you to see where your program is spending most of its time. Using that information, you can write more efficient functions. More information about profiling in go can be found in this blog post.

High resolution timers (millisecond precision) in Go on Windows

I'm trying to use Go's time.Timers to schedule tasks that need to be run in the right order with a precision in the order of half a millisecond. This works perfectly fine on OSX and on Linux, but fails every time on Windows.
The following code demonstrates the issue. It sets 5 timers, the first one to 1 ms, the second to 2 ms, ..., and the last one to 5 ms. Once a timer fires, its number is printed. On OSX and Linux, this obviously produced "12345" as output, but on Windows the numbers are more or less random (tested on Win 7 and Windows Server 2012).
package main

import (
    "fmt"
    "time"
)

func main() {
    var timer1, timer2, timer3, timer4, timer5 *time.Timer
    timer1 = time.NewTimer(1 * time.Millisecond)
    timer2 = time.NewTimer(2 * time.Millisecond)
    timer3 = time.NewTimer(3 * time.Millisecond)
    timer4 = time.NewTimer(4 * time.Millisecond)
    timer5 = time.NewTimer(5 * time.Millisecond)

    // should print 12345
    for {
        select {
        case <-timer1.C:
            fmt.Print("1")
        case <-timer2.C:
            fmt.Print("2")
        case <-timer3.C:
            fmt.Print("3")
        case <-timer4.C:
            fmt.Print("4")
        case <-timer5.C:
            fmt.Print("5")
        case <-time.After(200 * time.Millisecond):
            return // exit the program
        }
    }
}
I think this behavior is due to the changes made in Go 1.6 (https://golang.org/doc/go1.6#runtime, 4th paragraph), where the Windows timer precision was reduced from 1 ms to 16 ms, although it should also have occurred with shorter intervals (of the order of 100 μs) before.
Is there any way to reset the global Windows timer precision back to 1 ms, or to access a high resolution timer that would make the example above work?
Since Go 1.7, timers now have a higher resolution and this problem should not occur.
