How to benchmark init() function - go

I was playing with following Go code which calculates Population count using lookup table:
package population
import (
"fmt"
)
var pc [256]byte
func init(){
for i := range pc {
pc[i] = pc[i/2] + byte(i&1)
}
}
func countPopulation() {
var x uint64 = 65535
populationCount := int(pc[byte(x>>(0*8))] +
pc[byte(x>>(1*8))] +
pc[byte(x>>(2*8))] +
pc[byte(x>>(3*8))] +
pc[byte(x>>(4*8))] +
pc[byte(x>>(5*8))] +
pc[byte(x>>(6*8))] +
pc[byte(x>>(7*8))])
fmt.Printf("Population count: %d\n", populationCount)
}
I have written following benchmark code to check performance of above code block:
package population
import "testing"
func BenchmarkCountPopulation(b *testing.B) {
for i := 0; i < b.N; i++ {
countPopulation()
}
}
Which gave me following result:
100000 18760 ns/op
PASS
ok gopl.io/ch2 2.055s
Then I moved the code from init() function to the countPopulation() function as below:
func countPopulation() {
var pc [256]byte
for i := range pc {
pc[i] = pc[i/2] + byte(i&1)
}
var x uint64 = 65535
populationCount := int(pc[byte(x>>(0*8))] +
pc[byte(x>>(1*8))] +
pc[byte(x>>(2*8))] +
pc[byte(x>>(3*8))] +
pc[byte(x>>(4*8))] +
pc[byte(x>>(5*8))] +
pc[byte(x>>(6*8))] +
pc[byte(x>>(7*8))])
fmt.Printf("Population count: %d\n", populationCount)
}
and once again ran the same benchmark code, which gave me following result:
100000 20565 ns/op
PASS
ok gopl.io/ch2 2.303s
After observing both the results it is clear that init() function is not in the scope of benchmark function. That's why first benchmark execution took lesser time compared to second execution.
Now I have another question which I am looking to get answer for.
If I need to benchmark only the init() method, considering there can be multiple init() functions in a package. How is it done in golang?
Thanks in advance.

Yes there can be multiple init()'s in a package, in-fact you can have multiple init()'s in a file. More information about init can be found here. Remember that init() is automatically called one time before your program's main() is even started.
The benchmark framework runs your code multiple times (in your case 100000). This allows it to measure very short functions, as well as very long functions. It doesn't make sense for benchmark to include the time for init(). The problem you are having is that you are not understanding the purpose of benchmarking. Benchmarking lets you compare two or more separate implementations to determine which one is faster (also you can compare performance based on input of the same function). It does not tell you where you should be doing that.
What you are basically doing is known as Premature Optimization. It's where you start optimizing code trying to make it as fast as possible, without knowing where your program actually spends most of its time. Profiling is the process of measuring the time and space complexity of a program. In practice, it allows you to see where your program is spending most of its time. Using that information, you can write more efficient functions. More information about profiling in go can be found in this blog post.

Related

Is this a safe way to use ulid concurrently with other libraries too?

I'm trying to use for the first time the ulid package.
In their README they say:
Please note that rand.Rand from the math package is not safe for concurrent use.
Instantiate one per long living go-routine or use a sync.Pool if you want to avoid the potential contention of a locked rand.Source as its been frequently observed in the package level functions.
Can you help me understand what does this mean and how to write SAFE code for concurrent use with libraries such ent or gqlgen?
Example: I'm using the below code in my app to generate new IDs (sometimes even many of them in the same millisecond which is fine for ulid).
import (
"math/rand"
"time"
"github.com/oklog/ulid/v2"
)
var defaultEntropySource *ulid.MonotonicEntropy
func init() {
defaultEntropySource = ulid.Monotonic(rand.New(rand.NewSource(time.Now().UnixNano())), 0)
}
func NewID() string {
return ulid.MustNew(ulid.Timestamp(time.Now()), defaultEntropySource).String()
}
Is this a safe way to use the package?
Is this a safe way to use the package?
No, that sentence suggests that each rand.Source should be local to the goroutine, your defaultEntropySource rand.Source piece is potentially shared between multiple goroutines.
As documentated New function, you only need to make sure the entropy reader is safe for concurrent use, but Monotonic is not. Here is a two ways of implementing the documentation suggestion:
Create a single rand.Source per call o NewID(), allocates a new entropy for each call to NewID
func NewID() string {
defaultEntropySource := ulid.Monotonic(rand.New(rand.NewSource(time.Now().UnixNano())), 0)
return ulid.MustNew(ulid.Timestamp(time.Now()), defaultEntropySource).String()
}
Playground
Like above but using sync.Pool to possible reuse previously allocated rand.Sources
var entropyPool = sync.Pool{
New: func() any {
entropy := ulid.Monotonic(rand.New(rand.NewSource(time.Now().UnixNano())), 0)
return entropy
},
}
func NewID() string {
e := entropyPool.Get().(*ulid.MonotonicEntropy)
s := ulid.MustNew(ulid.Timestamp(time.Now()), e).String()
entropyPool.Put(e)
return s
}
Playground

What is the most time efficient way to guarantee at least one nanosecond has elapsed in Go? time.Sleep(time.Nanosecond) can take milliseconds

I have two function calls that I would like to separate by at least a nanosecond. But I want the delay to be as small as possible.
The code below shows an empty for loop is much more efficient at this than using time.Sleep(time.Nanosecond)
Is there an even more efficient way to guarantee at least one nanosecond has elapsed?
func TimeWaster () {
start := uint64(time.Now().UnixNano())
stop := uint64(time.Now().UnixNano())
fmt.Println(time.Duration(stop-start))//0s
//no nanoseconds pass
start = uint64(time.Now().UnixNano())
time.Sleep(time.Nanosecond)
stop = uint64(time.Now().UnixNano())
fmt.Println(time.Duration(stop-start))//6.9482ms
//much *much* more than one nanosecond passes
start = uint64(time.Now().UnixNano())
for uint64(time.Now().UnixNano()) == start {
//intentionally empty loop
}
stop = uint64(time.Now().UnixNano())
fmt.Println(time.Duration(stop-start))//59.3µs
//much quicker than time.Sleep(time.Nanosecond), but still much slower than 1 nanosecond
}
The package you're using strangely enforces uniqueness of values by time, so all you need to do is loop until the time package is no longer reporting the same value for the current nanosecond. This doesn't happen after 1 nanosecond, in fact the resolution of the UnixNano is about 100 nanoseconds on my machine and only updates about every 0.5 milliseconds.
package main
import (
"fmt"
"time"
)
func main() {
fmt.Println(time.Now().UnixNano())
smallWait()
fmt.Println(time.Now().UnixNano())
}
func smallWait() {
for start := time.Now().UnixNano(); time.Now().UnixNano() == start; {}
}
The loop is pretty self-explanatory, just repeat until the UnixNano() is different

What happens when reading or writing concurrently without a mutex

In Go, a sync.Mutex or chan is used to prevent concurrent access of shared objects. However, in some cases I am just interested in the "latest" value of a variable or field of an object.
Or I like to write a value and do not care if another go-routine overwrites it later or has just overwritten it before.
Update: TLDR; Just don't do this. It is not safe. Read the answers, comments, and linked documents!
Update 2021: The Go memory model is going to be specified more thoroughly and there are three great articles by Russ Cox that will teach you more about the surprising effects of unsynchronized memory access. These articles summarize a lot of the below discussions and learnings.
Here are two variants good and bad of an example program, where both seem to produce "correct" output using the current Go runtime:
package main
import (
"flag"
"fmt"
"math/rand"
"time"
)
var bogus = flag.Bool("bogus", false, "use bogus code")
func pause() {
time.Sleep(time.Duration(rand.Uint32()%100) * time.Millisecond)
}
func bad() {
stop := time.After(100 * time.Millisecond)
var name string
// start some producers doing concurrent writes (DANGER!)
for i := 0; i < 10; i++ {
go func(i int) {
pause()
name = fmt.Sprintf("name = %d", i)
}(i)
}
// start consumer that shows the current value every 10ms
go func() {
tick := time.Tick(10 * time.Millisecond)
for {
select {
case <-stop:
return
case <-tick:
fmt.Println("read:", name)
}
}
}()
<-stop
}
func good() {
stop := time.After(100 * time.Millisecond)
names := make(chan string, 10)
// start some producers concurrently writing to a channel (GOOD!)
for i := 0; i < 10; i++ {
go func(i int) {
pause()
names <- fmt.Sprintf("name = %d", i)
}(i)
}
// start consumer that shows the current value every 10ms
go func() {
tick := time.Tick(10 * time.Millisecond)
var name string
for {
select {
case name = <-names:
case <-stop:
return
case <-tick:
fmt.Println("read:", name)
}
}
}()
<-stop
}
func main() {
flag.Parse()
if *bogus {
bad()
} else {
good()
}
}
The expected output is as follows:
...
read: name = 3
read: name = 3
read: name = 5
read: name = 4
...
Any combination of read: and read: name=[0-9] is correct output for this program. Receiving any other string as output would be an error.
When running this program with go run --race bogus.go it is safe.
However, go run --race bogus.go -bogus warns of the concurrent reads and writes.
For map types and when appending to slices I always need a mutex or a similar method of protection to avoid segfaults or unexpected behavior. However, reading and writing literals (atomic values) to variables or field values seems to be safe.
Question: Which Go data types can I safely read and safely write concurrently without a mutext and without producing segfaults and without reading garbage from memory?
Please explain why something is safe or unsafe in Go in your answer.
Update: I rewrote the example to better reflect the original code, where I had the the concurrent writes issue. The important leanings are already in the comments. I will accept an answer that summarizes these learnings with enough detail (esp. on the Go-runtime).
However, in some cases I am just interested in the latest value of a variable or field of an object.
Here is the fundamental problem: What does the word "latest" mean?
Suppoose that, mathematically speaking, we have a sequence of values Xi, with 0 <= i < N. Then obviously Xj is "later than" Xi if j > i. That's a nice simple definition of "latest" and is probably the one you want.
But when two separate CPUs within a single machine—including two goroutines in a Go program—are working at the same time, time itself loses meaning. We cannot say whether i < j, i == j, or i > j. So there is no correct definition for the word latest.
To solve this kind of problem, modern CPU hardware, and Go as a programming language, gives us certain synchronization primitives. If CPUs A and B execute memory fence instructions, or synchronization instructions, or use whatever other hardware provisions exist, the CPUs (and/or some external hardware) will insert whatever is required for the notion of "time" to regain its meaning. That is, if the CPU uses barrier instructions, we can say that a memory load or store that was executed before the barrier is a "before" and a memory load or store that is executed after the barrier is an "after".
(The actual implementation, in some modern hardware, consists of load and store buffers that can rearrange the order in which loads and stores go to memory. The barrier instruction either synchronizes the buffers, or places an actual barrier in them, so that loads and stores cannot move across the barrier. This particular concrete implementation gives an easy way to think about the problem, but isn't complete: you should think of time as simply not existing outside the hardware-provided synchronization, i.e., all loads from, and stores to, some location are happening simultaneously, rather than in some sequential order, except for these barriers.)
In any case, Go's sync package gives you a simple high level access method to these kinds of barriers. Compiled code that executes before a mutex Lock call really does complete before the lock function returns, and the code that executes after the call really does not start until after the lock function returns.
Go's channels provide the same kinds of before/after time guarantees.
Go's sync/atomic package provides much lower level guarantees. In general you should avoid this in favor of the higher level channel or sync.Mutex style guarantees. (Edit to add note: You could use sync/atomic's Pointer operations here, but not with the string type directly, as Go strings are actually implemented as a header containing two separate values: a pointer, and a length. You could solve this with another layer of indirection, by updating a pointer that points to the string object. But before you even consider doing that, you should benchmark the use of the language's preferred methods and verify that these are a problem, because code that works at the sync/atomic level is hard to write and hard to debug.)
Which Go data types can I safely read and safely write concurrently without a mutext and without producing segfaults and without reading garbage from memory?
None.
It really is that simple: You cannot, under no circumstance whatsoever, read and write concurrently to anything in Go.
(Btw: Your "correct" program is not correct, it is racy and even if you get rid of the race condition it would not deterministically produce the output.)
Why can't you use channels
package main
import (
"fmt"
"sync"
)
func main() {
var wg sync.WaitGroup // wait group to close channel
var buffer int = 1 // buffer of the channel
// channel to get the share data
cName := make(chan string, buffer)
for i := 0; i < 10; i++ {
wg.Add(1) // add to wait group
go func(i int) {
cName <- fmt.Sprintf("name = %d", i)
wg.Done() // decrease wait group.
}(i)
}
go func() {
wg.Wait() // wait of wait group to be 0
close(cName) // close the channel
}()
// process all the data
for n := range cName {
println("read:", n)
}
}
The above code returns the following output
read: name = 0
read: name = 5
read: name = 1
read: name = 2
read: name = 3
read: name = 4
read: name = 7
read: name = 6
read: name = 8
read: name = 9
https://play.golang.org/p/R4n9ssPMOeS
Article about channels

Run a benchmark in parallel, i.e. simulate simultaneous requests

When testing a database procedure invoked from an API, when it runs sequentially, it seems to run consistently within ~3s. However we've noticed that when several requests come in at the same time, this can take much longer, causing time outs. I am trying to reproduce the "several requests at one time" case as a go test.
I tried the -parallel 10 go test flag, but the timings were the same at ~28s.
Is there something wrong with my benchmark function?
func Benchmark_RealCreate(b *testing.B) {
b.ResetTimer()
for n := 0; n < b.N; n++ {
name := randomdata.SillyName()
r := gofight.New()
u := []unit{unit{MefeUnitID: name, MefeCreatorUserID: "user", BzfeCreatorUserID: 55, ClassificationID: 2, UnitName: name, UnitDescriptionDetails: "Up on the hills and testing"}}
uJSON, _ := json.Marshal(u)
r.POST("/create").
SetBody(string(uJSON)).
Run(h.BasicEngine(), func(r gofight.HTTPResponse, rq gofight.HTTPRequest) {
assert.Contains(b, r.Body.String(), name)
assert.Equal(b, http.StatusOK, r.Code)
})
}
}
Else how I can achieve what I am after?
The -parallel flag is not for running the same test or benchmark parallel, in multiple instances.
Quoting from Command go: Testing flags:
-parallel n
Allow parallel execution of test functions that call t.Parallel.
The value of this flag is the maximum number of tests to run
simultaneously; by default, it is set to the value of GOMAXPROCS.
Note that -parallel only applies within a single test binary.
The 'go test' command may run tests for different packages
in parallel as well, according to the setting of the -p flag
(see 'go help build').
So basically if your tests allow, you can use -parallel to run multiple distinct testing or benchmark functions parallel, but not the same one in multiple instances.
In general, running multiple benchmark functions parallel defeats the purpose of benchmarking a function, because running it parallel in multiple instances usually distorts the benchmarking.
However, in your case code efficiency is not what you want to measure, you want to measure an external service. So go's built-in testing and benchmarking facilities are not really suitable.
Of course we could still use the convenience of having this "benchmark" run automatically when our other tests and benchmarks run, but you should not force this into the conventional benchmarking framework.
First thing that comes to mind is to use a for loop to launch n goroutines which all attempt to call the testable service. One problem with this is that this only ensures n concurrent goroutines at the start, because as the calls start to complete, there will be less and less concurrency for the remaining ones.
To overcome this and truly test n concurrent calls, you should have a worker pool with n workers, and continously feed jobs to this worker pool, making sure there will be n concurrent service calls at all times. For a worker pool implementation, see Is this an idiomatic worker thread pool in Go?
So all in all, fire up a worker pool with n workers, have a goroutine send jobs to it for an arbitrary time (e.g. for 30 seconds or 1 minute), and measure (count) the completed jobs. The benchmark result will be a simple division.
Also note that for solely testing purposes, a worker pool might not even be needed. You can just use a loop to launch n goroutines, but make sure each started goroutine keeps calling the service and not return after a single call.
I'm new to go, but why don't you try to make a function and run it using the standard parallel test?
func Benchmark_YourFunc(b *testing.B) {
b.RunParralel(func(pb *testing.PB) {
for pb.Next() {
YourFunc(staff ...T)
}
})
}
Your example code mixes several things. Why are you using assert there? This is not a test it is a benchmark. If the assert methods are slow, your benchmark will be.
You also moved the parallel execution out of your code into the test command. You should try to make a parallel request by using concurrency. Here just a possibility how to start:
func executeRoutines(routines int) {
wg := &sync.WaitGroup{}
wg.Add(routines)
starter := make(chan struct{})
for i := 0; i < routines; i++ {
go func() {
<-starter
// your request here
wg.Done()
}()
}
close(starter)
wg.Wait()
}
https://play.golang.org/p/ZFjUodniDHr
We start some goroutines here, which are waiting until starter is closed. So you can set your request direct after that line. That the function waits until all the requests are done we are using a WaitGroup.
BUT IMPORTANT: Go just supports concurrency. So if your system has not 10 cores the 10 goroutines will not run parallel. So ensure that you have enough cores availiable.
With this start you can play a little bit. You could start to call this function inside your benchmark. You could also play around with the numbers of goroutines.
As the documentation indicates, the parallel flag is to allow multiple different tests to be run in parallel. You generally do not want to run benchmarks in parallel because that would run different benchmarks at the same time, throwing off the results for all of them. If you want to benchmark parallel traffic, you need to write parallel traffic generation into your test. You need to decide how this should work with b.N which is your work factor; I would probably use it as the total request count, and write a benchmark or multiple benchmarks testing different concurrent load levels, e.g.:
func Benchmark_RealCreate(b *testing.B) {
concurrencyLevels := []int{5, 10, 20, 50}
for _, clients := range concurrencyLevels {
b.Run(fmt.Sprintf("%d_clients", clients), func(b *testing.B) {
sem := make(chan struct{}, clients)
wg := sync.WaitGroup{}
for n := 0; n < b.N; n++ {
wg.Add(1)
go func() {
name := randomdata.SillyName()
r := gofight.New()
u := []unit{unit{MefeUnitID: name, MefeCreatorUserID: "user", BzfeCreatorUserID: 55, ClassificationID: 2, UnitName: name, UnitDescriptionDetails: "Up on the hills and testing"}}
uJSON, _ := json.Marshal(u)
sem <- struct{}{}
r.POST("/create").
SetBody(string(uJSON)).
Run(h.BasicEngine(), func(r gofight.HTTPResponse, rq gofight.HTTPRequest) {})
<-sem
wg.Done()
}()
}
wg.Wait()
})
}
}
Note here I removed the initial ResetTimer; the timer doesn't start until you benchmark function is called, so calling it as the first op in your function is pointless. It's intended for cases where you have time-consuming setup prior to the benchmark loop that you don't want included in the benchmark results. I've also removed the assertions, because this is a benchmark, not a test; assertions are for validity checking in tests and only serve to throw off timing results in benchmarks.
One thing is benchmarking (measuring time code takes to run) another one is load/stress testing.
The -parallel flag as stated above, is to allow a set of tests to execute in parallel, allowing the test set to execute faster, not to execute some test N times in parallel.
But is simple to achieve what you want (execution of same test N times). Bellow a very simple (really quick and dirty) example just to clarify/demonstrate the important points, that gets this very specific situation done:
You define a test and mark it to be executed in parallel => TestAverage with a call to t.Parallel
You then define another test and use RunParallel to execute the number of instances of the test (TestAverage) you want.
The class to test:
package math
import (
"fmt"
"time"
)
func Average(xs []float64) float64 {
total := float64(0)
for _, x := range xs {
total += x
}
fmt.Printf("Current Unix Time: %v\n", time.Now().Unix())
time.Sleep(10 * time.Second)
fmt.Printf("Current Unix Time: %v\n", time.Now().Unix())
return total / float64(len(xs))
}
The testing funcs:
package math
import "testing"
func TestAverage(t *testing.T) {
t.Parallel()
var v float64
v = Average([]float64{1,2})
if v != 1.5 {
t.Error("Expected 1.5, got ", v)
}
}
func TestTeardownParallel(t *testing.T) {
// This Run will not return until the parallel tests finish.
t.Run("group", func(t *testing.T) {
t.Run("Test1", TestAverage)
t.Run("Test2", TestAverage)
t.Run("Test3", TestAverage)
})
// <tear-down code>
}
Then just do a go test and you should see:
X:\>go test
Current Unix Time: 1556717363
Current Unix Time: 1556717363
Current Unix Time: 1556717363
And 10 secs after that
...
Current Unix Time: 1556717373
Current Unix Time: 1556717373
Current Unix Time: 1556717373
Current Unix Time: 1556717373
Current Unix Time: 1556717383
PASS
ok _/X_/y 20.259s
The two extra lines, in the end are because TestAverage is executed also.
The interesting point here: if you remove t.Parallel() from TestAverage, it will all be execute sequencially:
X:> go test
Current Unix Time: 1556717564
Current Unix Time: 1556717574
Current Unix Time: 1556717574
Current Unix Time: 1556717584
Current Unix Time: 1556717584
Current Unix Time: 1556717594
Current Unix Time: 1556717594
Current Unix Time: 1556717604
PASS
ok _/X_/y 40.270s
This can of course be made more complex and extensible...

How do I find out which of 2 methods is faster?

I have 2 methods to trim the domain suffix from a subdomain and I'd like to find out which one is faster. How do I do that?
2 string trimming methods
You can use the builtin benchmark capabilities of go test.
For example (on play):
import (
"strings"
"testing"
)
func BenchmarkStrip1(b *testing.B) {
for br := 0; br < b.N; br++ {
host := "subdomain.domain.tld"
s := strings.Index(host, ".")
_ = host[:s]
}
}
func BenchmarkStrip2(b *testing.B) {
for br := 0; br < b.N; br++ {
host := "subdomain.domain.tld"
strings.TrimSuffix(host, ".domain.tld")
}
}
Store this code in somename_test.go and run go test -test.bench='.*'. For me this gives
the following output:
% go test -test.bench='.*'
testing: warning: no tests to run
PASS
BenchmarkStrip1 100000000 12.9 ns/op
BenchmarkStrip2 100000000 16.1 ns/op
ok 21614966 2.935s
The benchmark utility will attempt to do a certain number of runs until a meaningful time is
measured which is reflected in the output by the number 100000000. The code was run
100000000 times and each operation in the loop took 12.9 ns and 16.1 ns respectively.
So you can conclude that the code in BenchmarkStrip1 performed better.
Regardless of the outcome, it is often better to profile your program to see where the
real bottleneck is instead of wasting your time with micro benchmarks like these.
I would also not recommend writing your own benchmarking as there are some factors you might
not consider such as the garbage collector and running your samples long enough.

Resources