I have written a benchmark for my chess engine in Go:
func BenchmarkStartpos(b *testing.B) {
    board := ParseFen(startpos)
    for i := 0; i < b.N; i++ {
        Perft(&board, 5)
    }
}
I see this output when it runs:
goos: darwin
goarch: amd64
BenchmarkStartpos-4 10 108737398 ns/op
PASS
ok _/Users/dylhunn/Documents/go-chess 1.215s
I want to use the time per execution (in this case, 108737398 ns/op) to compute another value, and also print it as a result of the benchmark. Specifically, I want to output nodes per second, which is given as the result of the Perft call divided by the time per call.
How can I access the time the benchmark took to execute, so I can print my own derived results?
You can use the testing.Benchmark() function to manually run "benchmark" functions (anything with the signature func(*testing.B)). You get the result as a value of testing.BenchmarkResult, a struct with all the details you need:
type BenchmarkResult struct {
    N         int           // The number of iterations.
    T         time.Duration // The total time taken.
    Bytes     int64         // Bytes processed in one iteration.
    MemAllocs uint64        // The total number of memory allocations.
    MemBytes  uint64        // The total number of bytes allocated.
}
The time per execution is returned by the BenchmarkResult.NsPerOp() method; from there you can derive whatever you need.
See this simple example:
package main

import (
    "fmt"
    "testing"
    "time"
)

func main() {
    res := testing.Benchmark(BenchmarkSleep)
    fmt.Println(res)
    fmt.Println("Ns per op:", res.NsPerOp())
    fmt.Println("Time per op:", time.Duration(res.NsPerOp()))
}

func BenchmarkSleep(b *testing.B) {
    for i := 0; i < b.N; i++ {
        time.Sleep(time.Millisecond * 12)
    }
}
Output is (try it on the Go Playground):
100 12000000 ns/op
Ns per op: 12000000
Time per op: 12ms
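Applied to your chess engine, a minimal sketch (imports: fmt, testing, time; this assumes Perft returns the number of nodes it visited, as your question implies — adjust if yours reports the count differently):

func main() {
    res := testing.Benchmark(BenchmarkStartpos)
    board := ParseFen(startpos)
    nodes := Perft(&board, 5) // hypothetical: assumes Perft returns its node count
    perCall := time.Duration(res.NsPerOp())
    nps := float64(nodes) / perCall.Seconds()
    fmt.Printf("%v per call, %.0f nodes/sec\n", perCall, nps)
}

Since Go 1.13 you can also report a custom metric from inside the benchmark itself with B.ReportMetric (B.Elapsed, used below, needs Go 1.20+):

func BenchmarkStartpos(b *testing.B) {
    board := ParseFen(startpos)
    nodes := Perft(&board, 5) // hypothetical: assumes Perft returns its node count
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        Perft(&board, 5)
    }
    // total nodes visited divided by measured seconds = nodes per second
    b.ReportMetric(float64(nodes)*float64(b.N)/b.Elapsed().Seconds(), "nodes/s")
}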
Related
I am trying to measure the evolution of the number of heap-allocated objects before and after I call a function. I am forcing runtime.GC() and using runtime.ReadMemStats to measure the number of heap objects I have before and after.
The problem I have is that I sometimes see unexpected heap growth. And it is different after each run.
A simple example below, where I would always expect to see a zero heap-objects growth.
https://go.dev/play/p/FBWfXQHClaG
var mem1_before, mem2_before, mem1_after, mem2_after runtime.MemStats

func measure_nothing(before, after *runtime.MemStats) {
    runtime.GC()
    runtime.ReadMemStats(before)
    runtime.GC()
    runtime.ReadMemStats(after)
}

func main() {
    measure_nothing(&mem1_before, &mem1_after)
    measure_nothing(&mem2_before, &mem2_after)
    log.Printf("HeapObjects diff = %d", int64(mem1_after.HeapObjects-mem1_before.HeapObjects))
    log.Printf("HeapAlloc diff %d", int64(mem1_after.HeapAlloc-mem1_before.HeapAlloc))
    log.Printf("HeapObjects diff = %d", int64(mem2_after.HeapObjects-mem2_before.HeapObjects))
    log.Printf("HeapAlloc diff %d", int64(mem2_after.HeapAlloc-mem2_before.HeapAlloc))
}
Sample output:
2009/11/10 23:00:00 HeapObjects diff = 0
2009/11/10 23:00:00 HeapAlloc diff 0
2009/11/10 23:00:00 HeapObjects diff = 4
2009/11/10 23:00:00 HeapAlloc diff 1864
Is what I'm trying to do impractical? I assume the runtime is doing things that allocate/free heap memory. Can I tell it to stop while I take my measurements? (This is for a test checking for memory leaks, not production code.)
You can't predict what garbage collection and reading all the memory stats require in the background, so calling them is not a reliable way to measure memory allocations and usage.
Luckily for us, Go's testing framework can monitor and calculate memory usage.
So what you should do is write a benchmark function and let the testing framework do its job to report memory allocations and usage.
Let's assume we want to measure this foo() function:
var x []int64

func foo(allocs, size int) {
    for i := 0; i < allocs; i++ {
        x = make([]int64, size)
    }
}
All it does is allocate a slice of the given size, and it does so the given number of times (allocs).
Let's write benchmarking functions for different scenarios:
func BenchmarkFoo_0_0(b *testing.B) {
    for i := 0; i < b.N; i++ {
        foo(0, 0)
    }
}

func BenchmarkFoo_1_1(b *testing.B) {
    for i := 0; i < b.N; i++ {
        foo(1, 1)
    }
}

func BenchmarkFoo_2_2(b *testing.B) {
    for i := 0; i < b.N; i++ {
        foo(2, 2)
    }
}
Running the benchmark with go test -bench . -benchmem, the output is:
BenchmarkFoo_0_0-8 1000000000 0.3204 ns/op 0 B/op 0 allocs/op
BenchmarkFoo_1_1-8 67101626 16.58 ns/op 8 B/op 1 allocs/op
BenchmarkFoo_2_2-8 27375050 42.42 ns/op 32 B/op 2 allocs/op
As you can see, the allocations per function call match the allocs argument, and the allocated memory is the expected allocs * size * 8 bytes (e.g. for BenchmarkFoo_2_2: 2 * 2 * 8 = 32 B/op).
Note that the reported allocations per op is an integer value (it's the result of an integer division), so if the benchmarked function only occasionally allocates, it might not be reported in the integer result. For details, see Output from benchmem.
Like in this example:
var x []int64

func bar() {
    if rand.Float64() < 0.3 {
        x = make([]int64, 10)
    }
}
This bar() function does 1 allocation with 30% probability (and none with 70% probability), which means on average it does 0.3 allocations. Benchmarking it:
func BenchmarkBar(b *testing.B) {
    for i := 0; i < b.N; i++ {
        bar()
    }
}
Output is:
BenchmarkBar-8 38514928 29.60 ns/op 24 B/op 0 allocs/op
We can see 24 bytes allocated per op (0.3 * 10 * 8 bytes), which is correct, but the reported allocations per op is 0.
Luckily for us, we can also benchmark a function from our main app using the testing.Benchmark() function. It returns a testing.BenchmarkResult with all the details about memory usage. Since we have access to both the total number of allocations and the number of iterations, we can compute allocations per op as a floating-point number:
func main() {
    rand.Seed(time.Now().UnixNano())
    tr := testing.Benchmark(BenchmarkBar)
    fmt.Println("Allocs/op", tr.AllocsPerOp())
    fmt.Println("B/op", tr.AllocedBytesPerOp())
    fmt.Println("Precise allocs/op:", float64(tr.MemAllocs)/float64(tr.N))
}
This will output:
Allocs/op 0
B/op 24
Precise allocs/op: 0.3000516369276302
We can see the expected ~0.3 allocations per op.
Now if we go ahead and benchmark your measure_nothing() function:
func BenchmarkNothing(b *testing.B) {
    for i := 0; i < b.N; i++ {
        measure_nothing(&mem1_before, &mem1_after)
    }
}
We get this output:
Allocs/op 0
B/op 11
Precise allocs/op: 0.12182030338389732
As you can see, running the garbage collector twice and reading the memory stats twice occasionally requires an allocation: roughly 1 in 8 calls, or 0.12 allocations on average.
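As a footnote, the standard library also offers testing.AllocsPerRun, which calls a function repeatedly and returns the average number of allocations per call. Note that despite the float64 return type it is documented to always return an integral value, so it cannot reveal the fractional averages we computed above. A minimal sketch (imports: fmt, testing), reusing measure_nothing from the question:

func main() {
    avg := testing.AllocsPerRun(1000, func() {
        measure_nothing(&mem1_before, &mem1_after)
    })
    fmt.Println("allocs/run:", avg) // likely prints 0: the true average here is ~0.12
}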
How do I pass a const pointer to a large struct to a function or a Go channel?
The purpose of this is to:
Avoid accidental modification of the pointed-to struct by the function
Avoid copying the struct while passing it to a function/channel
This functionality is very common in C++, C#, and Java; how can we achieve the same in Go?
============== Update 2 ===================
Thank you @zerkms, @mkopriva and @peterSO.
It was compiler optimization causing the same result in both byValue() and byPointer().
I modified byValue() and byPointer() by adding data.array[0] = reverse(data.array[0]), just to keep the compiler from inlining the functions.
func byValue(data Data) int {
    data.array[0] = reverse(data.array[0])
    return len(data.array)
}

func byPointer(data *Data) int {
    data.array[0] = reverse(data.array[0])
    return len(data.array)
}

func reverse(s string) string {
    runes := []rune(s)
    for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
        runes[i], runes[j] = runes[j], runes[i]
    }
    return string(runes)
}
After that, running the benchmarks showed that passing by pointer was much more efficient than passing by value.
C:\Users\anikumar\Desktop\TestGo>go test -bench=.
goos: windows
goarch: amd64
BenchmarkByValue-4 18978 58228 ns/op 3 B/op 1 allocs/op
BenchmarkByPointer-4 40034295 33.1 ns/op 3 B/op 1 allocs/op
PASS
ok _/C_/Users/anikumar/Desktop/TestGo 3.336s
C:\Users\anikumar\Desktop\TestGo>go test -gcflags -N -run=none -bench=.
goos: windows
goarch: amd64
BenchmarkByValue-4 20961 59380 ns/op 3 B/op 1 allocs/op
BenchmarkByPointer-4 31386213 36.5 ns/op 3 B/op 1 allocs/op
PASS
ok _/C_/Users/anikumar/Desktop/TestGo 3.909s
============== Update ===================
Based on feedback from @zerkms, I created a test to find the performance difference between passing by value and passing by pointer.
package main

import (
    "log"
    "time"
)

const size = 99999

// Data ...
type Data struct {
    array [size]string
}

func main() {
    // Preparing large data
    var data Data
    for i := 0; i < size; i++ {
        data.array[i] = "This is really long string"
    }

    // Starting test
    const max = 9999999999
    start := time.Now()
    for i := 0; i < max; i++ {
        byValue(data)
    }
    elapsed := time.Since(start)
    log.Printf("By Value took %s", elapsed)

    start = time.Now()
    for i := 0; i < max; i++ {
        byPointer(&data)
    }
    elapsed = time.Since(start)
    log.Printf("By Pointer took %s", elapsed)
}

func byValue(data Data) int {
    data.array[0] = reverse(data.array[0])
    return len(data.array)
}

func byPointer(data *Data) int {
    data.array[0] = reverse(data.array[0])
    return len(data.array)
}

func reverse(s string) string {
    runes := []rune(s)
    for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
        runes[i], runes[j] = runes[j], runes[i]
    }
    return string(runes)
}
After 10 runs of the above program, I did not find any difference in execution time.
C:\Users\anikumar\Desktop\TestGo>TestGo.exe
2020/02/16 15:52:03 By Value took 5.2798936s
2020/02/16 15:52:09 By Pointer took 5.3466306s
C:\Users\anikumar\Desktop\TestGo>TestGo.exe
2020/02/16 15:52:18 By Value took 5.3596692s
2020/02/16 15:52:23 By Pointer took 5.2724685s
C:\Users\anikumar\Desktop\TestGo>TestGo.exe
2020/02/16 15:52:29 By Value took 5.2359938s
2020/02/16 15:52:34 By Pointer took 5.2838676s
C:\Users\anikumar\Desktop\TestGo>TestGo.exe
2020/02/16 15:52:42 By Value took 5.8374936s
2020/02/16 15:52:49 By Pointer took 6.9524342s
C:\Users\anikumar\Desktop\TestGo>TestGo.exe
2020/02/16 15:53:40 By Value took 5.4364867s
2020/02/16 15:53:46 By Pointer took 5.8712875s
C:\Users\anikumar\Desktop\TestGo>TestGo.exe
2020/02/16 15:53:54 By Value took 5.5481591s
2020/02/16 15:54:00 By Pointer took 5.5600314s
C:\Users\anikumar\Desktop\TestGo>TestGo.exe
2020/02/16 15:54:10 By Value took 5.4753771s
2020/02/16 15:54:16 By Pointer took 6.4368084s
C:\Users\anikumar\Desktop\TestGo>TestGo.exe
2020/02/16 15:54:24 By Value took 5.4783356s
2020/02/16 15:54:30 By Pointer took 5.5312314s
C:\Users\anikumar\Desktop\TestGo>TestGo.exe
2020/02/16 15:54:39 By Value took 5.4853542s
2020/02/16 15:54:45 By Pointer took 5.5541164s
C:\Users\anikumar\Desktop\TestGo>TestGo.exe
2020/02/16 15:54:57 By Value took 5.4633856s
2020/02/16 15:55:03 By Pointer took 5.4863226s
Looks like @zerkms is right. It is not because of the language, it is because of modern hardware.
Meaningless microbenchmarks produce meaningless results.
In Go, all arguments are passed by value.
For your updated example (TestGo),
$ go version
go version devel +6917529cc6 Sat Feb 15 16:40:12 2020 +0000 linux/amd64
$ go run microbench.go
2020/02/16 13:12:56 By Value took 2.877045229s
2020/02/16 13:12:59 By Pointer took 2.875847918s
$
Go compilers are usually optimizing compilers. For example (inlining diagnostics as printed by go build -gcflags='-m'):
./microbench.go:39:6: can inline byValue
./microbench.go:43:6: can inline byPointer
./microbench.go:26:10: inlining call to byValue
./microbench.go:33:12: inlining call to byPointer
There is no function call overhead. Therefore, there is no difference in execution time.
microbench.go:
package main

import (
    "log"
    "time"
)

const size = 99999

// Data ...
type Data struct {
    array [size]string
}

func main() {
    // Preparing large data
    var data Data
    for i := 0; i < size; i++ {
        data.array[i] = "This is really long string"
    }

    // Starting test
    const max = 9999999999
    start := time.Now()
    for i := 0; i < max; i++ {
        byValue(data)
    }
    elapsed := time.Since(start)
    log.Printf("By Value took %s", elapsed)

    start = time.Now()
    for i := 0; i < max; i++ {
        byPointer(&data)
    }
    elapsed = time.Since(start)
    log.Printf("By Pointer took %s", elapsed)
}

func byValue(data Data) int {
    return len(data.array)
}

func byPointer(data *Data) int {
    return len(data.array)
}
ADDENDUM
Comment: "@Anil8753 another thing to note is that the Go standard library has a testing package which provides some useful functionality for benchmarking. For example, next to your main.go file add a main_test.go file (the file name is important) and add these two benchmarks to it, then from inside the folder run go test -run=none -bench=.. This will print how many operations were executed, how much time a single operation took, how much memory a single operation required, and how many allocations were required." – mkopriva
Go compilers are usually optimizing compilers. Modern hardware is usually heavily optimized.
For mkopriva's microbenchmark,
$ go test microbench.go mkopriva_test.go -bench=.
BenchmarkByValue-4 1000000000 0.289 ns/op 0 B/op 0 allocs/op
BenchmarkByPointer-4 1000000000 0.575 ns/op 0 B/op 0 allocs/op
$
However, for mkopriva's microbenchmark with a sink (the result assigned to a package-level variable, so the compiler cannot discard the call as dead code):
$ go test microbench.go sink_test.go -bench=.
BenchmarkByValue-4 1000000000 0.576 ns/op 0 B/op 0 allocs/op
BenchmarkByPointer-4 1000000000 0.592 ns/op 0 B/op 0 allocs/op
$
mkopriva_test.go:
package main

import (
    "testing"
)

func BenchmarkByValue(b *testing.B) {
    var data Data
    b.ReportAllocs()
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        byValue(data)
    }
}

func BenchmarkByPointer(b *testing.B) {
    var data Data
    b.ReportAllocs()
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        byPointer(&data)
    }
}
sink_test.go:
package main

import (
    "testing"
)

var banchInt int

func BenchmarkByValue(b *testing.B) {
    var data Data
    b.ReportAllocs()
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        banchInt = byValue(data)
    }
}

func BenchmarkByPointer(b *testing.B) {
    var data Data
    b.ReportAllocs()
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        banchInt = byPointer(&data)
    }
}
I think this is a really good question, and I don't know why people have downvoted it. (That is, the original question of using a "const pointer" to pass a large struct.)
The simple answer is that Go has no way to indicate that a function (or channel) taking a pointer will not modify the thing pointed to. It is up to the creator of the function to document that the function does not modify the structure.
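One common workaround, a convention rather than a language feature, is to hide the struct behind an interface that exposes only read methods (a caller could still type-assert back to the concrete pointer, so this documents intent more than it enforces it). A minimal sketch with illustrative names:

// Big is some large struct we don't want copied or mutated.
type Big struct {
    payload [99999]string
}

// BigReader exposes read-only access to Big.
type BigReader interface {
    Payload(i int) string
}

func (b *Big) Payload(i int) string { return b.payload[i] }

// consume receives only the interface: it can read the data
// without copying the array, but has no direct field access.
func consume(r BigReader) {
    _ = r.Payload(0)
}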
@Anil8753, since you explicitly mention channels, I should explain something further. Typically when using a channel you are passing data to another goroutine. If you pass a pointer to the struct, then the sender must be careful not to modify the struct after it has been sent (at least while the receiver could be reading it), and vice versa. Otherwise you create a data race.
For this reason I typically pass structs by value on channels. If you need to create something in the sender for the exclusive use of the receiver, then create the struct (on the heap), send a pointer to it on the channel, and never use it again (even assigning nil to the pointer to make this explicit).
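A minimal sketch of that ownership-transfer pattern, reusing the Data type from the question:

func handOff() {
    ch := make(chan *Data)
    go func() {
        d := &Data{} // allocated for the receiver's exclusive use
        d.array[0] = "receiver-only"
        ch <- d
        d = nil // never touch d again; the receiver now owns it
    }()
    got := <-ch // sole owner after the receive, so no data race
    _ = got.array[0]
}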
@zerkms makes a very good point that before you optimize you should understand what is happening and take measurements. However, in this case there is an obvious performance benefit to not copying memory around. Whether the threshold is at 1 KB, 1 MB, or 1 GB, there will come a point where you want to pass by "reference" (i.e. a pointer to the struct) rather than by value (as long as you know the struct won't be modified, or don't care if it is).
In theory and in practice, copying by value becomes very inefficient once the struct is large enough or the function is called often enough.
I had a task to simulate race conditions in Go. However, I've run into a case that I am unable to explain. The code snippet is below:
package main

import (
    "fmt"
    "sync"
)

var value, totalOps, totalIncOps, totalDecOps int

func main() {
    fmt.Println("Total value: ", simulateRacing(10000))
    fmt.Print("Total iterations: ", totalOps)
    fmt.Print(" of it, increments: ", totalIncOps)
    fmt.Print(", decrements: ", totalDecOps)
}

// Function to simulate a race condition
func simulateRacing(iterationsNumber int) int {
    value = 0
    // Define WaitGroup
    var waitGroup sync.WaitGroup
    waitGroup.Add(2)

    go increaseByOne(iterationsNumber, &waitGroup)
    go decreaseByOne(iterationsNumber, &waitGroup)

    waitGroup.Wait()
    return value
}

// Function to do N iterations, each time increasing value by 1
func increaseByOne(N int, waitGroup *sync.WaitGroup) {
    for i := 0; i < N; i++ {
        value++
        // Collecting stats
        totalOps++
        totalIncOps++
    }
    waitGroup.Done()
}

// Same with decrease
func decreaseByOne(N int, waitGroup *sync.WaitGroup) {
    for i := 0; i < N; i++ {
        value--
        // Collecting stats
        totalOps++
        totalDecOps++
    }
    waitGroup.Done()
}
In my understanding, it should produce a consistent (deterministic) result each time, since we are doing the same number of increments and decrements, and the WaitGroup ensures both functions finish.
However, each time the output is different, with only the increment and decrement counters staying the same.
Total value: 2113
Total iterations: 17738 of it, increments: 10000, decrements: 10000
and
Total value: 35
Total iterations: 10741 of it, increments: 10000, decrements: 10000
Can you help me explain this behaviour? Why are the total iterations counter and the value itself non-deterministic?
That's a classic example of a race condition. value++ is not an atomic operation, so there are no guarantees that it will work correctly or deterministically when called from multiple threads without synchronization.
To give some intuition, value++ is more or less equivalent to value = value + 1. You can think of it as three operations, not one: load value from memory into a CPU register, increase the value in the register (you cannot modify memory directly), and store the value back to memory. Two threads may load the same value simultaneously, increase it, get the same result, and then write it back, so two increments together may increase the value by 1, not 2.
As the order of operations between threads is non-deterministic, the result is also non-deterministic.
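For example, one possible interleaving of the two goroutines in your program (illustrative trace; value starts at 0):

G1: load value              (reads 0)
G2: load value              (reads 0)
G1: add 1 in register       (register = 1)
G2: subtract 1 in register  (register = -1)
G1: store 1                 (value = 1)
G2: store -1                (value = -1; G1's increment is lost)

A different interleaving loses the decrement instead, which is why the final value wanders away from zero on every run.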
The same effect happens with totalOps. However, totalIncOps and totalDecOps are each modified by only a single goroutine, so there is no race on them and their final values are deterministic.
The operations on the variables value, totalOps, totalIncOps, and totalDecOps are not protected by any lock.
Adding a mutex should help, and the Go race detector (e.g. go run -race) would find this fault:
var m sync.Mutex

func increaseByOne(N int, waitGroup *sync.WaitGroup) {
    for i := 0; i < N; i++ {
        m.Lock()
        value++
        // Collecting stats
        totalOps++
        totalIncOps++
        m.Unlock()
    }
    waitGroup.Done()
}

// Same with decrease
func decreaseByOne(N int, waitGroup *sync.WaitGroup) {
    for i := 0; i < N; i++ {
        m.Lock()
        value--
        // Collecting stats
        totalOps++
        totalDecOps++
        m.Unlock()
    }
    waitGroup.Done()
}
An alternative to the above would be to use the sync/atomic package for the counters.
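A minimal sketch of that atomic variant (add "sync/atomic" to the imports), assuming the shared variables are redeclared as int64, since the sync/atomic functions operate on sized integer types:

var value, totalOps, totalIncOps, totalDecOps int64

func increaseByOne(N int, waitGroup *sync.WaitGroup) {
    for i := 0; i < N; i++ {
        atomic.AddInt64(&value, 1)
        atomic.AddInt64(&totalOps, 1)
        atomic.AddInt64(&totalIncOps, 1)
    }
    waitGroup.Done()
}

// decreaseByOne is symmetric, using atomic.AddInt64(&value, -1).
// Readers of the final values should use atomic.LoadInt64.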
While doing some exercises, I came across this question...
Let's say you have a map with a capacity of 100,000.
Which value type is the most efficient for filling the whole map in the least amount of time?
I ran some benchmarks on my own, trying out most of the types I could think of, and the resulting top list is:
Benchmark_Struct-8 200 6010422 ns/op (struct{}{})
Benchmark_Byte-8 200 6167230 ns/op (byte = 0)
Benchmark_Int-8 200 6112927 ns/op (int8 = 0)
Benchmark_Bool-8 200 6117155 ns/op (bool = false)
Example function:
func Struct() {
    m := make(map[int]struct{}, 100000)
    for i := 0; i < 100000; i++ {
        m[i] = struct{}{}
    }
}
As you can see, the fastest one (most of the time) is the empty struct, struct{}{}.
But why is this the case in Go?
Is there a faster/lighter nil or non-nil value?
- Thank you for your time :)
Theoretically, struct{}{} should be the most efficient because it requires no memory. In practice, a) results may vary between Go versions, operating systems, and system architectures; and b) I can't think of any case where maximizing the execution-time efficiency of empty values is relevant.
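You can verify the zero size directly with unsafe.Sizeof; a quick sketch:

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    fmt.Println(unsafe.Sizeof(struct{}{})) // 0: the empty struct occupies no bytes
    fmt.Println(unsafe.Sizeof(false))      // 1
    fmt.Println(unsafe.Sizeof(int8(0)))    // 1
}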
As we know, there are two ways to initialize a map (listed below). I'm wondering if there is any performance difference between the two approaches.
var myMap map[string]int
then
myMap = map[string]int{}
vs
myMap = make(map[string]int)
On my machine they appear to be about equivalent.
You can easily make a benchmark test to compare. For example:
package bench

import "testing"

var result map[string]int

func BenchmarkMakeLiteral(b *testing.B) {
    var m map[string]int
    for n := 0; n < b.N; n++ {
        m = InitMapLiteral()
    }
    result = m
}

func BenchmarkMakeMake(b *testing.B) {
    var m map[string]int
    for n := 0; n < b.N; n++ {
        m = InitMapMake()
    }
    result = m
}

func InitMapLiteral() map[string]int {
    return map[string]int{}
}

func InitMapMake() map[string]int {
    return make(map[string]int)
}
Which on 3 different runs yielded results that are close enough to be insignificant:
First Run
$ go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkMakeLiteral-8 10000000 160 ns/op
BenchmarkMakeMake-8 10000000 171 ns/op
ok github.com/johnweldon/bench 3.664s
Second Run
$ go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkMakeLiteral-8 10000000 182 ns/op
BenchmarkMakeMake-8 10000000 173 ns/op
ok github.com/johnweldon/bench 3.945s
Third Run
$ go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkMakeLiteral-8 10000000 170 ns/op
BenchmarkMakeMake-8 10000000 170 ns/op
ok github.com/johnweldon/bench 3.751s
When allocating empty maps there is no difference, but with make you can pass a second parameter to pre-allocate space in the map. This saves a lot of reallocations while the map is being populated.
Benchmarks
package maps

import "testing"

const SIZE = 10000

func fill(m map[int]bool, size int) {
    for i := 0; i < size; i++ {
        m[i] = true
    }
}

func BenchmarkEmpty(b *testing.B) {
    for n := 0; n < b.N; n++ {
        m := make(map[int]bool)
        fill(m, SIZE)
    }
}

func BenchmarkAllocated(b *testing.B) {
    for n := 0; n < b.N; n++ {
        m := make(map[int]bool, 2*SIZE)
        fill(m, SIZE)
    }
}
Results
go test -benchmem -bench .
BenchmarkEmpty-8 500 2988680 ns/op 431848 B/op 625 allocs/op
BenchmarkAllocated-8 1000 1618251 ns/op 360949 B/op 11 allocs/op
A year ago I stumbled on the fact that using make with explicitly allocated space is better than using a map literal if your values are not static.
So doing
return map[string]float64{
    "key1": SOME_COMPUTED_ABOVE_VALUE,
    "key2": SOME_COMPUTED_ABOVE_VALUE,
    // more keys here
    "keyN": SOME_COMPUTED_ABOVE_VALUE,
}
is slower than
// some code above
result := make(map[string]float64, SIZE) // SIZE >= N
result["key1"] = SOME_COMPUTED_ABOVE_VALUE
result["key2"] = SOME_COMPUTED_ABOVE_VALUE
// more keys here
result["keyN"] = SOME_COMPUTED_ABOVE_VALUE
return result
for N that is quite big (N=300 in my use case).
The reason is that the compiler fails to deduce that it needs to allocate at least N slots for the map in the first case.
I wrote a blog post about it
https://trams.github.io/golang-map-literal-performance/
and I reported a bug to the community
https://github.com/golang/go/issues/43020
As of Go 1.17 it is still an issue.