Confusing results from Go benchmarking of function and goroutine call overhead

Out of curiosity, I am trying to understand the function and goroutine call overhead in Go. I therefore wrote the benchmarks below, which produced the results shown after them. The result for BenchmarkNestedFunctions confuses me: it seems far too high, so I naturally assume I have done something wrong. I was expecting BenchmarkNestedFunctions to be slightly higher than BenchmarkNopFunc and very close to BenchmarkSplitNestedFunctions. Can anyone suggest what I am either not understanding or doing wrong?
package main
import (
"testing"
)
// Intended to allow me to see the iteration overhead being used in the benchmarking
func BenchmarkTestLoop(b *testing.B) {
for i := 0; i < b.N; i++ {
}
}
//go:noinline
func nop() {
}
// Intended to allow me to see the overhead from making a do nothing function call which I hope is not being optimised out
func BenchmarkNopFunc(b *testing.B) {
for i := 0; i < b.N; i++ {
nop()
}
}
// Intended to allow me to see the added cost from creating a channel, closing it and then reading from it
func BenchmarkChannelMakeCloseRead(b *testing.B) {
for i := 0; i < b.N; i++ {
done := make(chan struct{})
close(done)
_, _ = <-done
}
}
//go:noinline
func nestedfunction(n int, done chan<- struct{}) {
n--
if n > 0 {
nestedfunction(n, done)
} else {
close(done)
}
}
// Intended to allow me to see the added cost of making 1 function call doing a set of channel operations for each call
func BenchmarkUnnestedFunctions(b *testing.B) {
for i := 0; i < b.N; i++ {
done := make(chan struct{})
nestedfunction(1, done)
_, _ = <-done
}
}
// Intended to allow me to see the added cost of repeated nested calls and stack growth with an upper limit on the call depth to allow examination of a particular stack size
func BenchmarkNestedFunctions(b *testing.B) {
// Max number of nested function calls to prevent excessive stack growth
const max int = 200000
if b.N > max {
b.N = max
}
done := make(chan struct{})
nestedfunction(b.N, done)
_, _ = <-done
}
// Intended to allow me to see the added cost of repeated nested calls with any stack reuse the runtime supports (presuming it doesn't free and then realloc the stack as it grows)
func BenchmarkSplitNestedFunctions(b *testing.B) {
// Max number of nested function calls to prevent excessive stack growth
const max int = 200000
for i := 0; i < b.N; i += max {
done := make(chan struct{})
if (b.N - i) > max {
nestedfunction(max, done)
} else {
nestedfunction(b.N-i, done)
}
_, _ = <-done
}
}
// Intended to allow me to see the added cost of spinning up a goroutine to perform comparable useful work to the nested function calls
func BenchmarkNestedGoRoutines(b *testing.B) {
done := make(chan struct{})
go nestedgoroutines(b.N, done)
_, _ = <-done
}
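For completeness, the nestedgoroutines helper is not shown in the listing above; based on how it is called and the comment, it is presumably a goroutine-spawning analogue of nestedfunction, roughly like the following reconstruction (my assumption, not the original code):
// Assumed implementation: each call spawns one new goroutine until n is
// exhausted, then closes done so the benchmark can observe completion.
func nestedgoroutines(n int, done chan<- struct{}) {
    n--
    if n > 0 {
        go nestedgoroutines(n, done)
    } else {
        close(done)
    }
}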
The benchmarks are invoked as follows:
$ go test -bench=. -benchmem -benchtime=200ms
goos: windows
goarch: amd64
pkg: golangbenchmarks
cpu: AMD Ryzen 9 3900X 12-Core Processor
BenchmarkTestLoop-24 1000000000 0.2247 ns/op 0 B/op 0 allocs/op
BenchmarkNopFunc-24 170787386 1.402 ns/op 0 B/op 0 allocs/op
BenchmarkChannelMakeCloseRead-24 3990243 52.72 ns/op 96 B/op 1 allocs/op
BenchmarkUnnestedFunctions-24 4791862 58.63 ns/op 96 B/op 1 allocs/op
BenchmarkNestedFunctions-24 200000 50.11 ns/op 0 B/op 0 allocs/op
BenchmarkSplitNestedFunctions-24 155160835 1.528 ns/op 0 B/op 0 allocs/op
BenchmarkNestedGoRoutines-24 636734 412.2 ns/op 24 B/op 1 allocs/op
PASS
ok golangbenchmarks 1.700s
The BenchmarkTestLoop, BenchmarkNopFunc and BenchmarkSplitNestedFunctions results seem reasonably consistent with each other and make sense. BenchmarkSplitNestedFunctions does more work per benchmark operation than BenchmarkNopFunc on average, but not by much, because the expensive channel make/close/read is only done about once every 200,000 benchmark operations (roughly 52.7 ns amortised over 200,000 calls, i.e. well under 0.001 ns/op).
Similarly, the BenchmarkChannelMakeCloseRead and BenchmarkUnnestedFunctions results seem consistent with each other, since each BenchmarkUnnestedFunctions iteration does only slightly more than each BenchmarkChannelMakeCloseRead iteration: a decrement and an if test, which could potentially cause a branch prediction failure (although I would have hoped the branch predictor could use the last branch result, I don't know how complex the close implementation is, which may be overwhelming the branch history).
However, BenchmarkNestedFunctions and BenchmarkSplitNestedFunctions are radically different and I don't understand why. They should be similar, with the only intentional difference being reuse of any grown stack, and I did not expect the stack growth cost to be nearly so high. (Or is that the explanation, and it is just coincidence that the result is so similar to the BenchmarkChannelMakeCloseRead result, making me think it is not actually doing what I thought it was?)
It should also be noted that the BenchmarkSplitNestedFunctions result can occasionally take significantly different values; I have seen values in the range of 10 to 200 ns/op when running it repeatedly. It can also fail to report any ns/op time at all while still passing; I have no idea what is going on there:
BenchmarkChannelMakeCloseRead-24 5724488 54.26 ns/op 96 B/op 1 allocs/op
BenchmarkUnnestedFunctions-24 3992061 57.49 ns/op 96 B/op 1 allocs/op
BenchmarkNestedFunctions-24 200000 0 B/op 0 allocs/op
BenchmarkNestedFunctions2-24 154956972 1.590 ns/op 0 B/op 0 allocs/op
BenchmarkNestedGoRoutines-24 1000000 342.1 ns/op 24 B/op 1 allocs/op
If anyone can point out my mistake in the benchmark or my interpretation of the results, and explain what is really happening, that would be greatly appreciated.
Background info:
Stack growth and function inlining: https://dave.cheney.net/2020/04/25/inlining-optimisations-in-go
Stack growth limitations: https://dave.cheney.net/2013/06/02/why-is-a-goroutines-stack-infinite
Golang stack structure: https://blog.cloudflare.com/how-stacks-are-handled-in-go/
Branch prediction: https://en.wikipedia.org/wiki/Branch_predictor
Top level 3900X architecture overview: https://www.techpowerup.com/review/amd-ryzen-9-3900x/3.html
3900X branch prediction history/buffer size 16/512/7k: https://www.techpowerup.com/review/amd-ryzen-9-3900x/images/arch3.jpg

Related

Measure heap growth accurately

I am trying to measure the evolution of the number of heap-allocated objects before and after I call a function. I am forcing runtime.GC() and using runtime.ReadMemStats to measure the number of heap objects I have before and after.
The problem I have is that I sometimes see unexpected heap growth, and it is different after each run.
A simple example is below, where I would always expect to see zero heap-object growth.
https://go.dev/play/p/FBWfXQHClaG
var mem1_before, mem2_before, mem1_after, mem2_after runtime.MemStats
func measure_nothing(before, after *runtime.MemStats) {
runtime.GC()
runtime.ReadMemStats(before)
runtime.GC()
runtime.ReadMemStats(after)
}
func main() {
measure_nothing(&mem1_before, &mem1_after)
measure_nothing(&mem2_before, &mem2_after)
log.Printf("HeapObjects diff = %d", int64(mem1_after.HeapObjects-mem1_before.HeapObjects))
log.Printf("HeapAlloc diff %d", int64(mem1_after.HeapAlloc-mem1_before.HeapAlloc))
log.Printf("HeapObjects diff = %d", int64(mem2_after.HeapObjects-mem2_before.HeapObjects))
log.Printf("HeapAlloc diff %d", int64(mem2_after.HeapAlloc-mem2_before.HeapAlloc))
}
Sample output:
2009/11/10 23:00:00 HeapObjects diff = 0
2009/11/10 23:00:00 HeapAlloc diff 0
2009/11/10 23:00:00 HeapObjects diff = 4
2009/11/10 23:00:00 HeapAlloc diff 1864
Is what I'm trying to do impractical? I assume the runtime is doing things that allocate/free heap memory. Can I tell it to stop while I take my measurements? (This is for a test checking for memory leaks, not production code.)
You can't predict what garbage collection and reading all the memory stats require in the background, so calling them to calculate memory allocations and usage is not a reliable approach.
Luckily for us, Go's testing framework can monitor and calculate memory usage.
So what you should do is write a benchmark function and let the testing framework do its job to report memory allocations and usage.
Let's assume we want to measure this foo() function:
var x []int64
func foo(allocs, size int) {
for i := 0; i < allocs; i++ {
x = make([]int64, size)
}
}
All it does is allocate a slice of the given size, and it does this the given number of times (allocs).
Let's write benchmarking functions for different scenarios:
func BenchmarkFoo_0_0(b *testing.B) {
for i := 0; i < b.N; i++ {
foo(0, 0)
}
}
func BenchmarkFoo_1_1(b *testing.B) {
for i := 0; i < b.N; i++ {
foo(1, 1)
}
}
func BenchmarkFoo_2_2(b *testing.B) {
for i := 0; i < b.N; i++ {
foo(2, 2)
}
}
Running the benchmark with go test -bench . -benchmem, the output is:
BenchmarkFoo_0_0-8 1000000000 0.3204 ns/op 0 B/op 0 allocs/op
BenchmarkFoo_1_1-8 67101626 16.58 ns/op 8 B/op 1 allocs/op
BenchmarkFoo_2_2-8 27375050 42.42 ns/op 32 B/op 2 allocs/op
As you can see, the number of allocations per function call is the same as what we pass in the allocs argument, and the allocated memory is the expected allocs * size * 8 bytes (for example, foo(2, 2) gives 2 * 2 * 8 = 32 B/op).
Note that the reported allocations per op is an integer value (it's the result of an integer division), so if the benchmarked function only occasionally allocates, it might not be reported in the integer result. For details, see Output from benchmem.
Like in this example:
var x []int64
func bar() {
if rand.Float64() < 0.3 {
x = make([]int64, 10)
}
}
This bar() function does 1 allocation with 30% probability (and none with 70% probability), which means on average it does 0.3 allocations. Benchmarking it:
func BenchmarkBar(b *testing.B) {
for i := 0; i < b.N; i++ {
bar()
}
}
Output is:
BenchmarkBar-8 38514928 29.60 ns/op 24 B/op 0 allocs/op
We can see 24 bytes allocated per op (0.3 * 10 * 8 bytes), which is correct, but the reported allocations per op is 0.
Luckily for us, we can also benchmark a function from our main app using the testing.Benchmark() function. It returns a testing.BenchmarkResult including all details about memory usage. We have access to the total number of allocations and to the number of iterations, so we can calculate allocations per op using floating point numbers:
func main() {
rand.Seed(time.Now().UnixNano())
tr := testing.Benchmark(BenchmarkBar)
fmt.Println("Allocs/op", tr.AllocsPerOp())
fmt.Println("B/op", tr.AllocedBytesPerOp())
fmt.Println("Precise allocs/op:", float64(tr.MemAllocs)/float64(tr.N))
}
This will output:
Allocs/op 0
B/op 24
Precise allocs/op: 0.3000516369276302
We can see the expected ~0.3 allocations per op.
Now if we go ahead and benchmark your measure_nothing() function:
func BenchmarkNothing(b *testing.B) {
for i := 0; i < b.N; i++ {
measure_nothing(&mem1_before, &mem1_after)
}
}
We get this output:
Allocs/op 0
B/op 11
Precise allocs/op: 0.12182030338389732
As you can see, running the garbage collector twice and reading memory stats twice occasionally needs allocation (~1 out of 10 calls: 0.12 times on average).
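As a side note, if the goal is a leak check in a regular test rather than a benchmark, the standard library also offers testing.AllocsPerRun, which averages the heap allocations of a function over a number of runs. A minimal sketch (the checked function and the threshold are made up for illustration):
package main

import "testing"

var sink []int64

// leakyCandidate is a stand-in for whatever function you want to check;
// it performs exactly one heap allocation per call.
func leakyCandidate() {
    sink = make([]int64, 10)
}

func TestAllocations(t *testing.T) {
    // AllocsPerRun calls the function 1000 times and returns the average
    // number of heap allocations per call.
    avg := testing.AllocsPerRun(1000, leakyCandidate)
    if avg > 1 {
        t.Errorf("expected at most 1 allocation per call, got %v", avg)
    }
}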

Golang benchmark: why does allocs/op show 0 B/op?

Here is a code snippet for a benchmark:
// bench_test.go
package main
import (
"testing"
)
func BenchmarkHello(b *testing.B) {
for i := 0; i < b.N; i++ {
a := 1
a++
}
}
The allocs/op and B/op metrics show zero. Variable a is an int type and doesn't take much memory, but it should not take zero bytes.
> go test -bench=. -benchmem
goos: darwin
goarch: amd64
pkg: a
BenchmarkHello-4 2000000000 0.26 ns/op 0 B/op 0 allocs/op
PASS
ok a 0.553s
Why is the allocs/op metric zero?
The allocs/op average only counts heap allocations, not stack allocations.
The allocs/op average is rounded down to the nearest integer value.
The Go gc compiler is an optimizing compiler. Since
{
a := 1
a++
}
doesn't accomplish anything, it is elided.
The benchmark tool only reports heap allocations. Stack allocations via escape analysis are less costly, possibly free, so are not reported.
Reference
Why is this simple benchmark showing zero allocations?
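For contrast, here is a sketch of a benchmark where the value does escape to the heap and therefore shows up in the report (the package-level sink is just a device, assumed here, to force the escape and defeat dead-code elimination):
package main

import "testing"

var sink *int

func BenchmarkHeapAlloc(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        a := new(int) // escapes to the heap because it is stored in a package-level variable
        *a = 1
        sink = a
    }
}
Running this with go test -bench . -benchmem should report 1 allocs/op and 8 B/op, unlike the original loop.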

Benchmark with Goroutines

I'm pretty new to Golang and bumped into a problem when benchmarking with goroutines.
The code I have is here:
type store struct{}
func (n *store) WriteSpan(span interface{}) error {
return nil
}
func smallTest(times int, b *testing.B) {
writer := store{}
var wg sync.WaitGroup
numGoroutines := times
wg.Add(numGoroutines)
b.ResetTimer()
b.ReportAllocs()
for n := 0; n < numGoroutines; n++ {
go func() {
writer.WriteSpan(nil)
wg.Done()
}()
}
wg.Wait()
}
func BenchmarkTest1(b *testing.B) {
smallTest(1000000, b)
}
func BenchmarkTest2(b *testing.B) {
smallTest(10000000, b)
}
It looks to me like the runtime and allocations for both scenarios should be similar, but running them gives the following results, which are vastly different. Why does this happen? Where do those extra allocations come from?
BenchmarkTest1-12 1000000000 0.26 ns/op 0 B/op 0 allocs/op
BenchmarkTest2-12 1 2868129398 ns/op 31872 B/op 83 allocs/op
PASS
I also notice that if I add an inner loop that calls WriteSpan multiple times, the runtime and allocations roughly scale with numGoroutines times the number of inner calls. If this is not how people benchmark with goroutines, are there any other standard ways to test? Thanks in advance.
Meaningless microbenchmarks produce meaningless results.
If this is not the way how people benchmark with goroutines, are there
any other standard ways to test?
It's not the way to benchmark anything. Benchmark real problems.
You run a very large number of goroutines, which do nothing, until you saturate the scheduler, the machine, and other resources. That merely proves that if you run anything enough times you can bring a machine to its knees.
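That said, if you do want a rough number for goroutine spawn overhead, the usual shape is to let the framework choose the iteration count and spawn one goroutine per iteration, rather than fixing a huge count yourself. A sketch reusing the store type from the question (illustrative only, with all the caveats above about microbenchmarks):
func BenchmarkSpawn(b *testing.B) {
    writer := store{}
    var wg sync.WaitGroup
    b.ReportAllocs()
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        // One goroutine per benchmark iteration; the framework scales b.N.
        wg.Add(1)
        go func() {
            writer.WriteSpan(nil)
            wg.Done()
        }()
    }
    wg.Wait()
}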

go maps non-performant for large number of keys

I discovered some very strange behaviour with Go maps recently. The use case is to create a group of integers and have an O(1) check for IsMember(id int).
The current implementation is:
func convertToMap(v []int64) map[int64]void {
out := make(map[int64]void, len(v))
for _, i := range v {
out[i] = void{}
}
return out
}
type Group struct {
members map[int64]void
}
type void struct{}
func (g *Group) IsMember(input string) (ok bool) {
memberID, _ := strconv.ParseInt(input, 10, 64)
_, ok = g.members[memberID]
return
}
When I benchmark the IsMember method, everything looks fine up to 6 million members. But above that, the map lookup appears to take about 1 second per lookup!
The benchmark test:
func BenchmarkIsMember(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
g := &Group{}
g.members = convertToMap(benchmarkV)
for N := 0; N < b.N && N < sizeOfGroup; N++ {
g.IsMember(benchmarkKVString[N])
}
}
var benchmarkV, benchmarkKVString = func(size int) ([]int64, []string) {
v := make([]int64, size)
s := make([]string, size)
for i := range v {
val := rand.Int63()
v[i] = val
s[i] = strconv.FormatInt(val, 10)
}
return v, s
}(sizeOfGroup)
Benchmark numbers:
const sizeOfGroup = 6000000
BenchmarkIsMember-8 2000000 568 ns/op 50 B/op 0 allocs/op
const sizeOfGroup = 6830000
BenchmarkIsMember-8 1 1051725455 ns/op 178767208 B/op 25 allocs/op
Anything above group size of 6.8 million gives the same result.
Can someone help explain why this is happening, and can anything be done to make this performant while still using maps?
Also, I don't understand why so much memory is being allocated. Even if the time taken is due to collisions and then linked-list traversal, there shouldn't be any memory allocation; is my thought process wrong?
There is no need to measure the extra allocation from converting the slice to a map, because we only want to measure the lookup operation; in the modified benchmark below the map is therefore built before b.ResetTimer() is called.
I've slightly modified the benchmark:
func BenchmarkIsMember(b *testing.B) {
fn := func(size int) ([]int64, []string) {
v := make([]int64, size)
s := make([]string, size)
for i := range v {
val := rand.Int63()
v[i] = val
s[i] = strconv.FormatInt(val, 10)
}
return v, s
}
for _, size := range []int{
6000000,
6800000,
6830000,
60000000,
} {
b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
var benchmarkV, benchmarkKVString = fn(size)
g := &Group{}
g.members = convertToMap(benchmarkV)
b.ReportAllocs()
b.ResetTimer()
for N := 0; N < b.N && N < size; N++ {
g.IsMember(benchmarkKVString[N])
}
})
}
}
And got the following results:
go test ./... -bench=. -benchtime=10s -cpu=1
goos: linux
goarch: amd64
pkg: trash
BenchmarkIsMember/size=6000000 2000000000 0.55 ns/op 0 B/op 0 allocs/op
BenchmarkIsMember/size=6800000 1000000000 1.27 ns/op 0 B/op 0 allocs/op
BenchmarkIsMember/size=6830000 1000000000 1.23 ns/op 0 B/op 0 allocs/op
BenchmarkIsMember/size=60000000 100000000 136 ns/op 0 B/op 0 allocs/op
PASS
ok trash 167.578s
The degradation isn't as significant as in your example.

Performance of function slice parameter vs global variable?

I've got the following function:
func checkFiles(path string, excludedPatterns []string) {
// ...
}
I'm wondering, since excludedPatterns never changes, should I optimize it by making the var global (and not passing it to the function every time), or does Golang already handle this by passing them as copy-on-write?
Edit: I guess I could pass the slice as a pointer, but I'm still wondering about the copy-on-write behavior (if it exists) and whether, in general, I should worry about passing by value or by pointer.
Judging from the name of your function, performance can't be so critical that you should even consider moving parameters to global variables just to save the time/space required to pass them as parameters (I/O operations like checking files are much, much slower than calling functions and passing values to them).
Slices in Go are just small descriptors, something like a struct with a pointer to a backing array and 2 ints, a length and a capacity. No matter how big the backing array is, passing a slice is always efficient, and you shouldn't even consider passing a pointer to it, unless you want to modify the slice header of course.
Parameters in Go are always passed by value, and a copy of the value being passed is made. If you pass a pointer, then the pointer value will be copied and passed. When a slice is passed, the slice value (which is a small descriptor) will be copied and passed - which will point to the same backing array (which will not be copied).
Also, if you need to access the slice multiple times in the function, a parameter is usually an extra gain, since the compiler can apply further optimization and caching, whereas with a global variable more care has to be taken.
More about slices and their internals: Go Slices: usage and internals
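To make the descriptor point concrete, here is a tiny illustration (my own example, not from the answer above): the function receives a copy of the slice header, but both headers point at the same backing array, so element writes are visible to the caller.
package main

import "fmt"

// setFirst gets a copy of the slice header (pointer, length, capacity),
// but that copy still points at the caller's backing array.
func setFirst(s []int) {
    s[0] = 99
}

func main() {
    nums := []int{1, 2, 3}
    setFirst(nums)
    fmt.Println(nums) // [99 2 3]: the element write is visible to the caller
}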
And if you want exact numbers on performance, benchmark!
Here is a little benchmarking code which shows no difference between the two solutions (passing the slice as an argument or accessing a global slice). Save it into a file like slices_test.go and run it with go test -bench .
package main
import (
"testing"
)
var gslice = make([]string, 1000)
func global(s string) {
for i := 0; i < 100; i++ { // Cycle to access slice many times
_ = s
_ = gslice // Access global-slice
}
}
func param(s string, ss []string) {
for i := 0; i < 100; i++ { // Cycle to access slice many times
_ = s
_ = ss // Access parameter-slice
}
}
func BenchmarkParameter(b *testing.B) {
for i := 0; i < b.N; i++ {
param("hi", gslice)
}
}
func BenchmarkGlobal(b *testing.B) {
for i := 0; i < b.N; i++ {
global("hi")
}
}
Example output:
testing: warning: no tests to run
PASS
BenchmarkParameter-2 30000000 55.4 ns/op
BenchmarkGlobal-2 30000000 55.1 ns/op
ok _/V_/workspace/IczaGo/src/play 3.569s
Piggybacking on icza's excellent answer, there is another way to pass a slice as a parameter: a pointer to the slice.
When you need to modify the slice header itself (for example by appending, so that the caller sees the new length), assigning to the global slice works, but passing the slice as a parameter does not: the function is effectively working with a copy of the header. To mitigate that, one can pass a pointer to the slice.
Interestingly enough, it's actually faster than accessing a global variable:
package main
import (
"testing"
)
var gslice = make([]string, 1000000)
func global(s string) {
for i := 0; i < 100; i++ { // Cycle to access slice many times
_ = s
_ = gslice // Access global-slice
}
}
func param(s string, ss []string) {
for i := 0; i < 100; i++ { // Cycle to access slice many times
_ = s
_ = ss // Access parameter-slice
}
}
func paramPointer(s string, ss *[]string) {
for i := 0; i < 100; i++ { // Cycle to access slice many times
_ = s
_ = ss // Access the pointer-to-slice parameter
}
}
func BenchmarkParameter(b *testing.B) {
for i := 0; i < b.N; i++ {
param("hi", gslice)
}
}
func BenchmarkParameterPointer(b *testing.B) {
for i := 0; i < b.N; i++ {
paramPointer("hi", &gslice)
}
}
func BenchmarkGlobal(b *testing.B) {
for i := 0; i < b.N; i++ {
global("hi")
}
}
Results:
goos: darwin
goarch: amd64
pkg: untitled
BenchmarkParameter-8 24437403 48.2 ns/op
BenchmarkParameterPointer-8 27483115 40.3 ns/op
BenchmarkGlobal-8 25631470 46.0 ns/op
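To illustrate the "modify the slice header" case mentioned above (again an assumed example, not part of the benchmark): appending through a *[]string is visible to the caller, while appending to a plain slice parameter only grows a local copy of the header.
package main

import "fmt"

func appendByValue(ss []string) {
    ss = append(ss, "x") // grows a local copy of the header only
    _ = ss
}

func appendByPointer(ss *[]string) {
    *ss = append(*ss, "x") // updates the caller's slice header
}

func main() {
    a := []string{"a"}
    appendByValue(a)
    fmt.Println(len(a)) // 1: the caller's slice is unchanged

    appendByPointer(&a)
    fmt.Println(len(a)) // 2: the caller sees the appended element
}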
I rewrote the benchmark so you can compare the results.
As you can see, the ParameterPointer benchmark starts to get ahead after 10 records, which is very interesting.
package slices_bench
import (
"testing"
)
var gslice = make([]string, 1000)
func global(s string) {
for i := 0; i < 100; i++ { // Cycle to access slice many times
_ = s
_ = gslice // Access global-slice
}
}
func param(s string, ss []string) {
for i := 0; i < 100; i++ { // Cycle to access slice many times
_ = s
_ = ss // Access parameter-slice
}
}
func paramPointer(s string, ss *[]string) {
for i := 0; i < 100; i++ { // Cycle to access slice many times
_ = s
_ = ss // Access the pointer-to-slice parameter
}
}
func BenchmarkPerformance(b *testing.B){
fixture := []struct {
desc string
records int
}{
{
desc: "1 record",
records: 1,
},
{
desc: "10 records",
records: 10,
},
{
desc: "100 records",
records: 100,
},
{
desc: "1000 records",
records: 1000,
},
{
desc: "10000 records",
records: 10000,
},
{
desc: "100000 records",
records: 100000,
},
}
tests := []struct {
desc string
fn func(b *testing.B, n int)
}{
{
desc: "ParameterPointer",
fn: func(b *testing.B, n int) {
for j := 0; j < n; j++ {
paramPointer("hi", &gslice)
}
},
},
{
desc: "Parameter",
fn: func(b *testing.B, n int) {
for j := 0; j < n; j++ {
param("hi", gslice)
}
},
},
{
desc: "Global",
fn: func(b *testing.B, n int) {
for j := 0; j < n; j++ {
global("hi")
}
},
},
}
for _, t := range tests {
b.Run(t.desc, func(b *testing.B) {
for _, f := range fixture {
b.Run(f.desc, func(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
t.fn(b, f.records)
}
})
}
})
}
}
Results:
goos: windows
goarch: amd64
pkg: benchs/slices-bench
cpu: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
BenchmarkPerformance
BenchmarkPerformance/ParameterPointer
BenchmarkPerformance/ParameterPointer/1_record
BenchmarkPerformance/ParameterPointer/1_record-16 38661910 31.18 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/10_records
BenchmarkPerformance/ParameterPointer/10_records-16 4160023 288.4 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/100_records
BenchmarkPerformance/ParameterPointer/100_records-16 445131 2748 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/1000_records
BenchmarkPerformance/ParameterPointer/1000_records-16 43876 27380 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/10000_records
BenchmarkPerformance/ParameterPointer/10000_records-16 4441 273922 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/100000_records
BenchmarkPerformance/ParameterPointer/100000_records-16 439 2739282 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter
BenchmarkPerformance/Parameter/1_record
BenchmarkPerformance/Parameter/1_record-16 39860619 30.79 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/10_records
BenchmarkPerformance/Parameter/10_records-16 4152728 288.6 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/100_records
BenchmarkPerformance/Parameter/100_records-16 445634 2757 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/1000_records
BenchmarkPerformance/Parameter/1000_records-16 43618 27496 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/10000_records
BenchmarkPerformance/Parameter/10000_records-16 4450 273960 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/100000_records
BenchmarkPerformance/Parameter/100000_records-16 435 2739053 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global
BenchmarkPerformance/Global/1_record
BenchmarkPerformance/Global/1_record-16 38813095 30.97 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/10_records
BenchmarkPerformance/Global/10_records-16 4148433 288.4 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/100_records
BenchmarkPerformance/Global/100_records-16 429274 2758 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/1000_records
BenchmarkPerformance/Global/1000_records-16 43591 27412 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/10000_records
BenchmarkPerformance/Global/10000_records-16 4521 274420 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/100000_records
BenchmarkPerformance/Global/100000_records-16 436 2751042 ns/op 0 B/op 0 allocs/op
