What's the difference between make and {} initialization in Go?

We can create a channel with the make function and a new struct value with a {} composite literal.
ch := make(chan interface{})
o := struct{}{}
But what is the difference between make and {} when creating a map?
m0 := make(map[int]int)
m1 := map[int]int{}

make can be used to initialize a map with preallocated space. It takes an optional second parameter.
m0 := make(map[int]int, 1000) // preallocates space for about 1000 entries
Allocation takes CPU time. If you know how many entries the map will hold, you can preallocate space for all of them, which avoids repeated growth and reduces execution time. Here is a program you can run to verify this.
package main
import "fmt"
import "testing"
func BenchmarkWithMake(b *testing.B) {
m0 := make(map[int]int, b.N)
for i := 0; i < b.N; i++ {
m0[i] = 1000
}
}
func BenchmarkWithLitteral(b *testing.B) {
m1 := map[int]int{}
for i := 0; i < b.N; i++ {
m1[i] = 1000
}
}
func main() {
bwm := testing.Benchmark(BenchmarkWithMake)
fmt.Println(bwm) // gives 176 ns/op
bwl := testing.Benchmark(BenchmarkWithLitteral)
fmt.Println(bwl) // gives 259 ns/op
}

From the documentation for the built-in make function:
Map: An initial allocation is made according to the size but the
resulting map has length 0. The size may be omitted, in which case a
small starting size is allocated.
So, for maps, there is no difference between calling make without a size argument and using an empty map literal.
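To make the comparison concrete, here is a small sketch that inspects the different ways of obtaining a map. The describe helper is a hypothetical name introduced only for this illustration. It also includes a plain var declaration, which is the one form that behaves differently, because it yields a nil map.

```go
package main

import "fmt"

// describe reports the length of m and whether it is nil.
// It is a hypothetical helper used only for this illustration.
func describe(m map[int]int) (length int, isNil bool) {
	return len(m), m == nil
}

func main() {
	var declared map[int]int // nil map: reading is fine, writing panics

	cases := []struct {
		name string
		m    map[int]int
	}{
		{"var declaration", declared},
		{"empty literal", map[int]int{}},
		{"make, no size", make(map[int]int)},
		{"make, size 1000", make(map[int]int, 1000)},
	}
	for _, c := range cases {
		n, isNil := describe(c.m)
		fmt.Printf("%-16s len=%d nil=%t\n", c.name, n, isNil)
	}
}
```

The literal and both make forms all report length 0 and are non-nil, so they can be written to immediately. The size hint changes only how much space is reserved up front.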

Related

If I know the max size of many tmp slices, should I set capacity when creating them?

I need to use temporary slices in a function that will be called many times. Their capacity will never exceed 10, but their lengths vary. For example, perhaps 80% of them have length 1, 10% have length 3, and 10% have length 10.
I can think of an example function like the following:
func getDataFromDb(s []string) []string {
tmpSlice := make([]string, 0, 10)
for _, v := range s {
if check(v) {
tmpSlice = append(tmpSlice, v)
}
}
......
return searchDb(tmpSlice)
}
So should I do var tmpSlice []string, tmpSlice := make([]string, 0, 0), tmpSlice := make([]string, 0, 5), or tmpSlice := make([]string, 0, 10)? or any other suggestions?
The fastest code is code that does not allocate on the heap.
Create variables that are allocated on the stack and do not escape (pass variables by value; otherwise they may escape).
You can check whether a variable escapes by building with -gcflags "-m -l".
Here is an example showing that replacing the slice with an array and passing it by value results in fast code with no heap allocation.
package main
import "testing"
func BenchmarkAllocation(b *testing.B) {
b.Run("Slice", func(b2 *testing.B) {
for i := 0; i < b2.N; i++ {
_ = getDataFromDbSlice([]string{"one", "two"})
}
})
b.Run("Array", func(b2 *testing.B) {
for i := 0; i < b2.N; i++ {
_ = getDataFromDbArray([]string{"one", "two"})
}
})
}
type DbQuery [10]string
type DbQueryResult [10]string
func getDataFromDbArray(s []string) DbQueryResult {
q := DbQuery{}
return processQueryArray(q)
}
func processQueryArray(q DbQuery) DbQueryResult {
return (DbQueryResult)(q)
}
func getDataFromDbSlice(s []string) []string {
tmpArray := make([]string, 0, 10)
return processQuerySlice(tmpArray)
}
func processQuerySlice(q []string) []string {
return q
}
Running the benchmark with -benchmem gives these results:
BenchmarkAllocation/Slice-6 30000000 51.8 ns/op 160 B/op 1 allocs/op
BenchmarkAllocation/Array-6 100000000 15.7 ns/op 0 B/op 0 allocs/op
This answer assumes that searchDb does not retain a reference to the slice passed to it. Given the variable and function names, it seems unlikely that it does.
These options have the same memory and performance characteristics:
var tmpSlice []string
tmpSlice := []string{}
tmpSlice := make([]string, 0)
tmpSlice := make([]string, 0, 0)
None of them allocate memory until the first append operation. If these are your only options, then pick one of the first two because they are easier to read.
This option will have the best performance:
tmpSlice := make([]string, 0, 10)
This ensures that the backing array for the slice is allocated once. There will be no reallocations of the backing array as values are appended.
If searchDb's argument does not escape, the backing array can be allocated on the stack, which is the best possible performance. You can find out whether the argument escapes by building with the -gcflags "-m -l" option.
Given that getDataFromDb invokes a database operation, any performance difference between the options will be in the noise. It is more important to write clear and simple code than to optimize this.
I would probably go with the var tmpSlice []string over tmpSlice := make([]string, 0, 10) because there's no need to understand where the value 10 came from with the former.
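The reallocation behavior described in this answer can be observed directly. The following is a minimal sketch, where countGrowths is a hypothetical helper that counts how often append changes the capacity of the slice. The exact growth steps for the nil slice depend on the Go version, so the sketch only compares the two counts.

```go
package main

import "fmt"

// countGrowths appends n strings to s and counts how often the capacity
// changes, i.e. how often append had to move to a larger backing array.
// It is a hypothetical helper used only for this illustration.
func countGrowths(s []string, n int) int {
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, "x")
		if cap(s) != before {
			growths++
		}
	}
	return growths
}

func main() {
	var nilSlice []string
	fmt.Println("var tmpSlice []string:", countGrowths(nilSlice, 10))
	fmt.Println("make([]string, 0, 10):", countGrowths(make([]string, 0, 10), 10))
}
```

The preallocated slice never grows while it stays within its capacity of 10, whereas the nil slice grows several times on the way to 10 elements.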
I would do
var tmpSlice []string
This gives you a nil string slice that you can append to as needed.
I wouldn't preallocate memory for it unless the slice gets big and you know its size beforehand.

Memory efficient implementation of go map?

My use case is to transfer a group of members (integers) over the network. We use delta encoding, and on the receiving end we decode the list and store it as a map,
map[string]struct{}
for O(1) complexity for membership check.
The problem I am facing is that the actual size of members is only 15MB for 2 Million integers, but the size of the map in heap is 100+MB. Seems like the actual map implementation of Go is not suitable for large maps.
Since it is a client side SDK, I do not want to impact the usable memory much, and there can be multiple such groups that need to be kept in memory for long periods of time--around 1 week.
Is there a better alternative DS in Go for this?
type void struct{}
func ToMap(v []int64) map[string]void {
out := map[string]void{}
for _, i := range v {
out[strconv.Itoa(int(i))] = void{}
}
return out
}
This is a more memory efficient form of the map:
type void struct{}
func ToMap(v []int64) map[int64]void {
m := make(map[int64]void, len(v))
for _, i := range v {
m[i] = void{}
}
return m
}
Go maps are optimized for integer keys. Optimize the map allocation by giving the exact map size as a hint.
A string contains an implicit pointer, so the garbage collector (GC) has to follow that pointer every time it scans the map.
Here is a Go benchmark for 2 million pseudorandom integers:
package main
import (
"math/rand"
"strconv"
"testing"
)
type void struct{}
func ToMap1(v []int64) map[string]void {
out := map[string]void{}
for _, i := range v {
out[strconv.Itoa(int(i))] = void{}
}
return out
}
func ToMap2(v []int64) map[int64]void {
m := make(map[int64]void, len(v))
for _, i := range v {
m[i] = void{}
}
return m
}
var benchmarkV = func() []int64 {
v := make([]int64, 2000000)
for i := range v {
v[i] = rand.Int63()
}
return v
}()
func BenchmarkToMap1(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
for N := 0; N < b.N; N++ {
ToMap1(benchmarkV)
}
}
func BenchmarkToMap2(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
for N := 0; N < b.N; N++ {
ToMap2(benchmarkV)
}
}
Output:
$ go test tomap_test.go -bench=.
BenchmarkToMap1-4 2 973358894 ns/op 235475280 B/op 2076779 allocs/op
BenchmarkToMap2-4 10 188489170 ns/op 44852584 B/op 23 allocs/op
$
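For completeness, here is a minimal sketch of the membership check against the int64-keyed form, so that no string conversion is needed on lookup either. The names toSet and contains are hypothetical and introduced only for this example.

```go
package main

import "fmt"

// toSet builds a set keyed directly by int64, sized up front with len(v).
// The name is hypothetical; it mirrors the int64-keyed ToMap above.
func toSet(v []int64) map[int64]struct{} {
	set := make(map[int64]struct{}, len(v))
	for _, id := range v {
		set[id] = struct{}{}
	}
	return set
}

// contains reports whether id is a member of set.
func contains(set map[int64]struct{}, id int64) bool {
	_, ok := set[id]
	return ok
}

func main() {
	members := toSet([]int64{3, 17, 42})
	fmt.Println(contains(members, 17)) // true
	fmt.Println(contains(members, 18)) // false
}
```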

LoadOrStore in a sync.Map without creating a new structure each time

Is it possible to LoadOrStore into a Go sync.Map without creating a new structure every time? If not, what alternatives are available?
The use case here is if I'm using the sync.Map as a cache where cache misses are rare (but possible) and on a cache miss I want to add to the map, I need to initialize a structure every single time LoadOrStore is called rather than just creating the struct when needed. I'm worried this will hurt the GC, initializing hundreds of thousands of structures that will not be needed.
In Java this can be done using computeIfAbsent.
You can try:
var m sync.Map
s, ok := m.Load("key")
if !ok {
s, _ = m.LoadOrStore("key", "value")
}
fmt.Println(s)
This is my solution: use sync.Map and sync.Once.
type syncData struct {
data interface{}
once *sync.Once
}
func LoadOrStore(m *sync.Map, key string, f func() (interface{}, error)) (interface{}, error) {
temp, _ := m.LoadOrStore(key, &syncData{
data: nil,
once: &sync.Once{},
})
d := temp.(*syncData)
var err error
if d.data == nil {
d.once.Do(func() {
d.data, err = f()
if err != nil {
//if failed, will try again by new sync.Once
d.once = &sync.Once{}
}
})
}
return d.data, err
}
Package sync
import "sync"
type Map
Map is like a Go map[interface{}]interface{} but is safe for
concurrent use by multiple goroutines without additional locking or
coordination. Loads, stores, and deletes run in amortized constant
time.
The Map type is specialized. Most code should use a plain Go map
instead, with separate locking or coordination, for better type safety
and to make it easier to maintain other invariants along with the map
content.
The Map type is optimized for two common use cases: (1) when the entry
for a given key is only ever written once but read many times, as in
caches that only grow, or (2) when multiple goroutines read, write,
and overwrite entries for disjoint sets of keys. In these two cases,
use of a Map may significantly reduce lock contention compared to a Go
map paired with a separate Mutex or RWMutex.
The usual way to solve these problems is to construct a usage model and then benchmark it.
For example, since "cache misses are rare", assume that Load will work most of the time and only call LoadOrStore (with value allocation and initialization) when necessary.
$ go test map_test.go -bench=. -benchmem
BenchmarkHit-4 2 898810447 ns/op 44536 B/op 1198 allocs/op
BenchmarkMiss-4 1 2958103053 ns/op 483957168 B/op 43713042 allocs/op
$
map_test.go:
package main
import (
"strconv"
"sync"
"testing"
)
func BenchmarkHit(b *testing.B) {
for N := 0; N < b.N; N++ {
var m sync.Map
for i := 0; i < 64*1024; i++ {
for k := 0; k < 256; k++ {
// Assume cache hit
v, ok := m.Load(k)
if !ok {
// allocate and initialize value
v = strconv.Itoa(k)
a, loaded := m.LoadOrStore(k, v)
if loaded {
v = a
}
}
_ = v
}
}
}
}
func BenchmarkMiss(b *testing.B) {
for N := 0; N < b.N; N++ {
var m sync.Map
for i := 0; i < 64*1024; i++ {
for k := 0; k < 256; k++ {
// Assume cache miss
// allocate and initialize value
var v interface{} = strconv.Itoa(k)
a, loaded := m.LoadOrStore(k, v)
if loaded {
v = a
}
_ = v
}
}
}
}

How to iterate int range concurrently

For purely educational purposes I created a base58 package. It will encode/decode a uint64 using the bitcoin base58 symbol chart, for example:
b58 := Encode(100) // return 2j
num := Decode("2j") // return 100
While creating the first tests I came up with this:
func TestEncode(t *testing.T) {
var i uint64
for i = 0; i <= (1<<64 - 1); i++ {
b58 := Encode(i)
num := Decode(b58)
if num != i {
t.Fatalf("Expecting %d for %s", i, b58)
}
}
}
This "naive" implementation tries to convert the whole uint64 range (from 0 to 18,446,744,073,709,551,615) to base58 and back to uint64, but it takes too much time.
To better understand how Go handles concurrency, I would like to know how to use channels or goroutines to iterate across the full uint64 range in the most efficient way.
Could the data be processed in chunks and in parallel? If so, how?
Thanks in advance.
UPDATE:
As mentioned in the answer by @Adrien, one way is to use t.Parallel(), but that applies only when testing the package. In any case, after implementing it I found that it is noticeably slower: it runs in parallel, but there is no speed gain.
I understand that covering the full uint64 range may take years, but what I want to find out is how a channel or goroutine could help speed up the process (testing with a small range such as 1<<16), probably by using something like https://play.golang.org/p/9U22NfrXeq, just as an example.
The question is not about how to test the package; it is about what algorithm or technique could be used to iterate faster by using concurrency.
This functionality is built into the Go testing package, in the form of T.Parallel:
func TestEncode(t *testing.T) {
var i uint64
for i = 0; i <= (1<<64 - 1); i++ {
t.Run(fmt.Sprintf("%d",i), func(t *testing.T) {
j := i // Copy to local var - important
t.Parallel() // Mark test as parallelizable
b58 := Encode(j)
num := Decode(b58)
if num != j {
t.Fatalf("Expecting %d for %s", j, b58)
}
})
}
}
I came up with this solution:
package main
import (
"fmt"
"time"
"github.com/nbari/base58"
)
func encode(i uint64) {
x := base58.Encode(i)
fmt.Printf("%d = %s\n", i, x)
time.Sleep(time.Second)
}
func main() {
concurrency := 4
sem := make(chan struct{}, concurrency)
for i, val := uint64(0), uint64(1<<16); i <= val; i++ {
sem <- struct{}{}
go func(i uint64) {
defer func() { <-sem }()
encode(i)
}(i)
}
for i := 0; i < cap(sem); i++ {
sem <- struct{}{}
}
}
Basically, this runs up to 4 workers at a time, each calling the encode function. To make the behavior easier to observe, a sleep is added so that the data is printed in chunks of 4.
Also, this answer helped me to better understand concurrency: https://stackoverflow.com/a/18405460/1135424
If there is a better way, please let me know.
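One way to process the range in chunks, as asked in the question, is to give each goroutine a contiguous sub-range and wait for all of them with a sync.WaitGroup. The sketch below is only an illustration: encode and decode are hypothetical stand-ins built on strconv (base 36) rather than the real base58 package, and countMismatches is an assumed helper name.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// encode and decode are hypothetical stand-ins for the base58 package,
// using base 36 from strconv so that the sketch is self-contained.
func encode(n uint64) string { return strconv.FormatUint(n, 36) }

func decode(s string) uint64 {
	n, _ := strconv.ParseUint(s, 36, 64)
	return n
}

// countMismatches splits [0, limit) into contiguous chunks, checks each
// chunk in its own goroutine, and returns how many values failed the
// encode/decode round trip.
func countMismatches(limit uint64, workers int) uint64 {
	if workers < 1 {
		workers = 1
	}
	chunk := (limit + uint64(workers) - 1) / uint64(workers) // ceiling division
	bad := make([]uint64, workers)                           // one slot per worker, no locking needed
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := uint64(w) * chunk
		end := start + chunk
		if end > limit {
			end = limit
		}
		if start >= end {
			continue // nothing left for this worker
		}
		wg.Add(1)
		go func(w int, start, end uint64) {
			defer wg.Done()
			for i := start; i < end; i++ {
				if decode(encode(i)) != i {
					bad[w]++
				}
			}
		}(w, start, end)
	}
	wg.Wait()
	var total uint64
	for _, n := range bad {
		total += n
	}
	return total
}

func main() {
	fmt.Println(countMismatches(1<<16, 4)) // 0
}
```

Because each goroutine handles a large chunk and reports once, there is no per-value channel traffic. The ceiling division is written for small test ranges and would overflow for limits near the top of uint64.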

Generating random numbers concurrently in Go

I'm new to Go and to concurrent/parallel programming in general. In order to try out (and hopefully see the performance benefits of) goroutines, I've put together a small test program that simply generates 100 million random ints - first in a single goroutine, and then in as many goroutines as reported by runtime.NumCPU().
However, I consistently get worse performance using more goroutines than using a single one. I assume I'm missing something vital in either my program's design or the way in which I use goroutines, channels, or other Go features. Any feedback is much appreciated.
I attach the code below.
package main
import "fmt"
import "time"
import "math/rand"
import "runtime"
func main() {
// Figure out how many CPUs are available and tell Go to use all of them
numThreads := runtime.NumCPU()
runtime.GOMAXPROCS(numThreads)
// Number of random ints to generate
var numIntsToGenerate = 100000000
// Number of ints to be generated by each spawned goroutine thread
var numIntsPerThread = numIntsToGenerate / numThreads
// Channel for communicating from goroutines back to main function
ch := make(chan int, numIntsToGenerate)
// Slices to keep resulting ints
singleThreadIntSlice := make([]int, numIntsToGenerate, numIntsToGenerate)
multiThreadIntSlice := make([]int, numIntsToGenerate, numIntsToGenerate)
fmt.Printf("Initiating single-threaded random number generation.\n")
startSingleRun := time.Now()
// Generate all of the ints from a single goroutine, retrieve the expected
// number of ints from the channel and put in target slice
go makeRandomNumbers(numIntsToGenerate, ch)
for i := 0; i < numIntsToGenerate; i++ {
singleThreadIntSlice = append(singleThreadIntSlice,(<-ch))
}
elapsedSingleRun := time.Since(startSingleRun)
fmt.Printf("Single-threaded run took %s\n", elapsedSingleRun)
fmt.Printf("Initiating multi-threaded random number generation.\n")
startMultiRun := time.Now()
// Run the designated number of goroutines, each of which generates its
// expected share of the total random ints, retrieve the expected number
// of ints from the channel and put in target slice
for i := 0; i < numThreads; i++ {
go makeRandomNumbers(numIntsPerThread, ch)
}
for i := 0; i < numIntsToGenerate; i++ {
multiThreadIntSlice = append(multiThreadIntSlice,(<-ch))
}
elapsedMultiRun := time.Since(startMultiRun)
fmt.Printf("Multi-threaded run took %s\n", elapsedMultiRun)
}
func makeRandomNumbers(numInts int, ch chan int) {
source := rand.NewSource(time.Now().UnixNano())
generator := rand.New(source)
for i := 0; i < numInts; i++ {
ch <- generator.Intn(numInts*100)
}
}
First let's correct and optimize some things in your code:
Since Go 1.5, GOMAXPROCS defaults to the number of CPU cores available, so no need to set that (although it does no harm).
Numbers to generate:
var numIntsToGenerate = 100000000
var numIntsPerThread = numIntsToGenerate / numThreads
If numThreads is, say, 3, the multi-goroutine case will generate fewer numbers (due to integer division), so let's correct it:
numIntsToGenerate = numIntsPerThread * numThreads
There is no need for a buffer of 100 million values; reduce it to a sensible size (e.g. 1000):
ch := make(chan int, 1000)
If you want to use append(), the slices you create should have 0 length (and proper capacity):
singleThreadIntSlice := make([]int, 0, numIntsToGenerate)
multiThreadIntSlice := make([]int, 0, numIntsToGenerate)
But in your case that's unnecessary: since only 1 goroutine is collecting the results, you can simply use indexing and create the slices like this:
singleThreadIntSlice := make([]int, numIntsToGenerate)
multiThreadIntSlice := make([]int, numIntsToGenerate)
And when collecting results:
for i := 0; i < numIntsToGenerate; i++ {
singleThreadIntSlice[i] = <-ch
}
// ...
for i := 0; i < numIntsToGenerate; i++ {
multiThreadIntSlice[i] = <-ch
}
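Two of the corrections above are easy to check in isolation: appending to a slice that already has a non-zero length, and the integer division when splitting the work. The following sketch uses hypothetical helper names (collectWithAppend, collectWithIndex, splitWork) and small inputs so that the results are easy to read.

```go
package main

import "fmt"

// collectWithAppend mirrors the original code: the slice already has
// length n, so append adds n more elements after n zero values.
func collectWithAppend(n int) []int {
	s := make([]int, n, n)
	for i := 0; i < n; i++ {
		s = append(s, i+1)
	}
	return s
}

// collectWithIndex writes each result into its own slot instead.
func collectWithIndex(n int) []int {
	s := make([]int, n)
	for i := range s {
		s[i] = i + 1
	}
	return s
}

// splitWork returns the share per goroutine and the total that is actually
// produced once integer division has dropped the remainder.
func splitWork(total, workers int) (perWorker, produced int) {
	perWorker = total / workers
	return perWorker, perWorker * workers
}

func main() {
	fmt.Println(collectWithAppend(3))    // [0 0 0 1 2 3]
	fmt.Println(collectWithIndex(3))     // [1 2 3]
	fmt.Println(splitWork(100000000, 3)) // 33333333 99999999
}
```

With 3 goroutines, one number is never generated, which is why the collector loop has to use the adjusted total; otherwise it would block forever waiting for the last value.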
OK, the code is now better. If you run it, you will still find that the multi-goroutine version runs slower. Why is that?
It's because controlling, synchronizing, and collecting results from multiple goroutines has overhead. If the task each goroutine performs is small, the communication overhead dominates and overall you lose performance.
Your case is such a case. Generating a single random number once you have set up your rand.Rand is pretty fast.
Let's modify your "task" to be big enough so that we can see the benefit of multiple goroutines:
// 1 million is enough now:
var numIntsToGenerate = 1000 * 1000
func makeRandomNumbers(numInts int, ch chan int) {
source := rand.NewSource(time.Now().UnixNano())
generator := rand.New(source)
for i := 0; i < numInts; i++ {
// Kill time, do some processing:
for j := 0; j < 1000; j++ {
generator.Intn(numInts * 100)
}
// and now return a single random number
ch <- generator.Intn(numInts * 100)
}
}
In this case, to get one random number we generate 1000 random numbers and throw them away (to do some calculation and kill time) before generating the one we return. We do this so that the calculation time of the worker goroutines outweighs the communication overhead of multiple goroutines.
Running the app now, my results on a 4-core machine:
Initiating single-threaded random number generation.
Single-threaded run took 2.440604504s
Initiating multi-threaded random number generation.
Multi-threaded run took 987.946758ms
The multi-goroutine version runs about 2.5 times faster. This means that if your goroutines delivered random numbers in blocks of 1000, you would see roughly 2.5 times faster execution (compared to the single-goroutine generation).
One last note:
Your single-goroutine version also uses multiple goroutines: 1 to generate numbers and 1 to collect the results. Most likely the collector does not fully utilize a CPU core and mostly just waits for the results, but still: 2 CPU cores are used. Let's estimate that "1.5" CPU cores are utilized. While the multi-goroutine version utilizes 4 CPU cores. Just as a rough estimation: 4 / 1.5 = 2.66, very close to our performance gain.
If you really want to generate the random numbers in parallel, each task should generate its whole share of numbers and return them in one go, rather than generating one number at a time and feeding each into a channel; reading from and writing to the channel slows things down in the multi-goroutine case. Below is the modified code, in which each task generates the required numbers in one go. It performs better in the multi-goroutine case. I have also used a slice of slices to collect the results from the goroutines.
package main
import "fmt"
import "time"
import "math/rand"
import "runtime"
func main() {
// Figure out how many CPUs are available and tell Go to use all of them
numThreads := runtime.NumCPU()
runtime.GOMAXPROCS(numThreads)
// Number of random ints to generate
var numIntsToGenerate = 100000000
// Number of ints to be generated by each spawned goroutine thread
var numIntsPerThread = numIntsToGenerate / numThreads
// Channel for communicating from goroutines back to main function
ch := make(chan []int)
fmt.Printf("Initiating single-threaded random number generation.\n")
startSingleRun := time.Now()
// Generate all of the ints from a single goroutine, retrieve the expected
// number of ints from the channel and put in target slice
go makeRandomNumbers(numIntsToGenerate, ch)
singleThreadIntSlice := <-ch
elapsedSingleRun := time.Since(startSingleRun)
fmt.Printf("Single-threaded run took %s\n", elapsedSingleRun)
fmt.Printf("Initiating multi-threaded random number generation.\n")
multiThreadIntSlice := make([][]int, numThreads)
startMultiRun := time.Now()
// Run the designated number of goroutines, each of which generates its
// expected share of the total random ints, retrieve the expected number
// of ints from the channel and put in target slice
for i := 0; i < numThreads; i++ {
go makeRandomNumbers(numIntsPerThread, ch)
}
for i := 0; i < numThreads; i++ {
multiThreadIntSlice[i] = <-ch
}
elapsedMultiRun := time.Since(startMultiRun)
fmt.Printf("Multi-threaded run took %s\n", elapsedMultiRun)
//To avoid not used warning
fmt.Print(len(singleThreadIntSlice))
}
func makeRandomNumbers(numInts int, ch chan []int) {
source := rand.NewSource(time.Now().UnixNano())
generator := rand.New(source)
result := make([]int, numInts)
for i := 0; i < numInts; i++ {
result[i] = generator.Intn(numInts * 100)
}
ch <- result
}
