64-bit Unsigned Integer concat benchmarking - performance

I have a performance-tweak question. I had some repetitive pieces of code in my web-app request handlers where simple things like fmt.Sprintf("%d%d", 1293123, 234993923) happen.
I ran some benchmarks. A bytes.Buffer-based writer was quite slow (~400 ns/op), and fmt.Sprintf itself came in around 240 ns/op. Then I tried joining strings:
var concatBuffStr []byte

func BenchmarkStringsJoin(b *testing.B) {
    for n := 0; n < b.N; n++ {
        concatBuffStr = []byte(strings.Join([]string{
            strconv.Itoa(2),
            strconv.Itoa(3),
        }, " "))
    }
}
152 ns/op
A big improvement already. But then I tried gluing two ints together after converting them to bytes:
var concatBuffStr []byte

func BenchmarkConcatIntsAsBytes(b *testing.B) {
    for n := 0; n < b.N; n++ {
        aBuf := make([]byte, 8)
        bBuf := make([]byte, 8)
        binary.BigEndian.PutUint64(aBuf, uint64(2))
        binary.BigEndian.PutUint64(bBuf, uint64(3))
        concatBuffStr = append(aBuf, bBuf...)
    }
}
57.8 ns/op
Amazing! But I can skip gluing the two slices together entirely and reserve the full 16 bytes up front, enough for two maxed-out 64-bit uints:
var concatBuffStr []byte

func BenchmarkCopyIntsAsBytes(b *testing.B) {
    for n := 0; n < b.N; n++ {
        concatBuffStr = make([]byte, 16)
        bBuf := make([]byte, 8)
        binary.BigEndian.PutUint64(concatBuffStr, uint64(123123))
        binary.BigEndian.PutUint64(bBuf, uint64(3453455))
        copy(concatBuffStr[8:], bBuf)
    }
}
30.4 ns/op
By now we're 8 times faster than with the fmt.Sprintf() method.
I was wondering whether there is an even faster way than this. I want to avoid unsafe, though.
I also thought about checking the max value of the two ints: if both fit in a uint32 or uint16, I could specialize the logic. Benchmarking the uint16 variant by hand gives around 23 ns/op, but that's not realistic, because in the actual web request I would still need to check the size of the ints and branch, which would likely cost more than the 7 ns/op gain.
Update
I managed to simplify it a little bit more:
var concatBuffStr []byte

func BenchmarkCopyIntsAsBytesShort(b *testing.B) {
    for n := 0; n < b.N; n++ {
        concatBuffStr = make([]byte, 16)
        binary.BigEndian.PutUint64(concatBuffStr, uint64(123123))
        binary.BigEndian.PutUint64(concatBuffStr[8:], uint64(3453455))
    }
}
28.6 ns/op
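One caveat worth noting: binary.BigEndian.PutUint64 produces fixed-width raw bytes, not the decimal digits that fmt.Sprintf("%d%d", …) produces, so the two outputs are not interchangeable. If the request actually needs decimal text, strconv.AppendUint writes digits into a reusable buffer without fmt's formatting overhead. A minimal sketch (concatDecimal is an illustrative name, not from the benchmarks above):

```go
package main

import (
	"fmt"
	"strconv"
)

// concatDecimal writes the decimal representations of a and b into a
// reusable buffer, avoiding fmt.Sprintf's formatting machinery.
func concatDecimal(buf []byte, a, b uint64) []byte {
	buf = buf[:0]                        // reuse the backing array
	buf = strconv.AppendUint(buf, a, 10) // append a's decimal digits
	buf = strconv.AppendUint(buf, b, 10) // append b's decimal digits
	return buf
}

func main() {
	buf := make([]byte, 0, 40) // 2 * max 20 digits of a uint64
	buf = concatDecimal(buf, 1293123, 234993923)
	fmt.Println(string(buf)) // 1293123234993923
}
```

Reusing the buffer across calls keeps this allocation-free after warm-up, which is the same trick the 16-byte version above relies on.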

Related

Efficient allocation of slices (cap vs length)

Assume I am creating a slice that I know in advance I want to populate with 1e5 elements via successive calls to append in a for loop:
// Append 1e5 strings to the slice
for i := 0; i < 1e5; i++ {
    value := fmt.Sprintf("Entry: %d", i)
    myslice = append(myslice, value)
}
which is the more efficient way of initialising the slice and why:
a. declaring a nil slice of strings?
var myslice []string
b. setting its length in advance to 1e5?
myslice = make([]string, 1e5)
c. setting both its length and capacity to 1e5?
myslice = make([]string, 1e5, 1e5)
Your b and c solutions are identical: when you create a slice with make() and don't specify the capacity, the "missing" capacity defaults to the given length.
Also note that if you create the slice with a nonzero length in advance, you can't use append() to populate it: append() adds new elements after the existing ones, it doesn't "reuse" the already-allocated elements. In that case you have to assign values using an index expression, e.g. myslice[i] = value.
If you start with a slice of 0 capacity, a new backing array has to be allocated, and the "old" content has to be copied over, whenever you append an element that does not fit into the capacity, so that solution is inherently slower.
I would define and consider the following solutions (I use an []int slice to keep fmt.Sprintf() from interfering with the benchmarks):
var s []int

func BenchmarkA(b *testing.B) {
    for i := 0; i < b.N; i++ {
        s = nil
        for j := 0; j < 1e5; j++ {
            s = append(s, j)
        }
    }
}

func BenchmarkB(b *testing.B) {
    for i := 0; i < b.N; i++ {
        s = make([]int, 0, 1e5)
        for j := 0; j < 1e5; j++ {
            s = append(s, j)
        }
    }
}

func BenchmarkBLocal(b *testing.B) {
    for i := 0; i < b.N; i++ {
        s := make([]int, 0, 1e5)
        for j := 0; j < 1e5; j++ {
            s = append(s, j)
        }
    }
}

func BenchmarkD(b *testing.B) {
    for i := 0; i < b.N; i++ {
        s = make([]int, 1e5)
        for j := range s {
            s[j] = j
        }
    }
}
Note: I use package-level variables in the benchmarks (except in BLocal), because some compiler optimizations may (and actually do) happen when using a local slice variable.
And the benchmark results:
BenchmarkA-4 1000 1081599 ns/op 4654332 B/op 30 allocs/op
BenchmarkB-4 3000 371096 ns/op 802816 B/op 1 allocs/op
BenchmarkBLocal-4 10000 172427 ns/op 802816 B/op 1 allocs/op
BenchmarkD-4 10000 167305 ns/op 802816 B/op 1 allocs/op
A: As you can see, starting with a nil slice is the slowest, uses the most memory and allocations.
B: Pre-allocating the slice with capacity (but still 0 length) and using append: it requires only a single allocation and is much faster, almost thrice as fast.
BLocal: Do note that when using a local slice instead of a package variable, (compiler) optimizations happen and it gets a lot faster: twice as fast, almost as fast as D.
D: Not using append() but assigning elements to a preallocated slice wins in every aspect, even when using a non-local variable.
For this use case, since you already know the number of string elements you want to assign to the slice, I would prefer approach b or c, since these two approaches prevent resizing of the slice.
If you choose approach a, the slice doubles its capacity every time an element is added once len equals cap.
https://play.golang.org/p/kSuX7cE176j
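The resizing behavior mentioned for approach a can be observed by printing cap() whenever it changes during appends. The exact growth factors are an implementation detail of the Go runtime (roughly doubling for small slices, smaller factors for large ones), so treat the printed sizes as informative rather than guaranteed:

```go
package main

import "fmt"

func main() {
	var s []int
	prevCap := -1
	for i := 0; i < 1e5; i++ {
		// Report each time append had to grow the backing array.
		if cap(s) != prevCap {
			fmt.Printf("len=%d cap=%d\n", len(s), cap(s))
			prevCap = cap(s)
		}
		s = append(s, i)
	}
}
```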

Memory efficient implementation of go map?

My use case is to transfer a group of members (integers) over the network, so we employ delta encoding, and on the receiving end we decode and store the whole list as a map,
map[string]struct{}
for O(1) membership checks.
The problem I am facing is that the actual size of the members is only 15 MB for 2 million integers, but the size of the map on the heap is 100+ MB. It seems the stock Go map implementation is not suitable for large maps.
Since this is a client-side SDK, I do not want to impact usable memory much, and there can be multiple such groups that need to be kept in memory for long periods, around a week.
Is there a better alternative DS in Go for this?
type void struct{}

func ToMap(v []int64) map[string]void {
    out := map[string]void{}
    for _, i := range v {
        out[strconv.Itoa(int(i))] = void{}
    }
    return out
}
This is a more memory efficient form of the map:
type void struct{}

func ToMap(v []int64) map[int64]void {
    m := make(map[int64]void, len(v))
    for _, i := range v {
        m[i] = void{}
    }
    return m
}
Go maps are optimized for integer keys. Optimize the map allocation by giving the exact map size as a hint.
A string key contains an implicit pointer, which makes the garbage collector (GC) follow that pointer every time it scans the map.
Here is a Go benchmark for 2 million pseudorandom integers:
package main

import (
    "math/rand"
    "strconv"
    "testing"
)

type void struct{}

func ToMap1(v []int64) map[string]void {
    out := map[string]void{}
    for _, i := range v {
        out[strconv.Itoa(int(i))] = void{}
    }
    return out
}

func ToMap2(v []int64) map[int64]void {
    m := make(map[int64]void, len(v))
    for _, i := range v {
        m[i] = void{}
    }
    return m
}

var benchmarkV = func() []int64 {
    v := make([]int64, 2000000)
    for i := range v {
        v[i] = rand.Int63()
    }
    return v
}()

func BenchmarkToMap1(b *testing.B) {
    b.ReportAllocs()
    b.ResetTimer()
    for N := 0; N < b.N; N++ {
        ToMap1(benchmarkV)
    }
}

func BenchmarkToMap2(b *testing.B) {
    b.ReportAllocs()
    b.ResetTimer()
    for N := 0; N < b.N; N++ {
        ToMap2(benchmarkV)
    }
}
Output:
$ go test tomap_test.go -bench=.
BenchmarkToMap1-4 2 973358894 ns/op 235475280 B/op 2076779 allocs/op
BenchmarkToMap2-4 10 188489170 ns/op 44852584 B/op 23 allocs/op
$
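If even the ~45 MB of the int64-keyed map is too much, one alternative data structure (a sketch, not benchmarked above) is a sorted []int64 with binary search: 8 bytes per member in a single pointer-free allocation the GC never has to scan into, at the cost of O(log n) lookups instead of O(1). The IntSet type and method names here are illustrative:

```go
package main

import (
	"fmt"
	"sort"
)

// IntSet stores members in a sorted slice: one allocation, 8 bytes per
// member, and no pointers for the GC to chase.
type IntSet struct {
	sorted []int64
}

// NewIntSet copies and sorts the member list once, up front.
func NewIntSet(v []int64) *IntSet {
	s := make([]int64, len(v))
	copy(s, v)
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	return &IntSet{sorted: s}
}

// Contains does an O(log n) binary search for x.
func (s *IntSet) Contains(x int64) bool {
	i := sort.Search(len(s.sorted), func(i int) bool { return s.sorted[i] >= x })
	return i < len(s.sorted) && s.sorted[i] == x
}

func main() {
	set := NewIntSet([]int64{42, 7, 1000})
	fmt.Println(set.Contains(7), set.Contains(8)) // true false
}
```

For 2 million members that is about 16 MB, close to the 15 MB wire size; whether the slower lookup is acceptable depends on how hot the membership check is.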

In Go, is there any performance difference between maps initialized using make vs {}?

As we know, there are two ways to initialize a map (listed below). I'm wondering if there is any performance difference between the two approaches.
var myMap map[string]int
then
myMap = map[string]int{}
vs
myMap = make(map[string]int)
On my machine they appear to be about equivalent.
You can easily make a benchmark test to compare. For example:
package bench

import "testing"

var result map[string]int

func BenchmarkMakeLiteral(b *testing.B) {
    var m map[string]int
    for n := 0; n < b.N; n++ {
        m = InitMapLiteral()
    }
    result = m
}

func BenchmarkMakeMake(b *testing.B) {
    var m map[string]int
    for n := 0; n < b.N; n++ {
        m = InitMapMake()
    }
    result = m
}

func InitMapLiteral() map[string]int {
    return map[string]int{}
}

func InitMapMake() map[string]int {
    return make(map[string]int)
}
Which on 3 different runs yielded results that are close enough to be insignificant:
First Run
$ go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkMakeLiteral-8 10000000 160 ns/op
BenchmarkMakeMake-8 10000000 171 ns/op
ok github.com/johnweldon/bench 3.664s
Second Run
$ go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkMakeLiteral-8 10000000 182 ns/op
BenchmarkMakeMake-8 10000000 173 ns/op
ok github.com/johnweldon/bench 3.945s
Third Run
$ go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkMakeLiteral-8 10000000 170 ns/op
BenchmarkMakeMake-8 10000000 170 ns/op
ok github.com/johnweldon/bench 3.751s
When allocating empty maps there is no difference, but with make you can pass a second parameter to preallocate space in the map. This saves a lot of reallocations while the map is being populated.
Benchmarks
package maps

import "testing"

const SIZE = 10000

func fill(m map[int]bool, size int) {
    for i := 0; i < size; i++ {
        m[i] = true
    }
}

func BenchmarkEmpty(b *testing.B) {
    for n := 0; n < b.N; n++ {
        m := make(map[int]bool)
        fill(m, SIZE)
    }
}

func BenchmarkAllocated(b *testing.B) {
    for n := 0; n < b.N; n++ {
        m := make(map[int]bool, 2*SIZE)
        fill(m, SIZE)
    }
}
Results
go test -benchmem -bench .
BenchmarkEmpty-8 500 2988680 ns/op 431848 B/op 625 allocs/op
BenchmarkAllocated-8 1000 1618251 ns/op 360949 B/op 11 allocs/op
A year ago I actually stumbled on the fact that using make with explicitly allocated space is better than using a map literal if your values are not static.
So doing
return map[string]float64{
    "key1": SOME_COMPUTED_ABOVE_VALUE,
    "key2": SOME_COMPUTED_ABOVE_VALUE,
    // more keys here
    "keyN": SOME_COMPUTED_ABOVE_VALUE,
}
is slower than
// some code above
result := make(map[string]float64, SIZE) // SIZE >= N
result["key1"] = SOME_COMPUTED_ABOVE_VALUE
result["key2"] = SOME_COMPUTED_ABOVE_VALUE
// more keys here
result["keyN"] = SOME_COMPUTED_ABOVE_VALUE
return result
for N which is quite big (N=300 in my use case).
The reason is that the compiler fails to understand that it needs to allocate at least N slots in the first case.
I wrote a blog post about it
https://trams.github.io/golang-map-literal-performance/
and I reported a bug to the community
https://github.com/golang/go/issues/43020
As of Go 1.17 it is still an issue.

What's the difference between make and {} initialization in Go?

We can create a channel with the make function, and a new object with a {} composite literal expression.
ch := make(chan interface{})
o := struct{}{}
But what's the difference between make and {} when creating a map?
m0 := make(map[int]int)
m1 := map[int]int{}
make can be used to initialize a map with preallocated space. It takes an optional second parameter.
m0 := make(map[int]int, 1000) // allocates space for 1000 entries
Allocation takes CPU time. If you know how many entries the map will have, you can preallocate space for all of them, which reduces execution time. Here is a program you can run to verify this.
package main

import (
    "fmt"
    "testing"
)

func BenchmarkWithMake(b *testing.B) {
    m0 := make(map[int]int, b.N)
    for i := 0; i < b.N; i++ {
        m0[i] = 1000
    }
}

func BenchmarkWithLiteral(b *testing.B) {
    m1 := map[int]int{}
    for i := 0; i < b.N; i++ {
        m1[i] = 1000
    }
}

func main() {
    bwm := testing.Benchmark(BenchmarkWithMake)
    fmt.Println(bwm) // gives 176 ns/op
    bwl := testing.Benchmark(BenchmarkWithLiteral)
    fmt.Println(bwl) // gives 259 ns/op
}
From the docs for the make keyword:
Map: An initial allocation is made according to the size but the
resulting map has length 0. The size may be omitted, in which case a
small starting size is allocated.
So, in the case of maps, there is no difference between using make without a size hint and using an empty map literal.

Concat byte arrays

Can someone please point at a more efficient version of the following
b := make([]byte, 0, sizeTotal)
b = append(b, size...)
b = append(b, contentType...)
b = append(b, lenCallbackid...)
b = append(b, lenTarget...)
b = append(b, lenAction...)
b = append(b, lenContent...)
b = append(b, callbackid...)
b = append(b, target...)
b = append(b, action...)
b = append(b, content...)
every variable is a byte slice, apart from sizeTotal
Update:
Code:
type Message struct {
    size        uint32
    contentType uint8
    callbackId  string
    target      string
    action      string
    content     string
}

var res []byte
var b []byte = make([]byte, 0, 4096)

func (m *Message) ToByte() []byte {
    callbackIdIntLen := len(m.callbackId)
    targetIntLen := len(m.target)
    actionIntLen := len(m.action)
    contentIntLen := len(m.content)
    lenCallbackid := make([]byte, 4)
    binary.LittleEndian.PutUint32(lenCallbackid, uint32(callbackIdIntLen))
    callbackid := []byte(m.callbackId)
    lenTarget := make([]byte, 4)
    binary.LittleEndian.PutUint32(lenTarget, uint32(targetIntLen))
    target := []byte(m.target)
    lenAction := make([]byte, 4)
    binary.LittleEndian.PutUint32(lenAction, uint32(actionIntLen))
    action := []byte(m.action)
    lenContent := make([]byte, 4)
    binary.LittleEndian.PutUint32(lenContent, uint32(contentIntLen))
    content := []byte(m.content)
    sizeTotal := 21 + callbackIdIntLen + targetIntLen + actionIntLen + contentIntLen
    size := make([]byte, 4)
    binary.LittleEndian.PutUint32(size, uint32(sizeTotal))
    b = b[:0]
    b = append(b, size...)
    b = append(b, byte(m.contentType))
    b = append(b, lenCallbackid...)
    b = append(b, lenTarget...)
    b = append(b, lenAction...)
    b = append(b, lenContent...)
    b = append(b, callbackid...)
    b = append(b, target...)
    b = append(b, action...)
    b = append(b, content...)
    res = b
    return b
}

func FromByte(bytes []byte) *Message {
    size := binary.LittleEndian.Uint32(bytes[0:4])
    contentType := bytes[4]
    lenCallbackid := binary.LittleEndian.Uint32(bytes[5:9])
    lenTarget := binary.LittleEndian.Uint32(bytes[9:13])
    lenAction := binary.LittleEndian.Uint32(bytes[13:17])
    lenContent := binary.LittleEndian.Uint32(bytes[17:21])
    callbackid := string(bytes[21 : 21+lenCallbackid])
    target := string(bytes[21+lenCallbackid : 21+lenCallbackid+lenTarget])
    action := string(bytes[21+lenCallbackid+lenTarget : 21+lenCallbackid+lenTarget+lenAction])
    content := string(bytes[size-lenContent : size])
    return &Message{size, contentType, callbackid, target, action, content}
}
Benchs:
func BenchmarkMessageToByte(b *testing.B) {
    m := NewMessage(uint8(3), "agsdggsdasagdsdgsgddggds", "sometarSFAFFget", "somFSAFSAFFSeaction", "somfasfsasfafsejsonzhit")
    for n := 0; n < b.N; n++ {
        m.ToByte()
    }
}

func BenchmarkMessageFromByte(b *testing.B) {
    m := NewMessage(uint8(1), "sagdsgaasdg", "soSASFASFASAFSFASFAGmetarget", "adsgdgsagdssgdsgd", "agsdsdgsagdsdgasdg").ToByte()
    for n := 0; n < b.N; n++ {
        FromByte(m)
    }
}

func BenchmarkStringToByte(b *testing.B) {
    for n := 0; n < b.N; n++ {
        _ = []byte("abcdefghijklmnoqrstuvwxyz")
    }
}

func BenchmarkStringFromByte(b *testing.B) {
    s := []byte("abcdefghijklmnoqrstuvwxyz")
    for n := 0; n < b.N; n++ {
        _ = string(s)
    }
}

func BenchmarkUintToByte(b *testing.B) {
    for n := 0; n < b.N; n++ {
        i := make([]byte, 4)
        binary.LittleEndian.PutUint32(i, uint32(99))
    }
}

func BenchmarkUintFromByte(b *testing.B) {
    i := make([]byte, 4)
    binary.LittleEndian.PutUint32(i, uint32(99))
    for n := 0; n < b.N; n++ {
        binary.LittleEndian.Uint32(i)
    }
}
Bench results:
BenchmarkMessageToByte 10000000 280 ns/op
BenchmarkMessageFromByte 10000000 293 ns/op
BenchmarkStringToByte 50000000 55.1 ns/op
BenchmarkStringFromByte 50000000 49.7 ns/op
BenchmarkUintToByte 1000000000 2.14 ns/op
BenchmarkUintFromByte 2000000000 1.71 ns/op
Provided memory is already allocated, a sequence of x = append(x, a...) is rather efficient in Go.
In your example, the initial allocation (make) probably costs more than the sequence of appends. It depends on the size of the fields. Consider the following benchmark:
package main

import (
    "testing"
)

const sizeTotal = 25

var res []byte // To enforce heap allocation

func BenchmarkWithAlloc(b *testing.B) {
    a := []byte("abcde")
    for i := 0; i < b.N; i++ {
        x := make([]byte, 0, sizeTotal)
        x = append(x, a...)
        x = append(x, a...)
        x = append(x, a...)
        x = append(x, a...)
        x = append(x, a...)
        res = x // Make sure x escapes, and is therefore heap allocated
    }
}

func BenchmarkWithoutAlloc(b *testing.B) {
    a := []byte("abcde")
    x := make([]byte, 0, sizeTotal)
    for i := 0; i < b.N; i++ {
        x = x[:0]
        x = append(x, a...)
        x = append(x, a...)
        x = append(x, a...)
        x = append(x, a...)
        x = append(x, a...)
        res = x
    }
}
On my box, the result is:
testing: warning: no tests to run
PASS
BenchmarkWithAlloc 10000000 116 ns/op 32 B/op 1 allocs/op
BenchmarkWithoutAlloc 50000000 24.0 ns/op 0 B/op 0 allocs/op
Systematically reallocating the buffer (even a small one) makes this benchmark at least 5 times slower.
So your best hope to optimize this code is to make sure you do not reallocate a buffer for each packet you build. Instead, keep your buffer and reuse it for each marshalling operation.
You can reset a slice while keeping its underlying buffer allocated with the following statement:
x = x[:0]
I looked carefully at that and made the following benchmarks.
package append

import "testing"

func BenchmarkAppend(b *testing.B) {
    as := 1000
    a := make([]byte, as)
    s := make([]byte, 0, b.N*as)
    for i := 0; i < b.N; i++ {
        s = append(s, a...)
    }
}

func BenchmarkCopy(b *testing.B) {
    as := 1000
    a := make([]byte, as)
    s := make([]byte, 0, b.N*as)
    for i := 0; i < b.N; i++ {
        copy(s[i*as:(i+1)*as], a)
    }
}
The results are
grzesiek#klapacjusz ~/g/s/t/append> go test -bench . -benchmem
testing: warning: no tests to run
PASS
BenchmarkAppend 10000000 202 ns/op 1000 B/op 0 allocs/op
BenchmarkCopy 10000000 201 ns/op 1000 B/op 0 allocs/op
ok test/append 4.564s
If sizeTotal is big enough, then your code makes no memory allocations; it copies only the bytes it needs to copy. It is perfectly fine.
