Compare string and byte slice in Go without copy - go

What is the best way to check that a Go string and a byte slice contain the same bytes? The simplest approach, str == string(byteSlice), is inefficient because it copies byteSlice first.
I was looking for a version of Equal(a, b []byte) that takes a string as its argument, but could not find anything suitable.

Starting from Go 1.5, the compiler optimizes string(bytes) when it is compared to a string by using a stack-allocated temporary. Thus, since Go 1.5,
str == string(byteSlice)
has been the canonical and efficient way to compare a string to a byte slice.
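A minimal sketch of wrapping that comparison in a helper, assuming Go 1.5 or later. The name equalStringBytes is made up for illustration and is not a standard library function:

```go
package main

import "fmt"

// equalStringBytes (hypothetical helper) reports whether s and b hold the
// same bytes. The conversion string(b) appears directly in the comparison,
// which is the form the answer above describes as optimized by the compiler.
func equalStringBytes(s string, b []byte) bool {
	return s == string(b)
}

func main() {
	b := []byte("equal")
	fmt.Println(equalStringBytes("equal", b))    // true
	fmt.Println(equalStringBytes("notequal", b)) // false
}
```

Keeping the conversion inside the comparison expression matters here: storing string(b) in a variable first may not get the same treatment.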

The Go Programming Language Specification
String types
A string type represents the set of string values. A string value is a
(possibly empty) sequence of bytes. The predeclared string type is
string.
The length of a string s (its size in bytes) can be discovered using
the built-in function len. A string's bytes can be accessed by integer
indices 0 through len(s)-1.
For example,
package main
import "fmt"
func equal(s string, b []byte) bool {
if len(s) != len(b) {
return false
}
for i, x := range b {
if x != s[i] {
return false
}
}
return true
}
func main() {
s := "equal"
b := []byte(s)
fmt.Println(equal(s, b))
s = "not" + s
fmt.Println(equal(s, b))
}
Output:
true
false

If you're comfortable enough with the fact that this can break on a later release (doubtful though), you can use unsafe:
func unsafeCompare(a string, b []byte) int {
abp := *(*[]byte)(unsafe.Pointer(&a))
return bytes.Compare(abp, b)
}
func unsafeEqual(a string, b []byte) bool {
bbp := *(*string)(unsafe.Pointer(&b))
return a == bbp
}
playground
Benchmarks:
// using:
// aaa = strings.Repeat("a", 100)
// bbb = []byte(strings.Repeat("a", 99) + "b")
// go 1.5
BenchmarkCopy-8 20000000 75.4 ns/op
BenchmarkPetersEqual-8 20000000 83.1 ns/op
BenchmarkUnsafe-8 100000000 12.2 ns/op
BenchmarkUnsafeEqual-8 200000000 8.94 ns/op
// go 1.4
BenchmarkCopy 10000000 233 ns/op
BenchmarkPetersEqual 20000000 72.3 ns/op
BenchmarkUnsafe 100000000 15.5 ns/op
BenchmarkUnsafeEqual 100000000 10.7 ns/op

There is no reason to use the unsafe package just to compare a []byte and a string. The Go compiler is clever enough now to optimize such conversions.
Here's a benchmark:
BenchmarkEqual-8 172135624 6.96 ns/op <--
BenchmarkUnsafe-8 179866616 6.65 ns/op <--
BenchmarkUnsafeEqual-8 175588575 6.85 ns/op <--
BenchmarkCopy-8 23715144 47.3 ns/op
BenchmarkPetersEqual-8 24709376 47.3 ns/op
Just convert a byte slice to a string and compare:
var (
aaa = strings.Repeat("a", 100)
bbb = []byte(strings.Repeat("a", 99) + "b")
)
func BenchmarkEqual(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = aaa == string(bbb)
}
}
👉 Here is more information about the optimization, and this.

Related

go maps non-performant for large number of keys

I discovered very strange behaviour with go maps recently. The use case is to create a group of integers and have O(1) check for IsMember(id int).
The current implementation is :
func convertToMap(v []int64) map[int64]void {
out := make(map[int64]void, len(v))
for _, i := range v {
out[i] = void{}
}
return out
}
type Group struct {
members map[int64]void
}
type void struct{}
func (g *Group) IsMember(input string) (ok bool) {
memberID, _ := strconv.ParseInt(input, 10, 64)
_, ok = g.members[memberID]
return
}
When I benchmark the IsMember method, everything looks fine up to 6 million members. Above that, each map lookup takes about 1 second!
The benchmark test:
func BenchmarkIsMember(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
g := &Group{}
g.members = convertToMap(benchmarkV)
for N := 0; N < b.N && N < sizeOfGroup; N++ {
g.IsMember(benchmarkKVString[N])
}
}
var benchmarkV, benchmarkKVString = func(size int) ([]int64, []string) {
v := make([]int64, size)
s := make([]string, size)
for i := range v {
val := rand.Int63()
v[i] = val
s[i] = strconv.FormatInt(val, 10)
}
return v, s
}(sizeOfGroup)
Benchmark numbers:
const sizeOfGroup = 6000000
BenchmarkIsMember-8 2000000 568 ns/op 50 B/op 0 allocs/op
const sizeOfGroup = 6830000
BenchmarkIsMember-8 1 1051725455 ns/op 178767208 B/op 25 allocs/op
Any group size above 6.8 million gives the same result.
Can someone help explain why this is happening, and can anything be done to make this performant while still using maps?
Also, I don't understand why so much memory is being allocated. Even if the time taken is due to collisions followed by linked-list traversal, there shouldn't be any memory allocation. Is my thought process wrong?
There is no need to measure the extra allocations made while converting the slice to a map, because we only want to measure the lookup operation.
I've slightly modified the benchmark:
func BenchmarkIsMember(b *testing.B) {
fn := func(size int) ([]int64, []string) {
v := make([]int64, size)
s := make([]string, size)
for i := range v {
val := rand.Int63()
v[i] = val
s[i] = strconv.FormatInt(val, 10)
}
return v, s
}
for _, size := range []int{
6000000,
6800000,
6830000,
60000000,
} {
b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
var benchmarkV, benchmarkKVString = fn(size)
g := &Group{}
g.members = convertToMap(benchmarkV)
b.ReportAllocs()
b.ResetTimer()
for N := 0; N < b.N && N < size; N++ {
g.IsMember(benchmarkKVString[N])
}
})
}
}
And got the following results:
go test ./... -bench=. -benchtime=10s -cpu=1
goos: linux
goarch: amd64
pkg: trash
BenchmarkIsMember/size=6000000 2000000000 0.55 ns/op 0 B/op 0 allocs/op
BenchmarkIsMember/size=6800000 1000000000 1.27 ns/op 0 B/op 0 allocs/op
BenchmarkIsMember/size=6830000 1000000000 1.23 ns/op 0 B/op 0 allocs/op
BenchmarkIsMember/size=60000000 100000000 136 ns/op 0 B/op 0 allocs/op
PASS
ok trash 167.578s
The degradation isn't as significant as in your example.

How to check if []byte is all zeros in go

Is there a way to check if a byte slice is empty or 0 without checking each element or using reflect?
theByteVar := make([]byte, 128)
if "theByteVar is empty or zeroes" {
doSomething()
}
One solution I found, which seems weird, is to keep an empty byte slice for comparison.
theByteVar := make([]byte, 128)
emptyByteVar := make([]byte, 128)
// fill with anything
theByteVar[1] = 2
if !reflect.DeepEqual(theByteVar, emptyByteVar) {
doSomething(theByteVar)
}
Surely there must be a better/quicker solution.
Thanks
UPDATE: I did some comparisons over 1000 loops, and the reflect way is the worst by far:
Equal Loops: 1000 in true in 19.197µs
Contains Loops: 1000 in true in 34.507µs
AllZero Loops: 1000 in true in 117.275µs
Reflect Loops: 1000 in true in 14.616277ms
Comparing it with another slice containing only zeros requires reading (and comparing) 2 slices.
Using a single for loop will be more efficient here:
for _, v := range theByteVar {
if v != 0 {
doSomething(theByteVar)
break
}
}
If you do need to use it in multiple places, wrap it in a utility function:
func allZero(s []byte) bool {
for _, v := range s {
if v != 0 {
return false
}
}
return true
}
And then using it:
if !allZero(theByteVar) {
doSomething(theByteVar)
}
Another solution borrows an idea from C. It can be achieved by using the unsafe package in Go.
The idea is simple: instead of checking each byte of the []byte, we can check the value of data[i:i+8], interpreted as a uint64, in each step. This way we check 8 bytes per iteration instead of only one.
The code below is not best practice; it only shows the idea.
func IsAllBytesZero(data []byte) bool {
	n := len(data)
	// Largest length that is a multiple of 8 (clears the low three bits).
	nlen8 := n &^ 7
	i := 0
	for ; i < nlen8; i += 8 {
		// i is already a byte offset, so it must not be multiplied by 8 again.
		// Note: this read may be unaligned, which not every architecture tolerates.
		b := *(*uint64)(unsafe.Pointer(uintptr(unsafe.Pointer(&data[0])) + uintptr(i)))
		if b != 0 {
			return false
		}
	}
	// Check the remaining (fewer than 8) bytes one at a time.
	for ; i < n; i++ {
		if data[i] != 0 {
			return false
		}
	}
	return true
}
Benchmark
Testcases:
Only test for worst cases (all elements are zero)
Methods:
IsAllBytesZero: unsafe package solution
NaiveCheckAllBytesAreZero: a loop to iterate the whole byte array and check it.
CompareAllBytesWithFixedEmptyArray: using bytes.Compare solution with pre-allocated fixed size empty byte array.
CompareAllBytesWithDynamicEmptyArray: using bytes.Compare solution without pre-allocated fixed size empty byte array.
Results
BenchmarkIsAllBytesZero10-8 254072224 4.68 ns/op
BenchmarkIsAllBytesZero100-8 132266841 9.09 ns/op
BenchmarkIsAllBytesZero1000-8 19989015 55.6 ns/op
BenchmarkIsAllBytesZero10000-8 2344436 507 ns/op
BenchmarkIsAllBytesZero100000-8 1727826 679 ns/op
BenchmarkNaiveCheckAllBytesAreZero10-8 234153582 5.15 ns/op
BenchmarkNaiveCheckAllBytesAreZero100-8 30038720 38.2 ns/op
BenchmarkNaiveCheckAllBytesAreZero1000-8 4300405 291 ns/op
BenchmarkNaiveCheckAllBytesAreZero10000-8 407547 2666 ns/op
BenchmarkNaiveCheckAllBytesAreZero100000-8 43382 27265 ns/op
BenchmarkCompareAllBytesWithFixedEmptyArray10-8 415171356 2.71 ns/op
BenchmarkCompareAllBytesWithFixedEmptyArray100-8 218871330 5.51 ns/op
BenchmarkCompareAllBytesWithFixedEmptyArray1000-8 56569351 21.0 ns/op
BenchmarkCompareAllBytesWithFixedEmptyArray10000-8 6592575 177 ns/op
BenchmarkCompareAllBytesWithFixedEmptyArray100000-8 567784 2104 ns/op
BenchmarkCompareAllBytesWithDynamicEmptyArray10-8 64215448 19.8 ns/op
BenchmarkCompareAllBytesWithDynamicEmptyArray100-8 32875428 35.4 ns/op
BenchmarkCompareAllBytesWithDynamicEmptyArray1000-8 8580890 140 ns/op
BenchmarkCompareAllBytesWithDynamicEmptyArray10000-8 1277070 938 ns/op
BenchmarkCompareAllBytesWithDynamicEmptyArray100000-8 121256 10355 ns/op
Summary
This assumes we are talking about byte arrays that are mostly zero. According to the benchmark, if performance is an issue, the naive check is a bad idea. If you don't want to use the unsafe package in your project, consider the bytes.Compare solution with a pre-allocated empty array as an alternative.
An interesting point is that the performance of the unsafe solution varies a lot with input size: it beats the naive loop at every size, and beats all the other solutions at the largest size. I think this is related to the CPU cache mechanism.
You can possibly use bytes.Equal or bytes.Contains to compare with a zero-initialized byte slice; see https://play.golang.org/p/mvUXaTwKjP. I haven't checked the performance, but hopefully it has been optimized. You might want to try out other solutions and compare the performance numbers if needed.
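A minimal sketch of the bytes.Equal variant, under the assumption that a shared all-zero buffer is kept around and longer inputs are compared chunk by chunk. The names zeros and isZeroEqual and the 4096-byte buffer size are illustrative choices, not taken from the linked playground:

```go
package main

import (
	"bytes"
	"fmt"
)

// zeros is a reusable all-zero buffer. Its size is an arbitrary assumption.
var zeros = make([]byte, 4096)

// isZeroEqual (hypothetical helper) reports whether every byte of b is zero
// by comparing b against the zero buffer one chunk at a time.
func isZeroEqual(b []byte) bool {
	for len(b) > len(zeros) {
		if !bytes.Equal(b[:len(zeros)], zeros) {
			return false
		}
		b = b[len(zeros):]
	}
	return bytes.Equal(b, zeros[:len(b)])
}

func main() {
	theByteVar := make([]byte, 128)
	fmt.Println(isZeroEqual(theByteVar)) // true
	theByteVar[1] = 2
	fmt.Println(isZeroEqual(theByteVar)) // false
}
```

Allocating the zero buffer once avoids the per-call allocation that the "dynamic empty array" benchmark above pays for.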
I think it is better (faster) to use a bitwise OR instead of an if condition inside the loop:
func isZero(bytes []byte) bool {
b := byte(0)
for _, s := range bytes {
b |= s
}
return b == 0
}
One can optimize this even further by using the uint64 idea mentioned in previous answers.
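A sketch of how the two ideas might be combined without the unsafe package, assuming encoding/binary is acceptable for reading 8 bytes at a time. The name isZero64 is hypothetical, and no performance claim is made for it:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// isZero64 (hypothetical helper) ORs the input together 8 bytes at a time
// and then handles the remaining tail byte by byte.
func isZero64(b []byte) bool {
	var acc uint64
	for len(b) >= 8 {
		// The byte order does not matter here: any set bit makes acc non-zero.
		acc |= binary.LittleEndian.Uint64(b)
		b = b[8:]
	}
	for _, v := range b {
		acc |= uint64(v)
	}
	return acc == 0
}

func main() {
	data := make([]byte, 13)
	fmt.Println(isZero64(data)) // true
	data[12] = 1
	fmt.Println(isZero64(data)) // false
}
```

Like the byte-wise OR version, this always scans the whole slice, so it gives up the early exit that the if-based loop has.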

in golang, is there any performance difference between maps initialized using make vs {}

As we know, there are two ways to initialize a map (listed below). I'm wondering if there is any performance difference between the two approaches.
var myMap map[string]int
then
myMap = map[string]int{}
vs
myMap = make(map[string]int)
On my machine they appear to be about equivalent.
You can easily make a benchmark test to compare. For example:
package bench
import "testing"
var result map[string]int
func BenchmarkMakeLiteral(b *testing.B) {
var m map[string]int
for n := 0; n < b.N; n++ {
m = InitMapLiteral()
}
result = m
}
func BenchmarkMakeMake(b *testing.B) {
var m map[string]int
for n := 0; n < b.N; n++ {
m = InitMapMake()
}
result = m
}
func InitMapLiteral() map[string]int {
return map[string]int{}
}
func InitMapMake() map[string]int {
return make(map[string]int)
}
Which on 3 different runs yielded results that are close enough to be insignificant:
First Run
$ go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkMakeLiteral-8 10000000 160 ns/op
BenchmarkMakeMake-8 10000000 171 ns/op
ok github.com/johnweldon/bench 3.664s
Second Run
$ go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkMakeLiteral-8 10000000 182 ns/op
BenchmarkMakeMake-8 10000000 173 ns/op
ok github.com/johnweldon/bench 3.945s
Third Run
$ go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkMakeLiteral-8 10000000 170 ns/op
BenchmarkMakeMake-8 10000000 170 ns/op
ok github.com/johnweldon/bench 3.751s
When allocating empty maps there is no difference, but with make you can pass a second parameter to pre-allocate space in the map. This saves a lot of reallocations while the map is being populated.
Benchmarks
package maps
import "testing"
const SIZE = 10000
func fill(m map[int]bool, size int) {
for i := 0; i < size; i++ {
m[i] = true
}
}
func BenchmarkEmpty(b *testing.B) {
for n := 0; n < b.N; n++ {
m := make(map[int]bool)
fill(m, SIZE)
}
}
func BenchmarkAllocated(b *testing.B) {
for n := 0; n < b.N; n++ {
m := make(map[int]bool, 2*SIZE)
fill(m, SIZE)
}
}
Results
go test -benchmem -bench .
BenchmarkEmpty-8 500 2988680 ns/op 431848 B/op 625 allocs/op
BenchmarkAllocated-8 1000 1618251 ns/op 360949 B/op 11 allocs/op
A year ago I stumbled on the fact that using make with explicitly allocated space is better than using a map literal if your values are not static.
So doing
return map[string]float64{
"key1": SOME_COMPUTED_ABOVE_VALUE,
"key2": SOME_COMPUTED_ABOVE_VALUE,
// more keys here
"keyN": SOME_COMPUTED_ABOVE_VALUE,
}
is slower than
// some code above
result := make(map[string]float64, SIZE) // SIZE >= N
result["key1"] = SOME_COMPUTED_ABOVE_VALUE
result["key2"] = SOME_COMPUTED_ABOVE_VALUE
// more keys here
result["keyN"] = SOME_COMPUTED_ABOVE_VALUE
return result
for fairly large N (N=300 in my use case).
The reason is that the compiler does not recognize that it needs to allocate at least N slots in the first case.
I wrote a blog post about it
https://trams.github.io/golang-map-literal-performance/
and I reported a bug to the community
https://github.com/golang/go/issues/43020
As of golang 1.17 it is still an issue.

Minimum value of set in idiomatic Go

How do I write a function that returns the minimum value of a set in go? I am not just looking for a solution (I know I could just initialize the min value when iterating over the first element and then set a boolean variable that I initialized the min value) but rather an idiomatic solution. Since go doesn't have native sets, assume we have a map[Cell]bool.
Maps are the idiomatic way to implement sets in Go. Idiomatic code uses either bool or struct{} as the map's value type. The latter uses less storage, but requires a little more typing at the keyboard to use.
Assuming that the maximum value for a cell is maxCell, then this function will compute the min:
func min(m map[Cell]bool) Cell {
min := maxCell
for k := range m {
if k < min {
min = k
}
}
return min
}
If Cell is a numeric type, then maxCell can be set to one of the math constants.
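As a sketch of that suggestion, assuming Cell is defined as uint64 (the question does not say what Cell is): maxCell comes from math.MaxUint64, and the hypothetical minCell also returns a flag so an empty set can be told apart from a set whose minimum happens to equal maxCell.

```go
package main

import (
	"fmt"
	"math"
)

// Cell is assumed to be an unsigned 64-bit integer for this sketch.
type Cell uint64

// maxCell is the largest value a Cell can hold.
const maxCell = Cell(math.MaxUint64)

// minCell (hypothetical name) returns the smallest key in the set.
// The second result is false when the set is empty.
func minCell(m map[Cell]bool) (Cell, bool) {
	lowest := maxCell
	for k := range m {
		if k < lowest {
			lowest = k
		}
	}
	return lowest, len(m) > 0
}

func main() {
	set := map[Cell]bool{7: true, 3: true, 42: true}
	fmt.Println(minCell(set)) // 3 true
}
```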
Any solution using a map will require a loop over the keys.
You can keep a heap in addition to the map to find a minimum. This will require more storage and code, but can be more efficient depending on the size of the set and how often the minimum function is called.
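A rough sketch of the map-plus-heap idea using container/heap, assuming Cell is uint64. The type HeapSet and its methods are invented for illustration, and only insertion and minimum lookup are covered; removing elements would need extra bookkeeping that is left out here.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Cell is assumed to be an unsigned 64-bit integer for this sketch.
type Cell uint64

// cellHeap is a min-heap of cells; it implements heap.Interface.
type cellHeap []Cell

func (h cellHeap) Len() int            { return len(h) }
func (h cellHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h cellHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *cellHeap) Push(x interface{}) { *h = append(*h, x.(Cell)) }
func (h *cellHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// HeapSet (hypothetical type) pairs a map for membership checks with a
// heap that keeps the minimum at index 0.
type HeapSet struct {
	members map[Cell]bool
	order   cellHeap
}

func NewHeapSet() *HeapSet {
	return &HeapSet{members: make(map[Cell]bool)}
}

// Add inserts c unless it is already present.
func (s *HeapSet) Add(c Cell) {
	if s.members[c] {
		return
	}
	s.members[c] = true
	heap.Push(&s.order, c)
}

// Min returns the smallest cell; the second result is false for an empty set.
func (s *HeapSet) Min() (Cell, bool) {
	if len(s.order) == 0 {
		return 0, false
	}
	return s.order[0], true
}

func main() {
	s := NewHeapSet()
	for _, c := range []Cell{5, 3, 9, 3} {
		s.Add(c)
	}
	fmt.Println(s.Min()) // 3 true
}
```

With this layout Min is a constant-time read, while Add costs a map insert plus a logarithmic heap push.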
A different approach: depending on how big your set is, using a self-sorting slice can be more efficient:
type Cell uint64
type CellSet struct {
cells []Cell
}
func (cs *CellSet) Len() int {
return len(cs.cells)
}
func (cs *CellSet) Swap(i, j int) {
cs.cells[i], cs.cells[j] = cs.cells[j], cs.cells[i]
}
func (cs *CellSet) Less(i, j int) bool {
return cs.cells[i] < cs.cells[j]
}
func (cs *CellSet) Add(c Cell) {
for _, v := range cs.cells {
if v == c {
return
}
}
cs.cells = append(cs.cells, c)
sort.Sort(cs)
}
func (cs *CellSet) Min() Cell {
if cs.Len() > 0 {
return cs.cells[0]
}
return 0
}
func (cs *CellSet) Max() Cell {
if l := cs.Len(); l > 0 {
return cs.cells[l-1]
}
return ^Cell(0)
}
playground // this is a test file, copy it to set_test.go and run go test -bench=. -benchmem -v
BenchmarkSlice 20 75385089 ns/op 104 B/op 0 allocs/op
BenchmarkMap 20 77541424 ns/op 158 B/op 0 allocs/op
BenchmarkSliceAndMin 20 77155563 ns/op 104 B/op 0 allocs/op
BenchmarkMapAndMin 1 1827782378 ns/op 2976 B/op 8 allocs/op

How to assign string to bytes array

I want to assign string to bytes array:
var arr [20]byte
str := "abc"
for k, v := range []byte(str) {
arr[k] = byte(v)
}
Is there another method?
Safe and simple:
[]byte("Here is a string....")
For converting from a string to a byte slice, string -> []byte:
[]byte(str)
For converting an array to a slice, [20]byte -> []byte:
arr[:]
For copying a string to an array, string -> [20]byte:
copy(arr[:], str)
Same as above, but explicitly converting the string to a slice first:
copy(arr[:], []byte(str))
The built-in copy function only copies to a slice, from a slice.
Arrays are "the underlying data", while slices are "a viewport into underlying data".
Using [:] makes an array qualify as a slice.
A string does not qualify as a slice that can be copied to, but it qualifies as a slice that can be copied from (strings are immutable).
If the string is too long, copy will only copy the part of the string that fits (and multi-byte runes may then be copied only partly, which will corrupt the last rune of the resulting string).
This code:
var arr [20]byte
copy(arr[:], "abc")
fmt.Printf("array: %v (%T)\n", arr, arr)
...gives the following output:
array: [97 98 99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] ([20]uint8)
I also made it available at the Go Playground
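The truncation case mentioned above can be sketched as follows. The helper copyTruncated is hypothetical; it relies on copy returning the number of bytes copied and on utf8.Valid to detect a rune that was cut in half:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// copyTruncated (hypothetical helper) copies as much of s as fits into dst.
// It returns the number of bytes copied and whether those bytes are still
// valid UTF-8, which is false when a multi-byte rune was split.
func copyTruncated(dst []byte, s string) (int, bool) {
	n := copy(dst, s)
	return n, utf8.Valid(dst[:n])
}

func main() {
	var arr [2]byte
	// "h\u00e9llo" is 6 bytes long: the rune U+00E9 takes 2 bytes in UTF-8.
	n, ok := copyTruncated(arr[:], "h\u00e9llo")
	fmt.Println(n, ok) // 2 false
}
```

Only the first byte of the two-byte rune fits, so the copied prefix is no longer valid UTF-8.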
For example,
package main
import "fmt"
func main() {
s := "abc"
var a [20]byte
copy(a[:], s)
fmt.Println("s:", []byte(s), "a:", a)
}
Output:
s: [97 98 99] a: [97 98 99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Piece of cake:
arr := []byte("That's all folks!!")
I think it's better..
package main
import "fmt"
func main() {
str := "abc"
mySlice := []byte(str)
fmt.Printf("%v -> '%s'", mySlice, mySlice)
}
Check here: http://play.golang.org/p/vpnAWHZZk7
Go, convert a string to a bytes slice
You need a fast way to convert a []string to a []byte, for use in situations such as storing text data in a random-access file or other data manipulation that requires the input to be of type []byte.
package main
func main() {
var s string
//...
b := []byte(s)
//...
}
This is useful with ioutil.WriteFile, which accepts a byte slice as its data parameter:
func WriteFile(filename string, data []byte, perm os.FileMode) error
Another example
package main
import (
"fmt"
"strings"
)
func main() {
stringSlice := []string{"hello", "world"}
stringByte := strings.Join(stringSlice, " ")
// Byte array value
fmt.Println([]byte(stringByte))
// Corresponding string value
fmt.Println(string([]byte(stringByte)))
}
Output:
[104 101 108 108 111 32 119 111 114 108 100]
hello world
Please check the link playground
Besides the methods mentioned above, you can also use a trick like this:
s := "hello"
b := *(*[]byte)(unsafe.Pointer((*reflect.SliceHeader)(unsafe.Pointer(&s))))
Go Play: http://play.golang.org/p/xASsiSpQmC
You should never use this :-)
I ended up creating array-specific functions to do this, much like the encoding/binary package has specific methods for each int type, for example binary.BigEndian.PutUint16([]byte, uint16).
func byte16PutString(s string) [16]byte {
var a [16]byte
if len(s) > 16 {
copy(a[:], s)
} else {
copy(a[16-len(s):], s)
}
return a
}
var b [16]byte
b = byte16PutString("abc")
fmt.Printf("%v\n", b)
Output:
[0 0 0 0 0 0 0 0 0 0 0 0 0 97 98 99]
Notice how I wanted padding on the left, not the right.
http://play.golang.org/p/7tNumnJaiN
Arrays are values... slices are more like pointers. That is, [n]type is not compatible with []type, as they are fundamentally two different things. You can get a slice that points to an array by using arr[:], which returns a slice that has arr as its backing storage.
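A small sketch of that difference; the function names viaArray and viaSlice are made up for illustration. Passing the array copies it, while passing arr[:] lets the callee modify the original storage:

```go
package main

import "fmt"

// viaArray receives a copy of the array, so the caller's array is untouched.
func viaArray(a [3]byte) {
	a[0] = 9
}

// viaSlice receives a slice that shares the caller's backing array.
func viaSlice(s []byte) {
	s[0] = 9
}

func main() {
	arr := [3]byte{1, 2, 3}

	viaArray(arr)
	fmt.Println(arr) // [1 2 3]

	viaSlice(arr[:])
	fmt.Println(arr) // [9 2 3]
}
```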
One way to convert a slice, for example a []byte, to an array such as [10]byte is to allocate the array, which you can do with var arr [10]byte (it is a value, so no make is needed), and then copy the data into it:
buf := make([]byte, 10)
var arr [10]byte
copy(arr[:], buf)
Essentially what a lot of other answers get wrong is that []type is NOT an array.
[n]T and []T are completely different things!
When using reflect []T is not of kind Array but of kind Slice and [n]T is of kind Array.
You also can't use map[[]byte]T but you can use map[[n]byte]T.
This can sometimes be cumbersome because a lot of functions operate for example on []byte whereas some functions return [n]byte (most notably the hash functions in crypto/*).
A sha256 hash for example is [32]byte and not []byte so when beginners try to write it to a file for example:
sum := sha256.Sum256(data)
w.Write(sum)
they will get an error. The correct way is to use
w.Write(sum[:])
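Both points, array keys in maps and slicing a hash before writing it, can be sketched together. The helper digest and the use of bytes.Buffer as the writer are assumptions made for this example:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// digest (hypothetical helper) returns the SHA-256 sum as a fixed-size array.
func digest(data []byte) [sha256.Size]byte {
	return sha256.Sum256(data)
}

func main() {
	sum := digest([]byte("abc"))

	// An array is comparable, so it can be a map key; a []byte cannot.
	seen := map[[sha256.Size]byte]bool{sum: true}
	fmt.Println(seen[digest([]byte("abc"))]) // true

	// Writers take a slice, so the array is sliced with [:].
	var buf bytes.Buffer
	buf.Write(sum[:])
	fmt.Println(buf.Len()) // 32
}
```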
However, what is it that you want? Just accessing the string bytewise? You can easily convert a string to []byte using:
bytes := []byte(str)
but this isn't an array, it's a slice. Also, byte != rune. In case you want to operate on "characters" you need to use rune... not byte.
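To illustrate the byte versus rune distinction, here is a short sketch; byteAndRuneCount is a made-up helper name. A string with one non-ASCII character has more bytes than runes:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount (hypothetical helper) returns the length of s in bytes
// and in runes (Unicode code points).
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "h\u00e9llo" // U+00E9 is encoded as 2 bytes in UTF-8
	fmt.Println(byteAndRuneCount(s))            // 6 5
	fmt.Println(len([]byte(s)), len([]rune(s))) // 6 5
}
```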
If someone is looking for a quick, unsafe conversion between strings and byte slices, consider the following comparison.
package demo_test
import (
"testing"
"unsafe"
)
var testStr = "hello world"
var testBytes = []byte("hello world")
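// Avoid copying the data. A string header has no capacity field, so the cap of
// the returned slice is garbage: never append to it or write through it.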
func UnsafeStrToBytes(s string) []byte {
return *(*[]byte)(unsafe.Pointer(&s))
}
// Avoid copying the data.
func UnsafeBytesToStr(b []byte) string {
return *(*string)(unsafe.Pointer(&b))
}
func Benchmark_UnsafeStrToBytes(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = UnsafeStrToBytes(testStr)
}
}
func Benchmark_SafeStrToBytes(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = []byte(testStr)
}
}
func Benchmark_UnSafeBytesToStr(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = UnsafeBytesToStr(testBytes)
}
}
func Benchmark_SafeBytesToStr(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = string(testBytes)
}
}
go test -v -bench="^Benchmark" -run=none
output
cpu: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
Benchmark_UnsafeStrToBytes
Benchmark_UnsafeStrToBytes-8 1000000000 0.2465 ns/op
Benchmark_SafeStrToBytes
Benchmark_SafeStrToBytes-8 289119562 4.181 ns/op
Benchmark_UnSafeBytesToStr
Benchmark_UnSafeBytesToStr-8 1000000000 0.2530 ns/op
Benchmark_SafeBytesToStr
Benchmark_SafeBytesToStr-8 342842938 3.623 ns/op
PASS
