Is there a more efficient way of multiplying byte arrays? - go

I developed a Go package, BESON, for doing operations on big numbers.
The Multiply operation code:
func Multiply(a []byte, b []byte) {
	ans := make([]byte, len(a)+len(b))
	bits := nbits(b)
	// Walk b from its most significant bit down to bit 0.
	for i := int(bits) - 1; i >= 0; i-- {
		byteNum := i >> 3     // index of the byte holding bit i
		bitNum := uint(i & 7) // position of bit i within that byte
		LeftShift(ans, 1, 0)  // ans <<= 1
		if b[byteNum]&(1<<bitNum) != 0 {
			Add(ans, a) // ans += a
		}
	}
	copy(a, ans)
}
My approach is shift-and-add: for every set bit of b, the accumulator is shifted and a is added in.
Is there a more efficient way to implement Multiply?
Edit
The BESON package represents a big number as a byte array. For example, it represents a 128-bit unsigned integer as a byte array of size 16. Therefore, multiplying two 128-bit unsigned integers is actually multiplying two byte arrays.
Example:
input: a, b
a = []byte{ 204, 19, 46, 255, 0, 0, 0, 0 }
b = []byte{ 117, 10, 68, 47, 0, 0, 0, 0 }
Multiply(a, b)
fmt.Println(a)
output: a (The result will write back to a)
[60 4 5 35 76 72 29 47]
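One direction worth trying (my own sketch, not from the thread): instead of looping bit by bit, multiply 64-bit limbs with math/bits.Mul64 and bits.Add64, which is roughly what the standard library's math/big does internally. mulLimbs below is a hypothetical name operating on little-endian uint64 limbs:

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulLimbs multiplies two little-endian uint64 limb slices with
// schoolbook multiplication; each bits.Mul64 call handles 64 bits
// at once instead of one bit per loop iteration.
func mulLimbs(a, b []uint64) []uint64 {
	res := make([]uint64, len(a)+len(b))
	for i, ai := range a {
		var carry uint64
		for j, bj := range b {
			hi, lo := bits.Mul64(ai, bj)
			lo, c1 := bits.Add64(lo, res[i+j], 0)
			lo, c2 := bits.Add64(lo, carry, 0)
			res[i+j] = lo
			carry = hi + c1 + c2
		}
		res[i+len(b)] = carry
	}
	return res
}

func main() {
	// (3<<64) * 5 = 15<<64, i.e. limbs {0, 15, 0}
	fmt.Println(mulLimbs([]uint64{0, 3}, []uint64{5})) // [0 15 0]
}
```

A []byte representation can be converted to limbs with encoding/binary; the win is that each inner-loop step does 64 bits of work rather than 1.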


Go code gives a different result locally than on the Go Playground

I am using Go to implement the algorithm described below:
There is an array in which only one number appears once and all the other numbers appear three times; find the number that appears only once.
My code is listed below:
package main

import (
"testing"
)
func findBySum(arr []int) int {
result := 0
sum := [32]int{}
for i := 0; i < 32; i++ {
for _, v := range arr {
sum[i] += (v >> uint(i)) & 0x1
}
sum[i] %= 3
sum[i] <<= uint(i)
result |= sum[i]
}
return result
}
func TestThree(t *testing.T) {
// except one number, all other numbers appear three times
a1 := []int{11, 222, 444, 444, 222, 11, 11, 17, -123, 222, -123, 444, -123} // unique number is 17
a2 := []int{11, 222, 444, 444, 222, 11, 11, -17, -123, 222, -123, 444, -123} // unique number is -17
t.Log(findBySum(a1))
t.Log(findBySum(a2))
}
However, I found that the result on my PC is wrong, while the same code running on https://play.golang.org/p/hEseLZVL617 is correct, and I do not know why.
When the unique number is positive, both results are right; but when the unique number is negative, the result on my PC is wrong while the result online is right.
I think it has something to do with the bit operations in my code, but I can't find the root cause.
I used IDEA 2019.1.1, and my Go version is listed below:
I don't know why the same code works fine online but not on my local PC. Can anyone help me analyze this? Thanks in advance!
The size of int is platform dependent: it may be 32-bit or 64-bit. On the Go Playground it's 32-bit; on your local machine it's 64-bit.
If we change your example to use int64 explicitly instead of int, the result is the same on the Go Playground too:
func findBySum(arr []int64) int64 {
result := int64(0)
sum := [32]int64{}
for i := int64(0); i < 32; i++ {
for _, v := range arr {
sum[i] += (v >> uint64(i)) & 0x1
}
sum[i] %= 3
sum[i] <<= uint(i)
result |= sum[i]
}
return result
}
func TestThree(t *testing.T) {
// except one number, all other numbers appear three times
a1 := []int64{11, 222, 444, 444, 222, 11, 11, 17, -123, 222, -123, 444, -123} // unique number is 17
a2 := []int64{11, 222, 444, 444, 222, 11, 11, -17, -123, 222, -123, 444, -123} // unique number is -17
t.Log(findBySum(a1))
t.Log(findBySum(a2))
}
You perform bitwise operations that assume a 32-bit integer size. To get correct results locally (where your architecture, and thus the size of int and uint, is 64-bit), change all ints to int32 and uints to uint32:
func findBySum(arr []int32) int32 {
result := int32(0)
sum := [32]int32{}
for i := int32(0); i < 32; i++ {
for _, v := range arr {
sum[i] += (v >> uint32(i)) & 0x1
}
sum[i] %= 3
sum[i] <<= uint(i)
result |= sum[i]
}
return result
}
func TestThree(t *testing.T) {
// except one number, all other numbers appear three times
a1 := []int32{11, 222, 444, 444, 222, 11, 11, 17, -123, 222, -123, 444, -123} // unique number is 17
a2 := []int32{11, 222, 444, 444, 222, 11, 11, -17, -123, 222, -123, 444, -123} // unique number is -17
t.Log(findBySum(a1))
t.Log(findBySum(a2))
}
Lesson: if you perform calculations whose results depend on the representation size, always be explicit and use fixed-size types such as int32, int64, uint32 and uint64.

Byte ordering of floats

I'm working on Tormenta (https://github.com/jpincas/tormenta), which is backed by BadgerDB (https://github.com/dgraph-io/badger). BadgerDB stores keys (slices of bytes) in byte order. I am creating keys containing floats which need to be stored in order so I can use Badger's key iteration properly. I don't have a solid CS background, so I'm a little out of my depth.
I encode the floats like this: binary.Write(buf, binary.BigEndian, myFloat). This works fine for positive floats - the key order is what you'd expect, but the byte ordering breaks down for negative floats.
As an aside, ints present the same problem, but I was able to fix that relatively easily by flipping the sign bit on the int with b[0] ^= 1 << 7 (where b is the []byte holding the result of encoding the int), and then flipping back when retrieving the key.
Although b[0] ^= 1 << 7 DOES also flip the sign bit on floats and thus places all the negative floats before the positive ones, the negative ones are incorrectly (backwards) ordered. It is necessary to flip the sign bit and reverse the order of the negative floats.
A similar question was asked on StackOverflow here: Sorting floating-point values using their byte-representation, and the solution was agreed to be:
XOR all positive numbers with 0x8000... and negative numbers with 0xffff.... This should flip the sign bit on both (so negative numbers go first), and then reverse the ordering on negative numbers.
However, that's way above my bit-flipping-skills level, so I was hoping a Go bit-ninja could help me translate that into some Go code.
Using math.Float64bits()
You could use math.Float64bits(), which returns a uint64 value having the same bytes / bits as the float64 value passed to it.
Once you have a uint64, performing bitwise operations on it is trivial:
f := 1.0 // Some float64 value
bits := math.Float64bits(f)
if f >= 0 {
bits ^= 0x8000000000000000
} else {
bits ^= 0xffffffffffffffff
}
Then serialize the bits value instead of the f float64 value, and you're done.
Let's see this in action. Let's create a wrapper type holding a float64 number and its bytes:
type num struct {
f float64
data [8]byte
}
Let's create a slice of these nums:
nums := []*num{
{f: 1.0},
{f: 2.0},
{f: 0.0},
{f: -1.0},
{f: -2.0},
{f: math.Pi},
}
Serializing them:
for _, n := range nums {
bits := math.Float64bits(n.f)
if n.f >= 0 {
bits ^= 0x8000000000000000
} else {
bits ^= 0xffffffffffffffff
}
if err := binary.Write(bytes.NewBuffer(n.data[:0]), binary.BigEndian, bits); err != nil {
panic(err)
}
}
This is how we can sort them byte-wise:
sort.Slice(nums, func(i int, j int) bool {
ni, nj := nums[i], nums[j]
for k := range ni.data {
if bi, bj := ni.data[k], nj.data[k]; bi < bj {
return true // We're certain it's less
} else if bi > bj {
return false // We're certain it's not less
} // We have to check the next byte
}
return false // If we got this far, they are equal (=> not less)
})
And now let's see the order after byte-wise sorting:
fmt.Println("Final order byte-wise:")
for _, n := range nums {
fmt.Printf("% .7f %3v\n", n.f, n.data)
}
Output will be (try it on the Go Playground):
Final order byte-wise:
-2.0000000 [ 63 255 255 255 255 255 255 255]
-1.0000000 [ 64 15 255 255 255 255 255 255]
0.0000000 [128 0 0 0 0 0 0 0]
1.0000000 [191 240 0 0 0 0 0 0]
2.0000000 [192 0 0 0 0 0 0 0]
3.1415927 [192 9 33 251 84 68 45 24]
Without math.Float64bits()
Another option would be to first serialize the float64 value, and then perform XOR operations on the bytes.
If the number is positive (or zero), XOR the first byte with 0x80, and the rest with 0x00, which is basically do nothing with them.
If the number is negative, XOR all the bytes with 0xff, which is basically a bitwise negation.
In action: the only part that is different is the serialization and XOR operations:
for _, n := range nums {
if err := binary.Write(bytes.NewBuffer(n.data[:0]), binary.BigEndian, n.f); err != nil {
panic(err)
}
if n.f >= 0 {
n.data[0] ^= 0x80
} else {
for i, b := range n.data {
n.data[i] = ^b
}
}
}
The rest is the same. The output will also be the same. Try this one on the Go Playground.
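For completeness (my own sketch; encodeSortable and decodeSortable are hypothetical names, not from the answer), the transform is reversible, which you'll need when reading keys back out of Badger:

```go
package main

import (
	"fmt"
	"math"
)

// encodeSortable maps a float64 to a uint64 whose unsigned ordering
// matches the float ordering (negative values sort first).
func encodeSortable(f float64) uint64 {
	bits := math.Float64bits(f)
	if f >= 0 {
		return bits ^ 0x8000000000000000
	}
	return bits ^ 0xffffffffffffffff
}

// decodeSortable reverses encodeSortable. After encoding, a set top
// bit means the original value was non-negative.
func decodeSortable(u uint64) float64 {
	if u&0x8000000000000000 != 0 {
		return math.Float64frombits(u ^ 0x8000000000000000)
	}
	return math.Float64frombits(u ^ 0xffffffffffffffff)
}

func main() {
	for _, f := range []float64{-2, -1, 0, 1, math.Pi} {
		fmt.Println(f, decodeSortable(encodeSortable(f)) == f)
	}
}
```

(Note that -0.0 encodes differently from 0.0; if that matters for your keys, normalize negative zero before encoding.)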

Matrix multiplication with goroutine drops performance

I am optimizing matrix multiplication via goroutines in Go.
My benchmarks show that introducing concurrency per row or per element drops performance substantially:
goos: darwin
goarch: amd64
BenchmarkMatrixDotNaive/A.MultNaive-8 2000000 869 ns/op 0 B/op 0 allocs/op
BenchmarkMatrixDotNaive/A.ParalMultNaivePerRow-8 100000 14467 ns/op 80 B/op 9 allocs/op
BenchmarkMatrixDotNaive/A.ParalMultNaivePerElem-8 20000 77299 ns/op 528 B/op 65 allocs/op
I know some basics of cache locality, so it makes sense that per-element concurrency drops performance. However, why does per-row concurrency still drop performance, even in the naive version?
In fact, I also wrote a block/tiling optimization whose vanilla version (without goroutine concurrency) is even worse than the naive version (not presented here; let's focus on naive first).
What did I do wrong here? Why? How can I optimize it?
Multiplication:
package naive
import (
"errors"
"sync"
)
// Errors
var (
ErrNumElements = errors.New("Error number of elements")
ErrMatrixSize = errors.New("Error size of matrix")
)
// Matrix is a 2d array
type Matrix struct {
N int
data [][]float64
}
// New a size by size matrix
func New(size int) func(...float64) (*Matrix, error) {
wg := sync.WaitGroup{}
d := make([][]float64, size)
for i := range d {
wg.Add(1)
go func(i int) {
defer wg.Done()
d[i] = make([]float64, size)
}(i)
}
wg.Wait()
m := &Matrix{N: size, data: d}
return func(es ...float64) (*Matrix, error) {
if len(es) != size*size {
return nil, ErrNumElements
}
for i := range es {
wg.Add(1)
go func(i int) {
defer wg.Done()
m.data[i/size][i%size] = es[i]
}(i)
}
wg.Wait()
return m, nil
}
}
// At access element (i, j)
func (A *Matrix) At(i, j int) float64 {
return A.data[i][j]
}
// Set set element (i, j) with val
func (A *Matrix) Set(i, j int, val float64) {
A.data[i][j] = val
}
// MultNaive matrix multiplication O(n^3)
func (A *Matrix) MultNaive(B, C *Matrix) (err error) {
var (
i, j, k int
sum float64
N = A.N
)
if N != B.N || N != C.N {
return ErrMatrixSize
}
for i = 0; i < N; i++ {
for j = 0; j < N; j++ {
sum = 0.0
for k = 0; k < N; k++ {
sum += A.At(i, k) * B.At(k, j)
}
C.Set(i, j, sum)
}
}
return
}
// ParalMultNaivePerRow matrix multiplication O(n^3) in concurrency per row
func (A *Matrix) ParalMultNaivePerRow(B, C *Matrix) (err error) {
var N = A.N
if N != B.N || N != C.N {
return ErrMatrixSize
}
wg := sync.WaitGroup{}
for i := 0; i < N; i++ {
wg.Add(1)
go func(i int) {
defer wg.Done()
for j := 0; j < N; j++ {
sum := 0.0
for k := 0; k < N; k++ {
sum += A.At(i, k) * B.At(k, j)
}
C.Set(i, j, sum)
}
}(i)
}
wg.Wait()
return
}
// ParalMultNaivePerElem matrix multiplication O(n^3) in concurrency per element
func (A *Matrix) ParalMultNaivePerElem(B, C *Matrix) (err error) {
var N = A.N
if N != B.N || N != C.N {
return ErrMatrixSize
}
wg := sync.WaitGroup{}
for i := 0; i < N; i++ {
for j := 0; j < N; j++ {
wg.Add(1)
go func(i, j int) {
defer wg.Done()
sum := 0.0
for k := 0; k < N; k++ {
sum += A.At(i, k) * B.At(k, j)
}
C.Set(i, j, sum)
}(i, j)
}
}
wg.Wait()
return
}
Benchmark:
package naive
import (
"os"
"runtime/trace"
"testing"
)
type Dot func(B, C *Matrix) error
var (
A = &Matrix{
N: 8,
data: [][]float64{
[]float64{1, 2, 3, 4, 5, 6, 7, 8},
[]float64{9, 1, 2, 3, 4, 5, 6, 7},
[]float64{8, 9, 1, 2, 3, 4, 5, 6},
[]float64{7, 8, 9, 1, 2, 3, 4, 5},
[]float64{6, 7, 8, 9, 1, 2, 3, 4},
[]float64{5, 6, 7, 8, 9, 1, 2, 3},
[]float64{4, 5, 6, 7, 8, 9, 1, 2},
[]float64{3, 4, 5, 6, 7, 8, 9, 0},
},
}
B = &Matrix{
N: 8,
data: [][]float64{
[]float64{9, 8, 7, 6, 5, 4, 3, 2},
[]float64{1, 9, 8, 7, 6, 5, 4, 3},
[]float64{2, 1, 9, 8, 7, 6, 5, 4},
[]float64{3, 2, 1, 9, 8, 7, 6, 5},
[]float64{4, 3, 2, 1, 9, 8, 7, 6},
[]float64{5, 4, 3, 2, 1, 9, 8, 7},
[]float64{6, 5, 4, 3, 2, 1, 9, 8},
[]float64{7, 6, 5, 4, 3, 2, 1, 0},
},
}
C = &Matrix{
N: 8,
data: [][]float64{
[]float64{0, 0, 0, 0, 0, 0, 0, 0},
[]float64{0, 0, 0, 0, 0, 0, 0, 0},
[]float64{0, 0, 0, 0, 0, 0, 0, 0},
[]float64{0, 0, 0, 0, 0, 0, 0, 0},
[]float64{0, 0, 0, 0, 0, 0, 0, 0},
[]float64{0, 0, 0, 0, 0, 0, 0, 0},
[]float64{0, 0, 0, 0, 0, 0, 0, 0},
[]float64{0, 0, 0, 0, 0, 0, 0, 0},
},
}
)
func BenchmarkMatrixDotNaive(b *testing.B) {
f, _ := os.Create("bench.trace")
defer f.Close()
trace.Start(f)
defer trace.Stop()
tests := []struct {
name string
f Dot
}{
{
name: "A.MultNaive",
f: A.MultNaive,
},
{
name: "A.ParalMultNaivePerRow",
f: A.ParalMultNaivePerRow,
},
{
name: "A.ParalMultNaivePerElem",
f: A.ParalMultNaivePerElem,
},
}
for _, tt := range tests {
b.Run(tt.name, func(b *testing.B) {
for i := 0; i < b.N; i++ {
tt.f(B, C)
}
})
}
}
Performing an 8x8 matrix multiplication is relatively little work.
Goroutines (although they may be lightweight) do have overhead. If the work they do is "small", the overhead of launching, synchronizing and throwing them away may outweigh the performance gain of utilizing multiple cores / threads, and overall you might not gain performance by executing such small tasks concurrently (hell, you may even do worse than without using goroutines). Measure.
If we increase the matrix size to 80x80, running the benchmark we already see some performance gain in case of ParalMultNaivePerRow:
BenchmarkMatrixDotNaive/A.MultNaive-4 2000 1054775 ns/op
BenchmarkMatrixDotNaive/A.ParalMultNaivePerRow-4 2000 709367 ns/op
BenchmarkMatrixDotNaive/A.ParalMultNaivePerElem-4 100 10224927 ns/op
(As you see in the results, I have 4 CPU cores, running it on your 8-core machine might show more performance gain.)
When rows are small, you are using goroutines to do minimal work. You may improve performance by not "throwing away" goroutines once they're done with their "tiny" work, but "reusing" them instead. See this related question: Is this an idiomatic worker thread pool in Go?
Also see related / possible duplicate: Vectorise a function taking advantage of concurrency
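As a sketch of that worker-pool idea (my own code, using plain [][]float64 rather than the Matrix type above): distribute rows over a fixed number of workers so goroutines are reused instead of launched per row:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// multRows multiplies N×N matrices a and b into c, feeding row
// indices to a fixed pool of workers over a channel.
func multRows(a, b, c [][]float64) {
	n := len(a)
	rows := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range rows { // each worker handles many rows
				for j := 0; j < n; j++ {
					sum := 0.0
					for k := 0; k < n; k++ {
						sum += a[i][k] * b[k][j]
					}
					c[i][j] = sum
				}
			}
		}()
	}
	for i := 0; i < n; i++ {
		rows <- i
	}
	close(rows)
	wg.Wait()
}

func main() {
	a := [][]float64{{1, 2}, {3, 4}}
	b := [][]float64{{5, 6}, {7, 8}}
	c := [][]float64{{0, 0}, {0, 0}}
	multRows(a, b, c)
	fmt.Println(c) // [[19 22] [43 50]]
}
```

For matrices this small the pool still loses to the plain loop; the structure only pays off once each row is substantial work.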

Go: search for the xth digit in a sequence of numbers - why does it take a very long time to execute?

I tried to make a small program that finds the xth digit in a sequence of numbers. For example: I want to find the 89th digit of 1 - 1000000000.
Here is my code : https://play.golang.org/p/93yh_urX16
package main
import (
"fmt"
"strconv"
)
var bucket string

func main() {
	findDigits(89, 1000000000)
}

func findDigits(digits int, length int) {
	for i := 1; i <= length; i++ {
		bucket += strconv.Itoa(i)
	}
	fmt.Println("The", digits, "th digit from 1", "-", length, "is :", string([]rune(bucket)[digits-1]))
}
Does anyone know what mistakes I've made? I need some advice on improving this code.
Thanks :)
Your program is very, very inefficient. user1431317's program is very inefficient.
Simply calculate the value. It will only take nanoseconds of CPU time and a few memory allocations, even for a digit index as large as 9,223,372,036,854,775,807 (95.6 nanoseconds and 2 allocations on my computer). For example,
package main
import (
"fmt"
"math"
"strconv"
)
// digit returns the ith digit from the sequence of
// concatenated non-negative integers.
// The sequence of digits is 01234567891011121314151617181920...
func digit(i int64) string {
// There are 9 one digit positive integers, 90 two digit,
// 900 three digit, and so on.
if i <= 0 {
return "0"
}
j := int64(1)
w := 1
for ; ; w++ {
t := j + 9*int64(math.Pow10(w-1))*int64(w)
if 0 > t || t > i {
break
}
j = t
}
k := i - j
n := k / int64(w)
m := k % int64(w)
d := strconv.FormatInt(int64(math.Pow10(w-1))+n, 10)[m]
return string(d)
}
func main() {
tests := []int64{
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13,
88, 89,
188, 189, 190, 191, 192,
math.MaxInt32, math.MaxInt64,
}
for _, n := range tests {
fmt.Println(n, digit(n))
}
}
Output:
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 1
11 0
12 1
13 1
88 4
89 9
188 9
189 9
190 1
191 0
192 0
2147483647 2
9223372036854775807 9
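To illustrate the block counting the answer relies on (my own illustration, using the question's 1-based sequence 123456789101112...), each digit width covers a fixed range of positions:

```go
package main

import "fmt"

func main() {
	start := 1 // first sequence position covered by the current width
	count := 9 // how many numbers have this width (9, 90, 900, ...)
	for w := 1; w <= 4; w++ {
		fmt.Printf("width %d: positions %d..%d\n", w, start, start+count*w-1)
		start += count * w
		count *= 10
	}
	// The 89th position falls in the width-2 block: offset 89-10 = 79,
	// number 10+79/2 = 49, digit index 79%2 = 1, so the digit is '9'.
}
```

Skipping whole blocks this way is why the answer needs only a handful of iterations where the original program builds a gigabyte-scale string.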

Why does range iterating with the _ blank identifier produce different values?

I'm learning Go and having a great time so far.
The following code outputs the sum as 45:
package main
import "fmt"
func main(){
//declare a slice
numSlice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
var sum int = 0
for num := range numSlice {
sum += num
fmt.Println("num =", num)
}
fmt.Println("sum =", sum)
}
The following code, where I use the blank identifier _ to ignore the index in the for declaration, outputs the sum as 55:
//declare a slice
numSlice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
var sum int = 0
for _,num := range numSlice {
sum += num
fmt.Println("num =", num)
}
fmt.Println("sum =", sum)
This has got me slightly stumped. From my understanding, the blank identifier is used to ignore the slice index. But it also seems to be shifting the values and thereby ignoring the last element in the slice.
Can you please explain what's happening here, and why? I'm assuming this is not a bug and is by design. Go is so well designed, so what would the possible use cases be for this kind of behaviour?
The single-parameter form of range iterates over indexes, not values. Because your indexes also happen to go up from 0 to 9, using range with a single parameter adds up the indexes 0 through 9 and gives you 45.
package main
import "fmt"
func main(){
//declare a slice
numSlice := []int{0, 0, 0, 0}
var sum int = 0
for num := range numSlice {
sum += num
fmt.Println("num =", num)
}
fmt.Println("sum =", sum)
}
Output
num = 0
num = 1
num = 2
num = 3
sum = 6
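The fix follows directly: use the two-value form of range when you want the elements. A minimal comparison (my own example):

```go
package main

import "fmt"

func main() {
	numSlice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	idxSum, valSum := 0, 0
	for i, v := range numSlice { // two-value form: index and element
		idxSum += i
		valSum += v
	}
	fmt.Println(idxSum, valSum) // 45 55
}
```

So the original loop wasn't "shifting" or dropping the last element; it was summing indexes 0..9 instead of values 1..10.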
