I want to shuffle db ids so that none of the ids refer to themselves, but with this piece of code:
package main

import (
    "log"
    "math/rand"
    "time"
)

func main() {
    seed := time.Now().UnixNano() & 999999999
    log.Print("seed: ", seed)
    rand.Seed(seed)
    ordered := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}
    randomized := shufflePreventCollision(ordered)
    log.Print("Final Result")
    log.Print("ordered: ", ordered)
    log.Print("random: ", randomized)
}

func shufflePreventCollision(ordered []int) []int {
    randomized := rand.Perm(len(ordered))
    for i, o := range ordered {
        if o == randomized[i] {
            log.Printf("Doing it again because ordered[%d] (%d) is == randomized[%d] (%d)", i, o, i, randomized[i])
            log.Print(ordered)
            log.Print(randomized)
            shufflePreventCollision(ordered)
        }
    }
    return randomized
}
I see a strange behaviour: when it is run many times, at some point it hangs and cannot find a non-colliding sequence anymore. I tried
go build -o rand_example3 rand_example3.go && time (for i in $(seq 10000) ; do ./rand_example3 ; done)
and it seems to never end. Am I missing some understanding here, or is there really something fishy with math/rand?
"With this piece of code, I want shuffle db ids so that none of the
ids refer to themselves. [Sometimes] it doesn't end even when I let it
run for an hour or so."
tl;dr There is a solution that is more than a thousand times faster.
Your code:
slaxor.go:
package main

import (
    "log"
    "math/rand"
    "time"
)

func main() {
    seed := time.Now().UnixNano() & 999999999
    log.Print("seed: ", seed)
    rand.Seed(seed)
    ordered := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}
    randomized := shufflePreventCollisionSlaxor(ordered)
    log.Print("Final Result")
    log.Print("ordered: ", ordered)
    log.Print("random: ", randomized)
}

func shufflePreventCollisionSlaxor(ordered []int) []int {
    randomized := rand.Perm(len(ordered))
    for i, o := range ordered {
        if o == randomized[i] {
            log.Printf("Doing it again because ordered[%d] (%d) is == randomized[%d] (%d)", i, o, i, randomized[i])
            log.Print(ordered)
            log.Print(randomized)
            shufflePreventCollisionSlaxor(ordered)
        }
    }
    return randomized
}
Playground: https://play.golang.org/p/JI5rJGcAAz
It is not obvious how, or even if, the code fulfills its purpose: when a collision is found, the recursive call's result is discarded, so the original randomized slice is returned even if it still contains collisions.
The termination condition is probabilistic, not deterministic.
Let's leave aside the issue of whether the code fulfills its purpose.
This benchmark has been modified so that stderr is limited by the speed of the sink /dev/null, not a terminal.
slaxor.bash:
go build -o slaxor slaxor.go && time (for i in $(seq 10000) ; do ./slaxor 2> /dev/null ; done)
The benchmark measures an entire program execution for each single execution of the algorithm. The benchmark times are inconsistent because the pseudorandom seed value changes on each program execution. The benchmark sometimes "doesn't end even when I let it run for an hour or so."
There is a faster solution that runs and terminates in a few seconds, despite the program execution overhead.
peterso.bash:
go build -o peterso peterso.go && time (for i in $(seq 10000) ; do ./peterso 2> /dev/null ; done)
Output:
$ ./peterso.bash
real 0m5.290s
user 0m5.224s
sys 0m1.128s
$ ./peterso.bash
real 0m7.462s
user 0m7.109s
sys 0m1.922s
peterso.go:
package main

import (
    "fmt"
    "log"
    "math/rand"
    "time"
)

func main() {
    seed := time.Now().UnixNano() & 999999999
    log.Print("seed: ", seed)
    r = rand.New(rand.NewSource(seed))
    ordered := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}
    randomized := shufflePreventCollisionPeterSO(ordered)
    log.Print("Final Result")
    log.Print("ordered: ", ordered)
    log.Print("random: ", randomized)
    if randomized == nil {
        err := "Shuffle Error!"
        fmt.Print(err)
        log.Fatal(err)
    }
}

var r *rand.Rand

func isNoCollision(a, b []int) bool {
    if len(a) == len(b) {
        for i, ai := range a {
            if ai == b[i] {
                return false
            }
        }
        return true
    }
    return false
}

func shufflePreventCollisionPeterSO(ordered []int) []int {
    const guard = 4 * 1024 // deterministic, finite time
    for n := 1; n <= guard; n++ {
        randomized := r.Perm(len(ordered))
        if isNoCollision(ordered, randomized) {
            return randomized
        }
    }
    return nil
}
The guard provides a deterministic, finite-time termination condition.
Playground: https://play.golang.org/p/ZT-sfDW5Mi
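Why is a guard of 4 * 1024 trials more than enough? A uniform random permutation of n items has no fixed point (is a derangement) with probability approaching 1/e ≈ 0.37 as n grows, so the chance that all 4,096 independent trials fail is about (1 - 1/e)^4096, roughly 1e-800. A minimal sketch, separate from the solution above, that checks the 1/e figure empirically:

package main

import (
    "fmt"
    "math/rand"
)

// isDerangement reports whether perm has no fixed point,
// i.e. perm[i] != i for all i.
func isDerangement(perm []int) bool {
    for i, p := range perm {
        if p == i {
            return false
        }
    }
    return true
}

func main() {
    r := rand.New(rand.NewSource(1))
    const trials = 1000000
    count := 0
    for t := 0; t < trials; t++ {
        if isDerangement(r.Perm(20)) {
            count++
        }
    }
    // Expect roughly 1/e ≈ 0.3679 of random permutations to be
    // derangements, so a guard of 4 * 1024 retries is effectively
    // never exhausted.
    fmt.Printf("derangement fraction: %.4f\n", float64(count)/trials)
}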
Let's put aside the distorted whole-program benchmarks and look at function execution times. Because this is important, the Go standard library provides the testing package for testing and benchmarking functions.
Tests:
$ go test shuffle_test.go -v -count=1 -run=. -bench=!
=== RUN   TestTimeSlaxor
=== RUN   TestTimeSlaxor/1K
=== RUN   TestTimeSlaxor/2K
=== RUN   TestTimeSlaxor/3K
--- PASS: TestTimeSlaxor (13.78s)
    --- PASS: TestTimeSlaxor/1K (1.18s)
    --- PASS: TestTimeSlaxor/2K (1.27s)
    --- PASS: TestTimeSlaxor/3K (11.33s)
=== RUN   TestTimePeterSO
=== RUN   TestTimePeterSO/1K
=== RUN   TestTimePeterSO/2K
=== RUN   TestTimePeterSO/3K
=== RUN   TestTimePeterSO/1M
=== RUN   TestTimePeterSO/2M
=== RUN   TestTimePeterSO/3M
--- PASS: TestTimePeterSO (6.57s)
    --- PASS: TestTimePeterSO/1K (0.00s)
    --- PASS: TestTimePeterSO/2K (0.00s)
    --- PASS: TestTimePeterSO/3K (0.00s)
    --- PASS: TestTimePeterSO/1M (1.13s)
    --- PASS: TestTimePeterSO/2M (2.25s)
    --- PASS: TestTimePeterSO/3M (3.19s)
PASS
ok command-line-arguments 20.347s
$
In a fraction of the rapidly increasing time it takes to run 3K (3,000) iterations of shufflePreventCollisionSlaxor, 3M (3,000,000) iterations of shufflePreventCollisionPeterSO run: a more than thousand-fold improvement.
Benchmarks:
$ go test shuffle_test.go -v -count=1 -run=! -bench=.
goos: linux
goarch: amd64
BenchmarkTimePeterSO-8 1000000 1048 ns/op 434 B/op 2 allocs/op
BenchmarkTimeSlaxor-8 10000 2256271 ns/op 636894 B/op 3980 allocs/op
PASS
ok command-line-arguments 23.643s
$
It's easy to see that the shufflePreventCollisionPeterSO average per iteration cost of 1,000,000 iterations is small, 1,048 nanoseconds, especially when compared to only 10,000 iterations of shufflePreventCollisionSlaxor at 2,256,271 nanoseconds average per iteration.
Also, note the sparing use of memory by shufflePreventCollisionPeterSO, on average 2 allocations for a total allocation of 434 bytes per iteration, versus the profligate use of memory by shufflePreventCollisionSlaxor, on average 3,980 allocations for a total allocation of 636,894 bytes per iteration.
shuffle_test.go:
package main

import (
    "fmt"
    "math/rand"
    "strconv"
    "testing"
)

func shufflePreventCollisionSlaxor(ordered []int) []int {
    randomized := rand.Perm(len(ordered))
    for i, o := range ordered {
        if o == randomized[i] {
            shufflePreventCollisionSlaxor(ordered)
        }
    }
    return randomized
}

var r *rand.Rand

func isNoCollision(a, b []int) bool {
    if len(a) == len(b) {
        for i, ai := range a {
            if ai == b[i] {
                return false
            }
        }
        return true
    }
    return false
}

func shufflePreventCollisionPeterSO(ordered []int) []int {
    const guard = 4 * 1024 // deterministic, finite time
    for n := 1; n <= guard; n++ {
        randomized := r.Perm(len(ordered))
        if isNoCollision(ordered, randomized) {
            return randomized
        }
    }
    return nil
}

const testSeed = int64(60309766)

func testTime(t *testing.T, ordered, randomized []int, shuffle func([]int) []int) {
    shuffled := shuffle(ordered)
    want := fmt.Sprintf("%v", randomized)
    got := fmt.Sprintf("%v", shuffled)
    if want != got {
        t.Errorf("Error:\n from: %v\n want: %s\n got: %s\n", ordered, want, got)
    }
}

func testTimeSlaxor(t *testing.T, n int) {
    rand.Seed(testSeed)
    ordered := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}
    randomized := []int{3, 1, 17, 15, 10, 16, 14, 19, 7, 6, 11, 2, 0, 12, 8, 18, 13, 4, 9, 5}
    testTime(t, ordered, randomized, shufflePreventCollisionSlaxor)
    for i := 1; i < n; i++ {
        shufflePreventCollisionSlaxor(ordered)
    }
}

func TestTimeSlaxor(t *testing.T) {
    for k := 1; k <= 3; k++ {
        n := 1000 * k
        t.Run(strconv.Itoa(k)+"K", func(t *testing.T) { testTimeSlaxor(t, n) })
    }
}

func testTimePeterSo(t *testing.T, n int) {
    r = rand.New(rand.NewSource(testSeed))
    ordered := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}
    randomized := []int{10, 7, 15, 14, 8, 6, 18, 17, 19, 11, 5, 16, 2, 12, 1, 13, 3, 0, 9, 4}
    testTime(t, ordered, randomized, shufflePreventCollisionPeterSO)
    for i := 1; i < n; i++ {
        shufflePreventCollisionPeterSO(ordered)
    }
}

func TestTimePeterSO(t *testing.T) {
    for k := 1; k <= 3; k++ {
        n := 1000 * k
        t.Run(strconv.Itoa(k)+"K", func(t *testing.T) { testTimePeterSo(t, n) })
    }
    for m := 1; m <= 3; m++ {
        n := 1000 * 1000 * m
        t.Run(strconv.Itoa(m)+"M", func(t *testing.T) { testTimePeterSo(t, n) })
    }
}

func benchTime(b *testing.B, ordered, randomized []int, shuffle func([]int) []int) {
    shuffled := shuffle(ordered)
    want := fmt.Sprintf("%v", randomized)
    got := fmt.Sprintf("%v", shuffled)
    if want != got {
        b.Errorf("Error:\n from: %v\n want: %s\n got: %s\n", ordered, want, got)
    }
}

func BenchmarkTimePeterSO(b *testing.B) {
    b.ReportAllocs()
    r = rand.New(rand.NewSource(testSeed))
    ordered := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}
    randomized := []int{10, 7, 15, 14, 8, 6, 18, 17, 19, 11, 5, 16, 2, 12, 1, 13, 3, 0, 9, 4}
    benchTime(b, ordered, randomized, shufflePreventCollisionPeterSO)
    r = rand.New(rand.NewSource(testSeed))
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        shufflePreventCollisionPeterSO(ordered)
    }
}

func BenchmarkTimeSlaxor(b *testing.B) {
    b.ReportAllocs()
    r = rand.New(rand.NewSource(testSeed))
    ordered := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}
    randomized := []int{10, 7, 15, 14, 8, 6, 18, 17, 19, 11, 5, 16, 2, 12, 1, 13, 3, 0, 9, 4}
    benchTime(b, ordered, randomized, shufflePreventCollisionPeterSO)
    r = rand.New(rand.NewSource(testSeed))
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        shufflePreventCollisionSlaxor(ordered)
    }
}
Playground: https://play.golang.org/p/ozazWGNZsu
Related
Apologies if this is a "please do my homework for me" style of question, but I was looking for advice from the community on the performance of array operations in Go.
My problem space: I am using Go to perform astrophotographic image calibration. To perform the calibration, there is a requirement to calculate the average value of each pixel element across many data arrays.
For argument's sake, let's say each sub-exposure has a data array of type []float32.
What I want is a function that can perform stacking and averaging for any number of these data arrays. For example, a good master bias frame needs a good number of bias frame sub-exposures, and this could be in the region of 10 to 100, depending on how statistically thorough the end user needs to be.
Therefore, I have proposed that I would need a function set up like this:
func MeanFloat32Arrays(a [][]float32) ([]float32, error) {
    // calculate the mean across the array of arrays here
}
That is, I am happy for the user to call the function as follows:
m, err := MeanFloat32Arrays([][]float32{i.Dark.data, j.Bias.data, k.Bias.data, ... , z.Bias.data })
The construction of the argument is, I feel, just another detail we can gloss over for the moment.
My problem: How do I optimise the "mean stacking" process for this function? That is to say, how should I go about making MeanFloat32Arrays as performant as possible?
My initial code is as follows (which passes the test suite outlined below):
func MeanFloat32Arrays(a [][]float32) ([]float32, error) {
    if len(a) == 0 {
        return nil, errors.New("at least one input array is required")
    }
    m := make([]float32, len(a[0]))
    for i := range m {
        for j := range a {
            // Ensure that each sub-array has the same length as the first one:
            if len(a[j]) != len(a[0]) {
                return nil, fmt.Errorf("issue at array input %v: to compute the mean of multiple arrays the length of each array must be the same", j)
            }
            if a[j][i] == 0 {
                continue
            }
            m[i] += a[j][i]
        }
        m[i] /= float32(len(a))
    }
    return m, nil
}
My current unit test suite is as follows:
func TestMeanABC(t *testing.T) {
    a := []float32{10, 9, 8, 7, 6, 5, 4, 3, 2, 1}
    b := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
    c := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
    m, err := MeanFloat32Arrays([][]float32{a, b, c})
    if err != nil {
        t.Errorf("error should be nil, but got %v", err)
    }
    if len(m) != len(a) {
        t.Errorf("result should be of same length as a")
    }
    if m[0] != 4 {
        t.Errorf("result should be 4, but got %v", m[0])
    }
    if m[1] != 4.333333333333333 {
        t.Errorf("result should be 4.333333333333333, but got %v", m[1])
    }
    if m[2] != 4.666666666666667 {
        t.Errorf("result should be 4.666666666666667, but got %v", m[2])
    }
    if m[3] != 5 {
        t.Errorf("result should be 5, but got %v", m[3])
    }
    if m[4] != 5.333333333333333 {
        t.Errorf("result should be 5.333333333333333, but got %v", m[4])
    }
    if m[5] != 5.666666666666667 {
        t.Errorf("result should be 5.666666666666667, but got %v", m[5])
    }
    //... Assume here that the mean calculation is correct for all other elements
}

func TestMeanABCD(t *testing.T) {
    a := []float32{10, 9, 8, 7, 6, 5, 4, 3, 2, 1}
    b := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
    c := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
    d := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
    m, err := MeanFloat32Arrays([][]float32{a, b, c, d})
    if err != nil {
        t.Errorf("error should be nil, but got %v", err)
    }
    if len(m) != len(a) {
        t.Errorf("result should be of same length as a")
    }
    if m[0] != 3.25 {
        t.Errorf("result should be 3.25, but got %v", m[0])
    }
    if m[1] != 3.75 {
        t.Errorf("result should be 3.75, but got %v", m[1])
    }
    if m[2] != 4.25 {
        t.Errorf("result should be 4.25, but got %v", m[2])
    }
    if m[3] != 4.75 {
        t.Errorf("result should be 4.75, but got %v", m[3])
    }
    if m[4] != 5.25 {
        t.Errorf("result should be 5.25, but got %v", m[4])
    }
    if m[5] != 5.75 {
        t.Errorf("result should be 5.75, but got %v", m[5])
    }
    //... Assume here that the mean calculation is correct for all other elements
}

func TestMeanABNotEqualLengthPanic(t *testing.T) {
    a := []float32{2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
    b := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
    _, err := MeanFloat32Arrays([][]float32{a, b})
    if err == nil {
        t.Errorf("error should not be nil for two arrays of unequal length")
    }
}
So my question: what tricks could I apply to reduce the running time of this function as much as possible? Could I use multiple threads? Would that be thread safe? I'd welcome any comments, as I am still writing Go as a hobbyist...
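One direction worth measuring is to hoist the length validation out of the hot loop and then split the output indices across a handful of goroutines. This is a minimal sketch, not a tuned implementation: the function name and the chunking scheme are illustrative, and the zero-skip from the original is dropped, since adding zero does not change a sum. Each goroutine writes a disjoint range of m, so there is no shared mutable state to lock, which is what makes it thread safe.

package stack

import (
    "errors"
    "runtime"
    "sync"
)

// MeanFloat32ArraysParallel validates lengths once up front, then lets each
// goroutine average a disjoint range of output indices. Because no two
// goroutines write the same element of m, no locking is needed.
func MeanFloat32ArraysParallel(a [][]float32) ([]float32, error) {
    if len(a) == 0 {
        return nil, errors.New("at least one input array is required")
    }
    n := len(a[0])
    for _, row := range a {
        if len(row) != n {
            return nil, errors.New("all input arrays must have the same length")
        }
    }
    m := make([]float32, n)
    workers := runtime.NumCPU()
    chunk := (n + workers - 1) / workers
    var wg sync.WaitGroup
    for w := 0; w < workers; w++ {
        lo, hi := w*chunk, (w+1)*chunk
        if hi > n {
            hi = n
        }
        if lo >= hi {
            break
        }
        wg.Add(1)
        go func(lo, hi int) {
            defer wg.Done()
            for i := lo; i < hi; i++ {
                var sum float32
                for j := range a {
                    sum += a[j][i]
                }
                m[i] = sum / float32(len(a))
            }
        }(lo, hi)
    }
    wg.Wait()
    return m, nil
}

Whether this beats the sequential version depends on the array sizes; for small frames the goroutine overhead can dominate, so benchmark both.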
How do I iterate through a Go slice 4 items at a time?
Let's say I have [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15].
I want a for loop to be able to get:
[1,2,3,4] // first iteration
[5,6,7,8] // second iteration
[9,10,11,12] // third iteration
[13,14,15] // fourth iteration
I can do this in Java and Python, but for Go I really don't have an idea.
For example,
package main

import "fmt"

func main() {
    slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
    for i := 0; i < len(slice); i += 4 {
        var section []int
        if i > len(slice)-4 {
            section = slice[i:]
        } else {
            section = slice[i : i+4]
        }
        fmt.Println(section)
    }
}
Playground: https://play.golang.org/p/kf7_OJcP13t
Output:
[1 2 3 4]
[5 6 7 8]
[9 10 11 12]
[13 14 15]
How To Iterate on Slices in Go, Iterating 4 items at a time. I want a
for loop.
In Go, readability is paramount. First we read the normal path, then we read the exception/error paths.
We write the normal path.
n := 4
for i := 0; i < len(s); i += n {
    ss := s[i : i+n]
    fmt.Println(ss)
}
We use n for the stride value throughout.
We write a little tweak that doesn't disturb the normal path to handle an exception, the end of the slice.
if n > len(s)-i {
    n = len(s) - i
}
For example,
package main

import "fmt"

func main() {
    s := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
    n := 4
    for i := 0; i < len(s); i += n {
        if n > len(s)-i {
            n = len(s) - i
        }
        ss := s[i : i+n]
        fmt.Println(ss)
    }
}
Playground: https://play.golang.org/p/Vtpig2EeXB7
Output:
[1 2 3 4]
[5 6 7 8]
[9 10 11 12]
[13 14 15]
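If the chunking is needed in more than one place, the loop generalizes to a small helper. This is a sketch, not part of the answer above; the name chunks is illustrative, and it requires Go 1.18+ for generics (newer than the playground links in this thread). Each sub-slice shares the original backing array, so no elements are copied.

package main

import "fmt"

// chunks splits s into consecutive sub-slices of at most n elements.
// The sub-slices share s's backing array; no elements are copied.
func chunks[T any](s []T, n int) [][]T {
    var out [][]T
    for len(s) > n {
        out = append(out, s[:n])
        s = s[n:]
    }
    if len(s) > 0 {
        out = append(out, s)
    }
    return out
}

func main() {
    s := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
    for _, ss := range chunks(s, 4) {
        fmt.Println(ss) // same four groups as above
    }
}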
I am optimizing matrix multiplication via goroutines in Go.
My benchmark shows that introducing concurrency per row or per element drops performance significantly:
goos: darwin
goarch: amd64
BenchmarkMatrixDotNaive/A.MultNaive-8 2000000 869 ns/op 0 B/op 0 allocs/op
BenchmarkMatrixDotNaive/A.ParalMultNaivePerRow-8 100000 14467 ns/op 80 B/op 9 allocs/op
BenchmarkMatrixDotNaive/A.ParalMultNaivePerElem-8 20000 77299 ns/op 528 B/op 65 allocs/op
I have some basic knowledge of cache locality, so it makes sense that per-element concurrency drops performance. However, why does per-row concurrency still drop performance, even in the naive version?
In fact, I also wrote a block/tiling optimization whose vanilla version (without goroutine concurrency) is even worse than the naive version (not presented here; let's focus on the naive version first).
What did I do wrong here? Why? How can I optimize it?
Multiplication:
package naive

import (
    "errors"
    "sync"
)

// Errors
var (
    ErrNumElements = errors.New("Error number of elements")
    ErrMatrixSize  = errors.New("Error size of matrix")
)

// Matrix is a 2d array
type Matrix struct {
    N    int
    data [][]float64
}

// New a size by size matrix
func New(size int) func(...float64) (*Matrix, error) {
    wg := sync.WaitGroup{}
    d := make([][]float64, size)
    for i := range d {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            d[i] = make([]float64, size)
        }(i)
    }
    wg.Wait()
    m := &Matrix{N: size, data: d}
    return func(es ...float64) (*Matrix, error) {
        if len(es) != size*size {
            return nil, ErrNumElements
        }
        for i := range es {
            wg.Add(1)
            go func(i int) {
                defer wg.Done()
                m.data[i/size][i%size] = es[i]
            }(i)
        }
        wg.Wait()
        return m, nil
    }
}

// At access element (i, j)
func (A *Matrix) At(i, j int) float64 {
    return A.data[i][j]
}

// Set set element (i, j) with val
func (A *Matrix) Set(i, j int, val float64) {
    A.data[i][j] = val
}

// MultNaive matrix multiplication O(n^3)
func (A *Matrix) MultNaive(B, C *Matrix) (err error) {
    var (
        i, j, k int
        sum     float64
        N       = A.N
    )
    if N != B.N || N != C.N {
        return ErrMatrixSize
    }
    for i = 0; i < N; i++ {
        for j = 0; j < N; j++ {
            sum = 0.0
            for k = 0; k < N; k++ {
                sum += A.At(i, k) * B.At(k, j)
            }
            C.Set(i, j, sum)
        }
    }
    return
}

// ParalMultNaivePerRow matrix multiplication O(n^3) in concurrency per row
func (A *Matrix) ParalMultNaivePerRow(B, C *Matrix) (err error) {
    var N = A.N
    if N != B.N || N != C.N {
        return ErrMatrixSize
    }
    wg := sync.WaitGroup{}
    for i := 0; i < N; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            for j := 0; j < N; j++ {
                sum := 0.0
                for k := 0; k < N; k++ {
                    sum += A.At(i, k) * B.At(k, j)
                }
                C.Set(i, j, sum)
            }
        }(i)
    }
    wg.Wait()
    return
}

// ParalMultNaivePerElem matrix multiplication O(n^3) in concurrency per element
func (A *Matrix) ParalMultNaivePerElem(B, C *Matrix) (err error) {
    var N = A.N
    if N != B.N || N != C.N {
        return ErrMatrixSize
    }
    wg := sync.WaitGroup{}
    for i := 0; i < N; i++ {
        for j := 0; j < N; j++ {
            wg.Add(1)
            go func(i, j int) {
                defer wg.Done()
                sum := 0.0
                for k := 0; k < N; k++ {
                    sum += A.At(i, k) * B.At(k, j)
                }
                C.Set(i, j, sum)
            }(i, j)
        }
    }
    wg.Wait()
    return
}
Benchmark:
package naive

import (
    "os"
    "runtime/trace"
    "testing"
)

type Dot func(B, C *Matrix) error

var (
    A = &Matrix{
        N: 8,
        data: [][]float64{
            []float64{1, 2, 3, 4, 5, 6, 7, 8},
            []float64{9, 1, 2, 3, 4, 5, 6, 7},
            []float64{8, 9, 1, 2, 3, 4, 5, 6},
            []float64{7, 8, 9, 1, 2, 3, 4, 5},
            []float64{6, 7, 8, 9, 1, 2, 3, 4},
            []float64{5, 6, 7, 8, 9, 1, 2, 3},
            []float64{4, 5, 6, 7, 8, 9, 1, 2},
            []float64{3, 4, 5, 6, 7, 8, 9, 0},
        },
    }
    B = &Matrix{
        N: 8,
        data: [][]float64{
            []float64{9, 8, 7, 6, 5, 4, 3, 2},
            []float64{1, 9, 8, 7, 6, 5, 4, 3},
            []float64{2, 1, 9, 8, 7, 6, 5, 4},
            []float64{3, 2, 1, 9, 8, 7, 6, 5},
            []float64{4, 3, 2, 1, 9, 8, 7, 6},
            []float64{5, 4, 3, 2, 1, 9, 8, 7},
            []float64{6, 5, 4, 3, 2, 1, 9, 8},
            []float64{7, 6, 5, 4, 3, 2, 1, 0},
        },
    }
    C = &Matrix{
        N: 8,
        data: [][]float64{
            []float64{0, 0, 0, 0, 0, 0, 0, 0},
            []float64{0, 0, 0, 0, 0, 0, 0, 0},
            []float64{0, 0, 0, 0, 0, 0, 0, 0},
            []float64{0, 0, 0, 0, 0, 0, 0, 0},
            []float64{0, 0, 0, 0, 0, 0, 0, 0},
            []float64{0, 0, 0, 0, 0, 0, 0, 0},
            []float64{0, 0, 0, 0, 0, 0, 0, 0},
            []float64{0, 0, 0, 0, 0, 0, 0, 0},
        },
    }
)

func BenchmarkMatrixDotNaive(b *testing.B) {
    f, _ := os.Create("bench.trace")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()
    tests := []struct {
        name string
        f    Dot
    }{
        {
            name: "A.MultNaive",
            f:    A.MultNaive,
        },
        {
            name: "A.ParalMultNaivePerRow",
            f:    A.ParalMultNaivePerRow,
        },
        {
            name: "A.ParalMultNaivePerElem",
            f:    A.ParalMultNaivePerElem,
        },
    }
    for _, tt := range tests {
        b.Run(tt.name, func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                tt.f(B, C)
            }
        })
    }
}
Performing an 8x8 matrix multiplication is relatively small work.
Goroutines (although they may be lightweight) do have overhead. If the work they do is "small", the overhead of launching, synchronizing and throwing them away may outweigh the performance gain of utilizing multiple cores / threads, and overall you might not gain performance by executing such small tasks concurrently (hell, you may even do worse than without using goroutines). Measure.
If we increase the matrix size to 80x80 and run the benchmark, we already see some performance gain in the case of ParalMultNaivePerRow:
BenchmarkMatrixDotNaive/A.MultNaive-4 2000 1054775 ns/op
BenchmarkMatrixDotNaive/A.ParalMultNaivePerRow-4 2000 709367 ns/op
BenchmarkMatrixDotNaive/A.ParalMultNaivePerElem-4 100 10224927 ns/op
(As you can see in the results, I have 4 CPU cores; running it on your 8-core machine might show more performance gain.)
When rows are small, you are using goroutines to do minimal work. You may improve performance by not "throwing away" goroutines once they're done with their "tiny" work, but reusing them; a sketch of this follows below. See the related question: Is this an idiomatic worker thread pool in Go?
Also see related / possible duplicate: Vectorise a function taking advantage of concurrency
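As a concrete illustration of the reuse idea, here is a minimal sketch only, not code from the linked question: a per-row variant that feeds row indices to a fixed pool of workers. It assumes the Matrix type, the At/Set methods, and ErrMatrixSize from the question, placed in a new file of the same naive package; the method name is illustrative.

package naive

import (
    "runtime"
    "sync"
)

// ParalMultNaiveRowPool is like ParalMultNaivePerRow, but instead of one
// goroutine per row it sends row indices over a channel to a small, fixed
// pool of worker goroutines, so goroutines are reused rather than created
// and thrown away per row.
func (A *Matrix) ParalMultNaiveRowPool(B, C *Matrix) error {
    N := A.N
    if N != B.N || N != C.N {
        return ErrMatrixSize
    }
    rows := make(chan int, N)
    var wg sync.WaitGroup
    for w := 0; w < runtime.NumCPU(); w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for i := range rows { // each worker handles many rows
                for j := 0; j < N; j++ {
                    sum := 0.0
                    for k := 0; k < N; k++ {
                        sum += A.At(i, k) * B.At(k, j)
                    }
                    C.Set(i, j, sum)
                }
            }
        }()
    }
    for i := 0; i < N; i++ {
        rows <- i
    }
    close(rows)
    wg.Wait()
    return nil
}

Whether this beats the per-row version depends on the matrix size and core count; as above, measure.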
I posted a question with nearly the same code yesterday, asking how to make this concurrent across a variadic function. After it was resolved, I expected the program to run in nearly the same amount of time with one generator as with 30+. That doesn't seem to be the case.
The times I see are about 5ms with one generator, and about 150ms with what's in the code below. (For some reason, play.golang shows 0.)
Why is it slower? My expectation was that, with the multiple goroutines, it would take about as long. Is it something to do with spinning up the goroutines?
package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    t := time.Now()
    _ = fanIn(
        generator(4, 5, 6, 7),
        generator(1, 2, 6, 3, 7),
        generator(12, 15, 33, 40, 10),
        generator(18, 13, 20, 40, 15),
        generator(100, 200, 64000, 3121, 1237),
        generator(4, 5, 6, 7),
        generator(1, 2, 6, 3, 7),
        generator(12, 15, 33, 40, 10),
        generator(18, 13, 20, 40, 15),
        generator(100, 200, 64000, 3121, 1237),
        generator(4, 5, 6, 7),
        generator(1, 2, 6, 3, 7),
        generator(12, 15, 33, 40, 10),
        generator(18, 13, 20, 40, 15),
        generator(100, 200, 64000, 3121, 1237),
        generator(4, 5, 6, 7),
        generator(1, 2, 6, 3, 7),
        generator(12, 15, 33, 40, 10),
        generator(18, 13, 20, 40, 15),
        generator(100, 200, 64000, 3121, 1237),
        generator(4, 5, 6, 7),
        generator(1, 2, 6, 3, 7),
        generator(12, 15, 33, 40, 10),
        generator(18, 13, 20, 40, 15),
        generator(100, 200, 64000, 3121, 1237),
        generator(4, 5, 6, 7),
        generator(1, 2, 6, 3, 7),
        generator(12, 15, 33, 40, 10),
        generator(18, 13, 20, 40, 15),
        generator(100, 200, 64000, 3121, 1237),
        generator(4, 5, 6, 7),
        generator(1, 2, 6, 3, 7),
        generator(12, 15, 33, 40, 10),
        generator(18, 13, 20, 40, 15),
        generator(100, 200, 64000, 3121, 1237),
    )
    fmt.Println(time.Now().Sub(t))
}

func generator(nums ...int) <-chan int {
    out := make(chan int, 10)
    go func() {
        defer close(out)
        for _, v := range nums {
            out <- v
        }
    }()
    return out
}

func fanIn(in ...<-chan int) <-chan int {
    var wg sync.WaitGroup
    out := make(chan int, 10)
    wg.Add(len(in))
    go func() {
        for _, v := range in {
            go func(ch <-chan int) {
                defer wg.Done()
                for val := range ch {
                    out <- val
                }
            }(v)
        }
    }()
    go func() {
        wg.Wait()
        close(out)
    }()
    return out
}
There is a little difference between go run and go build (compile time): for me it is 17ms (on 2 cores) and 3ms (on 8 cores) with go1.7 amd64.
Difference between go run and go build:
951.0543ms - 934.0535ms = 17.0008ms (on 2 cores)
575.3447ms - 572.3914ms = 2.9533ms (on 8 cores)
Difference between 8 cores and 2 cores with go build:
934.0535ms - 572.3914ms = 361.6621ms
For good benchmark statistics, use a large number of samples.
Try updating to the latest Go version (1.7).
Try this working sample code, and compare your results with these outputs:
package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"
)

func main() {
    t := time.Now()
    cs := make([]<-chan int, 1000)
    for i := 0; i < len(cs); i++ {
        cs[i] = generator(rand.Perm(10000)...)
    }
    ch := fanIn(cs...)
    fmt.Println(time.Now().Sub(t))
    is := make([]int, 0, len(ch))
    for v := range ch {
        is = append(is, v)
    }
    fmt.Println("len=", len(is))
}

func generator(nums ...int) <-chan int {
    out := make(chan int, len(nums))
    go func() {
        defer close(out)
        for _, v := range nums {
            out <- v
        }
    }()
    return out
}

func fanIn(in ...<-chan int) <-chan int {
    var wg sync.WaitGroup
    out := make(chan int, 10)
    wg.Add(len(in))
    go func() {
        for _, v := range in {
            go func(ch <-chan int) {
                defer wg.Done()
                for val := range ch {
                    out <- val
                }
            }(v)
        }
    }()
    go func() {
        wg.Wait()
        close(out)
    }()
    return out
}
Output with 2 cores (with go run):
951.0543ms
len= 10000000
Output with 2 cores (with go build):
934.0535ms
len= 10000000
Output with 8 cores (with go run):
575.3447ms
len= 10000000
Output with 8 cores (with go build):
572.3914ms
len= 10000000
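To take process startup and compile time out of the picture entirely, the testing package can time just the fan-in. A minimal sketch, assuming the generator and fanIn functions above live in the same package (the file and benchmark names are illustrative); run it with go test -bench=. and the framework picks a sample count b.N large enough for stable statistics:

package main

import "testing"

// BenchmarkFanIn measures one complete fan-in of a few small generators
// per iteration, draining the output so every goroutine finishes before
// the next iteration starts.
func BenchmarkFanIn(b *testing.B) {
    for i := 0; i < b.N; i++ {
        ch := fanIn(
            generator(4, 5, 6, 7),
            generator(1, 2, 6, 3, 7),
            generator(12, 15, 33, 40, 10),
        )
        for range ch {
            // drain the channel so the iteration's work completes
        }
    }
}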
I want to delete some elements from a slice, and https://github.com/golang/go/wiki/SliceTricks advises this slice manipulation:
a = append(a[:i], a[i+1:]...)
So I wrote the code below:
package main

import (
    "fmt"
)

func main() {
    slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
    for i, value := range slice {
        if value%3 == 0 { // remove 3, 6, 9
            slice = append(slice[:i], slice[i+1:]...)
        }
    }
    fmt.Printf("%v\n", slice)
}
with go run hello.go, it panics:
panic: runtime error: slice bounds out of range
goroutine 1 [running]:
panic(0x4ef680, 0xc082002040)
D:/Go/src/runtime/panic.go:464 +0x3f4
main.main()
E:/Code/go/test/slice.go:11 +0x395
exit status 2
How can I change this code to get it right?
I tried the following:
1st, with a goto statement:
func main() {
    slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
Label:
    for i, n := range slice {
        if n%3 == 0 {
            slice = append(slice[:i], slice[i+1:]...)
            goto Label
        }
    }
    fmt.Printf("%v\n", slice)
}
It works, but it does too many iterations.
2nd, use another slice sharing the same backing array:
func main() {
    slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
    dest := slice[:0]
    for _, n := range slice {
        if n%3 != 0 { // filter
            dest = append(dest, n)
        }
    }
    slice = dest
    fmt.Printf("%v\n", slice)
}
But I'm not sure whether this one is better or not.
3rd, from Remove elements in slice, with the len operator:
func main() {
    slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
    for i := 0; i < len(slice); i++ {
        if slice[i]%3 == 0 {
            slice = append(slice[:i], slice[i+1:]...)
            i-- // should I decrease index here?
        }
    }
    fmt.Printf("%v\n", slice)
}
Which one should I take now?
With a benchmark:
func BenchmarkRemoveSliceElementsBySlice(b *testing.B) {
    for i := 0; i < b.N; i++ {
        slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
        dest := slice[:0]
        for _, n := range slice {
            if n%3 != 0 {
                dest = append(dest, n)
            }
        }
    }
}

func BenchmarkRemoveSliceElementByLen(b *testing.B) {
    for i := 0; i < b.N; i++ {
        slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
        for i := 0; i < len(slice); i++ {
            if slice[i]%3 == 0 {
                slice = append(slice[:i], slice[i+1:]...)
            }
        }
    }
}
$ go test -v -bench=".*"
testing: warning: no tests to run
PASS
BenchmarkRemoveSliceElementsBySlice-4 50000000 26.6 ns/op
BenchmarkRemoveSliceElementByLen-4 50000000 32.0 ns/op
It seems that deleting all the elements in one pass is better.
Iterate over the slice, copying the elements that you want to keep. (The panic in the original code occurs because each removal shrinks the slice while the range loop keeps iterating over the original length; eventually i+1 exceeds len(slice) and slice[i+1:] is out of range.)
k := 0
for _, n := range slice {
    if n%3 != 0 { // filter
        slice[k] = n
        k++
    }
}
slice = slice[:k] // set slice len to remaining elements
The slice trick is useful in the case where a single element is deleted. If it's possible that more than one element will be deleted, then use the for loop above.
working playground example
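For completeness, a minimal example of the single-delete case the trick is designed for (the index and values are illustrative):

package main

import "fmt"

func main() {
    slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
    i := 4 // index of the single element to delete
    slice = append(slice[:i], slice[i+1:]...)
    fmt.Println(slice) // [1 2 3 4 6 7 8 9]
}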
While the above is a good answer for small slices:
package main

import "fmt"

func main() {
    slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
    k := 0
    for _, n := range slice {
        if n%3 != 0 { // filter
            slice[k] = n
            k++
        }
    }
    slice = slice[:k]
    fmt.Println(slice) // [1 2 4 5 7 8]
}
To minimize memory writes for the leading elements (for a big slice), you may use this:
package main

import "fmt"

func main() {
    slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
    k := 0
    for i, n := range slice {
        if n%3 != 0 { // filter
            if i != k {
                slice[k] = n
            }
            k++
        }
    }
    slice = slice[:k]
    fmt.Println(slice) // [1 2 4 5 7 8]
}
And if you need a new slice, or need to preserve the old slice:
package main

import "fmt"

func main() {
    slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
    s2 := make([]int, len(slice))
    k := 0
    for _, n := range slice {
        if n%3 != 0 { // filter
            s2[k] = n
            k++
        }
    }
    s2 = s2[:k]
    fmt.Println(s2) // [1 2 4 5 7 8]
}