I am optimizing matrix multiplication via goroutines in Go.
My benchmark shows that introducing concurrency per row or per element drastically reduces performance:
goos: darwin
goarch: amd64
BenchmarkMatrixDotNaive/A.MultNaive-8 2000000 869 ns/op 0 B/op 0 allocs/op
BenchmarkMatrixDotNaive/A.ParalMultNaivePerRow-8 100000 14467 ns/op 80 B/op 9 allocs/op
BenchmarkMatrixDotNaive/A.ParalMultNaivePerElem-8 20000 77299 ns/op 528 B/op 65 allocs/op
I have some basic knowledge of cache locality, so it makes sense that per-element concurrency hurts performance. However, why does per-row concurrency still hurt performance, even in the naive version?
In fact, I also wrote a block/tiling optimization, and its vanilla version (without goroutine concurrency) is even worse than the naive version (not shown here; let's focus on the naive version first).
What did I do wrong here? Why? How can I optimize it?
Multiplication:
package naive
import (
"errors"
"sync"
)
// Errors
var (
ErrNumElements = errors.New("Error number of elements")
ErrMatrixSize = errors.New("Error size of matrix")
)
// Matrix is a 2d array
type Matrix struct {
N int
data [][]float64
}
// New a size by size matrix
func New(size int) func(...float64) (*Matrix, error) {
wg := sync.WaitGroup{}
d := make([][]float64, size)
for i := range d {
wg.Add(1)
go func(i int) {
defer wg.Done()
d[i] = make([]float64, size)
}(i)
}
wg.Wait()
m := &Matrix{N: size, data: d}
return func(es ...float64) (*Matrix, error) {
if len(es) != size*size {
return nil, ErrNumElements
}
for i := range es {
wg.Add(1)
go func(i int) {
defer wg.Done()
m.data[i/size][i%size] = es[i]
}(i)
}
wg.Wait()
return m, nil
}
}
// At access element (i, j)
func (A *Matrix) At(i, j int) float64 {
return A.data[i][j]
}
// Set set element (i, j) with val
func (A *Matrix) Set(i, j int, val float64) {
A.data[i][j] = val
}
// MultNaive matrix multiplication O(n^3)
func (A *Matrix) MultNaive(B, C *Matrix) (err error) {
var (
i, j, k int
sum float64
N = A.N
)
if N != B.N || N != C.N {
return ErrMatrixSize
}
for i = 0; i < N; i++ {
for j = 0; j < N; j++ {
sum = 0.0
for k = 0; k < N; k++ {
sum += A.At(i, k) * B.At(k, j)
}
C.Set(i, j, sum)
}
}
return
}
// ParalMultNaivePerRow matrix multiplication O(n^3) in concurrency per row
func (A *Matrix) ParalMultNaivePerRow(B, C *Matrix) (err error) {
var N = A.N
if N != B.N || N != C.N {
return ErrMatrixSize
}
wg := sync.WaitGroup{}
for i := 0; i < N; i++ {
wg.Add(1)
go func(i int) {
defer wg.Done()
for j := 0; j < N; j++ {
sum := 0.0
for k := 0; k < N; k++ {
sum += A.At(i, k) * B.At(k, j)
}
C.Set(i, j, sum)
}
}(i)
}
wg.Wait()
return
}
// ParalMultNaivePerElem matrix multiplication O(n^3) in concurrency per element
func (A *Matrix) ParalMultNaivePerElem(B, C *Matrix) (err error) {
var N = A.N
if N != B.N || N != C.N {
return ErrMatrixSize
}
wg := sync.WaitGroup{}
for i := 0; i < N; i++ {
for j := 0; j < N; j++ {
wg.Add(1)
go func(i, j int) {
defer wg.Done()
sum := 0.0
for k := 0; k < N; k++ {
sum += A.At(i, k) * B.At(k, j)
}
C.Set(i, j, sum)
}(i, j)
}
}
wg.Wait()
return
}
Benchmark:
package naive
import (
"os"
"runtime/trace"
"testing"
)
type Dot func(B, C *Matrix) error
var (
A = &Matrix{
N: 8,
data: [][]float64{
[]float64{1, 2, 3, 4, 5, 6, 7, 8},
[]float64{9, 1, 2, 3, 4, 5, 6, 7},
[]float64{8, 9, 1, 2, 3, 4, 5, 6},
[]float64{7, 8, 9, 1, 2, 3, 4, 5},
[]float64{6, 7, 8, 9, 1, 2, 3, 4},
[]float64{5, 6, 7, 8, 9, 1, 2, 3},
[]float64{4, 5, 6, 7, 8, 9, 1, 2},
[]float64{3, 4, 5, 6, 7, 8, 9, 0},
},
}
B = &Matrix{
N: 8,
data: [][]float64{
[]float64{9, 8, 7, 6, 5, 4, 3, 2},
[]float64{1, 9, 8, 7, 6, 5, 4, 3},
[]float64{2, 1, 9, 8, 7, 6, 5, 4},
[]float64{3, 2, 1, 9, 8, 7, 6, 5},
[]float64{4, 3, 2, 1, 9, 8, 7, 6},
[]float64{5, 4, 3, 2, 1, 9, 8, 7},
[]float64{6, 5, 4, 3, 2, 1, 9, 8},
[]float64{7, 6, 5, 4, 3, 2, 1, 0},
},
}
C = &Matrix{
N: 8,
data: [][]float64{
[]float64{0, 0, 0, 0, 0, 0, 0, 0},
[]float64{0, 0, 0, 0, 0, 0, 0, 0},
[]float64{0, 0, 0, 0, 0, 0, 0, 0},
[]float64{0, 0, 0, 0, 0, 0, 0, 0},
[]float64{0, 0, 0, 0, 0, 0, 0, 0},
[]float64{0, 0, 0, 0, 0, 0, 0, 0},
[]float64{0, 0, 0, 0, 0, 0, 0, 0},
[]float64{0, 0, 0, 0, 0, 0, 0, 0},
},
}
)
func BenchmarkMatrixDotNaive(b *testing.B) {
f, _ := os.Create("bench.trace")
defer f.Close()
trace.Start(f)
defer trace.Stop()
tests := []struct {
name string
f Dot
}{
{
name: "A.MultNaive",
f: A.MultNaive,
},
{
name: "A.ParalMultNaivePerRow",
f: A.ParalMultNaivePerRow,
},
{
name: "A.ParalMultNaivePerElem",
f: A.ParalMultNaivePerElem,
},
}
for _, tt := range tests {
b.Run(tt.name, func(b *testing.B) {
for i := 0; i < b.N; i++ {
tt.f(B, C)
}
})
}
}
Performing an 8x8 matrix multiplication is a relatively small amount of work.
Goroutines (although lightweight) do have overhead. If the work they do is "small", the overhead of launching, synchronizing and throwing them away may outweigh the gain from using multiple cores/threads, and overall you might not gain performance by executing such small tasks concurrently (hell, you may even do worse than without goroutines). Measure.
If we increase the matrix size to 80x80 and run the benchmark, we already see some performance gain for ParalMultNaivePerRow:
BenchmarkMatrixDotNaive/A.MultNaive-4 2000 1054775 ns/op
BenchmarkMatrixDotNaive/A.ParalMultNaivePerRow-4 2000 709367 ns/op
BenchmarkMatrixDotNaive/A.ParalMultNaivePerElem-4 100 10224927 ns/op
(As you can see in the results, I have 4 CPU cores; running it on your 8-core machine might show a larger performance gain.)
When rows are small, each goroutine does only minimal work. You may improve performance by not "throwing away" goroutines once they're done with their "tiny" work, and instead "reusing" them. See the related question: Is this an idiomatic worker thread pool in Go?
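A minimal sketch of bounding the number of goroutines instead of starting one per row, assuming the rows can be split into contiguous bands. This is a simpler variant of the reuse idea (not a channel-based worker pool): it starts at most runtime.GOMAXPROCS(0) goroutines, each computing a band of rows of C. The names multSerial, multBanded and newSquare are hypothetical, and the code works on plain [][]float64 values rather than the Matrix type above. Whether it beats MultNaive at a given size still has to be measured.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// multSerial is the plain triple loop, used as a reference result.
func multSerial(a, b, c [][]float64) {
	n := len(a)
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			sum := 0.0
			for k := 0; k < n; k++ {
				sum += a[i][k] * b[k][j]
			}
			c[i][j] = sum
		}
	}
}

// multBanded splits the rows of c into at most `workers` contiguous bands and
// computes each band in its own goroutine, so the goroutine count is bounded
// by the worker count instead of growing with N. Each goroutine writes only
// its own rows of c, so there is no data race.
func multBanded(a, b, c [][]float64, workers int) {
	n := len(a)
	if n == 0 {
		return
	}
	if workers < 1 {
		workers = 1
	}
	if workers > n {
		workers = n
	}
	band := (n + workers - 1) / workers // rows per goroutine, rounded up
	var wg sync.WaitGroup
	for lo := 0; lo < n; lo += band {
		hi := lo + band
		if hi > n {
			hi = n
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				for j := 0; j < n; j++ {
					sum := 0.0
					for k := 0; k < n; k++ {
						sum += a[i][k] * b[k][j]
					}
					c[i][j] = sum
				}
			}
		}(lo, hi)
	}
	wg.Wait()
}

// newSquare allocates an n x n matrix of zeros.
func newSquare(n int) [][]float64 {
	m := make([][]float64, n)
	for i := range m {
		m[i] = make([]float64, n)
	}
	return m
}

func main() {
	const n = 80
	a, b := newSquare(n), newSquare(n)
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			a[i][j] = float64(i + j)
			b[i][j] = float64(i - j)
		}
	}
	want, got := newSquare(n), newSquare(n)
	multSerial(a, b, want)
	multBanded(a, b, got, runtime.GOMAXPROCS(0))
	same := true
	for i := range want {
		for j := range want[i] {
			if want[i][j] != got[i][j] {
				same = false
			}
		}
	}
	fmt.Println("banded result matches serial:", same)
}
```

Each element is still summed in the same k order as the serial loop, so the results are bit-for-bit identical; only the distribution of rows across goroutines changes.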
Also see related / possible duplicate: Vectorise a function taking advantage of concurrency
Apologies if this is a "please do my homework for me" style of question, but I am looking for advice from the community on the performance of array operations in Go.
My problem space: I am using Go to perform astrophotographic image calibration, which requires calculating the average value of each pixel element across multiple data arrays.
For argument's sake, let's say each sub-exposure has a data array of type []float32.
What I want is a function that can stack and average any number of these data arrays. For example, a good master bias frame needs a good number of bias-frame sub-exposures, somewhere in the region of 10 to 100, depending on how statistically thorough the end user needs to be.
Therefore, I propose a function signature like this:
func MeanFloat32Arrays(a [][]float32) ([]float32, error) {
// calculate the mean across the array of arrays here
}
That is, I am happy for the user to call the function as follows:
m, err := MeanFloat32Arrays([][]float32{i.Dark.data, j.Bias.data, k.Bias.data, ... , z.Bias.data })
I feel the construction of the argument is just another detail we can gloss over for the moment.
My problem: How do I optimise the "mean stacking" process for this function? That is to say, how should I go about making MeanFloat32Arrays as performant as possible?
My initial code is as follows (which passes the test suite outlined below):
func MeanFloat32Arrays(a [][]float32) ([]float32, error) {
if len(a) == 0 {
return nil, errors.New("at least one array is required to compute the mean")
}
m := make([]float32, len(a[0]))
for i := range m {
for j := range a {
// Ensure that each sub-array has the same length as the first one:
if len(a[j]) != len(a[0]) {
return nil, fmt.Errorf("issue at array input %v: to compute the mean of multiple arrays the length of each array must be the same", j)
}
if a[j][i] == 0 {
continue
}
m[i] += a[j][i]
}
m[i] /= float32(len(a))
}
return m, nil
}
My current unit test suite is as follows:
func TestMeanABC(t *testing.T) {
a := []float32{10, 9, 8, 7, 6, 5, 4, 3, 2, 1}
b := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
c := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
m, err := MeanFloat32Arrays([][]float32{a, b, c})
if err != nil {
t.Errorf("error should be nil, but got %v", err)
}
if len(m) != len(a) {
t.Errorf("result should be of same length as a")
}
if m[0] != 4 {
t.Errorf("m[0] should be 4, but got %v", m[0])
}
if m[1] != 4.333333333333333 {
t.Errorf("m[1] should be 4.333333, but got %v", m[1])
}
if m[2] != 4.666666666666667 {
t.Errorf("m[2] should be 4.666667, but got %v", m[2])
}
if m[3] != 5 {
t.Errorf("m[3] should be 5, but got %v", m[3])
}
if m[4] != 5.333333333333333 {
t.Errorf("m[4] should be 5.333333, but got %v", m[4])
}
if m[5] != 5.666666666666667 {
t.Errorf("m[5] should be 5.666667, but got %v", m[5])
}
//... Assume here that the mean calculation is correct for all other elements
}
func TestMeanABCD(t *testing.T) {
a := []float32{10, 9, 8, 7, 6, 5, 4, 3, 2, 1}
b := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
c := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
d := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
m, err := MeanFloat32Arrays([][]float32{a, b, c, d})
if err != nil {
t.Errorf("error should be nil, but got %v", err)
}
if len(m) != len(a) {
t.Errorf("result should be of same length as a")
}
if m[0] != 3.25 {
t.Errorf("m[0] should be 3.25, but got %v", m[0])
}
if m[1] != 3.75 {
t.Errorf("m[1] should be 3.75, but got %v", m[1])
}
if m[2] != 4.25 {
t.Errorf("m[2] should be 4.25, but got %v", m[2])
}
if m[3] != 4.75 {
t.Errorf("m[3] should be 4.75, but got %v", m[3])
}
if m[4] != 5.25 {
t.Errorf("m[4] should be 5.25, but got %v", m[4])
}
if m[5] != 5.75 {
t.Errorf("m[5] should be 5.75, but got %v", m[5])
}
//... Assume here that the mean calculation is correct for all other elements
}
func TestMeanABNotEqualLengthPanic(t *testing.T) {
a := []float32{2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
b := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
_, err := MeanFloat32Arrays([][]float32{a, b})
if err == nil {
t.Errorf("error should not be nil for two arrays of unequal length")
}
}
So my question is: what tricks could I apply to make this linear-time function as fast as possible? Could I use multiple threads (goroutines)? Would this be thread safe? I'd welcome any comments, as I am still writing Go as a hobbyist...
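A minimal sketch of a concurrent variant, assuming all inputs fit in memory and that splitting the pixel range is acceptable. The function name meanFloat32ArraysParallel is hypothetical. It checks every length once before any arithmetic (the original re-checks lengths for every pixel), drops the zero-skip branch (adding 0 does not change a sum), and gives each goroutine a disjoint sub-slice of the result. Because no two goroutines write the same element and the inputs are only read, this is free of data races. For 10 to 100 frames the work may be limited by memory bandwidth rather than CPU, so measure before assuming a speedup.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"sync"
)

// meanFloat32ArraysParallel is a hypothetical variant of MeanFloat32Arrays.
// It validates every length up front, then splits the element range into
// contiguous chunks, one goroutine per chunk.
func meanFloat32ArraysParallel(a [][]float32) ([]float32, error) {
	if len(a) == 0 {
		return nil, errors.New("at least one array is required")
	}
	n := len(a[0])
	for j := range a {
		if len(a[j]) != n {
			return nil, fmt.Errorf("array %d has length %d, want %d", j, len(a[j]), n)
		}
	}
	m := make([]float32, n)
	workers := runtime.GOMAXPROCS(0)
	if workers > n {
		workers = n
	}
	if workers < 1 {
		workers = 1 // only reached when n == 0
	}
	chunk := (n + workers - 1) / workers
	count := float32(len(a))
	var wg sync.WaitGroup
	for lo := 0; lo < n; lo += chunk {
		hi := lo + chunk
		if hi > n {
			hi = n
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			// Each goroutine owns m[lo:hi] exclusively and only reads the
			// inputs, so there is no data race.
			out := m[lo:hi]
			for _, arr := range a {
				// Walk each input array sequentially over this chunk.
				for i, v := range arr[lo:hi] {
					out[i] += v
				}
			}
			for i := range out {
				out[i] /= count
			}
		}(lo, hi)
	}
	wg.Wait()
	return m, nil
}

func main() {
	a := []float32{10, 9, 8, 7, 6, 5, 4, 3, 2, 1}
	b := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	m, err := meanFloat32ArraysParallel([][]float32{a, b, b})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m)
}
```

Per element, the inputs are summed in the same order as in the original loop, so the float32 results should match the existing test suite.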
I am trying to solve a Sudoku puzzle in Go using a recursive backtracking algorithm. I created helper functions that check whether a given row, column, or block is valid (i.e. has no repeated values), as well as a function to print out the current state. I have tested all of these many times, so I don't think they are causing the issue. I created the following function to test whether a potential game board would be possible.
func cellValid(gameBoard *[9][9]int, value int, y int, x int) bool {
oldVal := gameBoard[y-1][x-1]
gameBoard[y-1][x-1] = value
row := getRow(gameBoard, y)
col := getCol(gameBoard, x)
block := getBlock(gameBoard, x, y)
possible := unitValid(row) && unitValid(col) && unitValid(block)
gameBoard[y-1][x-1] = oldVal
return possible
}
It temporarily writes the value to the game board, checks whether the board is still possible, and stores that bool in the variable possible. It then changes the board back to what it was and returns the bool. This function is called from the following solveBoard function.
func solveBoard(gameBoard *[9][9]int) {
for row := 1; row <= 9; row++ {
for col := 1; col <= 9; col++ {
if gameBoard[row-1][col-1] == 0 {
for value := 1; value <= 9; value++ {
if cellValid(gameBoard, value, row, col) {
gameBoard[row-1][col-1] = value
solveBoard(gameBoard)
gameBoard[row-1][col-1] = 0
}
}
return
}
}
}
printBoard(gameBoard)
return
}
Upon running the file I get no output.
func main() {
var gameBoard = [9][9]int{
{5, 3, 0, 0, 7, 0, 0, 0, 0},
{6, 0, 0, 1, 9, 5, 0, 0, 0},
{0, 9, 8, 0, 0, 0, 0, 6, 0},
{8, 0, 0, 0, 6, 0, 0, 0, 3},
{4, 0, 0, 8, 0, 3, 0, 0, 1},
{7, 0, 0, 0, 2, 0, 0, 0, 6},
{0, 6, 0, 0, 0, 0, 2, 8, 0},
{0, 0, 0, 4, 1, 9, 0, 0, 5},
{0, 0, 0, 0, 8, 0, 0, 7, 9}}
solveBoard(&gameBoard)
}
Here is a link to a go playground containing all my code.
Go Playground
The following video demonstrates what I am trying to accomplish in python.
Computerphile Video
Solution to puzzle:
Puzzle solution
Your program works perfectly fine. Double-check the second-to-last line of your matrix.
You have:
{0, 0, 0, 4, 1, 7, 0, 0, 5},
but it should be:
{0, 0, 0, 4, 1, 9, 0, 0, 5},
The final working code is:
package main
import (
"fmt"
)
func printBoard(gameBoard *[9][9]int) {
for y := 0; y < 9; y++ {
if y == 3 || y == 6 {
fmt.Println("\n---------")
} else {
fmt.Println("")
}
for x := 0; x < 9; x++ {
if x == 3 || x == 6 {
fmt.Print("|", gameBoard[y][x])
} else {
fmt.Print("", gameBoard[y][x])
}
}
}
fmt.Println("")
}
func unitValid(unit [9]int) bool {
for value := 1; value <= 9; value++ {
count := 0
for index := 0; index < 9; index++ {
if unit[index] == value {
count++
}
}
if count > 1 {
return false
}
}
return true
}
func getRow(gameBoard *[9][9]int, row int) [9]int {
return gameBoard[row-1]
}
func getCol(gameBoard *[9][9]int, col int) [9]int {
var column [9]int
for row := 0; row < 9; row++ {
column[row] = gameBoard[row][col-1]
}
return column
}
func getBlock(gameBoard *[9][9]int, row, col int) [9]int {
i := whatBlock(col)*3 - 2
j := whatBlock(row)*3 - 2
var block [9]int
block[0] = gameBoard[j-1][i-1]
block[1] = gameBoard[j-1][i]
block[2] = gameBoard[j-1][i+1]
block[3] = gameBoard[j][i-1]
block[4] = gameBoard[j][i]
block[5] = gameBoard[j][i+1]
block[6] = gameBoard[j+1][i-1]
block[7] = gameBoard[j+1][i]
block[8] = gameBoard[j+1][i+1]
return block
}
func whatBlock(val int) int {
if val >= 1 && val <= 3 {
return 1
} else if val >= 4 && val <= 6 {
return 2
} else if val >= 7 && val <= 9 {
return 3
}
return 0
}
func cellValid(gameBoard *[9][9]int, value int, y int, x int) bool {
oldVal := gameBoard[y-1][x-1]
gameBoard[y-1][x-1] = value
row := getRow(gameBoard, y)
col := getCol(gameBoard, x)
block := getBlock(gameBoard, y, x)
possible := unitValid(row) && unitValid(col) && unitValid(block)
gameBoard[y-1][x-1] = oldVal
return possible
}
func solveBoard(gameBoard *[9][9]int) {
for row := 1; row <= 9; row++ {
for col := 1; col <= 9; col++ {
if gameBoard[row-1][col-1] == 0 {
for value := 1; value <= 9; value++ {
if cellValid(gameBoard, value, row, col) {
gameBoard[row-1][col-1] = value
solveBoard(gameBoard)
gameBoard[row-1][col-1] = 0
}
}
return
}
}
}
printBoard(gameBoard)
return
}
func main() {
var gameBoard = [9][9]int{
{5, 3, 0, 0, 7, 0, 0, 0, 0},
{6, 0, 0, 1, 9, 5, 0, 0, 0},
{0, 9, 8, 0, 0, 0, 0, 6, 0},
{8, 0, 0, 0, 6, 0, 0, 0, 3},
{4, 0, 0, 8, 0, 3, 0, 0, 1},
{7, 0, 0, 0, 2, 0, 0, 0, 6},
{0, 6, 0, 0, 0, 0, 2, 8, 0},
{0, 0, 0, 4, 1, 9, 0, 0, 5},
{0, 0, 0, 0, 8, 0, 0, 7, 9}}
solveBoard(&gameBoard)
}
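For comparison, a minimal sketch of the same backtracking idea with two changes, assuming 0-based indices throughout. Validity is checked by scanning the row, column and 3x3 box directly (no copied units and no temporary write), and solve returns true as soon as the board is full, so the solution stays on the board instead of being printed and then undone. The names puzzle, canPlace and solve are hypothetical, not part of the answer above.

```go
package main

import "fmt"

// puzzle is the board from the question (0 = empty cell).
var puzzle = [9][9]int{
	{5, 3, 0, 0, 7, 0, 0, 0, 0},
	{6, 0, 0, 1, 9, 5, 0, 0, 0},
	{0, 9, 8, 0, 0, 0, 0, 6, 0},
	{8, 0, 0, 0, 6, 0, 0, 0, 3},
	{4, 0, 0, 8, 0, 3, 0, 0, 1},
	{7, 0, 0, 0, 2, 0, 0, 0, 6},
	{0, 6, 0, 0, 0, 0, 2, 8, 0},
	{0, 0, 0, 4, 1, 9, 0, 0, 5},
	{0, 0, 0, 0, 8, 0, 0, 7, 9},
}

// canPlace reports whether value can go at (row, col) without repeating in
// the row, the column, or the 3x3 box. It assumes the cell is currently empty.
func canPlace(board *[9][9]int, row, col, value int) bool {
	for k := 0; k < 9; k++ {
		if board[row][k] == value || board[k][col] == value {
			return false
		}
	}
	r0, c0 := row/3*3, col/3*3 // top-left corner of the box
	for r := r0; r < r0+3; r++ {
		for c := c0; c < c0+3; c++ {
			if board[r][c] == value {
				return false
			}
		}
	}
	return true
}

// solve fills the first empty cell it finds and recurses. It returns true as
// soon as the board is complete, leaving the solution in place.
func solve(board *[9][9]int) bool {
	for row := 0; row < 9; row++ {
		for col := 0; col < 9; col++ {
			if board[row][col] != 0 {
				continue
			}
			for value := 1; value <= 9; value++ {
				if canPlace(board, row, col, value) {
					board[row][col] = value
					if solve(board) {
						return true
					}
					board[row][col] = 0 // undo and try the next value
				}
			}
			return false // no value fits this cell: backtrack
		}
	}
	return true // no empty cells left
}

func main() {
	board := puzzle // arrays are values, so this copies the grid
	if !solve(&board) {
		fmt.Println("no solution")
		return
	}
	for _, row := range board {
		fmt.Println(row)
	}
}
```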
I'm trying to figure out how to change a multidimensional slice by reference.
func main() {
matrix := [][]int{
{1, 0, 0},
{1, 0, 0},
{0, 1, 1},
}
fmt.Println("Before")
printMatrix(matrix)
changeMatrixByReference(&matrix)
fmt.Println("After")
printMatrix(matrix)
}
func changeMatrixByReference(matrix *[][]int) {
//&matrix[0][0] = 3
}
func printMatrix(matrix [][]int) {
for i := 0; i < len(matrix); i++ {
for j := 0; j < len(matrix[0]); j++ {
fmt.Printf("%d", matrix[i][j])
}
fmt.Println("")
}
}
How can I change the matrix 2d slice inside the function changeMatrixByReference? I expect when printMatrix runs the second time matrix[0][0] becomes 3.
To set matrix[0][0] to 3 using pointer dereferencing:
(*matrix)[0][0] = 3
Try this:
package main
import "fmt"
func main() {
matrix := [][]int{
{1, 0, 0},
{1, 0, 0},
{0, 1, 1},
}
fmt.Println("Before")
printMatrix(matrix)
changeMatrixByReference(&matrix)
fmt.Println("After")
printMatrix(matrix)
}
func changeMatrixByReference(matrix *[][]int) {
(*matrix)[0][0] = 3
}
func printMatrix(matrix [][]int) {
for i := 0; i < len(matrix); i++ {
for j := 0; j < len(matrix[0]); j++ {
fmt.Printf("%d", matrix[i][j])
}
fmt.Println("")
}
}
As long as you don't modify the slice header (for example, when appending elements), you don't need a pointer: elements accessed by index are stored in a backing array, and the slice header already holds a pointer to that array:
Try this:
package main
import "fmt"
func main() {
matrix := [][]int{
{1, 0, 0},
{1, 0, 0},
{0, 1, 1},
}
fmt.Println("Before")
printMatrix(matrix)
changeMatrixByReference(matrix)
fmt.Println("After")
printMatrix(matrix)
}
func changeMatrixByReference(matrix [][]int) {
matrix[0][0] = 3
}
func printMatrix(matrix [][]int) {
for i := 0; i < len(matrix); i++ {
for j := 0; j < len(matrix[0]); j++ {
fmt.Printf("%d", matrix[i][j])
}
fmt.Println("")
}
}
Output:
Before
100
100
011
After
300
100
011
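To illustrate the one case where the pointer does matter, here is a minimal sketch in which a function changes the slice header itself by appending a row. The function names setCorner, addRowByValue and addRowByPointer are hypothetical. Writing through an index is visible to the caller either way, but an append on a by-value copy of the header only changes the function's local copy.

```go
package main

import "fmt"

// setCorner writes through the shared backing array; no pointer needed.
func setCorner(m [][]int) {
	m[0][0] = 3
}

// addRowByValue appends to its own copy of the slice header. It returns the
// local length to show the append happened, but the caller's header is untouched.
func addRowByValue(m [][]int) int {
	m = append(m, []int{7, 7, 7})
	return len(m)
}

// addRowByPointer appends through a pointer, so the caller's header is updated.
func addRowByPointer(m *[][]int) {
	*m = append(*m, []int{7, 7, 7})
}

func main() {
	matrix := [][]int{{1, 0, 0}, {1, 0, 0}, {0, 1, 1}}
	setCorner(matrix)
	fmt.Println(matrix[0][0], len(matrix)) // 3 3
	inside := addRowByValue(matrix)
	fmt.Println(inside, len(matrix)) // 4 3
	addRowByPointer(&matrix)
	fmt.Println(len(matrix)) // 4
}
```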
I want to shuffle DB ids so that none of the ids refer to themselves, but with this piece of code:
package main
import (
"log"
"math/rand"
"time"
)
func main() {
seed := time.Now().UnixNano() & 999999999
log.Print("seed: ", seed)
rand.Seed(seed)
ordered := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}
randomized := shufflePreventCollision(ordered)
log.Print("Final Result")
log.Print("ordered: ", ordered)
log.Print("random: ", randomized)
}
func shufflePreventCollision(ordered []int) []int {
randomized := rand.Perm(len(ordered))
for i, o := range ordered {
if o == randomized[i] {
log.Printf("Doing it again because ordered[%d] (%d) is == randomized[%d] (%d)", i, o, i, randomized[i])
log.Print(ordered)
log.Print(randomized)
shufflePreventCollision(ordered)
}
}
return randomized
}
I see a strange behaviour: when I run it often, at some point it hangs and cannot find non-colliding sequences anymore. I tried
go build -o rand_example3 rand_example3.go && time (for i in $(seq 10000) ; do ./rand_example3 ; done)
and it never seems to end. Am I missing something here, or is there really something fishy with math/rand?
"With this piece of code, I want shuffle db ids so that none of the
ids refer to themselves. [Sometimes] it doesn't end even when I let it
run for an hour or so."
tl;dr There is a solution that is more than a thousand times faster.
Your code:
slaxor.go:
package main
import (
"log"
"math/rand"
"time"
)
func main() {
seed := time.Now().UnixNano() & 999999999
log.Print("seed: ", seed)
rand.Seed(seed)
ordered := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}
randomized := shufflePreventCollisionSlaxor(ordered)
log.Print("Final Result")
log.Print("ordered: ", ordered)
log.Print("random: ", randomized)
}
func shufflePreventCollisionSlaxor(ordered []int) []int {
randomized := rand.Perm(len(ordered))
for i, o := range ordered {
if o == randomized[i] {
log.Printf("Doing it again because ordered[%d] (%d) is == randomized[%d] (%d)", i, o, i, randomized[i])
log.Print(ordered)
log.Print(randomized)
shufflePreventCollisionSlaxor(ordered)
}
}
return randomized
}
Playground: https://play.golang.org/p/JI5rJGcAAz
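It is not obvious how, or even if, the code fulfills its purpose. Note that the result of the recursive call shufflePreventCollisionSlaxor(ordered) is discarded, so the function returns its own, possibly colliding, permutation.
The termination condition is probabilistic, not deterministic: every collision triggers another recursive call, and each of those calls can trigger further recursive calls.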
Let's leave aside the issue of whether the code fulfills its purpose.
This benchmark has been modified so that stderr is redirected to /dev/null, so its speed is limited by that sink rather than by a terminal.
slaxor.bash:
go build -o slaxor slaxor.go && time (for i in $(seq 10000) ; do ./slaxor 2> /dev/null ; done)
The benchmark measures the execution of a program and a single execution of the algorithm. The benchmark times are inconsistent because the pseudorandom seed value changes for each program execution. The benchmark sometimes "doesn't end even when I let it run for an hour or so."
There is a faster solution that runs and terminates in a few seconds, despite the program execution overhead.
peterso.bash:
go build -o peterso peterso.go && time (for i in $(seq 10000) ; do ./peterso 2> /dev/null ; done)
Output:
$ ./peterso.bash
real 0m5.290s
user 0m5.224s
sys 0m1.128s
$ ./peterso.bash
real 0m7.462s
user 0m7.109s
sys 0m1.922s
peterso.go:
package main
import (
"fmt"
"log"
"math/rand"
"time"
)
func main() {
seed := time.Now().UnixNano() & 999999999
log.Print("seed: ", seed)
r = rand.New(rand.NewSource(seed))
ordered := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}
randomized := shufflePreventCollisionPeterSO(ordered)
log.Print("Final Result")
log.Print("ordered: ", ordered)
log.Print("random: ", randomized)
if randomized == nil {
err := "Shuffle Error!"
fmt.Print(err)
log.Fatal(err)
}
}
var r *rand.Rand
func isNoCollision(a, b []int) bool {
if len(a) == len(b) {
for i, ai := range a {
if ai == b[i] {
return false
}
}
return true
}
return false
}
func shufflePreventCollisionPeterSO(ordered []int) []int {
const guard = 4 * 1024 // deterministic, finite time
for n := 1; n <= guard; n++ {
randomized := r.Perm(len(ordered))
if isNoCollision(ordered, randomized) {
return randomized
}
}
return nil
}
The guard provides a deterministic, finite time, termination condition.
Playground: https://play.golang.org/p/ZT-sfDW5Mi
Let's put aside the distorted program-execution benchmarks and look at function execution times. Because this is important, the Go standard library has the testing package for testing and benchmarking functions.
Tests:
$ go test shuffle_test.go -v -count=1 -run=. -bench=!
=== RUN TestTimeSlaxor
=== RUN TestTimeSlaxor/1K
=== RUN TestTimeSlaxor/2K
=== RUN TestTimeSlaxor/3K
--- PASS: TestTimeSlaxor (13.78s)
--- PASS: TestTimeSlaxor/1K (1.18s)
--- PASS: TestTimeSlaxor/2K (1.27s)
--- PASS: TestTimeSlaxor/3K (11.33s)
=== RUN TestTimePeterSO
=== RUN TestTimePeterSO/1K
=== RUN TestTimePeterSO/2K
=== RUN TestTimePeterSO/3K
=== RUN TestTimePeterSO/1M
=== RUN TestTimePeterSO/2M
=== RUN TestTimePeterSO/3M
--- PASS: TestTimePeterSO (6.57s)
--- PASS: TestTimePeterSO/1K (0.00s)
--- PASS: TestTimePeterSO/2K (0.00s)
--- PASS: TestTimePeterSO/3K (0.00s)
--- PASS: TestTimePeterSO/1M (1.13s)
--- PASS: TestTimePeterSO/2M (2.25s)
--- PASS: TestTimePeterSO/3M (3.19s)
PASS
ok command-line-arguments 20.347s
$
3M (3,000,000) iterations of shufflePreventCollisionPeterSO run in a fraction of the rapidly increasing time it takes to run 3K (3,000) iterations of shufflePreventCollisionSlaxor: a more than thousand-fold improvement.
Benchmarks:
$ go test shuffle_test.go -v -count=1 -run=! -bench=.
goos: linux
goarch: amd64
BenchmarkTimePeterSO-8 1000000 1048 ns/op 434 B/op 2 allocs/op
BenchmarkTimeSlaxor-8 10000 2256271 ns/op 636894 B/op 3980 allocs/op
PASS
ok command-line-arguments 23.643s
$
It's easy to see that the average per-iteration cost of shufflePreventCollisionPeterSO over 1,000,000 iterations is small, 1,048 nanoseconds, especially when compared to shufflePreventCollisionSlaxor's average of 2,256,271 nanoseconds per iteration over only 10,000 iterations.
Also, note the sparing use of memory by shufflePreventCollisionPeterSO, on average 2 allocations for a total allocation of 434 bytes per iteration, versus the profligate use of memory by shufflePreventCollisionSlaxor, on average 3,980 allocations for a total allocation of 636,894 bytes per iteration.
shuffle_test.go:
package main
import (
"fmt"
"math/rand"
"strconv"
"testing"
)
func shufflePreventCollisionSlaxor(ordered []int) []int {
randomized := rand.Perm(len(ordered))
for i, o := range ordered {
if o == randomized[i] {
shufflePreventCollisionSlaxor(ordered)
}
}
return randomized
}
var r *rand.Rand
func isNoCollision(a, b []int) bool {
if len(a) == len(b) {
for i, ai := range a {
if ai == b[i] {
return false
}
}
return true
}
return false
}
func shufflePreventCollisionPeterSO(ordered []int) []int {
const guard = 4 * 1024 // deterministic, finite time
for n := 1; n <= guard; n++ {
randomized := r.Perm(len(ordered))
if isNoCollision(ordered, randomized) {
return randomized
}
}
return nil
}
const testSeed = int64(60309766)
func testTime(t *testing.T, ordered, randomized []int, shuffle func([]int) []int) {
shuffled := shuffle(ordered)
want := fmt.Sprintf("%v", randomized)
got := fmt.Sprintf("%v", shuffled)
if want != got {
t.Errorf("Error:\n from: %v\n want: %s\n got: %s\n", ordered, want, got)
}
}
func testTimeSlaxor(t *testing.T, n int) {
rand.Seed(testSeed)
ordered := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}
randomized := []int{3, 1, 17, 15, 10, 16, 14, 19, 7, 6, 11, 2, 0, 12, 8, 18, 13, 4, 9, 5}
testTime(t, ordered, randomized, shufflePreventCollisionSlaxor)
for i := 1; i < n; i++ {
shufflePreventCollisionSlaxor(ordered)
}
}
func TestTimeSlaxor(t *testing.T) {
for k := 1; k <= 3; k++ {
n := 1000 * k
t.Run(strconv.Itoa(k)+"K", func(t *testing.T) { testTimeSlaxor(t, n) })
}
}
func testTimePeterSo(t *testing.T, n int) {
r = rand.New(rand.NewSource(testSeed))
ordered := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}
randomized := []int{10, 7, 15, 14, 8, 6, 18, 17, 19, 11, 5, 16, 2, 12, 1, 13, 3, 0, 9, 4}
testTime(t, ordered, randomized, shufflePreventCollisionPeterSO)
for i := 1; i < n; i++ {
shufflePreventCollisionPeterSO(ordered)
}
}
func TestTimePeterSO(t *testing.T) {
for k := 1; k <= 3; k++ {
n := 1000 * k
t.Run(strconv.Itoa(k)+"K", func(t *testing.T) { testTimePeterSo(t, n) })
}
for m := 1; m <= 3; m++ {
n := 1000 * 1000 * m
t.Run(strconv.Itoa(m)+"M", func(t *testing.T) { testTimePeterSo(t, n) })
}
}
func benchTime(b *testing.B, ordered, randomized []int, shuffle func([]int) []int) {
shuffled := shuffle(ordered)
want := fmt.Sprintf("%v", randomized)
got := fmt.Sprintf("%v", shuffled)
if want != got {
b.Errorf("Error:\n from: %v\n want: %s\n got: %s\n", ordered, want, got)
}
}
func BenchmarkTimePeterSO(b *testing.B) {
b.ReportAllocs()
r = rand.New(rand.NewSource(testSeed))
ordered := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}
randomized := []int{10, 7, 15, 14, 8, 6, 18, 17, 19, 11, 5, 16, 2, 12, 1, 13, 3, 0, 9, 4}
benchTime(b, ordered, randomized, shufflePreventCollisionPeterSO)
r = rand.New(rand.NewSource(testSeed))
b.ResetTimer()
for i := 0; i < b.N; i++ {
shufflePreventCollisionPeterSO(ordered)
}
}
func BenchmarkTimeSlaxor(b *testing.B) {
b.ReportAllocs()
r = rand.New(rand.NewSource(testSeed))
ordered := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}
randomized := []int{10, 7, 15, 14, 8, 6, 18, 17, 19, 11, 5, 16, 2, 12, 1, 13, 3, 0, 9, 4}
benchTime(b, ordered, randomized, shufflePreventCollisionPeterSO)
r = rand.New(rand.NewSource(testSeed))
b.ResetTimer()
for i := 0; i < b.N; i++ {
shufflePreventCollisionSlaxor(ordered)
}
}
Playground: https://play.golang.org/p/ozazWGNZsu
I want to delete some elements from a slice, and https://github.com/golang/go/wiki/SliceTricks advises this slice manipulation:
a = append(a[:i], a[i+1:]...)
So I wrote the code below:
package main
import (
"fmt"
)
func main() {
slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
for i, value := range slice {
if value%3 == 0 { // remove 3, 6, 9
slice = append(slice[:i], slice[i+1:]...)
}
}
fmt.Printf("%v\n", slice)
}
With go run hello.go, it panics:
panic: runtime error: slice bounds out of range
goroutine 1 [running]:
panic(0x4ef680, 0xc082002040)
D:/Go/src/runtime/panic.go:464 +0x3f4
main.main()
E:/Code/go/test/slice.go:11 +0x395
exit status 2
How can I change this code to make it work correctly?
I tried the following:
1st, with a goto statement:
func main() {
slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
Label:
for i, n := range slice {
if n%3 == 0 {
slice = append(slice[:i], slice[i+1:]...)
goto Label
}
}
fmt.Printf("%v\n", slice)
}
It works, but it iterates too much.
2nd, using another slice that shares the same backing array:
func main() {
slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
dest := slice[:0]
for _, n := range slice {
if n%3 != 0 { // filter
dest = append(dest, n)
}
}
slice = dest
fmt.Printf("%v\n", slice)
}
But I'm not sure whether this one is better.
3rd, from "Remove elements in slice", using len in the loop condition:
func main() {
slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
for i := 0; i < len(slice); i++ {
if slice[i]%3 == 0 {
slice = append(slice[:i], slice[i+1:]...)
i-- // should I decrease index here?
}
}
fmt.Printf("%v\n", slice)
}
Which one should I use?
With a benchmark:
func BenchmarkRemoveSliceElementsBySlice(b *testing.B) {
for i := 0; i < b.N; i++ {
slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
dest := slice[:0]
for _, n := range slice {
if n%3 != 0 {
dest = append(dest, n)
}
}
}
}
func BenchmarkRemoveSliceElementByLen(b *testing.B) {
for i := 0; i < b.N; i++ {
slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
for i := 0; i < len(slice); i++ {
if slice[i]%3 == 0 {
slice = append(slice[:i], slice[i+1:]...)
}
}
}
}
$ go test -v -bench=".*"
testing: warning: no tests to run
PASS
BenchmarkRemoveSliceElementsBySlice-4 50000000 26.6 ns/op
BenchmarkRemoveSliceElementByLen-4 50000000 32.0 ns/op
It seems that deleting all elements in one loop is better.
Iterate over the slice copying elements that you want to keep.
k := 0
for _, n := range slice {
if n%3 != 0 { // filter
slice[k] = n
k++
}
}
slice = slice[:k] // set slice len to remaining elements
The slice trick is useful in the case where a single element is deleted. If it's possible that more than one element will be deleted, then use the for loop above.
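The same k-index loop can be wrapped in a small generic helper (type parameters need Go 1.18 or later); filterInPlace is a hypothetical name, not a standard function. One addition over the loop above: it zeroes the leftover tail of the backing array, so that when the elements are pointers the removed values are no longer kept reachable. Since Go 1.21 the standard library's slices.DeleteFunc covers the same case, with the predicate inverted (it reports which elements to delete).

```go
package main

import "fmt"

// filterInPlace keeps the elements for which keep returns true, preserving
// their order and reusing s's backing array. The unused tail is zeroed so
// dropped pointer values can be garbage collected.
func filterInPlace[T any](s []T, keep func(T) bool) []T {
	k := 0
	for _, v := range s {
		if keep(v) {
			s[k] = v
			k++
		}
	}
	var zero T
	for i := k; i < len(s); i++ {
		s[i] = zero
	}
	return s[:k]
}

func main() {
	slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
	slice = filterInPlace(slice, func(n int) bool { return n%3 != 0 })
	fmt.Println(slice) // [1 2 4 5 7 8]
}
```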
working playground example
While this is a good answer for small slices:
package main
import "fmt"
func main() {
slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
k := 0
for _, n := range slice {
if n%3 != 0 { // filter
slice[k] = n
k++
}
}
slice = slice[:k]
fmt.Println(slice) //[1 2 4 5 7 8]
}
To minimize memory writes to the leading elements (for big slices), you may use this:
package main
import "fmt"
func main() {
slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
k := 0
for i, n := range slice {
if n%3 != 0 { // filter
if i != k {
slice[k] = n
}
k++
}
}
slice = slice[:k]
fmt.Println(slice) //[1 2 4 5 7 8]
}
And if you need a new slice, or need to preserve the old one:
package main
import "fmt"
func main() {
slice := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
s2 := make([]int, len(slice))
k := 0
for _, n := range slice {
if n%3 != 0 { // filter
s2[k] = n
k++
}
}
s2 = s2[:k]
fmt.Println(s2) //[1 2 4 5 7 8]
}