I have a university project testing the time difference for matrix multiplication when I use 1 goroutine, 2 goroutines, 3 and so on. I must use channels. My problem is that no matter how many goroutines I add, the execution time is almost always the same. Maybe someone can tell me where the problem is. Maybe the sending takes very long and that accounts for all the time. The code is given below:
package main
import (
"fmt"
"math/rand"
"time"
)
const length = 1000
var start time.Time
var rez [length][length]int
func main() {
const threadlength = 1
toCalcRow := make(chan []int)
toCalcColumn := make(chan []int)
dummy1 := make(chan int)
dummy2 := make(chan int)
var row [length + 1]int
var column [length + 1]int
var a [length][length]int
var b [length][length]int
for i := 0; i < length; i++ {
for j := 0; j < length; j++ {
a[i][j] = rand.Intn(10)
b[i][j] = rand.Intn(10)
}
}
for i := 0; i < threadlength; i++ {
go Calc(toCalcRow, toCalcColumn, dummy1, dummy2)
}
start = time.Now()
for i := 0; i < length; i++ {
for j := 0; j < length; j++ {
row[0] = i
column[0] = j
for k := 0; k < length; k++ {
row[k+1] = a[i][j]
column[k+1] = b[i][k]
}
rowSlices := make([]int, len(row))
columnSlices := make([]int, len(column))
copy(rowSlices, row[:])
copy(columnSlices, column[:])
toCalcRow <- rowSlices
toCalcColumn <- columnSlices
}
}
dummy1 <- -1
for i := 0; i < length; i++ {
for j := 0; j < length; j++ {
fmt.Print(rez[i][j])
fmt.Print(" ")
}
fmt.Println(" ")
}
<-dummy2
close(toCalcRow)
close(toCalcColumn)
close(dummy1)
}
func Calc(chin1 <-chan []int, chin2 <-chan []int, dummy <-chan int, dummy1 chan<- int) {
loop:
for {
select {
case row := <-chin1:
column := <-chin2
var sum [3]int
sum[0] = row[0]
sum[1] = column[0]
for i := 1; i < len(row); i++ {
sum[2] += row[i] * column[i]
}
rez[sum[0]][sum[1]] = sum[2]
case <-dummy:
elapsed := time.Since(start)
fmt.Println("Binomial took ", elapsed)
dummy1 <- 0
break loop
}
}
close(dummy1)
}
You don't see a difference because preparing the data to pass to the goroutines is your bottleneck. It is as slow as, or slower than, performing the calculation itself.
Passing a copy of the rows and columns is not a good strategy; it is killing the performance.
The goroutines can read data directly from the input matrices, which are read-only. There is no possible race condition here.
The same goes for the output: if a goroutine computes the multiplication of a row and a column, it writes the result into a distinct cell, so there is no possible race condition here either.
What to do is the following. Define a struct with two fields, one for the row and one for the column to multiply.
Fill a buffered channel with all possible combinations of rows and columns to multiply, from (0,0) to (n-1,m-1).
The goroutines consume the structs from the channel, perform the computation and write the result directly into the output matrix.
You could then also have a done channel to signal to the main goroutine that the computation is done: when a goroutine has finished processing the struct (n-1,m-1), it closes the done channel.
The main goroutine waits on the done channel after it has written all the structs. Once the done channel is closed, it prints the elapsed time.
Alternatively, we can use a sync.WaitGroup to wait until all goroutines have finished their computation, which is what the code below does.
You can then start with one goroutine and increase the number of goroutines to see the impact on the processing time.
See the code:
package main
import (
"fmt"
"math/rand"
"sync"
"time"
)
type pair struct {
row, col int
}
const length = 1000
var start time.Time
var rez [length][length]int
func main() {
const threadlength = 1
pairs := make(chan pair, 1000)
var wg sync.WaitGroup
var a [length][length]int
var b [length][length]int
for i := 0; i < length; i++ {
for j := 0; j < length; j++ {
a[i][j] = rand.Intn(10)
b[i][j] = rand.Intn(10)
}
}
wg.Add(threadlength)
for i := 0; i < threadlength; i++ {
go Calc(pairs, &a, &b, &rez, &wg)
}
start = time.Now()
for i := 0; i < length; i++ {
for j := 0; j < length; j++ {
pairs <- pair{row: i, col: j}
}
}
close(pairs)
wg.Wait()
elapsed := time.Since(start)
fmt.Println("Binomial took ", elapsed)
for i := 0; i < length; i++ {
for j := 0; j < length; j++ {
fmt.Print(rez[i][j])
fmt.Print(" ")
}
fmt.Println(" ")
}
}
func Calc(pairs chan pair, a, b, rez *[length][length]int, wg *sync.WaitGroup) {
for {
pair, ok := <-pairs
if !ok {
break
}
rez[pair.row][pair.col] = 0
for i := 0; i < length; i++ {
rez[pair.row][pair.col] += a[pair.row][i] * b[i][pair.col]
}
}
wg.Done()
}
Your code is quite difficult to follow (calling variables dummy1/dummy2 is confusing, particularly when they get different names in Calc), and adding some comments would make it more easily understood.
Firstly, a bug. After sending the data to be calculated you do dummy1 <- -1, and I believe you expect this to wait for all calculations to be complete. However, that is not necessarily the case when you have multiple goroutines. The channel will be drained by ONE of the goroutines and the timing info printed out; other goroutines will still be running (and may not have finished their calculations).
In terms of timing, I suspect that the way you are sending data to the goroutines slows things down: you send the row and then the column; because the channels are not buffered, the goroutine will block while waiting for the column (switching back to the main goroutine to send the column). This back and forth slows the rate at which your goroutines get data and may well explain why adding extra goroutines has a limited impact (it also becomes dangerous if you use buffered channels).
I have refactored your code (note there may be bugs and it's far from perfect!) into something that does show a difference (on my computer 1 goroutine = 10s; 5 = 7s):
package main
import (
"fmt"
"math/rand"
"sync"
"time"
)
const length = 1000
var start time.Time
var rez [length][length]int
// toMultiply will hold details of what the goroutine will be multiplying (one row and one column)
type toMultiply struct {
rowNo int
columnNo int
row []int
column []int
}
func main() {
const noOfGoRoutines = 5
// Build up a matrix of dimensions (length) x (length)
var a [length][length]int
var b [length][length]int
for i := 0; i < length; i++ {
for j := 0; j < length; j++ {
a[i][j] = rand.Intn(10)
b[i][j] = rand.Intn(10)
}
}
// Setup completed so start the clock...
start = time.Now()
// Start off noOfGoRoutines goroutines to multiply each row/column
toCalc := make(chan toMultiply)
var wg sync.WaitGroup
wg.Add(noOfGoRoutines)
for i := 0; i < noOfGoRoutines; i++ {
go func() {
Calc(toCalc)
wg.Done()
}()
}
// Begin the multiplication.
start = time.Now()
for i := 0; i < length; i++ {
for j := 0; j < length; j++ {
tm := toMultiply{
rowNo: i,
columnNo: j,
row: make([]int, length),
column: make([]int, length),
}
for k := 0; k < length; k++ {
tm.row[k] = a[i][k]
tm.column[k] = b[k][j]
}
toCalc <- tm
}
}
// All of the data has been sent to the channel; now we need to wait for all of the
// goroutines to complete
close(toCalc)
wg.Wait()
fmt.Println("Binomial took ", time.Since(start))
// The full result should be in rez
for i := 0; i < length; i++ {
for j := 0; j < length; j++ {
//fmt.Print(rez[i][j])
//fmt.Print(" ")
}
//fmt.Println(" ")
}
}
// Calc - Multiply a row from one matrix with a column from another
func Calc(toCalc <-chan toMultiply) {
for tc := range toCalc {
var result int
for i := 0; i < len(tc.row); i++ {
result += tc.row[i] * tc.column[i]
}
// warning - the below should work in this case but be careful writing to global variables from goroutines
rez[tc.rowNo][tc.columnNo] = result
}
}
I'm new to Go. I have this project about matrices and their operations, and I am supposed to use a goroutine in my multiply function. (Also, a disclaimer: this is not my code; it's a group project and I'm supposed to add the goroutine to the function.)
I've searched around and seen this approach (the one I used) on different sites. However, I get a fatal error: all goroutines are asleep - deadlock!
Or it just takes way too long for the code to run.
func (M Matrix) Multiply(N Matrix) Matrix {
var ans = BuildZeroMatrix(len(M), len(N[0]))
var wg sync.WaitGroup
wg.Add(len(M)*len(N[0])*len(N))
for i := 0; i < len(M); i++ {
for j := 0; j < len(N[0]); j++ {
for k := 0; k < len(N); k++ {
wg.Add(1)
go func(j int, i int, k int) {
ans[i][j] += M[i][k] * N[k][j]
defer wg.Done()
}(j, i, k)
}
}
}
go func(){
wg.Wait()
}()
return ans
}
This is the Multiply function.
type Matrix [][]float64
This is the Matrix type.
func BuildZeroMatrix(Row int, Col int) Matrix {
var temp [][]float64
for i := 0; i < Row; i++ {
var tmp []float64
for j := 0; j < Col; j++ {
tmp = append(tmp, 0)
}
temp = append(temp, tmp)
}
M, _ := BuildMatrix(Row, Col, temp)
return M
}
And this is the BuildZeroMatrix function.
I give the function two 500*500 matrices and it takes about 10 seconds to run, or it gives the fatal error.
Every post I found on this problem had channels in it and the problem was with the channel; however, I don't have any.
Just pointing out the mistakes in this snippet. Please check the inline comments.
func (M Matrix) Multiply(N Matrix) Matrix {
var ans = BuildZeroMatrix(len(M), len(N[0]))
var wg sync.WaitGroup
wg.Add(len(M)*len(N[0])*len(N)) // This is unnecessary. You already call wg.Add(1) inside the k loop.
for i := 0; i < len(M); i++ {
for j := 0; j < len(N[0]); j++ {
for k := 0; k < len(N); k++ {
wg.Add(1)
go func(j int, i int, k int) {
ans[i][j] += M[i][k] * N[k][j] // This is a data race: multiple goroutines will try to write to the same address. You will need a lock here.
defer wg.Done()
}(j, i, k)
}
}
}
go func(){
wg.Wait() // Wait()ing for the goroutines to complete should not be done inside another goroutine, because nothing waits for this goroutine itself to finish before Multiply returns.
}()
return ans
}
UPDATED: Please check the sample code below.
package main
import (
"fmt"
"sync"
)
type Matrix [][]float64
func (M Matrix) Multiply(N Matrix) Matrix {
var ans = BuildZeroMatrix(len(M), len(N[0]))
var wg sync.WaitGroup
var mx sync.Mutex
for i := 0; i < len(M); i++ {
for j := 0; j < len(N[0]); j++ {
for k := 0; k < len(N); k++ {
wg.Add(1)
go func(j int, i int, k int) {
defer wg.Done()
mx.Lock()
ans[i][j] += M[i][k] * N[k][j]
mx.Unlock()
}(j, i, k)
}
}
}
wg.Wait()
return ans
}
func BuildZeroMatrix(Row int, Col int) Matrix {
var temp [][]float64
for i := 0; i < Row; i++ {
var tmp []float64
for j := 0; j < Col; j++ {
tmp = append(tmp, 0)
}
temp = append(temp, tmp)
}
return temp
}
func main() {
var m, n Matrix
// T1:
m = Matrix{[]float64{1, 1, 1}, []float64{1, 1, 1}, []float64{1, 1, 1}}
n = Matrix{[]float64{1, 1, 1}, []float64{1, 1, 1}, []float64{1, 1, 1}}
fmt.Printf("%v", m.Multiply(n))
// T2:
m = Matrix{[]float64{1, 1, 1}, []float64{1, 1, 1}, []float64{1, 1, 1}}
n = Matrix{[]float64{1, 1}, []float64{1, 1}, []float64{1, 1}}
fmt.Printf("\n%v", m.Multiply(n))
}
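As a side note, the mutex can be avoided entirely if each goroutine owns a distinct part of the output, since no two goroutines then write to the same cell. Below is a minimal sketch (not part of the original answer) with one goroutine per result row, reusing the Matrix type, BuildZeroMatrix and the sync import from the sample above; MultiplyRowWise is a hypothetical name:
func (M Matrix) MultiplyRowWise(N Matrix) Matrix {
	ans := BuildZeroMatrix(len(M), len(N[0]))
	var wg sync.WaitGroup
	for i := 0; i < len(M); i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// This goroutine is the only writer of row i, so no lock is needed.
			for j := 0; j < len(N[0]); j++ {
				for k := 0; k < len(N); k++ {
					ans[i][j] += M[i][k] * N[k][j]
				}
			}
		}(i)
	}
	wg.Wait()
	return ans
}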
Without goroutines: 500 * 100000000 increment operations
// 1m12.2857724s
start := time.Now()
for i := 0; i < 500; i++ {
res := 0
for j := 0; j < 100000000; j++ {
res++
}
}
duration := time.Since(start)
fmt.Println(duration)
With goroutines: 10 goroutines each execute 50 * 100000000 increment operations
// 1m12.0174541s
start := time.Now()
ch := make(chan bool)
for i := 0; i < 10; i++ {
go func(ch chan bool) {
for i := 0; i < 50; i++ {
res := 0
for j := 0; j < 100000000; j++ {
res++
}
}
ch <- true
}(ch)
<- ch
}
duration := time.Since(start)
fmt.Println(duration)
Why does using goroutines not save any time?
The ch channel is unbuffered. You launch a goroutine that sends a value on the channel at the end, and right after launching it, before launching another goroutine, you receive from that channel. This is a blocking operation: you won't start a new goroutine until the previous one has finished. You gain nothing compared to the first solution.
One "solution" is to make the channel buffered, and only start receiving from it once all goroutines have been launched:
ch := make(chan bool, 10)
for i := 0; i < 10; i++ {
go func(ch chan bool) {
for i := 0; i < 50; i++ {
res := 0
for j := 0; j < 100000000; j++ {
res++
}
}
ch <- true
}(ch)
}
for i := 0; i < 10; i++ {
<-ch
}
This will result in almost 4x speedup on my computer (4 CPU cores).
A better, more idiomatic way to wait for all goroutines is to use sync.WaitGroup:
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for i := 0; i < 50; i++ {
res := 0
for j := 0; j < 100000000; j++ {
res++
}
}
}()
}
wg.Wait()
Also note that using multiple goroutines is only worth it if the task they do is "significant", see:
Matrix multiplication with goroutine drops performance
Vectorise a function taking advantage of concurrency
You can launch the 10 goroutines and then receive from each of them once they have completed their work, in a simple way.
Check the CPU usage of your code against this one:
func add(ch chan int) {
r := 0
for i := 0; i < 50; i++ {
for j := 0; j < 100000000; j++ {
r++
}
}
ch <- r
}
func main() {
start := time.Now()
ch := make(chan int)
res := 0
for i := 0; i < 10; i++ {
go add(ch)
}
for i := 0; i < 10; i++ {
res += <-ch
}
duration := time.Since(start)
fmt.Println(duration, res)
}
Output
3.747405005s 50000000000
That will successfully launch the 10 goroutines, each of them perform their work and return the results down the channel once they're done.
Referring to the following benchmark test code:
func BenchmarkRuneCountNoDefault(b *testing.B) {
b.StopTimer()
var strings []string
numStrings := 10
for n := 0; n < numStrings; n++{
s := RandStringBytesMaskImprSrc(10)
strings = append(strings, s)
}
jobs := make(chan string)
results := make (chan int)
for i := 0; i < runtime.NumCPU(); i++{
go RuneCountNoDefault(jobs, results)
}
b.StartTimer()
for n := 0; n < b.N; n++ {
go func(){
for n := 0; n < numStrings; n++{
<-results
}
return
}()
for n := 0; n < numStrings; n++{
jobs <- strings[n]
}
}
close(jobs)
}
func RuneCountNoDefault(jobs chan string, results chan int){
for{
select{
case j, ok := <-jobs:
if ok{
results <- utf8.RuneCountInString(j)
} else {
return
}
}
}
}
func BenchmarkRuneCountWithDefault(b *testing.B) {
b.StopTimer()
var strings []string
numStrings := 10
for n := 0; n < numStrings; n++{
s := RandStringBytesMaskImprSrc(10)
strings = append(strings, s)
}
jobs := make(chan string)
results := make (chan int)
for i := 0; i < runtime.NumCPU(); i++{
go RuneCountWithDefault(jobs, results)
}
b.StartTimer()
for n := 0; n < b.N; n++ {
go func(){
for n := 0; n < numStrings; n++{
<-results
}
return
}()
for n := 0; n < numStrings; n++{
jobs <- strings[n]
}
}
close(jobs)
}
func RuneCountWithDefault(jobs chan string, results chan int){
for{
select{
case j, ok := <-jobs:
if ok{
results <- utf8.RuneCountInString(j)
} else {
return
}
default: //DIFFERENCE
}
}
}
//https://stackoverflow.com/questions/22892120/how-to-generate-a-random-string-of-a-fixed-length-in-golang
const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
const (
letterIdxBits = 6 // 6 bits to represent a letter index
letterIdxMask = 1<<letterIdxBits - 1 // All 1-bits, as many as letterIdxBits
letterIdxMax = 63 / letterIdxBits // # of letter indices fitting in 63 bits
)
var src = rand.NewSource(time.Now().UnixNano())
func RandStringBytesMaskImprSrc(n int) string {
b := make([]byte, n)
// A src.Int63() generates 63 random bits, enough for letterIdxMax characters!
for i, cache, remain := n-1, src.Int63(), letterIdxMax; i >= 0; {
if remain == 0 {
cache, remain = src.Int63(), letterIdxMax
}
if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
b[i] = letterBytes[idx]
i--
}
cache >>= letterIdxBits
remain--
}
return string(b)
}
When I benchmarked both functions (RuneCountNoDefault, which has no default clause in its select, and RuneCountWithDefault, which does), I got the following results:
BenchmarkRuneCountNoDefault-4 200000 8910 ns/op
BenchmarkRuneCountWithDefault-4 5 277798660 ns/op
Checking the CPU profile generated by the tests, I noticed that the function with the default clause spends a lot of time in channel operations.
Why does having a default clause in the goroutine's select make it slower?
I'm using Go version 1.10 for windows/amd64
The Go Programming Language Specification
Select statements
If one or more of the communications can proceed, a single one that can proceed is chosen via a uniform pseudo-random selection. Otherwise, if there is a default case, that case is chosen. If there is no default case, the "select" statement blocks until at least one of the communications can proceed.
Modifying your benchmark to count the number of proceed and default cases taken:
$ go test default_test.go -bench=.
goos: linux
goarch: amd64
BenchmarkRuneCountNoDefault-4 300000 4108 ns/op
BenchmarkRuneCountWithDefault-4 10 209890782 ns/op
--- BENCH: BenchmarkRuneCountWithDefault-4
default_test.go:90: proceeds 114
default_test.go:91: defaults 128343308
$
While the other cases were unable to proceed, the default case was taken 128343308 times in 209422470 (209890782 - 114*4108) nanoseconds, or about 1.63 nanoseconds per default case. If you do something small a large number of times, it adds up.
default_test.go:
package main
import (
"math/rand"
"runtime"
"sync/atomic"
"testing"
"time"
"unicode/utf8"
)
func BenchmarkRuneCountNoDefault(b *testing.B) {
b.StopTimer()
var strings []string
numStrings := 10
for n := 0; n < numStrings; n++ {
s := RandStringBytesMaskImprSrc(10)
strings = append(strings, s)
}
jobs := make(chan string)
results := make(chan int)
for i := 0; i < runtime.NumCPU(); i++ {
go RuneCountNoDefault(jobs, results)
}
b.StartTimer()
for n := 0; n < b.N; n++ {
go func() {
for n := 0; n < numStrings; n++ {
<-results
}
return
}()
for n := 0; n < numStrings; n++ {
jobs <- strings[n]
}
}
close(jobs)
}
func RuneCountNoDefault(jobs chan string, results chan int) {
for {
select {
case j, ok := <-jobs:
if ok {
results <- utf8.RuneCountInString(j)
} else {
return
}
}
}
}
var proceeds, defaults uint64
func BenchmarkRuneCountWithDefault(b *testing.B) {
b.StopTimer()
var strings []string
numStrings := 10
for n := 0; n < numStrings; n++ {
s := RandStringBytesMaskImprSrc(10)
strings = append(strings, s)
}
jobs := make(chan string)
results := make(chan int)
for i := 0; i < runtime.NumCPU(); i++ {
go RuneCountWithDefault(jobs, results)
}
b.StartTimer()
for n := 0; n < b.N; n++ {
go func() {
for n := 0; n < numStrings; n++ {
<-results
}
return
}()
for n := 0; n < numStrings; n++ {
jobs <- strings[n]
}
}
close(jobs)
b.Log("proceeds", atomic.LoadUint64(&proceeds))
b.Log("defaults", atomic.LoadUint64(&defaults))
}
func RuneCountWithDefault(jobs chan string, results chan int) {
for {
select {
case j, ok := <-jobs:
atomic.AddUint64(&proceeds, 1)
if ok {
results <- utf8.RuneCountInString(j)
} else {
return
}
default: //DIFFERENCE
atomic.AddUint64(&defaults, 1)
}
}
}
//https://stackoverflow.com/questions/22892120/how-to-generate-a-random-string-of-a-fixed-length-in-golang
const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
const (
letterIdxBits = 6 // 6 bits to represent a letter index
letterIdxMask = 1<<letterIdxBits - 1 // All 1-bits, as many as letterIdxBits
letterIdxMax = 63 / letterIdxBits // # of letter indices fitting in 63 bits
)
var src = rand.NewSource(time.Now().UnixNano())
func RandStringBytesMaskImprSrc(n int) string {
b := make([]byte, n)
// A src.Int63() generates 63 random bits, enough for letterIdxMax characters!
for i, cache, remain := n-1, src.Int63(), letterIdxMax; i >= 0; {
if remain == 0 {
cache, remain = src.Int63(), letterIdxMax
}
if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
b[i] = letterBytes[idx]
i--
}
cache >>= letterIdxBits
remain--
}
return string(b)
}
Playground: https://play.golang.org/p/DLnAY0hovQG
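As an aside, here is a minimal sketch (not part of the original benchmark; RuneCountRange is a hypothetical name) of a worker that avoids the spinning entirely by ranging over the jobs channel instead of using select with a default case. The receive blocks until a job arrives, and the loop exits when the channel is closed:
func RuneCountRange(jobs chan string, results chan int) {
	// Blocks on receive instead of spinning through a default case.
	for j := range jobs {
		results <- utf8.RuneCountInString(j)
	}
}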
Now I am starting to learn the Go language by watching this great course. To be clear, for years I have written only PHP, and concurrency/parallelism is new to me, so I am a little confused by this.
In this course, there is a task to create a program which calculates factorials with 100 computations. I went a bit further, and to compare performance I changed it to 10000; for some reason, the sequential program runs just as fast or even faster than the concurrent one.
Here I'm going to provide 3 solutions: mine, my teacher's, and a sequential one.
My solution:
package main
import (
"fmt"
)
func gen(steps int) <-chan int{
out := make(chan int)
go func() {
for j:= 0; j <steps; j++ {
out <- j
}
close(out)
}()
return out
}
func factorial(in <-chan int) <-chan int {
out := make(chan int)
go func() {
for n := range in {
out <- fact(n)
}
close(out)
}()
return out
}
func fact(n int) int {
total := 1
for i := n;i>0;i-- {
total *=i
}
return total
}
func main() {
steps := 10000
for i := 0; i < steps; i++ {
for n:= range factorial(gen(10)) {
fmt.Println(n)
}
}
}
execution time:
real 0m6,356s
user 0m3,885s
sys 0m0,870s
Teacher's solution:
package main
import (
"fmt"
)
func gen(steps int) <-chan int{
out := make(chan int)
go func() {
for i := 0; i < steps; i++ {
for j:= 0; j <10; j++ {
out <- j
}
}
close(out)
}()
return out
}
func factorial(in <-chan int) <-chan int {
out := make(chan int)
go func() {
for n := range in {
out <- fact(n)
}
close(out)
}()
return out
}
func fact(n int) int {
total := 1
for i := n;i>0;i-- {
total *=i
}
return total
}
func main() {
steps := 10000
for n:= range factorial(gen(steps)) {
fmt.Println(n)
}
}
execution time:
real 0m2,836s
user 0m1,388s
sys 0m0,492s
Sequential:
package main
import (
"fmt"
)
func fact(n int) int {
total := 1
for i := n;i>0;i-- {
total *=i
}
return total
}
func main() {
steps := 10000
for i := 0; i < steps; i++ {
for j:= 0; j <10; j++ {
fmt.Println(fact(j))
}
}
}
execution time:
real 0m2,513s
user 0m1,113s
sys 0m0,387s
So, as you can see, the sequential solution is the fastest, the teacher's solution is in second place, and my solution is third.
First question: why is the sequential solution the fastest?
And second: why is my solution so slow? If I understand correctly, in my solution I'm creating 10000 goroutines inside gen and 10000 inside factorial, while in the teacher's solution he creates only 1 goroutine in gen and 1 in factorial. Is mine so slow because I'm creating too many unneeded goroutines?
It's the difference between concurrency and parallelism: yours, your teacher's, and the sequential version are progressively less concurrent in design, but how parallel they are depends on the number of CPU cores, and there is a setup and communication cost associated with concurrency. There are no asynchronous calls in the code, so only parallelism will improve speed.
This is worth a look: https://blog.golang.org/concurrency-is-not-parallelism
Also, even with parallel cores, the speedup will depend on the nature of the workload - see Amdahl's law for an explanation.
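Roughly, Amdahl's law says that if a fraction p of the program can run in parallel on n cores, the best possible speedup is 1 / ((1 - p) + p/n); for example, with p = 0.5 and n = 4 cores that is only 1 / (0.5 + 0.125) = 1.6x, no matter how many goroutines you start.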
Let's start with some fundamental benchmarks for factorial computation.
$ go test -run=! -bench=. factorial_test.go
goos: linux
goarch: amd64
BenchmarkFact0-4 1000000000 2.07 ns/op
BenchmarkFact9-4 300000000 4.37 ns/op
BenchmarkFact0To9-4 50000000 36.0 ns/op
BenchmarkFact10K0To9-4 3000 384069 ns/op
$
The CPU time is very small, even for 10,000 iterations of factorials zero through nine.
factorial_test.go:
package main
import "testing"
func fact(n int) int {
total := 1
for i := n; i > 0; i-- {
total *= i
}
return total
}
var sinkFact int
func BenchmarkFact0(b *testing.B) {
for N := 0; N < b.N; N++ {
j := 0
sinkFact = fact(j)
}
}
func BenchmarkFact9(b *testing.B) {
for N := 0; N < b.N; N++ {
j := 9
sinkFact = fact(j)
}
}
func BenchmarkFact0To9(b *testing.B) {
for N := 0; N < b.N; N++ {
for j := 0; j < 10; j++ {
sinkFact = fact(j)
}
}
}
func BenchmarkFact10K0To9(b *testing.B) {
for N := 0; N < b.N; N++ {
steps := 10000
for i := 0; i < steps; i++ {
for j := 0; j < 10; j++ {
sinkFact = fact(j)
}
}
}
}
Let's look at the time for the sequential program.
$ go build -a sequential.go && time ./sequential
real 0m0.247s
user 0m0.054s
sys 0m0.149s
Writing to the terminal is obviously a major bottleneck. Let's write to a sink.
$ go build -a sequential.go && time ./sequential > /dev/null
real 0m0.070s
user 0m0.049s
sys 0m0.020s
It's still a lot more than the 0m0.000384069s for the factorial computation.
sequential.go:
package main
import (
"fmt"
)
func fact(n int) int {
total := 1
for i := n; i > 0; i-- {
total *= i
}
return total
}
func main() {
steps := 10000
for i := 0; i < steps; i++ {
for j := 0; j < 10; j++ {
fmt.Println(fact(j))
}
}
}
Attempts to use concurrency for such a trivial amount of parallel work are likely to fail. Go goroutines and channels are cheap, but they are not free. Also, a single channel and a single terminal are the bottleneck, the limiting factor, even when writing to a sink. See Amdahl's Law for parallel computing. See Concurrency is not parallelism.
$ go build -a teacher.go && time ./teacher > /dev/null
real 0m0.123s
user 0m0.123s
sys 0m0.022s
$ go build -a student.go && time ./student > /dev/null
real 0m0.135s
user 0m0.113s
sys 0m0.038s
teacher.go:
package main
import (
"fmt"
)
func gen(steps int) <-chan int {
out := make(chan int)
go func() {
for i := 0; i < steps; i++ {
for j := 0; j < 10; j++ {
out <- j
}
}
close(out)
}()
return out
}
func factorial(in <-chan int) <-chan int {
out := make(chan int)
go func() {
for n := range in {
out <- fact(n)
}
close(out)
}()
return out
}
func fact(n int) int {
total := 1
for i := n; i > 0; i-- {
total *= i
}
return total
}
func main() {
steps := 10000
for n := range factorial(gen(steps)) {
fmt.Println(n)
}
}
student.go:
package main
import (
"fmt"
)
func gen(steps int) <-chan int {
out := make(chan int)
go func() {
for j := 0; j < steps; j++ {
out <- j
}
close(out)
}()
return out
}
func factorial(in <-chan int) <-chan int {
out := make(chan int)
go func() {
for n := range in {
out <- fact(n)
}
close(out)
}()
return out
}
func fact(n int) int {
total := 1
for i := n; i > 0; i-- {
total *= i
}
return total
}
func main() {
steps := 10000
for i := 0; i < steps; i++ {
for n := range factorial(gen(10)) {
fmt.Println(n)
}
}
}
Why is there a deadlock even though I just pass one value in and get one output from each channel?
package main
import "fmt"
import "math/cmplx"
func max(a []complex128, base int, ans chan float64, index chan int) {
fmt.Printf("called for %d,%d\n",len(a),base)
maxi_i := 0
maxi := cmplx.Abs(a[maxi_i]);
for i:=1 ; i< len(a) ; i++ {
if cmplx.Abs(a[i]) > maxi {
maxi_i = i
maxi = cmplx.Abs(a[i])
}
}
fmt.Printf("called for %d,%d and found %f %d\n",len(a),base,maxi,base+maxi_i)
ans <- maxi
index <- base+maxi_i
}
func main() {
ans := make([]complex128,128)
numberOfSlices := 4
incr := len(ans)/numberOfSlices
tmp_val := make([]chan float64,numberOfSlices)
tmp_index := make([]chan int,numberOfSlices)
for i,j := 0 , 0; i < len(ans); j++{
fmt.Printf("From %d to %d - %d\n",i,i+incr,len(ans))
go max(ans[i:i+incr],i,tmp_val[j],tmp_index[j])
i = i+ incr
}
// After here it stops: deadlock
maximumFreq := <- tmp_index[0]
maximumMax := <- tmp_val[0]
for i := 1; i < numberOfSlices; i++ {
tmpI := <- tmp_index[i]
tmpV := <- tmp_val[i]
if(tmpV > maximumMax ) {
maximumMax = tmpV
maximumFreq = tmpI
}
}
fmt.Printf("Max freq = %d",maximumFreq)
}
For those reading this question and perhaps wondering why his code failed, here's an explanation.
When he constructed his slice of channels like so:
tmp_val := make([]chan float64,numberOfSlices)
He made a slice of channels where every element was the channel type's zero value. A channel's zero value is nil since channels are reference types, and a nil channel blocks on send forever; since there is never anything in a nil channel, it will also block on receive forever. Thus you get a deadlock.
When footy changed his code to construct each channel individually using
tmp_val[i] = make(chan float64)
in a loop he constructs non-nil channels and everything is good.
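For illustration, here is a minimal standalone sketch (not from the question) showing that a send on a nil channel blocks forever, so the runtime reports a deadlock:
package main

func main() {
	var ch chan float64 // the zero value of a channel type is nil
	ch <- 1.0           // blocks forever: fatal error: all goroutines are asleep - deadlock!
}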
I was wrong in how I made the channels. I should have done:
numberOfSlices := 4
incr := len(ans)/numberOfSlices
var tmp_val [4]chan float64
var tmp_index [4]chan int
for i := range tmp_val {
tmp_val[i] = make(chan float64)
tmp_index[i] = make(chan int)
}
for i,j := 0 , 0; i < len(ans); j++{
fmt.Printf("From %d to %d [j:%d] - %d\n",i,i+incr,j,len(ans))
go maximumFunc(ans[i:i+incr],i,tmp_val[j],tmp_index[j])
i = i+ incr
}