Say, we have three methods to implement "fan in" behavior
func MakeChannel(tries int) chan int {
ch := make(chan int)
go func() {
for i := 0; i < tries; i++ {
ch <- i
}
close(ch)
}()
return ch
}
func MergeByReflection(channels ...chan int) chan int {
length := len(channels)
out := make(chan int)
cases := make([]reflect.SelectCase, length)
for i, ch := range channels {
cases[i] = reflect.SelectCase{Dir: reflect.SelectRecv, Chan: reflect.ValueOf(ch)}
}
go func() {
for length > 0 {
i, line, opened := reflect.Select(cases)
if !opened {
cases[i].Chan = reflect.ValueOf(nil)
length -= 1
} else {
out <- int(line.Int())
}
}
close(out)
}()
return out
}
func MergeByCode(channels ...chan int) chan int {
length := len(channels)
out := make(chan int)
go func() {
var i int
var ok bool
for length > 0 {
select {
case i, ok = <-channels[0]:
out <- i
if !ok {
channels[0] = nil
length -= 1
}
case i, ok = <-channels[1]:
out <- i
if !ok {
channels[1] = nil
length -= 1
}
case i, ok = <-channels[2]:
out <- i
if !ok {
channels[2] = nil
length -= 1
}
case i, ok = <-channels[3]:
out <- i
if !ok {
channels[3] = nil
length -= 1
}
case i, ok = <-channels[4]:
out <- i
if !ok {
channels[4] = nil
length -= 1
}
}
}
close(out)
}()
return out
}
func MergeByGoRoutines(channels ...chan int) chan int {
var group sync.WaitGroup
out := make(chan int)
for _, ch := range channels {
go func(ch chan int) {
for i := range ch {
out <- i
}
group.Done()
}(ch)
}
group.Add(len(channels))
go func() {
group.Wait()
close(out)
}()
return out
}
type MergeFn func(...chan int) chan int
func main() {
length := 5
tries := 1000000
channels := make([]chan int, length)
fns := []MergeFn{MergeByReflection, MergeByCode, MergeByGoRoutines}
for _, fn := range fns {
sum := 0
t := time.Now()
for i := 0; i < length; i++ {
channels[i] = MakeChannel(tries)
}
for i := range fn(channels...) {
sum += i
}
fmt.Println(time.Since(t))
fmt.Println(sum)
}
}
Results are (at 1 CPU, I have used runtime.GOMAXPROCS(1)):
19.869s (MergeByReflection)
2499997500000
8.483s (MergeByCode)
2499997500000
4.977s (MergeByGoRoutines)
2499997500000
Results are (at 2 CPU, I have used runtime.GOMAXPROCS(2)):
44.94s
2499997500000
10.853s
2499997500000
3.728s
2499997500000
I understand the reason why MergeByReflection is slowest, but what is about the difference between MergeByCode and MergeByGoRoutines?
And when we increase the CPU number why "select" clause (used MergeByReflection directly and in MergeByCode indirectly) becomes slower?
Here is a preliminary remark. The channels in your examples are all unbuffered, meaning they will likely block at put or get time.
In this example, there is almost no processing except channel management. The performance is therefore dominated by synchronization primitives. Actually, there is very little of this code that can be parallelized.
In the MergeByReflection and MergeByCode functions, select is used to listen to multiple input channels, but nothing is done to take in account the output channel (which may therefore block, while some event could be available on one of the input channels).
In the MergeByGoRoutines function, this situation cannot happen: when the output channel blocks, it does not prevent an other input channel to be read by another goroutine. There are therefore better opportunities for the runtime to parallelize the goroutines, and less contention on the input channels.
The MergeByReflection code is the slowest because it has the overhead of reflection, and almost nothing can be parallelized.
The MergeByGoRoutines function is the fastest because it reduces the contention (less synchronization is needed), and because output contention has a lesser impact on the input performance. It can therefore benefit of a small improvement when running with multiple cores (contrary to the two other methods).
There is so much synchronization activity with MergeByReflection and MergeByCode, that running on multiple cores negatively impacts the performance. You could have different performance by using buffered channels though.
Related
I want to make X number of goroutines to update CountValue using parallelism (numRoutines are as much as how many CPU you have).
Solution 1:
func count(numRoutines int) (countValue int) {
var mu sync.Mutex
k := func(i int) {
mu.Lock()
defer mu.Unlock()
countValue += 5
}
for i := 0; i < numRoutines; i++ {
go k(i)
}
It becomes a data race and the returned countValue = 0.
Solution 2:
func count(numRoutines int) (countValue int) {
k := func(i int, c chan int) {
c <- 5
}
c := make(chan int)
for i := 0; i < numRoutines; i++ {
go k(i, c)
}
for i := 0; i < numRoutines; i++ {
countValue += <- c
}
return
}
I did a benchmark test on it and doing a sequential addition would work faster than using goroutines. I think it's because I have two for loops here as when I put countValue += <- c inside the first for loop, the code runs faster.
Solution 3:
func count(numRoutines int) (countValue int) {
var wg sync.WaitGroup
c := make(chan int)
k := func(i int) {
defer wg.Done()
c <- 5
}
for i := 0; i < numShards; i++ {
wg.Add(1)
go k(i)
}
go func() {
for i := range c {
countValue += i
}
}()
wg.Wait()
return
}
Still a race count :/
Is there any way better to do this?
There definitely is a better way to safely increment a variable: using sync/atomic:
import "sync/atomic"
var words int64
k := func() {
_ = atomic.AddInt64(&words, 5) // increment atomically
}
Using a channel basically eliminates the need for a mutex, or takes away the the risk of concurrent access to the variable itself, and a waitgroup here is just a bit overkill
Channels:
words := 0
done := make(chan struct{}) // or use context
ch := make(chan int, numRoutines) // buffer so each routine can write
go func () {
read := 0
for i := range ch {
words += 5 // or use i or something
read++
if read == numRoutines {
break // we've received data from all routines
}
}
close(done) // indicate this routine has terminated
}()
for i := 0; i < numRoutines; i++ {
ch <- i // write whatever value needs to be used in the counting routine on the channel
}
<- done // wait for our routine that increments words to return
close(ch) // this channel is no longer needed
fmt.Printf("Counted %d\n", words)
As you can tell, the numRoutines no longer is the number of routines, but rather the number of writes on the channel. You can move that to individual routines, still:
for i := 0; i < numRoutines; i++ {
go func(ch chan<- int, i int) {
// do stuff here
ch <- 5 * i // for example
}(ch, i)
}
WaitGroup:
Instead of using a context that you can cancel, or a channel, you can use a waitgroup + atomic to get the same result. The easiest way IMO to do so is to create a type:
type counter struct {
words int64
}
func (c *counter) doStuff(wg *sync.WaitGroup, i int) {
defer wg.Done()
_ = atomic.AddInt64(&c.words, i * 5) // whatever value you need to add
}
func main () {
cnt := counter{}
wg := sync.WaitGroup{}
wg.Add(numRoutines) // create the waitgroup
for i := 0; i < numRoutines; i++ {
go cnt.doStuff(&wg, i)
}
wg.Wait() // wait for all routines to finish
fmt.Println("Counted %d\n", cnt.words)
}
Fix for your third solution
As I mentioned in the comment: your third solution is still causing a race condition because the channel c is never closed, meaning the routine:
go func () {
for i := range c {
countValue += i
}
}()
Never returns. The waitgroup also only ensures that you've sent all values on the channel, but not that the countValue has been incremented to its final value. The fix would be to either close the channel after wg.Wait() returns so the routine can return, and add a done channel that you can close when this last routine returns, and add a <-done statement before returning.
func count(numRoutines int) (countValue int) {
var wg sync.WaitGroup
c := make(chan int)
k := func(i int) {
defer wg.Done()
c <- 5
}
for i := 0; i < numShards; i++ {
wg.Add(1)
go k(i)
}
done := make(chan struct{})
go func() {
for i := range c {
countValue += i
}
close(done)
}()
wg.Wait()
close(c)
<-done
return
}
This adds some clutter, though, and IMO is a bit messy. It might just be easier to just move the wg.Wait() call to a routine:
func count(numRoutines int) (countValue int) {
var wg sync.WaitGroup
c := make(chan int)
// add wg as argument, makes it easier to move this function outside of this scope
k := func(wg *sync.WaitGroup, i int) {
defer wg.Done()
c <- 5
}
wg.Add(numShards) // increment the waitgroup once
for i := 0; i < numShards; i++ {
go k(&wg, i)
}
go func() {
wg.Wait()
close(c) // this ends the loop over the channel
}()
// just iterate over the channel until it is closed
for i := range c {
countValue += i
}
// we've added all values to countValue
return
}
Im studying Golang now on my freetime and I am trying sample exams online to test what i learned,
I came about this coding exam task but I cant seem to make it work/run without a crash,
im getting fatal error: all goroutines are asleep - deadlock! error, can anybody help what I am doing wrong here?
func executeParallel(ch chan<- int, done chan<- bool, functions ...func() int) {
ch <- functions[1]()
done <- true
}
func exampleFunction(counter int) int {
sum := 0
for i := 0; i < counter; i++ {
sum += 1
}
return sum
}
func main() {
expensiveFunction := func() int {
return exampleFunction(200000000)
}
cheapFunction := func() int {return exampleFunction(10000000)}
ch := make(chan int)
done := make(chan bool)
go executeParallel(ch, done, expensiveFunction, cheapFunction)
var isDone = <-done
for result := range ch {
fmt.Printf("Result: %d\n", result)
if isDone {
break;
}
}
}
Your executeParallel function will panic if less than 2 functions are provided - and will only run the 2nd function:
ch <- functions[1]() // runtime panic if less then 2 functions
I think it should look more like this: running all input functions in parallel and grabbing the first result.
for _, fn := range functions {
fn := fn // so each iteration/goroutine gets the proper value
go func() {
select {
case ch <- fn():
// first (fastest worker) wins
default:
// other workers results are discarded (if reader has not read results yet)
// this ensure we don't leak goroutines - since reader only reads one result from channel
}
}()
}
As such there's no need for a done channel - as we just need to read the one and only (quickest) result:
ch := make(chan int, 1) // big enough to capture one result - even if reader is not reading yet
executeParallel(ch, expensiveFunction, cheapFunction)
fmt.Printf("Result: %d\n", <-ch)
https://play.golang.org/p/skXc3gZZmRn
package main
import "fmt"
func executeParallel(ch chan<- int, done chan<- struct{}, functions ...func() int) {
// Only execute the second function [1], if available.
if len(functions) > 1 {
ch <- functions[1]()
}
// Close the done channel to signal the for-select to break and the main returns.
close(done)
}
// example returns the number of iterations for [0..counter-1]
func example(counter int) int {
sum := 0
for i := 0; i < counter; i++ {
sum += 1
}
return sum
// NOTE(SS): This function could just return "counter-1"
// to avoid the unnecessary calculation done above.
}
func main() {
var (
cheap = func() int { return example(10000000) }
expensive = func() int { return example(200000000) }
ch = make(chan int)
done = make(chan struct{})
)
// executeParallel takes ch, done channel followed by variable
// number of functions where on the second i.e., indexed 1
// function is executed on a separated goroutine which is then
// sent to ch channel which is then received by the for-select
// reciever below i.e., <-ch is the receiver.
go executeParallel(ch, done, expensive, cheap)
for {
select {
// Wait for something to be sent to done or the done channel
// to be closed.
case <-done:
return
// Keep receiving from ch (if something is sent to it)
case result := <-ch:
fmt.Println("Result:", result)
}
}
}
I have commented on the code so that it's understandable. As you didn't the actual question the logic could be still wrong.
I'm using this: (symbols is []string as well as filteredSymbols)
concurrency := 5
sem := make(chan bool, concurrency)
for i := range symbols {
sem <- true
go func(int) {
defer func() { <-sem }()
rows, err := stmt.Query(symbols[i])
if <some condition is true> {
filteredSymbols = append(filteredSymbols, symbols[i])
}
}(i)
}
for i := 0; i < cap(sem); i++ {
sem <- true
}
to limit number of goroutines running concurrently. I need to limit it because every goroutine interacts with Postgres database and sometimes I do have more than 3000 symbols to evaluate. The code is for analysing big financial data, stocks and other securities. I'm also using same code to get OHLC and pre-calculated data from db. Is this a modern approach for this? I'm asking this because WaitGroups already exist and I'm looking for a way to use those instead.
Also, I observed that my method above sometimes yield different results. I had a code where sometimes the resulting number of filteredSymbols is 1409. Without changing the parameters, it would then yield 1407 results, then 1408 at times.
I even had a code where there were big deficit in results.
The code below was very inconsistent so I removed the concurrency. (NOTE that in code below, I don't even have to limit concurrent goroutines since they only use in-memory resources). Removing concurrency fixed it
func getCommonSymbols(symbols1 []string, symbols2 []string) (symbols []string) {
defer timeTrack(time.Now(), "Get common symbols")
// concurrency := len(symbols1)
// sem := make(chan bool, concurrency)
// for _, s := range symbols1 {
for _, sym := range symbols1 {
// sym := s
// sem <- true
// go func(string) {
// defer func() { <-sem }()
for k := range symbols2 {
if sym == symbols2[k] {
symbols = append(symbols, sym)
break
}
}
// }(sym)
}
// for i := 0; i < cap(sem); i++ {
// sem <- true
// }
return
}
You have a data race, multiple goroutines are updating filteredSymbols at the same time. The smallest change you can make to fix it is to add a mutex lock around the append call, e.g.
concurrency := 5
sem := make(chan bool, concurrency)
l := sync.Mutex{}
for i := range symbols {
sem <- true
go func(int) {
defer func() { <-sem }()
rows, err := stmt.Query(symbols[i])
if <some condition is true> {
l.Lock()
filteredSymbols = append(filteredSymbols, symbols[i])
l.Unlock()
}
}(i)
}
for i := 0; i < cap(sem); i++ {
sem <- true
}
The Race Detector could of helped you spot this as well.
One common alternative would be to use a channel to get work into a goroutine, and a channel to get the results out, something like.
concurrency := 5
workCh := make(chan string, concurrency)
resCh := make(chan string, concurrency)
workersWg := sync.WaitGroup{}
// start the required number of workers, use the WaitGroup to see when they're done
for i := 0; i < concurrency; i++ {
workersWg.Add(1)
go func() {
defer workersWg.Done()
for symbol := range workCh {
// do some work
if cond {
resCh <- symbol
}
}
}()
}
go func() {
// when all the workers are done, close the resultsCh
workersWg.Wait()
close(resCh)
}()
// submit all the work
for _, s := range symbols {
workCh <- s
}
close(workCh)
// collect up the results
for r := range resCh {
filteredSymbols = append(filteredSymbols, r)
}
I am trying the fan in - fan out pattern with a factorial problem. But I am getting:
fatal error: all goroutines are asleep - deadlock!
and unable to identify the reason for deadlock.
I am trying to concurrently calculate factorial for 100 numbers using the fan-in fan-out pattern.
package main
import (
"fmt"
)
func main() {
_inChannel := _inListener(generator())
for val := range _inChannel {
fmt.Print(val, " -- ")
}
}
func generator() chan int { // NEED TO CALCULATE FACTORIAL FOR 100 NUMBERS
ch := make(chan int) // CREATE CHANNEL TO INPUT NUMBERS
go func() {
for i := 1; i <= 100; i++ {
ch <- i
}
close(ch) // CLOSE CHANNEL WHEN ALL NUMBERS HAVE BEEN WRITTEM
}()
return ch
}
func _inListener(ch chan int) chan int {
rec := make(chan int) // CHANNEL RECEIVED FROM GENERATOR
go func() {
for num := range ch { // RECEIVE THE INPUT NUMBERS FROM GENERATOR
result := factorial(num) // RESULT IS A NEW CHANNEL CREATED
rec <- <-result // MERGE INTO A SINGLE CHANNEL; rec
close(result)
}
close(rec)
}()
return rec // RETURN THE DEDICATED CHANNEL TO RECEIVE ALL OUTPUTS
}
func factorial(n int) chan int {
ch := make(chan int) // MAKE A NEW CHANNEL TO OUTPUT THE RESULT
// OF FACTORIAL
total := 1
for i := n; i > 0; i-- {
total *= i
}
ch <- total
return ch // RETURN THE CHANNEL HAVING THE FACTORIAL CALCULATED
}
I have put in comments, so that it becomes easier to follow the code.
I'm no expert in channels. I've taking on this to try and get more familiar with go.
Another issue is the int isn't large enough to take all factorials over 20 or so.
As you can see, I added a defer close as well as a logical channel called done in the generator func. The rest of the changes probably aren't needed. With channels you need to make sure something is ready to take off a value on the channel when you put something on a channel. Otherwise deadlock. Also, using
go run -race main.go
helps at least see which line(s) are causing problems.
I hope this helps and isn't removed for being off topic.
I was able to remove the deadlock by doing this:
package main
import (
"fmt"
)
func main() {
_gen := generator()
_inChannel := _inListener(_gen)
for val := range _inChannel {
fmt.Print(val, " -- \n")
}
}
func generator() chan int { // NEED TO CALCULATE FACTORIAL FOR 100 NUMBERS
ch := make(chan int) // CREATE CHANNEL TO INPUT NUMBERS
done := make(chan bool)
go func() {
defer close(ch)
for i := 1; i <= 100; i++ {
ch <- i
}
//close(ch) // CLOSE CHANNEL WHEN ALL NUMBERS HAVE BEEN WRITTEM
done <- true
}()
// this function will pull off the done for each function call above.
go func() {
for i := 1; i < 100; i++ {
<-done
}
}()
return ch
}
func _inListener(ch chan int) chan int {
rec := make(chan int) // CHANNEL RECEIVED FROM GENERATOR
go func() {
for num := range ch { // RECEIVE THE INPUT NUMBERS FROM GENERATOR
result := factorial(num) // RESULT IS A NEW CHANNEL CREATED
rec <- result // MERGE INTO A SINGLE CHANNEL; rec
}
close(rec)
}()
return rec // RETURN THE DEDICATED CHANNEL TO RECEIVE ALL OUTPUTS
}
func factorial(n int) int {
// OF FACTORIAL
total := 1
for i := n; i > 0; i-- {
total *= i
}
return total // RETURN THE CHANNEL HAVING THE FACTORIAL CALCULATED
}
I have two versions of factorial. Concurrent vs Sequencial.
Both the program will calculate factorial of 10 "1000000" times.
Factorial Concurrent Processing
package main
import (
"fmt"
//"math/rand"
"sync"
"time"
//"runtime"
)
func main() {
start := time.Now()
printFact(fact(gen(1000000)))
fmt.Println("Current Time:", time.Now(), "Start Time:", start, "Elapsed Time:", time.Since(start))
panic("Error Stack!")
}
func gen(n int) <-chan int {
c := make(chan int)
go func() {
for i := 0; i < n; i++ {
//c <- rand.Intn(10) + 1
c <- 10
}
close(c)
}()
return c
}
func fact(in <-chan int) <-chan int {
out := make(chan int)
var wg sync.WaitGroup
for n := range in {
wg.Add(1)
go func(n int) {
//temp := 1
//for i := n; i > 0; i-- {
// temp *= i
//}
temp := calcFact(n)
out <- temp
wg.Done()
}(n)
}
go func() {
wg.Wait()
close(out)
}()
return out
}
func printFact(in <-chan int) {
//for n := range in {
// fmt.Println("The random Factorial is:", n)
//}
var i int
for range in {
i ++
}
fmt.Println("Count:" , i)
}
func calcFact(c int) int {
if c == 0 {
return 1
} else {
return calcFact(c-1) * c
}
}
//###End of Factorial Concurrent
Factorial Sequencial Processing
package main
import (
"fmt"
//"math/rand"
"time"
"runtime"
)
func main() {
start := time.Now()
//for _, n := range factorial(gen(10000)...) {
// fmt.Println("The random Factorial is:", n)
//}
var i int
for range factorial(gen(1000000)...) {
i++
}
fmt.Println("Count:" , i)
fmt.Println("Current Time:", time.Now(), "Start Time:", start, "Elapsed Time:", time.Since(start))
}
func gen(n int) []int {
var out []int
for i := 0; i < n; i++ {
//out = append(out, rand.Intn(10)+1)
out = append(out, 10)
}
println(len(out))
return out
}
func factorial(val ...int) []int {
var out []int
for _, n := range val {
fa := calcFact(n)
out = append(out, fa)
}
return out
}
func calcFact(c int) int {
if c == 0 {
return 1
} else {
return calcFact(c-1) * c
}
}
//###End of Factorial sequential processing
My assumption was concurrent processing will be faster than sequential but sequential is executing faster than concurrent in my windows machine.
I am using 8 core/ i7 / 32 GB RAM.
I am not sure if there is something wrong in the programs or my basic understanding is correct.
p.s. - I am new to GoLang.
Concurrent version of your program will always be slow compared to the sequential version. The reason however, is related to the nature and behavior of problem you are trying to solve.
Your program is concurrent but it is not parallel. Each callFact is running in it's own goroutine but there is no division of the amount of work required to be done. Each goroutine must perform the same computation and output the same value.
It is like having a task that requires some text to be copied a hundred times. You have just one CPU (ignore the cores for now).
When you start a sequential process, you point the CPU to the original text once, and ask it to write it down a 100 times. The CPU has to manage a single task.
With goroutines, the CPU is told that there are a hundred tasks that must be done concurrently. It just so happens that they are all the same tasks. But CPU is not smart enough to know that.
So it does the same thing as above. Even though each task now is a 100 times smaller, there is still just one CPU. So the amount of work CPU has to do is still the same, except with all the added overhead of managing 100 different things at once. Hence, it looses a part of its efficiency.
To see an improvement in performance you'll need proper parallelism. A simple example would be to split the factorial input number roughly in the middle and compute 2 smaller factorials. Then combine them together:
// not an ideal solution
func main() {
ch := make(chan int)
r := 10
result := 1
go fact(r, ch)
for i := range ch {
result *= i
}
fmt.Println(result)
}
func fact(n int, ch chan int) {
p := n/2
q := p + 1
var wg sync.WaitGroup
wg.Add(2)
go func() {
ch <- factPQ(1, p)
wg.Done()
}()
go func() {
ch <- factPQ(q, n)
wg.Done()
}()
go func() {
wg.Wait()
close(ch)
}()
}
func factPQ(p, q int) int {
r := 1
for i := p; i <= q; i++ {
r *= i
}
return r
}
Working code: https://play.golang.org/p/xLHAaoly8H
Now you have two goroutines working towards the same goal and not just repeating the same calculations.
Note about CPU cores:
In your original code, the sequential version's operations are most definitely being distributed amongst various CPU cores by the runtime environment and the OS. So it still has parallelism to a degree, you just don't controll it.
The same is happening in the concurrent version but again as mentioned above, the overhead of goroutine context switching makes the performance come down.
abhink has given a good answer. I would also like to draw attention to Amdahl's Law, which should always be borne in mind when trying to use parallel processing to increase the overall speed of computation. That's not to say "don't make things parallel", but rather: be realistic about expectations and understand the parallel architecture fully.
Go allows us to write concurrent programs. This is related to trying to write faster parallel programs, but the two issues are separate. See Rob Pike's Concurrency is not Parallelism for more info.