Concurrency vs parallelism when executing a goroutine - go

A fairly naive Go question. I was going through the Go concurrency tutorial and came across this: https://tour.golang.org/concurrency/4.
I modified the code to add a print statement in the fibonacci function, so the code looks something like:
package main

import (
	"fmt"
)

func fibonacci(n int, c chan int) {
	x, y := 0, 1
	for i := 0; i < n; i++ {
		c <- x
		x, y = y, x+y
		fmt.Println("here")
	}
	close(c)
}

func main() {
	c := make(chan int, 10)
	go fibonacci(cap(c), c)
	for i := range c {
		fmt.Println(i)
	}
}
And I got this output:
here
here
here
here
here
here
here
here
here
here
0
1
1
2
3
5
8
13
21
34
I was expecting "here" and the numbers to be interleaved (since the goroutine runs concurrently).
I think I am missing something basic about goroutines, but I'm not quite sure what.

A few things here.
You have 2 goroutines, one running main() and one running fibonacci(). Because this is a small program, there isn't a good reason for the Go scheduler not to run them one after another on the same thread, so that's what happens consistently, though it isn't guaranteed. Because the goroutine in main() is waiting on the channel, the fibonacci() goroutine is scheduled first. It's important to remember that goroutines aren't threads; they're routines that the Go scheduler runs on threads as it sees fit.
Because you're passing the capacity of the buffered channel to fibonacci(), there will almost certainly (never rely on this behavior) be cap(c) "here"s printed, after which the channel is full, the for loop finishes, the channel is closed, and the goroutine finishes. Then the main() goroutine is scheduled and cap(c) Fibonacci numbers are printed. If the buffered channel had filled up before the loop finished, main() would have been rescheduled sooner:
https://play.golang.org/p/_IgFIO1K-Dc
By sleeping you can tell the Go scheduler to give up control, but in practice never do this. Restructure in some way or, if you must, use a WaitGroup. See: https://play.golang.org/p/Ln06-NYhQDj
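For reference, here is a minimal sketch of the WaitGroup idea (my own illustration, not the code behind the playground link): main explicitly waits for the producer goroutine before exiting. In this particular program the WaitGroup is redundant, because the range over the channel already ends only after close(c), but it shows the mechanics.
package main

import (
	"fmt"
	"sync"
)

func main() {
	c := make(chan int, 10)
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		x, y := 0, 1
		for i := 0; i < cap(c); i++ {
			c <- x
			x, y = y, x+y
			fmt.Println("here")
		}
		close(c)
	}()

	for i := range c {
		fmt.Println(i)
	}
	wg.Wait() // the producer is already done once range ends, so this returns immediately
}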
I think you're trying to do this: https://play.golang.org/p/8Xo7iCJ8Gj6

I think what you are observing is that Go has its own scheduler, and at the same time there is a distinction between "concurrency" and "parallelism". In the words of Rob Pike: Concurrency is not Parallelism
Goroutines are much more lightweight than OS threads and they are managed in "userland" (within the Go process) as opposed to the operating system. Some programs have many thousands (even tens of thousands) of goroutines running, whilst there would certainly be far fewer operating system threads allocated. (This is one of Go's major strengths in asynchronous programs with many routines)
Because your program is so simple, and the channel is buffered, it does not block on writing to the channel:
c <- x
The fibonacci goroutine isn't getting preempted before it completes the short loop.
Even the fmt.Println("here") doesn't deterministically introduce preemption - I learned something myself there in writing this answer. It is buffered, like the analogous printf and scanf from C.
(see the source code https://github.com/golang/go/blob/master/src/fmt/print.go)
For interest, if you wanted to artificially control the number of OS threads, you can set the GOMAXPROCS environment variable on the command line:
~$ GOMAXPROCS=1 go run main.go
However, with your simple program there probably would be no discernible difference, because the Go runtime is still perfectly capable of scheduling many goroutines against 1 OS thread.
For example, here is a minor variation on your program. By making the channel buffer smaller (5), but still iterating 10 times, we introduce a point at which the fibonacci goroutine can (but won't necessarily) be preempted, where it could block at least once on writing to the channel:
package main

import (
	"fmt"
)

func fibonacci(n int, c chan int) {
	x, y := 0, 1
	for i := 0; i < n; i++ {
		c <- x
		x, y = y, x+y
		fmt.Println("here")
	}
	close(c)
}

func main() {
	c := make(chan int, 5)
	go fibonacci(cap(c)*2, c)
	for i := range c {
		fmt.Println(i)
	}
}
~$ GOMAXPROCS=1 go run main.go
here
here
here
here
here
here
0
1
1
2
3
5
8
here
here
here
here
13
21
34
Long explanation here; the short explanation is that there are a multitude of reasons a goroutine can temporarily block, and those are ideal opportunities for the Go scheduler to schedule execution of another goroutine.

If you add this after the fmt.Println in the fibonacci loop, you will see the results interleaved the way you would expect:
time.Sleep(1 * time.Second)
This gives the Go scheduler a reason to block the execution of the fibonacci() goroutine long enough to allow the main() goroutine to read from the channel.
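For illustration, the full modified program might look like the sketch below (note the extra "time" import). Sleeping like this is only for demonstrating the interleaving, not a pattern to use in real code.
package main

import (
	"fmt"
	"time"
)

func fibonacci(n int, c chan int) {
	x, y := 0, 1
	for i := 0; i < n; i++ {
		c <- x
		x, y = y, x+y
		fmt.Println("here")
		time.Sleep(1 * time.Second) // gives main() a chance to receive before the next send
	}
	close(c)
}

func main() {
	c := make(chan int, 10)
	go fibonacci(cap(c), c)
	for i := range c {
		fmt.Println(i)
	}
}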

Related

How does this Fibonacci code run with channels and go routines?

There is this website called Rosetta Code that has algorithms in many languages, so you can learn and compare when picking up new languages.
There I saw that one of the Go solutions is pretty interesting, but I don't fully understand it.
package main

import "fmt"

func fib(c chan int) {
	a, b := 0, 1
	for {
		c <- a
		a, b = b, a+b
	}
}

func main() {
	c := make(chan int)
	go fib(c)
	for i := 0; i < 10; i++ {
		fmt.Println(<-c)
	}
}
Here are some of my doubts:
How does the infinite for loop know when to stop?
How does the c channel communicate this?
What is the logical sequence between the func calls?
Thanks for the help, kind strangers.
How does the infinite for loop know when to stop?
As you said: This is an infinite loop and doesn't stop at all (as long as the program is running).
How does the c channel communicate this?
The channel c doesn't communicate stopping the for loop at all, the loop is not stopped. The sole purpose of c is to deliver the next number in the sequence from the calculation site (the infinite for loop) to the usage site (the print loop).
What is the logical sequence between the func calls?
go fib(c) starts fib as a goroutine. This is the one and only function call (*) ever happening in your code. Once go fib(c) has happened you have two concurrent things running: 1. the main function, which will print 10 times, and 2. fib(c), which does the computation.
The interesting stuff -- the synchronization between main() and fib(c) -- happens when main executes <-c and ("in the same moment") fib executes c <- a. Both functions, main and fib, run until each reaches its line. Once both are "at that line", both operations happen "at the same time": fib writes/sends to c and main consumes/receives from c "simultaneously". Afterwards both functions, main and fib, continue independently.
Once main is done the program finishes (this also "stops" fib's infinite loop).
(*) for the nitpickers: besides the fmt.Println and make calls, which are irrelevant for understanding this code.
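To make that rendezvous visible, here is a small standalone sketch (my own illustration, separate from the question's code) where the sender prints before and after an unbuffered send; "sent" can only appear once the receiver has arrived at <-c:
package main

import (
	"fmt"
	"time"
)

func main() {
	c := make(chan int) // unbuffered: a send blocks until a receiver is ready

	go func() {
		fmt.Println("sending...")
		c <- 42
		fmt.Println("sent") // only reached after main has received the value
	}()

	time.Sleep(2 * time.Second) // keep the receiver away for a while
	fmt.Println("received", <-c)
	time.Sleep(100 * time.Millisecond) // give the sender a moment to print "sent"
}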

Why does the console output of two goroutines look synchronous?

I am an absolute newbie to concurrency in Go. I was trying to produce a race condition with two goroutines and wrote the following code:
package main

import (
	"fmt"
	"time"
)

var x int = 2

func main() {
	go f1(&x)
	go f2(&x)
	time.Sleep(time.Second)
	fmt.Println("Final value of x:", x)
}

func f1(px *int) {
	for i := 0; i < 10; i++ {
		*px = *px * 2
		fmt.Println("f1:", *px)
	}
}

func f2(px *int) {
	for i := 0; i < 10; i++ {
		*px = *px + 1
		fmt.Println("f2:", *px)
	}
}
In every run, all of f2's output lines appear in the console first, and only after that do f1's outputs appear. Here's an example:
f2: 3
f2: 4
f2: 5
f2: 6
f2: 7
f2: 8
f2: 9
f2: 10
f2: 21
f2: 22
f1: 20
f1: 44
f1: 88
f1: 176
f1: 352
f1: 704
f1: 1408
f1: 2816
f1: 5632
f1: 11264
Final value of x: 11264
But you can see that some of f1's iterations did in fact run in between f2's iterations:
f2: 10
f2: 21
So I have two questions:
Why do all of f1's Println() calls execute strictly after f2's Println() calls? (I thought they would be mixed somehow.)
Why, when I change the order of the goroutines in the code,
go f2(&x)
go f1(&x)
instead of
go f1(&x)
go f2(&x)
is the order of the output lines also reversed: f1's first, f2's second? I mean, how does the order of goroutines in the code affect their execution?
Firstly, the behavior you are seeing is due to a tight loop. The Go scheduler cannot reasonably know how to share the workload, since your loops are short and don't take a significant amount of time (below the 10ms preemption threshold, for instance).
How the Go scheduler works is a very broad topic and has changed across Go versions, but to quote this article:
If loops don’t contain any preemption points (like function calls, or
allocate memory), they will prevent other goroutines from running
with preemption typically not occurring until 10ms later.
In the real world a processing loop will typically invoke some blocking call (DB operation, REST/gRPC call etc) - this will give a cue to the Go scheduler to set other goroutines as "Runnable". You can simulate this in your code by inserting a time.Sleep into your loops: https://play.golang.org/p/_C3QOUMNOaU
There are other ways to relinquish control (runtime.Gosched), but these techniques should generally be avoided. Avoid tight loops and let the scheduler do its thing.
Execution Ordering
When multiple goroutines are involved - and as #Marc commented - without coordination between the goroutines, the order of execution is non-deterministic.
Go has many tools at its disposal to coordinate goroutine activities:
channels
sync package's WaitGroup, Mutex etc.
These block the current goroutine and allow other goroutines to be scheduled. Using these techniques guarantees the precise ordering of larger tasks.
Predicting the execution order, however, of individual instructions that run between these coordination checkpoints is impossible.
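As an illustration of those coordination tools, here is a sketch of the questioner's program with the shared variable protected by a sync.Mutex and the goroutines joined with a sync.WaitGroup (my own rewrite, not the only possible one). It removes the data race, but the interleaving of f1 and f2 is still decided by the scheduler:
package main

import (
	"fmt"
	"sync"
)

var (
	x  = 2
	mu sync.Mutex
)

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go f1(&x, &wg)
	go f2(&x, &wg)
	wg.Wait() // replaces the time.Sleep: wait until both goroutines are done
	fmt.Println("Final value of x:", x)
}

func f1(px *int, wg *sync.WaitGroup) {
	defer wg.Done()
	for i := 0; i < 10; i++ {
		mu.Lock()
		*px = *px * 2
		fmt.Println("f1:", *px)
		mu.Unlock()
	}
}

func f2(px *int, wg *sync.WaitGroup) {
	defer wg.Done()
	for i := 0; i < 10; i++ {
		mu.Lock()
		*px = *px + 1
		fmt.Println("f2:", *px)
		mu.Unlock()
	}
}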

Explanation of why goroutines are needed in the Go tour concurrency tutorial

I'm having trouble understanding the use of goroutines and channels in the Tour of Go, referencing the code below from:
"https://tour.golang.org/concurrency/2"
package main

import "fmt"

func sum(s []int, c chan int) {
	sum := 0
	for _, v := range s {
		sum += v
	}
	c <- sum // send sum to c
}

func main() {
	s := []int{7, 2, 8, -9, 4, 0}
	c := make(chan int)
	go sum(s[:len(s)/2], c)
	go sum(s[len(s)/2:], c)
	x, y := <-c, <-c // receive from c
	fmt.Println(x, y, x+y)
}
It runs the sum functions as goroutines with the 'go' keyword in front of them, but all they do is send values to a channel, so it seems they shouldn't have to be run as goroutines. However, when I remove the go keyword and run the functions normally, I get this error:
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan send]:
main.sum(0xc420059f10, 0x3, 0x6, 0xc420088060)
/tmp/compile33.go:10 +0x5a
main.main()
/tmp/compile33.go:17 +0x99
I can't understand why goroutines are needed here. I might be misunderstanding the concept and would appreciate it if anyone more familiar with Go could shed some light.
Thanks,
Others have already pointed out in the comments that in terms of being an example, you obviously don't need to write this program with channels.
From your question, though, it sounds like you're curious about why separate goroutines are needed in order for the program to run.
To answer that, it might be helpful to think about how this might work in a world where you were only thinking about threads. You've got your main thread, and that thread invokes sum(s[:len(s)/2], c). So now the main thread gets to the c <- sum line in sum, and it blocks, because the channel is unbuffered - meaning there must be another listening thread to "take" from that channel in order for our main thread to put something into it. In other words, the threads are passing messages directly to each other, but there's no second thread to pass to. Deadlock!
In this context, goroutines and threads are functionally equivalent. So without a second goroutine, you've got your main goroutine calling...but nobody's picking up the telephone on the other end.
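To see that the deadlock comes from the unbuffered channel rather than from sum itself, here is a variation (my own sketch, not how the tour intends the exercise) where a buffered channel lets the same code run without any extra goroutines:
package main

import "fmt"

func sum(s []int, c chan int) {
	sum := 0
	for _, v := range s {
		sum += v
	}
	c <- sum // with a buffered channel, this send does not block
}

func main() {
	s := []int{7, 2, 8, -9, 4, 0}
	c := make(chan int, 2) // room for both results, so no receiver needs to be ready
	sum(s[:len(s)/2], c)   // plain calls, no go keyword
	sum(s[len(s)/2:], c)
	x, y := <-c, <-c
	fmt.Println(x, y, x+y)
}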

Conceptually, is this the correct approach to using goroutines (and what's wrong with my code)?

I have two questions:
a) Does it make sense to spin up multiple goroutines in a loop for something like calculating a math result?
b) Why doesn't my code work (this is my first attempt at goroutines)? I'm guessing it has something to do with closing the channel.
package main

import (
	"fmt"
	"math"
	"sync"
)

func main() {
	input := [][]int{
		[]int{10, 9},
		[]int{5, 2},
		[]int{4, 9},
	}

	var wg sync.WaitGroup
	c := make(chan int)

	for _, val := range input {
		wg.Add(1)
		go func(coordinates []int, c chan int) {
			defer wg.Done()
			c <- calculateDistance(coordinates[0], coordinates[1])
		}(val, c)
	}

	distances := []int{}
	for val := range c {
		distances = append(distances, val)
	}
	wg.Wait()

	fmt.Println(distances)
}

func calculateDistance(x int, y int) int {
	v := math.Exp2(float64(x)) + math.Exp2(float64(y))
	distance := math.Sqrt(v)
	return int(distance)
}
Playground link: https://play.golang.org/p/0iJ9hFnb8R
a) Yes, it can make sense to spin up multiple goroutines for CPU-bound tasks if you have multiple CPU cores. It's also important to profile your code to see if there is actually any benefit. You could use Go's built-in benchmark framework to help do this.
Because you're limited by the CPU, a good start is to do the work synchronously, then to bound the number of goroutines to the number of CPU cores instead of the number of items in your input list, but really the decision should be driven by measurements. Go provides an amazing toolchain, with benchmarks and pprof, to empirically determine the most efficient approach :)
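As a rough sketch of what "bounding your goroutines to the number of CPU cores" could look like for this problem (the worker-pool shape and the use of runtime.NumCPU are my own illustration, not the only approach):
package main

import (
	"fmt"
	"math"
	"runtime"
	"sync"
)

func calculateDistance(x, y int) int {
	v := math.Exp2(float64(x)) + math.Exp2(float64(y))
	return int(math.Sqrt(v))
}

func main() {
	input := [][]int{{10, 9}, {5, 2}, {4, 9}}

	jobs := make(chan []int)
	results := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ { // one worker per CPU core
		wg.Add(1)
		go func() {
			defer wg.Done()
			for coords := range jobs {
				results <- calculateDistance(coords[0], coords[1])
			}
		}()
	}

	// feed the jobs, then close results once all workers are done
	go func() {
		for _, val := range input {
			jobs <- val
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	distances := []int{}
	for val := range results {
		distances = append(distances, val)
	}
	fmt.Println(distances)
}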
b) https://play.golang.org/p/zGEQGC9EIy Your channel never closes, so your main goroutine never ends. The linked example waits until all goroutines finish their work, then closes the channel.
A range loop over a channel terminates when the channel is closed. Since you never close the channel in your program, the main goroutine will eventually block forever, trying to receive from c.
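A common way to fix it, and roughly the pattern the answer describes, is to close the channel from a separate goroutine once the WaitGroup reports that every worker has sent its result. A sketch:
package main

import (
	"fmt"
	"math"
	"sync"
)

func calculateDistance(x, y int) int {
	v := math.Exp2(float64(x)) + math.Exp2(float64(y))
	return int(math.Sqrt(v))
}

func main() {
	input := [][]int{{10, 9}, {5, 2}, {4, 9}}

	var wg sync.WaitGroup
	c := make(chan int)

	for _, val := range input {
		wg.Add(1)
		go func(coordinates []int) {
			defer wg.Done()
			c <- calculateDistance(coordinates[0], coordinates[1])
		}(val)
	}

	// close c once every worker has sent its result,
	// so the range loop below can terminate
	go func() {
		wg.Wait()
		close(c)
	}()

	distances := []int{}
	for val := range c {
		distances = append(distances, val)
	}
	fmt.Println(distances)
}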
Does it make sense to spin up multiple goroutines in a loop for something like calculating a math result?
Depends. If you haven't seen it yet, I can recommend Rob Pike's talk Concurrency is not parallelism. This may give you some intuition about where it is beneficial, and where it isn't.

Why can you reuse channels in Go when doing parallel processing?

Here is a code snippet from the official tutorial
package main

import "fmt"

func sum(s []int, c chan int) {
	sum := 0
	for _, v := range s {
		sum += v
	}
	c <- sum // send sum to c
}

func main() {
	s := []int{7, 2, 8, -9, 4, 0}
	c := make(chan int)
	go sum(s[:len(s)/2], c)
	go sum(s[len(s)/2:], c)
	x, y := <-c, <-c // receive from c
	fmt.Println(x, y, x+y)
}
Since we are doing the calculation in parallel, and each thread saves its result into the same channel, doesn't this screw up the data?
It's true that when you send two values over a channel from two different goroutines, the ordering is not necessarily guaranteed (unless you've done something else to coordinate their sends).
However, in this example, the ordering doesn't matter at all. Two values are being sent on the channel: the sum of the first half and the sum of the second.
go sum(s[:len(s)/2], c)
go sum(s[len(s)/2:], c)
Since the only thing those two values are used for is to calculate the total sum, the order doesn't matter at all. In fact, if you ran the example enough times you should see that x and y are often swapped, but the sum x+y is always the same.
Operations with channels are goroutine-safe. You can read/write/close in any goroutine without corrupting anything that goes into or out of the channel. Basically, channels are synchronization points. Unbuffered channels (like in your case) block on every write and read: when you write, your code will block and wait until someone starts reading on the other end; when you read, your code will block and wait until someone starts writing on the other end.
In your case the calculations in the goroutines are done concurrently (not necessarily in parallel) and block on the channel write. Your main goroutine will block on the first read, read the value, then block on the second read and read the value.
Even if you use a buffered channel - c := make(chan int, 2) - your goroutines will finish their calculations, write their results to the channel without blocking, and terminate. Nothing will be corrupted. In the meantime the main goroutine will block on the channel read and wait until someone writes to it.
I suggest you read Effective Go and Go Concurrency Patterns, and try A Tour of Go.
