I am trying to understand concurrency and goroutines, and had a couple questions about the following experimental code:
Why does it create a memory leak? I thought that a return at the end of the goroutine would allow memory associated with it to get cleaned up.
Why do my loops almost never reach 999? In fact, when I output to a file and study the output, I notice that it rarely prints integers in double digits; the first time it prints "99" is line 2461, and for "999" line 6120. This behavior is unexpected to me, which clearly means I don't really understand what is going on with goroutine scheduling.
Disclaimer:
Be careful running the code below, it can crash your system if you don't stop it after a few seconds!
CODE
package main
import (
"fmt"
"sync"
)
func main() {
var wg sync.WaitGroup
for {
// spawn four worker goroutines
spawnWorkers(4, wg)
// wait for the workers to finish
wg.Wait()
}
}
func spawnWorkers(max int, wg sync.WaitGroup) {
for n := 0; n < max; n++ {
wg.Add(1)
go func() {
defer wg.Done()
f(n)
return
}()
}
}
func f(n int) {
for i := 0; i < 1000; i++ {
fmt.Println(n, ":", i)
}
}
Thanks to Tim Cooper, JimB, and Greg for their helpful comments. The corrected version of the code is posted below for reference.
The two fixes were to pass in the WaitGroup by reference, which fixed the memory leak, and to pass n correctly into the anonymous goroutine, and
package main
import (
"fmt"
"sync"
)
func main() {
var wg sync.WaitGroup
for {
// spawn four worker goroutines
spawnWorkers(4,&wg)
// wait for the workers to finish
wg.Wait()
}
}
func spawnWorkers(max int, wg *sync.WaitGroup) {
for n := 0; n < max; n++ {
wg.Add(1)
go func(n int) {
defer wg.Done()
f(n)
return
}(n)
}
}
func f(n int) {
for i := 0; i < 1000; i++ {
fmt.Println(n, ":", i)
}
}
Related
Can someone explain to me how the goroutine works in the following code, I wrote it btw.
When I do BubbleSortVanilla, it takes roughly 15s for a list of size 100000
When I do BubbleSortOdd followed by BubbleSortEven using the odd even phase, it takes roughly 7s. But when I just do ConcurrentBubbleSort it only takes roughly 1.4s.
Can't really understand why the single ConcurrentBubbleSort is better?
Is it cause of the overhead in creating the two threads and its also processing the
same or well half the length of the list?
I tried profiling the code but am not really sure how to see how many threads are being created or the memory usage of each thread etc
package main
import (
"fmt"
"math/rand"
"sync"
"time"
)
func BubbleSortVanilla(intList []int) {
for i := 0; i < len(intList)-1; i += 1 {
if intList[i] > intList[i+1] {
intList[i], intList[i+1] = intList[i+1], intList[i]
}
}
}
func BubbleSortOdd(intList []int, wg *sync.WaitGroup, c chan []int) {
for i := 1; i < len(intList)-2; i += 2 {
if intList[i] > intList[i+1] {
intList[i], intList[i+1] = intList[i+1], intList[i]
}
}
wg.Done()
}
func BubbleSortEven(intList []int, wg *sync.WaitGroup, c chan []int) {
for i := 0; i < len(intList)-1; i += 2 {
if intList[i] > intList[i+1] {
intList[i], intList[i+1] = intList[i+1], intList[i]
}
}
wg.Done()
}
func ConcurrentBubbleSort(intList []int, wg *sync.WaitGroup, c chan []int) {
for i := 0; i < len(intList)-1; i += 1 {
if intList[i] > intList[i+1] {
intList[i], intList[i+1] = intList[i+1], intList[i]
}
}
wg.Done()
}
func main() {
// defer profile.Start(profile.MemProfile).Stop()
rand.Seed(time.Now().Unix())
intList := rand.Perm(100000)
fmt.Println("Read a sequence of", len(intList), "elements")
c := make(chan []int, len(intList))
var wg sync.WaitGroup
start := time.Now()
for j := 0; j < len(intList)-1; j++ {
// BubbleSortVanilla(intList) // takes roughly 15s
// wg.Add(2)
// go BubbleSortOdd(intList, &wg, c) // takes roughly 7s
// go BubbleSortEven(intList, &wg, c)
wg.Add(1)
go ConcurrentBubbleSort(intList, &wg, c) // takes roughly 1.4s
}
wg.Wait()
elapsed := time.Since(start)
// Print the sorted integers
fmt.Println("Sorted List: ", len(intList), "in", elapsed)
}
Your code is not working at all. ConcurrentBubbleSort and BubbleSortOdd + BubbleSortEven will cause the data race. Try to run your code with go run -race main.go. Because of data race, data of array will be incorrect after sort, and they are not sorted neither.
Why it is slow? I guess it is because of data race, and there are too many go routines which are causing the data race.
The Thread Analyzer detects data-races that occur during the execution
of a multi-threaded process. A data race occurs when:
two or more threads in a single process access the same memory
location concurrently, and
at least one of the accesses is for writing, and
the threads are not using any exclusive locks to control their
accesses to that memory.
I'm thinking start 1000 goroutines at the same time using for loop in Golang.
The problem is: I have to make sure that every goroutine has been executed.
Is it possible using channels to help me make sure of that?
The structure is kinda like this:
func main {
for i ... {
go ...
ch?
ch?
}
As #Andy mentioned You can use sync.WaitGroup to achieve this. Below is an example. Hope the code is self-explanatory.
package main
import (
"fmt"
"sync"
"time"
)
func dosomething(millisecs int64, wg *sync.WaitGroup) {
defer wg.Done()
duration := time.Duration(millisecs) * time.Millisecond
time.Sleep(duration)
fmt.Println("Function in background, duration:", duration)
}
func main() {
arr := []int64{200, 400, 150, 600}
var wg sync.WaitGroup
for _, n := range arr {
wg.Add(1)
go dosomething(n, &wg)
}
wg.Wait()
fmt.Println("Done")
}
To make sure the goroutines are done and collect the results, try this example:
package main
import (
"fmt"
)
const max = 1000
func main() {
for i := 1; i <= max; i++ {
go f(i)
}
sum := 0
for i := 1; i <= max; i++ {
sum += <-ch
}
fmt.Println(sum) // 500500
}
func f(n int) {
// do some job here and return the result:
ch <- n
}
var ch = make(chan int, max)
In order to wait for 1000 goroutines to finish, try this example:
package main
import (
"fmt"
"sync"
)
func main() {
wg := &sync.WaitGroup{}
for i := 0; i < 1000; i++ {
wg.Add(1)
go f(wg, i)
}
wg.Wait()
fmt.Println("Done.")
}
func f(wg *sync.WaitGroup, n int) {
defer wg.Done()
fmt.Print(n, " ")
}
I would suggest that you follow a pattern. Concurrency and Channel is Good but if you use it in a bad way, your program might became even slower than expected. The simple way to handle multiple go-routine and channel is by a worker pool pattern.
Take a close look at the code below
// In this example we'll look at how to implement
// a _worker pool_ using goroutines and channels.
package main
import "fmt"
import "time"
// Here's the worker, of which we'll run several
// concurrent instances. These workers will receive
// work on the `jobs` channel and send the corresponding
// results on `results`. We'll sleep a second per job to
// simulate an expensive task.
func worker(id int, jobs <-chan int, results chan<- int) {
for j := range jobs {
fmt.Println("worker", id, "started job", j)
time.Sleep(time.Second)
fmt.Println("worker", id, "finished job", j)
results <- j * 2
}
}
func main() {
// In order to use our pool of workers we need to send
// them work and collect their results. We make 2
// channels for this.
jobs := make(chan int, 100)
results := make(chan int, 100)
// This starts up 3 workers, initially blocked
// because there are no jobs yet.
for w := 1; w <= 3; w++ {
go worker(w, jobs, results)
}
// Here we send 5 `jobs` and then `close` that
// channel to indicate that's all the work we have.
for j := 1; j <= 5; j++ {
jobs <- j
}
close(jobs)
// Finally we collect all the results of the work.
for a := 1; a <= 5; a++ {
<-results
}
}
This simple example is taken from here . Also the results channel can help you keep track of all the go routines executing the jobs including failure notice.
I'll use a hacky inefficient prime number finder to make this question a little more concrete.
Let's say our main function fires off a bunch of "worker" goroutines. They will report their results to a single channnel which prints them. But not every worker will report something so we can't use a counter to know when the last job is finished. Or is there a way?
For the concrete example, here, main fires off goroutines to check whether the values 2...1000 are prime (yeah I know it is inefficient).
package main
import (
"fmt"
"time"
)
func main() {
c := make(chan int)
go func () {
for {
fmt.Print(" ", <- c)
}
}()
for n := 2; n < 1000; n++ {
go printIfPrime(n, c)
}
time.Sleep(2 * time.Second) // <---- THIS FEELS WRONG
}
func printIfPrime(n int, channel chan int) {
for d := 2; d * d <= n; d++ {
if n % d == 0 {
return
}
}
channel <- n
}
My problem is that I don't know how to reliably stop it at the right time. I tried adding a sleep at the end of main and it works (but it might take too long, and this is no way to write concurrent code!). I would like to know if there was a way to send a stop signal through a channel or something so main can stop at the right time.
The trick here is that I don't know how many worker responses there will be.
Is this impossible or is there a cool trick?
(If there's an answer for this prime example, great. I can probably generalize. Or maybe not. Maybe this is app specific?)
Use a WaitGroup.
The following code uses two WaitGroups. The main function uses wgTest to wait for print_if_prime functions to complete. Once they are done, it closes the channel to break the for loop in the printing goroutine. The main function uses wgPrint to wait for printing to complete.
package main
import (
"fmt"
"sync"
)
func main() {
c := make(chan int)
var wgPrint, wgTest sync.WaitGroup
wgPrint.Add(1)
go func(wg *sync.WaitGroup) {
defer wg.Done()
for n := range c {
fmt.Print(" ", n)
}
}(&wgPrint)
for n := 2; n < 1000; n++ {
wgTest.Add(1)
go print_if_prime(&wgTest, n, c)
}
wgTest.Wait()
close(c)
wgPrint.Wait()
}
func print_if_prime(wg *sync.WaitGroup, n int, channel chan int) {
defer wg.Done()
for d := 2; d*d <= n; d++ {
if n%d == 0 {
return
}
}
channel <- n
}
playground example
I would like to speed up a certain task in Go by starting several worker goroutines.
In the example below, I'm looking for a "treasure". Worker goroutines that are started will dig forever for items until they are told to stop. If the item is a wanted treasure, then it is placed onto a shared buffered channel with a capacity equals to the number of workers.
The program is only interested in getting one treasure. It also has to make sure that all workers goroutines will return and not be blocked forever.
The example works, but is there a cleaner or more idiomatic way to ensuring the above?
package main
import (
"fmt"
"math/rand"
"sync"
"time"
)
var wg sync.WaitGroup
func digTreasure() int {
time.Sleep(5 * time.Millisecond)
return rand.Intn(1000)
}
func worker(i int, ch chan int, quit chan struct{}) {
defer func() {
fmt.Println("worker", i, "left")
wg.Done()
}()
for {
treasure := digTreasure()
if treasure%100 == 0 {
fmt.Println("worker", i, "found treasure", treasure)
ch <- treasure
return
}
select {
case <-quit:
fmt.Println("worker", i, "quitting")
return
default:
}
}
}
func main() {
rand.Seed(time.Now().UnixNano())
n := 10
ch, quit := make(chan int, n), make(chan struct{})
fmt.Println("Searching...")
wg.Add(n)
for i := 0; i < n; i++ {
go worker(i, ch, quit)
}
fmt.Println("I found treasure:", <-ch)
close(quit)
wg.Wait()
fmt.Println("All workers left")
}
why in that script http://play.golang.org/p/Q5VMfVB67-
goroutine shower doesn't work ?
package main
import "fmt"
func main() {
ch := make(chan int)
go producer(ch)
go shower(ch)
for i := 0; i < 10; i++ {
fmt.Printf("main: %d\n", i)
}
}
func shower(c chan int) {
for {
j := <-c
fmt.Printf("worker: %d\n", j)
}
}
func producer(c chan int) {
for i := 0; i < 10; i++ {
c <- i
}
}
Your main function exit way before the goroutines have a chance to complete their own work.
You need to wait for them to finish before ending main() (which stops the all program), with for instance sync.WaitGroup, as seen in "Wait for the termination of n goroutines".
In your case, you need to wait for goroutine shower() to end: pass a wg *sync.WaitGroup instance, for shower() to signal wg.Done() when it finishes processing.