Can someone explain how the goroutines behave in the following code? I wrote it myself.
With BubbleSortVanilla, sorting a list of 100000 elements takes roughly 15s.
With BubbleSortOdd followed by BubbleSortEven (the odd-even phases), it takes roughly 7s. With ConcurrentBubbleSort alone, it takes only about 1.4s.
I can't really understand why the single ConcurrentBubbleSort is faster.
Is it because of the overhead of creating the two threads, even though each of them
processes the same list, or rather half of it?
I tried profiling the code, but I am not sure how to see how many threads are being created or how much memory each thread uses.
package main
import (
"fmt"
"math/rand"
"sync"
"time"
)
func BubbleSortVanilla(intList []int) {
for i := 0; i < len(intList)-1; i += 1 {
if intList[i] > intList[i+1] {
intList[i], intList[i+1] = intList[i+1], intList[i]
}
}
}
func BubbleSortOdd(intList []int, wg *sync.WaitGroup, c chan []int) {
for i := 1; i < len(intList)-2; i += 2 {
if intList[i] > intList[i+1] {
intList[i], intList[i+1] = intList[i+1], intList[i]
}
}
wg.Done()
}
func BubbleSortEven(intList []int, wg *sync.WaitGroup, c chan []int) {
for i := 0; i < len(intList)-1; i += 2 {
if intList[i] > intList[i+1] {
intList[i], intList[i+1] = intList[i+1], intList[i]
}
}
wg.Done()
}
func ConcurrentBubbleSort(intList []int, wg *sync.WaitGroup, c chan []int) {
for i := 0; i < len(intList)-1; i += 1 {
if intList[i] > intList[i+1] {
intList[i], intList[i+1] = intList[i+1], intList[i]
}
}
wg.Done()
}
func main() {
// defer profile.Start(profile.MemProfile).Stop()
rand.Seed(time.Now().Unix())
intList := rand.Perm(100000)
fmt.Println("Read a sequence of", len(intList), "elements")
c := make(chan []int, len(intList))
var wg sync.WaitGroup
start := time.Now()
for j := 0; j < len(intList)-1; j++ {
// BubbleSortVanilla(intList) // takes roughly 15s
// wg.Add(2)
// go BubbleSortOdd(intList, &wg, c) // takes roughly 7s
// go BubbleSortEven(intList, &wg, c)
wg.Add(1)
go ConcurrentBubbleSort(intList, &wg, c) // takes roughly 1.4s
}
wg.Wait()
elapsed := time.Since(start)
// Print the sorted integers
fmt.Println("Sorted List: ", len(intList), "in", elapsed)
}
Your code does not work. Both ConcurrentBubbleSort and the BubbleSortOdd + BubbleSortEven pair cause data races, because many goroutines read and write the same slice without any synchronization. Try running your code with go run -race main.go. Because of the data race, the contents of the slice can end up incorrect, and the slice is not sorted either.
Why is it slow? My guess is the data race: there are too many goroutines, all racing on the same slice.
The Thread Analyzer detects data-races that occur during the execution of a multi-threaded process. A data race occurs when:
two or more threads in a single process access the same memory location concurrently, and
at least one of the accesses is for writing, and
the threads are not using any exclusive locks to control their accesses to that memory.
The code below starts a few workers. Each worker receives a value via a channel which is added to a map where the key is the worker ID and value is the number received. Finally, when I add all the values received, I should get an expected result (in this case 55 because that is what you get when you add from 1..10). In most cases, I am not seeing the expected output. What am I doing wrong here? I do not want to solve it by adding a sleep. I would like to identify the issue programmatically and fix it.
type counter struct {
value int
count int
}
var data map[string]counter
var lock sync.Mutex
func adder(wid string, n int) {
defer lock.Unlock()
lock.Lock()
d := data[wid]
d.count++
d.value += n
data[wid] = d
return
}
func main() {
fmt.Println(os.Getpid())
data = make(map[string]counter)
c := make(chan int)
for w := 1; w <= 3; w++ { //starting 3 workers here
go func(wid string) {
data[wid] = counter{}
for {
v, k := <-c
if !k {
continue
}
adder(wid, v)
}
}(strconv.Itoa(w)) // worker is given an ID
}
time.Sleep(1 * time.Second) // If this is not added, only one goroutine is recorded.
for i := 1; i <= 10; i++ {
c <- i
}
close(c)
total := 0
for i, v := range data {
fmt.Println(i, v)
total += v.value
}
fmt.Println(total)
}
Your code has two significant races:
The initialization of data[wid] = counter{} is not synchronized with other goroutines that may be reading and rewriting data.
The worker goroutines do not signal when they are done modifying data, which means your main goroutine may read data before they finish writing.
You also have a strange construct:
for {
v, k := <-c
if !k {
continue
}
adder(wid, v)
}
k will only be false when the channel c is closed, after which the goroutine spins as much as it can. This would be better written as for v := range c.
To fix the reading code in the main goroutine, we'll use the more normal for ... range c idiom and add a sync.WaitGroup, and have each worker invoke Done() on the wait-group. The main goroutine will then wait for them to finish. To fix the initialization, we'll lock the map (there are other ways to do this, e.g., to set up the map before starting any of the goroutines, or to rely on the fact that empty map slots read as zero, but this one is straightforward). I also took out the extra debug. The result is this code, also available on the Go Playground.
package main
import (
"fmt"
// "os"
"strconv"
"sync"
// "time"
)
type counter struct {
value int
count int
}
var data map[string]counter
var lock sync.Mutex
var wg sync.WaitGroup
func adder(wid string, n int) {
defer lock.Unlock()
lock.Lock()
d := data[wid]
d.count++
d.value += n
data[wid] = d
}
func main() {
// fmt.Println(os.Getpid())
data = make(map[string]counter)
c := make(chan int)
for w := 1; w <= 3; w++ { //starting 3 workers here
wg.Add(1)
go func(wid string) {
lock.Lock()
data[wid] = counter{}
lock.Unlock()
for v := range c {
adder(wid, v)
}
wg.Done()
}(strconv.Itoa(w)) // worker is given an ID
}
for i := 1; i <= 10; i++ {
c <- i
}
close(c)
wg.Wait()
total := 0
for i, v := range data {
fmt.Println(i, v)
total += v.value
}
fmt.Println(total)
}
(This can be improved easily, e.g., there's no reason for wg to be global.)
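As a sketch of two of the alternatives mentioned above (filling the map before any goroutine starts, and keeping the mutex and WaitGroup local), something along these lines should work. The function name sumWithWorkers and its parameters are invented for this illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

type counter struct {
	value int
	count int
}

// sumWithWorkers is a hypothetical variant of the program above with no
// global state. It returns the per-worker counters and the grand total.
func sumWithWorkers(numWorkers, upTo int) (map[string]counter, int) {
	data := make(map[string]counter, numWorkers)
	for w := 1; w <= numWorkers; w++ {
		data[strconv.Itoa(w)] = counter{} // set up before any goroutine runs
	}
	var (
		mu sync.Mutex // guards data
		wg sync.WaitGroup
	)
	c := make(chan int)
	for w := 1; w <= numWorkers; w++ {
		wg.Add(1)
		go func(wid string) {
			defer wg.Done()
			for v := range c {
				mu.Lock()
				d := data[wid]
				d.count++
				d.value += v
				data[wid] = d
				mu.Unlock()
			}
		}(strconv.Itoa(w))
	}
	for i := 1; i <= upTo; i++ {
		c <- i
	}
	close(c)
	wg.Wait() // after this, no worker touches data any more
	total := 0
	for _, d := range data {
		total += d.value
	}
	return data, total
}

func main() {
	data, total := sumWithWorkers(3, 10)
	fmt.Println(len(data), total)
}
```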
Well, I like @torek's answer, but I wanted to post this one as it contains a few improvements:
Reduced use of locks (for such simple tasks, avoid locks where you can; if you benchmark it, you'll notice a clear difference, because my code takes the lock only numworkers times).
Better variable names.
No global variables (use of globals should always be kept to a minimum).
The following code adds the numbers from minWork to maxWork using numworkers spawned goroutines.
package main
import (
"fmt"
"sync"
)
const (
bufferSize = 1 // Buffer for numChan
numworkers = 3 // Number of workers doing addition
minWork = 1 // Sum from [minWork] (inclusive)
maxWork = 10000000 // Sum upto [maxWork] (inclusive)
)
// worker stats
type worker struct {
workCount int // Number of times, worker worked
workDone int // Amount of work done; numbers added
}
// workerMap holds a map for worker(s)
type workerMap struct {
mu sync.Mutex // Guards m for safe, concurrent r/w
m map[int]worker // Map to hold worker id to worker mapping
}
func main() {
var (
totalWorkDone int // Total Work Done
wm workerMap // WorkerMap
wg sync.WaitGroup // WaitGroup
numChan = make(chan int, bufferSize) // Channel for nums
)
wm.m = make(map[int]worker, numworkers)
for wid := 0; wid < numworkers; wid++ {
wg.Add(1)
go func(id int) {
var wk worker
// Wait for numbers
for n := range numChan {
wk.workCount++
wk.workDone += n
}
// Fill worker stats
wm.mu.Lock()
wm.m[id] = wk
wm.mu.Unlock()
wg.Done()
}(wid)
}
// Send numbers for addition by multiple workers
for i := minWork; i <= maxWork; i++ {
numChan <- i
}
// Close the channel
close(numChan)
// Wait for goroutines to finish
wg.Wait()
// Print stats
for k, v := range wm.m {
fmt.Printf("WorkerID: %d; Work: %+v\n", k, v)
totalWorkDone += v.workDone
}
// Print total work done by all workers
fmt.Printf("Work Done: %d\n", totalWorkDone)
}
I'm thinking of starting 1000 goroutines at the same time using a for loop in Go.
The problem is: I have to make sure that every goroutine has been executed.
Is it possible to use channels to make sure of that?
The structure is roughly like this:
func main {
for i ... {
go ...
ch?
ch?
}
As @Andy mentioned, you can use sync.WaitGroup to achieve this. Below is an example; I hope the code is self-explanatory.
package main
import (
"fmt"
"sync"
"time"
)
func dosomething(millisecs int64, wg *sync.WaitGroup) {
defer wg.Done()
duration := time.Duration(millisecs) * time.Millisecond
time.Sleep(duration)
fmt.Println("Function in background, duration:", duration)
}
func main() {
arr := []int64{200, 400, 150, 600}
var wg sync.WaitGroup
for _, n := range arr {
wg.Add(1)
go dosomething(n, &wg)
}
wg.Wait()
fmt.Println("Done")
}
To make sure the goroutines are done and collect the results, try this example:
package main
import (
"fmt"
)
const max = 1000
func main() {
for i := 1; i <= max; i++ {
go f(i)
}
sum := 0
for i := 1; i <= max; i++ {
sum += <-ch
}
fmt.Println(sum) // 500500
}
func f(n int) {
// do some job here and return the result:
ch <- n
}
var ch = make(chan int, max)
In order to wait for 1000 goroutines to finish, try this example:
package main
import (
"fmt"
"sync"
)
func main() {
wg := &sync.WaitGroup{}
for i := 0; i < 1000; i++ {
wg.Add(1)
go f(wg, i)
}
wg.Wait()
fmt.Println("Done.")
}
func f(wg *sync.WaitGroup, n int) {
defer wg.Done()
fmt.Print(n, " ")
}
I would suggest that you follow a pattern. Concurrency and channels are good, but if you use them badly, your program might become even slower than expected. A simple way to handle multiple goroutines and channels is the worker pool pattern.
Take a close look at the code below:
// In this example we'll look at how to implement
// a _worker pool_ using goroutines and channels.
package main
import "fmt"
import "time"
// Here's the worker, of which we'll run several
// concurrent instances. These workers will receive
// work on the `jobs` channel and send the corresponding
// results on `results`. We'll sleep a second per job to
// simulate an expensive task.
func worker(id int, jobs <-chan int, results chan<- int) {
for j := range jobs {
fmt.Println("worker", id, "started job", j)
time.Sleep(time.Second)
fmt.Println("worker", id, "finished job", j)
results <- j * 2
}
}
func main() {
// In order to use our pool of workers we need to send
// them work and collect their results. We make 2
// channels for this.
jobs := make(chan int, 100)
results := make(chan int, 100)
// This starts up 3 workers, initially blocked
// because there are no jobs yet.
for w := 1; w <= 3; w++ {
go worker(w, jobs, results)
}
// Here we send 5 `jobs` and then `close` that
// channel to indicate that's all the work we have.
for j := 1; j <= 5; j++ {
jobs <- j
}
close(jobs)
// Finally we collect all the results of the work.
for a := 1; a <= 5; a++ {
<-results
}
}
This simple example is taken from here. The results channel can also help you keep track of all the goroutines executing the jobs, including failure notices.
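To illustrate the remark about failure notices, here is a rough sketch in which each result carries an error alongside its value. The result type, the process function, and the rule that multiples of 4 fail are all assumptions made for the example, not part of the original worker pool.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// result is a hypothetical type pairing a job with its outcome.
type result struct {
	job   int
	value int
	err   error
}

// process stands in for real work. As an arbitrary rule for this
// example, jobs divisible by 4 fail.
func process(j int) (int, error) {
	if j%4 == 0 {
		return 0, errors.New("job rejected")
	}
	return j * 2, nil
}

// runPool starts numWorkers workers, feeds them the jobs, and returns
// one result per job once every worker has finished.
func runPool(numWorkers int, jobs []int) []result {
	jobCh := make(chan int)
	resCh := make(chan result)
	var wg sync.WaitGroup
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobCh {
				v, err := process(j)
				resCh <- result{job: j, value: v, err: err}
			}
		}()
	}
	// Feed the jobs, then close jobCh so the workers' range loops end.
	go func() {
		for _, j := range jobs {
			jobCh <- j
		}
		close(jobCh)
	}()
	// Close resCh once all workers are done so the loop below terminates.
	go func() {
		wg.Wait()
		close(resCh)
	}()
	var out []result
	for r := range resCh {
		out = append(out, r)
	}
	return out
}

func main() {
	failed := 0
	for _, r := range runPool(3, []int{1, 2, 3, 4, 5}) {
		if r.err != nil {
			failed++
		}
	}
	fmt.Println("failed jobs:", failed)
}
```

Closing the results channel from a separate goroutine after wg.Wait() means the collector does not need to know in advance how many results to expect.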
I am trying to understand concurrency and goroutines, and had a couple questions about the following experimental code:
Why does it create a memory leak? I thought that a return at the end of the goroutine would allow memory associated with it to get cleaned up.
Why do my loops almost never reach 999? In fact, when I output to a file and study the output, I notice that it rarely prints integers in double digits; the first time it prints "99" is line 2461, and for "999" line 6120. This behavior is unexpected to me, which clearly means I don't really understand what is going on with goroutine scheduling.
Disclaimer:
Be careful running the code below, it can crash your system if you don't stop it after a few seconds!
CODE
package main
import (
"fmt"
"sync"
)
func main() {
var wg sync.WaitGroup
for {
// spawn four worker goroutines
spawnWorkers(4, wg)
// wait for the workers to finish
wg.Wait()
}
}
func spawnWorkers(max int, wg sync.WaitGroup) {
for n := 0; n < max; n++ {
wg.Add(1)
go func() {
defer wg.Done()
f(n)
return
}()
}
}
func f(n int) {
for i := 0; i < 1000; i++ {
fmt.Println(n, ":", i)
}
}
Thanks to Tim Cooper, JimB, and Greg for their helpful comments. The corrected version of the code is posted below for reference.
The two fixes were to pass the WaitGroup by pointer, which fixed the memory leak, and to pass n into the anonymous goroutine as an argument.
package main
import (
"fmt"
"sync"
)
func main() {
var wg sync.WaitGroup
for {
// spawn four worker goroutines
spawnWorkers(4, &wg)
// wait for the workers to finish
wg.Wait()
}
}
func spawnWorkers(max int, wg *sync.WaitGroup) {
for n := 0; n < max; n++ {
wg.Add(1)
go func(n int) {
defer wg.Done()
f(n)
return
}(n)
}
}
func f(n int) {
for i := 0; i < 1000; i++ {
fmt.Println(n, ":", i)
}
}
I have two versions of a factorial program: concurrent and sequential.
Both programs calculate the factorial of 10 one million (1000000) times.
Factorial Concurrent Processing
package main
import (
"fmt"
//"math/rand"
"sync"
"time"
//"runtime"
)
func main() {
start := time.Now()
printFact(fact(gen(1000000)))
fmt.Println("Current Time:", time.Now(), "Start Time:", start, "Elapsed Time:", time.Since(start))
panic("Error Stack!")
}
func gen(n int) <-chan int {
c := make(chan int)
go func() {
for i := 0; i < n; i++ {
//c <- rand.Intn(10) + 1
c <- 10
}
close(c)
}()
return c
}
func fact(in <-chan int) <-chan int {
out := make(chan int)
var wg sync.WaitGroup
for n := range in {
wg.Add(1)
go func(n int) {
//temp := 1
//for i := n; i > 0; i-- {
// temp *= i
//}
temp := calcFact(n)
out <- temp
wg.Done()
}(n)
}
go func() {
wg.Wait()
close(out)
}()
return out
}
func printFact(in <-chan int) {
//for n := range in {
// fmt.Println("The random Factorial is:", n)
//}
var i int
for range in {
i ++
}
fmt.Println("Count:" , i)
}
func calcFact(c int) int {
if c == 0 {
return 1
} else {
return calcFact(c-1) * c
}
}
//###End of Factorial Concurrent
Factorial Sequential Processing
package main
import (
"fmt"
//"math/rand"
"time"
//"runtime"
)
func main() {
start := time.Now()
//for _, n := range factorial(gen(10000)...) {
// fmt.Println("The random Factorial is:", n)
//}
var i int
for range factorial(gen(1000000)...) {
i++
}
fmt.Println("Count:" , i)
fmt.Println("Current Time:", time.Now(), "Start Time:", start, "Elapsed Time:", time.Since(start))
}
func gen(n int) []int {
var out []int
for i := 0; i < n; i++ {
//out = append(out, rand.Intn(10)+1)
out = append(out, 10)
}
println(len(out))
return out
}
func factorial(val ...int) []int {
var out []int
for _, n := range val {
fa := calcFact(n)
out = append(out, fa)
}
return out
}
func calcFact(c int) int {
if c == 0 {
return 1
} else {
return calcFact(c-1) * c
}
}
//###End of Factorial sequential processing
My assumption was that concurrent processing would be faster than sequential, but the sequential version runs faster than the concurrent one on my Windows machine.
I am using an 8-core i7 with 32 GB RAM.
I am not sure whether there is something wrong in the programs or in my basic understanding.
P.S. I am new to Go.
The concurrent version of your program will always be slower than the sequential version. The reason, however, lies in the nature of the problem you are trying to solve.
Your program is concurrent, but it is not parallel. Each calcFact runs in its own goroutine, but the work is not divided up: every goroutine performs the same computation and outputs the same value.
It is like having a task that requires some text to be copied a hundred times. You have just one CPU (ignore the cores for now).
When you start a sequential process, you point the CPU to the original text once and ask it to write it down 100 times. The CPU has to manage a single task.
With goroutines, the CPU is told that there are a hundred tasks that must be done concurrently. It just so happens that they are all the same task, but the CPU is not smart enough to know that.
So it does the same thing as above. Even though each task is now 100 times smaller, there is still just one CPU. The amount of work the CPU has to do is the same, plus all the added overhead of managing 100 different things at once. Hence it loses some of its efficiency.
To see an improvement in performance you'll need proper parallelism. A simple example would be to split the factorial input number roughly in the middle and compute 2 smaller factorials. Then combine them together:
// not an ideal solution
package main
import (
"fmt"
"sync"
)
func main() {
ch := make(chan int)
r := 10
result := 1
go fact(r, ch)
for i := range ch {
result *= i
}
fmt.Println(result)
}
func fact(n int, ch chan int) {
p := n/2
q := p + 1
var wg sync.WaitGroup
wg.Add(2)
go func() {
ch <- factPQ(1, p)
wg.Done()
}()
go func() {
ch <- factPQ(q, n)
wg.Done()
}()
go func() {
wg.Wait()
close(ch)
}()
}
func factPQ(p, q int) int {
r := 1
for i := p; i <= q; i++ {
r *= i
}
return r
}
Working code: https://play.golang.org/p/xLHAaoly8H
Now you have two goroutines working towards the same goal and not just repeating the same calculations.
Note about CPU cores:
In your original code, the sequential version runs in a single goroutine. The OS may schedule that thread on different cores over time, but the computation itself is not split up, so whatever happens there is not something you control.
The concurrent version can use several cores, but, as mentioned above, the overhead of creating and scheduling the goroutines and of passing every value through a channel brings the performance down.
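Another way to cut the overhead in the original benchmark, sketched here under the assumption that the inputs are available as a slice, is to start a fixed number of goroutines and give each one a contiguous chunk of the inputs instead of starting one goroutine per input. The helper name factAll is made up for this example.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func calcFact(c int) int {
	if c == 0 {
		return 1
	}
	return calcFact(c-1) * c
}

// factAll is a hypothetical helper. It splits in into contiguous chunks,
// one per worker, so only a handful of goroutines are created and no
// channel operation is needed per element. workers must be at least 1.
func factAll(in []int, workers int) []int {
	out := make([]int, len(in))
	chunk := (len(in) + workers - 1) / workers
	var wg sync.WaitGroup
	for lo := 0; lo < len(in); lo += chunk {
		hi := lo + chunk
		if hi > len(in) {
			hi = len(in)
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				out[i] = calcFact(in[i]) // each goroutine writes only its own range
			}
		}(lo, hi)
	}
	wg.Wait()
	return out
}

func main() {
	in := make([]int, 1000000)
	for i := range in {
		in[i] = 10
	}
	out := factAll(in, runtime.NumCPU())
	fmt.Println(len(out), out[0])
}
```

Whether this actually beats the sequential loop depends on how expensive each item is. For 10! the work per item is still tiny, so any gain may be small.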
abhink has given a good answer. I would also like to draw attention to Amdahl's Law, which should always be borne in mind when trying to use parallel processing to increase the overall speed of computation. That's not to say "don't make things parallel", but rather: be realistic about expectations and understand the parallel architecture fully.
Go allows us to write concurrent programs. This is related to trying to write faster parallel programs, but the two issues are separate. See Rob Pike's Concurrency is not Parallelism for more info.
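To make Amdahl's Law concrete: if a fraction p of the running time can be parallelized over n processors, the theoretical speedup is 1 / ((1 - p) + p/n). The small sketch below just evaluates that formula. The function name amdahl and the sample value p = 0.9 are chosen for illustration.

```go
package main

import "fmt"

// amdahl returns the theoretical speedup when a fraction p of the
// running time is parallelizable and spread over n processors.
func amdahl(p float64, n int) float64 {
	return 1 / ((1 - p) + p/float64(n))
}

func main() {
	for _, n := range []int{2, 8, 64} {
		fmt.Printf("p=0.90 n=%d speedup=%.2f\n", n, amdahl(0.9, n))
	}
}
```

With p = 0.9 the speedup can never exceed 10, however many cores are added, because the serial 10% still has to run on its own.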
Given the following code:
package main
import (
"fmt"
"math/rand"
"time"
)
func main() {
for i := 0; i < 3; i++ {
go f(i)
}
// prevent main from exiting immediately
var input string
fmt.Scanln(&input)
}
func f(n int) {
for i := 0; i < 10; i++ {
dowork(n, i)
amt := time.Duration(rand.Intn(250))
time.Sleep(time.Millisecond * amt)
}
}
func dowork(goroutine, loopindex int) {
// simulate work
time.Sleep(time.Second * time.Duration(5))
fmt.Printf("gr[%d]: i=%d\n", goroutine, loopindex)
}
Can I assume that the 'dowork' function will be executed in parallel?
Is this a correct way of achieving parallelism or is it better to use channels and separate 'dowork' workers for each goroutine?
Regarding GOMAXPROCS, you can find this in Go 1.5's release docs:
By default, Go programs run with GOMAXPROCS set to the number of cores available; in prior releases it defaulted to 1.
Regarding preventing the main function from exiting immediately, you could leverage WaitGroup's Wait function.
I wrote this utility function to help parallelize a group of functions:
import "sync"
// Parallelize parallelizes the function calls
func Parallelize(functions ...func()) {
var waitGroup sync.WaitGroup
waitGroup.Add(len(functions))
defer waitGroup.Wait()
for _, function := range functions {
go func(copy func()) {
defer waitGroup.Done()
copy()
}(function)
}
}
So in your case, we could do this
func1 := func() {
f(0)
}
func2 := func() {
f(1)
}
func3 := func() {
f(2)
}
Parallelize(func1, func2, func3)
If you wanted to use the Parallelize function, you can find it here https://github.com/shomali11/util
This answer is outdated. Please see this answer instead.
Your code will run concurrently, but not in parallel. You can make it run in parallel by setting GOMAXPROCS.
It's not clear exactly what you're trying to accomplish here, but it looks like a perfectly valid way of achieving concurrency to me.
f() will be executed concurrently, but the dowork() calls within each f() will be executed sequentially. Waiting on stdin is also not the right way to ensure that your goroutines have finished. Instead, create a channel on which each f() sends true when it finishes.
At the end of main(), wait for n trues on the channel, n being the number of f() goroutines that you have started.
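A minimal sketch of that approach could look like this. The f here is a stand-in for the one in the question, and waitFor is a name invented for the example.

```go
package main

import "fmt"

// f is a stand-in for the question's worker. It signals completion by
// sending true on done when it returns.
func f(n int, done chan<- bool) {
	defer func() { done <- true }()
	_ = n // the real work would go here
}

// waitFor blocks until it has received n completion signals and
// returns how many of them were true.
func waitFor(done <-chan bool, n int) int {
	count := 0
	for i := 0; i < n; i++ {
		if <-done {
			count++
		}
	}
	return count
}

func main() {
	const n = 3
	done := make(chan bool)
	for i := 0; i < n; i++ {
		go f(i, done)
	}
	fmt.Println("finished:", waitFor(done, n))
}
```

Counting the receives rather than inspecting the channel's length avoids returning early while some goroutines are still running.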
This helped me when I was starting out.
package main
import "fmt"
func put(number chan<- int, count int) {
i := 0
for ; i <= (5 * count); i++ {
number <- i
}
number <- -1
}
func subs(number chan<- int) {
i := 10
for ; i <= 19; i++ {
number <- i
}
}
func main() {
channel1 := make(chan int)
channel2 := make(chan int)
done := 0
sum := 0
go subs(channel2)
go put(channel1, 1)
go put(channel1, 2)
go put(channel1, 3)
go put(channel1, 4)
go put(channel1, 5)
for done != 5 {
select {
case elem := <-channel1:
if elem < 0 {
done++
} else {
sum += elem
fmt.Println(sum)
}
case sub := <-channel2:
sum -= sub
fmt.Printf("subtracted : %d\n", sub)
fmt.Println(sum)
}
}
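// All put goroutines have sent their -1 and returned, so closing channel1 is safe.
close(channel1)
// channel2 is left open: subs may still be blocked in a send, and
// closing a channel with a pending sender makes that sender panic.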
}
"Conventional cluster-based systems (such as supercomputers) employ parallel execution between processors using MPI. MPI is a communication interface between processes that execute in operating system instances on different processors; it doesn't support other process operations such as scheduling. (At the risk of complicating things further, because MPI processes are executed by operating systems, a single processor can run multiple MPI processes and/or a single MPI process can also execute multiple threads!)"
You can add a loop at the end, to block until the jobs are done:
package main
import "time"
func f(n int, b chan bool) {
println(n)
time.Sleep(time.Second)
b <- true
}
func main() {
b := make(chan bool, 9)
for n := cap(b); n > 0; n-- {
go f(n, b)
}
// Receive exactly one completion signal per goroutine.
for n := cap(b); n > 0; n-- {
<-b
}
}