I got this code from someone on GitHub and I am playing around with it to understand concurrency.
package main
import (
"bufio"
"fmt"
"os"
"sync"
"time"
)
var wg sync.WaitGroup
func sad(url string) string {
fmt.Printf("gonna sleep a bit\n")
time.Sleep(2 * time.Second)
return url + " added stuff"
}
func main() {
sc := bufio.NewScanner(os.Stdin)
urls := make(chan string)
results := make(chan string)
for i := 0; i < 20; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for url := range urls {
n := sad(url)
results <- n
}
}()
}
for sc.Scan() {
url := sc.Text()
urls <- url
}
for result := range results {
fmt.Printf("%s arrived\n", result)
}
wg.Wait()
close(urls)
close(results)
}
I have a few questions:
Why does this code give me a deadlock?
How can that for loop exist before the input is even read from the user? Do the goroutines wait until something is passed into the urls channel and then start doing work? I don't get this because it's not sequential; why would reading the input from the user, putting every line into the urls channel, and only then running the goroutines be considered wrong?
Inside the for loop there is another loop iterating over the urls channel. Does each goroutine deal with exactly one line of input, or does one goroutine handle multiple lines at once? How does any of this work?
Am I gathering the output correctly here?
Mostly you're doing things correctly, but you have things a little out of order. The for sc.Scan() loop will continue until the Scanner is done, so the for result := range results loop never runs, and thus no goroutine (main in this case) is able to receive from results. When running your example, I moved the for result := range results loop before for sc.Scan() and put it in its own goroutine, otherwise for sc.Scan() would never be reached.
go func() {
for result := range results {
fmt.Printf("%s arrived\n", result)
}
}()
for sc.Scan() {
url := sc.Text()
urls <- url
}
Also, because you run wg.Wait() before close(urls), the main goroutine is left blocked waiting for the 20 sad() go routines to finish. But they can't finish until close(urls) is called. So just close that channel before waiting for the waitgroup.
close(urls)
wg.Wait()
close(results)
The for-loop creates 20 goroutines, all waiting for input from the urls channel. When someone writes into this channel, one of the goroutines will pick it up and work on it. This is a typical worker-pool implementation.
Then the scanner reads input line by line and sends it to the urls channel, where one of the goroutines picks it up and writes the response to the results channel. At this point, no goroutine is reading from the results channel, so this send blocks.
As the scanner reads more URLs, the other goroutines pick them up and block as well. So if the scanner reads more than 20 URLs, the program deadlocks because all goroutines are waiting for a reader.
If there are fewer than 20 URLs, the scanner for-loop will end and the results will be read. However, that eventually deadlocks as well, because the range loop only terminates when the channel is closed, and nothing ever closes it.
To fix this, first, close the urls channel right after you finish reading. That will release all the for-loops in the goroutines. Then you should put the for-loop reading from the results channel into a goroutine, so you can call wg.Wait while results are being processed. After wg.Wait, you can close the results channel.
This does not guarantee that all items in the results channel will be read. The program may terminate before all messages are processed, so use a third channel which you close at the end of the goroutine that reads from the results channel. That is:
done := make(chan struct{})
go func() {
defer close(done)
for result := range results {
fmt.Printf("%s arrived\n", result)
}
}()
wg.Wait()
close(results)
<-done
I am not super happy with the previous answers, so here is a solution based on the documented behavior from the Go tour, the Go documentation, and the specification.
package main
import (
"bufio"
"fmt"
"strings"
"sync"
"time"
)
var wg sync.WaitGroup
func sad(url string) string {
fmt.Printf("gonna sleep a bit\n")
time.Sleep(2 * time.Millisecond)
return url + " added stuff"
}
func main() {
// sc := bufio.NewScanner(os.Stdin)
sc := bufio.NewScanner(strings.NewReader(strings.Repeat("blah blah\n", 15)))
urls := make(chan string)
results := make(chan string)
for i := 0; i < 20; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for url := range urls {
n := sad(url)
results <- n
}
}()
}
// results is written to by many goroutines;
// we must wait for them to finish before closing results,
// but we don't want to block here, so put that into a goroutine.
go func() {
wg.Wait()
close(results)
}()
go func() {
for sc.Scan() {
url := sc.Text()
urls <- url
}
close(urls) // done sending on this channel, close it right away.
}()
for result := range results {
fmt.Printf("%s arrived\n", result)
} // the program will finish when it gets out of this loop.
// It will get out of this loop because you have made sure the results channel is closed.
}
Related
Assuming I have a bunch of files to deal with (say 1000 or more), first they should be processed by function A(); A() will generate a file, and then this file will be processed by B().
If we do it one by one, that's too slow, so I'm thinking of processing 5 files at a time using goroutines (we cannot process too many at once because the CPU can't bear it).
I'm a newbie in Go and I'm not sure if my thinking is correct. I think function A() is a producer and function B() is a consumer; B() deals with the file produced by A(). I wrote some code below; forgive me, I really don't know how to write it properly. Can anyone give me a hand? Thank you in advance!
package main
import "fmt"
var Box = make(chan string, 1024)
func A(file string) {
fmt.Println(file, "is processing in func A()...")
fileGenByA := "/path/to/fileGenByA1"
Box <- fileGenByA
}
func B(file string) {
fmt.Println(file, "is processing in func B()...")
}
func main() {
// assuming that this is the file list read from a directory
fileList := []string{
"/path/to/file1",
"/path/to/file2",
"/path/to/file3",
}
// it seems I can't do this, because fileList may have 1000 or more files
for _, v := range fileList {
go A(v)
}
// can I do this?
for file := range Box {
go B(file)
}
}
Update:
Sorry, maybe I haven't made myself clear. Actually, the file generated by function A() is stored on the hard disk (generated by a command-line tool that I simply execute using exec.Command()), not in a variable (in memory), so it doesn't have to be passed to function B() immediately.
I think there are 2 approaches:
approach1
approach2
Actually I prefer approach2: as you can see, the first B() doesn't have to process file1GenByA; it's all the same for B() to process any file in the box, because file1GenByA may be generated after file2GenByA (maybe that file is larger, so it takes more time).
You could spawn 5 goroutines that read from a work channel. That way you have at all times 5 goroutines running and don't need to batch them so that you have to wait until 5 are finished to start the next 5.
func main() {
stack := []string{"a", "b", "c", "d", "e", "f", "g", "h"}
work := make(chan string)
results := make(chan string)
// create 5 worker goroutines
wg := sync.WaitGroup{}
for i := 0; i < 5; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for s := range work {
results <- B(A(s))
}
}()
}
// send the work to the workers;
// this happens in a goroutine so that
// the main function does not block
// once all 5 workers are busy
go func() {
for _, s := range stack {
// could read the file from disk
// here and pass a pointer to the file
work <- s
}
// close the work channel after
// all the work has been sent
close(work)
// wait for the workers to finish
// then close the results channel
wg.Wait()
close(results)
}()
// collect the results
// the iteration stops if the results
// channel is closed and the last value
// has been received
for result := range results {
// could write the file to disk
fmt.Println(result)
}
}
https://play.golang.com/p/K-KVX4LEEoK
You're halfway there. There are a few things you need to fix:
Your program deadlocks because nothing closes Box, so the main function can never finish ranging over it.
You aren't waiting for your goroutines to finish, and you're starting more than 5 goroutines. (The solutions to these are too intertwined to describe separately.)
1. Deadlock
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan receive]:
main.main()
When you range over a channel, you read each value from the channel until it is both closed and empty. Since you never close the channel, the range over that channel can never complete, and the program can never finish.
This is a fairly easy problem to solve in your case: we just need to close the channel when we know there will be no more writes to the channel.
for _, v := range fileList {
go A(v)
}
close(Box)
Keep in mind that closing a channel doesn't stop it from being read, only written. Now consumers can distinguish between an empty channel that may receive more data in the future, and an empty channel that will never receive more data.
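To illustrate, here is a small sketch (not part of the original program) using the comma-ok form of a receive, which is how a consumer can tell those two cases apart:
v, ok := <-Box
if ok {
    // a real value was received from the channel
    fmt.Println("got", v)
} else {
    // Box has been closed and drained; v is the zero value ""
    fmt.Println("no more data will ever arrive")
}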
Once you add the close(Box), the program doesn't deadlock anymore, but it still doesn't work.
2. Too Many Goroutines and not waiting for them to complete
To run a certain maximum number of concurrent executions, instead of creating a goroutine for each input, create the goroutines in a "worker pool":
Create a channel to pass the workers their work
Create a channel for the goroutines to return their results, if any
Start the number of goroutines you want
Start at least one additional goroutine to either dispatch work or collect the result, so you don't have to try doing both from the main goroutine
use a sync.WaitGroup to wait for all data to be processed
close the channels to signal to the workers and the results collector that their channels are done being filled.
Before we get into the implementation, let's talk about how A and B interact.
first they should be processed by function A(), function A() will generate a file, then this file will be processed by B().
A() and B() must, then, execute serially. They could still pass their data through a channel, but since their execution must be serial, that gains you nothing. It is simpler to run them sequentially in the workers. For that, we need to change A() to either call B itself, or to return the path so the worker can call B. I chose the latter.
func A(file string) string {
fmt.Println(file, "is processing in func A()...")
fileGenByA := "/path/to/fileGenByA1"
return fileGenByA
}
Before we write our worker function, we must also consider the result of B. Currently, B returns nothing. In the real world, unless B() cannot fail, you would want to at least return the error, or panic. I'll skip over collecting results for now.
Now we can write our worker function.
func worker(wg *sync.WaitGroup, incoming <-chan string) {
defer wg.Done()
for file := range incoming {
B(A(file))
}
}
Now all we have to do is start 5 such workers, write the incoming files to the channel, close it, and wg.Wait() for the workers to complete.
incoming_work := make(chan string)
var wg sync.WaitGroup
for i := 0; i < 5; i++ {
wg.Add(1)
go worker(&wg, incoming_work)
}
for _, v := range fileList {
incoming_work <- v
}
close(incoming_work)
wg.Wait()
Full example at https://go.dev/play/p/A1H4ArD2LD8
Returning Results.
It's all well and good to be able to kick off goroutines and wait for them to complete. But what if you need results back from your goroutines? In all but the simplest of cases, you would at least want to know if files failed to process so you could investigate the errors.
We have only 5 workers, but we have many files, so we have many results. Each worker will have to return several results. So, another channel. It's usually worth defining a struct for your return:
type result struct {
file string
err error
}
This tells us not just whether there was an error, but also clearly identifies which file the error resulted from.
How will we test an error case in our current code? In your example, B always gets the same value from A. If we add A's incoming file name to the path it passes to B, we can mock an error based on a substring. My mocked error will be that file3 fails.
func A(file string) string {
fmt.Println(file, "is processing in func A()...")
fileGenByA := "/path/to/fileGenByA1/" + file
return fileGenByA
}
func B(file string) (r result) {
r.file = file
fmt.Println(file, "is processing in func B()...")
if strings.Contains(file, "file3") {
r.err = fmt.Errorf("Test error")
}
return
}
Our workers will be sending results, but we need to collect them somewhere. main() is busy dispatching work to the workers, blocking on its write to incoming_work when the workers are all busy. So the simplest place to collect the results is another goroutine. Our results collector goroutine has to read from a results channel, print out errors for debugging, and then return the total number of failures so our program can return a final exit status indicating overall success or failure.
failures_chan := make(chan int)
go func() {
var failures int
for result := range results {
if result.err != nil {
failures++
fmt.Printf("File %s failed: %s", result.file, result.err.Error())
}
}
failures_chan <- failures
}()
Now we have another channel to close, and it's important we close it after all workers are done. So we close(results) after we wg.Wait() for the workers.
close(incoming_work)
wg.Wait()
close(results)
if failures := <-failures_chan; failures > 0 {
os.Exit(1)
}
Putting all that together, we end up with this code:
package main
import (
"fmt"
"os"
"strings"
"sync"
)
func A(file string) string {
fmt.Println(file, "is processing in func A()...")
fileGenByA := "/path/to/fileGenByA1/" + file
return fileGenByA
}
func B(file string) (r result) {
r.file = file
fmt.Println(file, "is processing in func B()...")
if strings.Contains(file, "file3") {
r.err = fmt.Errorf("Test error")
}
return
}
func worker(wg *sync.WaitGroup, incoming <-chan string, results chan<- result) {
defer wg.Done()
for file := range incoming {
results <- B(A(file))
}
}
type result struct {
file string
err error
}
func main() {
// assuming that this is the file list read from a directory
fileList := []string{
"/path/to/file1",
"/path/to/file2",
"/path/to/file3",
}
incoming_work := make(chan string)
results := make(chan result)
var wg sync.WaitGroup
for i := 0; i < 5; i++ {
wg.Add(1)
go worker(&wg, incoming_work, results)
}
failures_chan := make(chan int)
go func() {
var failures int
for result := range results {
if result.err != nil {
failures++
fmt.Printf("File %s failed: %s", result.file, result.err.Error())
}
}
failures_chan <- failures
}()
for _, v := range fileList {
incoming_work <- v
}
close(incoming_work)
wg.Wait()
close(results)
if failures := <-failures_chan; failures > 0 {
os.Exit(1)
}
}
And when we run it, we get:
/path/to/file1 is processing in func A()...
/path/to/fileGenByA1//path/to/file1 is processing in func B()...
/path/to/file2 is processing in func A()...
/path/to/fileGenByA1//path/to/file2 is processing in func B()...
/path/to/file3 is processing in func A()...
/path/to/fileGenByA1//path/to/file3 is processing in func B()...
File /path/to/fileGenByA1//path/to/file3 failed: Test error
Program exited.
A final thought: buffered channels.
There is nothing wrong with buffered channels. Especially if you know the overall size of incoming work and results, buffered channels can obviate the results collector goroutine because you can allocate a buffered channel big enough to hold all results. However, I think it's more straightforward to understand this pattern if the channels are unbuffered. The key takeaway is that you don't need to know the number of incoming or outgoing results, which could indeed be different numbers or based on something that can't be predetermined.
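For example, here is a sketch of that buffered variant, reusing the worker, result, and fileList definitions from the full example above (an illustration only, assuming the number of inputs is known up front):
incoming_work := make(chan string)
// buffered large enough to hold every result, so the workers never block on sends
results := make(chan result, len(fileList))
var wg sync.WaitGroup
for i := 0; i < 5; i++ {
    wg.Add(1)
    go worker(&wg, incoming_work, results)
}
for _, v := range fileList {
    incoming_work <- v
}
close(incoming_work)
wg.Wait()      // every result is already sitting in the buffer
close(results) // safe: the workers are done sending
for r := range results {
    if r.err != nil {
        fmt.Printf("File %s failed: %s\n", r.file, r.err.Error())
    }
}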
func check(name string) string {
resp, err := http.Get(endpoint + name)
if err != nil {
panic(err)
}
defer resp.Body.Close()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
panic(err)
}
return string(body)
}
func worker(name string, wg *sync.WaitGroup, names chan string) {
defer wg.Done()
var a = check(name)
names <- a
}
func main() {
names := make(chan string)
var wg sync.WaitGroup
for i := 1; i <= 5; i++ {
wg.Add(1)
go worker("www"+strconv.Itoa(i), &wg, names)
}
fmt.Println(<-names)
}
The expected result would be 5 results, but only one prints and then the process ends.
Is there something I am missing? I'm new to Go.
The endpoint is a generic API that returns JSON.
You started 5 goroutines, but you read the output of only one. Also, you are not waiting for your goroutines to end.
// If there are only 5 goroutines unconditionally, you don't need the wg
for i := 1; i <= 5; i++ {
go worker("www"+strconv.Itoa(i), names)
}
for i := 1; i <= 5; i++ {
fmt.Println(<-names)
}
If, however, you don't know how many goroutines you're waiting for, then the waitgroup is necessary.
for i := 1; i <= 5; i++ {
wg.Add(1)
go worker("www"+strconv.Itoa(i), &wg, names)
}
// Read from the channel until it is closed
done := make(chan struct{})
go func() {
for x := range names {
fmt.Println(x)
}
// Signal that println is completed
close(done)
}()
// Wait for goroutines to end
wg.Wait()
// Close the channel to terminate the reader goroutine
close(names)
// Wait until println completes
<-done
You are launching 5 goroutines, but are reading from the names channel only once.
fmt.Println(<-names)
As soon as that first channel read is done, main() exits.
That means everything stops before the remaining goroutines have had time to execute.
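A minimal sketch of the most direct fix, assuming you keep exactly 5 workers: receive from names once per launched worker before main() returns.
for i := 1; i <= 5; i++ {
    fmt.Println(<-names)
}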
To know more about channels, see "Concurrency made easy" from Dave Cheney:
If you have to wait for the result of an operation, it’s easier to do it yourself.
Release locks and semaphores in the reverse order you acquired them.
Channels aren’t resources like files or sockets, you don’t need to close them to free them.
Acquire semaphores when you’re ready to use them.
Avoid mixing anonymous functions and goroutines
Before you start a goroutine, always know when, and how, it will stop
func GoCountColumns(in chan []string, r chan Result, quit chan int) {
for {
select {
case data := <-in:
r <- countColumns(data) // some calculation function
case <-quit:
return // stop goroutine
}
}
}
func main() {
fmt.Println("Welcome to the csv Calculator")
file_path := os.Args[1]
fd, _ := os.Open(file_path)
reader := csv.NewReader(bufio.NewReader(fd))
var totalColumnsCount int64 = 0
var totallettersCount int64 = 0
linesCount := 0
numWorkers := 10000
rc := make(chan Result, numWorkers)
in := make(chan []string, numWorkers)
quit := make(chan int)
t1 := time.Now()
for i := 0; i < numWorkers; i++ {
go GoCountColumns(in, rc, quit)
}
// start workers
go func() {
for {
record, err := reader.Read()
if err == io.EOF {
break
}
if err != nil {
log.Fatal(err)
}
if linesCount%1000000 == 0 {
fmt.Println("Adding to the channel")
}
in <- record
//data := countColumns(record)
linesCount++
//totalColumnsCount = totalColumnsCount + data.ColumnCount
//totallettersCount = totallettersCount + data.LettersCount
}
close(in)
}()
for i := 0; i < numWorkers; i++ {
quit <- 1 // quit goroutines from main
}
close(rc)
for i := 0; i < linesCount; i++ {
data := <-rc
totalColumnsCount = totalColumnsCount + data.ColumnCount
totallettersCount = totallettersCount + data.LettersCount
}
fmt.Printf("I counted %d lines\n", linesCount)
fmt.Printf("I counted %d columns\n", totalColumnsCount)
fmt.Printf("I counted %d letters\n", totallettersCount)
elapsed := time.Now().Sub(t1)
fmt.Printf("It took %f seconds\n", elapsed.Seconds())
}
My Hello World is a program that reads a CSV file and passes its records to a channel. Then the goroutines should consume from this channel.
My problem is that I have no idea how to detect from the main goroutine that all data has been processed, so I can exit my program.
On top of the other answers:
Take (great) care that closing a channel should happen at the write call site, not the read call site. Since the r channel is written in GoCountColumns, the responsibility for closing it belongs to the GoCountColumns function. The technical reason is that it is the only actor that knows for sure the channel will not be written to anymore, and thus can close it safely.
func GoCountColumns(in chan []string, r chan Result, quit chan int) {
defer close(r) // this line.
for {
select {
case data := <-in:
r <- countColumns(data) // some calculation function
case <-quit:
return // stop goroutine
}
}
}
The parameter-naming convention, if I may say, is to have the destination as the first parameter, the source as the second, and the other parameters after that. GoCountColumns is preferably written:
func GoCountColumns(dst chan Result, src chan []string, quit chan int) {
defer close(dst)
for {
select {
case data := <-src:
dst <- countColumns(data) // some calculation function
case <-quit:
return // stop goroutine
}
}
}
You are calling quit right after the processing has started. That is illogical. This quit command is a forced-exit sequence; it should be triggered once an exit signal is detected, to stop the current processing in the best state possible, possibly leaving it broken. In other words, you should rely on signal.Notify from the os/signal package to capture exit events and notify your workers to quit. See https://golang.org/pkg/os/signal/#example_Notify
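For reference, a minimal sketch of obtaining such a signal channel with signal.Notify (this is the sig channel the schematics below read from; the imports are "os" and "os/signal", and listening only for os.Interrupt is an assumption):
sig := make(chan os.Signal, 1)
signal.Notify(sig, os.Interrupt)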
To write better parallel code, first list the routines you need to manage the program lifetime, and identify those you need to block on to ensure the program has finished before exiting.
In your code, those routines are read and map. To ensure complete processing, the main function must capture a signal that map has exited before exiting itself. Notice that the read routine does not matter here.
Then, you will also need the code required to capture an exit event from user input.
Overall, it appears we need to block on two events to manage the lifetime. Schematically,
func main(){
go read()
go map(mapDone)
go signal()
select {
case <-mapDone:
case <-sig:
}
}
This simple code works for process-or-die. Indeed, when the user event is caught, the program exits immediately, without giving the other routines a chance to do anything they might need to do upon stopping.
To improve this behavior, you first need a way to signal to the other routines that the program wants to exit, and second, a way to wait for those routines to finish their stop sequence before leaving.
To signal an exit event, or cancellation, you can make use of a context.Context: pass it around to the workers and make them listen to it.
Again, schematically,
func main(){
ctx, cancel := context.WithCancel(context.Background())
go read(ctx)
go map(ctx,mapDone)
go signal()
select {
case <-mapDone:
case <-sig:
cancel()
}
}
(more onto read and map later)
To wait for completion, many things are possible, as long as they are thread-safe. Usually, a sync.WaitGroup is used. Or, in cases like yours where there is only one routine to wait for, we can reuse the current mapDone channel.
func main(){
ctx, cancel := context.WithCancel(context.Background())
go read(ctx)
go map(ctx,mapDone)
go signal()
select {
case <-mapDone:
case <-sig:
cancel()
<-mapDone
}
}
That is simple and straightforward, but it is not totally correct. The last receive on mapDone might block forever and make the program unstoppable. So you might implement a second signal handler, or a timeout.
Schematically, the timeout solution is
func main(){
ctx, cancel := context.WithCancel(context.Background())
go read(ctx)
go map(ctx,mapDone)
go signal()
select {
case <-mapDone:
case <-sig:
cancel()
select {
case <-mapDone:
case <-time.After(time.Second):
}
}
}
You might also combine signal handling and a timeout in the last select.
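Schematically, that combined variant could look like this (a sketch only, with the same assumed names as above):
select {
case <-mapDone:
case <-sig:
    cancel()
    select {
    case <-mapDone:
    case <-sig: // a second signal from the user: stop waiting
    case <-time.After(time.Second):
    }
}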
Finally, there are a few things to say about how read and map listen to the context.
Starting with map, the implementation needs to read from the context's Done channel regularly to detect cancellation.
This is the easy part; it only requires updating the select statement.
func GoCountColumns(ctx context.Context, dst chan Result, src chan []string) {
defer close(dst)
for {
select {
case <-ctx.Done():
<-time.After(time.Minute) // do something more useful.
return // quit. Notice the defer will be called.
case data := <-src:
dst <- countColumns(data) // some calculation function
}
}
}
Now the read part is a bit more tricky. Because it is I/O, it does not provide a selectable programming interface, so listening for context cancellation might seem contradictory. It is: while the I/O is blocking, it is impossible to listen to the context, and while reading from the context channel, it is impossible to read the I/O. In your case, the solution is to realize that your read loop is not relevant to your program lifetime (recall that we only listen on mapDone), and that we can simply ignore the context there.
In other cases, you might, for example, want to restart at the last byte read (so at every read we increment an n counting bytes, and we want to save that value upon stop). Then a new routine needs to be started, and thus multiple routines must be waited on for completion. In such cases a sync.WaitGroup is more appropriate.
Schematically,
func main(){
var wg sync.WaitGroup
processDone := make(chan struct{})
ctx, cancel := context.WithCancel(context.Background())
go read(ctx)
wg.Add(1)
go saveN(ctx,&wg)
wg.Add(1)
go map(ctx,&wg)
go signal()
go func(){
wg.Wait()
close(processDone)
}()
select {
case <-processDone:
case <-sig:
cancel()
select {
case <-processDone:
case <-time.After(time.Second):
}
}
}
In this last code, the waitgroup is passed around. The routines are responsible for calling wg.Done(); when all routines are done, the processDone channel is closed to signal the select.
func GoCountColumns(ctx context.Context, dst chan Result, src chan []string, wg *sync.WaitGroup) {
defer wg.Done()
defer close(dst)
for {
select {
case <-ctx.Done():
<-time.After(time.Minute) // do something more useful.
return // quit. Notice the defer will be called.
case data := <-src:
dst <- countColumns(data) // some calculation function
}
}
}
It is undecided which pattern is preferred, but you might also see the waitgroup being managed at the call sites only.
func main(){
var wg sync.WaitGroup
processDone := make(chan struct{})
ctx, cancel := context.WithCancel(context.Background())
go read(ctx)
wg.Add(1)
go func(){
defer wg.Done()
saveN(ctx)
}()
wg.Add(1)
go func(){
defer wg.Done()
map(ctx)
}()
go signal()
go func(){
wg.Wait()
close(processDone)
}()
select {
case <-processDone:
case <-sig:
cancel()
select {
case <-processDone:
case <-time.After(time.Second):
}
}
}
Beyond all of that and the OP's questions, you must always evaluate up front whether parallel processing is pertinent for a given task. There is no unique recipe; practice and measure your code's performance. See pprof.
There is way too much going on in this code. You should restructure your code into short functions that serve specific purposes to make it possible for someone to help you out easily (and help yourself as well).
You should read the following Go article, which goes into concurrency patterns:
https://blog.golang.org/pipelines
There are multiple ways to make one go-routine wait on some other work to finish. The most common ways are with wait groups (example I have provided) or channels.
func processSomething(...) {
...
}
func main() {
workers := &sync.WaitGroup{}
for i := 0; i < numWorkers; i++ {
workers.Add(1) // you want to call this from the calling go-routine and before spawning the worker go-routine
go func() {
defer workers.Done() // you want to call this from the worker go-routine when the work is done (NOTE the defer, which ensures it is called no matter what)
processSomething(....) // your async processing
}()
}
// this will block until all workers have finished their work
workers.Wait()
}
You can use a channel to block main until completion of a goroutine.
package main
import (
"log"
"time"
)
func main() {
c := make(chan struct{})
go func() {
time.Sleep(3 * time.Second)
log.Println("bye")
close(c)
}()
// This blocks until the channel is closed by the routine
<-c
}
No need to write anything into the channel. A receive blocks until data is available or, as we use here, until the channel is closed.
I've just installed Go on Mac, and here's the code
package main
import (
"fmt"
"time"
)
func Product(ch chan<- int) {
for i := 0; i < 100; i++ {
fmt.Println("Product:", i)
ch <- i
}
}
func Consumer(ch <-chan int) {
for i := 0; i < 100; i++ {
a := <-ch
fmt.Println("Consmuer:", a)
}
}
func main() {
ch := make(chan int, 1)
go Product(ch)
go Consumer(ch)
time.Sleep(500)
}
I "go run producer_consumer.go", there's no output on screen, and then it quits.
Any problem with my program ? How to fix it ?
This is a rather verbose answer, but to put it simply:
Using time.Sleep to wait until hopefully other routines have completed their jobs is bad.
The consumer and producer shouldn't know anything about each other, apart from the type they exchange over the channel. Your code relies on both the consumer and producer knowing how many ints will be passed around. That's not a realistic scenario.
Channels can be iterated over (think of them as a thread-safe, shared slice).
Channels should be closed.
At the bottom of this rather verbose answer, where I attempt to explain some basic concepts and best practices (well, better practices), you'll find your code rewritten to work and display all the values without relying on time.Sleep. I've not tested that code, but it should be fine.
Right, there's a couple of problems here. Just as a bullet-list:
Your channel is buffered to 1, which is fine, but it's not necessary
Your channel is never closed
You're waiting 500 ns, then exiting regardless of whether the routines have completed, or even started processing for that matter.
There's no centralised control over the routines; once you've started them, you have zero control. If you hit Ctrl+C, you might want to cancel the routines when writing code that handles important data. Check out signal handling and context for this.
Channel buffer
Seeing as you already know how many values you're going to push onto your channel, why not simply create ch := make(chan int, 100)? That way your publisher can continue to push messages onto the channel, regardless of what the consumer does.
You don't need to do this, but adding a sensible buffer to your channel, depending on what you're trying to do, is definitely worth checking out. At the moment, though, both routines are using fmt.Println & co, which is going to be a bottleneck either way. Printing to STDOUT is thread-safe and buffered. This means that each call to fmt.Print* is going to acquire a lock, to avoid text from both routines being interleaved.
Closing the channel
You could simply push all the values onto your channel, and then close it. This is, however, bad form. The rule of thumb WRT channels is that channels are created and closed in the same routine. Meaning: you're creating the channel in the main routine, that's where it should be closed.
You need a mechanism to sync up, or at least keep tabs on whether or not your routines have completed their job. That's done using the sync package, or through a second channel.
// using a done channel
func produce(ch chan<- int) <-chan struct{} {
done := make(chan struct{})
go func() {
for i := 0; i < 100; i++ {
ch <- i
}
// all values have been published
// close done channel
close(done)
}()
return done
}
func main() {
ch := make(chan int, 1)
done := produce(ch)
go consume(ch)
<-done // if producer has done its thing
close(ch) // we can close the channel
}
func consume(ch <-chan int) {
// we can now simply loop over the channel until it's closed
for i := range ch {
fmt.Printf("Consumed %d\n", i)
}
}
OK, but here you'll still need to wait for the consume routine to complete.
You may have already noticed that the done channel technically isn't closed in the same routine that creates it either. Because the routine is defined as a closure, however, this is an acceptable compromise. Now let's see how we could use a waitgroup:
import (
"fmt"
"sync"
)
func produce(wg *sync.WaitGroup, ch chan<- int) {
defer wg.Done() // signal we've done our job
for i := 0; i < 100; i++ {
ch <- i
}
}
func main() {
ch := make(chan int, 1)
wg := sync.WaitGroup{}
wg.Add(1) // I'm adding a routine to the waitgroup
go produce(&wg, ch)
wg.Wait() // will return once `produce` has finished
close(ch)
}
OK, so this looks promising: I can have the routines tell me when they've finished their tasks. But if I add both the consumer and the producer to the waitgroup, I can't simply iterate over the channel. The channel will only ever get closed if both routines invoke wg.Done(), but if the consumer is stuck looping over a channel that will never get closed, then I've created a deadlock.
Solution:
A hybrid would be the easiest solution at this point: Add the consumer to a waitgroup, and use the done channel in the producer to get:
func produce(ch chan<- int) <-chan struct{} {
done := make(chan struct{})
go func() {
for i := 0; i < 100; i++ {
ch <- i
}
close(done)
}()
return done
}
func consume(wg *sync.WaitGroup, ch <-chan int) {
defer wg.Done()
for i := range ch {
fmt.Printf("Consumer: %d\n", i)
}
}
func main() {
ch := make(chan int, 1)
wg := sync.WaitGroup{}
done := produce(ch)
wg.Add(1)
go consume(&wg, ch)
<-done // produce done
close(ch)
wg.Wait()
// consumer done
fmt.Println("All done, exit")
}
I have changed your code slightly (expanded the time.Sleep). It works fine on my Linux x86_64.
func Product(ch chan<- int) {
for i := 0; i < 10; i++ {
fmt.Println("Product:", i)
ch <- i
}
}
func Consumer(ch <-chan int) {
for i := 0; i < 10; i++ {
a := <-ch
fmt.Println("Consmuer:", a)
}
}
func main() {
ch := make(chan int, 1)
go Product(ch)
go Consumer(ch)
time.Sleep(10000)
}
Output
go run s1.go
Product: 0
Product: 1
Product: 2
As JimB hinted at, time.Sleep takes a time.Duration, not an integer. The godoc shows an example of how to call this correctly. In your case, you probably want:
time.Sleep(500 * time.Millisecond)
The reason that your program is exiting quickly (but not giving you an error) is due to the (somewhat surprising) way that time.Duration is implemented.
time.Duration is a defined type whose underlying type is int64. Internally, it uses the value to represent the duration in nanoseconds. When you call time.Sleep(500), the compiler will gladly interpret the numeric literal 500 as a time.Duration. Unfortunately, that means 500 ns.
time.Millisecond is a constant equal to the number of nanoseconds in a millisecond (1,000,000). The nice thing is that requiring you to do that multiplication explicitly makes it obvious to the caller what the units are on that argument. Unfortunately, time.Sleep(500) is perfectly valid Go code but doesn't do what most beginners would expect.
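As a quick illustration, printing both durations shows the difference (a small standalone sketch):
package main
import (
    "fmt"
    "time"
)
func main() {
    fmt.Println(time.Duration(500))     // 500ns: what time.Sleep(500) actually sleeps
    fmt.Println(500 * time.Millisecond) // 500ms: what was probably intended
}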
Hi there, I am having some fun with Go and am just very curious about something I am trying to achieve. I have a package here that just gets a feed from Reddit, nothing special. When I receive the parent JSON file, I would then like to retrieve child data. As you can see in the code below, I launch a series of goroutines which I then block on, waiting for them to finish using the sync package. What I would like is that once the first series of goroutines finishes, a second series of goroutines starts, using the previous results. There are a few ways I was thinking of, such as a for loop or a switch statement, but what is the best and most efficient way to do this?
func (m redditMatcher) retrieve(dataPoint *collect.DataPoint) (*redditCommentsDocument, error) {
if dataPoint.URI == "" {
return nil, errors.New("No datapoint uri provided")
}
// Get options data -> returns empty struct
// if no options are present
options := m.options(dataPoint.Options)
if len(options.subreddit) <= 0 {
return nil, fmt.Errorf("Matcher fail: Reddit - Subreddit option manditory\n")
}
// Create an buffered channel to receive match results to display.
results := make(chan *redditCommentsDocument, len(options.subreddit))
// Generate requests for each subreddit using
// the goroutine concurrency model
for _, s := range options.subreddit {
// Set the number of goroutines we need to wait for while
// they process the individual subreddit.
waitGroup.Add(1)
go retrieveComment(s.(string), dataPoint.URI, results)
}
// Wait for all the retrieval goroutines to finish.
waitGroup.Wait()
// HERE I WOULD LIKE TO CALL ANOTHER SERIES OF GOROUTINES
for commentFeed := range results {
// HERE I WOULD LIKE TO CALL GO ROUTINES USING THE RESULTS
// PROVIDED FROM THE PREVIOUS FUNCTIONS
waitGroup.Add(1)
log.Printf("%s\n\n", commentFeed.Kind)
}
waitGroup.Wait()
close(results)
return nil, nil
}
If you want to wait for all of the first series to complete, then you can just pass in a pointer to your waitgroup, wait after calling all the first series functions (which will call Done() on the waitgroup), and then start the second series. Here's a runnable annotated code example that does that:
package main
import(
"fmt"
"sync"
"time"
)
func first(wg *sync.WaitGroup) {
defer wg.Done()
fmt.Println("Starting a first")
// do some stuff... here's a sleep to make some time pass
time.Sleep(250 * time.Millisecond)
fmt.Println("Done with a first")
}
func second(wg *sync.WaitGroup) {
defer wg.Done()
fmt.Println("Starting a second")
// do some followup stuff
time.Sleep(50 * time.Millisecond)
fmt.Println("Done with a second")
}
func main() {
wg := new(sync.WaitGroup) // you'll need a pointer to avoid a copy when passing as parameter to goroutine function
// let's start 5 firsts and then wait for them to finish
wg.Add(5)
go first(wg)
go first(wg)
go first(wg)
go first(wg)
go first(wg)
wg.Wait()
// now that we're done with all the firsts, let's do the seconds
// how about two of these
wg.Add(2)
go second(wg)
go second(wg)
wg.Wait()
fmt.Println("All done")
}
It outputs:
Starting a first
Starting a first
Starting a first
Starting a first
Starting a first
Done with a first
Done with a first
Done with a first
Done with a first
Done with a first
Starting a second
Starting a second
Done with a second
Done with a second
All done
But if you want a "second" to start as soon as a "first" has finished, just have the seconds execute blocking receives on the channel while the firsts are running:
package main
import(
"fmt"
"math/rand"
"sync"
"time"
)
func first(res chan int, wg *sync.WaitGroup) {
defer wg.Done()
fmt.Println("Starting a first")
// do some stuff... here's a sleep to make some time pass
time.Sleep(250 * time.Millisecond)
fmt.Println("Done with a first")
res <- rand.Int() // this will block until a second is ready
}
func second(res chan int, wg *sync.WaitGroup) {
defer wg.Done()
fmt.Println("Wait for a value from first")
val := <-res // this will block until a first is ready
fmt.Printf("Starting a second with val %d\n", val)
// do some followup stuff
time.Sleep(50 * time.Millisecond)
fmt.Println("Done with a second")
}
func main() {
wg := new(sync.WaitGroup) // you'll need a pointer to avoid a copy when passing as parameter to goroutine function
ch := make(chan int)
// lets run first twice, and second once for each first result, for a total of four workers:
wg.Add(4)
go first(ch, wg)
go first(ch, wg)
// don't wait before starting the seconds
go second(ch, wg)
go second(ch, wg)
wg.Wait()
fmt.Println("All done")
}
Which outputs:
Wait for a value from first
Starting a first
Starting a first
Wait for a value from first
Done with a first
Starting a second with val 5577006791947779410
Done with a first
Starting a second with val 8674665223082153551
Done with a second
Done with a second
All done