I am using Go to implement a simple event-driven worker. It's like this:
go func() {
    for {
        select {
        case data := <-ch:
            time.Sleep(1) // simulate some work (note: this is 1 nanosecond)
            someGlobalMap[data.key] = data.value
        }
    }
}()
And the main function will create several goroutines, and each of them will do things like this:
ch <- data
fmt.Println(someGlobalMap[data.key])
As you can see, because my worker needs some time to do the work, I get a nil result in my main function. How can I control this workflow properly?
EDIT: I may have misread your question, I see that you mention that main will start many producer goroutines. I thought it was many consumer goroutines, and a single producer. Leaving the answer here in case it can be useful for others looking for that pattern, though the bullet points still apply to your case.
So if I understand your use case correctly, you can't expect to send on a channel and read the results immediately after. You don't know when the worker will process that send; you need to communicate between the goroutines, and that is done with channels. Assuming a plain function call with a return value doesn't work in your scenario, and you really do need to send to a worker and then block until you have the result, you could send a channel as part of the data structure and block-receive on it after the send, i.e.:
resCh := make(chan Result)
ch <- Data{key, value, resCh}
res := <-resCh
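Expanded into a minimal runnable sketch with the worker side included (the Data and Result types here are assumptions for illustration, not code from the question):

package main

import "fmt"

type Result struct{ value int }

type Data struct {
    key   string
    value int
    resCh chan Result // the worker replies on this channel
}

func main() {
    ch := make(chan Data)
    someGlobalMap := make(map[string]int)

    // a single worker owns the map, so no other synchronization is needed
    go func() {
        for data := range ch {
            someGlobalMap[data.key] = data.value
            data.resCh <- Result{someGlobalMap[data.key]}
        }
    }()

    resCh := make(chan Result)
    ch <- Data{"a", 42, resCh}
    res := <-resCh // blocks until the worker has stored the value
    fmt.Println(res.value)
}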
But you should probably try to break down the work as a pipeline of independent steps instead; see the blog post linked in the original answer.
Original answer where I thought it was a single producer - multiple consumers/workers pattern:
This is a common pattern for which Go's goroutines and channels semantics are very well suited. There are a few things you need to keep in mind:
The main function will not automatically wait for goroutines to finish. If there's nothing else to do in the main, then the program exits and you don't have your results.
The global map that you use is not thread-safe. You need to synchronize access via a mutex, but there's a better way - use an output channel for results, which is already synchronized.
You can use a for..range over a channel, and you can safely share a channel between multiple goroutines. As we'll see, that makes this pattern quite elegant to write.
Playground: https://play.golang.org/p/WqyZfwldqp
For more on Go pipelines and concurrency patterns, to introduce error handling, early cancellation, etc.: https://blog.golang.org/pipelines
Commented code for the use-case you mention:
// could be a command-line flag, a config, etc.
const numGoros = 10

// Data is a similar data structure to the one mentioned in the question.
type Data struct {
    key   string
    value int
}

func main() {
    var wg sync.WaitGroup

    // create the input channel that sends work to the goroutines
    inch := make(chan Data)
    // create the output channel that sends results back to the main function
    outch := make(chan Data)

    // the WaitGroup keeps track of pending goroutines, you can add numGoros
    // right away if you know how many will be started, otherwise do .Add(1)
    // each time before starting a worker goroutine.
    wg.Add(numGoros)
    for i := 0; i < numGoros; i++ {
        // because it uses a closure, it could've used inch and outch automatically,
        // but if the func gets bigger you may want to extract it to a named function,
        // and I wanted to show the directed channel types: within that function, you
        // can only receive from inch, and only send (and close) to outch.
        //
        // It also receives the index i, just for fun so it can set the goroutines'
        // index as key in the results, to show that it was processed by different
        // goroutines. Also, big gotcha: do not capture a for-loop iteration variable
        // in a closure, pass it as argument, otherwise it very likely won't do what
        // you expect.
        go func(i int, inch <-chan Data, outch chan<- Data) {
            // make sure WaitGroup.Done is called on exit, so Wait unblocks
            // eventually.
            defer wg.Done()

            // range over a channel gets the next value to process, safe to share
            // concurrently between all goroutines. It exits the for loop once
            // the channel is closed and drained, so wg.Done will be called once
            // ch is closed.
            for data := range inch {
                // process the data...
                time.Sleep(10 * time.Millisecond)
                outch <- Data{strconv.Itoa(i), data.value}
            }
        }(i, inch, outch)
    }

    // start the goroutine that prints the results, use a separate WaitGroup to
    // track it (could also have used a "done" channel but the for-loop would be
    // more complex, with a select).
    var wgResults sync.WaitGroup
    wgResults.Add(1)
    go func(ch <-chan Data) {
        defer wgResults.Done()

        // to prove it processed everything, keep a counter and print it on exit
        var n int
        for data := range ch {
            fmt.Println(data.key, data.value)
            n++
        }

        // for fun, try commenting out the wgResults.Wait() call at the end, the
        // output will likely miss this line.
        fmt.Println(">>> Processed: ", n)
    }(outch)

    // send work, wherever that comes from...
    for i := 0; i < 1000; i++ {
        inch <- Data{"main", i}
    }

    // when there's no more work to send, close the inch, so the goroutines will
    // begin draining it and exit once all values have been processed.
    close(inch)

    // wait for all goroutines to exit
    wg.Wait()

    // at this point, no more results will be written to outch, close it to signal
    // to the results goroutine that it can terminate.
    close(outch)

    // and wait for the results goroutine to actually exit, otherwise the program
    // would possibly terminate without printing the last few values.
    wgResults.Wait()
}
In real-life scenarios, where the amount of work is not known ahead of time, the closing of the in-channel could come from, e.g., a SIGINT signal. Just make sure no code path can send work after the channel has been closed, as that would panic.
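As a hedged sketch of that variant (reusing inch and Data from the example above, and needing the os and os/signal imports), the send loop can select between producing work and noticing the signal, and close the channel only afterwards:

sigCh := make(chan os.Signal, 1)
signal.Notify(sigCh, os.Interrupt) // SIGINT

send:
    for i := 0; ; i++ {
        select {
        case inch <- Data{"main", i}:
        case <-sigCh:
            break send // stop producing; nothing sends on inch after this
        }
    }
    close(inch)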
I am implementing a (sort of a) combinatorial backtracking algorithm in go utilising goroutines. My problem can be represented as a tree with a certain degree/spread where I want to visit each leaf and calculate a result depending on the path taken. On a given level, I want to spawn goroutines to process the subproblems concurrently, i.e. if I have a tree with degree 3 and I want to start the concurrency after level 2, I'd spawn 3*3=9 goroutines that proceed with processing the subproblems concurrently.
func main() {
    cRes := make(chan string, 100)
    res := []string{}

    numLevels := 5
    spread := 3
    startConcurrencyAtLevel := 2

    nTree("", numLevels, spread, startConcurrencyAtLevel, cRes)

    for {
        select {
        case r := <-cRes:
            res = append(res, r)
        case <-time.After(10 * time.Second):
            fmt.Println("Calculation timed out")
            fmt.Println(len(res), math.Pow(float64(spread), float64(numLevels)))
            return
        }
    }
}
func nTree(path string, maxLevels int, spread int, startConcurrencyAtLevel int, cRes chan string) {
    if len(path) == maxLevels {
        // some longer running task here associated with the found path, also using a lookup table
        // real problem actually returns not the path but the result if it satisfies some condition
        cRes <- path
        return
    }
    for i := 1; i <= spread; i++ {
        nextPath := path + fmt.Sprint(i)
        if len(path) == startConcurrencyAtLevel {
            go nTree(nextPath, maxLevels, spread, startConcurrencyAtLevel, cRes)
        } else {
            nTree(nextPath, maxLevels, spread, startConcurrencyAtLevel, cRes)
        }
    }
}
The above code works; however, it relies on the for-select statement timing out. I am looking for a way to continue with main() as soon as all goroutines have finished, i.e. all subproblems have been processed.
I already came up with two possible (inelegant) solutions:
Using a mutex-protected result map + a WaitGroup instead of a channel-based approach should do the trick, but I'm curious if there is a neat solution with channels.
Using a quit channel (of type int). Every time a goroutine is spawned, the quit channel gets a +1; every time a computation finishes in a leaf, it gets a -1, and the caller sums up the values. See the following snippet. This, however, is not a good solution, as it (rather blatantly) runs into timing issues I don't want to deal with: it quits prematurely if, for instance, the first goroutine finishes before another one has been spawned.
for {
    select {
    case q := <-cRunningRoutines:
        runningRoutines += q
        if runningRoutines == 0 {
            fmt.Println("Calculation complete")
            return res
        }
        // ...same cases as above
    }
}
Playground: https://go.dev/play/p/9jzeCvl8Clj
Following questions:
Is doing recursive calls from a function started as a goroutine to itself a valid approach?
What would be an idiomatic way of reading the results from cRes until all spawned goroutines finish? I read somewhere that channels should be closed when computation is done, but I just can't wrap my head around how to integrate it in this case.
Happy about any ideas, thanks!
Reading the description and the snippet, I am not able to understand exactly what you are trying to achieve, but I have some hints and patterns for channels that I use daily and think are helpful.
The context package is very helpful for managing goroutines' state in a safe way. In your example, time.After is used to end the main program, but in non-main functions it could leak goroutines: if instead you use context.Context and pass it into the goroutines (it's usually passed as the first argument of a function), you will be able to control cancellation of downstream calls. This explains it briefly.
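A minimal sketch of that idea (worker and jobs are illustrative names): the goroutine selects on ctx.Done() so an upstream timeout or cancellation stops it instead of leaking it.

package main

import (
    "context"
    "time"
)

// worker exits when the context is cancelled or the jobs channel is closed
func worker(ctx context.Context, jobs <-chan int) {
    for {
        select {
        case <-ctx.Done():
            return // cancelled or timed out upstream
        case j, ok := <-jobs:
            if !ok {
                return // channel closed: no more work
            }
            _ = j // process j ...
        }
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    jobs := make(chan int)
    go worker(ctx, jobs)
    jobs <- 1
    close(jobs)
}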
It is common practice for functions that produce messages to create the channel, return it, and send into it. The same function should be responsible for closing the channel, e.g. with defer close(channel), when it's done writing.
This is handy because a buffered channel can be closed even while it still has data in it: receivers keep draining the remaining messages and only then see it as closed. With an unbuffered channel, the function can't send a message until a reader is ready to receive it, and thus can't exit until then.
This is an example (without recursion).
We can close the channel whether it is buffered or unbuffered in this example, because each send will block until the for ... range over the channel in the main goroutine reads it.
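Since the linked example isn't reproduced here, a minimal sketch of the pattern it describes (produce is an illustrative name):

package main

import "fmt"

// produce creates, returns, and eventually closes its own output channel.
func produce(n int) <-chan string {
    out := make(chan string)
    go func() {
        defer close(out) // the producer owns the channel and closes it
        for i := 0; i < n; i++ {
            out <- fmt.Sprintf("msg %d", i)
        }
    }()
    return out
}

func main() {
    for msg := range produce(5) { // range exits once the channel is closed
        fmt.Println(msg)
    }
}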
This is a variant for the same principle, with the channel passed as argument.
We can use sync.WaitGroup in tandem with channels to signal completion of individual goroutines, and to let an "orchestrating" goroutine know that the channel can be closed, because all message producers are done sending data into it. The same considerations as in point 1 apply to the close operation.
This is an example showing the use of a WaitGroup and an external closer of the channel.
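As that example isn't reproduced here either, a minimal sketch of the pattern (three producers is an arbitrary choice):

package main

import (
    "fmt"
    "sync"
)

func main() {
    out := make(chan int)
    var wg sync.WaitGroup

    // several producers share the same output channel
    for i := 0; i < 3; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            out <- id
        }(i)
    }

    // the "orchestrating" goroutine closes the channel once all producers are done
    go func() {
        wg.Wait()
        close(out)
    }()

    for v := range out { // exits when the channel is closed
        fmt.Println(v)
    }
}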
Channels can have a direction! Notice that in the examples, arrows are added next to the channel type (e.g. <-chan string, or chan<- string) when passing channels into functions. This tells the compiler that a channel is read-only or write-only, respectively, in the scope of that function.
This helps in two ways:
the compiler will catch misuse at compile time: direction is part of the channel's type, so sending on a receive-only channel (or receiving from a send-only one) is a compile error rather than a runtime surprise.
the signature of the function documents whether it will only use the channel for writing (and possibly close()) or for reading: remember that reading from a channel with a range automatically stops the iteration when the channel is closed.
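For illustration, a minimal sketch of such directed signatures (produce and consume are illustrative names):

package main

import "fmt"

// send-only parameter: produce may send and close, but not receive
func produce(out chan<- string) {
    defer close(out)
    out <- "hello"
}

// receive-only parameter: the range stops when the channel is closed
func consume(in <-chan string) {
    for msg := range in {
        fmt.Println(msg)
    }
}

func main() {
    ch := make(chan string)
    go produce(ch) // a bidirectional channel converts implicitly to either direction
    consume(ch)
}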
You can build channels of channels: make(chan chan string) is a valid (and helpful) construct for building processing pipelines.
A common use is a fan-in goroutine that collects the outputs of a series of channel-producing goroutines.
This is an example of how to use them.
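In its absence, a minimal self-contained sketch of a fan-in over a channel of channels (all names are illustrative):

package main

import "fmt"

func main() {
    streams := make(chan chan string, 3) // a channel of channels

    // each producer registers its own result channel, then writes to it and closes it
    for i := 0; i < 3; i++ {
        ch := make(chan string, 1)
        streams <- ch
        go func(id int, out chan<- string) {
            defer close(out)
            out <- fmt.Sprintf("result from producer %d", id)
        }(i, ch)
    }
    close(streams)

    // fan-in: drain every inner channel arriving on the outer one
    for ch := range streams {
        for msg := range ch {
            fmt.Println(msg)
        }
    }
}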
In essence, to answer your initial questions:
Is doing recursive calls from a function started as a goroutine to itself a valid approach?
If you really need recursion, it's probably better to handle it separately from the concurrent code: create a dedicated function that recursively sends data into a channel, and orchestrate the closing of the channel in the caller.
What would be an idiomatic way of reading the results from cRes until all spawned goroutines finish? I read somewhere that channels should be closed when computation is done, but I just can't wrap my head around how to integrate it in this case.
A good reference is Go Concurrency Patterns: Pipelines and cancellation: this is a rather old post (before the context package existed in the std lib) and I think Parallel digestion is what you're looking for to address the original question.
As mentioned by torek, I spun off an anonymous function that closes the channel after the WaitGroup finishes waiting. I also needed some logic so that wg.Done() for a spawned goroutine is called only after the recursion below the spawning level returns.
Generally I think this is a useful idiom (correct me if I'm wrong :))
Playground: https://go.dev/play/p/bQjHENsZL25
func main() {
    cRes := make(chan string, 100)

    numLevels := 3
    spread := 3
    startConcurrencyAtLevel := 2

    var wg sync.WaitGroup
    nTree("", numLevels, spread, startConcurrencyAtLevel, cRes, &wg)

    go func() {
        // time.Sleep(1 * time.Second) // edit: code should work without this initial sleep
        wg.Wait()
        close(cRes)
    }()

    for r := range cRes {
        fmt.Println(r)
    }
    fmt.Println("Done!")
}
func nTree(path string, maxLevels int, spread int, startConcurrencyAtLevel int, cRes chan string, wg *sync.WaitGroup) {
    if len(path) == maxLevels {
        // some longer running task here associated with the found path
        cRes <- path
        return
    }
    for i := 1; i <= spread; i++ {
        nextPath := path + fmt.Sprint(i)
        if len(path) == startConcurrencyAtLevel {
            wg.Add(1)
            go func() {
                // Done fires only after the whole subtree below nextPath has been
                // processed (nextPath is declared per iteration, so capturing it
                // in the closure is safe)
                defer wg.Done()
                nTree(nextPath, maxLevels, spread, startConcurrencyAtLevel, cRes, wg)
            }()
        } else {
            nTree(nextPath, maxLevels, spread, startConcurrencyAtLevel, cRes, wg)
        }
    }
}
I currently have two functions pushNodes(node) and updateNodes(node). In the pushNodes function, I am pushing values through a channel that are to be used in updateNodes. In order to have accurate channel values saved, I need all pushNodes goroutines to finish before starting updateNodes(). How can I still access the channel values after the goroutines have finished executing?
I continuously get "fatal error: all goroutines are asleep - deadlock!". Please let me know how I can get these values from the channel. Is there a better/alternate way to do this?
//pushNodes is a function that will push the node's values
func pushNodes(node Node) {
    defer wg.Done()
    fmt.Printf("Pushing: %d \n", node.number)

    //Choose a random peer node
    var randomnode int = rand.Intn(totalnodes)
    for randomnode == node.number {
        rand.Seed(time.Now().UnixNano())
        randomnode = rand.Intn(totalnodes)
    }

    //If the current node is infected, send values through the channel
    if node.infected {
        sentchanneldata := ChannelData{infected: true, message: node.message}
        allnodes[randomnode].channel <- sentchanneldata
        fmt.Printf("Node %d sent a value of %t and %s to node %d!\n", node.number, sentchanneldata.infected, sentchanneldata.message, allnodes[randomnode].number)
    }
}

//updateNodes is a function that will update the node's values
func updateNodes(node Node) {
    defer wg.Done()
    fmt.Printf("Updating: %d\n", node.number)

    //get value through node channel
    receivedchanneldata := <-node.channel
    fmt.Printf("Node %d received a value of %t and %s!\n", node.number, receivedchanneldata.infected, receivedchanneldata.message)

    // update value
    if receivedchanneldata.infected == true {
        node.infected = true
    }
    if receivedchanneldata.message != "" {
        node.message = receivedchanneldata.message
    }
    fmt.Printf("Update successful!\n")
}
//Part of the main function
wg.Add(totalnodes)
for node := range allnodes {
    go pushNodes(allnodes[node])
}
wg.Wait()
fmt.Println("Infect function done!")

wg.Add(totalnodes)
for node := range allnodes {
    go updateNodes(allnodes[node])
}
wg.Wait()
How can I still access the channel values after the GoRoutines have finished executing?
A channel's existence, including any data that have been shoved into it, is independent of the goroutines that might read from or write to it, provided that at least one goroutine still exists that can read from and/or write to it. (Once all such goroutines are gone, the channel will—eventually—be GC'ed.)
Your code sample is unusable (as already noted) so we can't say precisely where you have gone wrong, but you'll get the kind of fatal message you report here:
fatal error: all goroutines are asleep - deadlock!
if you attempt to read from a channel in the last runnable goroutine, such that this goroutine goes to sleep to await a message on that channel, in such a way that the rest of the Go runtime can determine for certain that no currently-asleep goroutine will ever wake up and deliver a message on that channel. For instance, suppose you have 7 total goroutines running right as one of them reaches the following line of code:
msg = <-ch
where ch is an open channel with no data available right now. One of those 7 goroutines reaches this line and blocks ("goes to sleep"), waiting for one of the remaining six goroutines to do:
ch <- whatever
which would wake up that 7th goroutine. So now there are only 6 goroutines that can write on ch or close ch. If those six remaining goroutines also pass through the same line, one at a time or several or all at once, with none of them ever sending on the channel or closing it, those remaining goroutines will also block. When the last one of them blocks, the runtime will realize that the program is stuck, and panic.
If, however, only five of the remaining six goroutines block like this, and then the sixth one runs through a line reading:
close(ch)
that close operation will close the channel, causing all six of the "stuck asleep" goroutines to receive "end of data" represented by a zero-valued "fake" message msg. You can also use the two-valued form of receive:
msg, ok = <-ch
Here ok gets true if the channel isn't closed and msg contains a real message, but gets false if the channel is closed and msg now contains a zero-valued "fake" message.
Thus, you can either:
close the channel to indicate that you plan not to send anything else, or
carefully match up the number of "receive from channel" operations to the number of "send message on channel" operations.
The former is the norm with channels where there's no way to know in advance how many messages should be sent on the channel. It can still be used even if you do know. A typical construct for doing the close is:
ch := make(chan T) // for some type T
// do any other setup that is appropriate

var wg sync.WaitGroup
wg.Add(N) // for some number N

// spin off some number of goroutines N, each of which may send
// any number of messages on the channel
for i := 0; i < N; i++ {
    go doSomething(&wg, ch)
    // in doSomething, call wg.Done() when done sending on ch
}

go func() {
    wg.Wait() // wait for all N goroutines to finish
    close(ch) // then, close the channel
}()

// Start function(s) that receive from the channel, either
// inline or in more goroutines here; have them finish when
// they see that the channel is closed.
This pattern relies on the ability to create an extra N+1'th goroutine—that's the anonymous function go func() { ... }() sequence—whose entire job in life is to wait for all the senders to say I am done sending. Each sender does that by calling wg.Done() once. That way, no sender has any special responsibility for closing the channel: they all just write and then announce "I'm done writing" when they are done writing. One goroutine has one special responsibility: it waits for all senders to have announced "I'm done writing", and then it closes the channel and exits, having finished its one job in life.
All receivers—whether that's one or many—now have an easy time knowing when nobody will ever send anything any more, because they see a closed channel at that point. So if most of the work is on the sending side, you can even use the main goroutine here with a simple for ... range ch loop.
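For the second option, matching receives to sends, a minimal self-contained sketch: if each of n goroutines sends exactly one message, the receiver simply loops n times and no close is needed.

package main

import "fmt"

func main() {
    const n = 5
    results := make(chan int)

    // each goroutine sends exactly one message...
    for i := 0; i < n; i++ {
        go func(i int) { results <- i * i }(i)
    }

    // ...so receiving exactly n messages drains everything; no close needed
    for i := 0; i < n; i++ {
        fmt.Println(<-results)
    }
}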
Suppose I have a simple loop which does sequential tests like this.
for f := 1; f <= 1000; f++ {
    if doTest(f) {
        break
    }
}
I loop through a range of numbers and do a test for each number. If the test fails for one number, I break and exit the main thread. Simple enough.
Now, how do I correctly feed the test numbers to, say, four (or some other number of) goroutines? Basically, I want to test the numbers from 1 to 1000 in batches of 4 (or whatever the number of goroutines is).
Do I create 4 goroutines reading from one channel and feed the numbers sequentially into this channel? Or do I make 4 goroutines, each with its own channel?
And another question: how do I stop all 4 goroutines if one of them fails the test? I've been reading some texts on channels but I cannot put the pieces together.
You can create a producer/consumer system: https://play.golang.org/p/rks0gB3aDb
func main() {
    ch := make(chan int)
    clients := 4

    // make it buffered, so all clients can fail without hanging
    notifyCh := make(chan struct{}, clients)
    go produce(100, ch, notifyCh)

    var wg sync.WaitGroup
    wg.Add(clients)
    for i := 0; i < clients; i++ {
        go func() {
            consumer(ch, notifyCh)
            wg.Done()
        }()
    }
    wg.Wait()
}

func consumer(in chan int, notifyCh chan struct{}) {
    fmt.Printf("Start consumer\n")
    for i := range in {
        <-time.After(100 * time.Millisecond)
        if i == 42 {
            fmt.Printf("%d fails\n", i)
            notifyCh <- struct{}{}
            return
        } else {
            fmt.Printf("%d\n", i)
        }
    }
    fmt.Printf("Consumer stopped working\n")
}

func produce(N int, out chan int, notifyCh chan struct{}) {
    for i := 0; i < N; i++ {
        select {
        case out <- i:
        case <-notifyCh:
            close(out)
            return
        }
    }
    close(out)
}
The producer pushes numbers from 0 to 99 into the channel; the consumers consume until the channel is closed. In main we create 4 clients and add them to a WaitGroup to reliably check whether every goroutine has returned.
Every consumer can signal on notifyCh; the producer then stops working and no further numbers are generated, therefore all consumers return after their current number.
There's also the option to create 4 goroutines, wait for all of them to return, then start the next 4. But this adds quite an overhead from waiting.
Since you mentioned prime numbers, here's a really cool prime sieve: https://golang.org/doc/play/sieve.go
Whether you create one shared channel or a channel per goroutine depends on what you want.
If you only want to put some numbers (or, more generally, requests) in and you don't care which goroutine serves them, then it's better to share one channel. If, for example, you want the first 250 requests to be served by goroutine 1, then of course you cannot share a channel.
It is good practice to use a channel as either input or output. And the simplest way for a sender to signal that it is finished is to close the channel. A good article about that is https://blog.golang.org/pipelines
What is not mentioned in the question: you also need another channel (or channels), or some other communication primitive, to get the results back. And that channel is often more interesting than the feeding one.
What information should be sent? A bool after every doTest, or only a signal when everything is done (in that case no bool is necessary; just close a channel)?
If you prefer the program to fail fast, I would use a buffered, shared channel to feed the numbers. Don't forget to close it when all the numbers have been fed.
Then use another channel to let the main thread know that the tests are done. It could be a channel where you only put the number for which the test failed, or, if you also want positive results, a channel of structs containing the number and the result, or any other information returned from doTest.
A very good article about channels is also http://dave.cheney.net/2014/03/19/channel-axioms
Each of your four goroutines can report a failure (by sending an error and closing the channel). But the gotcha is what the goroutines should do when all numbers have passed and the feeding channel is closed. A nice article about that is http://nathanleclaire.com/blog/2014/02/15/how-to-wait-for-all-goroutines-to-finish-executing-before-continuing/
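Pulling those pieces together, a minimal sketch under the stated assumptions (doTest is a placeholder; the first failure is recorded on a small results channel; this version drains all numbers rather than cancelling early, which the earlier answer's notifyCh approach addresses):

package main

import (
    "fmt"
    "sync"
)

// placeholder test: returns true when the number fails
func doTest(n int) bool { return n == 427 }

func main() {
    nums := make(chan int, 100) // buffered feed channel
    failed := make(chan int, 1) // holds the first reported failure
    var wg sync.WaitGroup

    for w := 0; w < 4; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for n := range nums { // exits when nums is closed and drained
                if doTest(n) {
                    select {
                    case failed <- n: // record the first failure
                    default: // a failure was already recorded
                    }
                }
            }
        }()
    }

    for f := 1; f <= 1000; f++ {
        nums <- f
    }
    close(nums) // no more numbers: workers drain the rest and return

    wg.Wait()
    close(failed) // safe: all senders are done

    if n, ok := <-failed; ok {
        fmt.Println("test failed for", n)
    } else {
        fmt.Println("all tests passed")
    }
}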
We have a process whereby users request files that we need to get from our source. This source isn't the most reliable so we implemented a queue using Amazon SQS. We put the download URL into the queue and then we poll it with a small app that we wrote in Go. This app simply retrieves the messages, downloads the file and then pushes it to S3 where we store it. Once all of this is complete it calls back a service which will email the user to let them know that the file is ready.
Originally I wrote this to create n channels and then attached 1 go-routine to each and had the go-routine in an infinite loop. This way I could ensure that I was only ever processing a fixed number of downloads at a time.
I realised that this isn't the way that channels are supposed to be used and, if I'm understanding correctly now, there should actually be one channel with n go-routines receiving on that channel. Each go-routine is in an infinite loop, waiting on a message and when it receives it will process the data, do everything that it's supposed to and when it's done it will wait on the next message. This allows me to ensure that I'm only ever processing n files at a time. I think this is the right way to do it. I believe this is fan-out, right?
What I don't need to do, is to merge these processes back together. Once the download is done it is calling back a remote service so that handles the remainder of the process. There is nothing else that the app needs to do.
OK, so some code:
func main() {
    queue, err := ConnectToQueue() // This works fine...
    if err != nil {
        log.Fatalf("Could not connect to queue: %s\n", err)
    }

    msgChannel := make(chan sqs.Message, 10)

    for i := 0; i < MAX_CONCURRENT_ROUTINES; i++ {
        go processMessage(msgChannel, queue)
    }

    for {
        response, _ := queue.ReceiveMessage(MAX_SQS_MESSAGES)
        for _, m := range response.Messages {
            msgChannel <- m
        }
    }
}

func processMessage(ch <-chan sqs.Message, queue *sqs.Queue) {
    for {
        m := <-ch

        // Do something with message m

        // Delete message from queue when we're done
        queue.DeleteMessage(&m)
    }
}
Am I anywhere close here? I have n running go-routines (where MAX_CONCURRENT_ROUTINES = n) and in the loop we will keep passing messages in to the single channel. Is this the right way to do it? Do I need to close anything or can I just leave this running indefinitely?
One thing that I'm noticing is that SQS is returning messages but once I've had 10 messages passed into processMessage() (10 being the size of the channel buffer) that no further messages are actually processed.
Thanks all
That looks fine. A few notes:
You can limit the work parallelism by means other than limiting the number of worker routines you spawn. For example you can create a goroutine for every message received, and then have the spawned goroutine wait for a semaphore that limits the parallelism. Of course there are tradeoffs, but you aren't limited to just the way you've described.
sem := make(chan struct{}, n)

work := func(m sqs.Message) {
    sem <- struct{}{} // when there's room in the semaphore we can proceed
    // do the work
    <-sem // free a slot in the semaphore
}

for {
    response, _ := queue.ReceiveMessage(MAX_SQS_MESSAGES)
    for _, m := range response.Messages {
        go work(m)
    }
}
The limit of only 10 messages being processed is being caused elsewhere in your stack. Possibly you're seeing a race where the first 10 fill the channel, and then the work isn't completing, or perhaps you're accidentally returning from the worker routines. If your workers are persistent per the model you've described, you'll want to be certain that they don't return.
It's not clear if you want the process to return after you've processed some number of messages. If you do want this process to exit, you'll need to wait for all the workers to finish their current tasks, and probably signal them to return afterwards. Take a look at sync.WaitGroup for synchronizing their completion, and having another channel to signal that there's no more work, or close msgChannel, and handle that in your workers. (Take a look at the 2-tuple return channel receive expression.)
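For instance, a hedged sketch of that graceful-exit variant, reworking the question's processMessage loop to use range and a WaitGroup (process is a hypothetical stand-in for the actual message handling):

var wg sync.WaitGroup

for i := 0; i < MAX_CONCURRENT_ROUTINES; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        // range exits once msgChannel is closed and drained, so the worker
        // returns instead of blocking forever
        for m := range msgChannel {
            process(m) // hypothetical: do something with message m
            queue.DeleteMessage(&m)
        }
    }()
}

// ... the receive loop sends into msgChannel until there's no more work, then:
close(msgChannel) // signals "no more work" to every worker
wg.Wait()         // wait for the workers to finish their in-flight messages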
The reading part isn't concurrent but the processing is. I phrased the title this way because I'm most likely to search for this problem again using that phrase. :)
I'm getting a deadlock after trying to go beyond the examples so this is a learning experience for me. My goals are these:
Read a file line by line (eventually use a buffer to do groups of lines).
Pass off the text to a func() that does some regex work.
Send the results somewhere but avoid mutexes or shared variables. I'm sending ints (always the number 1) to a channel. It's sort of silly but if it's not causing problems I'd like to leave it like this unless you folks have a neater option.
Use a worker pool to do this. I'm not sure how to tell the workers to requeue themselves.
Here is the playground link. I tried to write helpful comments, hopefully this makes sense. My design could be completely wrong so don't hesitate to refactor.
package main

import (
    "bufio"
    "fmt"
    "regexp"
    "strings"
    "sync"
)

func telephoneNumbersInFile(path string) int {
    file := strings.NewReader(path)
    var telephone = regexp.MustCompile(`\(\d+\)\s\d+-\d+`)

    // do I need buffered channels here?
    jobs := make(chan string)
    results := make(chan int)

    // I think we need a wait group, not sure.
    wg := new(sync.WaitGroup)

    // start up some workers that will block and wait?
    for w := 1; w <= 3; w++ {
        wg.Add(1)
        go matchTelephoneNumbers(jobs, results, wg, telephone)
    }

    // go over a file line by line and queue up a ton of work
    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        // Later I want to create a buffer of lines, not just line-by-line here ...
        jobs <- scanner.Text()
    }

    close(jobs)
    wg.Wait()

    // Add up the results from the results channel.
    // The rest of this isn't even working so ignore for now.
    counts := 0
    // for v := range results {
    //     counts += v
    // }
    return counts
}

func matchTelephoneNumbers(jobs <-chan string, results chan<- int, wg *sync.WaitGroup, telephone *regexp.Regexp) {
    // Decreasing internal counter for wait-group as soon as goroutine finishes
    defer wg.Done()

    // eventually I want to have a []string channel to work on a chunk of lines not just one line of text
    for j := range jobs {
        if telephone.MatchString(j) {
            results <- 1
        }
    }
}

func main() {
    // An artificial input source. Normally this is a file passed on the command line.
    const input = "Foo\n(555) 123-3456\nBar\nBaz"
    numberOfTelephoneNumbers := telephoneNumbersInFile(input)
    fmt.Println(numberOfTelephoneNumbers)
}
You're almost there, just need a little bit of work on goroutines' synchronisation. Your problem is that you're trying to feed the parser and collect the results in the same routine, but that can't be done.
I propose the following:
Run the scanner in a separate routine, closing the input channel once everything is read.
Run a separate routine that waits for the parsers to finish their job, then closes the output channel.
Collect all the results in your main routine.
The relevant changes could look like this:
// Go over a file line by line and queue up a ton of work
go func() {
    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        jobs <- scanner.Text()
    }
    close(jobs)
}()

// Collect all the results...
// First, make sure we close the result channel when everything was processed
go func() {
    wg.Wait()
    close(results)
}()

// Now, add up the results from the results channel until closed
counts := 0
for v := range results {
    counts += v
}
Fully working example on the playground: http://play.golang.org/p/coja1_w-fY
Worth adding: you don't necessarily need the WaitGroup to achieve the same; all you need to know is when to stop receiving results. This could be achieved, for example, by the scanner advertising (on a channel) how many lines were read, and the collector then reading only that number of results (you would need to send zeros as well, though).
Edit: The answer by #tomasz above is the correct one. Please disregard this answer.
You need to do two things:
use buffered channels so that sending doesn't block
close the results channel so that receiving doesn't block.
The use of buffered channels is essential because unbuffered channels need a receive for each send, which is causing the deadlock you're hitting.
If you fix that, you'll run into a deadlock when you try to receive the results, because results hasn't been closed.
Here's the fixed playground: http://play.golang.org/p/DtS8Matgi5