Why is my Golang Channel Write Blocking Forever? - go

I've been attempting to take a swing at concurrency in Golang by refactoring one of my command-line utilities over the past few days, but I'm stuck.
Here's the original code (master branch).
Here's the branch with concurrency (x_concurrent branch).
When I execute the concurrent code with go run jira_open_comment_emailer.go, the defer wg.Done() never executes if the JIRA issue is added to the channel here, which causes my wg.Wait() to hang forever.
The idea is that I have a large amount of JIRA issues, and I want to spin off a goroutine for each one to see if it has a comment I need to respond to. If it does, I want to add it to some structure (I chose a channel after some research) that I can read from like a queue later to build up an email reminder.
Here's the relevant section of the code:
// Given an issue, determine if it has an open comment
// Returns true if there is an open comment on the issue, otherwise false
func getAndProcessComments(issue Issue, channel chan<- Issue, wg *sync.WaitGroup) {
// Decrement the wait counter when the function returns
defer wg.Done()
needsReply := false
// Loop over the comments in the issue
for _, comment := range issue.Fields.Comment.Comments {
commentMatched, err := regexp.MatchString("~"+config.JIRAUsername, comment.Body)
checkError("Failed to regex match against comment body", err)
if commentMatched {
needsReply = true
}
if comment.Author.Name == config.JIRAUsername {
needsReply = false
}
}
// Only add the issue to the channel if it needs a reply
if needsReply == true {
// This never allows the defered wg.Done() to execute?
channel <- issue
}
}
func main() {
start := time.Now()
// This retrieves all issues in a search from JIRA
allIssues := getFullIssueList()
// Initialize a wait group
var wg sync.WaitGroup
// Set the number of waits to the number of issues to process
wg.Add(len(allIssues))
// Create a channel to store issues that need a reply
channel := make(chan Issue)
for _, issue := range allIssues {
go getAndProcessComments(issue, channel, &wg)
}
// Block until all of my goroutines have processed their issues.
wg.Wait()
// Only send an email if the channel has one or more issues
if len(channel) > 0 {
sendEmail(channel)
}
fmt.Printf("Script ran in %s", time.Since(start))
}

The goroutines block on sending to the unbuffered channel.
A minimal change unblocks the goroutines is to create a buffered channel with capacity for all issues:
channel := make(chan Issue, len(allIssues))
and close the channel after the call to wg.Wait().

Related

Code executes successfully even though there are issues with WaitGroup implementation

I have this snippet of code which concurrently runs a function using an input and output channel and associated WaitGroups, but I was clued in to the fact that I've done some things wrong. Here's the code:
func main() {
concurrency := 50
var tasksWG sync.WaitGroup
tasks := make(chan string)
output := make(chan string)
for i := 0; i < concurrency; i++ {
tasksWG.Add(1)
// evidentally because I'm processing tasks in a groutine then I'm not blocking and I end up closing the tasks channel almost immediately and stopping tasks from executing
go func() {
for t := range tasks {
output <- process(t)
continue
}
tasksWG.Done()
}()
}
var outputWG sync.WaitGroup
outputWG.Add(1)
go func() {
for o := range output {
fmt.Println(o)
}
outputWG.Done()
}()
go func() {
// because of what was mentioned in the previous comment, the tasks wait group finishes almost immediately which then closes the output channel almost immediately as well which ends ranging over output early
tasksWG.Wait()
close(output)
}()
f, err := os.Open(os.Args[1])
if err != nil {
log.Panic(err)
}
s := bufio.NewScanner(f)
for s.Scan() {
tasks <- s.Text()
}
close(tasks)
// and finally the output wait group finishes almost immediately as well because tasks gets closed right away due to my improper use of goroutines
outputWG.Wait()
}
func process(t string) string {
time.Sleep(3 * time.Second)
return t
}
I've indicated in the comments where I've implementing things wrong. Now these comments make sense to me. The funny thing is that this code does indeed seem to run asynchronously and dramatically speeds up execution. I want to understand what I've done wrong but it's hard to wrap my head around it when the code seems to execute in an asynchronous way. I'd love to understand this better.
Your main goroutine is doing a couple of things sequentially and others concurrently, so I think your order of execution is off
f, err := os.Open(os.Args[1])
if err != nil {
log.Panic(err)
}
s := bufio.NewScanner(f)
for s.Scan() {
tasks <- s.Text()
}
Shouldn't you move this up top? So then you have values sent to tasks
THEN have your loop which ranges over tasks 50 times in the concurrency named for loop (you want to have something in tasks before calling code that ranges over it)
go func() {
// because of what was mentioned in the previous comment, the tasks wait group finishes almost immediately which then closes the output channel almost immediately as well which ends ranging over output early
tasksWG.Wait()
close(output)
}()
The logic here is confusing me, you're spawning a goroutine to wait on the waitgroup, so here the wait is nonblocking on the main goroutine - is that what you want to do? It won't wait for tasksWG to be decremented to zero inside main, it'll do that inside the goroutine that you've created. I don't believe you want to do that?
It might be easier to debug if you could give more details on the expected output?

How to ensure goroutines launched within goroutines are synchronized with each other?

This is the first time I'm using the concurrency features of Go and I'm jumping right into the deep end.
I want to make concurrent calls to an API. The request is based off of the tags of the posts I want to receive back (there can be 1 <= N tags). The response body looks like this:
{
"posts": [
{
"id": 1,
"author": "Name",
"authorId": 1,
"likes": num_likes,
"popularity": popularity_decimal,
"reads": num_reads,
"tags": [ "tag1", "tag2" ]
},
...
]
}
My plan is to daisy-chain a bunch of channels together and spawn a number of goroutines that read and or write from those channels:
- for each tag, add it to a tagsChannel inside a goroutine
- use that tagsChannel inside another goroutine to make concurrent GET requests to the endpoint
- for each response of that request, pass the underlying slice of posts into another goroutine
- for each individual post inside the slice of posts, add the post to a postChannel
- inside another goroutine, iterate over postChannel and insert each post into a data structure
Here's what I have so far:
func (srv *server) Get() {
// Using red-black tree prevents any duplicates, fast insertion
// and retrieval times, and is sorted already on ID.
rbt := tree.NewWithIntComparator()
// concurrent approach
tagChan := make(chan string) // tags -> tagChan
postChan := make(chan models.Post) // tagChan -> GET -> post -> postChan
errChan := make(chan error) // for synchronizing errors across goroutines
wg := &sync.WaitGroup{} // for synchronizing goroutines
wg.Add(4)
// create a go func to synchronize our wait groups
// once all goroutines are finished, we can close our errChan
go func() {
wg.Wait()
close(errChan)
}()
go insertTags(tags, tagChan, wg)
go fetch(postChan, tagChan, errChan, wg)
go addPostToTree(rbt, postChan, wg)
for err := range errChan {
if err != nil {
srv.HandleError(err, http.StatusInternalServerError).ServeHTTP(w, r)
}
}
}
// insertTags inserts user's passed-in tags to tagChan
// so that tagChan may pass those along in fetch.
func insertTags(tags []string, tagChan chan<- string, group *sync.WaitGroup) {
defer group.Done()
for _, tag := range tags {
tagChan <- tag
}
close(tagChan)
}
// fetch completes a GET request to the endpoint
func fetch(posts chan<- models.Post, tags <-chan string, errs chan<- error, group *sync.WaitGroup) {
defer group.Done()
for tag := range tags {
ep, err := formURL(tag)
if err != nil {
errs <- err
}
group.Add(1) // QUESTION should I use a separate wait group here?
go func() {
resp, err := http.Get(ep.String())
if err != nil {
errs <- err
}
container := models.PostContainer{}
err = json.NewDecoder(resp.Body).Decode(&container)
defer resp.Body.Close()
group.Add(1) // QUESTION should I add a separate wait group here and pass it to insertPosts?
go insertPosts(posts, container.Posts, group)
defer group.Done()
}()
// group.Done() -- removed this call due to Burak, but now my program hands
}
}
// insertPosts inserts each individual post into our posts channel so that they may be
// concurrently added to our RBT.
func insertPosts(posts chan<- models.Post, container []models.Post, group *sync.WaitGroup) {
defer group.Done()
for _, post := range container {
posts <- post
}
}
// addPostToTree iterates over the channel and
// inserts each individual post into our RBT,
// setting the post ID as the node's key.
func addPostToTree(tree *tree.RBT, collection <-chan models.Post, group *sync.WaitGroup) {
defer group.Done()
for post := range collection {
// ignore return value & error here:
// we don't care about the returned key and
// error is only ever if a duplicate is attempted to be added -- we don't care
tree.Insert(post.ID, post)
}
}
I'm able to make one request to the endpoint, but as soon as try to submit a second request, my program fails with panic: sync: negative WaitGroup counter.
My question is why exactly is my WaitGroup counter going negative? I make sure to add to the waitgroup and mark when my goroutines are done.
If the waitgroup is negative on the second request, then that must mean that the first time I allocate a waitgroup and add 4 to it is being skipped... why? Does this have something to do with closing channels, maybe? And if so, where do I close a channel?
Also -- does anyone have tips for debugging goroutines?
Thanks for your help.
Firstly, the whole design is quite complicated. Mentioned my thoughts towards the end.
There are 2 problems in your code:
posts channel is never closed, due to which addPostToTree might never be existing the loop, resulting in one waitGroup never decreasing (In your case the program hangs). There is a chance the program waits indefinitely with a deadlock (Thinking other goroutine will release it, but all goroutines are stuck).
Solution: You can close the postChan channel. But how? Its always recommended for the producer to always close the channel, but you've multiple producers. So the best option is, wait for all producers to finish and then close the channel. In order to wait for all producers to finish, you'll need to create another waitGroup and use that to track the child routines.
Code:
// fetch completes a GET request to the endpoint
func fetch(posts chan<- models.Post, tags <-chan string, errs chan<- error, group *sync.WaitGroup) {
postsWG := &sync.WaitGroup{}
for tag := range tags {
ep, err := formURL(tag)
if err != nil {
errs <- err
}
postsWG.Add(1) // QUESTION should I use a separate wait group here?
go func() {
resp, err := http.Get(ep.String())
if err != nil {
errs <- err
}
container := models.PostContainer{}
err = json.NewDecoder(resp.Body).Decode(&container)
defer resp.Body.Close()
go insertPosts(posts, container.Posts, postsWG)
}()
}
defer func() {
postsWG.Wait()
close(posts)
group.Done()
}()
}
Now, we've another issue, the main waitGroup should be initialized with 3 instead of 4. This is because the main routine is only spinning up 3 more routines wg.Add(3), so it has to keep track of only those. For child routines, we're using a different waitGroup, so that is not the headache of the parent anymore.
Code:
errChan := make(chan error) // for synchronizing errors across goroutines
wg := &sync.WaitGroup{} // for synchronizing goroutines
wg.Add(3)
// create a go func to synchronize our wait groups
// once all goroutines are finished, we can close our errChan
TLDR --
Complex Design - As the main wait group is started at one place, but each goroutine is modifying this waitGroup as it feels necessary. So there is no single owner for this, which makes debugging and maintaining super complex (+ can't ensure it'll be bug free).
I would recommend breaking this down and have separate trackers for each child routines. That way, the caller who is spinning up more routines can only concentrate on tracking its child goroutines. This routine will then inform its parent waitGroup only after its done (& its child is done, rather than letting the child routinue informing the grandparent directly).
Also, in fetch method after making the HTTP call and getting the response, why create another goroutine to process this data? Either way this goroutine cannot exit until the data insertion happens, nor is it doing someother action which data processing happens. From what I understand, The second goroutine is redundant.
group.Add(1) // QUESTION should I add a separate wait group here and pass it to insertPosts?
go insertPosts(posts, container.Posts, group)
defer group.Done()

Understanding the logic of WaitGroups

I'd like to understand if my logic around WaitGroups is correct and to see if there is a more efficient way of structuring my code. The aim is to perform the tasks as fast as possible.
My code populates a _urls channel which is populated via stdin. Then I'm spinning up two WaitGroups, one which reads from this _urls channel, and the other which reads from a _downloads channel, which is fed from a goroutine in the first WaitGroup.
Essentially the code looks like this:
// declare channels
_urls := make(chan string)
_downloads := make(chan string)
// first waitgroup with 2 goroutines
var wg sync.WaitGroup
for i := 0; i < concurrency; i++ {
wg.Add(2)
go func() {
defer wg.Done()
for url := range _urls {
// perform GET request and inspect the responseBody
}
}()
go func() {
defer wg.Done()
for url := range _urls {
// perform a HEAD request to look for a certain file
// if the file exists, send to the _downloads channel
_downloads <- url
}
}()
}
// second waitgroup with 1 goroutine
var dwg sync.WaitGroup
for i := 0; i < concurrency; i++ {
dwg.Add(1)
go func() {
defer wg.Done()
for url := range _downloads {
// perform the download
}
}()
}
My concern is around whether this is an efficient way to feed the _downloads channel, or would it make more sense to just perform the download within the first WaitGroup.
I did something similar to this with the worker pool pattern, https://gobyexample.com/worker-pools. Probably the right direction if you are looking to maximizing concurrency.
It abstracts jobs using go interfaces, so it could be a HEAD, GET, Download, or whatever else happens to make sense in the future. A scheduler sends jobs to a dispatcher that manages the worker pool and sends results back.
Here is a link to the README and code.
It uses wait groups to track the number of active workers, not jobs. Workers execute a for {} loop and only exit when they read true from a done channel. In this case, the use of the wait group is for a graceful shutdown. In your example, many of the workers could be doing long downloads. So your shutdown logic could wait for N jobs to be left before shutting down.
It may be overkill for your use cases.

A case of `all goroutines are asleep - deadlock!` I can't figure out why

TL;DR: A typical case of all goroutines are asleep, deadlock! but can't figure it out
I'm parsing the Wiktionary XML dump to build a DB of words. I defer the parsing of each article's text to a goroutine hoping that it will speed up the process.
It's 7GB and is processed in under 2 minutes in my machine when doing it serially, but if I can take advantage of all cores, why not.
I'm new to threading in general, I'm getting a all goroutines are asleep, deadlock! error.
What's wrong here?
This may not be performant at all, as it uses an unbuffered channel, so all goroutines effectively end up executing serially, but my idea is to learn and understand threading and to benchmark how long it takes with different alternatives:
unbuffered channel
different sized buffered channel
only calling as many goroutines at a time as there are runtime.NumCPU()
The summary of my code in pseudocode:
while tag := xml.getNextTag() {
wg.Add(1)
go parseTagText(chan, wg, tag.text)
// consume a channel message if available
select {
case msg := <-chan:
// do something with msg
default:
}
}
// reading tags finished, wait for running goroutines, consume what's left on the channel
for msg := range chan {
// do something with msg
}
// Sometimes this point is never reached, I get a deadlock
wg.Wait()
----
func parseTagText(chan, wg, tag.text) {
defer wg.Done()
// parse tag.text
chan <- whatever // just inform that the text has been parsed
}
Complete code:
https://play.golang.org/p/0t2EqptJBXE
In your complete example on the Go Playground, you:
Create a channel (line 39, results := make(chan langs)) and a wait-group (line 40, var wait sync.WaitGroup). So far so good.
Loop: in the loop, sometimes spin off a task:
if ...various conditions... {
wait.Add(1)
go parseTerm(results, &wait, text)
}
In the loop, sometimes do a non-blocking read from the channel (as shown in your question). No problem here either. But...
At the end of the loop, use:
for res := range results {
...
}
without ever calling close(results) in exactly one place, after all writers finish. This loop uses a blocking read from the channel. As long as some writer goroutine is still running, the blocking read can block without having the whole system stop, but when the last writer finishes writing and exits, there are no remaining writer goroutines. Any other remaining goroutines might rescue you, but there are none.
Since you use the var wait correctly (adding 1 in the right place, and calling Done() in the right place in the writer), the solution is to add one more goroutine, which will be the one to rescue you:
go func() {
wait.Wait()
close(results)
}()
You should spin off this rescuer goroutine just before entering the for res := range results loop. (If you spin it off any earlier, it might see the wait variable count down to zero too soon, just before it gets counted up again by spinning off another parseTerm.)
This anonymous function will block in the wait variable's Wait() function until the last writer goroutine has called the final wait.Done(), which will unblock this goroutine. Then this goroutine will call close(results), which will arrange for the for loop in your main goroutine to finish, unblocking that goroutine. When this goroutine (the rescuer) returns and thus terminates, there are no more rescuers, but we no longer need any.
(This main code then calls wait.Wait() unnecessarily: Since the for didn't terminate until the wait.Wait() in the new goroutine already unblocked, we know that this next wait.Wait() will return immediately. So we can drop this second call, although leaving it in is harmless.)
The problem is that nothing is closing the results channel, yet the range loop only exits when it closes. I've simplified your code to illustrate this and propsed a solution - basically consume the data in a goroutine:
// This is our producer
func foo(i int, ch chan int, wg *sync.WaitGroup) {
defer wg.Done()
ch <- i
fmt.Println(i, "done")
}
// This is our consumer - it uses a different WG to signal it's done
func consumeData(ch chan int, wg *sync.WaitGroup) {
defer wg.Done()
for x := range ch {
fmt.Println(x)
}
fmt.Println("ALL DONE")
}
func main() {
ch := make(chan int)
wg := sync.WaitGroup{}
// create the producers
for i := 0; i < 10; i++ {
wg.Add(1)
go foo(i, ch, &wg)
}
// create the consumer on a different goroutine, and sync using another WG
consumeWg := sync.WaitGroup{}
consumeWg.Add(1)
go consumeData(ch,&consumeWg)
wg.Wait() // <<<< means that the producers are done
close(ch) // << Signal the consumer to exit
consumeWg.Wait() // << Wait for the consumer to exit
}

Why is my code causing a stall or race condition?

For some reason, once I started adding strings through a channel in my goroutine, the code stalls when I run it. I thought that it was a scope/closure issue so I moved all code directly into the function to no avail. I have looked through Golang's documentation and all examples look similar to mine so I am kind of clueless as to what is going wrong.
func getPage(url string, c chan<- string, swg sizedwaitgroup.SizedWaitGroup) {
defer swg.Done()
doc, err := goquery.NewDocument(url)
if err != nil{
fmt.Println(err)
}
nodes := doc.Find(".v-card .info")
for i := range nodes.Nodes {
el := nodes.Eq(i)
var name string
if el.Find("h3.n span").Size() != 0{
name = el.Find("h3.n span").Text()
}else if el.Find("h3.n").Size() != 0{
name = el.Find("h3.n").Text()
}
address := el.Find(".adr").Text()
phoneNumber := el.Find(".phone.primary").Text()
website, _ := el.Find(".track-visit-website").Attr("href")
//c <- map[string] string{"name":name,"address":address,"Phone Number": phoneNumber,"website": website,};
c <- fmt.Sprint("%s%s%s%s",name,address,phoneNumber,website)
fmt.Println([]string{name,address,phoneNumber,website,})
}
}
func getNumPages(url string) int{
doc, err := goquery.NewDocument(url)
if err != nil{
fmt.Println(err);
}
pagination := strings.Split(doc.Find(".pagination p").Contents().Eq(1).Text()," ")
numItems, _ := strconv.Atoi(pagination[len(pagination)-1])
return int(math.Ceil(float64(numItems)/30))
}
func main() {
arrChan := make(chan string)
swg := sizedwaitgroup.New(8)
zips := []string{"78705","78710","78715"}
for _, item := range zips{
swg.Add()
go getPage(fmt.Sprintf(base_url,item,1),arrChan,swg)
}
swg.Wait()
}
Edit:
so I fixed it by passing sizedwaitgroup as a reference but when I remove the buffer it doesn't work does that mean that I need to know how many elements will be sent to the channel in advance?
Issue
Building off of Colin Stewart's answer, from the code you have posted, as far as I can tell, your issue is actually with reading your arrChan. You write into it, but there's no place where you read from it in your code.
From the documentation :
If the channel is unbuffered, the sender blocks until the receiver has received the value. If the channel has a buffer, the sender blocks only until the value
has been copied to the buffer; if the buffer is full, this means
waiting until some receiver has retrieved a value.
By making the channel buffered, what's happening is your code is no longer blocking on the channel write operations, the line that looks like:
c <- fmt.Sprint("%s%s%s%s",name,address,phoneNumber,website)
My guess is that if you're still hanging at when the channel has a size of 5000, it's because you have more than 5000 values returned across all of your loops over node.Nodes. Once your buffered channel is full, the operations block until the channel has space, just like if you were writing to an unbuffered channel.
Fix
Here's a minimal example showing you how you would fix something like this (basically just add a reader)
package main
import "sync"
func getPage(item string, c chan<- string) {
c <- item
}
func readChannel(c <-chan string) {
for {
<-c
}
}
func main() {
arrChan := make(chan string)
wg := sync.WaitGroup{}
zips := []string{"78705", "78710", "78715"}
for _, item := range zips {
wg.Add(1)
go func() {
defer wg.Done()
getPage(item, arrChan)
}()
}
go readChannel(arrChan) // comment this out and you'll deadlock
wg.Wait()
}
Your channel has no buffer, so writes will block until the value can be read, and at least in the code you have posted, there are no readers.
You don't need to know size to make it work. But you might in order to exit cleanly. Which can be a bit tricky to observe at time because your program will exit once your main function exits and all goroutines still running are killed immediately finished or not.
As a warm up example, change readChannel in photoionized's response to this:
func readChannel(c <-chan string) {
for {
url := <-c
fmt.Println (url)
}
}
It only adds printing to the original code. But now you'll see better what is actually happening. Notice how it usually only prints two strings when code actually writes 3. This is because code exits once all writing goroutines finish, but reading goroutine is aborted mid way as result. You can "fix" it by removing "go" before readChannel (which would be same as reading the channel in main function). And then you'll see 3 strings printed, but program crashes with a dead lock as readChannel is still reading from the channel, but nobody writes into it anymore. You can fix that too by reading exactly 3 strings in readChannel(), but that requires knowing how many strings you expect to receive.
Here is my minimal working example (I'll use it to illustrate the rest):
package main
import (
"fmt"
"sync"
)
func getPage(url string, c chan<- string, wg *sync.WaitGroup) {
defer wg.Done()
c <- fmt.Sprintf("Got page for %s\n",url)
}
func readChannel(c chan string, wg *sync.WaitGroup) {
defer wg.Done()
var url string
ok := true
for ok {
url, ok = <- c
if ok {
fmt.Printf("Received: %s\n", url)
} else {
fmt.Println("Exiting readChannel")
}
}
}
func main() {
arrChan := make(chan string)
var swg sync.WaitGroup
base_url := "http://test/%s/%d"
zips := []string{"78705","78710","78715"}
for _, item := range zips{
swg.Add(1)
go getPage(fmt.Sprintf(base_url,item,1),arrChan,&swg)
}
var wg2 sync.WaitGroup
wg2.Add(1)
go readChannel(arrChan, &wg2)
swg.Wait()
// All written, signal end to readChannel by closing the channel
close(arrChan)
wg2.Wait()
}
Here I close the channel to signal to readChannel that there is nothing left to read, so it can exit cleanly at proper time. But sometimes you might want instead to tell readChannel to read exactly 3 strings and finish. Or may be you would want to start one reader for each writer and each reader will read exactly one string... Well, there are many ways to skin a cat and choice is all yours.
Note, if you remove wg2.Wait() line your code becomes equivalent to photoionized's response and will only print two strings whilst writing 3. This is because code exits once all writers finish (ensured by swg.Wait()), but it does not wait for readChannel to finish.
If you remove close(arrChan) line instead, your code will crash with a deadlock after printing 3 lines as code waits for readChannel to finish, but readChannel waits to read from a channel which nobody is writing to anymore.
If you just remove "go" before the readChannel call, it becomes equivalent of reading from channel inside main function. It will again crash with a dead lock after printing 3 strings because readChannel is still reading when all writers have already finished (and readChannel has already read all they written). A tricky point here is that swg.Wait() line will never be reached by this code as readChannel never exits.
If you move readChannel call after the swg.Wait() then code will crash before even printing a single string. But this is a different dead lock. This time code reaches swg.Wait() and stops there waiting for writers. First writer succeeds, but channel is not buffered, so next writer blocks until someone reads from the channel the data already written. Trouble is - nobody reads from the channel yet as readChannel has not been called yet. So, it stalls and crashes with a dead lock. This particular issue can be "fixed", but making channel buffered as in make(chan string, 3) as that will allow writers to keep writing even though nobody is reading from that channel yet. And sometimes this is what you want. But here again you have to know the maximum of messages to ever be in the channel buffer. And most of the time it's only deferring a problem - just add one more writer and you are where you started - code stalls and crashes as channel buffer is full and that one extra writer is waiting for someone to read from the buffer.
Well, this should covers all bases. So, check your code and see which case is yours.

Resources