Why doesn't my Go goroutine get executed? [duplicate]

This question already has answers here: No output from goroutine (3 answers). Closed 5 years ago.
I'm learning Go, following the online tutorial "A Tour of Go".
In this exercise: https://tour.golang.org/concurrency/10
Before going on to solve the problem, I wanted to try something simple:
func Crawl(url string, depth int, fetcher Fetcher) {
    fmt.Println("Hello from Crawl")
    if depth <= 0 {
        return
    }
    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("found: %s %q\n", url, body)
    for _, u := range urls {
        fmt.Printf("in loop with u = %s\n", u)
        go Crawl(u, depth-1, fetcher) // I added "go" here
    }
}
The only thing I added is the go keyword right before the recursive call to Crawl. I expected it wouldn't change the behavior much.
However the printout is:
Hello from Crawl
found: http://golang.org/ "The Go Programming Language"
in loop with u = http://golang.org/pkg/
in loop with u = http://golang.org/cmd/
I expected to see Hello from Crawl for each iteration of the loop.
Why don't my Crawl goroutines get executed?

Your goroutines started, but they ended before doing what you wanted because your main() finished. Goroutines execute independently of the calling code, much like threads, but they are all terminated when the program exits. So you need a sync.WaitGroup to wait for the goroutines to finish their work, or you can simply call time.Sleep() to make the main program sleep for a while.
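For example, the crudest version of that second option is to keep main alive for a moment after calling Crawl. A rough sketch (fetcher comes from the Tour exercise, the sleep duration is arbitrary, and you'd also need to import "time"):

func main() {
    Crawl("http://golang.org/", 4, fetcher)
    // Crude: give the spawned goroutines time to run before main returns.
    // A sync.WaitGroup (shown in the next answer) is the proper fix.
    time.Sleep(time.Second)
}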

There is nothing making sure that all your goroutines end before your program finishes. I'd refactor your code to something similar to this:
func Crawl(url string, depth int, fetcher Fetcher) {
    fmt.Println("Hello from Crawl")
    if depth <= 0 {
        return
    }
    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("found: %s %q\n", url, body)
    // Add a WaitGroup to make sure the goroutines finish before Crawl returns.
    wg := sync.WaitGroup{}
    wg.Add(len(urls))
    for _, u := range urls {
        fmt.Printf("in loop with u = %s\n", u)
        // Pass u as a parameter so each goroutine gets its own copy
        // instead of sharing the loop variable.
        go func(u string) {
            defer wg.Done()
            Crawl(u, depth-1, fetcher)
        }(u)
    }
    wg.Wait()
}
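Note that because each Crawl call now waits for the goroutines it spawned, the top-level Crawl call in main blocks until the whole tree has been crawled, so main itself needs no extra synchronisation.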

Related

How to efficiently close two goroutines?

I'm using two concurrent goroutines to copy stdin/stdout from my terminal to a net.Conn target. For some reason, I can't manage to completely stop the two goroutines without getting a panic (for trying to close a closed channel). This is my code:
func interact(c net.Conn, sessionMap map[int]net.Conn) {
    quit := make(chan bool) // the channel to quit
    copy := func(r io.ReadCloser, w io.WriteCloser) {
        defer func() {
            r.Close()
            w.Close()
            close(quit) // this is how I'm trying to close it
        }()
        _, err := io.Copy(w, r)
        if err != nil {
            //
        }
    }
    go func() {
        for {
            select {
            case <-quit:
                return
            default:
                copy(c, os.Stdout)
            }
        }
    }()
    go func() {
        for {
            select {
            case <-quit:
                return
            default:
                copy(os.Stdin, c)
            }
        }
    }()
}
This panics with panic: close of closed channel.
I want to terminate the two goroutines and then proceed normally to another function. What am I doing wrong?
You can't call close on a channel more than once, and there's no reason to call copy in a for loop, since it can only run once. Your copy helper also takes its arguments in the opposite order from io.Copy (source first, destination second), which makes the direction of each copy easy to misread.
Quitting two goroutines is simple, but that's not the only thing you need to do here. Since io.Copy blocks, you don't need the extra synchronization to determine when the call is complete. That lets you simplify the code significantly, which makes it a lot easier to reason about.
func interact(c net.Conn) {
    go func() {
        // You want to close this outside the goroutine if you
        // expect to send data back over a half-closed connection.
        defer c.Close()
        // Optionally close stdout here if you need to signal the
        // end of the stream in a pipeline.
        defer os.Stdout.Close()
        _, err := io.Copy(os.Stdout, c)
        if err != nil {
            //
        }
    }()
    _, err := io.Copy(c, os.Stdin)
    if err != nil {
        //
    }
}
Also note that you may not be able to break out of the io.Copy from stdin, so you can't expect the interact function to return. Doing the copy manually in the function body and checking for a half-closed connection on every iteration may be a good idea; then you can break out sooner and make sure you fully close the net.Conn.
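A rough sketch of that manual loop, copying stdin to the connection by hand (illustrative only; the buffer size and error handling are arbitrary choices here):

buf := make([]byte, 32*1024)
for {
    n, err := os.Stdin.Read(buf)
    if n > 0 {
        // A failed write usually means the peer has closed the connection,
        // so stop reading from stdin and return.
        if _, werr := c.Write(buf[:n]); werr != nil {
            return
        }
    }
    if err != nil { // io.EOF or a real read error
        return
    }
}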
Another option could look like this:
func scanReader(quit chan int, r io.Reader) chan string {
    line := make(chan string)
    go func(quit chan int) {
        defer close(line)
        scan := bufio.NewScanner(r)
        for scan.Scan() {
            select {
            case <-quit:
                return
            default:
                s := scan.Text()
                line <- s
            }
        }
    }(quit)
    return line
}

// inside interact, assuming quit (e.g. quit := make(chan int)) is declared beforehand:
stdIn := scanReader(quit, os.Stdin)
conIn := scanReader(quit, c)
for {
    select {
    case <-quit:
        return
    case l := <-stdIn:
        _, e := fmt.Fprintln(c, l)
        if e != nil {
            quit <- 1
            return
        }
    case l := <-conIn:
        fmt.Println(l)
    }
}

Go routines not executing

The following is the code that is giving me a problem. What I want to achieve is to create all of these tables in parallel. After all the tables are created, I want to exit the function.
func someFunction() {
    ....
    gos := 5
    proc := make(chan bool, gos)
    allDone := make(chan bool)
    for i := 0; i < gos; i++ {
        go func() {
            for j := i; j < len(tables); j += gos {
                r, err := db.Exec(tables[j])
                fmt.Println(r)
                if err != nil {
                    methods.CheckErr(err, err.Error())
                }
            }
            proc <- true
        }()
    }
    go func() {
        for i := 0; i < gos; i++ {
            <-proc
        }
        allDone <- true
    }()
    for {
        select {
        case <-allDone:
            return
        }
    }
}
I'm creating two channels: one to keep track of the number of tables created (proc), and another (allDone) to see whether all of them are done.
When I run this code, the goroutines that create the tables start executing, but someFunction returns before they complete.
However, there is no problem if I run the code sequentially.
What is the mistake in my design, and how do I correct it?
The usual pattern for what you're trying to achieve uses a WaitGroup.
I think the problem you're facing is that i is captured by each goroutine and keeps getting incremented by the outer loop. Your inner loop starts at i, and since the outer loop has already continued, each goroutine starts at 5.
Try passing the loop variable as a parameter to the goroutine so that you get a new copy each time.
func someFunction() {
    ....
    gos := 5
    var wg sync.WaitGroup
    wg.Add(gos)
    for i := 0; i < gos; i++ {
        go func(n int) {
            defer wg.Done()
            for j := n; j < len(tables); j += gos {
                r, err := db.Exec(tables[j])
                fmt.Println(r)
                if err != nil {
                    methods.CheckErr(err, err.Error())
                }
            }
        }(i)
    }
    wg.Wait()
}
I'm not sure what you're trying to achieve here: each goroutine strides through the tables slice, so goroutine n runs db.Exec on tables n, n+5, n+10, and so on. Is this what you intended?
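As an aside, a tiny standalone illustration of that capture problem (note that Go 1.22 made loop variables per-iteration, which removes this particular trap):

for i := 0; i < 5; i++ {
    go func() {
        fmt.Println(i) // before Go 1.22 this often prints 5 five times
    }()
}
time.Sleep(time.Second) // crude wait so the goroutines get a chance to run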

What's the best practice to synchronise channels and wait groups?

What's the best practice for synchronising wait groups and channels? I want to handle messages and block in a loop, and delegating the closing of the channel to another goroutine seems like a weird solution.
func Crawl(url string, depth int, fetcher Fetcher) {
    ch := make(chan string)
    var waitGroup sync.WaitGroup
    waitGroup.Add(1)
    go crawlTask(&waitGroup, ch, url, depth, fetcher)
    go func() {
        waitGroup.Wait()
        close(ch)
    }()
    for message := range ch {
        // I want to handle the messages here
        fmt.Println(message)
    }
}

func crawlTask(waitGroup *sync.WaitGroup, ch chan string, url string, depth int, fetcher Fetcher) {
    defer waitGroup.Done()
    if depth <= 0 {
        return
    }
    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        ch <- err.Error()
        return
    }
    ch <- fmt.Sprintf("found: %s %q\n", url, body)
    for _, u := range urls {
        waitGroup.Add(1)
        go crawlTask(waitGroup, ch, u, depth-1, fetcher)
    }
}

func main() {
    Crawl("http://golang.org/", 4, fetcher)
}
// truncated from https://tour.golang.org/concurrency/10 webCrawler
As an alternative to using a waitgroup and an extra goroutine, you can use a separate channel for ending goroutines.
This is (also) idiomatic in Go. It involves blocking with a select statement.
You'd make a new channel, typically with an empty struct as its value (e.g. closeChan := make(chan struct{})), which, when closed (close(closeChan)), signals the goroutine to end.
Instead of ranging over a channel, you can use a select to block until the channel is either fed data or closed.
The code in Crawl could look something like this:
for { // instead of ranging over a to-be-closed chan
    select {
    case message := <-ch:
        // handle message
    case <-closeChan:
        return // note: a bare break here would only exit the select, not the for loop
    }
}
Then, in crawlTask, you could close closeChan (passed in as another parameter, like ch) when you return (I figure that's when you want the other goroutine to end and stop handling messages?):
if depth <= 0 {
    close(closeChan)
    return
}
Using a separate 'closer' goroutine prevents a deadlock.
If the wait/close operation were in the main goroutine before the for-range loop, it would never end, because all of the 'worker' goroutines would block in the absence of a receiver on the channel. And if it were placed in the main goroutine after the for-range loop, it would be unreachable, because the loop would block with no one left to close the channel.
This explanation is borrowed from 'The Go Programming Language' book (8.5, Looping in Parallel).
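Stripped of the crawler specifics, that looping-in-parallel pattern looks roughly like this (process here is a placeholder for whatever work each item needs):

func parallelDo(items []string) {
    results := make(chan string)
    var wg sync.WaitGroup
    for _, item := range items {
        wg.Add(1)
        go func(it string) {
            defer wg.Done()
            results <- process(it) // process is a stand-in for the real work
        }(item)
    }
    // Closer goroutine: it runs concurrently with the range loop below,
    // so the workers always have a receiver and the loop is guaranteed to end.
    go func() {
        wg.Wait()
        close(results)
    }()
    for r := range results {
        fmt.Println(r)
    }
}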

A channel is closed but all goroutines are asleep - deadlock

Yes, it looks like one of the most duplicated questions on StackOverflow, but please take a few minutes for the question.
func _Crawl(url string, fetcher Fetcher, ch chan []string) {
    if store.Read(url) == true {
        return
    } else {
        store.Write(url)
    }
    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Printf("not found: %s\n", url)
    }
    fmt.Printf("found: %s %q\n", url, body)
    ch <- urls
}

func Crawl(url string, fetcher Fetcher) {
    UrlChannel := make(chan []string, 4)
    go _Crawl(url, fetcher, UrlChannel)
    for urls, ok := <-UrlChannel; ok; urls, ok = <-UrlChannel {
        for _, i := range urls {
            go _Crawl(i, fetcher, UrlChannel)
        }
    }
    close(UrlChannel) // The channel is closed.
}

func main() {
    Crawl("http://golang.org/", fetcher)
}
I'm closing the channel after the loop ends. The program returns correct results but raises this error at the end:
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan receive]:
main.Crawl(0x113a2f, 0x12, 0x1800d0, 0x10432220)
/tmp/sandbox854979773/main.go:55 +0x220
main.main()
/tmp/sandbox854979773/main.go:61 +0x60
What is wrong with my goroutines?
Well, after a first look, you can write a shorter for loop just using range:
for urls := range UrlChannel { ... }
It will iterate until the channel is closed, and it looks much better.
Also, you have an early return in the first if of your function _Crawl(), so if that first condition is true the function will end and nothing will be passed to the channel, and the code receiving from that channel will wait forever.
Another thing: inside your second for loop you're creating goroutines for each url, but you're not waiting for them, and those goroutines will actually try to send to a closed channel. It seems this isn't happening, because in that case the code would panic; you can use a WaitGroup for this.
In summary, you have code with several possible deadlock conditions.
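For reference, a send on a closed channel always panics; a minimal illustration:

ch := make(chan []string)
close(ch)
ch <- nil // panic: send on closed channel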
||| Super Edit |||:
I should also tell you that your code is kind of messy and the solution may be a simple WaitGroup, but I was afraid of making you feel bad because I found too many issues. If you really want to learn how to write concurrent code, you should first think about the code or pseudo-code without any concurrency at all, and then try to add the magic.
In your case, what I see is a recursive solution, since the urls are fetched from an HTML document in the form of a tree; it would be something like a DFS:
func crawl(url string, fetcher Fetcher) {
    // if we've visited this url, just stop the recursion
    if store.Read(url) == true {
        return
    }
    store.Write(url)
    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Printf("not found: %s\n", url)
        return // early return: if there are no urls there's no reason to continue
    }
    fmt.Printf("found: %s %q\n", url, body)
    // this part will change !!
    // ...
    for _, url := range urls {
        crawl(url, fetcher)
    }
    //
}

func main() {
    crawl("http://golang.org", fetcher)
}
Now the second step is to identify the concurrent code. That's easy in this case, since each url can be fetched concurrently (sometimes in parallel); all we have to add is a WaitGroup and a goroutine for each url, and we only have to update the for loop that fetches the urls (it's only the for block):
// this code replaces the comment: "this part will change !!"
//
// the WaitGroup is essentially a thread-safe counter
var wg sync.WaitGroup
// set the WaitGroup counter to the length of the urls slice
wg.Add(len(urls))
for _, url := range urls {
    // it's very important to pass the url as a parameter,
    // because the var url changes on each iteration (url := range)
    go func(u string) {
        // decrement the counter (-1) when the goroutine completes
        defer wg.Done()
        crawl(u, fetcher)
    }(url)
}
wg.Wait() // wait for all your goroutines
// ...
As a future consideration, maybe you want to control the number of goroutines (or workers); for that you would use something like fan-in or fan-out. You can find more here:
https://blog.golang.org/advanced-go-concurrency-patterns
and
https://blog.golang.org/pipelines
But don't be afraid of creating thousands of goroutines; in Go they're very cheap.
Note: I haven't compiled the code, so it may have a little bug somewhere :)
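If you do want to cap the number of concurrent fetches without a full worker pool, one common trick (not part of the Tour exercise; the limit of 8 and the name sem are arbitrary) is a buffered channel used as a semaphore:

var sem = make(chan struct{}, 8) // at most 8 crawls in flight

// inside the loop over urls:
go func(u string) {
    defer wg.Done()
    sem <- struct{}{}        // acquire a slot
    defer func() { <-sem }() // release it when the crawl finishes
    crawl(u, fetcher)
}(url)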
Both solutions described above, and a range loop over a channel in general, have the same issue: the loop only ends after the channel is closed, but the channel would only be closed after the loop ends. So we need to know when to close the channel. I believe we need to count the started jobs (goroutines), but here I simply don't have such a counter variable. Since it is a Tour exercise, it shouldn't be complicated, so in the code below I fall back on the depth parameter instead.
func _Crawl(url string, fetcher Fetcher, ch chan []string) {
    if store.Read(url) == false {
        store.Write(url)
        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            fmt.Printf("not found: %s\n", url)
        } else {
            fmt.Printf("found: %s %q\n", url, body)
        }
        ch <- urls
    }
}

func Crawl(url string, depth int, fetcher Fetcher) {
    UrlChannel := make(chan []string, 4)
    go _Crawl(url, fetcher, UrlChannel)
    for urls := range UrlChannel {
        for _, url := range urls {
            go _Crawl(url, fetcher, UrlChannel)
        }
        depth--
        if depth < 0 {
            close(UrlChannel)
        }
    }
}

What am I missing on concurrency?

I have a very simple script that makes a GET request and then does something with the response. I have two versions, one using a goroutine and one without. I benchmarked both and there was no difference in speed. Here is a dumbed-down version of what I'm doing:
Regular version:
func main() {
    url := "http://finance.yahoo.com/q?s=aapl"
    for i := 0; i < 250; i++ {
        resp, err := http.Get(url)
        if err != nil {
            fmt.Println(err)
        }
        fmt.Println(resp.Status)
    }
}
Goroutine version:
func main() {
    url := "http://finance.yahoo.com/q?s=aapl"
    var wg sync.WaitGroup
    for i := 0; i < 250; i++ {
        wg.Add(1)
        go run(url, &wg)
        wg.Wait()
    }
}

func run(url string, wg *sync.WaitGroup) {
    defer wg.Done()
    resp, err := http.Get(url)
    if err != nil {
        fmt.Println(err)
    }
    fmt.Println(resp.Status)
}
In most cases when I used a goroutine the program took longer to execute. What concept am I missing to understand how to use concurrency efficiently?
The main problem with your example is that you're calling wg.Wait() inside the for loop. This blocks execution until the deferred wg.Done() call inside run fires. As a result, the execution isn't concurrent: the work happens in a goroutine, but you block after starting goroutine i and before starting i+1. If you place that statement after the loop instead, as below, your code won't block until after the loop (all goroutines have been started, and some may have already completed).
func main() {
    url := "http://finance.yahoo.com/q?s=aapl"
    var wg sync.WaitGroup
    for i := 0; i < 250; i++ {
        wg.Add(1)
        go run(url, &wg)
        // don't call wg.Wait() here, it serializes execution
    }
    wg.Wait() // wait here, now that all goroutines have been started
}
