I am learning Golang concurrency and have written a program to display URLs in order. I expect the code to return
http://bing.com/*****
http://google.com/*****
But it always returns http://google.com/*****, as if the variable is being overwritten. Since I am using goroutines, I would expect it to return both values at the same time.
package main
import (
"fmt"
"time"
)
func check(u string) string {
tmpres := u + "*****"
return tmpres
}
func IsReachable(url string) string {
ch := make(chan string, 1)
go func() {
ch <- check(url)
}()
select {
case reachable := <-ch:
// the check finished before the timeout
return reachable
case <-time.After(3 * time.Second):
// call timed out
return "none"
}
}
func main() {
var urls = []string{
"http://bing.com/",
"http://google.com/",
}
for _, url := range urls {
go func() {
fmt.Println(IsReachable(url))
}()
}
time.Sleep(1 * time.Second)
}
Two problems. First, you've created a data race. By closing over the loop variable, you're sharing it between the goroutine running the loop and the goroutines it starts, which is causing the problem you describe: by the time the goroutine started for the first URL runs, the value of the variable has changed. (Since Go 1.22, for modules that declare go 1.22 or later, each loop iteration gets its own copy of the variable, so this particular bug no longer occurs; the fix below works on every version.) You need to either copy it to a local variable, or pass it as an argument, e.g.:
for _, url := range urls {
go func(url string) {
fmt.Println(IsReachable(url))
}(url)
}
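The local-copy alternative mentioned above looks like this. A minimal sketch under assumptions: `check` stands in for the question's `IsReachable`, and the mutex-guarded slice plus `sync.WaitGroup` are illustrative additions so that `main` waits for the goroutines instead of sleeping.

```go
package main

import (
	"fmt"
	"sync"
)

// check is a hypothetical stand-in for the question's IsReachable.
func check(u string) string {
	return u + "*****"
}

// collectAll starts one goroutine per URL and gathers their results.
// The line `url := url` copies the loop variable, so each goroutine
// sees its own value even on Go versions before 1.22.
func collectAll(urls []string) []string {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results []string
	)
	for _, url := range urls {
		url := url // per-iteration copy captured by the closure below
		wg.Add(1)
		go func() {
			defer wg.Done()
			r := check(url)
			mu.Lock() // results is shared, so guard the append
			results = append(results, r)
			mu.Unlock()
		}()
	}
	wg.Wait() // wait for every goroutine instead of sleeping
	return results
}

func main() {
	// Both URLs are printed, in whatever order the goroutines finished.
	for _, r := range collectAll([]string{"http://bing.com/", "http://google.com/"}) {
		fmt.Println(r)
	}
}
```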
Second, you said you wanted to display them "in order", which is not a goal generally compatible with concurrency/parallelism, because you cannot control the order of parallel operations. If you want them in order, you should do them in order in a single goroutine. Otherwise, you'll have to collect the results, wait for all of them to come back, then sort the results back into the desired order before printing them.
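A sketch of the collect-and-wait approach, using a variation in place of sorting: each goroutine writes into its own slot of a pre-sized slice, so the results come out in input order. `check` is a hypothetical stand-in for the question's `IsReachable`, and the WaitGroup replaces the question's time.Sleep.

```go
package main

import (
	"fmt"
	"sync"
)

// check is a hypothetical stand-in for the question's IsReachable;
// assume it may be slow.
func check(u string) string {
	return u + "*****"
}

// checkInOrder runs every check concurrently but returns the results in
// the same order as urls: each goroutine writes only to its own index,
// so no mutex is needed, and wg.Wait() guarantees all writes are done.
func checkInOrder(urls []string) []string {
	results := make([]string, len(urls))
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			results[i] = check(u)
		}(i, u)
	}
	wg.Wait()
	return results
}

func main() {
	for _, r := range checkInOrder([]string{"http://bing.com/", "http://google.com/"}) {
		fmt.Println(r) // bing first, then google
	}
}
```

Because every goroutine writes a different index, there is no data race on `results`, and `wg.Wait()` ensures all writes happen before the slice is read.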
Related
Consider a group of check works, each with independent logic, so they seem well suited to running concurrently, like:
type Work struct {
// ...
}
// This Check could be quite time-consuming
func (w *Work) Check() bool {
// return success or failure
// ...
return true // placeholder so the snippet compiles
}
func CheckAll(works []*Work) {
num := len(works)
results := make(chan bool, num)
for _, w := range works {
go func(w *Work) {
results <- w.Check()
}(w)
}
for i := 0; i < num; i++ {
if r := <-results; !r {
ReportFailed()
break
}
}
}
func ReportFailed() {
// ...
}
If the logic is that the failure of any single work means the whole group has failed, then once a failure is received the remaining values in the channel are useless. Letting the remaining unfinished goroutines keep running and sending results to the channel is pointless and wasteful, especially when w.Check() is time-consuming. The ideal effect is similar to:
for _, w := range works {
if !w.Check() {
ReportFailed()
break
}
}
This runs only the necessary checks and then breaks, but it is sequential, not concurrent.
So, is it possible to cancel these unfinished goroutines, or their sends on the channel?
Cancelling a (blocking) send
Your original question asked how to cancel a send operation. A send on a channel is basically "instant": it blocks only if the channel's buffer is full (an unbuffered channel has no buffer at all) and there is no ready receiver.
You can "cancel" this send by using a select statement and a cancel channel which you close, e.g.:
cancel := make(chan struct{})
select {
case ch <- value:
case <-cancel:
}
Closing the cancel channel with close(cancel) on another goroutine will make the above select abandon the send on ch (if it's blocking).
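A self-contained sketch of that pattern: the send below targets an unbuffered channel that nobody receives from, so it blocks until another goroutine closes `cancel`. The helper names `sendOrCancel` and `demo` are illustrative, not from the original answer.

```go
package main

import (
	"fmt"
	"time"
)

// sendOrCancel tries to send v on ch and gives up as soon as cancel is
// closed. It reports whether the value was actually sent.
func sendOrCancel(ch chan<- int, v int, cancel <-chan struct{}) bool {
	select {
	case ch <- v:
		return true
	case <-cancel:
		return false
	}
}

// demo blocks a send on an unbuffered channel that nobody reads, then
// closes cancel from another goroutine to release it.
func demo() bool {
	ch := make(chan int) // unbuffered and never received from: the send blocks
	cancel := make(chan struct{})
	go func() {
		time.Sleep(50 * time.Millisecond)
		close(cancel)
	}()
	return sendOrCancel(ch, 42, cancel)
}

func main() {
	fmt.Println("sent:", demo()) // sent: false
}
```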
But as said, the send is "instant" on a "ready" channel, and the send first evaluates the value to be sent:
results <- w.Check()
This first has to run w.Check(), and once it's done, its return value will be sent on results.
Cancelling a function call
So what you really need is to cancel the w.Check() method call. For that, the idiomatic way is to pass a context.Context value which you can cancel, and w.Check() itself must monitor and "obey" this cancellation request.
See Terminating function execution if a context is cancelled
Note that your function must support this explicitly. There is no implicit termination of function calls or goroutines, see cancel a blocking operation in Go.
So your Check() should look something like this:
// This Check could be quite time-consuming
func (w *Work) Check(ctx context.Context, workDuration time.Duration) bool {
// Do your thing and monitor the context!
select {
case <-ctx.Done():
return false
case <-time.After(workDuration): // Simulate work
return true
case <-time.After(2500 * time.Millisecond): // Simulate failure after 2.5 sec
return false
}
}
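The select-based Check() above suits work that consists of waiting. For a hypothetical CPU-bound loop, another common way to obey cancellation is to check ctx.Err() between steps, as in this sketch (`sumSlowly` and `runWithTimeout` are made-up names, and the sleep only simulates work):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// sumSlowly is a hypothetical long-running job: it works in small steps
// and checks ctx between steps, returning early once the context is done.
// Nothing stops it from the outside; it has to check by itself.
func sumSlowly(ctx context.Context, n int) (int, error) {
	sum := 0
	for i := 0; i < n; i++ {
		if err := ctx.Err(); err != nil {
			return sum, err // cancelled or deadline exceeded: stop early
		}
		time.Sleep(time.Millisecond) // simulate one unit of work
		sum += i
	}
	return sum, nil
}

// runWithTimeout runs sumSlowly for n steps under a timeout given in
// milliseconds and reports whether the job was cut short.
func runWithTimeout(n int, timeoutMs int) (int, bool) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	sum, err := sumSlowly(ctx, n)
	return sum, errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(runWithTimeout(10, 1000))  // finishes: 45 false
	fmt.Println(runWithTimeout(10000, 20)) // cut short: partial sum, true
}
```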
And CheckAll() may look like this:
func CheckAll(works []*Work) {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
num := len(works)
results := make(chan bool, num)
wg := &sync.WaitGroup{}
for i, w := range works {
workDuration := time.Second * time.Duration(i)
wg.Add(1)
go func(w *Work) {
defer wg.Done()
result := w.Check(ctx, workDuration)
// You may check and return if context is cancelled
// so result is surely not sent, I omitted it here.
select {
case results <- result:
case <-ctx.Done():
return
}
}(w)
}
go func() {
wg.Wait()
close(results) // This allows the for range over results to terminate
}()
for result := range results {
fmt.Println("Result:", result)
if !result {
cancel()
break
}
}
}
Testing it:
CheckAll(make([]*Work, 10))
Output (try it on the Go Playground):
Result: true
Result: true
Result: true
Result: false
We get true printed 3 times (works that complete under 2.5 seconds), then the failure simulation kicks in, returns false, and terminates all other jobs.
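Note that the sync.WaitGroup in the above example is not strictly needed to keep the senders from blocking, as results has a buffer capable of holding all results. It is, however, what closes results so that the for range loop terminates if no check fails, and in general it's good practice (should you use a smaller buffer in the future).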
See related: Close multiple goroutine if an error occurs in one in go
The short answer is: No.
You cannot cancel or close a goroutine from the outside; a goroutine stops only when it finishes on its own, i.e. when its function returns.
If you want to cancel something, the best approach is to pass a context.Context to the goroutines and listen on ctx.Done() inside them. Whenever the context is cancelled, the goroutine should return; it then ends after running its deferred calls (if any).
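A minimal sketch of such a goroutine, assuming a hypothetical `worker` that produces values until told to stop: it selects on ctx.Done() alongside its own channel operation, returns when the context is cancelled, and its deferred cleanup then runs.

```go
package main

import (
	"context"
	"fmt"
)

// worker is a hypothetical background loop. It keeps producing values
// until ctx is cancelled and then returns; its deferred cleanup runs on
// the way out. Returning is the only way the goroutine ends.
func worker(ctx context.Context, out chan<- int, stopped chan<- string) {
	defer func() { stopped <- "cleaned up" }() // runs when worker returns
	for i := 0; ; i++ {
		select {
		case <-ctx.Done():
			return // cancellation observed: leave the goroutine
		case out <- i:
			// one unit of work delivered
		}
	}
}

// runAndCancel starts worker, takes n values from it, cancels the context
// and waits for the message sent by worker's deferred cleanup.
func runAndCancel(n int) (int, string) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	out := make(chan int)
	stopped := make(chan string, 1)
	go worker(ctx, out, stopped)

	received := 0
	for ; received < n; received++ {
		<-out
	}
	cancel() // ask the worker to stop; it must notice ctx.Done() itself
	return received, <-stopped
}

func main() {
	fmt.Println(runAndCancel(3)) // 3 cleaned up
}
```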
package main
import "fmt"
type Work struct {
// ...
Name string
IsSuccess chan bool
}
// This Check could be quite time-consuming
func (w *Work) Check() {
// return succeed or not
//...
if len(w.Name) > 0 {
w.IsSuccess <- true
} else {
w.IsSuccess <- false
}
}
func main() {
works := make([]*Work, 3)
works[0] = &Work{
Name: "",
IsSuccess: make(chan bool),
}
works[1] = &Work{
Name: "111",
IsSuccess: make(chan bool),
}
works[2] = &Work{
Name: "",
IsSuccess: make(chan bool),
}
for _, w := range works {
go w.Check()
}
for i, w := range works {
select {
case checkResult := <-w.IsSuccess:
fmt.Printf("index %d checkresult %t \n", i, checkResult)
}
}
}
I'm new to Golang and am trying to build a system that fetches content from a set of URLs and extracts specific lines with a regex. The problems start when I wrap the code in goroutines: I get a different number of regex results each time, and many of the fetched lines are duplicates.
max_routines := 3
sem := make(chan int, max_routines) // to control the number of working routines
var wg sync.WaitGroup
ch_content := make(chan string)
client := http.Client{}
for i:=2; ; i++ {
// for testing
if i>5 {
break
}
// loop should be broken if feebbacks_checstr is found in content
if loop_break {
break
}
wg.Add(1)
go func(i int) {
defer wg.Done()
sem <- 1 // will block if > max_routines
final_url = url+a.tm_id+"/page="+strconv.Itoa(i)
resp, _ := client.Get(final_url)
var bodyString string
if resp.StatusCode == http.StatusOK {
bodyBytes, _ := ioutil.ReadAll(resp.Body)
bodyString = string(bodyBytes)
}
// checking for stop word in content
if false == strings.Contains(bodyString, feebbacks_checstr) {
res2 = regex.FindAllStringSubmatch(bodyString,-1)
for _,v := range res2 {
ch_content <- v[1]
}
} else {
loop_break = true
}
resp.Body.Close()
<-sem
}(i)
}
for {
select {
case r := <-ch_content:
a.feedbacks = append(a.feedbacks, r) // collecting the data
case <-time.After(500 * time.Millisecond):
show(len(a.feedbacks)) // < always different result, many entries in a.feedbacks are duplicates
fmt.Printf(".")
}
}
As a result, len(a.feedbacks) is sometimes 130, sometimes 139, and a.feedbacks contains duplicates. If I remove the duplicates, the number of results is about half of what I'm expecting (109 without duplicates).
You're creating a closure by using an anonymous goroutine function. I notice your final_url is assigned with = rather than declared with :=, which means it's defined outside the closure. All goroutines share the same final_url variable, so there's a race condition: some goroutines overwrite final_url before other goroutines make their requests, and this results in duplicates. The same applies to res2 (also assigned with =) and loop_break, which are likewise shared between goroutines without synchronization.
If you declare final_url (and res2) inside the goroutine with :=, the goroutines won't step on each other's toes and it should work as you expect.
That's the simple fix for what you have. A more idiomatic Go approach would be to create an input channel (containing the URLs to request) and an output channel (eventually containing whatever you're pulling out of the responses). Instead of trying to manage the life and death of dozens of goroutines, you would keep a fixed number of goroutines alive that drain the input channel.
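The input/output channel design described above might look roughly like this. It is a sketch under assumptions: `fetch` is a hypothetical placeholder for the HTTP request plus regex extraction (it fabricates lines from the URL so the example runs offline), and the URL format in `main` is made up.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// fetch is a hypothetical stand-in for "GET the page and extract lines";
// it just derives two lines from the URL so the sketch runs offline.
func fetch(url string) []string {
	return []string{url + "#a", url + "#b"}
}

// crawl keeps a fixed number of workers alive. Each worker drains the
// input channel and sends extracted lines to one output channel. All
// per-request variables are local to a worker, so only channels are shared.
func crawl(urls []string, workers int) []string {
	in := make(chan string)
	out := make(chan string)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for url := range in { // ends when in is closed and drained
				for _, line := range fetch(url) {
					out <- line
				}
			}
		}()
	}

	// Feed the input, then close it so the workers' range loops end.
	go func() {
		for _, u := range urls {
			in <- u
		}
		close(in)
	}()

	// Close out once every worker has finished, so the loop below ends.
	go func() {
		wg.Wait()
		close(out)
	}()

	var results []string
	for line := range out {
		results = append(results, line)
	}
	sort.Strings(results) // arrival order is not deterministic
	return results
}

func main() {
	var urls []string
	for i := 2; i <= 5; i++ {
		urls = append(urls, fmt.Sprintf("http://example.com/page=%d", i))
	}
	res := crawl(urls, 3)
	fmt.Println(len(res), "lines")
	fmt.Println(strings.Join(res, "\n"))
}
```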
I created a tool in Go that brute-forces subdomains using Go concurrency. The problem is that it only shows the first few results: if I specify 10 threads, it shows 10; if I specify 100, it shows 100. Any solutions for this? I am following this example.
func CheckWildcardSubdomain(state *State, domain string, words <-chan string, wg *sync.WaitGroup) {
defer wg.Done()
for {
preparedSubdomain := <-words + "." + domain
ipAddress, err := net.LookupHost(preparedSubdomain)
if err == nil {
if !state.WildcardIPs.ContainsAny(ipAddress) {
if state.Verbose == true {
fmt.Printf("\n%s", preparedSubdomain)
}
state.FinalResults = append(state.FinalResults, preparedSubdomain)
}
}
}
}
func RemoveWildcardSubdomains(state *State, subdomains []string) []string {
var wg sync.WaitGroup
var channel = make(chan string)
wg.Add(state.Threads)
for i := 0; i < state.Threads; i++ {
go CheckWildcardSubdomain(state, state.Domain, channel, &wg)
}
for _, entry := range subdomains {
sub := strings.Join(strings.Split(entry, ".")[:2][:], ".")
channel <- sub
}
close(channel)
wg.Wait()
return state.FinalResults
}
Thanks in advance for your help.
Two mistakes immediately stand out.
First, in CheckWildcardSubdomain() you should range over the words channel like this:
for word := range words {
preparedSubdomain := word + "." + domain
// ...
}
A for ... range over a channel terminates once the channel has been closed and all values sent on it before closing have been received. Note that the simple receive operator neither stops the loop nor panics if the channel is closed; instead it yields the zero value of the channel's element type. So your original loop would never terminate, wg.Done() would never be called, and wg.Wait() would block forever. Spec: Receive operator:
A receive operation on a closed channel can always proceed immediately, yielding the element type's zero value after any previously sent values have been received.
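To make the spec rule concrete, here is a small sketch showing both the zero value returned by a plain receive on a closed channel and the two-value receive `v, ok := <-ch`, which lets a hand-written loop detect closing (`for range` does this for you). The function names are illustrative.

```go
package main

import "fmt"

// drain receives with the two-value form v, ok := <-ch. ok is false only
// once ch is closed and empty; a plain <-ch would instead keep returning
// "" (the zero value) forever.
func drain(ch <-chan string) []string {
	var got []string
	for {
		w, ok := <-ch
		if !ok {
			return got // closed and fully drained
		}
		got = append(got, w)
	}
}

// zeroAfterClose shows what a plain receive yields on a closed channel.
func zeroAfterClose() string {
	ch := make(chan string)
	close(ch)
	return <-ch // does not block or panic; returns ""
}

func main() {
	ch := make(chan string, 3)
	ch <- "www"
	ch <- "mail"
	close(ch)
	fmt.Println(drain(ch))               // [www mail]
	fmt.Printf("%q\n", zeroAfterClose()) // ""
}
```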
Second, inside CheckWildcardSubdomain() the state.FinalResults field is read and modified concurrently without synchronization. This is a data race, and its behavior is undefined.
You must synchronize access to this field, e.g. using a mutex, or you should find other ways to communicate and collect results, e.g. using a channel.
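Putting both fixes together (ranging over the channel and guarding the shared slice with a mutex), a corrected version might look like the sketch below. It is a simplification under assumptions: `resolves` is a hypothetical placeholder for `net.LookupHost` plus the wildcard-IP check, and the `State` struct is replaced by plain parameters.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// resolves is a hypothetical stand-in for a successful net.LookupHost;
// here a name "resolves" unless it starts with "x", so no DNS is needed.
func resolves(host string) bool {
	return !strings.HasPrefix(host, "x")
}

// bruteforce starts `threads` workers that range over the words channel
// and append found subdomains to a shared slice guarded by a mutex.
func bruteforce(domain string, words []string, threads int) []string {
	var (
		mu    sync.Mutex
		found []string
		wg    sync.WaitGroup
	)
	ch := make(chan string)

	wg.Add(threads)
	for i := 0; i < threads; i++ {
		go func() {
			defer wg.Done()
			for word := range ch { // terminates after close(ch)
				host := word + "." + domain
				if resolves(host) {
					mu.Lock()
					found = append(found, host) // guarded write
					mu.Unlock()
				}
			}
		}()
	}

	for _, w := range words {
		ch <- w
	}
	close(ch)
	wg.Wait() // all workers returned: no more writes to found
	sort.Strings(found)
	return found
}

func main() {
	words := []string{"www", "mail", "xdev", "api"}
	fmt.Println(bruteforce("example.com", words, 2))
}
```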
See this related question for an elegant, efficient and scalable way to do it:
Is this an idiomatic worker thread pool in Go?
I'm trying to understand the difference in Go between creating an anonymous function which takes a parameter, versus having that function act as a closure. Here is an example of the difference.
With parameter:
func main() {
done := make(chan bool, 1)
go func(c chan bool) {
time.Sleep(50 * time.Millisecond)
c <- true
}(done)
<-done
}
As closure:
func main() {
done := make(chan bool, 1)
go func() {
time.Sleep(50 * time.Millisecond)
done <- true
}()
<-done
}
My question is, when is the first form better than the second? Would you ever use a parameter for this kind of thing? The only time I can see the first form being useful is when returning a func(x, y) from another function.
The difference between using a closure vs using a function parameter has to do with sharing the same variable vs getting a copy of the value. Consider these two examples below.
In the closure example, all goroutines use the same variable i. By the time any of them gets to print it, i has most likely already reached 3. (Since Go 1.22, each loop iteration has its own i, so with a recent toolchain and go.mod version the closure form prints 0, 1 and 2 as well.)
In the parameter example, each goroutine receives a copy of the value of i at the moment the go statement runs, which gives the result we most likely wanted (the order of the lines is still not guaranteed):
Closure:
for i := 0; i < 3; i++ {
go func() {
fmt.Println(i)
}()
}
Result:
3
3
3
Parameter:
for i := 0; i < 3; i++ {
go func(v int) {
fmt.Println(v)
}(i)
}
Result:
0
1
2
Playground: http://play.golang.org/p/T5rHrIKrQv
When to use parameters
The first form is definitely preferred if the variable will change later and you don't want the function to observe that change.
This is the typical case when the anonymous function is inside a for loop and you intend to use the loop's variables, for example:
for i := 0; i < 10; i++ {
go func(i int) {
fmt.Println(i)
}(i)
}
Without passing i, you might see 10 printed ten times (on Go versions before 1.22). With i passed as an argument, you will see the numbers 0 to 9 printed, though not necessarily in that order.
When not to use parameters
If the variable won't change, it is cheaper not to pass it and thus not create another copy of it; this matters most for large structs. However, if you later modify the code and change the variable, you may easily forget to check its effect on the closure and get unexpected results.
Also there might be cases when you do want to observe changes made to "outer" variables, such as:
func GetRes(name string) (Res, error) {
res, err := somepack.OpenRes(name)
if err != nil {
return nil, err
}
closeres := true
defer func() {
if closeres {
res.Close()
}
}()
// Do other stuff
if err = otherStuff(); err != nil {
return nil, err // res will be closed
}
// Everything went well, return res, but
// res must not be closed, it will be the responsibility of the caller
closeres = false
return res, nil // res will not be closed
}
In this case GetRes() opens some resource, but before returning it, other things have to be done which might also fail. If those fail, res must be closed and not returned. If everything goes well, res must be returned and not closed.
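A runnable version of the same idea, with a hypothetical `Res` type and `otherStuff` function standing in for the real resource and follow-up work, shows that the deferred closure observes the final value of `closeres` (a value passed as a parameter to the deferred function would have been fixed at the defer statement):

```go
package main

import (
	"errors"
	"fmt"
)

// Res is a hypothetical resource that records whether it was closed.
type Res struct{ closed bool }

func (r *Res) Close() { r.closed = true }

// opened records every resource getRes creates, only so the demo can
// inspect them afterwards.
var opened []*Res

// otherStuff is a hypothetical follow-up step that may fail.
func otherStuff(fail bool) error {
	if fail {
		return errors.New("other stuff failed")
	}
	return nil
}

func getRes(fail bool) (*Res, error) {
	res := &Res{}
	opened = append(opened, res)
	closeres := true
	defer func() {
		if closeres { // read when getRes returns, not when defer ran
			res.Close()
		}
	}()
	if err := otherStuff(fail); err != nil {
		return nil, err // closeres is still true: res will be closed
	}
	closeres = false // success: the caller now owns res
	return res, nil
}

func main() {
	r, err := getRes(false)
	fmt.Println(err, r.closed) // <nil> false
	_, err = getRes(true)
	fmt.Println(err, opened[1].closed) // other stuff failed true
}
```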
Here is an example of passing a parameter, taken from the net package's Listener example (net.Listen):
package main
import (
"io"
"log"
"net"
)
func main() {
// Listen on TCP port 2000 on all available unicast and
// anycast IP addresses of the local system.
l, err := net.Listen("tcp", ":2000")
if err != nil {
log.Fatal(err)
}
defer l.Close()
for {
// Wait for a connection.
conn, err := l.Accept()
if err != nil {
log.Fatal(err)
}
// Handle the connection in a new goroutine.
// The loop then returns to accepting, so that
// multiple connections may be served concurrently.
go func(c net.Conn) {
// Echo all incoming data.
io.Copy(c, c)
// Shut down the connection.
c.Close()
}(conn)
}
}
I am using goroutines and channels to check whether a list of URLs is reachable. Here is my code. It always seems to return true. Why is the timeout case not executed? The goal is to return false if even one of the URLs is not reachable.
package main
import "fmt"
import "time"
func check(u string) bool {
time.Sleep(4 * time.Second)
return true
}
func IsReachable(urls []string) bool {
ch := make(chan bool, 1)
for _, url := range urls {
go func(u string) {
select {
case ch <- check(u):
case <-time.After(time.Second):
ch<-false
}
}(url)
}
return <-ch
}
func main() {
fmt.Println(IsReachable([]string{"url1"}))
}
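check(u) will sleep in the current goroutine, i.e. the one running the anonymous func. A select statement evaluates all of its send values and channel operands, in source order, before it starts waiting, so check(u) runs to completion first, and only then is time.After(time.Second) called, starting a fresh one-second timer. At that point the send on ch can proceed immediately (ch has a free buffer slot), so the first case is chosen and the timeout never fires.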
You can solve it by running check inside yet another goroutine:
package main
import "fmt"
import "time"
func check(u string, checked chan<- bool) {
time.Sleep(4 * time.Second)
checked <- true
}
func IsReachable(urls []string) bool {
ch := make(chan bool, 1)
for _, url := range urls {
go func(u string) {
checked := make(chan bool, 1) // buffered so check can still send (and exit) after a timeout
go check(u, checked)
select {
case ret := <-checked:
ch <- ret
case <-time.After(1 * time.Second):
ch <- false
}
}(url)
}
return <-ch
}
func main() {
fmt.Println(IsReachable([]string{"url1"}))
}
It seems you want to check reachability of a set of URLs, and return true if one of them is available. If the timeout is long compared to the time it takes to spin up a goroutine, you could simplify this by having just one timeout for all URLs together. But we need to make sure that the channel is large enough to hold the answers from all checks, or the ones that don't "win" will block forever:
package main
import "fmt"
import "time"
func check(u string, ch chan<- bool) {
time.Sleep(4 * time.Second)
ch <- true
}
func IsReachable(urls []string) bool {
ch := make(chan bool, len(urls)+1) // one slot per check, plus one for the timeout
for _, url := range urls {
go check(url, ch)
}
time.AfterFunc(time.Second, func() { ch <- false })
return <-ch
}
func main() {
fmt.Println(IsReachable([]string{"url1", "url2"}))
}
The reason this always returns true is that you are calling check(u) within your select statement. You need to call it within a goroutine and then use a select to either wait for the result or time out.
In case you want to check the reachability of multiple URLs in parallel you need to restructure your code.
First create a function which checks the reachability of one URL:
func IsReachable(url string) bool {
ch := make(chan bool, 1)
go func() { ch <- check(url) }()
select {
case reachable := <-ch:
return reachable
case <-time.After(time.Second):
// call timed out
return false
}
}
Then call this function from a loop:
urls := []string{"url1", "url2", "url3"}
for _, url := range urls {
go func(url string) { fmt.Println(IsReachable(url)) }(url)
}
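The question's stated goal was to return false if any URL is unreachable. Combining a per-URL timeout like the one above with a loop that waits for every result is one way to do that. A sketch under assumptions: `check` is a hypothetical placeholder that treats a URL named "slow" as hanging, and the 50 ms timeout is arbitrary.

```go
package main

import (
	"fmt"
	"time"
)

const defaultTimeout = 50 * time.Millisecond

// check is a hypothetical stand-in for a real reachability test; a URL
// named "slow" takes longer than the timeout.
func check(u string) bool {
	if u == "slow" {
		time.Sleep(200 * time.Millisecond)
	}
	return true
}

// isReachable runs check in its own goroutine and gives up after timeout.
// The buffer of 1 lets that goroutine finish its send even after a
// timeout, so it does not leak.
func isReachable(u string, timeout time.Duration) bool {
	ch := make(chan bool, 1)
	go func() { ch <- check(u) }()
	select {
	case ok := <-ch:
		return ok
	case <-time.After(timeout):
		return false
	}
}

// allReachable checks every URL concurrently and returns false if any of
// them is unreachable or times out.
func allReachable(urls []string, timeout time.Duration) bool {
	results := make(chan bool, len(urls)) // one slot per URL: senders never block
	for _, u := range urls {
		go func(u string) { results <- isReachable(u, timeout) }(u)
	}
	all := true
	for range urls { // wait for exactly one result per URL
		if !<-results {
			all = false
		}
	}
	return all
}

func main() {
	fmt.Println(allReachable([]string{"url1", "url2"}, defaultTimeout)) // true
	fmt.Println(allReachable([]string{"url1", "slow"}, defaultTimeout)) // false
}
```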
change the line
ch := make(chan bool, 1)
to
ch := make(chan bool)
You opened an asynchronous (non-blocking) channel, but you need a blocking channel to get it to work.
The result true is deterministic in this scenario, not one picked at random by the runtime: the only value that ever becomes ready is the true being sent into the channel (however long it takes to become available). The false result is never delivered, because time.After() is only called after check(u) has returned, and its timer has not fired by the time the send on ch is ready.
When entering this select, the first thing evaluated is the check(u) call, before any channel operation. Only after check(u) has returned are the remaining case expressions evaluated and the cases checked. By then, true is ready to be sent on the first case's channel and nothing blocks that send, so the select completes with that case immediately, without waiting on the other cases.
So it is the use of select here that is not quite correct in this scenario.
The select cases are meant to wait on channel sends and receives directly, optionally with a default case to avoid blocking when necessary.
So the fix, as others have already pointed out, is to put the long-running task into a separate goroutine and have it send its result into a channel,
and then, in the main goroutine (or whichever goroutine needs that value), use select cases to listen either on that channel for a value or on the channel returned by time.After(time.Second).
Basically, case ch <- check(u) is a valid send, but it does not do what was intended (letting the timeout race against the check): all the time spent in check(u) elapses before the channel is involved, and once check(u) returns, the send is not blocked at all (ch has a free buffer slot, and main is already waiting in return <-ch). That is why the time.After() case in the second branch never wins: its timer is only created after check(u) has returned.
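The evaluation order can be observed directly. In this sketch, `check` and `after` are hypothetical helpers that log when the select evaluates them; the durations are arbitrary, with the check deliberately slower than the timeout.

```go
package main

import (
	"fmt"
	"time"
)

// pickCase reproduces the question's select with a slow check and a short
// timeout, logging when each case expression is evaluated. Send values and
// channel operands are evaluated once, in source order, on entering the
// select, so the log is always [check after] and the send case wins.
func pickCase() (string, []string) {
	var log []string
	check := func() bool {
		time.Sleep(100 * time.Millisecond) // longer than the timeout below
		log = append(log, "check")
		return true
	}
	after := func(d time.Duration) <-chan time.Time {
		log = append(log, "after")
		return time.After(d) // the timer only starts now, after check returned
	}

	ch := make(chan bool, 1) // buffered: the send is ready immediately
	select {
	case ch <- check():
		return "send", log
	case <-after(20 * time.Millisecond):
		return "timeout", log
	}
}

func main() {
	which, log := pickCase()
	fmt.Println(which, log) // send [check after]
}
```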
See this example for a simple solution, i.e. the correct use of select in conjunction with separate goroutines: https://gobyexample.com/timeouts
In case it's useful, here's a generalised version of @Thomas's answer, much simplified by @mh-cbon:
func WithTimeout(delegate func() interface{}, timeout time.Duration) (ret interface{}, ok bool) {
ch := make(chan interface{}, 1) // buffered
go func() { ch <- delegate() }()
select {
case ret = <-ch:
return ret, true
case <-time.After(timeout):
}
return nil, false
}
Then you can use it to time out any function of type func() interface{}:
if value, ok := WithTimeout(myFunc, time.Second); ok {
// returned
} else {
// didn't return
}
Call it like this to wait for a channel:
if value, ok := WithTimeout(func() interface{} { return <-inbox }, time.Second); ok {
// returned
} else {
// didn't return
}
Like this to try sending:
_, ok = WithTimeout(func() interface{} { outbox <- myValue; return nil }, time.Second)
if !ok {...
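Since Go 1.18, the same helper can be written with a type parameter so that callers get a typed result instead of interface{}. A sketch; `fast`, `slow` and `shortTimeout` are illustrative names:

```go
package main

import (
	"fmt"
	"time"
)

// WithTimeout is a generic variant (Go 1.18+) of the helper above: the
// result keeps its static type instead of interface{}. If f never
// returns, its goroutine leaks; the buffer of 1 only ensures that a late
// result does not block that goroutine forever.
func WithTimeout[T any](f func() T, timeout time.Duration) (T, bool) {
	ch := make(chan T, 1)
	go func() { ch <- f() }()
	select {
	case v := <-ch:
		return v, true
	case <-time.After(timeout):
		var zero T
		return zero, false
	}
}

const shortTimeout = 50 * time.Millisecond

func fast() int { return 42 }

func slow() int {
	time.Sleep(200 * time.Millisecond)
	return 7
}

func main() {
	if v, ok := WithTimeout(fast, shortTimeout); ok {
		fmt.Println("returned", v) // returned 42
	}
	if _, ok := WithTimeout(slow, shortTimeout); !ok {
		fmt.Println("didn't return")
	}
}
```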