I have a goroutine with a switch that is supposed to append an interface value to a struct. When I run it I don't get an error, but nothing is appended to the response.
How do I write this in Go so that it is concurrency-safe?
This is my code:
var wg sync.WaitGroup
for _, v := range inputParameters.Entities {
    go func(v domain.Entity) {
        wg.Add(1)
        defer wg.Done()
        var f func(
            v domain.Entity,
            result *domain.Response,
        ) (interface{}, error) // signature of all Get methods
        switch v.Name {
        case "process1":
            f = Processor1{}.Get
        case "process2":
            f = Processor2{}.Get
        case "process3":
            f = Processor3{}.Get
        default:
            return
        }
        res, err := f(v, result)
        if err != nil {
            mapError.Error = append(mapError.Error, err)
        } else {
            result.Mu.Lock()
            defer result.Mu.Unlock()
            result.Entities = append(result.Entities, res)
        }
    }(v)
}
wg.Wait()
return result, mapError
For reference, here is the Response type:
type Response struct {
    Mu       sync.Mutex
    Entities []interface{}
}
Do the wg.Add(1) just before starting the goroutine. There is no guarantee that any of the logic inside a goroutine runs before you reach wg.Wait(), so if the wg.Add(1) calls live inside the goroutines, wg.Wait() can see a zero counter and return early. Don't put the wg.Add(1)s in the goroutines.
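A minimal sketch of the corrected shape, reusing the types from the question:

var wg sync.WaitGroup
for _, v := range inputParameters.Entities {
    wg.Add(1) // register before starting the goroutine, so Wait cannot miss it
    go func(v domain.Entity) {
        defer wg.Done()
        // ... select the processor and append to the result, as above ...
    }(v)
}
wg.Wait() // now guaranteed to wait for every goroutine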
I have a function that works before using a goroutine:
res, err := example(a, b)
if err != nil {
    return Response{
        ErrCode: 1,
        ErrMsg:  "error",
    }
}
Response is a struct that holds error info. When I use a goroutine:
var wg sync.WaitGroup
wg.Add(1)
go func() {
    defer wg.Done()
    res, err := example(a, b)
    if err != nil {
        return Response{
            ErrCode: 1,
            ErrMsg:  "error",
        }
    }
}()
wg.Wait()
Then I got this compile error:
too many arguments to return
have (Response)
want ()
You need to use a channel to achieve what you want:
func main() {
    c := make(chan Response)
    go func() {
        res, err := example(a, b)
        if err != nil {
            c <- Response{
                ErrCode: 1,
                ErrMsg:  "error",
            }
            return
        }
        c <- res // assuming example returns a Response; send the success value too so the receive below never blocks
    }()
    value := <-c
    fmt.Println(value)
}
The function you pass when spawning a goroutine has no return value in its signature. Goroutines cannot return data: running a function asynchronously and fetching its return value are essentially contradictory actions, because the goroutine has no way of knowing where to return the data. Hence the compiler does not allow it.
You can do something like this:
var response Response
var wg sync.WaitGroup
wg.Add(1)
go func() {
    defer wg.Done()
    _, err := example(a, b)
    if err != nil {
        response = Response{
            ErrCode: 1,
            ErrMsg:  "error",
        }
    }
}()
wg.Wait()
return response
I'm trying to build a concurrent crawler based on the Go Tour and some other SO answers on the topic. What I have currently is below, but I think I have two subtle issues here.
Sometimes I get 16 URLs in the response and sometimes 17 (see the debug print in main). I know this because when I swap WriteToSlice for Read, the line 'Read: end, counter = ' in Read is sometimes never reached, and it is always when I get 16 URLs.
I have trouble with the err channel: I get no messages on it, even when I run my main Crawl method with an address like www.golang.org (no valid scheme), in which case an error should be sent via the err channel.
Concurrency is a really difficult topic; help and advice will be appreciated.
package main

import (
    "fmt"
    "net/http"
    "sync"

    "golang.org/x/net/html"
)

type urlCache struct {
    urls map[string]struct{}
    sync.Mutex
}

func (v *urlCache) Set(url string) bool {
    v.Lock()
    defer v.Unlock()
    _, exist := v.urls[url]
    v.urls[url] = struct{}{}
    return !exist
}

func newURLCache() *urlCache {
    return &urlCache{
        urls: make(map[string]struct{}),
    }
}

type results struct {
    data chan string
    err  chan error
}

func newResults() *results {
    return &results{
        data: make(chan string, 1),
        err:  make(chan error, 1),
    }
}

func (r *results) close() {
    close(r.data)
    close(r.err)
}

func (r *results) WriteToSlice(s *[]string) {
    for {
        select {
        case data := <-r.data:
            *s = append(*s, data)
        case err := <-r.err:
            fmt.Println("e ", err)
        }
    }
}

func (r *results) Read() {
    fmt.Println("Read: start")
    counter := 0
    for c := range r.data {
        fmt.Println(c)
        counter++
    }
    fmt.Println("Read: end, counter = ", counter)
}

func crawl(url string, depth int, wg *sync.WaitGroup, cache *urlCache, res *results) {
    defer wg.Done()
    if depth == 0 || !cache.Set(url) {
        return
    }
    response, err := http.Get(url)
    if err != nil {
        res.err <- err
        return
    }
    defer response.Body.Close()
    node, err := html.Parse(response.Body)
    if err != nil {
        res.err <- err
        return
    }
    urls := grablUrls(response, node)
    res.data <- url
    for _, url := range urls {
        wg.Add(1)
        go crawl(url, depth-1, wg, cache, res)
    }
}

func grablUrls(resp *http.Response, node *html.Node) []string {
    var f func(*html.Node) []string
    var results []string
    f = func(n *html.Node) []string {
        if n.Type == html.ElementNode && n.Data == "a" {
            for _, a := range n.Attr {
                if a.Key != "href" {
                    continue
                }
                link, err := resp.Request.URL.Parse(a.Val)
                if err != nil {
                    continue
                }
                results = append(results, link.String())
            }
        }
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            f(c)
        }
        return results
    }
    res := f(node)
    return res
}

// Crawl ...
func Crawl(url string, depth int) []string {
    wg := &sync.WaitGroup{}
    output := &[]string{}
    visited := newURLCache()
    results := newResults()
    defer results.close()
    wg.Add(1)
    go crawl(url, depth, wg, visited, results)
    go results.WriteToSlice(output)
    // go results.Read()
    wg.Wait()
    return *output
}

func main() {
    r := Crawl("https://www.golang.org", 2)
    // r := Crawl("www.golang.org", 2) // no scheme, so an error should be generated and sent via err
    fmt.Println(len(r))
}
Both your questions 1 and 2 are a result of the same bug.
In Crawl() you are not waiting for this goroutine to finish: go results.WriteToSlice(output). When the last crawl() call finishes, the wait group is released, and the output is returned and printed before the WriteToSlice function is done with the data and err channels. So what happens is this:
1. crawl() finishes, placing data in results.data and results.err.
2. wg.Wait() unblocks, causing main() to print the length of the result []string.
3. WriteToSlice reads the last data (or err) item from the channel, too late to be included.
You need to return from Crawl() not only when the data is done being written to the channel, but also when the channel has been read in its entirety (including the buffer). A good way to do this is to close channels when you are sure you are done with them. By organizing the code this way, you can block on the goroutine that drains the channels: instead of using the wait group to release main, you wait until the channels are 100% done.
You can see this on Go by Example: https://gobyexample.com/closing-channels. Remember that when you close a channel, it can still be read from until the last item is taken. So you can close a buffered channel, and the reader will still get all the items that were queued in the channel.
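As a standalone illustration of that point (separate from the crawler code):

package main

import "fmt"

func main() {
    ch := make(chan string, 2)
    ch <- "a"
    ch <- "b"
    close(ch) // no further sends are allowed, but the buffered items survive

    // range keeps receiving until the buffer is drained, then exits
    for v := range ch {
        fmt.Println(v) // prints "a" then "b"
    }
}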
Some of the code structure could change to make this cleaner, but here is a quick way to fix your program: make Crawl block on WriteToSlice, close the data channel when the crawl finishes, and wait for WriteToSlice to return.
// Crawl ...
func Crawl(url string, depth int) []string {
    wg := &sync.WaitGroup{}
    output := &[]string{}
    visited := newURLCache()
    results := newResults()

    go func() {
        wg.Add(1)
        go crawl(url, depth, wg, visited, results)
        wg.Wait()
        // All data is written; this makes WriteToSlice() unblock
        close(results.data)
    }()

    // This will block until results.data is closed
    results.WriteToSlice(output)
    close(results.err)
    return *output
}
Then in WriteToSlice, you have to check for the closed channel to exit the for loop:
func (r *results) WriteToSlice(s *[]string) {
    for {
        select {
        case data, open := <-r.data:
            if !open {
                return // all data done
            }
            *s = append(*s, data)
        case err := <-r.err:
            fmt.Println("e ", err)
        }
    }
}
Here is the full code: https://play.golang.org/p/GBpGk-lzrhd (it won't work in the playground)
I want to implement the body of the process function as a function separate from main.
I want to count the number of URLs for which resp, err := client.Do(req) did not return an error.
I want to print the number of successes just before fmt.Println("Finish!") in the main function.
What should I do?
func main() {
    site_list := [][]string{
        {"site1", "https://www.aaaa"},
        {"site2", "https://www.bbbb"},
        {"site3", "https://www.cccc"},
        {"site4", "https://www.dddd"},
    }
    var wg sync.WaitGroup
    maxChan := make(chan bool, 2)
    for _, v := range site_list {
        maxChan <- true
        wg.Add(1)
        go process(v[1], maxChan, &wg)
    }
    wg.Wait()
    fmt.Println("Finish!")
}

func process(site_url string, maxChan chan bool, wg *sync.WaitGroup) {
    defer wg.Done()
    defer func(maxChan chan bool) {
        <-maxChan
    }(maxChan)
    req, err := http.NewRequest("GET", site_url, nil)
    if err != nil {
        fmt.Println("http.NewRequest error: " + site_url)
        return
    }
    client := new(http.Client)
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println("client.Do error: " + site_url)
        return
    }
    defer resp.Body.Close()
}
As per my comment above, create a custom work item struct:
type urlItem struct {
    Name string
    Url  string
    Err  error
}
Use pointers, so that the process() function can update the underlying struct:
site_list := []*urlItem{
    &urlItem{"site1", "https://www.aaaa", nil},
    &urlItem{"site2", "https://www.bbbb", nil},
    &urlItem{"site3", "https://www.cccc", nil},
    &urlItem{"site4", "https://www.dddd", nil},
}
process() can then safely update its particular work item (without the need for any mutexes or other synchronization), as each goroutine gets a unique work item:
func process(site *urlItem, maxChan chan bool, wg *sync.WaitGroup) { /* ... */ }
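A minimal sketch of what the updated process() could look like; it reuses the body from the question, and the site.Err assignments are the only additions:

func process(site *urlItem, maxChan chan bool, wg *sync.WaitGroup) {
    defer wg.Done()
    defer func() { <-maxChan }()

    req, err := http.NewRequest("GET", site.Url, nil)
    if err != nil {
        site.Err = err // record the failure on this goroutine's own work item
        return
    }
    client := new(http.Client)
    resp, err := client.Do(req)
    if err != nil {
        site.Err = err
        return
    }
    defer resp.Body.Close()
}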
Then tally the goroutine errors at the end:
var failCount int
for _, v := range site_list {
    if v.Err != nil {
        failCount++
    }
}
Playground Example
I have the following function, which spins off a given number of goroutines:
func (r *Runner) Execute() {
    var wg sync.WaitGroup
    wg.Add(len(r.pipelines))
    for _, p := range r.pipelines {
        go executePipeline(p, &wg)
    }
    wg.Wait()
    errs := .... // contains the list of errors reported by any/all goroutines
}
I was thinking there might be some way with channels, but I can't seem to figure it out.
One way to do this is using a mutex, if you can make executePipeline return errors:
var mu sync.Mutex
var errs []error
// ...
for _, p := range r.pipelines {
    go func(p pipelineType) {
        if err := executePipeline(p, &wg); err != nil {
            mu.Lock()
            errs = append(errs, err)
            mu.Unlock()
        }
    }(p)
}
To use a channel, you can have a separate goroutine listening for errors:
errCh := make(chan error)
done := make(chan struct{})
go func() {
    for e := range errCh {
        errs = append(errs, e)
    }
    close(done) // signal that every error has been collected
}()
and in the Execute function, make the following changes:
// ...
wg.Add(len(r.pipelines))
for _, p := range r.pipelines {
    go func(p pipelineType) {
        if err := executePipeline(p, &wg); err != nil {
            errCh <- err
        }
    }(p)
}
wg.Wait()
close(errCh)
<-done // wait for the collector goroutine to finish appending before using errs
You can always use @zerkms's method listed above if the number of goroutines is not high.
Instead of returning the error from executePipeline and using an anonymous function wrapper, you can always make the above changes inside the function itself.
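For illustration, a rough sketch of that variant; pipelineType and doWork are placeholders, since the original function body isn't shown:

func executePipeline(p pipelineType, wg *sync.WaitGroup, errCh chan<- error) {
    defer wg.Done()
    if err := doWork(p); err != nil { // doWork stands in for the real pipeline logic
        errCh <- err // report the failure on the channel instead of returning it
    }
}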
You can use channels as @Kaveh Shahbazian suggested:
func (r *Runner) Execute() {
    pipelineChan := makePipeline(r.pipelines)
    for cnt := 0; cnt < len(r.pipelines); cnt++ {
        // receive from the channel
        p := <-pipelineChan
        _ = p // do something with the result
    }
}

func makePipeline(pipelines []pipelineType) <-chan pipelineType {
    pipelineChan := make(chan pipelineType)
    go func() {
        for _, p := range pipelines {
            go func(p pipelineType) {
                pipelineChan <- executePipeline(p)
            }(p)
        }
    }()
    return pipelineChan
}
Please see this example: https://gist.github.com/steven-ferrer/9b2eeac3eed3f7667e8976f399d0b8ad
This has been the bane of my existence.
type ec2Params struct {
    sess   *session.Session
    region string
}

type cloudwatchParams struct {
    cl     cloudwatch.CloudWatch
    id     string
    metric string
    region string
}

type request struct {
    ec2Params
    cloudwatchParams
}
// Control concurrency and sync
var maxRoutines = 128
var sem chan bool
var req chan request

func main() {
    sem := make(chan bool, maxRoutines)
    for i := 0; i < maxRoutines; i++ {
        sem <- true
    }
    req := make(chan request)
    go func() { // This is the producer
        for _, arn := range arns {
            arnCreds := startSession(arn)
            for _, region := range regions {
                sess, err := session.NewSession(
                    &aws.Config{****})
                if err != nil {
                    failOnError(err, "Can't assume role")
                }
                req <- request{ec2Params: ec2Params{ **** }}
            }
        }
    }()
    for f := range req {
        <-sem
        if (ec2Params{}) != f.ec2Params {
            go getEC2Metrics(****)
        } else {
            // I should be exercising this line of code too,
            // but I'm not :(
            go getMetricFromCloudwatch(****)
        }
        sem <- true
    }
}
getEC2Metrics and getMetricFromCloudwatch are the goroutines to execute:
func getMetricFromCloudwatch(cl cloudwatch.CloudWatch, id, metric, region string) {
    // Magic
}

func getEC2Metrics(sess *session.Session, region string) {
    ec := ec2.New(sess)
    var ids []string
    l, err := ec.DescribeInstances(&ec2.DescribeInstancesInput{})
    if err != nil {
        fmt.Println(err.Error())
    } else {
        for _, rsv := range l.Reservations {
            for _, inst := range rsv.Instances {
                ids = append(ids, *inst.InstanceId)
            }
        }
        metrics := cfg.AWSMetric.Metric
        if len(ids) > 0 {
            cl := cloudwatch.New(sess)
            for _, id := range ids {
                for _, metric := range metrics {
                    // From what I can tell, execution gets stuck here
                    req <- request{cloudwatchParams: *****}
                }
            }
        }
    }
}
Both the anonymous producer in main and getEC2Metrics should publish data to req asynchronously, but so far it seems like whatever getEC2Metrics publishes to the channel is never processed.
It looks like something is stopping me from publishing from within a goroutine, but I haven't found anything. I would love to know how to go about this and produce the intended behavior (that is, an actually working semaphore).
The base of the implementation can be found here: https://burke.libbey.me/conserving-file-descriptors-in-go/
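For context, that article's pattern is a buffered channel used as a counting semaphore. A minimal, self-contained sketch of the idea (the task loop and sleep are illustrative, not from the original code):

package main

import (
    "fmt"
    "time"
)

func main() {
    const maxRoutines = 2
    sem := make(chan bool, maxRoutines) // capacity bounds concurrency
    for i := 0; i < maxRoutines; i++ {
        sem <- true // fill the pool with tokens
    }

    done := make(chan bool)
    for i := 0; i < 5; i++ {
        <-sem // take a token; blocks while maxRoutines workers are busy
        go func(n int) {
            defer func() { sem <- true }()    // return the token when finished
            time.Sleep(10 * time.Millisecond) // stand-in for real work
            fmt.Println("task", n, "done")
            done <- true
        }(i)
    }
    for i := 0; i < 5; i++ {
        <-done
    }
}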
I'm frantic; JimB's comment made the wheels spin and now I've solved this!
// Control concurrency and sync
var maxRoutines = 128
var sem chan bool
var req chan request // Not reachable inside getEC2Metrics

func getEC2Metrics(sess *session.Session, region string, req chan<- request) {
    ....
    ....
    for _, id := range ids {
        for _, metric := range metrics {
            req <- request{ **** } // When using the global req, this would block
        }
    }
    ....
    ....
}

func main() {
    sem := make(chan bool, maxRoutines)
    for i := 0; i < maxRoutines; i++ {
        sem <- true
    }
    req := make(chan request)
    go func() {
        // Producing tasks
    }()
    for f := range req {
        <-sem // checking out tokens here, outside the goroutine, does block as intended
        go func() {
            defer func() { sem <- true }()
            if (ec2Params{}) != f.ec2Params {
                getEC2Metrics(****, req) // passing the channel in as a parameter makes it possible to publish to it
            } else {
                getMetricFromCloudwatch(****)
            }
        }()
    }
}
There were two issues:
The semaphore was not locking (I think because I was checking tokens out and back in inside a goroutine, so there was probably a race condition).
The global channel req was not being addressed properly by getEC2Metrics, so all the goroutines were left stuck trying to publish to a channel that looked like it was in scope, but wasn't. I really didn't know why at first; in hindsight, req := make(chan request) in main declares a new local variable with :=, shadowing the package-level req, so getEC2Metrics was sending on the still-nil global channel, and a send on a nil channel blocks forever.
I honestly just got lucky with the second item, but in the end I'm glad it's working.
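A minimal, hypothetical reproduction of that shadowing behavior (unrelated to the AWS code):

package main

import "fmt"

var ch chan int // package-level channel; nil until initialized

func send() {
    select {
    case ch <- 42: // a send on a nil channel is never ready
    default:
        fmt.Println("package-level ch is nil; a bare send would block forever")
    }
}

func main() {
    ch := make(chan int, 1) // ':=' declares a NEW local ch, shadowing the package-level one
    send()                  // send() still sees the nil package-level ch
    ch <- 1
    fmt.Println(<-ch) // only the local channel works
}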