I am fairly new to Go and its concurrency principles. My use case involves performing multiple HTTP requests (for a single entity) on a batch of entities. If any HTTP request fails for an entity, I need to stop all parallel HTTP requests for it. I also have to keep a count of the entities that failed with errors. I am trying to use an errgroup inside each entity's goroutine, so that if any HTTP request fails for a single entity the errgroup terminates and returns the error to its parent goroutine. But I am not sure how to maintain the count of errors.
func main(entity[] string) {
errorC := make(chan string) // channel to insert failed entity
var wg sync.WaitGroup
for _, link := range entity {
wg.Add(1)
// Spawn errorgroup here. errorgroup_spawn
}
go func() {
wg.Wait()
close(errorC)
}()
for msg := range errorC {
// here storing error entityIds somewhere.
}
}
and errorgroup like this
func errorgroup_spawn(ctx context.Context, errorC chan string, wg *sync.WaitGroup) error { // and other params
defer wg.Done()
goRoutineCollection, ctxx := errgroup.WithContext(ctx)
results := make(chan *result)
goRoutineCollection.Go(func() error {
// http calls for single entity, using ctxx so they stop when a sibling call fails
// if error occurs, push it in errorC, and return Error.
return nil
})
go func() {
goRoutineCollection.Wait()
close(results)
}()
return goRoutineCollection.Wait()
}
PS: I was also thinking of nesting errgroups, but I can't see how to maintain the error counts while the other errgroups are running.
Can anyone tell me whether this is a correct approach for handling such real-world scenarios?
One way to keep track of errors is to use a status struct to keep track of which error came from where:
type Status struct {
Entity string
Err error
}
...
errorC := make(chan Status)
// Spawn error groups with the name of the entity, and when an error happens, push Status{Entity:entityName,Err:err} to the channel
You can then read all errors from the error channel and figure out what failed why.
Another option is not to use errorgroups at all. This makes things more explicit, but whether it is better or not is debatable:
// Keep entity statuses
statuses:=make([]Status,len(entity))
for i, link := range entity {
statuses[i].Entity=link
wg.Add(1)
go func(i int) {
defer wg.Done()
ctx, cancel:=context.WithCancel(context.Background())
defer cancel()
// Error collector
status:=make(chan error)
defer close(status)
go func() {
for st:=range status {
if st!=nil {
cancel() // Stop all calls
// store first error
if statuses[i].Err==nil {
statuses[i].Err=st
}
}
}
}()
innerWg:=sync.WaitGroup{}
innerWg.Add(1)
go func() {
defer innerWg.Done()
status<- makeHttpCall(ctx)
}()
innerWg.Add(1)
go func() {
defer innerWg.Done()
status<- makeHttpCall(ctx)
}()
...
innerWg.Wait()
}(i)
}
When everything is done, statuses will contain all entities and corresponding statuses.
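To get the count the question asks for, you only need to walk the collected statuses. Here is a minimal sketch using the standard library only. The `call` type and the fake calls in `demo` are hypothetical stand-ins for the real HTTP requests, and `sync.Once` is used here to record the first error in place of the collector goroutine above:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Status records the first error seen for one entity (nil means success).
type Status struct {
	Entity string
	Err    error
}

// call is a hypothetical stand-in for one HTTP request made for an entity.
type call func(ctx context.Context, entity string) error

// runEntity runs all calls for one entity in parallel and cancels the
// remaining calls as soon as one of them fails.
func runEntity(parent context.Context, entity string, calls []call) Status {
	ctx, cancel := context.WithCancel(parent)
	defer cancel()

	var (
		wg    sync.WaitGroup
		once  sync.Once
		first error
	)
	for _, c := range calls {
		wg.Add(1)
		go func(c call) {
			defer wg.Done()
			if err := c(ctx, entity); err != nil {
				once.Do(func() {
					first = err
					cancel() // stop the sibling calls for this entity
				})
			}
		}(c)
	}
	wg.Wait()
	return Status{Entity: entity, Err: first}
}

// runAll processes every entity concurrently and returns one Status each.
func runAll(ctx context.Context, entities []string, calls []call) []Status {
	statuses := make([]Status, len(entities))
	var wg sync.WaitGroup
	for i, e := range entities {
		wg.Add(1)
		go func(i int, e string) {
			defer wg.Done()
			statuses[i] = runEntity(ctx, e, calls) // each goroutine writes only index i
		}(i, e)
	}
	wg.Wait()
	return statuses
}

// countFailed returns how many entities ended with an error.
func countFailed(statuses []Status) int {
	n := 0
	for _, s := range statuses {
		if s.Err != nil {
			n++
		}
	}
	return n
}

// demo runs three entities through two fake calls; only entity "b" fails.
func demo() int {
	ok := func(ctx context.Context, entity string) error { return nil }
	failB := func(ctx context.Context, entity string) error {
		if entity == "b" {
			return errors.New("request failed")
		}
		return nil
	}
	return countFailed(runAll(context.Background(), []string{"a", "b", "c"}, []call{ok, failB}))
}

func main() {
	fmt.Println("failed entities:", demo())
}
```

Real HTTP calls would need to honor `ctx` (for example via `http.NewRequestWithContext`) for the cancellation to have any effect.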
I've been working with Go for some time but have never done SSE before. I'm having an issue; can someone please provide a working example of server-sent events that only sends to a specific user (connection)?
I'm using gorilla sessions to authenticate and I would like to use the UserID to separate connections.
Or should I use 5 second polling via Ajax?
Many thanks
Here is what I found and tried:
https://gist.github.com/ismasan/3fb75381cd2deb6bfa9c It doesn't send to an individual user, and the goroutine won't stop if the connection is closed.
https://github.com/striversity/gotr/blob/master/010-server-sent-event-part-2/main.go This is kind of what I need, but it doesn't keep track once the connection is removed. So once you close the browser and reopen it in a private window, it doesn't work at all. Also, as above, the goroutine keeps going.
Create a "broker" to distribute messages to connected users:
type Broker struct {
// users is a map where the key is the user id
// and the value is a slice of channels to connections
// for that user id
users map[string][]chan []byte
// actions is a channel of functions to call
// in the broker's goroutine. The broker executes
// everything in that single goroutine to avoid
// data races.
actions chan func()
}
// run executes in a goroutine. It simply gets and
// calls functions.
func (b *Broker) run() {
for a := range b.actions {
a()
}
}
func newBroker() *Broker {
b := &Broker{
users: make(map[string][]chan []byte),
actions: make(chan func()),
}
go b.run()
return b
}
// addUserChan adds a channel for user with given id.
func (b *Broker) addUserChan(id string, ch chan []byte) {
b.actions <- func() {
b.users[id] = append(b.users[id], ch)
}
}
// removeUserChan removes a channel for a user with the given id.
func (b *Broker) removeUserChan(id string, ch chan []byte) {
// The broker may be trying to send to
// ch, but nothing is receiving. Pump ch
// to prevent broker from getting stuck.
go func() { for range ch {} }()
b.actions <- func() {
chs := b.users[id]
i := 0
for _, c := range chs {
if c != ch {
chs[i] = c
i = i + 1
}
}
if i == 0 {
delete(b.users, id)
} else {
b.users[id] = chs[:i]
}
// Close channel to break loop at beginning
// of removeUserChan.
// This must be done in broker goroutine
// to ensure that broker does not send to
// closed channel.
close(ch)
}
}
// sendToUser sends a message to all channels for the given user id.
func (b *Broker) sendToUser(id string, data []byte) {
b.actions <- func() {
for _, ch := range b.users[id] {
ch <- data
}
}
}
Declare a variable with the broker at package-level:
var broker = newBroker()
Write the SSE endpoint using the broker:
func sseEndpoint(w http.ResponseWriter, r *http.Request) {
// I assume that user id is in query string for this example,
// You should use your authentication code to get the id.
id := r.FormValue("id")
// Do the usual SSE setup.
flusher := w.(http.Flusher)
w.Header().Set("Content-Type", "text/event-stream")
w.Header().Set("Cache-Control", "no-cache")
w.Header().Set("Connection", "keep-alive")
// Create channel to receive messages for this connection.
// Register that channel with the broker.
// On return from the function, remove the channel
// from the broker.
ch := make(chan []byte)
broker.addUserChan(id, ch)
defer broker.removeUserChan(id, ch)
for {
select {
case <-r.Context().Done():
// User closed the connection. We are out of here.
return
case m := <-ch:
// We got a message. Do the usual SSE stuff.
fmt.Fprintf(w, "data: %s\n\n", m)
flusher.Flush()
}
}
}
Add code to your application to call Broker.sendToUser.
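As a rough illustration of that last step, the sketch below uses a cut-down copy of the broker so that it runs on its own. `hub`, `add`, and `demo` are names invented for this example, and the channels are buffered only to keep the demo short. It registers two users and sends a message to one of them:

```go
package main

import "fmt"

// hub is a cut-down, hypothetical copy of the broker above: one goroutine
// owns the map, and every operation is a function sent over actions.
type hub struct {
	users   map[string][]chan []byte
	actions chan func()
}

func newHub() *hub {
	h := &hub{
		users:   make(map[string][]chan []byte),
		actions: make(chan func()),
	}
	go func() {
		for a := range h.actions {
			a()
		}
	}()
	return h
}

// add registers a connection channel for the given user id.
func (h *hub) add(id string, ch chan []byte) {
	h.actions <- func() { h.users[id] = append(h.users[id], ch) }
}

// sendToUser delivers data to every channel registered for id.
func (h *hub) sendToUser(id string, data []byte) {
	h.actions <- func() {
		for _, ch := range h.users[id] {
			ch <- data
		}
	}
}

// demo registers two users and sends a message to only one of them.
func demo() (aliceGot string, bobGotAnything bool) {
	h := newHub()
	alice := make(chan []byte, 1) // buffered so the hub never blocks in this demo
	bob := make(chan []byte, 1)
	h.add("alice", alice)
	h.add("bob", bob)

	// Application code (for example another HTTP handler) would call this.
	h.sendToUser("alice", []byte("hello alice"))
	aliceGot = string(<-alice)

	// Nothing was sent to bob, so his channel stays empty.
	select {
	case <-bob:
		bobGotAnything = true
	default:
	}
	return aliceGot, bobGotAnything
}

func main() {
	msg, leaked := demo()
	fmt.Println(msg, leaked)
}
```

In the real application the call would simply be `broker.sendToUser(userID, payload)` from wherever the event originates.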
I constantly receive JSON data from a websocket and process it in a goroutine. I have no idea whether this pattern is encouraged or not.
ws.onmessage { //infinite receive message from websocket
go func() { // works fine using this goroutine
defer processJson(message)
}()
go processJson(message) // errors out and the program terminates
}
func processJson(msg string) {
//code for process json
insertDatabase(processedMsg)
}
func insertDatabase(processedMsg string) {
//code insert to database
}
The first goroutine below works fine most of the time, but occasionally (about once a week) the program reports a data race and terminates.
go func() {
defer processJson(message)
}()
The second goroutine often hits an error after running for a few minutes; the error is usually "fatal error: unexpected signal during runtime execution".
go processJson(message)
From my understanding both goroutines do the same thing, so why does the first run well while the second does not? I have tried using a channel, but it makes little difference compared to the first goroutine.
msgChan := make(chan string, 1000)
go JsonProcessor(msgChan)
for { //receive json from websocket, send to channel
msgChan <- message
}
func JsonProcessor(msg chan string) {
for { //get data from channel, process in goroutine function
msgModified := <-msg
insertDatabase(msgModified)
}
}
Is there a recommended way to achieve this without a data race? Suggestions are welcome.
Thanks in advance.
Try using a sync.Mutex to avoid the data race:
mutex := sync.Mutex{}
ws.onmessage {
processJson(message)
}
func processJson(msg string) {
mutex.Lock()
// .........
mutex.Unlock()
}
If the processing can be split into parts that do not race with each other, a multi-goroutine version looks like this:
msgChan1 := make(chan string, 1000)
msgChan2 := make(chan string, 1000)
go func() {
for m := range msgChan1 {
// ...
}
}()
go func() {
for m := range msgChan2 {
// ...
}
}()
ws.onmessage {
msgChan1 <- message
msgChan2 <- message
}
ws.onclose {
close(msgChan1)
close(msgChan2)
}
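A related option is to let a single worker goroutine own the shared state, so that no lock is needed at all. The following is a minimal sketch of that idea. The `store` type is a hypothetical stand-in for the database, and `strings.TrimSpace` is only a placeholder for the real JSON processing:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// store is a hypothetical stand-in for the database; only the worker touches it.
type store struct{ rows []string }

func (s *store) insert(row string) { s.rows = append(s.rows, row) }

// startWorker launches one goroutine that drains msgs in order. Because it is
// the only goroutine using db, no mutex is required.
func startWorker(msgs <-chan string, db *store, wg *sync.WaitGroup) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		for m := range msgs {
			db.insert(strings.TrimSpace(m)) // placeholder for the real JSON processing
		}
	}()
}

// demo plays the role of the websocket loop: it feeds three messages to the
// worker, closes the channel, and waits for the worker to finish.
func demo() []string {
	msgs := make(chan string, 1000)
	db := &store{}
	var wg sync.WaitGroup
	startWorker(msgs, db, &wg)

	for _, m := range []string{" a ", "b", " c"} {
		msgs <- m
	}
	close(msgs) // what ws.onclose would do
	wg.Wait()   // after this, reading db.rows is safe
	return db.rows
}

func main() {
	fmt.Println(demo())
}
```

Running the program with `go run -race` is a quick way to check whether a given variant still contains a data race.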
I've been searching a lot but could not find an answer for my problem yet.
I need to make multiple concurrent calls to an external API, each with different parameters.
Then, for each call, I need to initialize a struct for that dataset and process the data I receive from the API call. Bear in mind that I read each line of the incoming stream and immediately send it to the channel.
The first problem I encountered, which was not obvious at the beginning because of the large quantity of data I'm receiving, is that each goroutine does not receive all the data that goes through the channel (which I learned from my research). So what I need is a way of requeuing/redirecting that data to the correct goroutine.
The function that sends the streamed response from a single dataset:
(I've cut the parts of the code that are irrelevant here.)
func (api *API) RequestData(ctx context.Context, c chan DWeatherResponse, dataset string, wg *sync.WaitGroup) error {
for {
line, err := reader.ReadBytes('\n')
s := string(line)
if err != nil {
log.Printf("End of %s", dataset)
return err
}
data, err := extractDataFromStreamLine(s, dataset)
if err != nil {
continue
}
c <- *data
}
}
The function that will process the incoming data
func (s *StrikeStruct) Process(ch, requeue chan dweather.DWeatherResponse) {
for {
data, more := <-ch
if !more {
break
}
// data contains {dataset string, value float64, date time.Time}
// The s.Parameter needs to match the dataset
// IMPORTANT PART: checks if the received data is part of this struct's dataset.
// If not, I want to send it to another goroutine until it gets to the correct one.
// There will be a max of 4 datasets, but this may still not be the best approach.
if !api.GetDataset(s.Parameter, data.Dataset) {
requeue <- data
continue
}
// Do stuff with the data from this point
}
}
Now on my own API endpoint I have the following:
ch := make(chan dweather.DWeatherResponse, 2)
requeue := make(chan dweather.DWeatherResponse)
final := make(chan strike.StrikePerYearResponse)
var wg sync.WaitGroup
for _, s := range args.Parameters.Strikes {
strike := strike.StrikePerYear{
Parameter: strike.Parameter(s.Dataset),
StrikeValue: s.Value,
}
// I receive and process the data in here
go strike.ProcessStrikePerYear(ch, requeue, final, string(s.Dataset))
}
go func() {
for {
data, _ := <-requeue
ch <- data
}
}()
// Creates a goroutine for each dataset
for _, dataset := range api.Params.Dataset {
wg.Add(1)
go api.RequestData(ctx, ch, dataset, &wg)
}
wg.Wait()
close(ch)
//Once the data is all processed it is all appended
var strikes []strike.StrikePerYearResponse
for range args.Fetch.Datasets {
strikes = append(strikes, <-final)
}
return strikes
The issue with this code is that as soon as I start receiving data from more than one endpoint the requeue will block and nothing more happens. If I remove that requeue logic data will be lost if it does not land on the correct goroutine.
My two questions are:
Why is the requeue blocking if it has a goroutine always ready to receive?
Should I take a different approach on how I'm processing the incoming data?
This is not a good way to solve your problem; I suggest changing the design. An implementation like the one below is one option:
package main
import (
"fmt"
"sync"
)
// answer for https://stackoverflow.com/questions/68454226/how-to-handle-multiple-goroutines-that-share-the-same-channel
var (
finalResult = make(chan string)
)
// IData is used by the message dispatcher; every data struct must implement its methods
type IData interface {
IsThisForMe() bool
Process(*sync.WaitGroup)
}
//MainData can be your main struct like StrikePerYear
type MainData struct {
// add any props
Id int
Name string
}
type DataTyp1 struct {
MainData *MainData
}
func (d DataTyp1) IsThisForMe() bool {
// check here whether the incoming data belongs to this type
if d.MainData.Id == 2 {
return true
}
return false
}
func (d DataTyp1) Process(wg *sync.WaitGroup) {
d.MainData.Name = "processed by DataTyp1"
// send result to final channel, you can change it as you want
finalResult <- d.MainData.Name
wg.Done()
}
type DataTyp2 struct {
MainData *MainData
}
func (d DataTyp2) IsThisForMe() bool {
// check here whether the incoming data belongs to this type
if d.MainData.Id == 3 {
return true
}
return false
}
func (d DataTyp2) Process(wg *sync.WaitGroup) {
d.MainData.Name = "processed by DataTyp2"
// send result to final channel, you can change it as you want
finalResult <- d.MainData.Name
wg.Done()
}
// dispatcher runs a new goroutine for each request.
// You can implement a worker pool to prevent running too many goroutines.
func dispatcher(incomingData *MainData, wg *sync.WaitGroup) {
// based on your requirements you can remove this goroutine or keep it
go func() {
var p IData
p = DataTyp1{incomingData}
if p.IsThisForMe() {
go p.Process(wg)
return
}
p = DataTyp2{incomingData}
if p.IsThisForMe() {
go p.Process(wg)
return
}
}()
}
func main() {
dummyDataArray := []MainData{
MainData{Id: 2, Name: "this data #2"},
MainData{Id: 3, Name: "this data #3"},
}
wg := sync.WaitGroup{}
for i := range dummyDataArray {
wg.Add(1)
dispatcher(&dummyDataArray[i], &wg)
}
result := make([]string, 0)
done := make(chan struct{})
// data collector
go func() {
loop:
for {
select {
case <-done:
break loop
case r := <-finalResult:
result = append(result, r)
}
}
}()
wg.Wait()
done<- struct{}{}
for _, s := range result {
fmt.Println(s)
}
}
Note: this is only meant to point you toward a better solution; it is certainly not production-ready code.
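On the question of why the requeue blocks: a likely explanation is that the processors block sending to `requeue` while the forwarding goroutine blocks sending to the already full `ch`, so each side waits on the other. One way to avoid requeuing altogether is to give every dataset its own channel and route each record by its key. The following is a minimal sketch of that idea. The `reading` type, `route`, and `sumByDataset` are hypothetical names, and summing values is only a placeholder for the real processing:

```go
package main

import (
	"fmt"
	"sync"
)

// reading is a hypothetical, simplified version of the streamed record.
type reading struct {
	Dataset string
	Value   float64
}

// route forwards each reading to the channel registered for its dataset and
// closes all outputs once in is closed. Unknown datasets are dropped.
func route(in <-chan reading, outs map[string]chan reading) {
	for r := range in {
		if out, ok := outs[r.Dataset]; ok {
			out <- r
		}
	}
	for _, out := range outs {
		close(out)
	}
}

// sumByDataset starts one consumer per dataset and returns the sum of the
// values each consumer received.
func sumByDataset(readings []reading, datasets []string) map[string]float64 {
	in := make(chan reading)
	outs := make(map[string]chan reading, len(datasets))
	for _, d := range datasets {
		outs[d] = make(chan reading, 8)
	}
	go route(in, outs)

	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		sums = make(map[string]float64)
	)
	for d, ch := range outs {
		wg.Add(1)
		go func(d string, ch <-chan reading) {
			defer wg.Done()
			total := 0.0
			for r := range ch { // only readings for dataset d arrive here
				total += r.Value
			}
			mu.Lock()
			sums[d] = total
			mu.Unlock()
		}(d, ch)
	}

	for _, r := range readings {
		in <- r
	}
	close(in)
	wg.Wait()
	return sums
}

// demo mixes readings from two datasets on one input stream.
func demo() map[string]float64 {
	data := []reading{{"temp", 1}, {"rain", 5}, {"temp", 2}}
	return sumByDataset(data, []string{"temp", "rain"})
}

func main() {
	fmt.Println(demo())
}
```

Because every record goes straight to the consumer that wants it, no goroutine ever has to hand data back, and the cycle that can cause the deadlock disappears.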
I have written an API that makes DB calls and does some business logic. I am invoking a goroutine that must perform some operation in the background.
Since the API call should not wait for this background task to finish, I am returning 200 OK immediately after calling the goroutine (let us assume the background task will never give any error.)
I read that goroutine will be terminated once the goroutine has completed its task.
Is this fire and forget way safe from a goroutine leak?
Are goroutines terminated and cleaned up once they perform the job?
func DefaultHandler(w http.ResponseWriter, r *http.Request) {
// Some DB calls
// Some business logics
go func() {
// some Task taking 5 sec
}()
w.WriteHeader(http.StatusOK)
}
I would recommend always keeping your goroutines under control to avoid memory and system exhaustion.
If you receive a spike of requests and start spawning goroutines without any limit, the system will probably go down sooner or later.
In cases where you need to return an immediate 200 OK, the best approach is to create a message queue, so the server only needs to put a job in the queue, return OK, and forget about it. The rest is handled asynchronously by a consumer.
Producer (HTTP server) >>> Queue >>> Consumer
Normally, the queue is an external resource (RabbitMQ, AWS SQS...) but for teaching purposes, you can achieve the same effect using a channel as a message queue.
In the example you'll see how we create a channel that lets two goroutines communicate.
Then we start the worker process that will read from the channel and later the server with a handler that will write to the channel.
Try to play with the buffer size and job time while sending curl requests.
package main
import (
"fmt"
"log"
"net/http"
"time"
)
/*
$ go run .
curl "http://localhost:8080?user_id=1"
curl "http://localhost:8080?user_id=2"
curl "http://localhost:8080?user_id=3"
curl "http://localhost:8080?user_id=....."
*/
func main() {
queueSize := 10
// This is our queue, a channel to communicate processes. Queue size is the number of items that can be stored in the channel
myJobQueue := make(chan string, queueSize) // Search for 'buffered channels'
// Starts a worker that will read continuously from our queue
go myBackgroundWorker(myJobQueue)
// We start our server with a handler that is receiving the queue to write to it
if err := http.ListenAndServe("localhost:8080", myAsyncHandler(myJobQueue)); err != nil {
panic(err)
}
}
func myAsyncHandler(myJobQueue chan<- string) http.HandlerFunc {
return func(rw http.ResponseWriter, r *http.Request) {
// We check that in the query string we have a 'user_id' query param
if userID := r.URL.Query().Get("user_id"); userID != "" {
select {
case myJobQueue <- userID: // We try to put the item into the queue ...
rw.WriteHeader(http.StatusOK)
rw.Write([]byte(fmt.Sprintf("queuing user process: %s", userID)))
default: // If we cannot write to the queue, it's because it is full!
rw.WriteHeader(http.StatusInternalServerError)
rw.Write([]byte(`our internal queue is full, try it later`))
}
return
}
rw.WriteHeader(http.StatusBadRequest)
rw.Write([]byte(`missing 'user_id' in query params`))
}
}
func myBackgroundWorker(myJobQueue <-chan string) {
const (
jobDuration = 10 * time.Second // simulation of a heavy background process
)
// We continuously read from our queue and process the jobs one by one.
// In this loop we could spawn more goroutines in a controlled way to parallelize work and increase the read throughput, but I don't want to overcomplicate the example.
for userID := range myJobQueue {
// rate limiter here ...
// go func(u string){
log.Printf("processing user: %s, started", userID)
time.Sleep(jobDuration)
log.Printf("processing user: %s, finished", userID)
// }(userID)
}
}
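The "controlled way" mentioned in the worker's comment can be as simple as starting a fixed number of workers on the same queue. Here is a minimal sketch, assuming a pool of 3 workers and a 5 ms simulated job. `startWorkers` and `demo` are names made up for this example:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// startWorkers launches n goroutines that read from the same queue, so at
// most n jobs run at once. The returned WaitGroup is done once the queue has
// been closed and every queued job has been handled.
func startWorkers(n int, queue <-chan string, handle func(string)) *sync.WaitGroup {
	wg := &sync.WaitGroup{}
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range queue {
				handle(job)
			}
		}()
	}
	return wg
}

// demo pushes 20 jobs through 3 workers and reports how many jobs were
// handled and the highest number that were in flight at the same time.
func demo() (handled, maxInFlight int) {
	var (
		mu       sync.Mutex
		inFlight int
	)
	handle := func(userID string) {
		mu.Lock()
		inFlight++
		if inFlight > maxInFlight {
			maxInFlight = inFlight
		}
		mu.Unlock()

		time.Sleep(5 * time.Millisecond) // simulated work

		mu.Lock()
		inFlight--
		handled++
		mu.Unlock()
	}

	queue := make(chan string, 10)
	wg := startWorkers(3, queue, handle)
	for i := 0; i < 20; i++ {
		queue <- fmt.Sprintf("user-%d", i)
	}
	close(queue) // lets the workers' range loops end
	wg.Wait()
	return handled, maxInFlight
}

func main() {
	handled, maxInFlight := demo()
	fmt.Println("handled:", handled, "max in flight:", maxInFlight)
}
```

The pool size caps concurrency no matter how many requests arrive, which is the property the unbounded `go func()` per request lacks.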
There is no "goroutine cleaning" you have to handle, you just launch goroutines and they'll be cleaned when the function launched as a goroutine returns. Quoting from Spec: Go statements:
When the function terminates, its goroutine also terminates. If the function has any return values, they are discarded when the function completes.
So what you do is fine. Note however that your launched goroutine cannot use or assume anything about the request (r) and response writer (w), you may only use them before you return from the handler.
Also note that you don't have to write http.StatusOK, if you return from the handler without writing anything, that's assumed to be a success and HTTP 200 OK will be sent back automatically.
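To illustrate the point about not touching `r` and `w` after the handler returns, here is a small sketch that copies what the background task needs before starting the goroutine. The `results` channel and the `user_id` query parameter are assumptions made only so the example can be run and observed:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// handler returns a fire-and-forget handler. results is a hypothetical
// channel that lets the demo observe what the background task did.
func handler(results chan<- string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Copy what the task needs while the request is still valid.
		userID := r.URL.Query().Get("user_id")

		go func() {
			// Only the copied value is used here, never r or w.
			results <- "processed " + userID
		}()

		// Returning without writing anything sends 200 OK.
	}
}

// demo calls the handler once with a recorded response and waits for the
// background task to report back.
func demo() (int, string) {
	results := make(chan string, 1)
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/?user_id=42", nil)
	handler(results).ServeHTTP(rec, req)
	return rec.Code, <-results
}

func main() {
	code, msg := demo()
	fmt.Println(code, msg)
}
```

The same applies to the request's context: it is cancelled when the handler returns, so a background task that needs a context should be given its own.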
See related / possible duplicate: Webhook process run on another goroutine
@icza is absolutely right: there is no "goroutine cleaning". You can use a webhook or a background job library such as gocraft. The only way I can think of to use your solution is with the sync package, for learning purposes.
func DefaultHandler(w http.ResponseWriter, r *http.Request) {
// Some DB calls
// Some business logics
var wg sync.WaitGroup
wg.Add(1)
go func() {
defer wg.Done()
// some Task taking 5 sec
}()
w.WriteHeader(http.StatusOK)
wg.Wait()
}
You can wait for a goroutine to finish using a sync.WaitGroup:
// BusyTask
func BusyTask(t interface{}) error {
var wg = &sync.WaitGroup{}
wg.Add(1)
go func() {
// busy doing stuff
time.Sleep(5 * time.Second)
wg.Done()
}()
wg.Wait() // wait for goroutine
return nil
}
// this will wait 5 seconds until the goroutine finishes
func main() {
fmt.Println("hello")
BusyTask("some task...")
fmt.Println("done")
}
Another way is to pass a context.Context to the task and time it out.
//
func BusyTaskContext(ctx context.Context, t string) error {
done := make(chan struct{}, 1)
//
go func() {
// sleep for 5 seconds
time.Sleep(5 * time.Second)
// do tasks and signal done
done <- struct{}{}
close(done)
}()
//
select {
case <-ctx.Done():
return errors.New("timeout")
case <-done:
return nil
}
}
//
func main() {
fmt.Println("hello")
ctx, cancel := context.WithTimeout(context.TODO(), 2*time.Second)
defer cancel()
if err := BusyTaskContext(ctx, "some task..."); err != nil {
fmt.Println(err)
return
}
fmt.Println("done")
}
Many languages have their own high-level non-blocking HTTP client, for example Python's aiohttp. That is, they send out HTTP requests without waiting for the response, and when the response arrives they invoke some kind of callback.
My questions are
is there a Go package for that?
or we just create a goroutine in which we use normal HTTP clients?
which way is better?
Other languages have such features because, when they block waiting for a response, they block the thread they are running on. This is the case for Java, Python or NodeJS. To be useful, they therefore had to implement long-running blocking operations with callbacks. The root cause is the use of the underlying C library, which blocks threads on input/output operations.
Go does not use the C library (except in some cases, and that can be turned off) and makes system calls by itself. When a goroutine blocks, the runtime parks it and the thread executes another goroutine. You can therefore have an enormous number of blocked goroutines without running out of threads. Goroutines are cheap with regard to memory, whereas threads are operating system entities.
In Go, using goroutines is better. There is no need for an asynchronous client, for the reasons above.
For comparison, in Java you would quickly end up with many threads. The next step would be pooling them, as they are costly. Pooling means limiting the concurrency.
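As a rough illustration of how cheap blocked goroutines are, the sketch below starts ten thousand goroutines that all wait at the same time. `time.Sleep` is used as a stand-in for waiting on a network response, and the count and duration are arbitrary:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// waitAll starts n goroutines that each block for a short while and returns
// how many of them ran to completion.
func waitAll(n int) int64 {
	var (
		wg   sync.WaitGroup
		done int64
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			time.Sleep(50 * time.Millisecond) // stands in for waiting on a response
			atomic.AddInt64(&done, 1)
		}()
	}
	wg.Wait()
	return done
}

// demo runs the experiment with ten thousand goroutines.
func demo() int64 {
	return waitAll(10000)
}

func main() {
	start := time.Now()
	fmt.Println("finished goroutines:", demo(), "in", time.Since(start))
}
```

All goroutines are parked while they sleep, so the total run time stays close to a single sleep rather than growing with the number of goroutines.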
As others have stated, goroutines are the way to go (pun intended).
Minimal Example:
type nonBlocking struct {
Response *http.Response
Error error
}
const numRequests = 2
func main() {
nb := make(chan nonBlocking, numRequests)
wg := &sync.WaitGroup{}
for i := 0; i < numRequests; i++ {
wg.Add(1)
go Request(nb)
}
go HandleResponse(nb, wg)
wg.Wait()
}
func Request(nb chan nonBlocking) {
resp, err := http.Get("http://example.com")
nb <- nonBlocking{
Response: resp,
Error: err,
}
}
func HandleResponse(nb chan nonBlocking, wg *sync.WaitGroup) {
for get := range nb {
if get.Error != nil {
log.Println(get.Error)
} else {
log.Println(get.Response.Status)
get.Response.Body.Close() // release the connection
}
wg.Done()
}
}
Yip, built into the standard library, just not usable by a simple function call out of the box.
Take this example
package main
import (
"flag"
"log"
"net/http"
"sync"
"time"
)
var url string
var timeout time.Duration
func init() {
flag.StringVar(&url, "url", "http://www.stackoverflow.com", "url to GET")
flag.DurationVar(&timeout, "timeout", 5*time.Second, "timeout for the GET operation")
}
func main() {
flag.Parse()
// We use the channel as our means to
// hand the response over. It is buffered so the sender
// can still exit if the receiver has already timed out.
rc := make(chan *http.Response, 1)
// We need a waitgroup because all goroutines exit when main exits
var wg sync.WaitGroup
// We are spinning up an async request
// Increment the counter for our WaitGroup.
// What we are basically doing here is to tell the WaitGroup
// "Hey, there is one more task you have to wait for!"
wg.Add(1)
go func() {
// Notify the WaitGroup that one task is done as soon
// as we exit the goroutine.
defer wg.Done()
log.Printf("Doing GET request on \"%s\"", url)
resp, err := http.Get(url)
if err != nil {
log.Printf("GET for %s: %s", url, err)
}
// We send the response downstream
rc <- resp
// Now, the goroutine exits, the deferred call to wg.Done()
// is executed.
}()
// And here we do our async processing.
// Note that you could have done the processing in the first goroutine
// as well, since http.Get would be a blocking operation and any subsequent
// code in the goroutine would have been executed only after the Get returned.
// However, I put the processing into its own goroutine for demonstration purposes.
wg.Add(1)
go func() {
// As above
defer wg.Done()
log.Println("Doing something else")
// Setting up a timer for a timeout.
// Note that this could be done using a request with a context, as well.
to := time.NewTimer(timeout).C
select {
case <-to:
log.Println("Timeout reached")
// Exiting the goroutine, the deferred call to wg.Done is executed
return
case r := <-rc:
if r == nil {
log.Printf("Got no useful response from GETting \"%s\"", url)
// Exiting the goroutine, the deferred call to wg.Done is executed
return
}
log.Printf("Got response with status code %d (%s)", r.StatusCode, r.Status)
log.Printf("Now I can do something useful with the response")
}
}()
// Now we have set up all of our tasks,
// we are waiting until all of them are done...
wg.Wait()
log.Println("All tasks done, exiting")
}
If you look at this closely, we have all the building blocks to make GETting a URL and processing the response async. We can start to abstract this a bit:
package main
import (
"flag"
"log"
"net/http"
"time"
)
var url string
var timeout time.Duration
func init() {
flag.StringVar(&url, "url", "http://www.stackoverflow.com", "url to GET")
flag.DurationVar(&timeout, "timeout", 5*time.Second, "timeout for the GET operation")
}
type callbackFunc func(*http.Response, error) error
func getWithCallBack(u string, callback callbackFunc) chan error {
// We create a channel which we can use to notify the caller of the
// result of the callback.
c := make(chan error)
go func() {
c <- callback(http.Get(u))
}()
return c
}
func main() {
flag.Parse()
c := getWithCallBack(url, func(resp *http.Response, err error) error {
if err != nil {
// Doing something useful with the err.
// Add additional cases as needed.
switch err {
case http.ErrNotSupported:
log.Printf("GET not supported for \"%s\"", url)
}
return err
}
log.Printf("GETting \"%s\": Got response with status code %d (%s)", url, resp.StatusCode, resp.Status)
return nil
})
if err := <-c; err != nil {
log.Printf("Error GETting \"%s\": %s", url, err)
}
log.Println("All tasks done, exiting")
}
And there you Go (pun intended): Async processing of GET requests.
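The first example notes that the timeout could also be handled with a request context. Here is a minimal sketch of that variant, using a local `httptest` server so that it runs without network access. `getStatus`, the `/fast` and `/slow` paths, and the durations are assumptions made for this illustration:

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// getStatus performs a GET that is abandoned after timeout and returns the
// response status code.
func getStatus(url string, timeout time.Duration) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err // a timeout surfaces here as a context deadline error
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// demo starts a local test server with a fast and a slow path and calls both.
func demo() (fastCode int, slowErr error) {
	release := make(chan struct{})
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/slow" {
			select {
			case <-release: // the demo is shutting down
			case <-r.Context().Done(): // the client gave up
			}
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()
	defer close(release)

	fastCode, _ = getStatus(srv.URL+"/fast", 2*time.Second)
	_, slowErr = getStatus(srv.URL+"/slow", 50*time.Millisecond)
	return fastCode, slowErr
}

func main() {
	code, err := demo()
	fmt.Println("fast:", code, "slow:", err)
}
```

Compared with a separate timer, the context also cancels the in-flight request itself, so the goroutine doing the GET does not linger after the timeout.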