I am trying to construct a receiver and sender pattern using two channels in Golang. I am doing a task (an API call) and receiving back a Response struct. My goal is that when a response is received I'd like to send it to another channel (writeChan) for additional processing.
I'd like to continuously read/listen on that receiver channel (respChan) and process anything that comes through (such as a Response). Then I'd like to spin up a goroutine to do a further operation with that Response.
I'd like to understand how I can chain together this pattern to allow data to flow from the API calls and concurrently write it (each Response will be written to a separate file destination, which the Write() func handles).
Essentially my current pattern is the following:
package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var wg sync.WaitGroup
    respChan := make(chan Response) // Response is a struct that contains API response metadata
    defer close(respChan)

    // requests is just a slice of requests to be made to an API.
    // This part is working well.
    for _, req := range requests {
        wg.Add(1)
        go func(r Request) {
            defer wg.Done()
            resp, _ := r.Get() // Make the API call and receive back a Response struct
            respChan <- resp   // Put the response into our channel
        }(req)
    }

    // Now, I want to extract the responses as they become available and send them
    // to another function to do some processing. This I am unsure of how to handle properly.
    writeChan := make(chan string)
    defer close(writeChan)
    select {
    case resp := <-respChan: // receive from the response channel
        go func(response Response) {
            signal, _ := Write(response) // Separate func to write the response to a file. Not important here in this context.
            writeChan <- signal          // signal is the string file path of where the file was written (will be used for a later process)
        }(resp)
    case <-time.After(15 * time.Second):
        fmt.Println("15 seconds have passed without receiving anything...")
    }

    wg.Wait()
}
Let me share a working example that you can benefit from. First I'm gonna present the code, then I'll walk you through all the relevant sections.
package main

import (
    "fmt"
    "net/http"
    "os"
    "strings"
    "time"
)

type Request struct {
    Url            string
    DelayInSeconds time.Duration
}

type Response struct {
    Url        string
    StatusCode int
}

func main() {
    requests := []Request{
        {"https://www.google.com", 0},
        {"https://stackoverflow.com", 1},
        {"https://www.wikipedia.com", 4},
    }

    respChan := make(chan Response)
    defer close(respChan)

    for _, req := range requests {
        go func(r Request) {
            fmt.Printf("%q - %v\n", r.Url, strings.Repeat("#", 30))
            // simulate heavy work
            time.Sleep(time.Second * r.DelayInSeconds)
            resp, _ := http.Get(r.Url)
            res := Response{r.Url, resp.StatusCode}
            fmt.Println(time.Now())
            respChan <- res
        }(req)
    }

    writeChan := make(chan struct{})
    defer close(writeChan)

    for i := 0; i < len(requests); i++ {
        select {
        case res := <-respChan:
            go func(r Response) {
                f, err := os.Create(fmt.Sprintf("%v.txt", strings.Replace(r.Url, "https://", "", 1)))
                if err != nil {
                    panic(err)
                }
                defer f.Close()
                f.Write([]byte(fmt.Sprintf("%q OK with %d\n", r.Url, r.StatusCode)))
                writeChan <- struct{}{}
            }(res)
        case <-time.After(time.Second * 2):
            fmt.Println("Timeout")
        }
    }
}
Set up
First, I've defined the two structs that will be used in the example: Request and Response. In the former, I put a DelayInSeconds to mock some heavy loads and time-consuming operations. Then, I defined the requests variable that contains all the requests that have to be done.
The writing part
Here, I range over the requests variable. For each request, I issue an HTTP request to the target URL. The time.Sleep emulates the heavy load. Then, I send the response into the respChan channel, which is unbuffered.
The reading part
Here, the major change is to wrap the select construct in a for loop. Thanks to this, we make sure to iterate the right number of times (based on the length of the requests variable).
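One detail worth flagging: in the code above nothing receives from writeChan, so each writer goroutine blocks on writeChan <- struct{}{}. If you want main to also wait for the files, a second counting loop can drain it. This is my addition, and it assumes every request completed within the timeout (one signal arrives per file written):

for i := 0; i < len(requests); i++ {
    <-writeChan // one signal per written file
}
fmt.Println("all files written")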
Final notes
First of all, bear in mind that the code is oversimplified just to show off the relevant parts. Due to this, a lot of error handling is missing and some inline functions could be extracted into named functions. You don't need sync.WaitGroup to achieve what you need; channels are enough.
Feel free to play with delays and check which files are written!
Let me know if this helps you!
Edit
As requested, I'm gonna provide you with a more accurate solution based on your needs. The new reading part will be something like the following:
count := 0
for {
    // This check is needed to exit the for loop and not wait indefinitely.
    // It can be removed based on your needs.
    if count == 3 {
        fmt.Println("all responses arrived...")
        return
    }
    res := <-respChan
    count++
    go func(r Response) {
        f, err := os.Create(fmt.Sprintf("%v.txt", strings.Replace(r.Url, "https://", "", 1)))
        if err != nil {
            panic(err)
        }
        defer f.Close()
        f.Write([]byte(fmt.Sprintf("%q OK with %d\n", r.Url, r.StatusCode)))
        writeChan <- struct{}{}
    }(res)
}
Here, the execution waits indefinitely within the for loop. No matter how long each request takes to complete, each response is fetched as soon as it arrives. At the top of the for loop I put an if to exit once it has processed the number of requests we expect. However, you can remove it and let the code run until a cancellation signal comes in (it's up to you).
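As an aside, if you do keep the sync.WaitGroup from your original snippet, an equivalent reading pattern is to close respChan once all producers have finished and then simply range over it. A sketch (doRequest and writeResponse are placeholders for your API call and your file writing):

var wg sync.WaitGroup
for _, req := range requests {
    wg.Add(1)
    go func(r Request) {
        defer wg.Done()
        respChan <- doRequest(r) // placeholder for the API call
    }(req)
}

// Close respChan only after every producer has sent its response.
go func() {
    wg.Wait()
    close(respChan)
}()

// The range loop exits automatically when respChan is closed.
for res := range respChan {
    go writeResponse(res) // placeholder for the file-writing goroutine
}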
Let me know if this better meets your requirements, thanks!
Related
I have the following for...range block which calls the URLs using goroutines.
func callUrls(urls []string, reqBody interface{}) []*Response {
    ch := make(chan *Response, len(urls))
    for _, url := range urls {
        somePostData := reqBody // this just seems to copy the reference, not a deep copy
        go func(url string, somePostData interface{}) {
            //serviceMutex.Lock()
            //defer serviceMutex.Unlock()
            somePostData.(map[string]interface{})["someKey"] = "someval" // can be nested more deeply
            // Data race - while executing the above line it seems only the original
            // data is getting modified, not the new local variable.
            // http post on url:
            postJsonBody, _ := json.Marshal(somePostData)
            req, err := http.NewRequest("POST", url, bytes.NewBuffer(postJsonBody))
            req.Header.Set("Content-Type", "application/json")
            req.Header.Set("Connection", "Keep-Alive")
            client := &http.Client{
                Timeout: 300 * time.Millisecond,
            }
            response, err := client.Do(req)
            response.Body.Close()
            // return to channel accordingly
            ch <- &Response{200, "url", "response body"}
        }(url, somePostData)
    }
    //for block to return result.
}
Each goroutine needs to POST its own modified post data to its URL.
However, running with -race shows a data race at the line where the post data interface is modified.
I also tried sync.Mutex Lock() and Unlock(), but it seems to block the whole app. I don't want to use []byte, as modifying the slices seems more CPU-consuming (at least it seems so to me).
What would be the best way to avoid the data race here? Also, the connections don't seem to be reused, which causes HTTP errors as well. Any suggestions?
A couple of options:
Use the Mutex
This is the safest and most straightforward. It looks like the scope of your mutex could be reduced to:
serviceMutex.Lock()
somePostData.(map[string]interface{})["someKey"] = "someval"
postJsonBody, _ := json.Marshal(somePostData)
serviceMutex.Unlock()
Which should significantly help with your throughput.
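In context, the goroutine body could look roughly like the sketch below; the error Responses are hypothetical placeholders. Note also the single shared http.Client: reusing one client lets its Transport pool and reuse connections, which may address the connection errors you mentioned (a new client per goroutine defeats connection reuse).

var serviceMutex sync.Mutex

// One shared client for all goroutines: its Transport pools and reuses connections.
var client = &http.Client{Timeout: 300 * time.Millisecond}

go func(url string, somePostData interface{}) {
    // Hold the lock over both the write and the marshal, since
    // Marshal reads the map while other goroutines may write to it.
    serviceMutex.Lock()
    somePostData.(map[string]interface{})["someKey"] = "someval"
    postJsonBody, err := json.Marshal(somePostData)
    serviceMutex.Unlock()
    if err != nil {
        ch <- &Response{0, url, err.Error()} // hypothetical error Response
        return
    }

    req, err := http.NewRequest("POST", url, bytes.NewBuffer(postJsonBody))
    if err != nil {
        ch <- &Response{0, url, err.Error()}
        return
    }
    req.Header.Set("Content-Type", "application/json")

    resp, err := client.Do(req)
    if err != nil {
        ch <- &Response{0, url, err.Error()}
        return
    }
    io.Copy(io.Discard, resp.Body) // drain the body so the connection can be reused
    resp.Body.Close()

    ch <- &Response{resp.StatusCode, url, "response body"}
}(url, somePostData)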
Have each goroutine build its own somePostData
Depending on your data structure, this lets each goroutine avoid sharing data with any other, giving you both safety and a speed boost. Imagine that instead of passing an interface{} (which may itself contain lots of shared references), you pass in a thread-safe function that builds a request body:
func callUrls(urls []string, buildReqBody func(...params...) interface{})
...
somePostData = buildReqBody(...params...)
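For instance, here is a hypothetical buildReqBody that replaces the shared map entirely; every goroutine gets its own fresh map, so there is nothing to race on:

// buildReqBody returns a brand-new map on every call,
// so no two goroutines ever touch the same data.
buildReqBody := func(url string) map[string]interface{} {
    return map[string]interface{}{
        "someKey": "someval",
        "target":  url, // hypothetical per-request field
    }
}

for _, url := range urls {
    go func(url string) {
        body := buildReqBody(url) // private to this goroutine: no mutex needed
        postJsonBody, _ := json.Marshal(body)
        // ... build the POST request from postJsonBody and send it, as in the original code ...
        _ = postJsonBody
    }(url)
}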
I have written Go code to call multiple HTTP requests independently and combine the results.
Sometimes values are missing in the combining method.
func profile(req *http.Request) (UserMe, error, UserRating, error) {
    wgcall := &sync.WaitGroup{}
    uChan := make(chan ResUser)
    rChan := make(chan ResRating)

    // variable inits
    var meResp UserMe
    var ratingResp UserRating

    go func() {
        res := <-uChan
        meResp = res.Value
    }()
    go func() {
        res := <-rChan
        ratingResp = res.Value
    }()

    wgcall.Add(2)
    go me(req, wgcall, uChan)
    go rate(req, wgcall, rChan)
    wgcall.Wait()

    logrus.Info(meResp)     // sometimes missing
    logrus.Info(ratingResp) // sometimes missing
    return meResp, meErr, ratingResp, ratingErr
}
But the me and rate calls return the values from the API requests as expected.
func me(req *http.Request, wg *sync.WaitGroup, ch chan ResUser) {
    defer wg.Done()
    // http call returns the value correctly
    me := ...
    ch <- ResUser{
        Value: ..., // value from REST
    }
    logrus.Info(fmt.Sprintf("User calls %v", me)) // always logs the values
    close(ch)
}

func rate(req *http.Request, wg *sync.WaitGroup, ch chan ResRating) {
    defer wg.Done()
    // make http call
    rating := ...
    ch <- ResRating{
        Value: ..., // value from REST
    }
    logrus.Info(fmt.Sprintf("Ratings calls %v", rating)) // always logs the values
    close(ch)
}
The issue is that meResp and ratingResp in the profile function don't always get the values; sometimes only meResp or ratingResp is set, sometimes both, as expected.
But the me and rate function calls always get values.
Can you help me fix this, please?
There's a race condition in your code.
There is no barrier to ensure the goroutines in the profile method which read from uChan and rChan have populated the variables meResp and ratingResp before you return from profile.
You can simplify your code immensely by dropping the use of channels and the inline declared goroutines in profile. Instead, simply populate the response values directly. There is no benefit provided by using channels or goroutines to read from them in this circumstance as you only intend to send one value and you have a requirement that the values produced by both HTTP calls are present before returning.
You might do this by modifying the signature of me and rate to receive a pointer to the location to write their output, or by wrapping their calls with a small function which receives their output value and populates the value in profile. Importantly, the WaitGroup should only be signalled after the value has been populated:
wgcall := &sync.WaitGroup{}
var meResp UserMe
var ratingResp UserRating

wgcall.Add(2)
// The "me" and "rate" functions should be refactored to
// drop the wait group and channel arguments.
go func() {
    meResp = me(req)
    wgcall.Done()
}()
go func() {
    ratingResp = rate(req)
    wgcall.Done()
}()
wgcall.Wait()
// You are guaranteed that if "me" and "rate" returned valid values,
// they are populated in "meResp" and "ratingResp" at this point.
// Do whatever you need here, such as logging or returning.
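If you also need meErr and ratingErr (which your profile returns), the same wrapper goroutines can populate them; each variable is still written by exactly one goroutine before Done(), so there is no race. A sketch, assuming me and rate are refactored to return a (value, error) pair:

wgcall := &sync.WaitGroup{}
var (
    meResp     UserMe
    meErr      error
    ratingResp UserRating
    ratingErr  error
)
wgcall.Add(2)
go func() {
    meResp, meErr = me(req) // assumes me now returns (UserMe, error)
    wgcall.Done()
}()
go func() {
    ratingResp, ratingErr = rate(req) // assumes rate now returns (UserRating, error)
    wgcall.Done()
}()
wgcall.Wait()
return meResp, meErr, ratingResp, ratingErr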
I want to write requests to one file from some ajax script. The problem arises when there are many of them in a second and writing to the file takes more time than the gap between requests, or when two requests arrive at the same time.
How could I solve this?
I've come up with using a mutex, like:
var mu sync.Mutex

func writeToFile() {
    mu.Lock()
    defer mu.Unlock()
    // write to file
}
But it makes the whole thing synchronous and I don't really know what happens when there are two requests at the same time. And it still does not lock the file itself.
Uh, what's the proper way to do this?
You only need to make writing to the file "sequential", meaning don't allow 2 concurrent goroutines to write to the file. Yes, if you use locking in the writeToFile() function, serving your ajax requests may become (partially) sequential too.
What I suggest is to open the file once, when your application starts, and designate a single goroutine that will be responsible for writing to the file; no other goroutine should do it.
And use a buffered channel to send the data that should be written to the file. This makes serving ajax requests non-blocking, and still the file will not be written concurrently / in parallel.
Note that this way ajax requests won't even have to wait while the data is actually written to the file (faster response time). This may or may not be a problem. For example, if writing later fails, your ajax response might already be committed => no chance to signal the failure to the client.
Example how to do it:
var (
    f      *os.File
    datach = make(chan []byte, 100) // buffered channel
)

func init() {
    // Open the file for appending (create it if it doesn't exist)
    var err error
    f, err = os.OpenFile("data.txt", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0666)
    if err != nil {
        panic(err)
    }
    // Start the goroutine which writes to the file
    go writeToFile()
}

func writeToFile() {
    // Loop through any data that needs to be written:
    for data := range datach {
        if _, err := f.Write(data); err != nil {
            // handle error!
        }
    }
    // We get here if datach is closed: shutdown
    f.Close()
}

func ajaxHandler(w http.ResponseWriter, r *http.Request) {
    // Assemble data that needs to be written (appended) to the file
    data := []byte{1, 2, 3}
    // And send it:
    datach <- data
}
To gracefully exit from the app, you should close the datach channel: when it's closed, the loop in the writeToFile() will terminate, and the file will be closed (flushing any cached data, releasing OS resources).
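A minimal sketch of such a shutdown, assuming you trigger it from an OS signal; the done channel is my addition, so the app can wait until the file is flushed and closed before exiting:

// Assumes writeToFile is extended to close(done) right after f.Close().
var done = make(chan struct{})

func waitForShutdown() {
    sig := make(chan os.Signal, 1)
    signal.Notify(sig, os.Interrupt)
    <-sig         // block until Ctrl+C / SIGINT
    close(datach) // writeToFile drains the remaining data, then closes the file
    <-done        // wait until writeToFile has finished
}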
If you want to write text to the file, you may declare the data channel like this:
var datach = make(chan string, 100) // Buffered channel
And you may use File.WriteString() to write it to the file:
if _, err := f.WriteString(data); err != nil {
    // handle error!
}
I want to ask several servers for data (e.g. multiple read replicas). In this task speed is the most important thing, so the first result should be served and all others can be ignored.
I have a problem with the idiomatic way of passing this data around. Everything is fine when the program quits (the slower goroutines don't finish their work, because the main process exits). But when we uncomment the last line (with Sleep), we can see that the other goroutines are doing their work too.
Right now I'm pushing the data through a channel; is there any way to not push it?
What is a good and safe way of dealing with this kind of problem?
package main

import (
    "fmt"
    "log"
    "math/rand"
    "time"
)

type Result int

type Conn struct {
    Id int
}

func (c *Conn) DoQuery(params string) Result {
    log.Println("Querying start", params, c.Id)
    time.Sleep(time.Duration(rand.Int31n(1000)) * time.Millisecond)
    log.Println("Querying end", params, c.Id)
    return Result(1000 + c.Id*c.Id)
}

func Query(conns []Conn, query string) Result {
    ch := make(chan Result)
    for _, conn := range conns {
        go func(c Conn) {
            ch <- c.DoQuery(query)
        }(conn)
    }
    return <-ch
}

func main() {
    conns := []Conn{{1}, {2}, {3}, {4}, {5}}
    result := Query(conns, "query!")
    fmt.Println(result)
    // time.Sleep(time.Minute)
}
My recommendation would be to make ch a buffered channel with one space per query: ch := make(chan Result, len(conns)). This way each query can run to completion, and will not block on the channel write.
Query can read once and return the first result. When all other goroutines complete, the channel will eventually be garbage collected and everything will go away. With your unbuffered channel, you create a lot of goroutines that can never terminate.
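Applied to your Query function, the only change is the channel creation. A minimal sketch:

func Query(conns []Conn, query string) Result {
    // Buffered: every goroutine can send its result without blocking,
    // so all of them can run to completion and terminate.
    ch := make(chan Result, len(conns))
    for _, conn := range conns {
        go func(c Conn) {
            ch <- c.DoQuery(query)
        }(conn)
    }
    return <-ch // serve the first result; the rest are discarded with the channel
}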
EDIT:
If you want to cancel in-flight requests, it can become significantly harder. Some operations and APIs provide cancellation, and others don't. With an HTTP request you can use the Cancel field on the request struct. Simply provide a channel that you can close to cancel:
func (c *Conn) DoQuery(params string, cancel chan struct{}) Result {
    // Error handling omitted. It is important to handle errors properly.
    req, _ := http.NewRequest(...)
    req.Cancel = cancel

    resp, _ := http.DefaultClient.Do(req)
    // On cancellation, the request will return an error of some kind.
    return readData(resp)
}

func Query(conns []Conn, query string) Result {
    ch := make(chan Result)
    cancel := make(chan struct{})
    for _, conn := range conns {
        go func(c Conn) {
            ch <- c.DoQuery(query, cancel)
        }(conn)
    }
    first := <-ch
    close(cancel)
    return first
}
This may help if there is a large request to read that you won't care about, but it may or may not actually cancel the request on the remote server. If your query is not http, but a database call or something else, you will need to look into if there is a similar cancellation mechanism you can use.
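As an aside (not part of the original answer), more recent Go code usually expresses the same idea with the context package: http.NewRequestWithContext attaches the context to the request, and many database drivers accept a context too. A rough sketch under that assumption (the Url field on Conn is hypothetical):

func Query(ctx context.Context, conns []Conn, query string) Result {
    ctx, cancel := context.WithCancel(ctx)
    defer cancel() // signals all still-running queries to stop once we return

    ch := make(chan Result, len(conns)) // buffered, so slower goroutines never block
    for _, conn := range conns {
        go func(c Conn) {
            ch <- c.DoQuery(ctx, query) // assumes DoQuery is made context-aware
        }(conn)
    }
    return <-ch // first result wins
}

func (c *Conn) DoQuery(ctx context.Context, params string) Result {
    // Error handling omitted, as in the sketch above.
    req, _ := http.NewRequestWithContext(ctx, "GET", c.Url, nil)
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return Result(0) // cancelled or failed
    }
    defer resp.Body.Close()
    return readData(resp)
}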
The problem is this: there is a web server. I figured that it would be beneficial to use goroutines in page loading, so I went ahead and called the loadPage function as a goroutine. However, when doing this, the server simply stops working without errors and serves a blank, white page. The problem has to be in the function itself; something there is conflicting with the goroutine somehow.
These are the relevant functions:
func loadPage(w http.ResponseWriter, path string) {
    s := GetFileContent(path)
    w.Header().Add("Content-Type", getHeader(path))
    w.Header().Add("Content-Length", GetContentLength(path))
    fmt.Fprint(w, s)
}

func GetFileContent(path string) string {
    cont, err := ioutil.ReadFile(path)
    e(err)
    aob := len(cont)
    s := string(cont[:aob])
    return s
}
func getHeader(path string) string {
    images := []string{".jpg", ".jpeg", ".gif", ".png"}
    readable := []string{".htm", ".html", ".php", ".asp", ".js", ".css"}
    if ArrayContainsSuffix(images, path) {
        return "image/jpeg"
    }
    if ArrayContainsSuffix(readable, path) {
        return "text/html"
    }
    return "file/downloadable"
}

func ArrayContainsSuffix(arr []string, c string) bool {
    length := len(arr)
    for i := 0; i < length; i++ {
        s := arr[i]
        if strings.HasSuffix(c, s) {
            return true
        }
    }
    return false
}
The reason this happens is that your HandlerFunc, which calls loadPage, is invoked synchronously with the request. When you call loadPage in a goroutine, the Handler actually returns immediately, causing the response to be sent immediately. That's why you get a blank page.
You can see this in server.go (line 1096):
serverHandler{c.server}.ServeHTTP(w, w.req)
if c.hijacked() {
    return
}
w.finishRequest()
The ServeHTTP function calls your handler, and as soon as it returns it calls "finishRequest". So your Handler function must block as long as it wants to fulfill the request.
Using a goroutine will actually not make your page load any faster. Synchronizing a single goroutine with a channel, as Philip suggests, will also not help you in this case, as that would be the same as not having the goroutine at all.
The root of your problem is actually ioutil.ReadFile, which buffers the entire file into memory before sending it.
If you want to stream the file you need to use os.Open. You can use io.Copy to stream the contents of the file to the browser, which will use chunked encoding.
That would look something like this:
f, err := os.Open(path)
if err != nil {
    http.Error(w, "Not Found", http.StatusNotFound)
    return
}
defer f.Close()

n, err := io.Copy(w, f)
if n == 0 && err != nil {
    http.Error(w, "Error", http.StatusInternalServerError)
    return
}
If for some reason you need to do work in multiple go routines, take a look at sync.WaitGroup. Channels can also work.
If you are trying to just serve a file, there are other options that are optimized for this, such as FileServer or ServeFile.
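For example, minimal sketches of both (the paths are placeholders):

// Serve an entire directory tree under /static/.
http.Handle("/static/", http.StripPrefix("/static/", http.FileServer(http.Dir("./public"))))

// Or serve a single file from a handler. ServeFile sets Content-Type,
// handles Range requests, and streams the file for you.
func fileHandler(w http.ResponseWriter, r *http.Request) {
    http.ServeFile(w, r, "./public/index.html")
}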
In the typical web framework implementations in Go, the route handlers are invoked as Goroutines. I.e. at some point the web framework will say go loadPage(...).
So if you start a goroutine from inside loadPage, you have two levels of goroutines.
The Go scheduler is really lazy and will not execute the second level if it's not forced to. So you need to enforce it through synchronization events, e.g. by using channels or the sync package. Example:
func loadPage(w http.ResponseWriter, path string) {
    s := make(chan string)
    // Assumes GetFileContent is modified to send its result on the channel.
    go GetFileContent(path, s)
    fmt.Fprint(w, <-s)
}
The Go documentation says this:
If the effects of a goroutine must be observed by another goroutine, use a synchronization mechanism such as a lock or channel communication to establish a relative ordering.
Why is this actually a smart thing to do? In larger projects you may deal with a large number of goroutines that need to be coordinated efficiently. So why start a goroutine if its output is used nowhere? A fun fact: I/O operations like fmt.Printf do trigger synchronization events too.