I'm using a rate limiter to throttle the number of requests that are routed.
The requests are sent to a channel, and I want to limit the number that are processed per second, but I'm struggling to understand whether I've set this up correctly. I don't get an error, but I'm unsure if I'm even using the rate limiter.
This is what is being added to the channel:
type processItem struct {
    itemString string
}
Here's the channel and limiter:
itemChannel := make(chan processItem, 5)
itemThrottler := rate.NewLimiter(4, 1) // 4 per second, with a burst size of 1
var waitGroup sync.WaitGroup
Items are added to the channel:
case "newItem":
waitGroup.Add(1)
itemToExec := new(processItem)
itemToExec.itemString = "item string"
itemChannel <- *itemToExec
Then a goroutine is used to process everything that is added to the channel:
go func() {
    defer waitGroup.Done()
    err := itemThrottler.Wait(context.Background())
    if err != nil {
        fmt.Printf("Error with limiter: %s", err)
        return
    }
    for item := range itemChannel {
        execItem(item.itemString) // the processing function
    }
    defer func() { <-itemChannel }()
}()
waitGroup.Wait()
Can someone confirm that the following occurs:
The execItem function is run on the members of the channel at a rate of 4 per second.
I don't understand what "err := itemThrottler.Wait(context.Background())" is doing in the code - how is it being invoked?
... I'm unsure if I'm even using the rate limiter
Yes, you are using the rate-limiter. You are rate-limiting the case "newItem": branch of your code.
I don't understand what "err := itemThrottler.Wait(context.Background())" is doing in the code
itemThrottler.Wait(..) will just stagger requests (4/s, i.e. one every 0.25s) - it does not refuse requests if the rate is exceeded. So what does this mean? If you receive a glut of 1000 requests in 1 second:
4 requests will be handled immediately; but
996 requests will create a backlog of 996 goroutines that will block
The 996 will unblock at a rate of 4/s, so the backlog of pending goroutines will not clear for roughly another 4 minutes (or longer if more requests come in). A backlog of goroutines may or may not be what you want. If not, you may want to use Limiter.Allow - and if it returns false, refuse the request (i.e. don't create a goroutine) and return a 429 error (if this is an HTTP request).
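For example, a minimal sketch of the Allow-based refusal (the surrounding HTTP handler and its ResponseWriter w are assumptions, not part of the original code):
if !itemThrottler.Allow() {
    // over the limit: refuse instead of queueing up another goroutine
    http.Error(w, "rate limit exceeded", http.StatusTooManyRequests) // 429
    return
}
// under the limit: go ahead and process the request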
Finally, if this is an HTTP request, you should use its embedded context when calling Wait, e.g.
func (a *app) myHandler(w http.ResponseWriter, r *http.Request) {
    // ...
    err := a.ratelimiter.Wait(r.Context())
    if err != nil {
        // client http request most likely canceled (i.e. caller disconnected)
        return
    }
    // ...
}
Related
I am in doubt whether all of my spawned goroutines die after doing their assigned work.
I have to make two HTTP calls (always), but, based on a flag, read the response from only one of them.
What I have done so far is:
var result error
resultChannel := make(chan error)
var wg sync.WaitGroup
wg.Add(1) // only adding 1, as I don't need to wait for other to complete.
go func() {
    _, err := // HTTP call ONE
    if flagIsTrue {
        defer wg.Done()
        resultChannel <- err
    }
}()
go func() {
    _, err := // HTTP call TWO
    if !flagIsTrue {
        defer wg.Done()
        resultChannel <- err
    }
}()
go func() {
    wg.Wait()
    close(resultChannel)
}()
for err := range resultChannel {
    result = err
}
Hence, I wait for the corresponding call and listen to its response only. This is working well, but since the app is deployed on a server, where I guess the main goroutine won't die (and hence won't kill the other goroutines), my main concern is whether the other, ignorable goroutine will die or not after it gets its HTTP response (AFAIK, we need to tell Go when a goroutine should die).
My concerns:
The assumption (true, according to me) that the main goroutine does not terminate after serving one of these calls.
Will the ignorable goroutine (whose response is discarded, but which is still needed to trigger the API call) die or not?
Should I use a select case to handle this? If yes, then how (other suggestions are welcome)?
If the flagIsTrue is set before creating the goroutines, then only one of the goroutines will be able to write to the channel. The other one will not attempt to write to the channel, and thus will terminate.
You could simply move the check for the flag outside, and create one goroutine based on the flag.
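A minimal sketch of that restructuring (callOne and callTwo are hypothetical wrappers around the two HTTP calls; both are still issued, but only the flagged one is awaited):
// choose, based on the flag, which call's result we wait for
awaited, other := callOne, callTwo
if !flagIsTrue {
    awaited, other = callTwo, callOne
}
go func() { _, _ = other() }() // the other call still has to be made; its response is simply discarded
resultChannel := make(chan error, 1) // buffered, so the sender can never block
go func() {
    _, err := awaited()
    resultChannel <- err
}()
result = <-resultChannel // no WaitGroup needed when only one result is expected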
I have a Go RPC server that serves client requests. A client requests work (or task) from the server and the server assigns a task to the client. The server expects workers (or clients) to finish any task within a time limit. Therefore a timeout event callback mechanism is required on the server-side.
Here is what I tried so far.
func (l *Listener) RequestHandler(request string, reply string) error {
    // some other work
    // ....
    _timer := time.NewTimer(time.Second * 5) // timer for 5 seconds
    go func() {
        // simulates a client-not-replying case, with a timeout of 5 sec
        <-_timer.C
        fmt.Println("TimeOut for client")
        // revert state changes because of client failure
    }()
    // set reply
    // update some states
    return nil
}
In the above snippet, for each request from a worker (or client), the server-side handler starts a timer and a goroutine. The goroutine reverts the changes done by the handler function before the reply was sent to the client.
Is there any way of creating a "set of timers" and doing a blocking wait on that set? Then, whenever a timer expires, the blocking wait wakes up and provides us with the timer handle. Depending on the timer type, we can run different expiry handler functions at runtime.
I am trying to implement in Go a mechanism similar to what timerfd with epoll provides in C++.
Full code for the sample implementation of timers in Go: server.go and client.go.
I suggest you explore the context package.
It can be done like this:
package main

import (
    "context"
    "fmt"
    "sync"
    "time"
)

func main() {
    c := context.Background()
    wg := &sync.WaitGroup{}
    f(c, wg)
    wg.Wait()
}

func f(c context.Context, wg *sync.WaitGroup) {
    c, cancel := context.WithTimeout(c, 3*time.Second)
    wg.Add(1)
    go func(c context.Context) {
        defer wg.Done()
        defer cancel() // release the context's resources once this goroutine is finished
        select {
        case <-c.Done():
            fmt.Println("f() Done:", c.Err())
            return
        case r := <-time.After(5 * time.Second):
            fmt.Println("f():", r)
        }
    }(c)
}
Basically, you initiate a base context and then derive other contexts from it. When a context is terminated, either by its deadline passing or by a call to its cancel function, it closes its Done channel and the Done channels of all the contexts derived from it.
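As a small standalone illustration of that propagation (not part of the program above; the variable names are only for the example):
parent, cancelParent := context.WithCancel(context.Background())
child, cancelChild := context.WithTimeout(parent, time.Hour)
defer cancelChild()
cancelParent()           // terminating the parent...
<-child.Done()           // ...also closes the Done channel of the derived context
fmt.Println(child.Err()) // prints "context canceled"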
I'm trying to develop a simple API for Slack and I want to return something to the user right away to avoid the 3000 ms timeout.
Here are my questions:
Why doesn't "This should be printed to Slack first" get printed right away? Instead I only get the last message, which is "The long and blocking process completed". The first message does appear in the ngrok log though.
Why is my function still reaching the 3000 ms limit even though I'm already using a goroutine? Is it because of the done channel?
func testFunc(w http.ResponseWriter, r *http.Request) {
    // Return to the user ASAP to avoid 3000ms timeout.
    // But this doesn't work. Nothing is returned but
    // the message appeared in ngrok log.
    fmt.Fprintln(w, "This should be printed to Slack first!")
    // Get the response URL.
    r.ParseForm()
    responseURL := r.FormValue("response_url")
    done := make(chan bool)
    go func() {
        fmt.Println("Warning! This is a long and blocking process.")
        time.Sleep(5 * time.Second)
        done <- true
    }()
    // This works! I received this message. But I still reached the 3000ms timeout.
    func(d bool) {
        if d == true {
            payload := map[string]string{"text": "The long and blocking process completed!"}
            j, err := json.Marshal(payload)
            if err != nil {
                w.WriteHeader(http.StatusInternalServerError)
            }
            http.Post(responseURL, "application/json", bytes.NewBuffer(j))
        }
    }(<-done)
}
http.ResponseWriter streams are buffered by default. If you want data to be sent to a client in realtime (e.g. HTTP SSE), you need to flush the stream after each 'event':
wf, ok := w.(http.Flusher)
if !ok {
    http.Error(w, "Streaming unsupported!", http.StatusInternalServerError)
    return
}
fmt.Fprintln(w, "This should be printed to Slack first!")
wf.Flush()
Flushing is expensive - so take advantage of Go's buffering. There will always be an implicit flush once your handler finally exits (hence why you saw your output 'late').
I am at the learning stage of Go. Here is my understanding:
1. Any operation on an unbuffered channel is blocking.
2. You are writing to the channel in the
go func() {
    fmt.Println("Warning! This is a long and blocking process.")
    time.Sleep(5 * time.Second)
    done <- true
}()
The scheduler is still moving in the main function and tries to read from the channel, but it has to wait to see whether something has been written to the channel or not, so it gets blocked. When the above function is done writing to the channel, control comes back and the main function starts executing again.
Note: Experts will be able to explain better.
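For what it's worth, a minimal sketch of a handler that returns well inside the 3000 ms limit, assuming the follow-up POST to response_url can happen entirely inside the goroutine (testFuncAsync is a hypothetical name, not from the question):
func testFuncAsync(w http.ResponseWriter, r *http.Request) {
    r.ParseForm()
    responseURL := r.FormValue("response_url")
    go func() {
        // the slow work and the follow-up POST both live here,
        // so the handler itself never blocks on a channel
        time.Sleep(5 * time.Second)
        payload := map[string]string{"text": "The long and blocking process completed!"}
        j, _ := json.Marshal(payload)
        http.Post(responseURL, "application/json", bytes.NewBuffer(j))
    }()
    // written to the client when the handler returns (implicit flush), well before the timeout
    fmt.Fprintln(w, "This should be printed to Slack first!")
}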
I'm trying to make a web scraper which can run a decent number (many thousands) of HTTP queries per minute. The actual querying is fine, but to speed up the process I'm trying to make it concurrent. Initially I spawned a goroutine for each request, but I ran out of file descriptors, so after some googling I decided to use a semaphore to limit the number of concurrent goroutines.
Only I can't get this to work.
I've tried moving bits of code around, but I always have the same issue: I have roughly three times as many goroutines running as I want.
This is the only method I have that spawns goroutines. I limited the goroutines to 80. In my benchmarks I run this against a slice of 10000 URLs and it tends to hover at about 242 concurrent goroutines in flight, but then it suddenly goes up to almost double this and then back down to 242.
I get the same behaviour if I change the concurrent value from 80 - it usually hovers at just over three times that number of goroutines and sometimes spikes to around double that, and I have no idea why.
func (B BrandScraper) ScrapeUrls(URLs ...string) []scrapeResponse {
    concurrent := 80
    semaphoreChan := make(chan struct{}, concurrent)
    scrapeResults := make([]scrapeResponse, len(URLs))
    for _, URL := range URLs {
        semaphoreChan <- struct{}{}
        go func(URL string) {
            defer func() {
                <-semaphoreChan
            }()
            scrapeResults = append(scrapeResults, B.getIndividualScrape(URL))
            fmt.Printf("#goroutines: %d\n", runtime.NumGoroutine())
        }(URL)
    }
    return scrapeResults
}
I'm expecting it to be constantly at 80 goroutines - or at least constant.
This happens when I run it from a benchmarking test or when I run it from the main function.
Thanks very much for any tips!
EDIT
getIndividualScrape calls another function:
func (B BrandScraper) doGetRequest(URL string) io.Reader {
    resp, err := http.Get(URL)
    if err != nil {
        log.Fatal(err)
    }
    body, _ := ioutil.ReadAll(resp.Body)
    resp.Body.Close()
    return bytes.NewReader(body)
}
which obviously does an HTTP request. Could this be leaking goroutines? I thought since I'd closed the resp.Body I'd have covered that but maybe not?
I'm trying to find a good method to consume asynchronously from an input queue, process the content using several workers and then publish to an output queue. So far I've tried a number of examples, most recently using the code from here and here as inspiration.
My current code doesn't appear to be doing what it should, however: increasing the number of workers doesn't increase performance (msg/s consumed or published), and the number of goroutines remains fairly static whilst running.
main:
func main() {
    maxWorkers := 10
    // channel for jobs
    in := make(chan []byte)
    out := make(chan []byte)
    // start workers
    wg := &sync.WaitGroup{}
    wg.Add(maxWorkers)
    for i := 1; i <= maxWorkers; i++ {
        log.Println(i)
        defer wg.Done()
        go processor(in, out)
    }
    // add jobs
    go collector(in)
    go sender(out)
    // wait for workers to complete
    wg.Wait()
}
The collector is basically the example from the RabbitMQ site with a goroutine that collects messages from the queue and places them on the 'in' channel:
forever := make(chan bool)
go func() {
    for d := range msgs {
        in <- d.Body
        d.Ack(false)
    }
}()
log.Printf("[*] Waiting for messages. To exit press CTRL+C")
<-forever
The processor receives an 'in' and 'out' channel, unmarshals JSON, performs a series of regexes and then places the output into the 'out' channel:
func processor(in chan []byte, out chan []byte) {
    var (
        // list of regexes declared here
    )
    for {
        body := <-in
        jsonIn := &Data{}
        err := json.Unmarshal(body, jsonIn)
        if err != nil {
            log.Fatalln("Failed to decode:", err)
        }
        content := jsonIn.Content
        // process regexes using:
        // jsonIn.a = r1.FindAllString(content, -1)
        jsonOut, _ := json.Marshal(jsonIn)
        out <- jsonOut
    }
}
And finally the sender is simply the code from the RabbitMQ site, setting up a connection, reading from the 'out' channel and then publishing to a RMQ queue:
for {
    jsonOut := <-out
    err = ch.Publish(
        "",     // exchange
        q.Name, // routing key
        false,  // mandatory
        false,  // immediate
        amqp.Publishing{
            DeliveryMode: amqp.Persistent,
            ContentType:  "text/json",
            Body:         []byte(jsonOut),
        })
    failOnError(err, "Failed to publish a message")
}
This is a pattern that I'll be using quite a lot, so I'm spending a lot of time trying to find something that works correctly (and well) - any advice or help would be appreciated (and in case it isn't obvious, I'm new to Go).
There are a couple of things that jump out:
Done within main function
wg.Add(maxWorkers)
for i := 1; i <= maxWorkers; i++ {
    log.Println(i)
    defer wg.Done()
    go processor(in, out)
}
The defer here is executed when main returns so it's not actually indicating when processing is complete. I don't think this'll have an effect on the performance profile of your program though.
To address this you could pass in wg *sync.WaitGroup to your processor so your processor can indicate when it's done.
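For example, a sketch of that change, keeping the names from the question (it also assumes the workers range over in and stop when the channel is closed; otherwise Done would never run):
func processor(wg *sync.WaitGroup, in chan []byte, out chan []byte) {
    defer wg.Done() // signals this worker has finished, so wg.Wait() in main can return
    for body := range in { // exits once in is closed by the producer
        out <- body // placeholder for the unmarshal / regex / marshal work
    }
}

// in main:
wg.Add(maxWorkers)
for i := 1; i <= maxWorkers; i++ {
    go processor(wg, in, out)
}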
CPU Bound Processing
Parsing messages and performing regexes is a CPU-intensive workload. How many cores does your machine have? How is throughput affected if you run your program on two separate machines - does throughput 2x? What if you double your number of cores? What about running your program with 1 processor worker vs 2 processor workers - does that double throughput? Are you maxing out your local RabbitMQ instance? Is it the bottleneck?
Setting up benchmarking and load testing harnesses should allow you to set up experiments to see where your bottlenecks are :)
For queue-based services it's pretty easy to set up a test harness to fill RabbitMQ with a set backlog and benchmark how fast you can process those messages, or to set up a load generator to send x messages/second to RabbitMQ and observe whether you can keep up.
Does RabbitMQ have good visibility into message processing throughput? If not, I frequently add a counter to my Go code and then log the overall averaged throughput on an interval to get a rough idea of performance:
start := time.Now()
updateInterval := time.Tick(1 * time.Second)
numIn := 0
for {
    select {
    case <-updateInterval:
        log.Infof("IN - Count: %d", numIn)
        log.Infof("IN - Throughput: %.0f events/second",
            float64(numIn)/(time.Now().Sub(start)).Seconds())
    case d := <-msgs:
        numIn++
        in <- d.Body
        d.Ack(false)
    }
}