Right now I'm testing an extremely simple semaphore in one of my production regions in AWS. On deployment the latency jumped from 150ms to 300ms. I expected some latency increase, but if it could be reduced that would be great. This is a bit new to me, so I'm experimenting. I've set the semaphore to allow 10000 connections, which is the same as the maximum number of connections Redis is configured for. Is the code below optimal? If not, can someone help me optimize it, or point out if I'm doing something wrong? I want to keep this as a piece of middleware so that I can simply call it on the server like this: n.UseHandler(wrappers.DoorMan(wrappers.DefaultHeaders(myRouter), 10000)).
package wrappers

import "net/http"

// DoorMan limits the number of concurrently handled requests.
func DoorMan(h http.Handler, n int) http.Handler {
    sema := make(chan struct{}, n)

    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        sema <- struct{}{}
        defer func() { <-sema }()

        h.ServeHTTP(w, r)
    })
}
The solution you outline has some issues. But first, let's take a small step back; there are two questions in this, one of them implied:
How do you rate limit inbound connections efficiently?
How do you prevent overloading a backend service with outbound connections?
What it sounds like you want to do is actually the second, to prevent too many requests from hitting Redis. I'll start by addressing the first one and then make some comments on the second.
Rate limiting inbound connections
If you really do want to rate limit inbound connections "at the door", you should normally never do that by waiting inside the handler. With your proposed solution, the service will keep accepting requests, which will queue up at the sema <- struct{}{} statement. If the load persists, it will eventually take down your service, either by running out of sockets, memory, or some other resource. Also note that if your request rate is approaching saturation of the semaphore, you would see an increase in latency caused by goroutines waiting at the semaphore before handling the request.
A better way to do it is to always respond as quickly as possible (especially when under heavy load). This can be done by sending a 503 Service Unavailable back to the client, or a smart load balancer, telling it to back off.
In your case, it could for example look like something along these lines:
select {
case sema <- struct{}{}:
    defer func() { <-sema }()
    h.ServeHTTP(w, r)
default:
    http.Error(w, "Overloaded", http.StatusServiceUnavailable)
}
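Put together, a non-blocking version of your middleware could look roughly like this (a sketch based on your DoorMan, not a tested drop-in implementation):

package wrappers

import "net/http"

// DoorMan limits concurrency; requests beyond the limit are rejected
// immediately with 503 instead of queueing up inside the handler.
func DoorMan(h http.Handler, n int) http.Handler {
    sema := make(chan struct{}, n)

    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        select {
        case sema <- struct{}{}:
            defer func() { <-sema }()
            h.ServeHTTP(w, r)
        default:
            http.Error(w, "Overloaded", http.StatusServiceUnavailable)
        }
    })
}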
Rate limiting outbound connections to a backend service
If the reason for the rate limit is to avoid overloading a backend service, what you typically want to do is rather to react to that service being overloaded and apply back pressure through the request chain.
In practical terms, this could mean something as simple as putting the same kind of semaphore logic as above in a wrapper protecting all calls to the backend, and returning an error up through your call chain when the semaphore overflows.
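For example, a guard around the backend calls could look roughly like this (a rough sketch; guardRedis, redisSema, errOverloaded and the limit of 1000 are made-up names and values, not anything from your code):

import "errors"

var errOverloaded = errors.New("redis: too many concurrent requests")

// The limit here is illustrative only; size it to what Redis can handle.
var redisSema = make(chan struct{}, 1000)

// guardRedis runs do() only if a slot is free; otherwise it fails fast so
// the error can be propagated back up the call chain.
func guardRedis(do func() error) error {
    select {
    case redisSema <- struct{}{}:
        defer func() { <-redisSema }()
        return do()
    default:
        return errOverloaded
    }
}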
Additionally, if the backend sends status codes like 503 (or equivalent), you should typically propagate that indication downwards in the same way, or resort to some other fallback behaviour for handling the incoming request.
You might also want to consider combining this with a circuit breaker, cutting off attempts to call the backend service quickly if it seems to be unresponsive or down.
Rate limiting by capping the number of concurrent or queued connection as above is usually a good way to handle overload. When the backend service is overloaded, requests will typically take longer, which will then reduce the effective number of requests per second. However, if, for some reason, you want to have a fixed limit on number of requests per second, you could do that with a rate.Limiter instead of a semaphore.
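If you go that route, a minimal sketch using golang.org/x/time/rate could look like this (RateDoorMan is a made-up name; the rate and burst are parameters you would have to tune):

import (
    "net/http"

    "golang.org/x/time/rate"
)

// RateDoorMan rejects requests above a fixed requests-per-second limit.
func RateDoorMan(h http.Handler, rps float64, burst int) http.Handler {
    limiter := rate.NewLimiter(rate.Limit(rps), burst)

    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if !limiter.Allow() {
            http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
            return
        }
        h.ServeHTTP(w, r)
    })
}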
A comment on performance
The cost of sending and receiving trivial objects on a channel should be sub-microsecond. Even on a highly congested channel, it wouldn't be anywhere near 150 ms of additional latency just to synchronise on the channel. So, assuming the work done in the handler is otherwise the same, wherever your latency increase comes from, it is almost certainly associated with goroutines waiting somewhere (e.g. on I/O, or to get access to synchronised regions that are blocked by other goroutines).
If you are getting incoming requests at a rate close to what can be handled with your set concurrency limit of 10000, or if you are getting spikes of requests, it is possible you would see such an increase in average latency stemming from goroutines in the wait queue on the channel.
Either way, this should be easily measurable; you could for example trace timestamps at certain points in the handling pathway. I would do this on a sample (e.g. 0.1%) of all requests to avoid having the log output affect the performance.
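A rough sketch of such sampled timing around the semaphore (Timed is a made-up name; the 0.1% sample rate and log fields are only examples):

import (
    "log"
    "math/rand"
    "net/http"
    "time"
)

// Timed logs, for a small sample of requests, how long the request waited
// for the semaphore and how long it took in total.
func Timed(h http.Handler, sema chan struct{}) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        sampled := rand.Float64() < 0.001 // ~0.1% of requests

        start := time.Now()
        sema <- struct{}{}
        waited := time.Since(start)
        defer func() { <-sema }()

        h.ServeHTTP(w, r)

        if sampled {
            log.Printf("path=%s wait=%s total=%s", r.URL.Path, waited, time.Since(start))
        }
    })
}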
I'd use a slightly different mechanism for this, probably a worker pool as described here:
https://gobyexample.com/worker-pools
I'd actually say keep 10000 goroutines running (they'll be sleeping, waiting to receive on a blocking channel, so it's not really a waste of resources), and send the request+response to the pool as they come in.
If you want a timeout that responds with an error when the pool is full you could implement that with a select block as well.
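A minimal sketch of that idea, assuming the pool wraps an http.Handler (Pool and job are made-up names, and the timeout/error handling mentioned above is left out):

import "net/http"

type job struct {
    w    http.ResponseWriter
    r    *http.Request
    done chan struct{}
}

// Pool starts n long-lived workers that serve requests sent on a channel.
func Pool(h http.Handler, n int) http.Handler {
    jobs := make(chan job)

    for i := 0; i < n; i++ {
        go func() {
            for j := range jobs {
                h.ServeHTTP(j.w, j.r)
                close(j.done)
            }
        }()
    }

    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        j := job{w: w, r: r, done: make(chan struct{})}
        jobs <- j // a select with a timeout here could return an error instead
        <-j.done  // the handler must not return before the worker is finished
    })
}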
I am currently doing something like this:
watch, err := s.clientset.CoreV1().Pods("").Watch(context.TODO(), metav1.ListOptions{
    FieldSelector: fmt.Sprintf("spec.myfoo=%s", s.foo),
})

for event := range watch.ResultChan() {
    .......
}
I am curious: if I have something similar running in two different goroutines, will both of the watches get the same events, or might each goroutine get different events, based on who got them first?
Watch internally establishes a long poll connection with the API server. After establishing a connection, the API server will send a batch of initial events and any subsequent changes. Once a timeout has occurred, the connection will be dropped.
Since your scenario involves two goroutines, we cannot guarantee that both will start executing simultaneously and that both long-poll connections will be established simultaneously. Furthermore, a connection may drop at some point.
In a large cluster, pods are constantly being killed and created. Thus, it is certainly possible for the two goroutines to receive different events.
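To make that concrete, each goroutine below opens its own watch and therefore gets its own stream of events, starting from whenever its connection was established (a rough sketch; the field selector comes from your snippet, everything else is illustrative):

import (
    "context"
    "fmt"
    "log"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

func watchPods(ctx context.Context, clientset kubernetes.Interface, foo, name string) {
    w, err := clientset.CoreV1().Pods("").Watch(ctx, metav1.ListOptions{
        FieldSelector: fmt.Sprintf("spec.myfoo=%s", foo),
    })
    if err != nil {
        log.Printf("%s: watch failed: %v", name, err)
        return
    }
    defer w.Stop()

    // Each watcher has its own connection and therefore its own event stream.
    for event := range w.ResultChan() {
        log.Printf("%s: %s %T", name, event.Type, event.Object)
    }
}

// Run as two independent watchers; their streams need not match:
//   go watchPods(ctx, clientset, foo, "watcher-1")
//   go watchPods(ctx, clientset, foo, "watcher-2")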
I have a requirement where I need to do multiple things (irrelevant here) at regular intervals. I achieved it using the code block below:
func (processor *Processor) process() {
    defaultTicker := time.NewTicker(time.Second * 2)
    updateTicker := time.NewTicker(time.Second * 5)
    heartbeatTicker := time.NewTicker(time.Second * 5)
    timeoutTicker := time.NewTicker(30 * time.Second)
    refreshTicker := time.NewTicker(2 * time.Minute)

    defer func() {
        logger.Info("processor for ", processor.id, " exited")
        defaultTicker.Stop()
        timeoutTicker.Stop()
        updateTicker.Stop()
        refreshTicker.Stop()
        heartbeatTicker.Stop()
    }()

    for {
        select {
        case <-defaultTicker.C:
            // spawn some go routines
        case <-updateTicker.C:
            // do something
        case <-timeoutTicker.C:
            // do something else
        case <-refreshTicker.C:
            // log
        case <-heartbeatTicker.C:
            // push metrics to redis
        }
    }
}
But I noticed that every once in a while, my for-select loop gets stuck somewhere, and I cannot seem to find where or why. By stuck, I mean I stop receiving the refresh ticker logs. But it starts working normally again after some time (5-10 minutes).
I have made sure that all operations within each ticker case complete in very little time (~0 ms, checked by adding logs).
My questions:
Is using multiple tickers in a single select a good/normal practice? (Honestly, I did not find many examples using multiple tickers online.)
Is anyone aware of any known issues/pitfalls where tickers can block the loop for a longer duration?
Any help is appreciated. Thanks
Go does not provide any smart draining behavior for multiple channels, e.g., that older messages in one channel would get processed earlier than more recent messages in other channels. Any time the loop enters the select statement, one of the ready channels is chosen at random.
Also see this answer and read the part about GOMAXPROCS=1. This could be related to your issue. The issue could also be in your logging package. Maybe the logs are just delayed.
In general, I think the issue must be in your case statements. Either you have a blocking function or some dysfunctional code. (Note: confirmed by the OP)
But to answer your questions:
1. Is using multiple tickers in a single select a good/normal practice?
It is common to read from multiple channels randomly in a blocking way, one message at a time, e.g., to sort incoming data from multiple channels into a slice or map and avoid concurrent data access.
It is also common to add one or more tickers, e.g., to flush data and for logging or reporting. Usually the non-ticker code paths will do most of the work.
In your case, you use tickers that will run code paths that should block each other, which is a very specific use case but may be required in some scenarios. This is uncommon, but not bad practice, I think.
As the commenters suggested, you could also schedule different recurring tasks in separate goroutines.
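A sketch of that alternative (runEvery and the task functions are made-up names; the intervals are taken from your example):

import (
    "context"
    "time"
)

// runEvery runs fn at the given interval until ctx is cancelled.
// Each task gets its own goroutine, so a slow task cannot delay the others.
func runEvery(ctx context.Context, interval time.Duration, fn func()) {
    go func() {
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                fn()
            }
        }
    }()
}

// Usage, e.g.:
//   runEvery(ctx, 2*time.Second, processor.doDefault)
//   runEvery(ctx, 5*time.Second, processor.doUpdate)
//   runEvery(ctx, 2*time.Minute, processor.refresh)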
2. Is anyone aware of any known issues/pitfalls where tickers can block the loop for a longer duration?
The tickers themselves will not block the loop in any hidden way. The fastest ticker will always ensure the loop is looping at the speed of this ticker at least.
Note that the docs of time.NewTicker say:
The ticker will adjust the time interval or drop ticks to make up for slow receivers
This just means that, internally, no new ticks are scheduled until you have consumed the last one from the single-element ticker channel.
In your example, the main pitfall is that any code in the case statements will block and thus delay the other cases.
If this is intended, everything is fine.
There may be other pitfalls if you have microsecond or nanosecond tickers, where you may see some measurable runtime overhead, or if you have hundreds of tickers and case blocks. But then you should have chosen another scheduling pattern from the beginning.
In a real-time project, I have a standard event-based front-end service receiving messages from a message queue broker. This service is designed to provide a few other services with relevant information.
Basically, this service loops on a reception method, unmarshals the packet (protobuf, for instance), updates/controls a few parameters, marshals it into another format (JSON, for instance), then pushes it to the next service.
The question is: how should I fire goroutines for the most efficient overall bandwidth, with priority given to incoming data?
Today, my view is that the most expensive operation is the unmarshalling/marshalling process. Thus, I would fire a goroutine like this after the reception of any event (which doesn't require an ACK):
[...]
var rcvBuffer []byte
for {
    err := evt.Receive(ctx, rcvBuffer)
    go convertAndPush(rcvBuffer)
}
[...]

func convertAndPush(rcvBuffer []byte) {
    // unmarshal rcvBuffer
    // control
    // marshal to JSONpack
    JSONpack := json.Marshal(rcvBuffer)
    // fan out to another goroutine communicating with other services...
    pushch <- JSONpack
}
My focus is on receiving packets, not blocking the CPU because of IO requests (either inbound or outbound), and not blocking IO because of the most CPU-costly operations of my application.
Is that way of designing the application correct? What could I be missing?
By the way, the message reception function doesn't trigger a goroutine.
+1 with Paul Hankin's approach.
In my experience, it's also best practice to bound the concurrency of convertAndPush. This can trivially be done using a semaphore, or channels and a worker pool pattern. It also provides a knob to tune during load tests (the worker pool size).
If the goal is to receive packets as fast as possible, a channel/worker pool pattern might be better, because it can be configured with a channel size to buffer requests and keep the read path from blocking, compared to a semaphore, which would block the read path when it is at capacity.
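A sketch of bounding convertAndPush with a worker pool (startWorkers, the worker count and the buffer size are made up for illustration; note that each received packet should get its own buffer so the workers never share a slice):

// startWorkers starts n goroutines that drain the work channel, so at most
// n conversions run concurrently while the receive loop stays cheap.
func startWorkers(n int, work <-chan []byte, convertAndPush func([]byte)) {
    for i := 0; i < n; i++ {
        go func() {
            for buf := range work {
                convertAndPush(buf)
            }
        }()
    }
}

// Adapted receive loop (maxPacketSize is a placeholder):
//
//   work := make(chan []byte, 1024)
//   startWorkers(32, work, convertAndPush)
//   for {
//       buf := make([]byte, maxPacketSize) // fresh buffer per packet
//       if err := evt.Receive(ctx, buf); err != nil {
//           continue // or handle the error
//       }
//       work <- buf
//   }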
I'm using streadway's amqp library to connect with a rabbitmq server.
The library provides a channel.Consume() function which returns a "<-chan Delivery".
It also provides a channel.Get() function which returns a "Delivery", among other things.
I have to implement a Pop() functionality, and I'm using channel.Get(). However, the documentation says:
"In almost all cases, using Channel.Consume will be preferred."
Does "preferred" here mean "recommended"? Are there any disadvantages of using channel.Get() over channel.Consume()? If yes, how do I use channel.Consume() to implement a Pop() function?
As far as I can tell from the docs, yes, "preferred" does mean "recommended".
It seems that channel.Get() doesn't provide as many features as channel.Consume(), and channel.Consume() is also more readily usable in concurrent code due to its returning a chan of Delivery, as opposed to each individual Delivery separately.
The extra features mentioned are exclusive, noLocal and noWait, as well as an optional Table of args "that have specific semantics for the queue or server."
To implement a Pop() function using channel.Consume() you could, to link to some code fragments from the amqp example consumer, create a channel using the Consume() function, create a function to handle the chan of Delivery which will actually implement your Pop() functionality, then fire off the handle() func in a goroutine.
The key to this is that the channel (in the linked example) will block on sending if nothing is receiving. In the example, the handle() func uses range to process the channel until it is closed. Your Pop() functionality may be better served by a function that just receives one value from the chan and returns it; every time it's run it will return the next Delivery.
EDIT: Example function to receive the next value from the channel and do stuff with it (this may not work for your use case; it might be more useful if the function sent the Delivery on another chan to another function to be processed; also, I haven't tested the code below, it may be full of errors):
func handle(deliveries <-chan amqp.Delivery, done chan error) {
    select {
    case d := <-deliveries:
        // Do stuff with the delivery d.
        // Send any errors down the done chan, for example:
        // done <- err
        _ = d // placeholder until the delivery is actually used
    default:
        done <- nil
    }
}
It really depends on what you are trying to do. If you want to get only one message from the queue (the first one), you should probably use basic.get; if you are planning to process all incoming messages from the queue, basic.consume is what you want.
This is probably not a platform- or library-specific question, but rather a protocol-understanding question.
UPD
I'm not familiar with the Go language, so I will try to give you a brief overview of the AMQP details and describe the use cases.
You may sometimes get into trouble and have overhead with basic.consume.
With basic.consume you have this workflow:
send the basic.consume method to notify the broker that you want to receive messages
since this is a synchronous method, wait for the basic.consume-ok message from the broker
start listening for basic.deliver messages from the server
this is an asynchronous method, and you have to handle by yourself situations where no messages are available on the server, e.g. limit the reading time
With basic.get you have this workflow:
send the synchronous method basic.get to the broker
wait for the basic.get-ok method, which holds the message(s), or the basic.empty method, which denotes that no message is available on the server
Note about synchronous and asynchronous methods: a synchronous method is expected to get some response, whereas an asynchronous one isn't.
Note on the basic.qos method's prefetch-count property: it is ignored when the no-ack property is set on basic.consume or basic.get.
The spec has a note on basic.get: "this method provides a direct access to the messages in a queue using a synchronous dialogue that is designed for specific types of application where synchronous functionality is more important than performance", while performance is what matters for continuous message consumption.
My personal tests show that getting 1000 messages in a row with basic.get (0.38659715652466 s) is faster than getting 1000 messages with basic.consume one by one (0.47398710250854 s) on RabbitMQ 3.0.1, Erlang R14B04, on average by more than 15%.
If consuming only one message in the main thread is your case, you probably have to use basic.get.
You can still consume only one message asynchronously, for example in a separate thread, or use some event mechanism. It would sometimes be a better solution for your machine's resources, but you have to take care of the situation where no message is available in the queue.
If you have to process messages one by one, it is obvious that basic.consume should be used, I think.
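For completeness, with the streadway/amqp library the two approaches could look roughly like this (a sketch; popWithGet and popWithConsume are made-up names, and queue name, ack handling and error handling are simplified):

import "github.com/streadway/amqp"

// popWithGet fetches a single message, if one is available right now.
func popWithGet(ch *amqp.Channel, queue string) (*amqp.Delivery, error) {
    d, ok, err := ch.Get(queue, false) // autoAck = false
    if err != nil || !ok {
        return nil, err // err is nil if the queue was simply empty
    }
    return &d, nil
}

// popWithConsume reads one delivery from a channel obtained earlier via
// Consume; it blocks until a message arrives or the channel is closed.
func popWithConsume(deliveries <-chan amqp.Delivery) (*amqp.Delivery, bool) {
    d, open := <-deliveries
    if !open {
        return nil, false
    }
    return &d, true
}

// deliveries would be set up once, e.g.:
//   deliveries, err := ch.Consume(queue, "", false, false, false, false, nil)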
I have created a client/server program: the client starts an instance of the Writer class and the server starts an instance of the Reader class. The Writer then writes DATA_SIZE bytes of data asynchronously to the Reader every USLEEP milliseconds.
Every successive async_write request by the Writer is done only if the "on write" handler from the previous request has been called.
The problem is, if the Writer (client) is writing more data into the socket than the Reader (server) is capable of receiving, this seems to be the behaviour:
The Writer will start writing into (I think) a system buffer, and even though the data has not yet been received by the Reader, it will be calling the "on write" handler without an error.
When the buffer is full, boost::asio won't fire the "on write" handler anymore, until the buffer gets smaller.
In the meanwhile, the Reader is still receiving small chunks of data.
The fact that the Reader keeps receiving bytes after I close the Writer program seems to prove this theory correct.
What I need to achieve is to prevent this buffering, because the data needs to be "real time" (as much as possible).
I'm guessing I need to use some combination of the socket options that asio offers, like no_delay or send_buffer_size, but I'm just guessing here, as I haven't had success experimenting with these.
I think that the first solution one can think of is to use UDP instead of TCP. This will be the case, as I'll need to switch to UDP for other reasons as well in the near future, but I would first like to find out how to do it with TCP, just for the sake of having it straight in my head in case I have a similar problem some other day in the future.
NOTE1: Before I started experimenting with asynchronous operations in the asio library, I had implemented this same scenario using threads, locks and asio::sockets, and did not experience such buffering at that time. I had to switch to the asynchronous API because asio does not seem to allow timed interruptions of synchronous calls.
NOTE2: Here is a working example that demonstrates the problem: http://pastie.org/3122025
EDIT: I've done one more test. In my NOTE1 I mentioned that when I was using asio::iosockets I did not experience this buffering. So I wanted to be sure and created this test: http://pastie.org/3125452 It turns out that the buffering is there even with asio::iosockets, so there must have been something else that caused it to go smoothly, possibly a lower FPS.
TCP/IP is definitely geared towards maximizing throughput, as the intention of most network applications is to transfer data between hosts. In such scenarios it is expected that a transfer of N bytes will take T seconds, and clearly it doesn't matter if the receiver is a little slow to process the data. In fact, as you noticed, the TCP/IP protocol implements a sliding window which allows the sender to buffer some data so that it is always ready to be sent, but leaves the ultimate throttling control up to the receiver. The receiver can go full speed, pace itself or even pause the transmission.
If you don't need throughput and instead want to guarantee that the data your sender is transmitting is as close to real time as possible, then what you need is to make sure the sender doesn't write the next packet until it receives an acknowledgement from the receiver that it has processed the previous data packet. So instead of blindly sending packet after packet until you are blocked, define a message structure for control messages to be sent from the receiver back to the sender.
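A minimal sketch of that stop-and-wait idea, shown in Go for brevity rather than boost::asio; fixed-size packets and a one-byte ACK are assumptions of the sketch:

import (
    "io"
    "net"
)

// sendRealTime writes one packet and then waits for a one-byte ACK from
// the receiver before sending the next, so data cannot pile up in the
// socket buffers.
func sendRealTime(conn net.Conn, packets <-chan []byte) error {
    ack := make([]byte, 1)
    for p := range packets {
        if _, err := conn.Write(p); err != nil {
            return err
        }
        // Block until the receiver confirms it has processed the packet.
        if _, err := io.ReadFull(conn, ack); err != nil {
            return err
        }
    }
    return nil
}

// The receiver reads a full packet, processes it, then replies with the
// ACK byte, e.g. conn.Write([]byte{0x06}).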
Obviously, with this approach your trade-off is that each sent packet is closer to the sender's real time, but you are limiting how much data you can transfer, while slightly increasing the total bandwidth used by your protocol (i.e. the additional control messages). Also keep in mind that "close to real time" is relative, because you will still face delays in the network as well as in the receiver's ability to process data. So you might also take a look at the design constraints of your specific application to determine how "close" you really need to be.
If you need to be very close, but at the same time you don't care if packets are lost because old packet data is superseded by new data, then UDP/IP might be a better alternative. However, a) if you have reliable delivery requirements, you might end up reinventing a portion of TCP/IP's wheel, b) keep in mind that certain networks (corporate firewalls) tend to block UDP/IP while allowing TCP/IP traffic, and c) even UDP/IP won't be exactly real time.