goroutine split strategy - go

In a real-time project, I have a standard event-based front-end service that receives messages from a message queue broker. This service is designed to provide a few other services with relevant information.
Basically, this service loops on a receive method, unmarshals the packet (protobuf, for instance), updates/checks a few parameters, marshals it into another format (JSON, for instance), then pushes it to the next service.
The question is: how should I fire goroutines for the most efficient overall throughput, with priority given to incoming data?
Today, my view is that the most expensive operation is the unmarshalling/marshalling process. Thus, I would fire a goroutine like this after the reception of each event (which doesn't require an ACK):
[...]
var rcvBuffer []byte
for {
	rcvBuffer = make([]byte, bufSize) // fresh buffer per event so concurrent goroutines don't share it
	if err := evt.Receive(ctx, rcvBuffer); err != nil {
		continue // handle/log the error as appropriate
	}
	go convertAndPush(rcvBuffer)
}
[...]
func convertAndPush(rcvBuffer []byte) {
	// unmarshal rcvBuffer (e.g. proto.Unmarshal) into an event struct
	// check/update a few parameters
	// marshal to JSON
	JSONpack, err := json.Marshal(event)
	if err != nil {
		return // handle the error
	}
	// fan out to another goroutine communicating with the other services...
	pushch <- JSONpack
}
My focus is on receiving packets: not blocking the CPU because of I/O requests (either inbound or outbound), and not blocking I/O because of the most CPU-costly operations of my application.
Is this way of designing the application correct? What could I be missing?
By the way, the message reception function doesn't spawn a goroutine itself.

+1 to Paul Hankin's approach.
In my experience, it's also best practice to bound the concurrency of convertAndPush. This can trivially be done with a semaphore, or with channels and a worker pool pattern. It also gives you a variable to toggle during load tests (the worker pool size).
If the goal is to receive packets as fast as possible, a channel/worker pool pattern might be better, because it can be configured with a channel size to buffer requests and keep the read path low-latency, whereas a semaphore would block the read path when it is at capacity. A sketch of that variant follows.
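For illustration, a minimal sketch of the worker-pool variant, reusing convertAndPush and the receive loop from the question (nWorkers, bufSize, and the buffer size of 1024 are assumptions to tune under load):

func receiveLoop(ctx context.Context, nWorkers, bufSize int) {
	jobs := make(chan []byte, 1024) // buffered so short bursts don't stall the read path

	for i := 0; i < nWorkers; i++ { // nWorkers is the knob to adjust during load tests
		go func() {
			for buf := range jobs {
				convertAndPush(buf)
			}
		}()
	}

	for {
		rcvBuffer := make([]byte, bufSize)
		if err := evt.Receive(ctx, rcvBuffer); err != nil {
			continue // handle/log as appropriate
		}
		jobs <- rcvBuffer // blocks only when the buffer is full and every worker is busy
	}
}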

Related

Create channels with extra flags in an idiomatic way

TL;DR I want to have the functionality where a channel has two extra fields that tell the producer whether it is allowed to send to the channel and if so tell the producer what value the consumer expects. Although I know how to do it with shared memory, I believe that this approach goes against Go's ideology of "Do not communicate by sharing memory; instead, share memory by communicating."
Context:
I wish to have a server S that runs (among others) three goroutines:
Listener, which just receives UDP packets and sends them to the demultiplexer.
Demultiplexer, which takes network packets and, based on some data, sends each into one of several channels.
Processing task, which listens to one specific channel and processes data received on that channel.
To check whether some devices on the network are still alive, the processing task will periodically send out nonces over the network and then wait for k seconds. In those k seconds, other participants of my protocol that received the nonce will send a reply containing (besides other information) the nonce. The demultiplexer will receive the packets from the listener, parse them, and send them to the processing_channel. After the k seconds have elapsed, the processing task processes the messages pushed onto the processing_channel by the demultiplexer.
I want the demultiplexer to not just blindly send any response (of the correct type) it receives onto the processing_channel, but instead to check whether the processing task is currently even expecting any messages, and if so, which nonce value it expects. I made this design decision in order to drop unwanted packets as soon as possible.
My approach:
In other languages, I would have a class with the following fields (in pseudocode):
class ActivatedChannel {
	boolean flag_expecting_nonce;
	int expected_nonce;
	LinkedList chan;
}
The demultiplexer, upon receiving a packet of the correct type, would then simply acquire the lock for the ActivatedChannel processing_channel object, check whether the flag is set and the nonce matches, and if so add the message to the LinkedList chan!
Problem:
This approach makes use of locks and shared memory, which does not align with Golang's "Do not communicate by sharing memory; instead, share memory by communicating" mantra. Hence, I would like to know... :
... whether my approach is "bad" regarding Go in the sense that it relies on shared memory.
... how to achieve the outlined result in a more Go-like way.
Yes, the approach you describe doesn't align with Go's idiomatic style, and you have rightly pointed out that it communicates by sharing memory.
To achieve this in Go's idiomatic way, one approach could be that your Demultiplexer "remembers" all the processing_channels that are expecting a nonce, and which nonce each of them expects. Whenever a processing_channel is ready to receive a reply, it sends a signal to the Demultiplexer saying that it is expecting one.
Since the Demultiplexer is at the center of all the communication, it can maintain a mapping between a processing_channel and the corresponding nonce it expects. It can also maintain a "registry" of all the processing_channels that are expecting a reply.
In this approach, we are sharing memory by communicating.
For communicating that a processing_channel is expecting a reply, the following struct can be used:
type ChannelState struct {
	ChannelId        string // unique identifier for the processing channel
	IsExpectingNonce bool
	ExpectedNonce    int
}
In this approach, there is no lock used.
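For illustration, a rough sketch of how the Demultiplexer could own that registry, with expectations arriving over a control channel instead of behind a lock (the Packet type and its fields are assumptions):

type Packet struct {
	ChannelId string
	Nonce     int
	// ... payload fields
}

// demultiplex owns the registry; state changes arrive over the control
// channel, so no mutex is needed.
func demultiplex(packets <-chan Packet, control <-chan ChannelState, procs map[string]chan<- Packet) {
	registry := make(map[string]ChannelState)
	for {
		select {
		case st := <-control:
			registry[st.ChannelId] = st // a processing task announces what it expects
		case p := <-packets:
			st := registry[p.ChannelId]
			if st.IsExpectingNonce && st.ExpectedNonce == p.Nonce {
				procs[p.ChannelId] <- p
			}
			// otherwise the packet is dropped as early as possible
		}
	}
}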

How to implement a channel and multiple readers that read the same data at the same time?

I need several functions to have the same channel as a parameter and take the same data, simultaneously.
Each of these functions has an independent task from each other, but they start from the same value.
For example, given a slice of integers, one function calculates the sum of its values and another calculates the average, at the same time. They would be goroutines.
One solution would be to create multiple channels from one value, but I want to avoid that. I might have to add or remove functions and for this, I would have to add or remove channels.
I think I understand that the Fan Out pattern could be an option, but I can't quite understand its implementation.
The question is against the rules of SO, as it does not present any concrete problem to be helped with but rather requests a tutoring session.
Anyway, two pointers for further research. Given the property of channels that each receive consumes a value sent to them (so it is impossible to read a once-sent value multiple times), such problems have two approaches to their solutions.
The first approach, which is what is called "fan-out", is to give every consumer a "personal" dedicated channel, copy the value to be broadcast as many times as there are consumers, and send each copy to each of those dedicated channels.
The ostensibly most natural way to implement this is to have a single channel to which the producer sends its units of work—not caring how much consumers are to read them—and then have a dedicated goroutine receive those units of work, copy each of them and send the copies out to the dedicated channels of the consumers.
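As an illustration, a minimal sketch of such a broadcasting goroutine, assuming int values and one dedicated channel per consumer:

// broadcast copies every value from in to each dedicated consumer channel.
func broadcast(in <-chan int, outs []chan<- int) {
	for v := range in {
		for _, out := range outs {
			out <- v // each consumer receives its own copy
		}
	}
	for _, out := range outs {
		close(out) // propagate end-of-stream to every consumer
	}
}

The sum and average functions from the question would then each range over their own dedicated channel.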
The second approach is to go lower level and implement basically the same scheme using stuff from the sync package.
One can think of the following scheme:
Have a custom struct type which has a sync.Mutex protecting the type's state.
Have a field which keeps the value multiple consumers have to read.
Have a counter in that type.
Have a sync.Cond in that type as well.
Have a channel with capacity 1 in that type as well.
Communicating a new value to the consumers looks like this:
Lock the mutex.
Verify the counter is 0, panic otherwise.
Write the new value into the respective field.
Set the counter to the number of consumers.
Unlock the mutex.
Pulse the sync.Cond.
The consumers are supposed to sleep in a wait call on that sync.Cond.
Once the sender pulses it, the goroutines running the code of consumers get woken up and try to read the value.
Reading of the value rolls like this:
Lock the mutex.
Verify the counter is greater than zero, panic otherwise.
Read the value.
Decrement the counter by one.
If the counter becomes 0, send on that special channel.
Unlock the mutex.
The channel is needed to communicate to the sender that all the consumers are done with their reads: before attempting to send a new value, the sender has to read from that channel.
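Put together, a rough sketch of that scheme for int values (an illustration only, assuming each consumer calls Recv exactly once per published value; the sync package is imported):

type Broadcaster struct {
	mu        sync.Mutex
	cond      *sync.Cond
	value     int
	counter   int // consumers still to read the current value
	consumers int
	done      chan struct{} // capacity 1: signals "all consumers have read"
}

func NewBroadcaster(consumers int) *Broadcaster {
	b := &Broadcaster{consumers: consumers, done: make(chan struct{}, 1)}
	b.cond = sync.NewCond(&b.mu)
	return b
}

// Send publishes a new value once the previous one is fully consumed.
func (b *Broadcaster) Send(v int) {
	b.mu.Lock()
	if b.counter != 0 {
		b.mu.Unlock()
		panic("previous value not fully consumed")
	}
	b.value = v
	b.counter = b.consumers
	b.mu.Unlock()
	b.cond.Broadcast() // the "pulse": wake all waiting consumers
	<-b.done           // wait until the last consumer reports completion
}

// Recv is called by each consumer once per published value.
func (b *Broadcaster) Recv() int {
	b.mu.Lock()
	for b.counter == 0 {
		b.cond.Wait() // sleep until the sender pulses the Cond
	}
	v := b.value
	b.counter--
	if b.counter == 0 {
		b.done <- struct{}{} // the last reader lets the sender proceed
	}
	b.mu.Unlock()
	return v
}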
As you can probably see, the second approach is considerably more involved and harder to get right, so I'd recommend going with the first one.
I would also note that you seem to lack certain background knowledge on how to go about implementing concurrently running and communicating tasks.
I hereby recommend reading The Book and at least these chapters of The Blog:
Go Concurrency Patterns: Pipelines and cancellation
Go Concurrency Patterns: Timing out, moving on
Advanced Go Concurrency Patterns

Is there a better way to limit requests at the "door"?

Right now I'm testing an extremely simple semaphore in one of my production regions in AWS. On deployment, the latency jumped from 150ms to 300ms. I assumed some latency would occur, but if it could be reduced that would be great. This is a bit new to me, so I'm experimenting. I've set the semaphore to allow 10000 connections, the same number as the maximum number of connections Redis is set to. Is the code below optimal? If not, can someone help me optimize it, or tell me if I'm doing something wrong, etc.? I want to keep this as a piece of middleware so that I can simply call it on the server like this: n.UseHandler(wrappers.DoorMan(wrappers.DefaultHeaders(myRouter), 10000)).
package wrappers

import "net/http"

// DoorMan limits requests.
func DoorMan(h http.Handler, n int) http.Handler {
	sema := make(chan struct{}, n)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		sema <- struct{}{}
		defer func() { <-sema }()
		h.ServeHTTP(w, r)
	})
}
The solution you outline has some issues. But first, let's take a small step back; there are two questions in this, one of them implied:
How do you rate limit inbound connections efficiently?
How do you prevent overloading a backend service with outbound connections?
What it sounds like you want to do is actually the second, to prevent too many requests from hitting Redis. I'll start by addressing the first one and then make some comments on the second.
Rate limiting inbound connections
If you really do want to rate limit inbound connections "at the door", you should normally never do that by waiting inside the handler. With your proposed solution, the service will keep accepting requests, which will queue up at the sema <- struct{}{} statement. If the load persists, it will eventually take down your service, either by running out of sockets, memory, or some other resource. Also note that if your request rate is approaching saturation of the semaphore, you would see an increase in latency caused by goroutines waiting at the semaphore before handling the request.
A better way to do it is to always respond as quickly as possible (especially when under heavy load). This can be done by sending a 503 Service Unavailable back to the client, or to a smart load balancer, telling it to back off.
In your case, it could for example look like something along these lines:
select {
case sema <- struct{}{}:
	defer func() { <-sema }()
	h.ServeHTTP(w, r)
default:
	http.Error(w, "Overloaded", http.StatusServiceUnavailable)
}
Rate limiting outbound connections to a backend service
If the reason for the rate limit is to avoid overloading a backend service, what you typically want to do is rather to react to that service being overloaded and apply back pressure through the request chain.
In practical terms, this could mean something as simple as putting the same kind of semaphore logic as above in a wrapper protecting all calls to the backend, and returning an error through your call chain if the semaphore overflows, as in the sketch below.
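A minimal sketch of such a wrapper (Gate, ErrOverloaded, and queryRedis are illustrative names, not an established API):

var ErrOverloaded = errors.New("backend overloaded")

// Gate returns a wrapper that allows at most n concurrent calls and fails
// fast instead of queueing when the backend is saturated.
func Gate(n int) func(call func() error) error {
	sema := make(chan struct{}, n)
	return func(call func() error) error {
		select {
		case sema <- struct{}{}:
			defer func() { <-sema }()
			return call()
		default:
			return ErrOverloaded // surface back pressure to the caller
		}
	}
}

Each backend call would then be wrapped, e.g. gate(func() error { return queryRedis() }), with queryRedis standing in for your existing call.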
Additionally, if the backend sends status codes like 503 (or equivalent), you should typically propagate that indication downwards in the same way, or resort to some other fallback behaviour for handling the incoming request.
You might also want to consider combining this with a circuit breaker, cutting off attempts to call the backend service quickly if it seems to be unresponsive or down.
Rate limiting by capping the number of concurrent or queued connections as above is usually a good way to handle overload. When the backend service is overloaded, requests will typically take longer, which will then reduce the effective number of requests per second. However, if, for some reason, you want a fixed limit on the number of requests per second, you could do that with a rate.Limiter instead of a semaphore.
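For reference, a requests-per-second variant with golang.org/x/time/rate could look roughly like this (RateMan and its parameters are illustrative):

// RateMan caps requests per second rather than concurrency.
func RateMan(h http.Handler, perSecond float64, burst int) http.Handler {
	limiter := rate.NewLimiter(rate.Limit(perSecond), burst)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !limiter.Allow() {
			http.Error(w, "Overloaded", http.StatusServiceUnavailable)
			return
		}
		h.ServeHTTP(w, r)
	})
}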
A comment on performance
The cost of sending and receiving trivial objects on a channel should be sub-microsecond. Even on a highly congested channel, merely synchronising with it would come nowhere near 150 ms of additional latency. So, assuming the work done in the handler is otherwise the same, wherever your latency increase comes from, it is almost certainly associated with goroutines waiting somewhere (e.g. on I/O, or for access to synchronised regions blocked by other goroutines).
If you are getting incoming requests at a rate close to what can be handled with your set concurrency limit of 10000, or if you are getting spikes of requests, it is possible you would see such an increase in average latency stemming from goroutines in the wait queue on the channel.
Either way, this should be easily measurable; you could for example trace timestamps at certain points in the handling pathway. I would do this on a sample (e.g. 0.1%) of all requests to avoid having the log output affect the performance.
I'd use a slightly different mechanism for this, probably a worker pool as described here:
https://gobyexample.com/worker-pools
I'd actually say keep 10000 goroutines running (they'll be sleeping, waiting to receive on a blocking channel, so it's not really a waste of resources), and send the request and response to the pool as they come in.
If you want a timeout that responds with an error when the pool is full you could implement that with a select block as well.
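A rough sketch of what that could look like (PoolMan is a hypothetical name; the timeout value is up to you):

// PoolMan hands each request to one of n long-lived workers and responds
// with an error when no worker picks it up within the timeout.
func PoolMan(h http.Handler, n int, timeout time.Duration) http.Handler {
	type job struct {
		w    http.ResponseWriter
		r    *http.Request
		done chan struct{}
	}
	jobs := make(chan job) // unbuffered: a send blocks while all workers are busy
	for i := 0; i < n; i++ {
		go func() {
			for j := range jobs {
				h.ServeHTTP(j.w, j.r)
				close(j.done)
			}
		}()
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		j := job{w: w, r: r, done: make(chan struct{})}
		select {
		case jobs <- j:
			<-j.done // wait for a worker to finish serving this request
		case <-time.After(timeout):
			http.Error(w, "Overloaded", http.StatusServiceUnavailable)
		}
	})
}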

Spring Integration message processing partitioned by header information

I want to be able to process messages with Spring Integration in parallel. The messages come from multiple devices, and we need to process messages from the same device in sequential order, but messages from different devices can be processed in multiple threads. There can be thousands of devices, so I'm trying to figure out how to assign a processor based on the mod of the device ID, using Spring Integration's semantics as much as possible. What approach should I be looking at?
It's difficult to generalize without knowing other requirements (transaction semantics etc) but probably the simplest approach would be a router sending messages to a number of QueueChannels using some kind of hash algorithm on the device id (so all messages for a particular device go to the same channel).
Then, have a single-threaded poller pulling messages from each queue.
EDIT: (response to comment)
Again, difficult to generalize, but...
See AbstractMessageRouter.determineTargetChannels() - a router actually returns a physical channel object (actually a list, but in most cases a list of 1). So, yes, you can create the QueueChannels programmatically and have the router return the appropriate one, based on the message.
Assuming you want all the messages to then be handled by the same downstream flow, you would also need to create a <bridge/> for each queue channel to bridge it to the input channel of the next component in the flow.
create a QueueChannel
create a BridgeHandler (set the outputChannel to the input channel of the next component)
create a PollingConsumer (constructor takes the channel and handler; set trigger etc)
start() the consumer.
All of this can be done in your custom router's initialization; then implement determineTargetChannels() to select the queue.
Depending on the processing time for your events, I would generally recommend running the downstream flow on the poller thread rather than setting a taskExecutor, to avoid issues with the next poll trying to schedule another task before this one completes. You might need to increase the default taskScheduler's pool size.

Are there disadvantages of using channel.Get() over channel.Consume()?

I'm using streadway's amqp library to connect with a rabbitmq server.
The library provides a channel.Consume() function, which returns a <-chan Delivery.
It also provides a channel.Get() function, which returns a Delivery, among other things.
I have to implement pop() functionality, and I'm using channel.Get(). However, the documentation says:
"In almost all cases, using Channel.Consume will be preferred."
Does "preferred" here mean "recommended"? Are there any disadvantages to using channel.Get() over channel.Consume()? If yes, how do I use channel.Consume() to implement a Pop() function?
As far as I can tell from the docs, yes, "preferred" does mean "recommended".
It seems that channel.Get() doesn't provide as many features as channel.Consume(); Consume() is also more readily usable in concurrent code, due to its returning a chan of Delivery, as opposed to each individual Delivery separately.
The extra features mentioned are exclusive, noLocal and noWait, as well as an optional Table of args "that have specific semantics for the queue or server."
To implement a Pop() function using channel.Consume() you could, borrowing some code fragments from the amqp example consumer, create a channel using the Consume() function, create a function handling the chan of Delivery which will actually implement your Pop() functionality, then fire off that handle() func in a goroutine.
The key to this is that the channel (in the linked example) will block on sending if nothing is receiving. In the example, the handle() func uses range to process the entire channel until it's empty. Your Pop() functionality may be better served by a function that just receives the next value from the chan and returns it. Every time it's run, it will return the latest Delivery.
EDIT: Example function to receive the latest value from the channel and do stuff with it. (This may not work for your use case; it may be more useful if the function sent the Delivery on another chan to another function to be processed. Also, I haven't tested the code below; it may be full of errors.)
func handle(deliveries <-chan amqp.Delivery, done chan error) {
	select {
	case d := <-deliveries:
		// Do stuff with the delivery d.
		// Send any errors down the done chan, for example:
		// done <- err
		_ = d // placeholder until the delivery is actually used
	default:
		done <- nil
	}
}
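Alternatively, a blocking Pop() is just a receive from the deliveries channel; a minimal sketch, assuming the channel was obtained from channel.Consume():

// Pop blocks until the next delivery arrives; ok is false once the
// deliveries channel has been closed.
func Pop(deliveries <-chan amqp.Delivery) (d amqp.Delivery, ok bool) {
	d, ok = <-deliveries
	return d, ok
}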
It really depends on what you are trying to do. If you want to get only one message from the queue (the first one), you should probably use basic.get; if you are planning to process all incoming messages from the queue, basic.consume is what you want.
This is probably not a platform- or library-specific question, but rather one of protocol understanding.
UPD
I'm not very familiar with the Go language, so I will try to give you a brief on the AMQP details and describe the use cases.
You may sometimes run into trouble and overhead with basic.consume:
With basic.consume, the workflow is:
Send the basic.consume method to notify the broker that you want to receive messages.
Since this is a synchronous method, wait for the basic.consume-ok reply from the broker.
Start listening for basic.deliver messages from the server.
This is an asynchronous method, so you must handle by yourself situations where no messages are available on the server, e.g. by limiting the reading time.
With basic.get, the workflow is:
Send the synchronous basic.get method to the broker.
Wait for the basic.get-ok method, which holds the message(s), or the basic.empty method, which denotes that no message is available on the server.
Note about synchronous and asynchronous methods: a synchronous method expects a response, whereas an asynchronous one doesn't.
Note on the basic.qos method's prefetch-count property: it is ignored when the no-ack property is set on basic.consume or basic.get.
The spec has a note on basic.get: "this method provides a direct access to the messages in a queue using a synchronous dialogue that is designed for specific types of application where synchronous functionality is more important than performance", which is relevant when consuming messages continuously.
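In Go terms, the basic.get workflow maps to polling channel.Get in a loop; a sketch with streadway/amqp (the queue name and the process function are placeholders):

// drain polls the queue until the broker replies basic.get-empty.
func drain(ch *amqp.Channel) error {
	for {
		d, ok, err := ch.Get("my-queue", false) // autoAck=false: ack explicitly below
		if err != nil {
			return err
		}
		if !ok {
			return nil // basic.get-empty: no message available right now
		}
		process(d.Body) // process stands in for your handling code
		if err := d.Ack(false); err != nil {
			return err
		}
	}
}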
My personal tests show that getting 1000 messages in a row with basic.get (0.38659715652466 s) is faster than getting 1000 messages one by one with basic.consume (0.47398710250854 s) on RabbitMQ 3.0.1, Erlang R14B04, by more than 15% on average.
If consuming only one message in the main thread is your case, you probably have to use basic.get.
You can still consume only one message asynchronously, for example in a separate thread, or use some event mechanism. That can sometimes be a better use of your machine's resources, but you have to take care of the situation where no message is available in the queue.
If you have to process messages one by one, it is obvious that basic.consume should be used, I think.
