Create channels with extra flags in an idiomatic way - go

TL;DR I want to have the functionality where a channel has two extra fields that tell the producer whether it is allowed to send to the channel and if so tell the producer what value the consumer expects. Although I know how to do it with shared memory, I believe that this approach goes against Go's ideology of "Do not communicate by sharing memory; instead, share memory by communicating."
Context:
I wish to have a server S that runs (besides others) three goroutines:
Listener that just receives UDP packets and sends them to the demultplexer.
Demultiplexer that takes network packets and based on some data sends it into one of several channels
Processing task which listens to one specific channel and processes data received on that channel.
To check whether some devices on the network are still alive, the processing task will periodically send out nonces over the network and then wait for k seconds. In those k seconds, other participants of my protocol that received the nonce will send a reply containing (besides other information) the nonce. The demultiplexer will receive the packets from the listener, parse them and send them to the processing_channel. After the k seconds elapsed, the processing task processes the messages pushed onto the processing_channel by the demultiplexer.
I want the demultiplexer to not just blindly send any response (of the correct type) it received onto the the processing_channel, but to instead check whether the processing task is currently even expecting any messages and if so which nonce value it expects. I made this design decision in order to drop unwanted packets a soon as possible.
My approach:
In other languages, I would have a class with the following fields (in pseudocode):
class ActivatedChannel{
boolean flag_expecting_nonce;
int expected_nonce;
LinkedList chan;
}
The demultiplexer would then upon receiving a packet of the correct type simply acquire the lock for the ActivatedChannel processing_channel object, check whether the flag is set and the nonce matches, and if so add the message to the LinkedList chan!
Problem:
This approach makes use of locks and shared memory, which does not align with Golang's "Do not communicate by sharing memory; instead, share memory by communicating" mantra. Hence, I would like to know... :
... whether my approach is "bad" regarding Go in the sense that it relies on shared memory.
... how to achieve the outlined result in a more Go-like way.

Yes, the approach described by you doesn't align with Golang's Idiomatic way of implementation. And you have rightly pointed out that in the above approach you are communicating by sharing memory.
To achieve this in Go's Idiomatic way, one of the approaches could be that your Demultiplexer "remembers" all the processing_channels that are expecting nonce and the corresponding type of the nonce. Whenever a processing_channels is ready to receive a reply, it sends a signal to the Demultiplexe saying that it is expecting a reply.
Since Demultiplexer is at the center of all the communication it can maintain a mapping between a processing_channel & the corresponding nonce it expects. It can also maintain a "registry" of all the processing_channels which are expecting a reply.
In this approach, we are Sharing memory by communicating
For communicating that a processing_channel is expecting a reply, the following struct can be used:
type ChannelState struct {
ChannelId string // unique identifier for processing channel
IsExpectingNonce bool
ExpectedNonce int
}
In this approach, there is no lock used.

Related

How to implement a channel and multiple readers that read the same data at the same time?

I need several functions to have the same channel as a parameter and take the same data, simultaneously.
Each of these functions has an independent task from each other, but they start from the same value.
For example, given a slice of integers, one function calculates the sum of its values ​​and another calculates the average, at the same time. They would be goroutines.
One solution would be to create multiple channels from one value, but I want to avoid that. I might have to add or remove functions and for this, I would have to add or remove channels.
I think I understand that the Fan Out pattern could be an option, but I can't quite understand its implementation.
The question is against the rules of SO—as it does not present any concrete problem to be helped with but rather requests a tutoring session.
Anyway, two pointers for further research: basically—given the property of channel that each receive consumes a value sent to it, so it's impossible to read a once sent value multiple times,—such problems have two approaches to their solutions.
The first approach, which is what called a "fan-out", is to have all the consumers have a "personal" dedicated channel, copy the value to be broadcast as many times as there are consumers and send each copy to each of those dedicated channels.
The ostensibly most natural way to implement this is to have a single channel to which the producer sends its units of work—not caring how much consumers are to read them—and then have a dedicated goroutine receive those units of work, copy each of them and send the copies out to the dedicated channels of the consumers.
The second approach is to go lower level and implement basically the same scheme using stuff from the sync package.
One can think of the following scheme:
Have a custom struct type which has a sync.Mutex protecting the type's state.
Have a field which keeps the value multiple consumers have to read.
Have a counter in that type.
Have a sync.Cond in that type as well.
Have a channel with capacity there 1 as well.
Communicating a new value to the consumers looks like this:
Lock the mutex.
Verify the counter is 0, panic otherwise.
Write the new value into the respective field.
Set the counter to the number of consumers.
Unlock the mutex.
Pulse the sync.Cond.
The consumers are supposed to sleep in a wait call on that sync.Cond.
Once the sender pulses it, the goroutines running the code of consumers get woken up and try to read the value.
Reading of the value rolls like this:
Lock the mutex.
Verify the counter is greater than zero, panic otherwise.
Read the value.
Decrement the counter by one.
If the counter becomes 0, send on that special channel.
Unlock the mutex.
The channel is needed to communicate to the sender that all the consumers are done with their reads: before attempting to send the new value the consumer has to read from that channel.
As you can probably see, the second approach is way more involved and hard to get right, so I'd recommend to go with the first one.
I would also note that you seem to lack certain background knowledge on how to go around implementing concurrently running and communicating tasks.
I hereby recommend reading The Book and at least these chapters of The Blog:
Go Concurrency Patterns: Pipelines and cancellation.
Go Concurrency Patterns: Timing out, moving on
Advanced Go Concurrency Patterns

goroutine split strategy

In a real-time project, I have a standard event based front-end service received messages from and message queue broker. This service is designed to provide few others with relevant information.
Basically, this service loops on reception method, unmarshal the packet (protobuf for instance), update/control few parameters, marshal into another format (JSON for instance), then push to the next service.
The questions is: how to fire goroutine for most efficient overal bandwidth, with priority to incoming data?
Today, my point of view is that the most consuming operation is the unmarshalling/marshalling process. Thus, I would fire goroutine like this, after the reception of any events (that doesn't require ACK):
[...]
var rcvBuffer []byte
for {
err := evt.Receive(ctx, rcvBuffer)
go convertAndPush(rcvBuffer)
}
[...]
func convertAndPush(rcvBuffer []byte) {
// unmarshall rcvBuffer
// control
// marshall to JSONpack
JSONpack := json.Marshal(rcvBuffer)
// fan out to another goroutine communicating with other services...
pushch <- JSONpack
}
My focus is on receiving packets, not blocking CPU because of IO requests (either in- or out- ones), and not blocking IO because of CPU most costly operations of my application.
Is that way to design the application correct? What could I be missing?
btw, Message reception function doesn't trigger goroutine.
+1 with Paul Hankin's approach.
In my experiences it's also best practice to bound concurrency of convertAndPush. This can trivially be done using a semaphore or channels and a worker pool pattern. This also provides a variable to toggle during load tests (worker pool size).
If the goal is to receive packets as fast as possible a channel/worker pool pattern might be better because it can be configured with a channel size to buffer requests and keep the read path low throughput. Compared to a semaphore which would block the read path when it is at capacity.

C++ IRC Client design

I'm attempting to write an RFC 2812 compliant C++ IRC library.
I am having some trouble with the design of the client itself.
From what I have read IRC communication tends to be asynchronous.
I am using boost::asio::async_read and boost::asio::async_write.
From reading the documentation I have gathered that you cannot perform multiple async_write requests before one is completed. You therefore end up with rather nested callbacks. Doesn't this defeat the purpose of doing async calls? Wouldn't it just be better to use synchronous calls to prevent the nesting? If not, why?
Secondly, if I am not mistaken, each boost::asio::async_write should be followed up by a boost::asio::async_read to receive the server's response to the commands sent. My client's functions, therefore, would need to take a callback parameter so a user of the class may do something after the client receives a response (ex. send another message...).
If I were to continue implementing this with async, should I keep a std::deque<std::tuple<message, callback>> and each time a boost::asio::async_write is finished, and there is a tuple in the queue, dequeue and send the message then raise the callback? Would this be the optimal way to implement this system?
I'm thinking since messages are sent all the time I'm going to have to implement some kind of listener loop that queues up responses, but how would you associate these responses with the specific command that triggered them? Or in the case that the response is just a message to the channel from another user?
The IRC protocol is a full-duplex protocol. As such, one should always be listening to the server connection expecting commands to process. It could be argued that one should primarily use the messages received from the server to update state, rather than correlating request and responses, as the server may not respond to a command or may respond much later than expected. For example, one may issue a WHOIS command, but receive multiple PRIVMSG commands before receiving a response to WHOIS. For a chat client, a user would likely expect being able to receive chat messages while waiting for a response to WHOIS. Hence, having a async_write() to async_read() call chain may not be ideal in handling the protocol.
For a given socket, the Asio documentation does recommend not initiating additional read operations if there is an outstanding composed read operation and not initiating additional write operations if there is an outstanding composed write operation. Queuing up messages and having an asynchronous call chains process from the queue is a great way to fulfill this recommendation. Consider reading this answer for a nice solution using a queue and an asynchronous call chain.
Also, be aware that the server may send a PING command even on an active connection. When the client is responding with a PONG command, it may be necessary to insert the PONG command near the front of the outbound queue so that it gets sent out as soon as possible.
Doesn't this defeat the purpose of doing async calls?
The usual solution is to use strands:
Why do I need strand per connection when using boost::asio?
You are free to queue multiple asynchronous operations on the same io objects using an (implicit) strand¹.
Using a strand ensures that the completion handlers are invoked on that same logical thread.
On the Protocol
You could indeed keep a queue of commands and await responses for each command before sending the next.
You might be a little bit smarter about this if you can spot the correlation due the different type of reply, but then you'd need to keep queues per type of command. I'd consider that premature optimization.

Spring Integration message processing partitioned by header information

I want to be able to process messages with Spring Integration in parallel. The messages come from multiple devices and we need to process messages from the same device in sequential order but the devices can be processed in multiple threads. There can be thousands of devices so I'm trying to figure out how to assign processor based on mod of the device ID using Spring Integration's semantics as much as possible. What approach should I be looking at?
It's difficult to generalize without knowing other requirements (transaction semantics etc) but probably the simplest approach would be a router sending messages to a number of QueueChannels using some kind of hash algorithm on the device id (so all messages for a particular device go to the same channel).
Then, have a single-threaded poller pulling messages from each queue.
EDIT: (response to comment)
Again, difficult to generalize, but...
See AbstractMessageRouter.determineTargetChannels() - a router actually returns a physical channel object (actually a list, but in most cases a list of 1). So, yes, you can create the QueueChannels programmatically and have the router return the appropriate one, based on the message.
Assuming you want all the messages to then be handled by the same downstream flow, you would also need to create a <bridge/> for each queue channel to bridge it to the input channel of the next component in the flow.
create a QueueChannel
create a BridgeHandler (set the outputChannel to the input channel of the next component)
create a PollingConsumer (constructor takes the channel and handler; set trigger etc)
start() the consumer.
All of this can be done in your custom router initialization and implement determineTargetChannels() to select the queue.
Depending on the processing time for your events, I would generally recommend running the downstream flow on the poller thread rather than setting a taskExecutor to avoid issues with the next poll trying to schedule another task before this one's done. You might need to increase the default taskScheduler's pool size.

boost::asio sending data faster than receiving over TCP. Or how to disable buffering

I have created a client/server program, the client starts
an instance of Writer class and the server starts an instance of
Reader class. Writer will then write a DATA_SIZE bytes of data
asynchronously to the Reader every USLEEP mili seconds.
Every successive async_write request by the Writer is done
only if the "on write" handler from the previous request had
been called.
The problem is, If the Writer (client) is writing more data into the
socket than the Reader (server) is capable of receiving this seems
to be the behaviour:
Writer will start writing into (I think) system buffer and even
though the data had not yet been received by the Reader it will be
calling the "on write" handler without an error.
When the buffer is full, boost::asio won't fire the "on write"
handler anymore, untill the buffer gets smaller.
In the meanwhile, the Reader is still receiving small chunks
of data.
The fact that the Reader keeps receiving bytes after I close
the Writer program seems to prove this theory correct.
What I need to achieve is to prevent this buffering because the
data need to be "real time" (as much as possible).
I'm guessing I need to use some combination of the socket options that
asio offers, like the no_delay or send_buffer_size, but I'm just guessing
here as I haven't had success experimenting with these.
I think that the first solution that one can think of is to use
UDP instead of TCP. This will be the case as I'll need to switch to
UDP for other reasons as well in the near future, but I would
first like to find out how to do it with TCP just for the sake
of having it straight in my head in case I'll have a similar
problem some other day in the future.
NOTE1: Before I started experimenting with asynchronous operations in asio library I had implemented this same scenario using threads, locks and asio::sockets and did not experience such buffering at that time. I had to switch to the asynchronous API because asio does not seem to allow timed interruptions of synchronous calls.
NOTE2: Here is a working example that demonstrates the problem: http://pastie.org/3122025
EDIT: I've done one more test, in my NOTE1 I mentioned that when I was using asio::iosockets I did not experience this buffering. So I wanted to be sure and created this test: http://pastie.org/3125452 It turns out that the buffering is there event with asio::iosockets, so there must have been something else that caused it to go smoothly, possibly lower FPS.
TCP/IP is definitely geared for maximizing throughput as intention of most network applications is to transfer data between hosts. In such scenarios it is expected that a transfer of N bytes will take T seconds and clearly it doesn't matter if receiver is a little slow to process data. In fact, as you noticed TCP/IP protocol implements the sliding window which allows the sender to buffer some data so that it is always ready to be sent but leaves the ultimate throttling control up to the receiver. Receiver can go full speed, pace itself or even pause transmission.
If you don't need throughput and instead want to guarantee that the data your sender is transmitting is as close to real time as possible, then what you need is to make sure the sender doesn't write the next packet until he receives an acknowledgement from the receiver that it has processed the previous data packet. So instead of blindly sending packet after packet until you are blocked, define a message structure for control messages to be sent back from the receiver back to the sender.
Obviously with this approach, your trade off is that each sent packet is closer to real-time of the sender but you are limiting how much data you can transfer while slightly increasing total bandwidth used by your protocol (i.e. additional control messages). Also keep in mind that "close to real-time" is relative because you will still face delays in the network as well as ability of the receiver to process data. So you might also take a look at the design constraints of your specific application to determine how "close" do you really need to be.
If you need to be very close, but at the same time you don't care if packets are lost because old packet data is superseded by new data, then UDP/IP might be a better alternative. However, a) if you have reliable deliver requirements, you might ends up reinventing a portion of tcp/ip's wheel and b) keep in mind that certain networks (corporate firewalls) tend to block UDP/IP while allowing TCP/IP traffic and c) even UDP/IP won't be exact real-time.

Resources