How to implement a channel and multiple readers that read the same data at the same time? - go

I need several functions to have the same channel as a parameter and take the same data, simultaneously.
Each of these functions has an independent task from each other, but they start from the same value.
For example, given a slice of integers, one function calculates the sum of its values and another calculates the average, at the same time. They would be goroutines.
One solution would be to create multiple channels for the one value, but I want to avoid that: I might have to add or remove functions, and for each such change I would have to add or remove channels.
I think I understand that the Fan Out pattern could be an option, but I can't quite understand its implementation.

The question is against the rules of SO, as it does not present any concrete problem to be helped with but rather requests a tutoring session.
Anyway, two pointers for further research. Each receive consumes a value sent to a channel, so it is impossible to read a value multiple times once it has been sent; given that property of channels, such problems have two approaches to their solutions.
The first approach, which is what is called "fan-out", is to give every consumer a "personal" dedicated channel, copy the value to be broadcast as many times as there are consumers, and send each copy to each of those dedicated channels.
The ostensibly most natural way to implement this is to have a single channel to which the producer sends its units of work, not caring how many consumers are to read them, and then have a dedicated goroutine receive those units of work, copy each of them, and send the copies out to the dedicated channels of the consumers.
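A minimal sketch of that dedicated-goroutine approach, assuming two consumers that compute a sum and a count (all names here are illustrative, not from the question):

package main

import "fmt"

// broadcast receives values from src and sends a copy of each value
// to every channel in dests, so every consumer sees every value.
func broadcast(src <-chan int, dests []chan int) {
	for v := range src {
		for _, d := range dests {
			d <- v // blocks until this particular consumer is ready
		}
	}
	for _, d := range dests {
		close(d) // propagate end-of-stream to all consumers
	}
}

func main() {
	src := make(chan int)
	sumCh := make(chan int)   // dedicated channel of consumer 1
	countCh := make(chan int) // dedicated channel of consumer 2
	done := make(chan struct{})

	go broadcast(src, []chan int{sumCh, countCh})

	go func() { // consumer 1: sums the values
		sum := 0
		for v := range sumCh {
			sum += v
		}
		fmt.Println("sum:", sum)
		done <- struct{}{}
	}()

	go func() { // consumer 2: counts the values
		n := 0
		for range countCh {
			n++
		}
		fmt.Println("count:", n)
		done <- struct{}{}
	}()

	for _, v := range []int{1, 2, 3, 4} {
		src <- v
	}
	close(src)
	<-done
	<-done
}

Note that broadcast sends the copies sequentially, so one slow consumer delays the others; buffering the dedicated channels relaxes that at the cost of memory.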
The second approach is to go lower level and implement basically the same scheme using stuff from the sync package.
One can think of the following scheme:
Have a custom struct type which has a sync.Mutex protecting the type's state.
Have a field which keeps the value multiple consumers have to read.
Have a counter in that type.
Have a sync.Cond in that type as well.
Have a channel with capacity 1 in that type as well.
Communicating a new value to the consumers looks like this:
Lock the mutex.
Verify the counter is 0, panic otherwise.
Write the new value into the respective field.
Set the counter to the number of consumers.
Unlock the mutex.
Broadcast on the sync.Cond (Broadcast, not Signal, so that all waiters get woken up).
The consumers are supposed to sleep in a Wait call on that sync.Cond.
Once the sender broadcasts on it, the goroutines running the code of the consumers get woken up and try to read the value.
Reading of the value rolls like this:
Lock the mutex.
Verify the counter is greater than zero, panic otherwise.
Read the value.
Decrement the counter by one.
If the counter becomes 0, send on that special channel.
Unlock the mutex.
The channel is needed to communicate to the sender that all the consumers are done with their reads: before attempting to send the next value, the sender has to receive from that channel.
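A sketch of the whole scheme, following the steps above. The names are mine, and I have added a generation counter, which the steps do not mention but which is needed so that a fast consumer cannot read the same value twice:

package main

import (
	"fmt"
	"sync"
)

// broadcaster lets a single sender publish a value that every one of a
// fixed number of consumers must read exactly once before the next send.
type broadcaster struct {
	mu         sync.Mutex
	cond       *sync.Cond
	value      int
	counter    int           // consumers that still have to read the current value
	nconsumers int
	gen        int           // generation number; lets consumers detect a new value
	drained    chan struct{} // capacity 1: signals "all consumers have read"
}

func newBroadcaster(nconsumers int) *broadcaster {
	b := &broadcaster{nconsumers: nconsumers, drained: make(chan struct{}, 1)}
	b.cond = sync.NewCond(&b.mu)
	return b
}

// send publishes v. It must not be called again before the previous
// value has been fully consumed (the sender waits on b.drained for that).
func (b *broadcaster) send(v int) {
	b.mu.Lock()
	if b.counter != 0 {
		panic("send: previous value not fully consumed")
	}
	b.value = v
	b.counter = b.nconsumers
	b.gen++
	b.mu.Unlock()
	b.cond.Broadcast() // wake all sleeping consumers
}

// recv sleeps until a value newer than lastGen is available, reads it,
// and, if it is the last consumer to do so, signals the sender.
func (b *broadcaster) recv(lastGen int) (v, gen int) {
	b.mu.Lock()
	for b.gen == lastGen {
		b.cond.Wait()
	}
	if b.counter == 0 {
		panic("recv: no reads left for this value")
	}
	v, gen = b.value, b.gen
	b.counter--
	if b.counter == 0 {
		b.drained <- struct{}{} // all reads done; the sender may proceed
	}
	b.mu.Unlock()
	return v, gen
}

func main() {
	const consumers = 2
	b := newBroadcaster(consumers)
	var wg sync.WaitGroup
	for i := 0; i < consumers; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			gen := 0
			for j := 0; j < 3; j++ {
				var v int
				v, gen = b.recv(gen)
				fmt.Printf("consumer %d read %d\n", id, v)
			}
		}(i)
	}
	for _, v := range []int{10, 20, 30} {
		b.send(v)
		<-b.drained // wait until every consumer has read v
	}
	wg.Wait()
}

Run it and each of the two consumers prints 10, 20 and 30 exactly once.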
As you can probably see, the second approach is way more involved and hard to get right, so I'd recommend going with the first one.
I would also note that you seem to lack certain background knowledge on how to go around implementing concurrently running and communicating tasks.
I hereby recommend reading The Book and at least these chapters of The Blog:
Go Concurrency Patterns: Pipelines and cancellation
Go Concurrency Patterns: Timing out, moving on
Advanced Go Concurrency Patterns

Related

Golang: using multiple ticker cases in a single select blocks the entire loop

I have a requirement where I need to do multiple things (irrelevant here) at some regular intervals. I achieved it using the code block mentioned below:
func (processor *Processor) process() {
	defaultTicker := time.NewTicker(time.Second * 2)
	updateTicker := time.NewTicker(time.Second * 5)
	heartbeatTicker := time.NewTicker(time.Second * 5)
	timeoutTicker := time.NewTicker(30 * time.Second)
	refreshTicker := time.NewTicker(2 * time.Minute)
	defer func() {
		logger.Info("processor for ", processor.id, " exited")
		defaultTicker.Stop()
		timeoutTicker.Stop()
		updateTicker.Stop()
		refreshTicker.Stop()
		heartbeatTicker.Stop()
	}()
	for {
		select {
		case <-defaultTicker.C:
			// spawn some go routines
		case <-updateTicker.C:
			// do something
		case <-timeoutTicker.C:
			// do something else
		case <-refreshTicker.C:
			// log
		case <-heartbeatTicker.C:
			// push metrics to redis
		}
	}
}
But I noticed that every once in a while, my for-select loop gets stuck somewhere, and I cannot seem to find where or why. By "stuck" I mean I stop receiving the refresh ticker logs, but it starts working again normally after some time (5-10 minutes).
I have made sure that all operations within each ticker case complete within a very small amount of time (~0 ms, checked by adding logs).
My questions:
Is using multiple tickers in a single select a good/normal practice? (Honestly, I did not find many examples using multiple tickers online.)
Is anyone aware of any known issues/pitfalls where tickers can block the loop for a longer duration?
Any help is appreciated. Thanks
Go does not provide any smart draining behavior for multiple channels, e.g., that older messages in one channel would get processed earlier than more recent messages in other channels. Anytime the loop enters the select statement, a random ready channel is chosen.
Also see this answer and read the part about GOMAXPROCS=1. This could be related to your issue. The issue could also be in your logging package. Maybe the logs are just delayed.
In general, I think the issue must be in your case statements. Either you have a blocking function or some dysfunctional code. (Note: confirmed by the OP)
But to answer your questions:
1. Is using multiple tickers in single select a good/normal practice?
It is common to read from multiple channels randomly in a blocking way, one message at a time, e.g., to sort incoming data from multiple channels into a slice or map and avoid concurrent data access.
It is also common to add one or more tickers, e.g., to flush data and for logging or reporting. Usually the non-ticker code paths will do most of the work.
In your case, you use tickers that run code paths that are meant to block each other, which is a very specific use case but may be required in some scenarios. This is uncommon, but not bad practice, I think.
As the commenters suggested, you could also schedule different recurring tasks in separate goroutines.
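For example, a sketch of that alternative; the task bodies here are placeholders:

package main

import (
	"fmt"
	"time"
)

// runEvery runs task at the given interval in its own goroutine until
// stop is closed; a slow task then only delays its own next tick.
func runEvery(interval time.Duration, stop <-chan struct{}, task func()) {
	go func() {
		t := time.NewTicker(interval)
		defer t.Stop()
		for {
			select {
			case <-t.C:
				task()
			case <-stop:
				return
			}
		}
	}()
}

func main() {
	stop := make(chan struct{})
	runEvery(200*time.Millisecond, stop, func() { fmt.Println("default work") })
	runEvery(500*time.Millisecond, stop, func() { fmt.Println("heartbeat") })
	time.Sleep(2 * time.Second)
	close(stop) // stops all of them
}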
2. Is anyone aware of any known issues/pitfalls where tickers can block the loop for longer duration?
The tickers themselves will not block the loop in any hidden way. The fastest ticker will always ensure that the loop is looping at least at the speed of that ticker.
Note that the docs of time.NewTicker say:
The ticker will adjust the time interval or drop ticks to make up for slow receivers.
This just means that, internally, no new ticks are scheduled until you have consumed the last one from the single-element ticker channel.
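A small sketch that makes the drop behavior visible (the intervals are arbitrary): with a 100 ms ticker and a receiver that takes 250 ms per iteration, ticks arrive roughly every 250 ms instead of piling up.

package main

import (
	"fmt"
	"time"
)

func main() {
	t := time.NewTicker(100 * time.Millisecond)
	defer t.Stop()
	start := time.Now()
	for i := 0; i < 5; i++ {
		<-t.C
		fmt.Println("tick after", time.Since(start).Round(10*time.Millisecond))
		time.Sleep(250 * time.Millisecond) // slow receiver: only one tick is buffered, the rest are dropped
	}
}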
In your example, the main pitfall is that any code in the case statements will block and thus delay the other cases.
If this is intended, everything is fine.
There may be other pitfalls if you have microsecond or nanosecond tickers, where you may see some measurable runtime overhead, or if you have hundreds of tickers and case blocks. But then you should have chosen another scheduling pattern from the beginning.

Create channels with extra flags in an idiomatic way

TL;DR I want to have the functionality where a channel has two extra fields that tell the producer whether it is allowed to send to the channel and if so tell the producer what value the consumer expects. Although I know how to do it with shared memory, I believe that this approach goes against Go's ideology of "Do not communicate by sharing memory; instead, share memory by communicating."
Context:
I wish to have a server S that runs (besides others) three goroutines:
Listener that just receives UDP packets and sends them to the demultiplexer.
Demultiplexer that takes network packets and, based on some data, sends them into one of several channels.
Processing task which listens to one specific channel and processes data received on that channel.
To check whether some devices on the network are still alive, the processing task will periodically send out nonces over the network and then wait for k seconds. In those k seconds, other participants of my protocol that received the nonce will send a reply containing (besides other information) the nonce. The demultiplexer will receive the packets from the listener, parse them, and send them to the processing_channel. After the k seconds have elapsed, the processing task processes the messages pushed onto the processing_channel by the demultiplexer.
I want the demultiplexer to not just blindly send any response (of the correct type) it received onto the processing_channel, but to instead check whether the processing task is currently even expecting any messages and, if so, which nonce value it expects. I made this design decision in order to drop unwanted packets as soon as possible.
My approach:
In other languages, I would have a class with the following fields (in pseudocode):
class ActivatedChannel {
	boolean flag_expecting_nonce;
	int expected_nonce;
	LinkedList chan;
}
The demultiplexer would then, upon receiving a packet of the correct type, simply acquire the lock for the ActivatedChannel processing_channel object, check whether the flag is set and the nonce matches, and if so add the message to the LinkedList chan!
Problem:
This approach makes use of locks and shared memory, which does not align with Golang's "Do not communicate by sharing memory; instead, share memory by communicating" mantra. Hence, I would like to know... :
... whether my approach is "bad" regarding Go in the sense that it relies on shared memory.
... how to achieve the outlined result in a more Go-like way.
Yes, the approach described by you doesn't align with Golang's idiomatic way of implementation, and you have rightly pointed out that in that approach you are communicating by sharing memory.
To achieve this in Go's idiomatic way, one of the approaches could be that your Demultiplexer "remembers" all the processing_channels that are expecting a nonce and the corresponding value of the nonce. Whenever a processing_channel is ready to receive a reply, it sends a signal to the Demultiplexer saying that it is expecting a reply.
Since the Demultiplexer is at the center of all the communication, it can maintain a mapping between a processing_channel and the corresponding nonce it expects. It can also maintain a "registry" of all the processing_channels which are expecting a reply.
In this approach, we are sharing memory by communicating.
For communicating that a processing_channel is expecting a reply, the following struct can be used:
type ChannelState struct {
	ChannelId        string // unique identifier for the processing channel
	IsExpectingNonce bool
	ExpectedNonce    int
}
In this approach, there is no lock used.
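To make that concrete, here is a sketch of how the Demultiplexer could consume those ChannelState messages and filter packets. The Packet shape and the parameter names are made up for the example; ChannelState is the struct defined above:

// Packet is a simplified stand-in for a parsed network packet.
type Packet struct {
	ChannelId string
	Nonce     int
	Payload   []byte
}

// demultiplex owns the expectation registry. Nobody else touches it,
// so no lock is needed: all state changes arrive as messages.
func demultiplex(
	packets <-chan Packet, // parsed packets from the listener
	states <-chan ChannelState, // processing tasks announce expectations here
	processing map[string]chan<- Packet, // one channel per processing task
) {
	expecting := make(map[string]ChannelState) // ChannelId -> current expectation
	for {
		select {
		case s := <-states:
			expecting[s.ChannelId] = s
		case p := <-packets:
			s, ok := expecting[p.ChannelId]
			if !ok || !s.IsExpectingNonce || s.ExpectedNonce != p.Nonce {
				continue // drop unwanted packets as soon as possible
			}
			processing[p.ChannelId] <- p
		}
	}
}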

How to find out destination of a golang channel

I am taking over maintenance of a multi-file golang program and now trying to understand the code flow. One feature of golang is the use of channels for sending values to another part of the code base. This feature can make tracing and understanding the code flow difficult, as the execution will resume at the receiving end of the channel, which may well be in a different file and may have a different name.
When reading through the code, I can see where data is being sent to a channel, but I do not see an intuitive or easy way to figure out where it is being received.
Is there a way in golang to find out where (as in filename:linenum) data sent through a channel is received?
No, because multiple places can receive from the same channel, and multiple instances of the same function can be receiving from different channels. Your best bet is to follow the channel itself around - look at where it's created, then what it gets passed to, and find what is receiving from it that way.

Sharing a slice across a few different goroutines

Given that I have a slice of structs of type User
Users := make([]User, 0)
I am listening for TCP connections, and when a user connects, I'm adding a new user to this slice.
The way I've done this, is by setting up a NewUsers channel
NewUsers := make(chan User)
Upon a new TCP connection, a User gets sent to this channel, and a central function waits for a User to arrive and adds it to the Users slice.
But now I would like multiple subsystems (packages/functions) to use this list of Users. One function might simply want to receive a list of users, while a different function might want to broadcast messages to every user, or just to users matching a certain condition.
How do multiple functions (which are possibly executed from different goroutines) safely access the list of users? I see two possible ways:
Every subsystem that needs access to this list needs their own AddUser channel and maintain their own slice of users and something needs to broadcast new users to every one of these channels.
Block access with a Mutex
Option 1 seems very convoluted and would generate a fair bit of duplication, but my understanding is that mutexes are best avoided if you try to stick to the "Share Memory By Communicating" mantra.
The idiomatic Go way to share data between concurrent activities is summed up in this:
Do not communicate by sharing memory; instead, share memory by communicating.
Andrew Gerrand blogged about this, for example.
It need not be overly complex; you can think of designing internal microservices, expressed using goroutines with channels.
In your case, this probably means designing a service element to contain the master copy of the list of users.
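For instance, a minimal sketch of such a service element; the request shapes are illustrative, and the User type is the one from the question:

// userService owns the master copy of the users slice. Other goroutines
// interact with it only through channels, so no mutex is needed.
func userService(newUsers <-chan User, listRequests <-chan chan []User) {
	var users []User
	for {
		select {
		case u := <-newUsers:
			users = append(users, u)
		case reply := <-listRequests:
			// Reply with a copy so callers cannot mutate the master copy.
			snapshot := make([]User, len(users))
			copy(snapshot, users)
			reply <- snapshot
		}
	}
}

A caller that wants the current list sends its own reply channel:

reply := make(chan []User)
listRequests <- reply
users := <-reply

A broadcast-to-matching-users operation would become just another request channel handled in the same select.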
The main advantages of the Go/CSP strategy are that
concurrency is a design choice, along with the other aspects of your design
only local knowledge is needed to understand the concurrent behaviour: this arises because a goroutine can itself consist of internal goroutines, and this applies all the way down if needed. Understanding the external behaviour of the higher-level goroutines depends only on their interfaces, not on the hidden internals.
But...
There are times when a safely shared data structure (protected by mutexes) will be sufficient now and always. It might then be argued that the extra complexity of goroutines and channels is a non-requirement.
A safely shared list data structure is something you will find several people have provided as open-source APIs. (I have one myself - see the built-ins in runtemplate).
The mutex approach is the best, safest, and most manageable approach to this problem, and it is also the fastest.
Channels are complex beasts on the inside and are much slower than an RWMutex-guarded map/slice.
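For comparison, a minimal sketch of that mutex-guarded approach, again assuming the User type from the question:

import "sync"

// UserList is a slice of users guarded by an RWMutex: many readers may
// hold the lock at the same time, while writers get exclusive access.
type UserList struct {
	mu    sync.RWMutex
	users []User
}

func (l *UserList) Add(u User) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.users = append(l.users, u)
}

// Snapshot returns a copy, so callers can iterate without holding the lock.
func (l *UserList) Snapshot() []User {
	l.mu.RLock()
	defer l.mu.RUnlock()
	out := make([]User, len(l.users))
	copy(out, l.users)
	return out
}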

Spring Integration message processing partitioned by header information

I want to be able to process messages with Spring Integration in parallel. The messages come from multiple devices, and we need to process messages from the same device in sequential order, but the devices can be processed in multiple threads. There can be thousands of devices, so I'm trying to figure out how to assign a processor based on a mod of the device ID, using Spring Integration's semantics as much as possible. What approach should I be looking at?
It's difficult to generalize without knowing other requirements (transaction semantics etc) but probably the simplest approach would be a router sending messages to a number of QueueChannels using some kind of hash algorithm on the device id (so all messages for a particular device go to the same channel).
Then, have a single-threaded poller pulling messages from each queue.
EDIT: (response to comment)
Again, difficult to generalize, but...
See AbstractMessageRouter.determineTargetChannels() - a router actually returns a physical channel object (actually a list, but in most cases a list of 1). So, yes, you can create the QueueChannels programmatically and have the router return the appropriate one, based on the message.
Assuming you want all the messages to then be handled by the same downstream flow, you would also need to create a <bridge/> for each queue channel to bridge it to the input channel of the next component in the flow.
create a QueueChannel
create a BridgeHandler (set the outputChannel to the input channel of the next component)
create a PollingConsumer (constructor takes the channel and handler; set trigger etc)
start() the consumer.
All of this can be done in your custom router's initialization; then implement determineTargetChannels() to select the queue.
Depending on the processing time for your events, I would generally recommend running the downstream flow on the poller thread rather than setting a taskExecutor to avoid issues with the next poll trying to schedule another task before this one's done. You might need to increase the default taskScheduler's pool size.