Selecting data from a channel - go

type Reader struct {
sync.RWMutex
logger *zerolog.Logger
wg *sync.WaitGroup
ticker *time.Ticker
messages chan *TransactionData
}
type TransactionData struct {
ID string
messages []*Message
}
func NewReader(logger zerolog.Logger) *Reader {
return &Reader{
wg: &sync.WaitGroup{},
logger: &l,
ticker: time.NewTicker(1 * time.Second),
messages: make(chan *TransactionData),
}
}
func (r *Reader) Write(ID string, messages []*Message) error {
r.wg.Add(1)
....
go func(ID string, messages []*domain.Message) {
defer r.wg.Done()
r.add(&TransactionData{
chatID: ID,
messages: messages,
})
}(ID, messages)
return nil
}
func (r *Reader) add(data *TransactionData) {
r.Lock()
defer r.Unlock()
r.messages <- data
}
func (r *Reader) get() {
r.Lock()
defer func() {
r.Unlock()
}()
for {
select {
case r.ticker.C:
// how to select all the data from the channel?
}
}
}
I'm exploring channels right now. And with the help of channels I'm trying to solve one problem.
Task: user reads chat history, selects all messages that user read, put these messages into channel, second channel selects data and creates transaction, through goroutine I add data into channel using Write method. I want to create a second goroutine which every second (using ticker.C) will select all the data from the channel and form a transaction. But I can't figure out how I can select all the data from the channel that I've managed to add there, example:
user-1: Write(&TransactionData{...}),
user-2: Write(&TransactionData{...}),
....
user-n: Write(&TransactionData{...})
How do I select absolutely all data and do I need to select all data or do I need to select one at a time?
Brief description of the solution:
Read from the channel in an infinite loop using goroutine
Put the subtracted tasks into the sync.Map thread-safe place
Run a second goroutine with time.Ticker.C and when it is triggered, select data for the transaction.
First question: How do I select all the data from the channel so that I collect one transaction and can update more data, provided that, I don't use any keys?(I attached an outline of the read method)
Second question: Should I get ALL the data from the channel or should I get only one TransactionData at a time and form one transaction?

There are several problems with your approach and implementation.
First: you don't need to lock a mutex to send to a channel.
Second: you don't need to lock a mutex to receive from a channel.
As for the implementation: since the channel is unbuffered, channel send will only work when there is a listening goroutine. If you receive from a channel when a ticker hits, then all the sends will happen after that ticker hits. Send operations will block until then.
So, you should get rid of that ticker, and read from the channel in a for-loop. When all the writes are completed, you should close the channel, which will terminate the for-loop, so you can write. Closing the channel is the idiomatic way of signaling to the receiver that all the data is sent.

Related

Why does sending to channel is blocked when a panic occurs?

I am going to show a simple compilable code snipped where I get weird behaviour: after I intentionally cause a panic in processData (because I pass nil-pointer) the sending to channel l.done is blocked!
package main
import (
"context"
"fmt"
"time"
)
type Loop struct {
done chan struct{}
}
type Result struct {
ID string
}
func NewLoop() *Loop {
return &Loop{
done: make(chan struct{}),
}
}
func (l *Loop) StartAsync(ctx context.Context) {
go func() {
defer func() {
l.done <- struct{}{} // BLOCKED! But I allocated it in NewLoop ctor
fmt.Sprintf("done")
}()
for {
/**/
var data *Result
l.processData(ctx, data) // passed nil
}
}()
}
func (l *Loop) processData(ctx context.Context, data *Result) {
_ = fmt.Sprintf("%s", data.ID) // intentional panic - OK
}
func main() {
l := NewLoop()
l.StartAsync(context.Background())
time.Sleep(10 * time.Second)
}
I can recover() a panic before sending to channel and I get correct error message.
What does happen with channel? It it was closed, I would get panic on sending to closed channel
It's blocking because there isn't anything receiving from the channel. Both the receive & the send operations on an initialized and unbuffered channel will block indefinitely if the opposite operation is missing. I.e. a send to a channel will block until another goroutine receives from that channel, and, likewise, a receive from a channel will block until another goroutine sends to that channel.
So basically
done := new(chan struct{})
done<-struct{}{}
will block forever unless there is another goroutine that receives from the channel, i.e. <-done. That's how channels are supposed to behave. That's how the languages specification defines their behaviour.
about the possible fixes :
given the name of your channel, you may want to run close(l.done) rather than l.done <- struct{}{}.
using a buffered channel and l.done <- struct{}{} on completion : only one call to <-l.done will be unblocked.
Suppose you have some code looking like :
l := NewLoop()
go func(){
<-l.done
closeLoggers()
}()
go func(){
<-l.done
closeDatabase()
}()
sending one item on the done channel will make that only one consumer will receive it, and in the above example only one of the two actions will be triggered when the loop completes.
using close(l.done) : once a channel is closed all calls to receive from it will proceed.
In the above example, all actions will proceed.
As a side note: there is no need for a buffer if you use a channel only for its "open/closed" state.

Sending data from one goroutine to multiple other goroutines

In a project the program receives data via websocket. This data needs to be processed by n algorithms. The amount of algorithms can change dynamically.
My attempt is to create some pub/sub pattern where subscriptions can be started and canceled on the fly. Turns out that this is a bit more challenging than expected.
Here's what I came up with (which is based on https://eli.thegreenplace.net/2020/pubsub-using-channels-in-go/):
package pubsub
import (
"context"
"sync"
"time"
)
type Pubsub struct {
sync.RWMutex
subs []*Subsciption
closed bool
}
func New() *Pubsub {
ps := &Pubsub{}
ps.subs = []*Subsciption{}
return ps
}
func (ps *Pubsub) Publish(msg interface{}) {
ps.RLock()
defer ps.RUnlock()
if ps.closed {
return
}
for _, sub := range ps.subs {
// ISSUE1: These goroutines apparently do not exit properly...
go func(ch chan interface{}) {
ch <- msg
}(sub.Data)
}
}
func (ps *Pubsub) Subscribe() (context.Context, *Subsciption, error) {
ps.Lock()
defer ps.Unlock()
// prep channel
ctx, cancel := context.WithCancel(context.Background())
sub := &Subsciption{
Data: make(chan interface{}, 1),
cancel: cancel,
ps: ps,
}
// prep subsciption
ps.subs = append(ps.subs, sub)
return ctx, sub, nil
}
func (ps *Pubsub) unsubscribe(s *Subsciption) bool {
ps.Lock()
defer ps.Unlock()
found := false
index := 0
for i, sub := range ps.subs {
if sub == s {
index = i
found = true
}
}
if found {
s.cancel()
ps.subs[index] = ps.subs[len(ps.subs)-1]
ps.subs = ps.subs[:len(ps.subs)-1]
// ISSUE2: close the channel async with a delay to ensure
// nothing will be written to the channel anymore
// via a pending goroutine from Publish()
go func(ch chan interface{}) {
time.Sleep(500 * time.Millisecond)
close(ch)
}(s.Data)
}
return found
}
func (ps *Pubsub) Close() {
ps.Lock()
defer ps.Unlock()
if !ps.closed {
ps.closed = true
for _, sub := range ps.subs {
sub.cancel()
// ISSUE2: close the channel async with a delay to ensure
// nothing will be written to the channel anymore
// via a pending goroutine from Publish()
go func(ch chan interface{}) {
time.Sleep(500 * time.Millisecond)
close(ch)
}(sub.Data)
}
}
}
type Subsciption struct {
Data chan interface{}
cancel func()
ps *Pubsub
}
func (s *Subsciption) Unsubscribe() {
s.ps.unsubscribe(s)
}
As mentioned in the comments there are (at least) two issues with this:
ISSUE1:
After a while of running in implementation of this I get a few errors of this kind:
goroutine 120624 [runnable]:
bm/internal/pubsub.(*Pubsub).Publish.func1(0x8586c0, 0xc00b44e880, 0xc008617740)
/home/X/Projects/bm/internal/pubsub/pubsub.go:30
created by bookmaker/internal/pubsub.(*Pubsub).Publish
/home/X/Projects/bm/internal/pubsub/pubsub.go:30 +0xbb
Without really understanding this it appears to me that the goroutines created in Publish() do accumulate/leak. Is this correct and what am I doing wrong here?
ISSUE2:
When I end a subscription via Unsubscribe() it occurs that Publish() tried to write to a closed channel and panics. To mitigate this I have created a goroutine to close the channel with a delay. This feel really off-best-practice but I was not able to find a proper solution to this. What would be a deterministic way to do this?
Heres a little playground for you to test with: https://play.golang.org/p/K-L8vLjt7_9
Before diving into your solution and its issues, let me recommend again another Broker approach presented in this answer: How to broadcast message using channel
Now on to your solution.
Whenever you launch a goroutine, always think of how it will end and make sure it does if the goroutine is not ought to run for the lifetime of your app.
// ISSUE1: These goroutines apparently do not exit properly...
go func(ch chan interface{}) {
ch <- msg
}(sub.Data)
This goroutine tries to send a value on ch. This may be a blocking operation: it will block if ch's buffer is full and there is no ready receiver on ch. This is out of the control of the launched goroutine, and also out of the control of the pubsub package. This may be fine in some cases, but this already places a burden on the users of the package. Try to avoid these. Try to create APIs that are easy to use and hard to misuse.
Also, launching a goroutine just to send a value on a channel is a waste of resources (goroutines are cheap and light, but you shouldn't spam them whenever you can).
You do it because you don't want to get blocked. To avoid blocking, you may use a buffered channel with a "reasonable" high buffer. Yes, this doesn't solve the blocking issue, in only helps with "slow" clients receiving from the channel.
To "truly" avoid blocking without launching a goroutine, you may use non-blocking send:
select {
case ch <- msg:
default:
// ch's buffer is full, we cannot deliver now
}
If send on ch can proceed, it will happen. If not, the default branch is chosen immediately. You have to decide what to do then. Is it acceptable to "lose" a message? Is it acceptable to wait for some time until "giving up"? Or is it acceptable to launch a goroutine to do this (but then you'll be back at what we're trying to fix here)? Or is it acceptable to get blocked until the client can receive from the channel...
Choosing a reasonable high buffer, if you encounter a situation when it still gets full, it may be acceptable to block until the client can advance and receive from the message. If it can't, then your whole app might be in an unacceptable state, and it might be acceptable to "hang" or "crash".
// ISSUE2: close the channel async with a delay to ensure
// nothing will be written to the channel anymore
// via a pending goroutine from Publish()
go func(ch chan interface{}) {
time.Sleep(500 * time.Millisecond)
close(ch)
}(s.Data)
Closing a channel is a signal to the receiver(s) that no more values will be sent on the channel. So always it should be the sender's job (and responsibility) to close the channel. Launching a goroutine to close the channel, you "hand" that job and responsibility to another "entity" (a goroutine) that will not be synchronized to the sender. This may easily end up in a panic (sending on a closed channel is a runtime panic, for other axioms see How does a non initialized channel behave?). Don't do that.
Yes, this was necessary because you launched goroutines to send. If you don't do that, then you may close "in-place", without launching a goroutine, because then the sender and closer will be the same entity: the Pubsub itself, whose sending and closing operations are protected by a mutex. So solving the first issue solves the second naturally.
In general if there are multiple senders for a channel, then closing the channel must be coordinated. There must be a single entity (often not any of the senders) that waits for all senders to finish, practically using a sync.WaitGroup, and then that single entity can close the channel, safely. See Closing channel of unknown length.

Is it OK to read len(ch) messages from channel ch?

I'm running single goroutine to handle messages channel related to some user. After processing messages the user state is updated and stored in database by the goroutine. While request to database is in progress a number of messages can be sent to the channel. I would like to process them all before sending another request to the database.
Currently I'm using len(ch) to check number of messages in the channel and reading them in a for loop.
func (c *consumer) handleUser(userID string, ch chan Message) {
user := c.db.LoadUser(userID)
for {
var msgs []Message
for n := len(ch); n > 0; n-- {
msgs = append(msgs, <-ch)
}
apply.Messages(user, msgs)
c.db.SaveUser(user)
}
ch := make(chan Message, 100)
go c.handleUser("user-1", ch)
I was searching in the internet if this is some common pattern but I couldn't find similar solutions and I'm wondering if my approach is valid/idiomatic for go programs.
Your solution would cause the spawned goroutine to spin over the channel until at least one message is sent. In other words, the goroutine never blocks at all.
Here you are trying to process multiple messages in one batch. There are different ways to implement that. But the main question to answer is: how do you know that the batch of messages is complete? The sender goroutine might have this knowledge and it can pack all the messages in one slice. On the other hand, you might not know when the batch is ready. In those cases, you need to use a timeout, like the following example.
func (c *consumer) handleUser(userID string, ch chan Message) {
user := c.db.LoadUser(userID)
for {
var msgs []Message
select {
case msg := <-ch:
//Append the message in the current batch slice
msgs = append(msgs, msg)
//Wait up to 5 seconds and then process the batch
case <-time.After(time.Second * 5):
//Timeout: process the batch of messages
if len(msgs) > 0 {
apply.Messages(user, msgs)
c.db.SaveUser(user)
}
}
}
}
Note that a possible goroutine executing this function runs only when there is actually something to do.

How do I Efficiently Chain Channels Together?

I'm trying to create message hub in golang. Messages are getting through different channels that persist in map[uint32]chan []float64. I do an endless loop over map and check if a channel has a message. If it has, I write it to the common client's write channel together with an incoming channel's id. It works fine, but uses all CPU, and other processes are throttled.
UPD: Items in map adding and removing dynamically by another function.
I thinking to limit CPU for this app through Docker, but maybe there is more elegant path?
My code :
func (c *Client) accumHandler() {
for !c.stop {
c.channels.Range(func(key, value interface{}) bool {
select {
case message := <-value.(chan []float64):
mess := map[uint32]interface{}{key.(uint32): message}
select {
case c.send <- mess:
}
default:
}
return !c.stop
})
}
}
If I'm reading the cards correctly, it seems like you are trying to pass along an array of floats to a common channel along with a channel identifier. I assume that you are doing this to pass multiple channels out to different publishers, but so that you only have to track a single channel for your consumer.
It turns out that you don't need to loop over channels to see when it's outputting a value. You can chain channels together inside of goroutines. For this reason, no busy wait is necessary. Something like this will suit your purposes (again, if I'm reading the cards correctly). Look for the all caps comment for the way around your busy loop. Link to playground.
var num_publishes = 3
func main() {
num_publishers := 10
single_consumer := make(chan []float64)
for i:=0;i<num_publishers;i+=1 {
c := make(chan []float64)
// connect channel to your single consumer channel
go func() { for { single_consumer <- <-c } }() // THIS IS PROBABLY WHAT YOU DIDN'T KNOW ABOUT
// send the channel to the publisher
go publisher(c, i*100)
}
// dumb consumer example
for i:=0;i<num_publishers*num_publishes;i+=1 {
fmt.Println(<-single_consumer)
}
}
func publisher(c chan []float64, publisher_id int) {
dummy := []float64{
float64(publisher_id+1),
float64(publisher_id+2),
float64(publisher_id+3),
}
for i:=0;i<num_publishes;i+=1 {
time.Sleep(time.Duration(rand.Intn(10000)) * time.Millisecond)
c <- dummy
}
}
It is eating all of your CPU because you are continuously cycling around the dictionary checking for messages, so even when there are no messages to process the CPU, or at least a thread or core, is running flat out. You need blocking sends and receives on the channels!
I assume you are doing this because you don't know how many channels there will be and therefor can't just select on all of the input channels. A better pattern would be to start a separate goroutine for each input channel you are currently storing in the dictionary. Each goroutine should have a loop in which it blocks waiting the input channel and on receiving a message does a blocking send to a channel to the client which is shared by all.
The question isn't complete so can't give exact code, but you're going to have goroutines that look something like this:
type Message struct {
id uint32,
message []float64
}
func receiverGoroutine(id uint32, input chan []float64, output chan Message) {
for {
message := <- input
output <- Message{id: id, message: message}
}
}
func clientGoroutine(c *Client, input chan Message) {
for {
message := <- input
// do stuff
}
}
(You'll need to add some "done" channels as well though)
Elsewhere you will start them with code like this:
clientChan := make(chan Message)
go clientGoroutine(client, clientChan)
for i:=0; i<max; i++ {
go receiverGoroutine( (uint32)i, make(chan []float64, clientChan)
}
Or you can just start the client routine and then add the others as they are needed rather than in a loop up front - depends on your use case.

Update unread channel data

Is there a way to update unread data that been sent to a channel with more up to date data?
I have a goroutine (producer) with a channel that's provides progress updates to another goroutine (consumer). In some scenarios, the progress can update much faster than the consumer consumes the update messages.
This causes me issues as I can either:
Block on sending data to the channel. This means that if the consumer is slow to read data, the progress updating goroutine totally blocks - which it shouldn't.
Don't block on sending and skip over progress updates when the channel is full. This means the consumer is always reading old, out of data data.
As an example, I might have something like this:
Progress reporting goroutine: Posts "1%" to channel
Progress reporting goroutine: Posts "2%" to channel
Progress reporting goroutine: Posts "3%" to channel
Progress consuming goroutine: Reads "1%", "2%" and "3%" from channel. "1% and "2%" are outdated information.
Is there any way to update unread channel data? Or is there a better way of going about this issue?
You can store some value in a global variable protected with RWMutex It keeps progress. Generator updates it. Consumer reads and shows.
Also you can make a non-blocking writing to channel with length of 1:
var c = make(chan struct{}, 1)
select {
case c <- struct{}{}:
default:
}
This way sender either adds one element to the channel either do nothing if it’s full.
Reader treats this empty struct as a signal - it should take updated value in the global variable.
Another way: Updatable channel
var c = make(chan int, 1)
select {
case c <- value: // channel was empty - ok
default: // channel if full - we have to delete a value from it with some precautions to not get locked in our own channel
select {
case <- c: // read stale value and put a fresh one
c <- value
default: // consumer have read it - so skip and not get locked
}
}
package main
import "fmt"
func main() {
// channel buffer must be 1 in this case
var ch = make(chan int, 1)
// when channel was not consumed and you want to update value at channel
produceToChannel(ch, 1)
produceToChannel(ch, 2)
fmt.Println(consumeFromChannel(ch)) // prints 2
// when channel was already consumed and you are producing new value
produceToChannel(ch, 3)
consumeFromChannel(ch)
produceToChannel(ch, 4)
fmt.Println(consumeFromChannel(ch)) // prints 4
}
func produceToChannel(ch chan int, v int) {
select {
case <-ch:
default:
}
ch <- v
}
func consumeFromChannel(ch chan int) int {
return <- ch
}
Just clear the channel each time before sending value to the channel.
In other words, when you send 3%, 3% becomes the only value in channel.
You can make your channel with buffer length of 1, so simply use <-ch to clear the channel.
Edit: use a select with default to clear the channel with <-ch and so do not block in the case the previous value is already read.
How about a concurrent map that stores versions for all your inbound objects, with 0 say being the default version
import "sync"
var Versions sync.Map = sync.Map{}
type Data struct {
Payload interface{}
Version int
ID int
}
func produce(c chan Data) {
for {
data := produceData()
if hasNewVersion(data) {
Versions.Store(data.ID, data.Version)
}
c <- data
}
}
func consume(c chan Data) {
for {
select {
case val:= <- c:
if ver, ok := Versions.Load(val.ID); ok {
if ver.(int) == val.Version {
// process
}
}
}
}
}

Resources