I'm using channels in Go to process a data pipeline of sorts. The code looks something like this:
type Channels struct {
inputs chan string
errc chan error
quit chan struct{}
}
func (c *Channels) doSomethingWithInput() {
defer close(c.quit)
defer close(c.errc)
for input := range p.inputs {
_, err := doSomethingThatSometimesErrors(input)
if err != nil {
c.errc <- err
return
}
}
doOneFinalThingThatCannotError()
return
}
func (c *Channels) inputData(s string) {
// This function implementation is my question
}
func StartProcessing(c *Channels, data ...string) error {
go c.doSomethingWithInput()
go func() {
defer close(c.inputs)
for _, i := range data {
select {
case <-c.quit:
break
default:
}
inputData(i)
}
}()
// Block until the quit channel is closed.
<-c.quit
if err := <-c.errc; err != nil {
return err
}
return nil
}
This seems like a reasonable way to communicate a quit signal between channel processors and is based on this blog post about concurrency patterns in Go.
The thing I struggle with using this pattern is the inputData function. Adding strings to the input channel needs to wait for doSomethingWithInput() to read the channel, but it also might error. inputData needs to try and feed the inputs channel but give up if told to quit. The best I could do was this:
func (c *Channels) inputData(s string) {
for {
select {
case <-c.quit:
return
case c.inputs <- s:
return
}
}
}
Essentially, "oscillate between your options until one of them sticks." To be clear, I don't think it's a bad design. It just feels... wasteful. Like I'm missing something clever. How can I tell a channel sender to quit in Go when a channel consumer errors?
Your inputData() is fine, that's the way to do it.
In your use case, your channel consumer, the receiver, aka doSomethingWithInput() is the one which should have control over the "quit" channel. As it is, if an error occurs, just return from doSomethingWithInput(), which will in turn close the quit channel and make the sender(s) quit (will trigger case <-quit:). That is in fact the clever bit.
Just watch out with your error channel that's not buffered and closed when doSomethingWithInput() exits. You cannot read it afterwards to collect errors. You need to close it in your main function and initialize it with some capacity (make(chan int, 10) for example), or create a consumer goroutine for it. You may also want to try reading it with a select statement: your error checking code, as it is, will block forever if there are no errors.
Related
This one is a tricky issue that bugs me quite a bit.
Essentially, I wrote an integration microservice that provides data streams from Binance crypto exchange using the Go client. A client sends a start messages, starts data stream for a symbol, and at some point, sends a close message to stop the stream. My implementation looks basically like this:
func (c BinanceClient) StartDataStream(clientType bn.ClientType, symbol, interval string) error {
switch clientType {
case bn.SPOT_LIVE:
wsKlineHandler := c.handlers.klineHandler.SpotKlineHandler
wsErrHandler := c.handlers.klineHandler.ErrHandler
_, stopC, err := binance.WsKlineServe(symbol, interval, wsKlineHandler, wsErrHandler)
if err != nil {
fmt.Println(err)
return err
} else {
c.state.clientSymChanMap[clientType][symbol] = stopC
return nil
}
...
}
The clientSymChanMap stores the stopChannel in a nested hashmap so that I can retrieve the stop channel later to stop the data feed. The stop function has been implemented accordingly:
func (c BinanceClient) StopDataStream(clientType bn.ClientType, symbol string) {
//mtd := "StopDataStream: "
stopC := c.state.clientSymChanMap[clientType][symbol]
if isClosed(stopC) {
DbgPrint(" Channel is already closed. Do nothing for: " + symbol)
} else {
close(stopC)
}
// Delete channel from the map otherwise the next StopAll throws a NPE due to closing a dead channel
delete(c.state.clientSymChanMap[clientType], symbol)
return
}
To prevent panics from already closed channels, I use a check function that returns true in case the channel is already close.
func isClosed(ch <-chan struct{}) bool {
select {
case <-ch:
return true
default:
}
return false
}
Looks nice, but has a catch. When I run the code with starting data for just one symbol, it starts and closes the datafeed exactly as expected.
However, when starting multiple data feeds, then the above code somehow never closes the websocket and just keeps streaming data forever. Without the isClosed check, I get panics of trying to close a closed channel, but with the check in place, well, nothing gets closed.
When looking at the implementation of the above binance.WsKlineServe function, it's quite obvious that it just wraps a new websocket with each invocation and then returns the done & stop channel.
The documentation gives the following usage example:
wsKlineHandler := func(event *binance.WsKlineEvent) {
fmt.Println(event)
}
errHandler := func(err error) {
fmt.Println(err)
}
doneC, stopC, err := binance.WsKlineServe("LTCBTC", "1m", wsKlineHandler, errHandler)
if err != nil {
fmt.Println(err)
return
}
<-doneC
Because the doneC channel actually blocks, I removed it and thought that storing the stopC channel and then use it later to stop the datafeed would work. However, it only does so for one single instance. When multiple streams are open, this doesn't work anymore.
Any idea what that's the case and how to fix it?
Firstly, this is dangerous:
if isClosed(stopC) {
DbgPrint(" Channel is already closed. Do nothing for: " + symbol)
} else {
close(stopC) // <- can't be sure channel is still open
}
there is no guarantee that after your polling check of the channel state, that the channel will still be in that same state in the next line of code. So this code could in theory could panic if it's called concurrently.
If you want an asynchronous action to occur on the channel close - it's best to do this explicitly from its own goroutine. So you could try this:
go func() {
stopC := c.state.clientSymChanMap[clientType][symbol]
<-stopC
// stopC definitely closed now
delete(c.state.clientSymChanMap[clientType], symbol)
}()
P.S. you do need some sort of mutex on your map, since the delete is asynchronous - you need to ensure any adds to the map don't datarace with this.
P.P.S Channels are reclaimed by the GC when they go out of scope. If you are no longer reading from it - they do not need to be explicitly closed to be reclaimed by the GC.
Using channels for stopping a goroutine or closing something is very tricky. There are lots of things you can do wrong or forget to do.
context.WithCancel abstracts that complexity away, making the code more readable and maintainable.
Some code snippets:
ctx, cancel := context.WitchCancel(context.TODO())
TheThingToCancel(ctx, ...)
// Whenever you want to stop TheThingToCancel. Can be called multiple times.
cancel()
Then in a for loop you'd often have a select like this:
for {
select {
case <-ctx.Done():
return
default:
}
// do stuff
}
Here some code that is closer to your specific case of an open connection:
func TheThingToCancel(ctx context.Context) (context.CancelFunc, error) {
ctx, cancel := context.WithCancel(ctx)
conn, err := net.Dial("tcp", ":12345")
if err != nil {
cancel()
return nil, err
}
go func() {
<-ctx.Done()
_ = conn.Close()
}()
go func() {
defer func() {
_ = conn.Close()
// make sure context is always cancelled to avoid goroutine leak
cancel()
}()
var bts = make([]byte, 1024)
for {
n, err := conn.Read(bts)
if err != nil {
return
}
fmt.Println(bts[:n])
}
}()
return cancel, nil
}
It returns the cancel function to be able to close it from the outside.
Cancelling a context can be done many times over without a panic like would occur if a channel is closed multiple times. That is one advantage. Also you can derive contexts from other contexts and thereby close a lot of contexts that all stop different routines by closing a parent context. Carefully designed, this is very powerful for shutting down different routines belonging together that also need to be able to be shut down individually.
I'm new to Go and concurrency in Go. I'm trying to use a Go context to cancel a set of Go routines once I find a member with a given ID.
A Group stores a list of Clients, and each Client has a list of Members. I want to search in parallel all the Clients and all their Members to find a Member with a given ID. Once this Member is found, I want to cancel all the other Go routines and return the discovered Member.
I've tried the following implementation, using a context.WithCancel and a WaitGroup.
This doesn't work however, and hangs indefinitely, never getting past the line waitGroup.Wait(), but I'm not sure why exactly.
func (group *Group) MemberWithID(ID string) (*models.Member, error) {
found := make(chan *models.Member)
ctx := context.Background()
ctx, cancel := context.WithCancel(ctx)
defer cancel()
var waitGroup sync.WaitGroup
for _, client := range group.Clients {
waitGroup.Add(1)
go func(clientToQuery Client) {
defer waitGroup.Done()
select {
case <-ctx.Done():
return
default:
}
member, _ := client.ClientMemberWithID(ID)
if member != nil {
found <- member
cancel()
return
}
} (client)
}
waitGroup.Wait()
if len(found) > 0 {
return <-found, nil
}
return nil, fmt.Errorf("no member found with given id")
}
found is an unbuffered channel, so sending on it blocks until there is someone ready to receive from it.
Your main() function would be the one to receive from it, but only after waitGroup.Wait() returns. But that will block until all launched goroutines call waitGroup.Done(). But that won't happen until they return, which won't happen until they can send on found. It's a deadlock.
If you change found to be buffered, that will allow sending values on it even if main() is not ready to receive from it (as many values as big the buffer is).
But you should receive from found before waitGroup.Wait() returns.
Another solution is to use a buffer of 1 for found, and use non-blocking send on found. That way the first (fastest) goroutine will be able to send the result, and the rest (given we're using non-blocking send) will simply skip sending.
Also note that it should be the main() that calls cancel(), not each launched goroutines individually.
For this kind of use case I think a sync.Once is probably a better fit than a channel. When you find the first non-nil member, you want to do two different things:
Record the member you found.
Cancel the remaining goroutines.
A buffered channel can easily do (1), but makes (2) a bit more complicated. But a sync.Once is perfect for doing two different things the first time something interesting happens!
I would also suggest aggregating non-trivial errors, so that you can report something more useful than no member found if, say, your database connection fails or some other nontrivial error occurs. You can use a sync.Once for that, too!
Putting it all together, I would want to see something like this (https://play.golang.org/p/QZXUUnbxOv5):
func (group *Group) MemberWithID(ctx context.Context, id string) (*Member, error) {
ctx, cancel := context.WithCancel(ctx)
defer cancel()
var (
wg sync.WaitGroup
member *Member
foundOnce sync.Once
firstNontrivialErr error
errOnce sync.Once
)
for _, client := range group.Clients {
wg.Add(1)
client := client // https://golang.org/doc/faq#closures_and_goroutines
go func() {
defer wg.Done()
m, err := client.ClientMemberWithID(ctx, id)
if m != nil {
foundOnce.Do(func() {
member = m
cancel()
})
} else if nf := (*MemberNotFoundError)(nil); !errors.As(err, &nf) {
errOnce.Do(func() {
firstNontrivialErr = err
})
}
}()
}
wg.Wait()
if member == nil {
if firstNontrivialErr != nil {
return nil, firstNontrivialErr
}
return nil, &MemberNotFoundError{ID: id}
}
return member, nil
}
I was trying to understand the following piece of code that reads from a channel of channels. I am having some difficulties wrapping my head around the idea.
bridge := func(done <-chan interface{}, chanStream <-chan <-chan interface{}) <-chan interface{} {
outStream := make(chan interface{})
go func() {
defer close(outStream)
for {
var stream <-chan interface{}
select {
case <-done:
return
case maybeSteram, ok := <-chanStream:
if ok == false {
return
}
stream = maybeSteram
}
for c := range orDone(done, stream) {
select {
case outStream <- c:
case <-done: // Why we are selection from the done channel here?
}
}
}
}()
return outStream
}
The orDone function:
orDone := func(done <-chan interface{}, inStream <-chan interface{}) <-chan interface{} {
outStream := make(chan interface{})
go func() {
defer close(outStream)
for {
select {
case <-done:
return
case v, ok := <-inStream:
if ok == false {
return
}
select {
case outStream <- v:
case <-done: // Again why we are only reading from this channel? Shouldn't we return from here?
// Why we are not retuening from here?
}
}
}
}()
return outStream
}
As mentioned in the comment, I need some help to understand why we are selecting in the for c := range orDone(donem, stream). Can anyone explain what is going on here?
Thanks in advance.
Edit
I took the code from the book concurrency in go. The full code can be found here: https://github.com/kat-co/concurrency-in-go-src/blob/master/concurrency-patterns-in-go/the-bridge-channel/fig-bridge-channel.go
In both cases, the select is done to avoid blocking — if the reader isn't reading from our output channel, the write might block (maybe even forever), but we want the goroutine to terminate when the done channel is closed, without waiting for anything else. By using the select, it will wait until either thing happens, and then continue, instead of waiting indefinitely for the write to complete before checking done.
As for the other question, "why are we not returning here?": well, we could. But we don't have to, because a closed channel remains readable forever (producing an unlimited number of zero values) once it's been closed. So it's okay to do nothing in those "bottom" selects; if done was in fact closed we will go back up to the top of the loop and hit the case <-done: return there. I suppose it's a matter of style. I probably would have written the return myself, but the author of this sample may have wanted to avoid handling the same condition in two places. As long as it's just return it doesn't really matter, but if you wanted to do some additional action on done, that behavior would have to be updated in two places if the bottom select returns, but only in one place if it doesn't.
Fist I will explain the usage of done channel. It's a common pattern followed in many concurrency related package implementation in Go. Done channels purpose is to denote the end of computation or more like a stop signal. Usually done channel will be listened by many go-routines or multiple places in code flow. One such example is Done channel in Go's builtin "context" package. Since Go doesn't have anything like broadcast ie., to signal all listeners of a channel (expecting this feature is not a good idea too), people just close the channel and all listeners will receive nil value. In your case, since the second select statement is in the end of for block, the code owner might have decided to just continue the loop so that on next iteration, the first select statement which listens on done channel will return from the function.
I'm building an app to run a command every time the code changes. I'm using a fsnotify for this feature. But, I can't understand how it is waiting a main goroutine.
I found that using sync.WaitGroup is more idiomatic, but I'm curious how done chan bool makes a goroutine is waiting in fsnotify example code.
I've tried to remove done in the example code of fsnotify, but it's not waiting a goroutine, just is exited.
watcher, err := fsnotify.NewWatcher()
if err != nil {
log.Fatal(err)
}
defer watcher.Close()
done := make(chan bool)
go func() {
for {
select {
case event, ok := <-watcher.Events:
if !ok {
return
}
log.Println("event:", event)
if event.Op&fsnotify.Write == fsnotify.Write {
log.Println("modified file:", event.Name)
}
case err, ok := <-watcher.Errors:
if !ok {
return
}
log.Println("error:", err)
}
}
}()
err = watcher.Add("/tmp/foo")
if err != nil {
log.Fatal(err)
}
<-done
I'm not entirely sure what you're asking, but there's a subtle bug in the code you've provided.
A done channel is a common way to block until an action completes. It is used like this:
done := make(chan X) // Where X is any type
go func() {
// Some logic, possibly in a loop
close(done)
}()
// Other logic
<-done // Wait for `done` to be closed
The type of the channel is unimportant, as no data is (necissarily) sent over the channel, so bool works, but struct{} is more idiomatic, as it indicates that no data can be sent.
Your example almost does this, except that it never calls close(done). This is a bug. It means that the code will always wait forever at <-done, thus negating the entire purpose of a done channel. Your example code will never exit.
This means the code, as you have provided, could be also written as:
go func() {
// Do stuff
}()
// Do other stuff
<Any code that blocks forever>
Because there are countless ways to block forever--none of them ever useful in practice--the channel in your example is not needed.
As per my study, I found an answer from one guy in reddit.com. This is kind of trick though, using <-done makes the main goroutine waiting an any value from chan done, eventually this app keeps running for fsnotify to watch and send a event to the main goroutine.
I'm looking for a solution to multiplex some channel output in go.
I have a source of data which is a read from an io.Reader that I send to a single channel. On the other side I have a websocket request handler that reads from the channel. Now it happens that two clients create a websocket connection, both reading from the same channel but each of them only getting a part of the messages.
Code example (simplified):
func (b *Bootloader) ReadLog() (<-chan []byte, error) {
if b.logCh != nil {
logrus.Warn("ReadLog called while channel already exists!")
return b.logCh, nil // This is where we get problems
}
b.logCh = make(chan []byte, 0)
go func() {
buf := make([]byte, 1024)
for {
n, err := b.p.Read(buf)
if err == nil {
msg := make([]byte, n)
copy(msg, buf[:n])
b.logCh <- msg
} else {
break
}
}
close(b.logCh)
b.logCh = nil
}()
return b.logCh, nil
}
Now when ReadLog() is called twice, the second call just returns the channel created in the first call, which leads to the problem explained above.
The question is: How to do proper multiplexing?
Is it better/easier/more ideomatic to care about the multiplexing on the sending or receiving site?
Should I hide the channel from the receiver and work with callbacks?
I'm a little stuck at the moment. Any hints are welcome.
Mutiplexing is pretty straightforward: make a slice of channels you want to multiplex to, start up a goroutine that reads from the original channel and copies each message to each channel in the slice:
// Really this should be in Bootloader but this is just an example
var consumers []chan []byte
func (b *Bootloader) multiplex() {
// We'll use a sync.once to make sure we don't start a bunch of these.
sync.Once(func(){
go func() {
// Every time a message comes over the channel...
for v := range b.logCh {
// Loop over the consumers...
for _,cons := range consumers {
// Send each one the message
cons <- v
}
}
}()
})
}