In a simple scheduler for timers I'm writing, I'm making use of a monitor goroutine to sync start/stop and timer done events.
The monitor goroutine, when stripped down to the essential, looks like this:
actions := make(chan func(), 1024)
// monitor goroutine
go func() {
for a := range actions {
a()
}
}()
actions <- func() {
actions <- func() {
// causes deadlock when buffer size is reached
}
}
This works great, until an action is sent that sends another action.
It's possible for a scheduled action to schedule another action, which causes a deadlock when the buffer size is reached.
Is there any clean way to solve this issue without resorting to shared state (which I've tried in my particular problem, but is quite ugly)?
The problem arises from the fact that when your monitor goroutine "takes out" (receives) a value (a function) and executes it (this also happens on the monitor goroutine), during its execution it sends a value on the monitored channel.
This in itself would not cause deadlock as when the function is executed (a()), it is already taken out of the buffered channel, so there is at least one free space in it, so sending a new value on the channel inside a() could proceed without blocking.
Problem may arise if there are other goroutines which may also send values on the monitored channel, which is your case.
One way to avoid the deadlock is that if the function being executed tries to "put back" (send) a function not in the same goroutine (which is the monitor goroutine), but in a new goroutine, so the monitor goroutine is not blocked:
actions <- func() {
// Send new func value in a new goroutine:
// this will never block the monitor goroutine
go func() {
actions <- func() {
// No deadlock.
}
}()
}
By doing this, the monitor goroutine will not be blocked even if buffer of actions is full, because sending a value on it will happen in a new goroutine (which may be blocked until there is free space in the buffer).
If you want to avoid always spawning a new goroutine when sending a value on actions, you may use a select to first try to send it without spawning a new goroutine. Only when buffer of actions is full and cannot be sent without blocking, shall you spawn a goroutine for sending to avoid deadlock, which - depending on your actual case may happen very rarely, and in this case it's inevitable anyway to avoid deadlock:
actions <- func() {
newfv := func() { /* do something */ }
// First try to send with select:
select {
case actions <- newfv:
// Success!
default:
// Buffer is full, must do it in new goroutine:
go func() {
actions <- newfv
}()
}
}
If you have to do this in many places, it's recommended to create a helper function for it:
func safeSend(fv func()) {
// First try to send with select:
select {
case actions <- fv:
// Success!
default:
// Buffer is full, must do it in new goroutine:
go func() {
actions <- fv
}()
}
}
And using it:
actions <- func() {
safeSend(func() {
// something to do
})
}
Related
In a project the program receives data via websocket. This data needs to be processed by n algorithms. The amount of algorithms can change dynamically.
My attempt is to create some pub/sub pattern where subscriptions can be started and canceled on the fly. Turns out that this is a bit more challenging than expected.
Here's what I came up with (which is based on https://eli.thegreenplace.net/2020/pubsub-using-channels-in-go/):
package pubsub
import (
"context"
"sync"
"time"
)
type Pubsub struct {
sync.RWMutex
subs []*Subsciption
closed bool
}
func New() *Pubsub {
ps := &Pubsub{}
ps.subs = []*Subsciption{}
return ps
}
func (ps *Pubsub) Publish(msg interface{}) {
ps.RLock()
defer ps.RUnlock()
if ps.closed {
return
}
for _, sub := range ps.subs {
// ISSUE1: These goroutines apparently do not exit properly...
go func(ch chan interface{}) {
ch <- msg
}(sub.Data)
}
}
func (ps *Pubsub) Subscribe() (context.Context, *Subsciption, error) {
ps.Lock()
defer ps.Unlock()
// prep channel
ctx, cancel := context.WithCancel(context.Background())
sub := &Subsciption{
Data: make(chan interface{}, 1),
cancel: cancel,
ps: ps,
}
// prep subsciption
ps.subs = append(ps.subs, sub)
return ctx, sub, nil
}
func (ps *Pubsub) unsubscribe(s *Subsciption) bool {
ps.Lock()
defer ps.Unlock()
found := false
index := 0
for i, sub := range ps.subs {
if sub == s {
index = i
found = true
}
}
if found {
s.cancel()
ps.subs[index] = ps.subs[len(ps.subs)-1]
ps.subs = ps.subs[:len(ps.subs)-1]
// ISSUE2: close the channel async with a delay to ensure
// nothing will be written to the channel anymore
// via a pending goroutine from Publish()
go func(ch chan interface{}) {
time.Sleep(500 * time.Millisecond)
close(ch)
}(s.Data)
}
return found
}
func (ps *Pubsub) Close() {
ps.Lock()
defer ps.Unlock()
if !ps.closed {
ps.closed = true
for _, sub := range ps.subs {
sub.cancel()
// ISSUE2: close the channel async with a delay to ensure
// nothing will be written to the channel anymore
// via a pending goroutine from Publish()
go func(ch chan interface{}) {
time.Sleep(500 * time.Millisecond)
close(ch)
}(sub.Data)
}
}
}
type Subsciption struct {
Data chan interface{}
cancel func()
ps *Pubsub
}
func (s *Subsciption) Unsubscribe() {
s.ps.unsubscribe(s)
}
As mentioned in the comments there are (at least) two issues with this:
ISSUE1:
After a while of running in implementation of this I get a few errors of this kind:
goroutine 120624 [runnable]:
bm/internal/pubsub.(*Pubsub).Publish.func1(0x8586c0, 0xc00b44e880, 0xc008617740)
/home/X/Projects/bm/internal/pubsub/pubsub.go:30
created by bookmaker/internal/pubsub.(*Pubsub).Publish
/home/X/Projects/bm/internal/pubsub/pubsub.go:30 +0xbb
Without really understanding this it appears to me that the goroutines created in Publish() do accumulate/leak. Is this correct and what am I doing wrong here?
ISSUE2:
When I end a subscription via Unsubscribe() it occurs that Publish() tried to write to a closed channel and panics. To mitigate this I have created a goroutine to close the channel with a delay. This feel really off-best-practice but I was not able to find a proper solution to this. What would be a deterministic way to do this?
Heres a little playground for you to test with: https://play.golang.org/p/K-L8vLjt7_9
Before diving into your solution and its issues, let me recommend again another Broker approach presented in this answer: How to broadcast message using channel
Now on to your solution.
Whenever you launch a goroutine, always think of how it will end and make sure it does if the goroutine is not ought to run for the lifetime of your app.
// ISSUE1: These goroutines apparently do not exit properly...
go func(ch chan interface{}) {
ch <- msg
}(sub.Data)
This goroutine tries to send a value on ch. This may be a blocking operation: it will block if ch's buffer is full and there is no ready receiver on ch. This is out of the control of the launched goroutine, and also out of the control of the pubsub package. This may be fine in some cases, but this already places a burden on the users of the package. Try to avoid these. Try to create APIs that are easy to use and hard to misuse.
Also, launching a goroutine just to send a value on a channel is a waste of resources (goroutines are cheap and light, but you shouldn't spam them whenever you can).
You do it because you don't want to get blocked. To avoid blocking, you may use a buffered channel with a "reasonable" high buffer. Yes, this doesn't solve the blocking issue, in only helps with "slow" clients receiving from the channel.
To "truly" avoid blocking without launching a goroutine, you may use non-blocking send:
select {
case ch <- msg:
default:
// ch's buffer is full, we cannot deliver now
}
If send on ch can proceed, it will happen. If not, the default branch is chosen immediately. You have to decide what to do then. Is it acceptable to "lose" a message? Is it acceptable to wait for some time until "giving up"? Or is it acceptable to launch a goroutine to do this (but then you'll be back at what we're trying to fix here)? Or is it acceptable to get blocked until the client can receive from the channel...
Choosing a reasonable high buffer, if you encounter a situation when it still gets full, it may be acceptable to block until the client can advance and receive from the message. If it can't, then your whole app might be in an unacceptable state, and it might be acceptable to "hang" or "crash".
// ISSUE2: close the channel async with a delay to ensure
// nothing will be written to the channel anymore
// via a pending goroutine from Publish()
go func(ch chan interface{}) {
time.Sleep(500 * time.Millisecond)
close(ch)
}(s.Data)
Closing a channel is a signal to the receiver(s) that no more values will be sent on the channel. So always it should be the sender's job (and responsibility) to close the channel. Launching a goroutine to close the channel, you "hand" that job and responsibility to another "entity" (a goroutine) that will not be synchronized to the sender. This may easily end up in a panic (sending on a closed channel is a runtime panic, for other axioms see How does a non initialized channel behave?). Don't do that.
Yes, this was necessary because you launched goroutines to send. If you don't do that, then you may close "in-place", without launching a goroutine, because then the sender and closer will be the same entity: the Pubsub itself, whose sending and closing operations are protected by a mutex. So solving the first issue solves the second naturally.
In general if there are multiple senders for a channel, then closing the channel must be coordinated. There must be a single entity (often not any of the senders) that waits for all senders to finish, practically using a sync.WaitGroup, and then that single entity can close the channel, safely. See Closing channel of unknown length.
I have a question similar to How to stop a goroutine with a small twist. I don't know for sure if the goroutine is running.
var quit = make(chan bool)
func f1() {
go func() {
t := time.NewTimer(time.Minute)
select {
case <-t.C:
// Do stuff
case <-quit:
return
}
}()
}
func f2() {
quit <- true
}
If f2() is called less than a minute after f1(), then the goroutine returns. However, if it is called later than one minute, the goroutine will have already returned and f2() would block.
I want f2() to cancel the goroutine if it is running and do nothing otherwise.
What I'm trying to achieve here is to perform a task if and only if it is not canceled within a minute of creation.
Clarifications:
There is nothing to stop f2() from being called more than once.
There is only one goroutine running at a time. The caller of f1() will make sure that it's not called more than once per minute.
Use contexts.
Run f1 with the context which may be cancelled.
Run f2 with the associated cancellation function.
func f1(ctx context.Context) {
go func(ctx context.Context) {
t := time.NewTimer(time.Minute)
select {
case <-t.C:
// Do stuff
case <-ctx.Done():
return
}
}(ctx)
}
func f2(cancel context.CancelFunc) {
cancel()
}
And later, to coordinate the two functions, you would do this:
ctx, cancel := context.WithCancel(context.Background())
f1(ctx)
f2(cancel)
You can also experiment with the context.WithTimeout function to incorporate externally-defined timeouts.
In the case where you don't know whether there is a goroutine running already, you can initialize the ctx and cancel variables like above, but don't pass them into anything. This avoids having to check for nil.
Remember to treat ctx and cancel as variables to be copied, not as references, because you don't want multiple goroutines to share memory - that may cause a race condition.
You can give the channel a buffer size of 1. This means you can send one value to it without blocking, even if that value is not received right away (or at all).
var quit = make(chan bool, 1)
I think the top answer is better, this is just another solution that could translate to other situations.
I am trying to create an intermediate layer between user and tcp, with Send and Receive functions. Currently, I am trying to integrate a context, so that the Send and Receive respects a context. However, I don't know how to make them respect the context's cancellation.
Until now, I got the following.
// c.underlying is a net.Conn
func (c *tcpConn) Receive(ctx context.Context) ([]byte, error) {
if deadline, ok := ctx.Deadline(); ok {
// Set the read deadline on the underlying connection according to the
// given context. This read deadline applies to the whole function, so
// we only set it once here. On the next read-call, it will be set
// again, or will be reset in the else block, to not keep an old
// deadline.
c.underlying.SetReadDeadline(deadline)
} else {
c.underlying.SetReadDeadline(time.Time{}) // remove the read deadline
}
// perform reads with
// c.underlying.Read(myBuffer)
return frameData, nil
}
However, as far as I understand that code, this only respects a context.WithTimeout or context.WithDeadline, and not a context.WithCancel.
If possible, I would like to pass that into the connection somehow, or actually abort the reading process.
How can I do that?
Note: If possible, I would like to avoid another function that reads in another goroutine and pushed a result back on a channel, because then, when calling cancel, and I am reading 2GB over the network, that doesn't actually cancel the read, and the resources are still used. If not possible in another way however, I would like to know if there is a better way of doing that than a function with two channels, one for a []byte result and one for an error.
EDIT:
With the following code, I can respect a cancel, but it doesn't abort the read.
// apply deadline ...
result := make(chan interface{})
defer close(result)
go c.receiveAsync(result)
select {
case res := <-result:
if err, ok := res.(error); ok {
return nil, err
}
return res.([]byte), nil
case <-ctx.Done():
return nil, ErrTimeout
}
}
func (c *tcpConn) receiveAsync(result chan interface{}) {
// perform the reads and push either an error or the
// read bytes to the result channel
If the connection can be closed on cancellation, you can setup a goroutine to shutdown the connection on cancellation within the Receive method. If the connection must be reused again later, then there is no way to cancel a Read in progress.
recvDone := make(chan struct{})
defer close(recvDone)
// setup the cancellation to abort reads in process
go func() {
select {
case <-ctx.Done():
c.underlying.CloseRead()
// Close() can be used if this isn't necessarily a TCP connection
case <-recvDone:
}
}()
It will be a little more work if you want to communicate the cancelation error back, but the CloseRead will provide a clean way to stop any pending TCP Read calls.
(I don't believe my issue is a duplicate of this QA: go routine blocking the others one, because I'm running Go 1.9 which has the preemptive scheduler whereas that question was asked for Go 1.2).
My Go program calls into a C library wrapped by another Go-lang library that makes a blocking call that can last over 60 seconds. I want to add a timeout so it returns in 3 seconds:
Old code with long block:
// InvokeSomething is part of a Go wrapper library that calls the C library read_something function. I cannot change this code.
func InvokeSomething() ([]Something, error) {
ret := clib.read_something(&input) // this can block for 60 seconds
if ret.Code > 1 {
return nil, CreateError(ret)
}
return ret.Something, nil
}
// This is my code I can change:
func MyCode() {
something, err := InvokeSomething()
// etc
}
My code with a go-routine, channel, and timeout, based on this Go example: https://gobyexample.com/timeouts
type somethingResult struct {
Something []Something
Err error
}
func MyCodeWithTimeout() {
ch = make(chan somethingResult, 1);
go func() {
something, err := InvokeSomething() // blocks here for 60 seconds
ret := somethingResult{ something, err }
ch <- ret
}()
select {
case result := <-ch:
// etc
case <-time.After(time.Second *3):
// report timeout
}
}
However when I run MyCodeWithTimeout it still takes 60 seconds before it executes the case <-time.After(time.Second * 3) block.
I know that attempting to read from an unbuffered channel with nothing in it will block, but I created the channel with a buffered size of 1 so as far as I can tell I'm doing it correctly. I'm surprised the Go scheduler isn't preempting my goroutine, or does that depend on execution being in go-lang code and not an external native library?
Update:
I read that the Go-scheduler, at least in 2015, is actually "semi-preemptive" and it doesn't preempt OS threads that are in "external code": https://github.com/golang/go/issues/11462
you can think of the Go scheduler as being partially preemptive. It's by no means fully cooperative, since user code generally has no control over scheduling points, but it's also not able to preempt at arbitrary points
I heard that runtime.LockOSThread() might help, so I changed the function to this:
func MyCodeWithTimeout() {
ch = make(chan somethingResult, 1);
defer close(ch)
go func() {
runtime.LockOSThread()
defer runtime.UnlockOSThread()
something, err := InvokeSomething() // blocks here for 60 seconds
ret := somethingResult{ something, err }
ch <- ret
}()
select {
case result := <-ch:
// etc
case <-time.After(time.Second *3):
// report timeout
}
}
...however it didn't help at all and it still blocks for 60 seconds.
Your proposed solution to do thread locking in the goroutine started in MyCodeWithTimeout() does not give guarantee MyCodeWithTimeout() will return after 3 seconds, and the reason for this is that first: no guarantee that the started goroutine will get scheduled and reach the point to lock the thread to the goroutine, and second: because even if the external command or syscall gets called and returns within 3 seconds, there is no guarantee that the other goroutine running MyCodeWithTimeout() will get scheduled to receive the result.
Instead do the thread locking in MyCodeWithTimeout(), not in the goroutine it starts:
func MyCodeWithTimeout() {
runtime.LockOSThread()
defer runtime.UnlockOSThread()
ch = make(chan somethingResult, 1);
defer close(ch)
go func() {
something, err := InvokeSomething() // blocks here for 60 seconds
ret := somethingResult{ something, err }
ch <- ret
}()
select {
case result := <-ch:
// etc
case <-time.After(time.Second *3):
// report timeout
}
}
Now if MyCodeWithTimeout() execution starts, it will lock the goroutine to the OS thread, and you can be sure that this goroutine will notice the value sent on the timers value.
NOTE: This is better if you want it to return within 3 seconds, but this sill will not give guarantee, as the timer that fires (sends a value on its channel) runs in its own goroutine, and this thread locking has no effect on the scheduling of that goroutine.
If you want guarantee, you can't rely on other goroutines giving the "exit" signal, you can only rely on this happening in your goroutine running the MyCodeWithTimeout() function (because since you did thread locking, you can be sure it gets scheduled).
An "ugly" solution which spins up CPU usage for a given CPU core would be:
for end := time.Now().Add(time.Second * 3); time.Now().Before(end); {
// Do non-blocking check:
select {
case result := <-ch:
// Process result
default: // Must have default to be non-blocking
}
}
Note that the "urge" of using time.Sleep() in this loop would take away the guarantee, as time.Sleep() may use goroutines in its implementation and certainly does not guarantee to return exactly after the given duration.
Also note that if you have 8 CPU cores and runtime.GOMAXPROCS(0) returns 8 for you, and your goroutines are still "starving", this may be a temporary solution, but you still have more serious problems using Go's concurrency primitives in your app (or a lack of using them), and you should investigate and "fix" those. Locking a thread to a goroutine may even make it worse for the rest of the goroutines.
I am using golang to implement a simple event driven worker. It's like this:
go func() {
for {
select {
case data := <-ch:
time.Sleep(1)
someGlobalMap[data.key] = data.value
}
}
}()
And the main function will create several goroutines, and each of them will do thing like this:
ch <- data
fmt.Println(someGlobalMap[data.key])
As you can see that, because my worker need some time to do the work, I will got nil result in my main function.How can I control this workflow properly?
EDIT: I may have misread your question, I see that you mention that main will start many producer goroutines. I thought it was many consumer goroutines, and a single producer. Leaving the answer here in case it can be useful for others looking for that pattern, though the bullet points still apply to your case.
So if I understand correctly your use-case, you can't expect to send on a channel and read the results immediately after. You don't know when the worker will process that send, you need to communicate between the goroutines, and that is done with channels. Assuming just calling a function with a return value doesn't work in your scenario, if you really need to send to a worker, then block until you have the result, you could send a channel as part of the data structure, and block-receive on it after the send, i.e.:
resCh := make(chan Result)
ch <- Data{key, value, resCh}
res := <- resCh
But you should probably try to break down the work as a pipeline of independent steps instead, see the blog post that I linked to in the original answer.
Original answer where I thought it was a single producer - multiple consumers/workers pattern:
This is a common pattern for which Go's goroutines and channels semantics are very well suited. There are a few things you need to keep in mind:
The main function will not automatically wait for goroutines to finish. If there's nothing else to do in the main, then the program exits and you don't have your results.
The global map that you use is not thread-safe. You need to synchronize access via a mutex, but there's a better way - use an output channel for results, which is already synchronized.
You can use a for..range over a channel, and you can safely share a channel between multiple goroutines. As we'll see, that makes this pattern quite elegant to write.
Playground: https://play.golang.org/p/WqyZfwldqp
For more on Go pipelines and concurrency patterns, to introduce error handling, early cancellation, etc.: https://blog.golang.org/pipelines
Commented code for the use-case you mention:
// could be a command-line flag, a config, etc.
const numGoros = 10
// Data is a similar data structure to the one mentioned in the question.
type Data struct {
key string
value int
}
func main() {
var wg sync.WaitGroup
// create the input channel that sends work to the goroutines
inch := make(chan Data)
// create the output channel that sends results back to the main function
outch := make(chan Data)
// the WaitGroup keeps track of pending goroutines, you can add numGoros
// right away if you know how many will be started, otherwise do .Add(1)
// each time before starting a worker goroutine.
wg.Add(numGoros)
for i := 0; i < numGoros; i++ {
// because it uses a closure, it could've used inch and outch automaticaly,
// but if the func gets bigger you may want to extract it to a named function,
// and I wanted to show the directed channel types: within that function, you
// can only receive from inch, and only send (and close) to outch.
//
// It also receives the index i, just for fun so it can set the goroutines'
// index as key in the results, to show that it was processed by different
// goroutines. Also, big gotcha: do not capture a for-loop iteration variable
// in a closure, pass it as argument, otherwise it very likely won't do what
// you expect.
go func(i int, inch <-chan Data, outch chan<- Data) {
// make sure WaitGroup.Done is called on exit, so Wait unblocks
// eventually.
defer wg.Done()
// range over a channel gets the next value to process, safe to share
// concurrently between all goroutines. It exits the for loop once
// the channel is closed and drained, so wg.Done will be called once
// ch is closed.
for data := range inch {
// process the data...
time.Sleep(10 * time.Millisecond)
outch <- Data{strconv.Itoa(i), data.value}
}
}(i, inch, outch)
}
// start the goroutine that prints the results, use a separate WaitGroup to track
// it (could also have used a "done" channel but the for-loop would be more complex, with a select).
var wgResults sync.WaitGroup
wgResults.Add(1)
go func(ch <-chan Data) {
defer wgResults.Done()
// to prove it processed everything, keep a counter and print it on exit
var n int
for data := range ch {
fmt.Println(data.key, data.value)
n++
}
// for fun, try commenting out the wgResults.Wait() call at the end, the output
// will likely miss this line.
fmt.Println(">>> Processed: ", n)
}(outch)
// send work, wherever that comes from...
for i := 0; i < 1000; i++ {
inch <- Data{"main", i}
}
// when there's no more work to send, close the inch, so the goroutines will begin
// draining it and exit once all values have been processed.
close(inch)
// wait for all goroutines to exit
wg.Wait()
// at this point, no more results will be written to outch, close it to signal
// to the results goroutine that it can terminate.
close(outch)
// and wait for the results goroutine to actually exit, otherwise the program would
// possibly terminate without printing the last few values.
wgResults.Wait()
}
In real-life scenarios, where the amount of work is not known ahead of time, the closing of the in-channel could come from e.g. a SIGINT signal. Just make sure no code path can send work after the channel was closed as that would panic.