I have a pipeline with goroutines connected by channels so that each goroutine triggers another one until all have run. Put more simply, imagine two goroutines A and B such that when A is done it should tell B it can run.
It's working fine and I have tried a few variants as I have learnt more about pipelines in Go.
Currently I have a signalling channel
ch := make(chan struct{})
go A(ch)
go B(ch)
...
that B blocks on
func B(ch <-chan struct{}) {
    <-ch
    ...
and A closes when done
func A(ch chan struct{}) {
    defer close(ch)
    ...
}
This works fine, and I have also tried, instead of closing, sending an empty struct value, struct{}{}, in A().
Is there any difference between closing the channel or sending the empty struct? Is either way cheaper / faster / better?
Naturally, sending any other type in the channel takes up "some" amount of memory, but how is it with the empty struct? Close is just part of the channel so not "sent" as such even if information is passed between goroutines.
I'm well aware of premature optimization. This is only to understand things, not to optimize anything.
Maybe there's an idiomatic Go way to do this even?
Thanks for any clarification on this!
Closing a channel indicates that there will be no more sends on that channel. It's usually preferable, since after that point an inadvertent send or close (a programming error) will panic. A close can also signal multiple receivers that there are no more messages, which you can't as easily coordinate by just sending sentinel values.
Naturally, sending any other type in the channel takes up "some" amount of memory, but how is it with the empty struct?
There's no guarantee that it does take any extra memory in an unbuffered channel (it's completely an implementation detail). The send blocks until the receive can proceed.
Close is just part of the channel so not "sent" as such even if information is passed between goroutines.
There's no optimization here, close is simply another type of message that can be sent to a channel.
Each construct has a clear meaning, and you should use the appropriate one.
Send a sentinel value if you need to signal one receiver, and keep the channel open to send more values.
Close the channel if this is the final message, possibly signal multiple receivers, and it would be an error to send or close again.
Multiple goroutines can receive from a closed channel, and they will never block. That's the main advantage: it's a one-to-many pattern.
Sending values on finish := make(chan struct{}) can be used in a many-to-one pattern, when many concurrent runners want to report that they are done; since nobody closes the channel, no sender will panic.
It's not about memory consumption.
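For illustration, here is a minimal, self-contained sketch (the names are made up for this example) of the one-to-many case: closing a single channel releases every goroutine blocked on it, which a single sentinel send could not do:
package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    done := make(chan struct{})
    var wg sync.WaitGroup

    // Several receivers all block on the same signalling channel.
    for i := 1; i <= 3; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            <-done // a closed channel unblocks every receiver
            fmt.Println("worker", id, "released")
        }(i)
    }

    time.Sleep(100 * time.Millisecond) // stand-in for A doing its work
    close(done)                        // one close signals all three receivers
    wg.Wait()
}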
I am reading The Go Programming Language book and there is this example in Chapter 8.4
func mirroredQuery() string {
    responses := make(chan string, 3)
    go func() { responses <- request("asia.gopl.io") }()
    go func() { responses <- request("americas.gopl.io") }()
    go func() { responses <- request("europe.gopl.io") }()
    return <-responses // return the quickest response
}
There is also this comment
Had we used an unbuffered channel, the two slower goroutines would have gotten stuck trying to send their responses on a channel from which no goroutine will ever receive.
This comment itself makes sense. But what happens to the two slow goroutines when mirroredQuery returns in the buffered case? Do they still run to finish or get cancelled?
EDIT: I understand that if the main goroutine exits, then the 2 slower goroutines will 'evaporate' no matter whether they are running or not. But what if the main goroutine is still running and mirroredQuery() has already returned, would the 2 slow goroutines run to the end? Basically, does responses still exist after mirroredQuery returns? If so, then it seems the 2 slow goroutines can finish in principle; if not, then we still have a leak just like in the unbuffered case?
When the main goroutine returns, the entire runtime system quits, rather abruptly. Hence any goroutines that are stuck waiting to send on an unbuffered or full channel simply ... cease to exist. They're not canceled, nor do they run, nor do they wait. Think of it as the flash paper being set on fire.
One can call this a goroutine leak, the same way one can refer to any resources (such as open files) not closed-or-freed before a program terminates a "leak". But since the entire process terminates, there's nothing left. There's no real leak here. It's not a tidy cleanup, but the system does clean up.
Here's a Playground link, as an example.
(If you make use of things not defined by the Go system itself, you could get various leaks that way. For instance, in the old System V Shared Memory world, you can create shared memory segments (shm_open) and if you never close and unlink them, they persist. This is by design: they're meant to act a lot like files in a file system, except that they exist in memory, rather than on some sort of disk drive or whatever. But this is far outside normal everyday Go programming.)
Re your edit: if the main goroutine has not exited, so that the program is still running, the other goroutines continue to run (or wait) until they run out of things to do and return themselves, or do something that causes them to exit (such as call runtime.Goexit, or do something that causes a panic). In this case, that's: wait for a response, then send the response into the channel, then return. Assuming they get a response, they'll put the response into the channel. Assuming that putting the response into the channel works (does not panic and not block), they will then return. Having returned, they are done and they evaporate. The channel itself persists and holds the strings: this is a resource leak, albeit a minor one, especially in a toy program.
If there are no references left to the channel itself, the channel itself will be garbage-collected, along with the strings in it; this cleans up the leaked resources. Since we assume that mirroredQuery has returned, and that at this point the last of the spun-off goroutines has also returned, that's the last reference to the channel, so now the channel can be GCed. (Whether and when this happens is up to the runtime.) Until the last of these goroutines finishes, there's still at least one reference to the channel, preventing the channel (and hence the strings) from being GCed.
Had the channel been unbuffered, the two "losing" goroutines would block in the attempt to send into the channel. That would cause those goroutines to remain, which in turn would cause the channel to remain, which in turn would cause the resources to remain allocated until the program as a whole terminates. So that would be "bad".
Had mirroredQuery closed the channel, the two "losing" goroutines could attempt to send on a closed channel, which would cause them to invoke the panic code, which would kill the program. That too would be "bad". The simplest code that achieves the desired result is to make the channel buffered.
Should one of the goroutines wait (for a response) for several years, that would hold those "leaked" resources for all those years. That would also be "bad" (slightly), so we'd want to make sure that they don't wait forever. But that's impractical in a small demonstration program.
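If you did want to stop the losing goroutines from waiting for years, one option (not shown in the book) is to pass a context with a deadline down into the requests. A rough sketch, assuming the context and time packages are imported and that request is rewritten to accept a context and give up once it is cancelled:
func mirroredQueryWithTimeout() string {
    // Hypothetical variant of the book's example; request is assumed to take
    // a context and to abandon its work once that context is cancelled.
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel() // runs when this function returns, i.e. as soon as the winner arrives

    responses := make(chan string, 3) // still buffered so no sender blocks
    go func() { responses <- request(ctx, "asia.gopl.io") }()
    go func() { responses <- request(ctx, "americas.gopl.io") }()
    go func() { responses <- request(ctx, "europe.gopl.io") }()
    return <-responses // return the quickest response
}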
I am going through a tutorial on building web servers using Go.
The author, instead of directly using the http.ListenAndServe() function, creates an http.Server struct.
He then proceeds by:
creating a buffered channel for listening for errors
serverErrors := make(chan error, 1)
spawning the http listening goroutine that binds to that channel
go func() {
    fmt.Println("starting...")
    serverErrors <- api.ListenAndServe()
}()
The reason behind using a buffered channel is according to the instructor
so that the goroutine can exit if we do not collect this error
Further down in the program there is indeed a select block where the errors coming from this channel are collected.
Can anyone please help me understand how the goroutine gets to exit if we don't collect the error?
What would be the practical difference had we used an unbuffered channel?
Short answer:
For any channel (buffered or not), channel reads block if nothing is written to the channel.
For non-buffered channels, channel writes will block if no one is listening.
It is a common technique with error-channels (since only one item will ever be written to the channel), to make it a buffered channel of size 1. It ensures the write will happen without blocking - and the writer goroutine can continue on its way and return.
Therefore the service does not rely on the caller reading from the error channel in order to perform its cleanup.
Note: for a channel to be reclaimed by the GC, it only has to go out of scope - it does not need to be fully drained, nor does it need to be closed. Once it goes out of scope on both ends, it will be GC'ed.
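Here is a minimal, self-contained sketch of the buffering point (made-up names, not the tutorial's code): with a buffer of 1 the send succeeds immediately and the goroutine returns, even though nothing ever reads the channel; with an unbuffered channel it would stay blocked on the send forever:
package main

import (
    "errors"
    "fmt"
    "sync"
)

func main() {
    serverErrors := make(chan error, 1) // buffered: one error fits without a reader
    var wg sync.WaitGroup

    wg.Add(1)
    go func() {
        defer wg.Done()
        // With the buffer this send never blocks, so the goroutine can exit
        // even if nobody ever drains serverErrors.
        serverErrors <- errors.New("listener stopped")
    }()

    wg.Wait() // the sender has already returned at this point
    fmt.Println("sender exited without anyone reading the channel")
}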
If you refer to the code for ListenAndServe(), you'll notice the following comments on how it works. Quoting from there itself:
// ListenAndServe always returns a non-nil error. After Shutdown or Close,
// the returned error is ErrServerClosed.
Also,
// When Shutdown is called, Serve, ListenAndServe, and
// ListenAndServeTLS immediately return ErrServerClosed. Make sure the
// program doesn't exit and waits instead for Shutdown to return.
Your select block is waiting for that Shutdown error, given that you're handling the server's shutdown gracefully, and it doesn't let the goroutine exit before the server closes gracefully.
In the case of func (srv *Server) Close() (e.g. most use defer srv.Close(), right?):
// Close immediately closes all active net.Listeners and any
// connections in state StateNew, StateActive, or StateIdle. For a
// graceful shutdown, use Shutdown.
// Close returns any error returned from closing the Server's
// underlying Listener(s).
So, the same explanation as above carries over for using the select block.
Now, let's categorize channels as buffered and unbuffered. If we care about the guarantee of delivery of the signal (the communication over the channel), then an unbuffered channel ensures it, whereas a buffered channel (size = 1), which is your case, also ensures delivery but it might be delayed.
Let's elaborate on unbuffered channels:
A send operation on an unbuffered channel blocks the sending goroutine until another
goroutine executes a corresponding receive on that same channel, at which point the value
is transmitted and both goroutines may continue
Conversely, if the receive on the channel (<-ch) happens earlier than the send operation, then the receiving goroutine is blocked until the corresponding send operation occurs on the same channel in another goroutine.
The aforementioned points about unbuffered channels indicate their synchronous nature.
Remember, func main() is also a goroutine.
Let's elaborate on buffered channels:
A send operation on a buffered channel pushes an element at the back of the queue,
and a receive operation pops an element from the front of the queue.
1. If the channel is full, the send operation blocks its goroutine until space is made available by another goroutine's receive.
2. If the channel is empty, a receive operation blocks until a value is sent by another goroutine.
So in your case the size of the channel is 1. The sender goroutine can send in a non-blocking manner, and the receiving goroutine dequeues the value as soon as it gets to the receive. But, if you remember, I mentioned delayed delivery for a channel of size 1, as we don't know how much time it will take for the receiving goroutine to get around to that receive.
Hence, the select block is used to make the main goroutine wait for that error. And from the referenced code's documentation, you can see
// Make sure the program doesn't exit and waits instead for Shutdown to return.
Also, for more clarity, you can refer to: Behaviour of channels
The author explains it with pure clarity.
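To tie this back to the question, here is a rough sketch of the usual shape of such a select block; the signal handling and the five-second timeout are assumptions for illustration, not the tutorial's exact code:
// serverErrors and api are assumed to be set up as in the question.
shutdown := make(chan os.Signal, 1)
signal.Notify(shutdown, os.Interrupt, syscall.SIGTERM)

select {
case err := <-serverErrors:
    // ListenAndServe always returns a non-nil error.
    log.Printf("server error: %v", err)

case sig := <-shutdown:
    log.Printf("received %v, shutting down gracefully", sig)
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()
    // Shutdown makes ListenAndServe return ErrServerClosed on serverErrors,
    // which the buffered channel absorbs even though we no longer read it.
    if err := api.Shutdown(ctx); err != nil {
        api.Close()
    }
}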
I have a buffered channel that is read by multiple (4 in this example) goroutines.
queue := make(chan string, 10000) // a large buffered channel
Each goroutine checks for elements available in the channel and processes them all.
for i := 0; i < 4; i++ { // spawn 4 goroutines
    go func() {
        for elem := range queue {
            // do something with the elem from the channel
            _ = elem
        }
    }()
}
Will multiple goroutines collide on the reads? In other words, could different goroutines grab the same elem from the channel, or, while one goroutine is reading from the buffer, could the other goroutines already have read and processed some of the elements? How do I block other goroutines from reading while one goroutine is reading?
Simple answer: no. Elements placed on a Go channel can only be read once, regardless of how many goroutines are trying to read off the channel at the same time, and that applies regardless of whether the channel is buffered or not. There's no possibility that an element will be read by two different goroutines unless that element was sent to the channel more than once. The only thing that buffering does, with regards to channel semantics, is remove the necessity for the read and write to occur synchronously.
In other words, could different go routine grab the same elem in the channel, or while one go routine is reading the buffer, the other go routines already read and processed some of the elements?
Nope...
I believe the misunderstanding is in the difference between the non-blocking and thread-safe concepts.
Non-blocking (buffered) channels
Sends to a buffered channel block only when the buffer is full.
Buffered channels, just as the name says, have a buffer that stores some number of items. This allows the reading goroutine to read without waiting for the writing goroutine, provided something has already been written to the channel. An unbuffered channel cannot store any items, so a send on it blocks until another goroutine withdraws the written item. The "blocking/non-blocking" concept is not related to the "thread safe" concept, and non-blocking does not mean not thread safe.
Thread safety of Go channels
Go channels are thread safe in all available ways of use. A channel is a reference type, so once allocated with make, a channel can be passed by value, because it holds an implicit pointer to a single underlying structure. Obviously an item contained in a channel is never duplicated and cannot be read twice.
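A small self-contained sketch of the fan-out above (illustrative only): every value sent on the channel is received by exactly one of the four workers, never by two:
package main

import (
    "fmt"
    "sync"
)

func main() {
    queue := make(chan string, 10000) // a large buffered channel
    var wg sync.WaitGroup

    for i := 0; i < 4; i++ { // spawn 4 workers reading from the same channel
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            for elem := range queue { // each elem is delivered to exactly one worker
                fmt.Printf("worker %d got %s\n", id, elem)
            }
        }(i)
    }

    for i := 0; i < 20; i++ {
        queue <- fmt.Sprintf("item-%d", i)
    }
    close(queue) // the range loops end once the buffer is drained
    wg.Wait()
}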
The question is in the title. Let's say I have several goroutines (more than 100), all of which eventually send data to one chan (name it mychan := make(chan int)). Another goroutine does <-mychan in an endless for loop. Is it okay, or can the chan happen to lose some data? Should I use a buffered chan instead? Or perhaps I should create a chan and a "daemon" goroutine that will extract messages for each worker goroutine?
If something has been successfully sent into the channel then no, it can't be lost in a correctly working environment (I mean, if you're tampering with your memory or you have bit flips due to cosmic rays then don't expect anything, of course).
A message is successfully sent when ch <- x returns. Otherwise, if it panics, it was not really sent, and if you don't recover then you could claim it's lost (however, it would be lost due to application logic). The panic can happen if the channel is closed or, say, you're out of memory.
Similarly, if the sender is putting values into the channel in non-blocking mode (by using select with a default case), you should have a sufficient buffer in your channel, because messages can be "lost" (although somewhat intentionally). For example, signal.Notify works this way:
Package signal will not block sending to c: the caller must ensure that c has sufficient buffer space to keep up with the expected signal rate.
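For illustration, a minimal sketch of such a non-blocking send (the helper name is made up): the only way a value is "lost" here is that it is deliberately never sent when the buffer is already full, which is exactly how signal.Notify behaves:
// trySend attempts to deliver v without blocking; it drops the value and
// returns false if the buffered channel is already full.
func trySend(ch chan int, v int) bool {
    select {
    case ch <- v:
        return true // placed in the buffer or handed to a waiting receiver
    default:
        return false // buffer full: the value is intentionally dropped
    }
}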
No, they can't be lost.
While the language spec does not in any way impose any particular implementation on channels, you can think of them as semaphores protecting either a single value (for the single message) or an array/list of them (for buffered channels).
The semantics are then enforced in such a way that as soon as a goroutine wants to send a message to a channel, it tries to acquire a free data slot using that semaphore, and then either succeeds at sending—there's a free slot for its message—or blocks—when there isn't. As soon as such a slot appears—someone has received an existing message—the sending succeeds and the sending goroutine gets unblocked.
This is a simplified explanation. In other words, channels in Go are not like message queues, which are usually happy with losing messages.
On a side note, I'm not really sure what happens if the receiver panics in some specific state when it's about to receive your message. In other words, I'm not sure whether Go guarantees that the message is either sent or not in the presence of a receiver panicking in an unfortunate moment.
Oh, and there's that grey area of the main goroutine exiting (the one running the main.main() function): the spec states clearly that the main goroutine does not wait for any other goroutines to complete when it exits. So unless you somehow arrange for a synchronized, controlled shutdown of all your spawned goroutines, I believe they may lose messages. On the other hand, in this case the world is collapsing anyway…
A message cannot be lost, but it can remain unsent. The order of goroutine execution is not defined, so your endless for loop may end up receiving from only one worker the whole time, and it may even sleep if it isn't in the main goroutine. To be sure your queue works in a regular fashion, you had better explicitly receive messages for each worker in main.
I've just started using concurrency in Go. I have experience with concurrency in other languages and was saddened that Go throws a panic if you try to write to a closed channel.
This pattern would have been really useful because you can decouple the lifecycle of actors and make them independent. This allows you to not have to synchronize the cleanup of them. Essentially I can let the reader close the channel before shutting down and let an arbitrary number of writers be notified and stop blocking (cancellation) via a write error on the channel.
I therefore wrote a generic function to handle this form of message passing:
// SendRemoteCmd sends a message to a remote general channel.
// It returns true if the message was sent (the send stopped blocking), or
// false if the receiver closed the channel.
// When the channel is unbuffered, this function returning true means that the
// receiver has received the message and is acting on it.
func SendRemoteCmd(ch chan interface{}, msg interface{}) bool {
    defer func() {
        recover()
    }()
    ch <- msg
    return true
}
It works great, I'm just afraid that golang developers will get angry, call me and tell me they will "find me" when they read this code. There is also probably some good reason why the language gods have decided that this should be a panic in the first place. If that is the case, what design do you suggest instead?
Because sending to a closed channel is a program error, your sends to channels that can be closed must be synchronized. Generally the correct pattern here is to obtain a lock of some kind prior to attempting to send, if the channel can be closed by a third party.
This isn't particularly interesting until you're attempting to send on a channel that can be closed, as part of a select statement that involves other possible operations. In that case, a common pattern is to set channels to nil if operations on them shouldn't, or can't proceed. I have a very complex example of this in the connection.writeOptimizer function in my torrent client here.
Note that there is carefully considered ownership of the resources involved in the write pipeline in the example, and this is a good way to prevent issues with, for example, closing channels. writeOptimizer effectively owns connection.writeCh, and signals downstream that no further data is coming by closing it. It's also the only goroutine that sends to that channel, thereby avoiding having to synchronize writes with the closing of the channel by some other means.
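As a concrete alternative to recovering from the panic, here is a rough sketch of the kind of design these answers point toward (names are illustrative, not taken from the torrent client): the receiver signals cancellation through a separate done channel that it closes, writers select on it, and the command channel itself is never closed, so no send can panic:
// SendRemoteCmd delivers msg unless done has been closed by the receiver.
// It returns true if the message was accepted, false if the receiver has
// shut down and is no longer listening.
func SendRemoteCmd(cmds chan<- interface{}, done <-chan struct{}, msg interface{}) bool {
    select {
    case cmds <- msg:
        return true
    case <-done:
        return false
    }
}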