I've just started using concurrency in Go. I have experience with concurrency in other languages and was saddened that Go panics if you try to write to a closed channel.
This pattern would have been really useful because it lets you decouple the lifecycles of actors and make them independent, so you don't have to synchronize their cleanup. Essentially, I can let the reader close the channel before shutting down, and let an arbitrary number of writers be notified and stop blocking (cancellation) via a write error on the channel.
I therefore wrote a generic function to handle this form of message passing:
// SendRemoteCmd sends a message to a remote channel.
// Returns true if the message was sent (the send stopped blocking) or false if
// the receiver closed the channel.
// When the channel is unbuffered, this function returning true means that the
// receiver has taken the message and is acting on it.
func SendRemoteCmd(ch chan interface{}, msg interface{}) bool {
	defer func() {
		// A send on a closed channel panics; recovering here makes the
		// function return the zero value (false) instead of crashing.
		recover()
	}()
	ch <- msg
	return true
}
It works great; I'm just afraid that Go developers will get angry, call me, and tell me they will "find me" when they read this code. There is also probably some good reason why the language gods have decided that this should be a panic in the first place. If that is the case, what design do you suggest instead?
Because sending to a closed channel is a program error, sends to channels that can be closed must be synchronized. Generally the correct pattern is to obtain a lock of some kind before attempting to send, if the channel can be closed by a third party.
This isn't particularly interesting until you're attempting to send on a channel that can be closed as part of a select statement that involves other possible operations. In that case, a common pattern is to set channels to nil if operations on them shouldn't or can't proceed. I have a fairly complex example of this in the connection.writeOptimizer function in my torrent client here.
Note that ownership of the resources involved in the write pipeline in that example is carefully considered, and this is a good way to prevent issues with, for example, closing channels. writeOptimizer effectively owns connection.writeCh and signals downstream that no further data is coming by closing it. It's also the only goroutine that sends to that channel, thereby avoiding having to synchronize writes with the closing of the channel by some other means.
Related
I am going through a tutorial on building web servers using go.
The author, instead of directly using the http.ListenAndServe() function, creates an http.Server struct.
He then proceeds by:
creating a buffered channel for listening for errors
serverErrors := make(chan error, 1)
spawning the http listening goroutine that binds to that channel
go func() {
	fmt.Println("starting...")
	serverErrors <- api.ListenAndServe()
}()
The reason behind using a buffered channel, according to the instructor, is
so that the goroutine can exit if we do not collect this error
Further down in the program there is indeed a select block where the errors coming from this channel are collected.
Can anyone please help me understand how the goroutine gets to exit if we don't collect the error?
What would be the practical difference had we used an unbuffered channel?
Short answer:
For any channel (buffered or not), channel reads block if nothing has been written to the channel.
For non-buffered channels, channel writes will block if no one is listening.
It is a common technique with error channels (since only one item will ever be written to the channel) to make them buffered with size 1. This ensures the write happens without blocking, and the writer goroutine can continue on its way and return.
Therefore the service does not rely on the client caller reading from the error channel in order to perform its cleanup.
Note: for a channel to be reclaimed by the GC, it only has to go out of scope; it does not need to be fully drained, nor does it need to be closed. Once it goes out of scope on both ends, it will be GC'ed.
If you refer to the code for ListenAndServe(), you'll notice the following comments on how it works. Quoting from the source itself:
// ListenAndServe always returns a non-nil error. After Shutdown or Close,
// the returned error is ErrServerClosed.
Also,
// When Shutdown is called, Serve, ListenAndServe, and
// ListenAndServeTLS immediately return ErrServerClosed. Make sure the
// program doesn't exit and waits instead for Shutdown to return.
Your select block waits for that error, given that you're handling the server's shutdown gracefully, and it doesn't let the main goroutine exit before the server gracefully closes.
In the case of func (srv *Server) Close() (e.g. most use defer srv.Close(), right?):
// Close immediately closes all active net.Listeners and any
// connections in state StateNew, StateActive, or StateIdle. For a
// graceful shutdown, use Shutdown.
// Close returns any error returned from closing the Server's
// underlying Listener(s).
So the same explanation as above carries over for using the select block.
Now, let's categorize channels as buffered and unbuffered. If we care about a guarantee of delivery of the signal (the communication over the channel), an unbuffered channel ensures it, whereas a buffered channel (size = 1, which is your case) also ensures delivery, but possibly delayed.
Let's elaborate on unbuffered channels:
A send operation on an unbuffered channel blocks the sending goroutine until another
goroutine executes a corresponding receive on that same channel, at which point the value
is transmitted and both goroutines may continue
Conversely, if the receive on the channel (<-chan) happens earlier than the
send operation, then the receiving goroutine is blocked until the corresponding
send occurs on the same channel from another goroutine.
The points above show the synchronous nature of unbuffered channels.
Remember, func main() is also a goroutine.
Let's elaborate on buffered channels:
A send operation on a buffered channel pushes an element at the back of the queue,
and a receive operation pops an element from the front of the queue.
1. If the channel is full, the send operation blocks its goroutine until space is made available by another goroutine's receive.
2. If the channel is empty, a receive operation blocks until a value is sent by another goroutine.
So in your case the size of the channel is 1. The sender goroutine can send in a non-blocking manner, and the receiving goroutine dequeues the value as soon as it gets to it. But, as mentioned, delivery can be delayed for a channel of size 1, since we don't know how much time the receiving goroutine will take to return.
Hence, to keep the main goroutine from exiting until then, the select block is used. And from the referenced code's documentation, you can see
// Make sure the program doesn't exit and waits instead for Shutdown to return.
Also, for more clarity, you can refer to: Behaviour of channels
The author explains it with great clarity.
Due to Go's philosophy, a channel should be closed by the sender only. When a channel is bidirectional, where should it be closed?
The question is a little hard to interpret, since Go does not have bidirectional channels: data flows in only one direction, from a writer to a reader.
What you can have in Go is multiple readers or writers on a channel. Whether this makes sense depends a little on the context. If you have multiple writers, you need some kind of synchronization for the close operation, e.g. a mutex. However, you would then also need to take that lock before each write operation to ensure you don't write to a closed channel. If you don't actually need the receiver side to be informed that the channel was closed, you can also simply omit the close, as the garbage collector collects unclosed channels just fine.
The question is in the title. Let's say I have several goroutines (more than 100), all of which eventually send data to one chan (name it mychan := make(chan int)). One other goroutine does <-mychan in an endless for loop. Is that okay, or can the chan happen to lose some data? Should I use a buffered chan instead? Or perhaps I should create a chan and a "daemon" goroutine that will extract messages for each worker goroutine?
If something has been successfully sent into the channel then no, it can't be lost in a correctly working environment (if you're tampering with your memory or you have bit flips due to cosmic rays, then of course don't expect anything).
A message is successfully sent when ch <- x returns. Otherwise, if it panics, it hasn't really been sent, and if you don't recover then you could claim it's lost (although it would be lost due to application logic). A panic can happen if the channel is closed or, say, you're out of memory.
Similarly, if the sender is putting values into the channel in non-blocking mode (using select with a default case), you should have a sufficient buffer in your channel, because messages can be "lost" (although somewhat intentionally). For example, signal.Notify works this way:
Package signal will not block sending to c: the caller must ensure that c has sufficient buffer space to keep up with the expected signal rate.
No, they can't be lost.
While the language spec does not in any way impose any particular implementation on channels, you can think of them as semaphores protecting either a single value (for the single message) or an array/list of them (for buffered channels).
The semantics are then enforced in such a way that as soon as a goroutine wants to send a message to a channel, it tries to acquire a free data slot using that semaphore, and then either succeeds at sending—there's a free slot for its message—or blocks—when there isn't. As soon as such a slot appears—someone has received an existing message—the sending succeeds and the sending goroutine gets unblocked.
This is a simplified explanation. In other words, channels in Go are not like message queues, which are usually happy to lose messages.
On a side note, I'm not really sure what happens if the receiver panics in some specific state when it's about to receive your message. In other words, I'm not sure whether Go guarantees that the message is either sent or not in the presence of a receiver panicking in an unfortunate moment.
Oh, and there's the grey area of the main goroutine exiting (the one running the main.main() function): the spec states clearly that the main goroutine does not wait for any other goroutines to complete when it exits. So unless you somehow arrange for a synchronized, controlled shutdown of all your spawned goroutines, I believe they may lose messages. On the other hand, in this case the world is collapsing anyway…
A message cannot be lost; it can only fail to be sent. The order of goroutine execution is not defined, so your endless for loop may end up receiving from only one worker the whole time, and it can even sleep if it isn't in the main thread. To be sure your queue works in a regular fashion, you are better off explicitly receiving messages from each worker in main.
I have a pipeline with goroutines connected by channels so that each goroutine will trigger another one until all have run. Put even simpler, imagine two goroutines A and B so that when A is done it should tell B it can run.
It's working fine and I have tried a few variants as I have learnt more about pipelines in Go.
Currently I have a signalling channel
ch := make(chan struct{})
go A(ch)
go B(ch)
...
that B blocks on
func B(ch <-chan struct{}) {
	<-ch
	...
and A closes when done
func A(ch chan struct{}) {
	defer close(ch)
	...
}
This works fine, and I also tried sending an empty struct struct{}{} in A() instead of closing.
Is there any difference between closing the channel and sending the empty struct? Is either way cheaper / faster / better?
Naturally, sending any other type on the channel takes up "some" amount of memory, but how is it with the empty struct? Close is just part of the channel, so nothing is "sent" as such, even though information is passed between goroutines.
I'm well aware of premature optimization. This is only to understand things, not to optimize anything.
Maybe there's an idiomatic Go way to do this even?
Thanks for any clarification on this!
Closing a channel indicates that there will be no more sends on that channel. It's usually preferable, since you would get a panic after that point in the case of an inadvertent send or close (a programming error). A close can also signal multiple receivers that there are no more messages, which you can't as easily coordinate by just sending sentinel values.
Naturally, sending any other type in the channel takes up "some" amount of memory, but how is it with the empty struct?
There's no guarantee that it takes any extra memory in an unbuffered channel (it's entirely an implementation detail). The send blocks until the receive can proceed.
Close is just part of the channel so not "sent" as such even if information is passed between goroutines.
There's no optimization here; close is simply another type of message that can be sent on a channel.
Each construct has a clear meaning, and you should use the appropriate one.
Send a sentinel value if you need to signal one receiver, and keep the channel open to send more values.
Close the channel if this is the final message, possibly signal multiple receivers, and it would be an error to send or close again.
You can receive from a closed channel in multiple goroutines and they will never block. That's the main advantage: it's a one-to-many pattern.
finish := make(chan struct{}) can be used in a many-to-one pattern, where many concurrent runners want to report that work is done and the outside receiver will not panic.
It's not about memory consumption.
I'm using streadway's amqp library to connect with a rabbitmq server.
The library provides a channel.Consume() function which returns a <-chan Delivery.
It also provides a channel.Get() function which returns a Delivery, among other things.
I have to implement pop() functionality, and I'm using channel.Get(). However, the documentation says:
"In almost all cases, using Channel.Consume will be preferred."
Does "preferred" here mean recommended? Are there any disadvantages to using channel.Get() over channel.Consume()? If yes, how do I use channel.Consume() to implement a Pop() function?
As far as I can tell from the docs, yes, "preferred" does mean "recommended".
It seems that channel.Get() doesn't provide as many features as channel.Consume(), which is also more readily usable in concurrent code due to its returning a chan of Delivery, as opposed to each individual Delivery separately.
The extra features mentioned are exclusive, noLocal and noWait, as well as an optional Table of args "that have specific semantics for the queue or server."
To implement a Pop() function using channel.Consume() you could, to link to some code fragments from the amqp example consumer, create a channel using the Consume() function, create a function to handle the chan of Delivery which will actually implement your Pop() functionality, then fire off the handle() func in a goroutine.
The key to this is that the channel (in the linked example) will block on sending if nothing is receiving. In the example, the handle() func uses range to process the entire channel until it's empty. Your Pop() functionality may be better served by a function that just receives the next value from the chan and returns it; every time it's run, it will return the latest Delivery.
EDIT: Example function to receive the latest value from the channel and do stuff with it. (This may not work for your use case; it might be more useful if the function sent the Delivery on another chan to another function to be processed. Also, I haven't tested the code below against a live broker.)
func handle(deliveries <-chan amqp.Delivery, done chan error) {
	select {
	case d := <-deliveries:
		// Do stuff with the delivery d.
		// Send any errors down the done chan, for example:
		// done <- err
		_ = d
	default:
		done <- nil
	}
}
It really depends on what you are trying to do. If you want to get only one message from the queue (the first one), you should probably use basic.get; if you are planning to process all incoming messages from the queue, basic.consume is what you want.
This is probably not a platform- or library-specific question, but rather a question of understanding the protocol.
UPD
I'm not familiar with the Go language, so I will try to give you a brief overview of the AMQP details and describe the use cases.
You may run into trouble and overhead with basic.consume sometimes:
With basic.consume you have this workflow:
send the basic.consume method to notify the broker that you want to receive messages
since this is a synchronous method, wait for the basic.consume-ok message from the broker
start listening for basic.deliver messages from the server
this is an asynchronous method, so you have to handle situations where no messages are available on the server yourself, e.g. by limiting the reading time
With basic.get you have this workflow:
send the synchronous method basic.get to the broker
wait for the basic.get-ok method, which holds the message(s), or the basic.empty method, which denotes that no message is available on the server
Note about synchronous and asynchronous methods: a synchronous method expects some response, whereas an asynchronous one doesn't.
Note on the basic.qos method's prefetch-count property: it is ignored when the no-ack property is set on basic.consume or basic.get.
The spec has a note on basic.get: "this method provides a direct access to the messages in a queue using a synchronous dialogue that is designed for specific types of application where synchronous functionality is more important than performance", which argues against it for continuous message consumption.
My personal tests show that getting 1000 messages in a row with basic.get (0.38659715652466 s) is faster than getting 1000 messages with basic.consume one by one (0.47398710250854 s) on RabbitMQ 3.0.1, Erlang R14B04, by more than 15% on average.
If consuming only one message in the main thread is your case, you probably have to use basic.get.
You can still consume just one message asynchronously, for example in a separate thread or with some event mechanism. That can sometimes be better for your machine's resources, but you have to handle the situation where no message is available in the queue.
If you have to process messages one by one, it is obvious that basic.consume should be used, I think.