When to use a finalizer to close a channel? - go

This is the second of two questions (this is the first one) to help make sense of the Go generics proposal examples.
In particular, I am having trouble (so far) understanding two bits of code from the examples section of the proposal, entitled "Channels":
The second issue I have is with the following definition of the Ranger function.
Namely, I don't understand the need to call runtime.SetFinalizer(r, r.finalize) when, in fact, all the finalize method of the *Receiver[T] type is supposed to do is signal that the receiver is done receiving values (close(r.done)).
The way I see it, by providing a finalizer for a *Receiver[T] the code is delegating the obligation to close the receiver to the runtime.
The way I understand this piece of code, the *Receiver[T] signals to the *Sender[T] that it won't be receiving any more values when the GC decides that the former is unreachable, i.e. no more references to it are available.
If my interpretation is correct, why wait that long for the receiver to signal it's done? Isn't it possible to handle the close operation explicitly in the code somehow?
Thanks.
Code:
package ranger

import "runtime"

// Ranger provides a convenient way to exit a goroutine sending values
// when the receiver stops reading them.
//
// Ranger returns a Sender and a Receiver. The Receiver provides a
// Next method to retrieve values. The Sender provides a Send method
// to send values and a Close method to stop sending values. The Next
// method indicates when the Sender has been closed, and the Send
// method indicates when the Receiver has been freed.
func Ranger[T any]() (*Sender[T], *Receiver[T]) {
	c := make(chan T)
	d := make(chan bool)
	s := &Sender[T]{values: c, done: d}
	r := &Receiver[T]{values: c, done: d}
	// The finalizer on the receiver will tell the sender
	// if the receiver stops listening.
	// runtime.SetFinalizer needs a func that takes *Receiver[T] as its
	// single argument, so the method expression is used here (the
	// proposal text wrote r.finalize, whose type is func()).
	runtime.SetFinalizer(r, (*Receiver[T]).finalize)
	return s, r
}

// A Sender is used to send values to a Receiver.
type Sender[T any] struct {
	values chan<- T
	done   <-chan bool
}

// Send sends a value to the receiver. It reports whether any more
// values may be sent; if it returns false the value was not sent.
func (s *Sender[T]) Send(v T) bool {
	select {
	case s.values <- v:
		return true
	case <-s.done:
		// The receiver has stopped listening.
		return false
	}
}

// Close tells the receiver that no more values will arrive.
// After Close is called, the Sender may no longer be used.
func (s *Sender[T]) Close() {
	close(s.values)
}

// A Receiver receives values from a Sender.
type Receiver[T any] struct {
	values <-chan T
	done   chan<- bool
}

// Next returns the next value from the channel. The bool result
// reports whether the value is valid. If the value is not valid, the
// Sender has been closed and no more values will be received.
func (r *Receiver[T]) Next() (T, bool) {
	v, ok := <-r.values
	return v, ok
}

// finalize is a finalizer for the receiver.
// It tells the sender that the receiver has stopped listening.
func (r *Receiver[T]) finalize() {
	close(r.done)
}

TLDR: Your understanding is correct: the done channel may simply be closed by the receiver "manually" to signal the loss of interest (to stop the communication and relieve the sender of its duty).
Channels are used for goroutines to communicate in a concurrency safe manner. The idiomatic use is that the sender party keeps sending values, and once there are no more values to send, it is signaled by the sender closing the channel.
The receiver party keeps receiving from the channel until it is closed, which signals that there won't be (can't be) any more values coming on the channel. This is usually (and most easily) done with a for range loop over the channel.
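A minimal sketch of that idiomatic split of duties: the sender closes the channel when it runs out of values, and the receiver's for range loop ends on its own. The helper names produce and collect are hypothetical, chosen for illustration.

```go
package main

import "fmt"

// produce sends the given values on a new channel and then closes it,
// signaling that no more values will arrive. Only the sender closes.
func produce(values []int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for _, v := range values {
			ch <- v
		}
	}()
	return ch
}

// collect receives until the channel is closed; the for range loop
// exits by itself once the sender has closed the channel.
func collect(ch <-chan int) []int {
	var got []int
	for v := range ch {
		got = append(got, v)
	}
	return got
}

func main() {
	fmt.Println(collect(produce([]int{1, 2, 3})))
}
```

Note that the receiver here must drain the channel completely; if collect stopped early, the producing goroutine would stay blocked on its next send, which is exactly the situation Ranger() is designed for.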
So usually the receiver has to keep receiving until the channel is closed, otherwise the sender party would get blocked forever. Often this is OK (and sufficient).
The demonstrated Ranger() construct is for the less common case where the receiver needs (or may want) to stop the communication.
A single channel does not provide a means for the receiver party to signal the sender that it has lost interest and no more values are needed. This requires an additional channel which the receiver has to close (and which the sender, of course, has to monitor). As long as there's a single receiver, this is also OK. But if there are multiple receivers, closing the done channel gets a little more complicated: it's not OK for every receiver to close the done channel, because closing an already closed channel panics. So the receivers also have to be coordinated, so that only a single receiver, or rather the coordinating party itself, closes the done channel, exactly once, and only after all receivers have "abandoned" the channel.
Ranger() helps with this in a simple way, by delegating the closing of the done channel to a finalizer. This is acceptable because usually it wouldn't even be the receivers' task to stop the communication; but in the rare case that this does arise, it will still be dealt with (in an easy way, without the need for an additional coordinator goroutine).
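For the explicit alternative the TLDR mentions, here is a minimal sketch (Go 1.18+ generics) of a Ranger-like pair whose Receiver gets a Stop method instead of a finalizer. NewPair and Stop are hypothetical names that are not part of the proposal, and done is a chan struct{} rather than chan bool. A sync.Once makes Stop safe to call any number of times, from any number of receivers, which is one way to handle the "close only once" coordination problem described above.

```go
package main

import (
	"fmt"
	"sync"
)

// Sender and Receiver mirror the proposal's types; Receiver gains a
// hypothetical Stop method in place of the finalizer.
type Sender[T any] struct {
	values chan<- T
	done   <-chan struct{}
}

type Receiver[T any] struct {
	values <-chan T
	done   chan struct{}
	once   sync.Once // guards close(done)
}

// NewPair is a hypothetical constructor; unlike Ranger it sets no finalizer.
func NewPair[T any]() (*Sender[T], *Receiver[T]) {
	c := make(chan T)
	d := make(chan struct{})
	return &Sender[T]{values: c, done: d}, &Receiver[T]{values: c, done: d}
}

// Send reports whether the value was delivered; false means the
// receiver has stopped listening.
func (s *Sender[T]) Send(v T) bool {
	select {
	case s.values <- v:
		return true
	case <-s.done:
		return false
	}
}

func (s *Sender[T]) Close() { close(s.values) }

func (r *Receiver[T]) Next() (T, bool) {
	v, ok := <-r.values
	return v, ok
}

// Stop closes done exactly once, so repeated or concurrent calls
// can never trigger a "close of closed channel" panic.
func (r *Receiver[T]) Stop() {
	r.once.Do(func() { close(r.done) })
}

func main() {
	s, r := NewPair[int]()
	sent := make(chan int)
	go func() {
		defer s.Close()
		n := 0
		for s.Send(n) {
			n++
		}
		sent <- n // number of values actually delivered
	}()
	for i := 0; i < 3; i++ {
		v, _ := r.Next()
		fmt.Println("got", v)
	}
	r.Stop()
	r.Stop() // no-op: done is already closed
	fmt.Println("sender delivered", <-sent, "values")
}
```

The trade-off versus the finalizer is that the receiver side must remember to call Stop; in exchange, the sender is released immediately instead of whenever the GC gets around to it.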

Related

How does this go-routine in an anonymous function exactly work?

func (s *server) send(m *message) error {
	go func() {
		s.outgoingMessageChan <- m
	}()
	return nil
}

func main(s *server) {
	for {
		select {
		case <-someChannel:
			// do something
		case msg := <-s.outgoingMessageChan:
			// take message sent from "send" and do something
		}
	}
}
I am pulling values out of s.outgoingMessageChan in another function. Before I used an anonymous go function, a call to send would usually block: whenever send was called, s.outgoingMessageChan <- m would block until something pulled the value out. After wrapping it like this, it no longer seems to block. I understand that it kind of sends this operation to the background and proceeds as usual, but I can't wrap my head around why this doesn't affect the current function call.
Each time send is called, a new goroutine is created and send returns immediately. (BTW, there is no reason to return an error if there can never be an error.) The goroutine (which has its own "thread" of execution) will block if nothing is ready to read from the chan (assuming it's unbuffered). Once the message is read off the chan, the goroutine will continue, but since it does nothing else it will simply end.
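A minimal runnable sketch of that behavior. The question does not show the channel's element type or the message struct, so chan *message and the text field are assumptions; the error result is dropped.

```go
package main

import "fmt"

type message struct{ text string } // assumed shape

type server struct {
	outgoingMessageChan chan *message // assumed element type
}

// send starts a goroutine and returns at once; that new goroutine,
// not the caller, is the one that blocks until someone receives.
func (s *server) send(m *message) {
	go func() {
		s.outgoingMessageChan <- m
	}()
}

func main() {
	s := &server{outgoingMessageChan: make(chan *message)} // unbuffered
	s.send(&message{text: "hello"}) // nobody is receiving yet, but this returns
	fmt.Println("send returned")
	m := <-s.outgoingMessageChan // now the parked goroutine completes its send
	fmt.Println("received:", m.text)
}
```

One side effect worth knowing: if send is called twice before anything is read, the two goroutines race, so the order in which the messages come out of the channel is not guaranteed.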
I should point out that there is no such thing as an anonymous goroutine. Goroutines have no identifier at all (except for a number that you should only use for debugging purposes). What you have is an anonymous function with the go keyword in front of it, which causes it to run in a separate goroutine.
If you want a send function that blocks, as you seem to, just use:
func (s *server) send(m *message) {
	s.outgoingMessageChan <- m
}
However, I can't see any point in this function (though it would be inlined and just as efficient as not using a function).
I suspect you may be calling send many times before anything is read from the chan. In this case many new goroutines will be created (one each time you call send), and they will all block. Each time the chan is read from, one of them will unblock, delivering its value, and that goroutine will terminate. Doing this, you are simply creating an inefficient buffering mechanism. Moreover, if send is called for a prolonged period at a faster rate than the values can be read from the chan, you will eventually run out of memory. Better would be to use a buffered chan (and no goroutines) which, once full, exerts "back-pressure" on whatever is producing the messages.
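A minimal sketch of the buffered alternative, assuming the same hypothetical server and message types as in the question; newServer and the capacity of 2 are illustrative choices, not from the original.

```go
package main

import "fmt"

type message struct{ text string } // assumed shape

type server struct {
	outgoingMessageChan chan *message
}

// newServer is a hypothetical constructor that creates a buffered channel.
func newServer(capacity int) *server {
	return &server{outgoingMessageChan: make(chan *message, capacity)}
}

// send needs no goroutine: it completes immediately while the buffer has
// room and blocks (back-pressure) only when the buffer is full.
func (s *server) send(m *message) {
	s.outgoingMessageChan <- m
}

func main() {
	s := newServer(2)
	s.send(&message{text: "first"})
	s.send(&message{text: "second"}) // still fits in the buffer
	fmt.Println("buffered:", len(s.outgoingMessageChan), "of", cap(s.outgoingMessageChan))
	// A third send here would block until a receive frees a slot.
	fmt.Println((<-s.outgoingMessageChan).text)
	s.send(&message{text: "third"}) // a slot is free again
	fmt.Println((<-s.outgoingMessageChan).text)
	fmt.Println((<-s.outgoingMessageChan).text)
}
```

Unlike the goroutine-per-send version, memory use is bounded by the buffer capacity, and messages come out in the order they were sent.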
Another point is that the function name main is used to identify the entry point of a program. Please use another name for your second function above. It also seems like it should be a method (with an s *server receiver) rather than a function.

Why does net/rpc/client's Go method require a buffered channel?

I am unable to figure out why the method requires you to specifically provide a buffered channel.
From the documentation,
func (*Client) Go
func (client *Client) Go(serviceMethod string, args interface{}, reply interface{}, done chan *Call) *Call
Go invokes the function asynchronously. It returns the Call structure representing the invocation. The done channel will signal when the call is complete by returning the same Call object. If done is nil, Go will allocate a new channel. If non-nil, done must be buffered or Go will deliberately crash.
LeGEC alluded to this in their comment.
Digging further, you will find this bit in client.go:
func (call *Call) done() {
	select {
	case call.Done <- call:
		// ok
	default:
		// We don't want to block here. It is the caller's responsibility to make
		// sure the channel has enough buffer space. See comment in Go().
		if debugLog {
			log.Println("rpc: discarding Call reply due to insufficient Done chan capacity")
		}
	}
}
What you can see here is that the library expects the call to be completely asynchronous. This means the done channel must have enough capacity to completely decouple the two parties (i.e. no blocking at all).
Furthermore, a select statement with a default case, as used here, is the idiomatic way to do a non-blocking channel operation.

Is it safe to hide sending to channel behind function call

I have a struct called Hub with a Run() method which is executed in its own goroutine. This method sequentially handles incoming messages. Messages arrive concurrently from multiple producers (separate goroutines). Of course I use a channel to accomplish this task. But now I want to hide the Hub behind an interface to be able to choose between implementations. So exposing the channel as a plain field of Hub isn't appropriate.
package main

import (
	"fmt"
	"time"
)

type Hub struct {
	msgs chan string
}

func (h *Hub) Run() {
	for {
		msg, hasMore := <-h.msgs
		if !hasMore {
			return
		}
		fmt.Println("hub: msg received", msg)
	}
}

func (h *Hub) SendMsg(msg string) {
	h.msgs <- msg
}

func send(h *Hub, prefix string) {
	for i := 0; i < 5; i++ {
		fmt.Println("main: sending msg")
		h.SendMsg(fmt.Sprintf("%s %d", prefix, i))
	}
}

func main() {
	h := &Hub{make(chan string)}
	go h.Run()
	for i := 0; i < 10; i++ {
		go send(h, fmt.Sprintf("msg sender #%d", i))
	}
	time.Sleep(time.Second)
}
So I've introduced a Hub.SendMsg(msg string) method that just does h.msgs <- msg and that I can add to the HubInterface. As a Go newbie, I wonder: is it safe from a concurrency perspective? And if so, is it a common approach in Go?
Channel send semantics do not change when you move the send into a method. Andrew's answer points out that the channel needs to be created with make to send successfully, but that was always true, whether or not the send is inside a method.
If you are concerned about making sure callers can't accidentally wind up with invalid Hub instances with a nil channel, one approach is to make the struct type private (hub) and have a NewHub() function that returns a fully initialized hub wrapped in your interface type. Since the struct is private, code in other packages can't try to initialize it with an incomplete struct literal (or any struct literal).
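A minimal sketch of that approach, reusing the question's HubInterface name; NewHub follows the answer's suggestion, and the rest of the wiring (starting Run inside the constructor, ranging over the channel) is an assumption for illustration.

```go
package main

import "fmt"

// HubInterface is what other packages see.
type HubInterface interface {
	SendMsg(msg string)
}

// hub is unexported, so other packages cannot build one with a struct
// literal that leaves msgs nil.
type hub struct {
	msgs chan string
}

// NewHub returns a fully initialized hub, already running.
func NewHub() HubInterface {
	h := &hub{msgs: make(chan string)}
	go h.Run()
	return h
}

func (h *hub) Run() {
	for msg := range h.msgs {
		fmt.Println("hub: msg received", msg)
	}
}

func (h *hub) SendMsg(msg string) {
	h.msgs <- msg
}

func main() {
	h := NewHub()
	for i := 0; i < 3; i++ {
		h.SendMsg(fmt.Sprintf("msg %d", i)) // returns once Run has received it
	}
	// main may return before Run prints the last message; a real program
	// would add a shutdown mechanism (the question uses time.Sleep).
}
```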
That said, it's often possible to create invalid or nonsense values in Go, and that's accepted: net.IP("HELLO THERE BOB") is valid syntax, as is net.IP{}. So if you think it's better to expose your Hub type, go ahead.
Easy answer
Yes
Better answer
No
Channels are great for emitting data from unknown goroutines. They do so safely; however, I would recommend being careful with a few parts. In the listed example the channel is created by the consumer when it constructs the struct (rather than by a constructor function).
Say the consumer creates the Hub like this: &Hub{}. Perfectly valid... apart from the fact that every call to SendMsg() will block forever (a send on a nil channel never proceeds). Luckily you placed those calls in their own goroutines, so you're still fine, right? Wrong. You are now leaking goroutines. It seems fine... until you run this for a while. Go encourages you to have valid zero values, and in this case &Hub{} is not valid.
Ensuring SendMsg() won't block could be achieved with a select statement that has a default case; however, you then have to decide what to do when you hit the default case (e.g. throw the data away). The channel could also block for reasons other than bad setup. Say you later do more than simply print the data after reading it from the channel. What if the read gets very slow, or blocks on IO? You will then start pushing back on the producers.
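A minimal sketch of that select-based variant, using a hypothetical TrySendMsg method (not in the question) whose chosen policy for the default case is to drop the message and report false. It also shows that with a zero-value &Hub{}, the nil channel is simply never ready, so nothing blocks or leaks.

```go
package main

import "fmt"

type Hub struct {
	msgs chan string
}

// TrySendMsg never blocks: if the send cannot proceed right now (no
// ready receiver, a full buffer, or a nil channel in a zero-value Hub),
// the default case drops the message.
func (h *Hub) TrySendMsg(msg string) bool {
	select {
	case h.msgs <- msg:
		return true
	default:
		return false
	}
}

func main() {
	zero := &Hub{} // msgs is nil: a plain send would block forever
	fmt.Println("zero-value hub accepted:", zero.TrySendMsg("hello"))

	h := &Hub{msgs: make(chan string, 1)}
	fmt.Println("first:", h.TrySendMsg("a"))  // fits in the buffer
	fmt.Println("second:", h.TrySendMsg("b")) // buffer full, dropped
}
```

Whether dropping is acceptable depends on the data; callers at least learn about it from the bool instead of silently leaking a goroutine.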
Ultimately, channels allow you to not think much about concurrency... However, if this is high-throughput code, you have quite a bit to consider. If it is production code, you need to understand that your API here involves SendMsg() blocking.

Behavior of sleep and select in go

I'm trying to understand a bit more about what happens under the surface during various blocking/waiting types of operations in Go. Take the following example:
otherChan := make(chan int)
t := time.NewTicker(time.Second)
for {
	doThings()
	// OPTION A: Sleep
	time.Sleep(time.Second)
	// OPTION B: Blocking ticker
	<-t.C
	// OPTION C: Select multiple
	select {
	case <-otherChan:
	case <-t.C:
	}
}
From a low level view (system calls, cpu scheduling) what is the difference between these while waiting?
My understanding is that time.Sleep leaves the CPU free to perform other tasks until the specified time has elapsed. Does the blocking ticker <- t.C do the same? Is the processor polling the channel or is there an interrupt involved? Does having multiple channels in a select change anything?
In other words, assuming that otherChan never had anything put into it, would these three options execute in an identical way, or would one be less resource intensive than the others?
That's a very interesting question, so I cd'd into my Go source to start looking.
time.Sleep
time.Sleep is defined like this:
// src/time/sleep.go
// Sleep pauses the current goroutine for at least the duration d.
// A negative or zero duration causes Sleep to return immediately.
func Sleep(d Duration)
No body, and no definition in an OS-specific time_unix.go!?! A little searching reveals why: time.Sleep is actually defined in the runtime:
// src/runtime/time.go
// timeSleep puts the current goroutine to sleep for at least ns nanoseconds.
//go:linkname timeSleep time.Sleep
func timeSleep(ns int64) {
	// ...
}
Which in retrospect makes a lot of sense, as it has to interact with the goroutine scheduler. It ends up calling goparkunlock, which "puts the goroutine into a waiting state". time.Sleep creates a runtime.timer with a callback function that is called when the timer expires - that callback function wakes up the goroutine by calling goready. See next section for more details on the runtime.timer.
time.NewTicker
time.NewTicker creates a *Ticker, which has similar hooks into the runtime: a ticker is a struct that holds a runtimeTimer and a channel on which to signal the ticks. (time.Tick is a helper function that does the same thing but returns the ticker's receive channel, *Ticker.C, directly instead of the *Ticker, so you could have written your code with it instead.)
runtimeTimer is defined in the time package but it must be kept in sync with timer in src/runtime/time.go, so it is effectively a runtime.timer. Remember that in time.Sleep, the timer had a callback function to wake up the sleeping goroutine? In the case of *Ticker, the timer's callback function sends the current time on the ticker's channel.
Then, the real waiting/scheduling happens on the receive from the channel, which is essentially the same as the select statement unless otherChan sends something before the tick, so let's look at what happens on a blocking receive.
<- chan
Channels are implemented (now in Go!) in src/runtime/chan.go, by the hchan struct. Channel operations have matching functions, and a receive is implemented by chanrecv:
// chanrecv receives on channel c and writes the received data to ep.
// ep may be nil, in which case received data is ignored.
// If block == false and no elements are available, returns (false, false).
// Otherwise, if c is closed, zeros *ep and returns (true, false).
// Otherwise, fills in *ep with an element and returns (true, true).
func chanrecv(t *chantype, c *hchan, ep unsafe.Pointer, block bool) (selected, received bool) {
	// ...
}
This part has a lot of different cases. In your example it is a blocking receive from an asynchronous channel (time.NewTicker creates a channel with a buffer of 1), and it ends up calling... goparkunlock, again to allow other goroutines to proceed while this one is stuck waiting.
So...
In all cases, the goroutine ends up being parked (which is not really shocking - it can't make progress, so it has to leave its thread available for a different goroutine if there's any available). A glance at the code seems to suggest that the channel has a bit more overhead than a straight-up time.Sleep. However, it allows far more powerful patterns, such as the last one in your example: the goroutine can be woken up by another channel, whichever comes first.
To answer your other questions, regarding polling, the timers are managed by a goroutine that sleeps until the next timer in its queue, so it's working only when it knows a timer has to be triggered. When the next timer has expired, it wakes up the goroutine that called time.Sleep (or sends the value on the ticker's channel, it does whatever the callback function does).
There's no polling in channels: the waiting receiver is woken up when a send is made on the channel, in chansend in the chan.go file:
// wake up a waiting receiver
sg := c.recvq.dequeue()
if sg != nil {
	recvg := sg.g
	unlock(&c.lock)
	if sg.releasetime != 0 {
		sg.releasetime = cputicks()
	}
	goready(recvg, 3)
} else {
	unlock(&c.lock)
}
That was a fun dive into Go's source code; very interesting question! I hope I answered at least part of it!

Multiple receivers on a single channel. Who gets the data?

Unbuffered channels block receivers until data is available on the channel. It's not clear to me how this blocking behaves with multiple receivers on the same channel (say when using goroutines). I am sure they would all block as long as there is no data sent on that channel.
But what happens once I send a single value to that channel? Which receiver/goroutine will get the data and therefore unblock? All of them? The first in line? Random?
A single random (non-deterministic) one will receive it.
See the language spec:
Execution of a "select" statement proceeds in several steps:
1. For all the cases in the statement, the channel operands of receive operations and the channel and right-hand-side expressions of send statements are evaluated exactly once, in source order, upon entering the "select" statement. The result is a set of channels to receive from or send to, and the corresponding values to send. Any side effects in that evaluation will occur irrespective of which (if any) communication operation is selected to proceed. Expressions on the left-hand side of a RecvStmt with a short variable declaration or assignment are not yet evaluated.
2. If one or more of the communications can proceed, a single one that can proceed is chosen via a uniform pseudo-random selection. Otherwise, if there is a default case, that case is chosen. If there is no default case, the "select" statement blocks until at least one of the communications can proceed.
3. Unless the selected case is the default case, the respective communication operation is executed.
4. If the selected case is a RecvStmt with a short variable declaration or an assignment, the left-hand side expressions are evaluated and the received value (or values) are assigned.
5. The statement list of the selected case is executed.
By default the goroutine communication is synchronous and unbuffered: sends do not complete until there is a receiver to accept the value. There must be a receiver ready to receive data from the channel and then the sender can hand it over directly to the receiver.
So channel send/receive operations block until the other side is ready:
1. A send operation on a channel blocks until a receiver is available for the same channel: if there’s no recipient for the value on ch, no other value can be put in the channel. And the other way around: no new value can be sent on ch while the channel is not empty, so the send operation will wait until ch becomes available again.
2. A receive operation for a channel blocks until a sender is available for the same channel: if there is no value in the channel, the receiver blocks.
This is illustrated in the example below:
package main

import "fmt"

func main() {
	ch1 := make(chan int)
	go pump(ch1)       // pump hangs
	fmt.Println(<-ch1) // prints only 0
}

func pump(ch chan int) {
	for i := 0; ; i++ {
		ch <- i
	}
}
Because there is no further receiver, the pump goroutine hangs on its second send, and only the first number is printed before main exits.
To work around this, we need to define a receiver that reads from the channel in an infinite loop:
func receive(ch chan int) {
	for {
		fmt.Println(<-ch)
	}
}
Then in main():
func main() {
	ch := make(chan int)
	go pump(ch)
	receive(ch)
}
If the program allows multiple goroutines to receive on a single channel, the sender is not broadcasting; it is effectively distributing the values among the receivers, so each receiver should be equally able to process the data. Therefore it does not matter what mechanism the Go runtime uses to decide which of the many receiving goroutines will run (cf. https://github.com/golang/go/issues/247). But only ONE of them receives each sent item, whether or not the channel is buffered.
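A small sketch that checks this empirically: several receiver goroutines range over one unbuffered channel, and every sent value is recorded exactly once. The distribute helper is hypothetical, and which receiver gets which value varies from run to run.

```go
package main

import (
	"fmt"
	"sync"
)

// distribute sends n values on one unbuffered channel read by several
// receiver goroutines and records which receiver got each value.
func distribute(n, receivers int) map[int]int {
	ch := make(chan int)
	var mu sync.Mutex
	var wg sync.WaitGroup
	gotBy := make(map[int]int) // value -> receiver id

	for id := 0; id < receivers; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for v := range ch {
				mu.Lock()
				if _, dup := gotBy[v]; dup {
					panic("value received twice")
				}
				gotBy[v] = id
				mu.Unlock()
			}
		}(id)
	}

	for v := 0; v < n; v++ {
		ch <- v
	}
	close(ch) // ends every receiver's range loop
	wg.Wait()
	return gotBy
}

func main() {
	gotBy := distribute(10, 3)
	fmt.Println("values received:", len(gotBy))
	fmt.Println(gotBy) // value -> receiver; the assignment is non-deterministic
}
```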
There has been some discussion about this.
But what is established in the Go Memory Model is that it will be at most one of them.
Each send on a particular channel is matched to a corresponding receive from that channel, usually in a different goroutine.
That isn't as clear-cut as I would like, but further down they give this example of a semaphore implementation:
var limit = make(chan int, 3)

func main() {
	// work is a list of func() values defined elsewhere
	for _, w := range work {
		go func(w func()) {
			limit <- 1
			w()
			// if it were possible for more than one receiver to receive
			// from a single send, it would be possible for this to release
			// more than one "lock", making it an invalid semaphore
			// implementation
			<-limit
		}(w)
	}
	select {}
}
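A self-contained, runnable version of that semaphore pattern, as a sketch: runLimited is a hypothetical helper, time.Sleep stands in for w(), and it waits with a sync.WaitGroup instead of select {}, because an empty select with no other goroutines left makes the runtime abort with a deadlock error. It also records the highest number of jobs seen running at once, which should never exceed the channel's capacity.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited starts one goroutine per job but lets at most maxRunning
// of them work at once, using a buffered channel as a counting semaphore.
// It returns the highest number of jobs it observed running together.
func runLimited(jobs, maxRunning int) int64 {
	limit := make(chan int, maxRunning)
	var wg sync.WaitGroup
	var running, peak int64

	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			limit <- 1 // acquire: blocks while the buffer is full
			n := atomic.AddInt64(&running, 1)
			for {
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break // peak is now at least n
				}
			}
			time.Sleep(time.Millisecond) // stand-in for w()
			atomic.AddInt64(&running, -1)
			<-limit // release: frees exactly one slot
		}()
	}
	wg.Wait() // lets the program finish once all work is done
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println("peak concurrency:", runLimited(20, 3)) // never more than 3
}
```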
