From my understanding of the Go scheduler, Go's scheduling algorithm is partially preemptive: goroutine switches happen when a goroutine calls a function or blocks on I/O.
Does a goroutine switch happen when sending a message to a channel?
// goroutine A
ch <- message
// some additional code without function calls
// goroutine B
message := <- ch
In the code above, I want the code after ch <- message in A to be executed before switching to B. Is this guaranteed, or does B get scheduled right after A sends a message on ch?
A's channel send can block, at which point it yields to the scheduler and you have no guarantee when A will receive control again. It might be after the code you're interested in in B. So the sample code has problems even with GOMAXPROCS=1.
Stepping back: when preemption happens is an implementation detail; it has changed in the past (there wasn't always a chance of preemption on function call) and may change in the future. In terms of the memory model, your program is incorrect if it relies on facts about when code executes that happen to be true today but aren't guaranteed. If you want to block some code in B from running until A does something, you need to figure out a way to arrange that using channels or sync primitives.
And as user JimB notes, you don't even need to consider preemption to run into problems with the sample code. A and B could be running simultaneously on different CPU cores, and the code after the receive in B could run while the code after the send in A is running.
My practical understanding of the language and runtime says that without you blocking explicitly after ch <- message and before invoking goroutine B, you have no guarantees that A will complete or run before B. I don't know how that is actually implemented but I also don't care because I accept the goroutine abstraction at face value. Don't rely on coincidental functionality in your program. Just going off your example, my recommendation would be to pass a channel into goroutine A and then block waiting to receive off it in order to serialize A and B.
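A minimal sketch of that kind of explicit ordering, assuming a hypothetical done channel that A closes once its post-send work is finished. The names done, result and runOrdered are made up for illustration and are not from the question:

```go
package main

import "fmt"

// runOrdered returns the order in which the two steps ran.
func runOrdered() []string {
	ch := make(chan string)      // carries the message from A to B
	done := make(chan struct{})  // hypothetical: A closes it after its post-send work
	result := make(chan []string) // hands the recorded order back to the caller

	// order is written by A and then by B; the close of done and the
	// receive from done order those two accesses, so there is no data race.
	var order []string

	// goroutine A
	go func() {
		ch <- "message"
		order = append(order, "A: after send")
		close(done)
	}()

	// goroutine B
	go func() {
		msg := <-ch
		<-done // block until A has finished its post-send work
		order = append(order, "B: got "+msg)
		result <- order
	}()

	return <-result
}

func main() {
	fmt.Println(runOrdered())
}
```

Without the receive from done, B's code after the receive could run before, after, or at the same time as A's code after the send.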
Related
I am reading the go programming language book and there is this example in Chapter 8.4
func mirroredQuery() string {
	responses := make(chan string, 3)
	go func() { responses <- request("asia.gopl.io") }()
	go func() { responses <- request("americas.gopl.io") }()
	go func() { responses <- request("europe.gopl.io") }()
	return <-responses // return the quickest response
}
There is also this comment
Had we used an unbuffered channel, the two slower goroutines would have gotten stuck trying to send their responses on a channel from which no goroutine will ever receive.
This comment itself makes sense. But what happens to the two slow goroutines when mirroredQuery returns in the buffered case? Do they still run to finish or get cancelled?
EDIT: I understand that if the main goroutine exits, then the 2 slower goroutines will 'evaporate' whether or not they are still running. But what if the main goroutine is still running and mirroredQuery() has already returned: would the 2 slow goroutines run to the end? Basically, does responses still exist after mirroredQuery returns? If so, then it seems the 2 slow goroutines can finish in principle; if not, then we still have a leak just like in the unbuffered case?
When the main goroutine returns, the entire runtime system quits, rather abruptly. Hence any goroutines that are stuck waiting to send on an unbuffered or full channel simply ... cease to exist. They're not canceled, nor do they run, nor do they wait. Think of it as the flash paper being set on fire.
One can call this a goroutine leak, the same way one can refer to any resources (such as open files) not closed-or-freed before a program terminates a "leak". But since the entire process terminates, there's nothing left. There's no real leak here. It's not a tidy cleanup, but the system does clean up.
Here's a Playground link, as an example.
(If you make use of things not defined by the Go system itself, you could get various leaks that way. For instance, in the POSIX shared memory world, you can create shared memory objects (shm_open) and if you never close and unlink them, they persist. This is by design: they're meant to act a lot like files in a file system, except that they exist in memory, rather than on some sort of disk drive or whatever. But this is far outside normal everyday Go programming.)
Re your edit: if the main goroutine has not exited, so that the program is still running, the other goroutines continue to run (or wait) until they run out of things to do and return themselves, or do something that causes them to exit (such as call runtime.Goexit, or do something that causes a panic). In this case, that's: wait for a response, then send the response into the channel, then return. Assuming they get a response, they'll put the response into the channel. Assuming that putting the response into the channel works (does not panic and does not block), they will then return. Having returned, they are done and they evaporate. The channel itself persists and holds the strings: this is a resource leak, albeit a minor one, especially in a toy program.
If there are no references left to the channel itself, the channel itself will be garbage-collected, along with the strings in it; this cleans up the leaked resources. Since we assume that mirroredQuery has returned, and that at this point the last of the spun-off goroutines has also returned, that's the last reference to the channel, so now the channel can be GCed. (Whether and when this happens is up to the runtime.) Until the last of these goroutines finishes, there's still at least one reference to the channel, preventing the channel (and hence the strings) from being GCed.
Had the channel been unbuffered, the two "losing" goroutines would block in the attempt to send into the channel. That would cause those goroutines to remain, which in turn would cause the channel to remain, which in turn would cause the resources to remain allocated until the program as a whole terminates. So that would be "bad".
Had mirroredQuery closed the channel, the two "losing" goroutines could attempt to send on a closed channel, which would cause them to invoke the panic code, which would kill the program. That too would be "bad". The simplest code that achieves the desired result is to make the channel buffered.
Should one of the goroutines wait (for a response) for several years, that would hold those "leaked" resources for all those years. That would also be "bad" (slightly), so we'd want to make sure that they don't wait forever. But that's impractical in a small demonstration program.
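To watch the two "losing" goroutines finish in the buffered case, here is a rough sketch. The request function below is a hypothetical stand-in that just sleeps, the delays are invented, and the sync.WaitGroup is not part of the book's example; it is only there so the caller can observe that all three senders return:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// mirror pairs a host with a simulated response delay (made up for this sketch).
type mirror struct {
	host  string
	delay time.Duration
}

// request is a hypothetical stand-in for the book's request function:
// it sleeps for the mirror's delay and then returns the host name.
func request(m mirror) string {
	time.Sleep(m.delay)
	return m.host
}

// mirroredQuery returns the quickest response. The WaitGroup only lets the
// caller observe when the sending goroutines have returned.
func mirroredQuery(wg *sync.WaitGroup) string {
	responses := make(chan string, 3) // one slot per sender, so no send can block
	mirrors := []mirror{
		{"asia.gopl.io", 0},
		{"americas.gopl.io", 50 * time.Millisecond},
		{"europe.gopl.io", 100 * time.Millisecond},
	}
	for _, m := range mirrors {
		wg.Add(1)
		go func(m mirror) {
			defer wg.Done()
			responses <- request(m)
		}(m)
	}
	return <-responses
}

// queryAndWait runs mirroredQuery and then waits for all three goroutines.
// It would hang forever if any sender were stuck on its send.
func queryAndWait() string {
	var wg sync.WaitGroup
	winner := mirroredQuery(&wg)
	wg.Wait()
	return winner
}

func main() {
	fmt.Println("winner:", queryAndWait())
	fmt.Println("all three goroutines have returned")
}
```

Changing the channel to make(chan string) would leave the two slower goroutines blocked on their sends, so wg.Wait() would never return.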
I am going through a tutorial on building web servers using go.
Instead of directly using the http.ListenAndServe() function, the author creates an http.Server struct.
He then proceeds by:
creating a buffered channel for listening for errors
serverErrors := make(chan error, 1)
spawning the http listening goroutine that binds to that channel
go func() {
	fmt.Println("starting...")
	serverErrors <- api.ListenAndServe()
}()
The reason behind using a buffered channel is according to the instructor
so that the goroutine can exit if we do not collect this error
Further down in the program there is indeed a select block where the errors coming from this channel are collected.
Can anyone please help me understand how the goroutine gets to exit if we don't collect the error?
What would be the practical difference had we used an unbuffered channel?
Short answer:
For any channel (buffered or not), channel reads block if nothing is written to the channel.
For non-buffered channels, channel writes will block if no one is listening.
It is a common technique with error-channels (since only one item will ever be written to the channel), to make it a buffered channel of size 1. It ensures the write will happen without blocking - and the writer goroutine can continue on its way and return.
Therefore the service does not rely on the client caller reading from the error channel to perform its cleanup.
Note: for a channel to be reclaimed by the GC, it only has to become unreachable - it does not need to be fully drained. Nor does it need to be closed. Once neither end holds a reference to it any more, it will be GC'ed.
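A small sketch of the difference, under the assumption that nobody ever reads the error. Here serve is a hypothetical stand-in for api.ListenAndServe that fails immediately, and senderExits and the 100ms timeout are invented for the demonstration:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// serve is a hypothetical stand-in for api.ListenAndServe: it fails right away.
func serve() error { return errors.New("listen failed") }

// senderExits starts a goroutine that sends serve's error on a channel with
// the given buffer size. Nobody ever receives from that channel. It reports
// whether the goroutine managed to return within a short timeout.
func senderExits(buffer int) bool {
	serverErrors := make(chan error, buffer)
	exited := make(chan struct{})
	go func() {
		defer close(exited)
		serverErrors <- serve() // blocks forever if buffer == 0 and nobody receives
	}()
	select {
	case <-exited:
		return true
	case <-time.After(100 * time.Millisecond):
		return false // the sender is still stuck on its send
	}
}

func main() {
	fmt.Println("buffer of 1, sender exits:", senderExits(1))
	fmt.Println("unbuffered, sender exits:", senderExits(0))
}
```

With a buffer of 1 the send completes on its own and the goroutine returns; with an unbuffered channel it stays parked on the send for the rest of the program.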
If you refer to the code for ListenAndServe(), you'll notice the following comments on how it works. Quoting from there:
// ListenAndServe always returns a non-nil error. After Shutdown or Close,
// the returned error is ErrServerClosed.
Also,
// When Shutdown is called, Serve, ListenAndServe, and
// ListenAndServeTLS immediately return ErrServerClosed. Make sure the
// program doesn't exit and waits instead for Shutdown to return.
Your select block waits for the error returned after Shutdown, on the assumption that you are handling the server's shutdown gracefully, so the goroutine does not exit before the server has finished closing.
In the case of func (srv *Server) Close() (e.g. most people use defer srv.Close(), right?):
// Close immediately closes all active net.Listeners and any
// connections in state StateNew, StateActive, or StateIdle. For a
// graceful shutdown, use Shutdown.
// Close returns any error returned from closing the Server's
// underlying Listener(s).
So the same explanation as above applies to using the select block.
Now, let's categorize channels as buffered and unbuffered. If we care about a guarantee of delivery of the signal (the communication over the channel), an unbuffered channel ensures it. A buffered channel of size 1, which is your case, also ensures delivery, but the delivery might be delayed.
Let's elaborate unbuffered channels:
A send operation on an unbuffered channel blocks the sending goroutine until another
goroutine executes a corresponding receive on that same channel, at which point the value
is transmitted and both goroutines may continue.
Conversely, if the receive (<-ch) is attempted before the send, the
receiving goroutine is blocked until the corresponding send operation occurs on the
same channel in another goroutine.
These points show the synchronous nature of unbuffered channels.
Remember, func main() is also a goroutine.
Let's elaborate buffered channels:
A send operation on a buffered channel pushes an element at the back of the queue,
and a receive operation pops an element from the front of the queue.
1. If the channel is full, the send operation blocks its goroutine until space is made available by another goroutine's receive.
2. If the channel is empty, a receive operation blocks until a value is sent by another goroutine.
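The queue behaviour described above can be seen in a few lines. This is only an illustrative sketch; fifoDemo and the values are made up:

```go
package main

import "fmt"

// fifoDemo fills a buffered channel of capacity 2 and drains it again,
// returning the two received values and how many were queued in between.
func fifoDemo() (first, second, queued int) {
	ch := make(chan int, 2)
	ch <- 10 // goes into the buffer, does not block
	ch <- 20 // buffer is now full; a third send would block until a receive
	queued = len(ch)
	first = <-ch  // receives come out in the order they were sent
	second = <-ch // buffer is now empty; another receive would block
	return first, second, queued
}

func main() {
	first, second, queued := fifoDemo()
	fmt.Println(first, second, queued) // prints "10 20 2"
}
```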
So in your case the size of the channel is 1. The sender goroutine can send without blocking, because the buffer holds the value until the receiving goroutine dequeues it. But, if you remember, I mentioned delayed delivery for a channel of size 1, as we don't know how much time it will take for the receiving goroutine to get to it.
Hence a select block is used to wait on the channel. And from the referenced code's documentation, you can see
// Make sure the program doesn't exit and waits instead for Shutdown to return.
Also, for more clarity, you can refer: Behaviour of channels
The author explains it with pure clarity.
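For reference, a rough sketch of how the buffered error channel, the select block and Shutdown can fit together. The address 127.0.0.1:0, the 50ms timer standing in for an OS signal, and the function name runAndShutdown are all assumptions made for this example, not the tutorial's code:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// runAndShutdown starts a server, requests a graceful shutdown, and reports
// whether ListenAndServe returned http.ErrServerClosed as documented.
func runAndShutdown() bool {
	// Port 0 asks the OS for any free port (an assumption for this sketch).
	api := &http.Server{Addr: "127.0.0.1:0", Handler: http.NotFoundHandler()}

	// Buffered so the goroutine can finish its send even if nobody receives.
	serverErrors := make(chan error, 1)
	go func() {
		serverErrors <- api.ListenAndServe()
	}()

	// Stand-in for an OS signal: request shutdown after a short delay.
	shutdown := time.After(50 * time.Millisecond)

	select {
	case err := <-serverErrors:
		// The server stopped on its own, for example because it could not bind.
		fmt.Println("server error:", err)
		return false
	case <-shutdown:
		ctx, cancel := context.WithTimeout(context.Background(), time.Second)
		defer cancel()
		if err := api.Shutdown(ctx); err != nil {
			fmt.Println("shutdown error:", err)
			return false
		}
		// After Shutdown, ListenAndServe returns ErrServerClosed.
		return errors.Is(<-serverErrors, http.ErrServerClosed)
	}
}

func main() {
	fmt.Println("clean shutdown:", runAndShutdown())
}
```

If the shutdown branch never bothered to read serverErrors, the goroutine would still be able to deposit its error in the one-slot buffer and return.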
The question is in the title. Let's say I have several goroutines (more than 100), all of which eventually send data to one chan (name it mychan := make(chan int)). Another goroutine does <- mychan in an endless for loop. Is that okay, or can the chan happen to lose some data? Should I use a buffered chan instead? Or perhaps I should create a chan and a "daemon" goroutine that extracts messages for each worker goroutine?
If something has been successfully sent into the channel then no, it can't be lost in a correctly working environment (I mean if you're tampering with your memory or you have bit flips due to cosmic rays then don't expect anything of course).
A message is successfully sent when ch <- x returns. Otherwise, if it panics, it's not really being sent, and if you don't recover then you could claim it's lost (however, it would be lost due to application logic). A panic can happen if the channel is closed or, say, you're out of memory.
Similarly, if the sender is putting into the channel in non-blocking mode (by using select), you should have a sufficient buffer in your channel, because messages can be "lost" (although somewhat intentionally). For example, signal.Notify works this way:
Package signal will not block sending to c: the caller must ensure that c has sufficient buffer space to keep up with the expected signal rate.
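A sketch of such a non-blocking send, to show where the "loss" comes from. trySend and countDelivered are hypothetical helpers written for this illustration, not part of any package:

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether the value was accepted.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		// No free buffer slot and no receiver waiting: the value is dropped.
		return false
	}
}

// countDelivered tries to send n values without blocking into a channel with
// the given buffer size that nobody reads, and returns how many were accepted.
func countDelivered(buffer, n int) int {
	ch := make(chan int, buffer)
	delivered := 0
	for i := 0; i < n; i++ {
		if trySend(ch, i) {
			delivered++
		}
	}
	return delivered
}

func main() {
	fmt.Println(countDelivered(1, 5)) // prints 1: four values were dropped
	fmt.Println(countDelivered(5, 5)) // prints 5: the buffer absorbed the burst
}
```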
No, they can't be lost.
While the language spec does not in any way impose any particular implementation on channels, you can think of them as semaphores protecting either a single value (for the single message) or an array/list of them (for buffered channels).
The semantics are then enforced in such a way that as soon as a goroutine wants to send a message to a channel, it tries to acquire a free data slot using that semaphore, and then either succeeds at sending—there's a free slot for its message—or blocks—when there isn't. As soon as such a slot appears—someone has received an existing message—the sending succeeds and the sending goroutine gets unblocked.
This is a simplified explanation. In other words, channels in Go are not like message queues, which are usually happy to lose messages.
On a side note, I'm not really sure what happens if the receiver panics in some specific state when it's about to receive your message. In other words, I'm not sure whether Go guarantees that the message is either sent or not in the presence of a receiver panicking in an unfortunate moment.
Oh, and there's that grey area of the main goroutine exiting (the one running the main.main() function): the spec states clearly that the main goroutine does not wait for any other goroutines to complete when it exits. So unless you somehow arrange for a synchronized, controlled shutdown of all your spawned goroutines, I believe they may lose messages. On the other hand, in this case the world is collapsing anyway…
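One way to arrange that kind of controlled shutdown is sketched below, assuming a sync.WaitGroup that tracks the senders and a helper goroutine that closes the channel once they are all done. fanIn and its parameters are invented for the example:

```go
package main

import (
	"fmt"
	"sync"
)

// fanIn starts the given number of workers, each sending perWorker values on
// one shared unbuffered channel, and returns how many values were received.
func fanIn(workers, perWorker int) int {
	mychan := make(chan int) // unbuffered, as in the question
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				mychan <- 1 // blocks until the receiver takes the value
			}
		}()
	}

	// Close the channel only after every sender has finished, so the
	// receiving loop below knows when to stop.
	go func() {
		wg.Wait()
		close(mychan)
	}()

	received := 0
	for v := range mychan {
		received += v
	}
	return received
}

func main() {
	fmt.Println(fanIn(100, 10)) // prints 1000
}
```

Because the receiver keeps reading until the channel is closed, and the channel is closed only after every send has returned, every value sent is also received.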
A message cannot be lost. It can fail to be sent. The order in which goroutines execute is not defined, so your endless for loop may receive from only one worker all the time, and may even sleep if it isn't in the main thread. To be sure your queue works in a regular fashion, you'd better explicitly receive messages for each worker in 'main'.
Is it more idiomatic to have an async api, with a blocking function as the synchronous api that simply calls the async api and waits for an answer before returning, rather than using a non-concurrent api and let the caller run it in their own goroutine if they want it async?
In my current case I have a worker goroutine that reads from a request channel and sends the return value down the response channel (that it got in a request struct from the request channel).
This seems to differ from the linked question since I need the return values, or to synchronize so that I can be sure the api call finishes before I do something else, to avoid race conditions.
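For concreteness, the arrangement described here might look roughly like the following. All names (request, worker, squareAsync, squareSync) and the squaring "work" are hypothetical, chosen only to make the sketch runnable:

```go
package main

import "fmt"

// request carries the input and the channel on which the worker replies.
type request struct {
	n        int
	response chan int
}

// worker reads requests until the channel is closed and answers each one.
func worker(requests <-chan request) {
	for req := range requests {
		req.response <- req.n * req.n
	}
}

// squareAsync hands the request to the worker and returns a channel that
// will deliver the result. It returns once the worker has accepted the request.
func squareAsync(requests chan<- request, n int) <-chan int {
	// Buffered so the worker is never stuck if the caller abandons the result.
	resp := make(chan int, 1)
	requests <- request{n: n, response: resp}
	return resp
}

// squareSync is the blocking API: it calls the async API and waits for the answer.
func squareSync(requests chan<- request, n int) int {
	return <-squareAsync(requests, n)
}

// demo wires up one worker and makes a single synchronous call.
func demo() int {
	requests := make(chan request)
	go worker(requests)
	defer close(requests) // lets the worker goroutine return
	return squareSync(requests, 7)
}

func main() {
	fmt.Println(demo()) // prints 49
}
```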
For golang, I recommend the concurrency section of Effective Go. In particular, I think everyone using golang needs to know the basics of goroutines and parallelization:
Goroutines are multiplexed onto multiple OS threads so if one should block, such as while waiting for I/O, others continue to run. Their design hides many of the complexities of thread creation and management.
The current implementation of the Go runtime dedicates only a single core to user-level processing. An arbitrary number of goroutines can be blocked in system calls, but by default only one can be executing user-level code at any time.
I understand from this question "Golang - What is channel buffer size?" that if the channel is buffered it won't block.
c := make(chan int, 1)
c <- data1 // doesn't block
c <- data2 // blocks until another goroutine receives from the channel
c <- data3
c <- data4
But I don't understand what the use of it is. Suppose I have 2 goroutines: the 1st one will receive data1 and the 2nd one will receive data2, and then the sender will block until one of the goroutines is free to process data3.
I don't understand what difference it made. It would have executed the same way without a buffer. Can you explain a possible scenario where buffering is useful?
A buffered channel allows the goroutine that is adding data to the buffered channel to keep running and doing things, even if the goroutines reading from the channel are starting to fall behind a little bit.
For example, you might have one goroutine that is receiving HTTP requests and you want it to be as fast as possible. However you also want it to queue up some background job, like sending an email, which could take a while. So the HTTP goroutine just parses the user's request and quickly adds the background job to the buffered channel. The other goroutines will process it when they have time. If you get a sudden surge in HTTP requests, the users will not notice any slowness in the HTTP if your buffer is big enough.
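A stripped-down sketch of that scenario, with the HTTP part left out. The job names, the function enqueueBurst and the fact that the worker only starts after the burst are assumptions made to keep the example deterministic:

```go
package main

import "fmt"

// enqueueBurst queues jobs on a buffered channel before any worker is
// running, then starts one worker to drain it. It returns how many jobs were
// sitting in the buffer after the burst and how many the worker processed.
// jobs must not exceed buffer, otherwise the producer would block.
func enqueueBurst(buffer, jobs int) (queued, processed int) {
	queue := make(chan string, buffer)

	// The producer (think: the HTTP goroutine) returns from each send
	// immediately as long as there is room in the buffer.
	for i := 0; i < jobs; i++ {
		queue <- fmt.Sprintf("email-%d", i)
	}
	queued = len(queue)
	close(queue) // no more jobs will be added

	// The slow background worker catches up later.
	done := make(chan int)
	go func() {
		n := 0
		for range queue {
			n++ // stand-in for actually sending the email
		}
		done <- n
	}()

	return queued, <-done
}

func main() {
	queued, processed := enqueueBurst(3, 3)
	fmt.Println(queued, processed) // prints "3 3"
}
```

With an unbuffered channel the first send in the loop would block until a worker was ready, which is exactly the slowdown the buffer is meant to hide.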
This site has a good explanation:
https://www.openmymind.net/Introduction-To-Go-Buffered-Channels/