Golang Concurrency Code Review of Codewalk - go

I'm trying to understand best practices for Golang concurrency. I read O'Reilly's book on Go's concurrency and then came back to the Golang Codewalks, specifically this example:
https://golang.org/doc/codewalk/sharemem/
This is the code I was hoping to review with you in order to learn a little bit more about Go. My first impression is that this code is breaking some best practices. This is of course my (very) unexperienced opinion and I wanted to discuss and gain some insight on the process. This isn't about who's right or wrong, please be nice, I just want to share my views and get some feedback on them. Maybe this discussion will help other people see why I'm wrong and teach them something.
I'm fully aware that the purpose of this code is to teach beginners, not to be perfect code.
Issue 1 - No Goroutine cleanup logic
func main() {
// Create our input and output channels.
pending, complete := make(chan *Resource), make(chan *Resource)
// Launch the StateMonitor.
status := StateMonitor(statusInterval)
// Launch some Poller goroutines.
for i := 0; i < numPollers; i++ {
go Poller(pending, complete, status)
}
// Send some Resources to the pending queue.
go func() {
for _, url := range urls {
pending <- &Resource{url: url}
}
}()
for r := range complete {
go r.Sleep(pending)
}
}
The main method has no way to cleanup the Goroutines, which means if this was part of a library, they would be leaked.
Issue 2 - Writers aren't spawning the channels
I read that as a best practice, the logic to create, write and cleanup a channel should be controlled by a single entity (or group of entities). The reason behind this is that writers will panic when writing to a closed channel. So, it is best for the writer(s) to create the channel, write to it and control when it should be closed. If there are multiple writers, they can be synced with a WaitGroup.
func StateMonitor(updateInterval time.Duration) chan<- State {
updates := make(chan State)
urlStatus := make(map[string]string)
ticker := time.NewTicker(updateInterval)
go func() {
for {
select {
case <-ticker.C:
logState(urlStatus)
case s := <-updates:
urlStatus[s.url] = s.status
}
}
}()
return updates
}
This function shouldn't be in charge of creating the updates channel because it is the reader of the channel, not the writer. The writer of this channel should create it and pass it to this function. Basically saying to the function "I will pass updates to you via this channel". But instead, this function is creating a channel and it isn't clear who is responsible of cleaning it up.
Issue 3 - Writing to a channel asynchronously
This function:
func (r *Resource) Sleep(done chan<- *Resource) {
time.Sleep(pollInterval + errTimeout*time.Duration(r.errCount))
done <- r
}
Is being referenced here:
for r := range complete {
go r.Sleep(pending)
}
And it seems like an awful idea. When this channel is closed, we'll have a goroutine sleeping somewhere out of our reach waiting to write to that channel. Let's say this goroutine sleeps for 1h, when it wakes up, it will try to write to a channel that was closed in the cleanup process. This is another example of why the writters of the channels should be in charge of the cleanup process. Here we have a writer who's completely free and unaware of when the channel was closed.
Please
If I missed any issues from that code (related to concurrency), please list them. It doesn't have to be an objective issue, if you'd have designed the code in a different way for any reason, I'm also interested in learning about it.
Biggest lesson from this code
For me the biggest lesson I take from reviewing this code is that the cleanup of channels and the writing to them has to be synchronized. They have to be in the same for{} or at least communicate somehow (maybe via other channels or primitives) to avoid writing to a closed channel.

It is the main method, so there is no need to cleanup. When main returns, the program exits. If this wasn't the main, then you would be correct.
There is no best practice that fits all use cases. The code you show here is a very common pattern. The function creates a goroutine, and returns a channel so that others can communicate with that goroutine. There is no rule that governs how channels must be created. There is no way to terminate that goroutine though. One use case this pattern fits well is reading a large resultset from a
database. The channel allows streaming data as it is read from the
database. In that case usually there are other means of terminating the
goroutine though, like passing a context.
Again, there are no hard rules on how channels should be created/closed. A channel can be left open, and it will be garbage collected when it is no longer used. If the use case demands so, the channel can be left open indefinitely, and the scenario you worry about will never happen.

As you are asking about if this code was part of a library, yes it would be poor practice to spawn goroutines with no cleanup inside a library function. If those goroutines carry out documented behaviour of the library, it's problematic that the caller doesn't know when that behaviour is going to happen. If you have any behaviour that is typically "fire and forget", it should be the caller who chooses when to forget about it. For example:
func doAfter5Minutes(f func()) {
go func() {
time.Sleep(5 * time.Minute)
f()
log.Println("done!")
}()
}
Makes sense, right? When you call the function, it does something 5 minutes later. The problem is that it's easy to misuse this function like this:
// do the important task every 5 minutes
for {
doAfter5Minutes(importantTaskFunction)
}
At first glance, this might seem fine. We're doing the important task every 5 minutes, right? In reality, we're spawning many goroutines very quickly, probably consuming all available memory before they start dropping off.
We could implement some kind of callback or channel to signal when the task is done, but really, the function should be simplified like so:
func doAfter5Minutes(f func()) {
time.Sleep(5 * time.Minute)
f()
log.Println("done!")
}
Now the caller has the choice of how to use it:
// call synchronously
doAfter5Minutes(importantTaskFunction)
// fire and forget
go doAfter5Minutes(importantTaskFunction)
This function arguably should also be changed. As you say, the writer should effectively own the channel, as they should be the one closing it. The fact that this channel-reading function insists on creating the channel it reads from actually coerces itself into this poor "fire and forget" pattern mentioned above. Notice how the function needs to read from the channel, but it also needs to return the channel before reading. It therefore had to put the reading behaviour in a new, un-managed goroutine to allow itself to return the channel right away.
func StateMonitor(updates chan State, updateInterval time.Duration) {
urlStatus := make(map[string]string)
ticker := time.NewTicker(updateInterval)
defer ticker.Stop() // not stopping the ticker is also a resource leak
for {
select {
case <-ticker.C:
logState(urlStatus)
case s := <-updates:
urlStatus[s.url] = s.status
}
}
}
Notice that the function is now simpler, more flexible and synchronous. The only thing that the previous version really accomplishes, is that it (mostly) guarantees that each instance of StateMonitor will have a channel all to itself, and you won't have a situation where multiple monitors are competing for reads on the same channel. While this may help you avoid a certain class of bugs, it also makes the function a lot less flexible and more likely to have resource leaks.
I'm not sure I really understand this example, but the golden rule for channel closing is that the writer should always be responsible for closing the channel. Keep this rule in mind, and notice a few points about this code:
The Sleep method writes to r
The Sleep method is executed concurrently, with no method of tracking how many instances are running, what state they are in, etc.
Based on these points alone, we can say that there probably isn't anywhere in the program where it would be safe to close r, because there's seemingly no way of knowing if it will be used again.

Related

Execute jobs concurrently in sequential order

"What?" you ask, "That title doesn't make any sense."
Consider the following:
Jobs with different ids may be processed asynchronously but jobs with the same id should be processed synchronously and in order from the queue.
My current implementation creates a go routine to handle the jobs for each specific id and looks something like this:
func FanOut() chan<- *Job {
channel := make(chan *Job)
routines = make(map[string]chan<- *Job)
go func() {
for j := range channel {
r, found := routines[j.id]
if !found {
r = Routine()
routines[j.id] = r
}
r <- j
}
}()
return channel
}
This appears to work well (in current testing) but the creation of thousands of go routines might not be the best approach? Additionally the fan out code blocks unless a buffered channel is used.
Rather than a collection of go routines (above) I'm considering using a collection of sync.Mutex. The idea would be to have a pool of go routines which must first establish a lock on the mutex corresponding to the job id.
Are there any existing Go patterns suited to handling these requirements?
Is there a better approach?
Create a channel for each ID - perhaps a slice of channels or a map (indexed by ID). Each channel would have a go-routine that processes the jobs for that ID in order. Simple.
I wouldn't worry about creating too many go-routines. And I wouldn't use mutex - without getting into too much detail using channels and go-routines allows each job to only be processed by one go-routine at a time and avoids possibility of data races.
BTW I only added this as an answer as I am not permitted to add comments (yet?).

Convention when using Reader interface inside select statement

I've wrapped a queue to implement the Writer and Reader interfaces (for pushing and popping, respectively).
I need to continuously listen to the queue, and handle every message that comes through. This is simple when the queue is represented as a channel, but more difficult otherwise:
loop:
for {
var data []byte
select {
case <-done:
break loop
case _, err := queue.Read(data):
fmt.Println(string(data))
}
}
What's the proper way to do this? Read here is blocking - it waits until the queue has a message.
Is there a better, more idiomatic way to achieve this?
It’s harder to take a synchronous API (like queue.Read as you described above) and make it asynchronous than it is to do the opposite.
The idea would be to create a new goroutine (using, for example go func() {...}) and have that goroutine execute the read and write the output to a channel.
Then the first goroutine would block on that channel and the one it’s already blocking on.
This has the potentially to leave orphaned resources for a little while if the read takes to long but if you have a synchronous API, it’s the best you can do.

Use of goroutines when steps are sequential

I feel like the answer to my question is no but asking for certainty as I've only started playing around with Go for a few days. Should we encapsulate IO bound tasks (like http requests) into goroutines even if it was to be used in a sequential use case?
Here's my naive example. Say I have a method that makes 3 http requests but need to be executed sequentially. Is there any benefit in creating the invoke methods as goroutines? I understand the example below would actually take a performance hit.
func myMethod() {
chan1 := make(chan int)
chan2 := make(chan int)
chan3 := make(chan int)
go invoke1(chan1)
res1 := <-chan1
invoke2(res1, chan2)
res2 := <-chan2
invoke3(res2, chan3)
// Do something with <-chan3
}
One possible reason that comes to mind is to future proof the invoke methods for when they're called in a concurrent context later on when other develops start re-using the method. Any other reasons?
There's nothing standard that would say yes or no to this question.
Although you can do it correctly this way, it is much simpler to stick to plain sequential execution.
Three reasons come to mind:
this is still sequential: you're waiting for every single goroutine sequentially, so this buys you nothing. Performance probably doesn't change much if it's only doing an http request, both cases will spend most of their time waiting for the response.
error handling is much simpler if you just get result, err := invoke; if err != nil .... rather than having to pass both results and errors through channels
over-generalization is a more apt word than "future proofing". If you need to call your invoke methods asynchronously in the future, then change your code in the future. It will be just as easy then to add asynchronous wrappers around your functions.
I am a little bit late to the party but I think I still have something to share.
Before answering the question, I would like to dive a little into the Go Proverb, Concurrency is not parellism. It is a very common misunderstanding of goroutine and Go's language feature that when people thinking of goroutine, they thinks about the ability of being parell.
But as Rob Pike pointed out in many Go Talks, what Go and goroutine auctually provides, is concurency. Concurency is a model of better interepting the real world, a way and a structure of code, of how code interact.
So back to the question. Should goroutine be used when steps are sequential? It depends on the design. If your code compose of individual parts that talks to each other very naturally, or if some of your code preserve a state and frequently return just does not make sense, or if your code fits in any other concurency design, it is perfectly fine to use goroutine and channel and select statement. Rob Pike, again, gives a Go Talk on a lexer used by now Go's text/template where the lexer uses a goroutine but the parser, obviously, only use the lexer sequentially. He stated that by using goroutine and channel, at cost of a little performance, a better API is acheived.
But on the other hand, in your example and probably what you are thinking about (codes that may require future parellism), I agree with #Marc. Stick for blocking call, at least for now.
You could use a channel like buf, or use []string (slice of strings urls). You don't get any benefit from gorutines, if you need only sequential execution, because we couldn't control goroutine when it's start.
But we can ask then don't wait runtime.Gosched()
From documentation:
Gosched yields the processor, allowing other goroutines to run. It does not suspend the current goroutine, so execution resumes automatically.
Example of sequential execution:
package main
func main() {
urls := []string{
"https://google.com",
"https://yahoo.com",
"https://youtube.com",
}
buf := make([][]byte, 0, len(urls))
for _, v := range urls {
buf = append(buf, sendRequest(v))
}
}
func sendRequest(url string) []byte {
//send
return []byte("")
}
For the solution as such, I see no gain by solving your problem the way you are in the real world. That doesn't mean there isn't gain in your solution. If you're after practice for example synchronising go routines, then you have a plethora of possibilities experimenting and learning.
So to me this is far from a clear no, but no clear yes either. I would refrain from saying maybe, it depends on what exactly your after. Are you after actually solving a problem or learning?

Go GC stopping my goroutine?

I have been trying to get into Go from the more traditional languages such as Java and C and so far I've been enjoying the well-thought out design choices that Go offers. When I started my first "real" project though, I ran into a problem that almost nobody seems to have.
My project is a simple networking implementation that sends and receives packets. The general structure is something like this (of course simplified):
A client manages the net.Conn with the server. This Clientcreates a PacketReaderand a PacketWriter. These both run infinite loops in a different goroutine. The PacketReader takes an interface with a single OnPacketReceived function that is implemented by the client.
The PacketReader code looks something like this:
go func() {
for {
bytes, err := reader.ReadBytes(10) // Blocks the current routine until bytes are available.
if err != nil {
panic(err) // Handle error
}
reader.handler.OnPacketReceived(reader.parseBytes(bytes))
}
}()
The PacketWriter code looks something like this:
go func() {
for {
if len(reader.packetQueue) > 0 {
// Write packet
}
}
}()
In order to make Client blocking, the client makes a channel that gets filled by OnPacketReceived, something like this:
type Client struct {
callbacks map[int]chan interface{}
// More fields
}
func (c *Client) OnPacketReceived(packet *Packet) {
c.callbacks[packet.Id] <- packet.Data
}
func (c *Client) SendDataBlocking(id int, data interface{}) interface{} {
c.PacketWriter.QueuePacket(data)
return <-c.callbacks[id]
}
Now here is my problem: the reader.parseBytes function does some intensive decoding operation that creates quite a lot of objects (to the point that the GC decides to run). The GC however, pauses the reader goroutine that is currently decoding the bytes, and then hangs. A problem that seems similar to mine is described here. I have confirmed that it is actually the GC causing it, because running it with GOGC=off runs successfully.
At this point, my 3 routines look like this:
- Client (main routine): Waiting for channel
- Writer: Still running, waiting for new data in queue
- Reader: Set as runnable, but not actually running
Somehow, the GC is either not able to stop all routines in order to run, or it does not resume said goroutines after it stopped them.
So my question is this: Is there any way to fix this? I am new to Go so I don't really know if my design choices are even remotely conventional, and I'm all up with changing the structure of my program. Do I need to change the way I handle packet reading callbacks, do I need to try and make the packet decoder less intensive? Thanks!
Edit: I am running Go 1.5.1, I'll try to get a working example later today.
As per mrd0ll4rs comment, changed the writer to use channels instead of a slice (I don't even know why I did that in the first place). That seemed to give the GC enough "mobility" to allow the threads to stop. Adding in the runtime.Gosched() and still using slices also worked though, but the channels seemed more "go-esque".

What's the point of one-way channels in Go?

I'm learning Go and so far very impressed with it. I've read all the online docs at golang.org and am halfway through Chrisnall's "The Go Programming Language Phrasebook". I get the concept of channels and think that they will be extremely useful. However, I must have missed something important along the way, as I can't see the point to one-way channels.
If I'm interpreting them correctly, a read-only channel can only be received on and a write-only channel can only be transmitted on, so why have a channel that you can send to and never receive on? Can they be cast from one "direction" to the other? If so, again, what's the point if there's no actual constraint? Are they nothing more than a hint to client code of the channel's purpose?
A channel can be made read-only to whoever receives it, while the sender still has a two-way channel to which they can write. For example:
func F() <-chan int {
// Create a regular, two-way channel.
c := make(chan int)
go func() {
defer close(c)
// Do stuff
c <- 123
}()
// Returning it, implicitly converts it to read-only,
// as per the function return type.
return c
}
Whoever calls F(), receives a channel from which they can only read.
This is mostly useful to avoid potential misuse of a channel at compile time.
Because read/write-only channels are distinct types, the compiler can use
its existing type-checking mechanisms to ensure the caller does not try to write
stuff into a channel it has no business writing to.
I think the main motivation for read-only channels is to prevent corruption and panics of the channel. Imagine if you could write to the channel returned by time.After. This could mess up a lot of code.
Also, panics can occur if you:
close a channel more than once
write to a closed channel
These operations are compile-time errors for read-only channels, but they can cause nasty race conditions when multiple go-routines can write/close a channel.
One way of getting around this is to never close channels and let them be garbage collected. However, close is not just for cleanup, but it actually has use when the channel is ranged over:
func consumeAll(c <-chan bool) {
for b := range c {
...
}
}
If the channel is never closed, this loop will never end. If multiple go-routines are writing to a channel, then there's a lot of book-keeping that has to go on with deciding which one will close the channel.
Since you cannot close a read-only channel, this makes it easier to write correct code. As #jimt pointed out in his comment, you cannot convert a read-only channel to a writeable channel, so you're guaranteed that only parts of the code with access to the writable version of a channel can close/write to it.
Edit:
As for having multiple readers, this is completely fine, as long as you account for it. This is especially useful when used in a producer/consumer model. For example, say you have a TCP server that just accepts connections and writes them to a queue for worker threads:
func produce(l *net.TCPListener, c chan<- net.Conn) {
for {
conn, _ := l.Accept()
c<-conn
}
}
func consume(c <-chan net.Conn) {
for conn := range c {
// do something with conn
}
}
func main() {
c := make(chan net.Conn, 10)
for i := 0; i < 10; i++ {
go consume(c)
}
addr := net.TCPAddr{net.ParseIP("127.0.0.1"), 3000}
l, _ := net.ListenTCP("tcp", &addr)
produce(l, c)
}
Likely your connection handling will take longer than accepting a new connection, so you want to have lots of consumers with a single producer. Multiple producers is more difficult (because you need to coordinate who closes the channel) but you can add some kind of a semaphore-style channel to the channel send.
Go channels are modelled on Hoare's Communicating Sequential Processes, a process algebra for concurrency that is oriented around event flows between communicating actors (small 'a'). As such, channels have a direction because they have a send end and a receive end, i.e. a producer of events and a consumer of events. A similar model is used in Occam and Limbo also.
This is important - it would be hard to reason about deadlock issues if a channel-end could arbitrarily be re-used as both sender and receiver at different times.

Resources