Appending to a file concurrently in Go

I need to write a piece of code in Go that allows multiple goroutines to write to a file concurrently.
From my research so far, I found a good suggestion in another post: use a single goroutine that handles all file access, and have several goroutines send data to it through a channel. The appends are then sequential, because only one goroutine ever touches the file. This is how I have prototyped this functionality so far:
package main
type dataStruct struct {
	// fields describing what to write
}
func ListenAndServe(ch chan dataStruct) {
	data := <-ch
	// I only care about writing right now;
	// opening the file in read-only mode is, I think, safe to do concurrently
	Write(data)
}
func Write(data dataStruct) { /* some logic to write data to the file */ }
func Read(id int) { /* some logic to open the file in read-only mode (id's type is assumed) */ }
func main() {
	ch := make(chan dataStruct)
	// Do I have to start this function in its own goroutine like this, before I begin
	// handling the actual write/read requests by spawning separate goroutines?
	// Would ListenAndServe then block on the channel until there is something to receive?
	go ListenAndServe(ch)
	// here I would write some code for sending each request to a goroutine
}
I am still new to Go, but if I follow the advice I found, it seems to me that ListenAndServe would exit as soon as it has handled the first request, so I would have to wrap its body in a for loop like so:
func ListenAndServe(ch chan dataStruct) {
	for {
		data := <-ch
		Write(data)
	}
}
I wanted to ask whether this assumption is correct, and whether it is realistic to accomplish this by simply keeping the goroutine from exiting with an infinite loop.
If anyone can give me an opinion or steer me towards a better approach, I would be grateful.

Related

Golang Concurrency Code Review of Codewalk

I'm trying to understand best practices for Golang concurrency. I read O'Reilly's book on Go's concurrency and then came back to the Golang Codewalks, specifically this example:
https://golang.org/doc/codewalk/sharemem/
This is the code I was hoping to review with you in order to learn a little bit more about Go. My first impression is that this code breaks some best practices. This is of course my (very) inexperienced opinion, and I wanted to discuss it and gain some insight. This isn't about who's right or wrong, so please be nice: I just want to share my views and get some feedback on them. Maybe this discussion will help other people see why I'm wrong and teach them something.
I'm fully aware that the purpose of this code is to teach beginners, not to be perfect code.
Issue 1 - No Goroutine cleanup logic
func main() {
	// Create our input and output channels.
	pending, complete := make(chan *Resource), make(chan *Resource)
	// Launch the StateMonitor.
	status := StateMonitor(statusInterval)
	// Launch some Poller goroutines.
	for i := 0; i < numPollers; i++ {
		go Poller(pending, complete, status)
	}
	// Send some Resources to the pending queue.
	go func() {
		for _, url := range urls {
			pending <- &Resource{url: url}
		}
	}()
	for r := range complete {
		go r.Sleep(pending)
	}
}
The main function has no way to clean up the goroutines, which means that if this were part of a library, they would be leaked.
Issue 2 - Writers aren't spawning the channels
I read that, as a best practice, the logic to create, write to and clean up a channel should be controlled by a single entity (or group of entities). The reason is that writers will panic when sending on a closed channel. So it is best for the writer(s) to create the channel, write to it and control when it should be closed. If there are multiple writers, they can be synced with a WaitGroup.
func StateMonitor(updateInterval time.Duration) chan<- State {
	updates := make(chan State)
	urlStatus := make(map[string]string)
	ticker := time.NewTicker(updateInterval)
	go func() {
		for {
			select {
			case <-ticker.C:
				logState(urlStatus)
			case s := <-updates:
				urlStatus[s.url] = s.status
			}
		}
	}()
	return updates
}
This function shouldn't be in charge of creating the updates channel, because it is the reader of the channel, not the writer. The writer of this channel should create it and pass it to this function, basically saying to the function "I will pass updates to you via this channel". Instead, this function creates a channel, and it isn't clear who is responsible for cleaning it up.
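The ownership rule from Issue 2 can be sketched as follows: the function that writes to the channel also creates it, starts its writers, and closes it once a WaitGroup reports they are all done, so the reader only has to range over it. The produce function and its parameters are hypothetical, chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// produce owns the channel it returns: it creates it, starts the writers,
// and closes it only after every writer has finished.
func produce(workers, perWorker int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				out <- w*perWorker + i
			}
		}(w)
	}
	// Close only after all writers are done, so no writer can send on a closed channel.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	sum, count := 0, 0
	for v := range produce(3, 4) { // the reader never closes the channel
		sum += v
		count++
	}
	fmt.Println(count, sum) // 12 66
}
```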
Issue 3 - Writing to a channel asynchronously
This function:
func (r *Resource) Sleep(done chan<- *Resource) {
	time.Sleep(pollInterval + errTimeout*time.Duration(r.errCount))
	done <- r
}
Is being referenced here:
for r := range complete {
	go r.Sleep(pending)
}
And it seems like an awful idea. When this channel is closed, we'll have a goroutine sleeping somewhere out of our reach, waiting to write to that channel. Let's say this goroutine sleeps for 1h: when it wakes up, it will try to write to a channel that was closed in the cleanup process, and panic. This is another example of why the writers of a channel should be in charge of the cleanup process. Here we have a writer that is completely free and unaware of when the channel was closed.
Please
If I missed any issues from that code (related to concurrency), please list them. It doesn't have to be an objective issue, if you'd have designed the code in a different way for any reason, I'm also interested in learning about it.
Biggest lesson from this code
For me, the biggest lesson from reviewing this code is that closing a channel and writing to it have to be synchronized. They have to be in the same for{} loop, or at least communicate somehow (maybe via other channels or primitives), to avoid writing to a closed channel.
It is the main function, so there is no need to clean up: when main returns, the program exits. If this weren't main, you would be correct.
There is no best practice that fits all use cases. The code you show here is a very common pattern: the function creates a goroutine and returns a channel so that others can communicate with that goroutine. There is no rule that governs how channels must be created. There is no way to terminate that goroutine, though. One use case this pattern fits well is reading a large result set from a database: the channel allows streaming data as it is read from the database. In that case there are usually other means of terminating the goroutine, like passing a context.
Again, there are no hard rules on how channels should be created/closed. A channel can be left open, and it will be garbage collected when it is no longer used. If the use case demands so, the channel can be left open indefinitely, and the scenario you worry about will never happen.
As for your question about this code being part of a library: yes, it would be poor practice to spawn goroutines with no cleanup inside a library function. If those goroutines carry out documented behaviour of the library, it's problematic that the caller doesn't know when that behaviour is going to happen. If you have any behaviour that is typically "fire and forget", it should be the caller who chooses when to forget about it. For example:
func doAfter5Minutes(f func()) {
	go func() {
		time.Sleep(5 * time.Minute)
		f()
		log.Println("done!")
	}()
}
Makes sense, right? When you call the function, it does something 5 minutes later. The problem is that it's easy to misuse this function like this:
// do the important task every 5 minutes
for {
	doAfter5Minutes(importantTaskFunction)
}
At first glance, this might seem fine. We're doing the important task every 5 minutes, right? In reality, we're spawning many goroutines very quickly, probably consuming all available memory before they start dropping off.
We could implement some kind of callback or channel to signal when the task is done, but really, the function should be simplified like so:
func doAfter5Minutes(f func()) {
	time.Sleep(5 * time.Minute)
	f()
	log.Println("done!")
}
Now the caller has the choice of how to use it:
// call synchronously
doAfter5Minutes(importantTaskFunction)
// fire and forget
go doAfter5Minutes(importantTaskFunction)
StateMonitor from Issue 2 arguably should also be changed. As you say, the writer should effectively own the channel, as it should be the one closing it. The fact that this channel-reading function insists on creating the channel it reads from actually forces it into the poor "fire and forget" pattern described above. Notice how the function needs to read from the channel, but it also needs to return the channel before reading. It therefore had to put the reading behaviour in a new, unmanaged goroutine so that it could return the channel right away.
func StateMonitor(updates <-chan State, updateInterval time.Duration) {
	urlStatus := make(map[string]string)
	ticker := time.NewTicker(updateInterval)
	defer ticker.Stop() // not stopping the ticker is also a resource leak
	for {
		select {
		case <-ticker.C:
			logState(urlStatus)
		case s, ok := <-updates:
			if !ok {
				return // the writer closed updates; the deferred ticker.Stop now runs
			}
			urlStatus[s.url] = s.status
		}
	}
}
Notice that the function is now simpler, more flexible and synchronous. The only thing the previous version really accomplishes is that it (mostly) guarantees that each instance of StateMonitor will have a channel all to itself, and you won't have a situation where multiple monitors are competing for reads on the same channel. While this may help you avoid a certain class of bugs, it also makes the function a lot less flexible and more likely to have resource leaks.
As for Issue 3, I'm not sure I really understand this example, but the golden rule for channel closing is that the writer should always be responsible for closing the channel. Keep this rule in mind, and notice a few points about this code:
- The Sleep method writes r to its done channel, which at the call site is pending.
- The Sleep method is executed concurrently, with no way of tracking how many instances are running, what state they are in, etc.
Based on these points alone, we can say that there probably isn't anywhere in the program where it would be safe to close pending, because there's seemingly no way of knowing whether it will be written to again.

The Behaviour Of Goroutines with Channels

The output of the code given below is somewhat confusing. Please help me understand the behaviour of the channels and goroutines, and how the execution actually takes place.
I have tried to understand the flow of the program, but the statement after the "call of goroutine" gets executed first, even though the goroutine has been called; only later are the statements in the goroutine executed.
On the second "call of goroutine", the behaviour is different and the sequence of printing/flow of the program changes.
Following is the code:
package main
import "fmt"
func main() {
	fmt.Println("1")
	done := make(chan string)
	go test(done)
	fmt.Println("7")
	fmt.Println(<-done)
	fmt.Println("8")
	fmt.Println(<-done)
	fmt.Println("9")
	fmt.Println(<-done)
}
func test(done chan string) {
	fmt.Println("2")
	done <- "3"
	done <- "10"
	fmt.Println("4")
	done <- "5"
	fmt.Println("6")
}
The result of the above code:
1
7
2
3
8
10
9
4
6
5
Please help me understand why and how this result comes out.
Concept 1: Channels
Visualize a channel as a tube where data goes in one end and out the other. The first data in is the first data that comes out the other side. There are buffered channels and unbuffered channels, but for your example you only need to understand the default channel, which is unbuffered. An unbuffered channel has no storage at all: a value is handed directly from a sending goroutine to a receiving goroutine, so at most one value is passed through at a time and the sender and receiver have to meet.
Writing to an Unbuffered Channel
Code that looks like this writes data into one end of the channel.
ch <- value
This statement does not finish executing until something reads the value out of the channel. Because an unbuffered channel cannot hold the value, the sending goroutine is blocked at this line until another goroutine receives it. We'll see later how this affects the order in which your code is executed.
Reading from an Unbuffered Channel
To read from an unbuffered channel (visualize taking a value out of the channel), the code to do this looks like
[value :=] <-ch
When you read code documentation, [things in] square brackets indicate that what's within them is optional. Above, without the [value :=] you just take a value out of the channel and discard it.
Now when there's a value in the channel, this code has two side effects. One, it reads the value out of a channel in whatever routine we are in now, and proceeds with the value. The other effect it has is to allow the goroutine which put the value into the channel to continue. This is the critical bit that's necessary to understand your example program.
If there is NO value in the channel yet (no goroutine is waiting to send), the receive waits for a value to be written into the channel before continuing. In other words, the receiving goroutine blocks until the channel has a value to read.
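A small program can make the blocking behaviour of an unbuffered send visible. In this sketch (the function name unbufferedOrder is made up for illustration), the event "about to receive" is always recorded before "send completed", because the send cannot finish until main receives the value, no matter how the goroutines are scheduled.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// unbufferedOrder records two events; their order shows that a send on an
// unbuffered channel completes only after another goroutine receives the value.
func unbufferedOrder() []string {
	var mu sync.Mutex
	var events []string
	record := func(s string) {
		mu.Lock()
		defer mu.Unlock()
		events = append(events, s)
	}
	ch := make(chan int) // unbuffered
	finished := make(chan struct{})
	go func() {
		ch <- 42 // blocks here until main receives
		record("send completed")
		close(finished)
	}()
	time.Sleep(20 * time.Millisecond) // the sender is (very likely) already blocked by now
	record("about to receive")
	<-ch       // this receive lets the blocked send complete
	<-finished // wait until the goroutine has recorded its event
	return events
}

func main() {
	fmt.Println(unbufferedOrder()) // [about to receive send completed]
}
```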
Concept 2: Goroutines
A goroutine lets your program execute two pieces of code concurrently. This can make your code run faster, or let it attend to multiple problems at the same time (think of a server where multiple users are loading pages from it at the same time).
Your question arises when you try to figure out the ordering that code is executed when you have multiple routines executing concurrently. This is a good question and others have correctly stated that it depends. When you spawn two goroutines, the ordering of which lines of code are executed is arbitrary.
The code below may print executing a() or end main() first. Spawning a goroutine means there are two concurrent streams of execution: one stays in main() and the other starts executing the first line of a(). Which one the runtime runs first is arbitrary. In fact, executing a() may never be printed at all, because the program exits as soon as main() returns, without waiting for other goroutines.
func main() {
	fmt.Println("start main()")
	go a()
	fmt.Println("end main()")
}
func a() {
	fmt.Println("executing a()")
}
Goroutines + Channels
Now let's use a channel to control the order in which things get executed.
The only difference now is that we create a channel, pass it into the goroutine, and wait for its value to be written before continuing in main. From earlier, we discussed how the goroutine reading a value from a channel needs to wait until there's a value in the channel before continuing. Since executing a() is always printed before the channel is written to, and we read from the channel (which can only happen once it has been written to) before printing end main(), executing a() will always print before end main().
func main() {
	fmt.Println("start main()")
	ch := make(chan int)
	go a(ch)
	<-ch
	fmt.Println("end main()")
}
func a(ch chan int) {
	fmt.Println("executing a()")
	ch <- 0
}
Your Example
I think at this point you could figure out what happens when, and what might happen in a different order. My own first attempt was wrong when I went through it in my head (see edit history). You have to be careful! On editing, I decided not to give the full answer, since I realized this may be a homework assignment.
EDIT: more semantics about <-done
On my first go through, I forgot to mention that fmt.Println(<-done) is conceptually the same as the following.
value := <-done
fmt.Println(value)
This is important because it helps you see that when the main goroutine reads from the done channel, it doesn't print the value at the same time. These are two separate steps to the runtime.

golang pipelining channels - works as a separate function, but doesn't work as a part of main function

I'm new to Go, and at the moment I'm trying to understand how channel synchronisation works. I'm solving a test task that requires me to build a pipeline from channels. I wrote two similar solutions, but one of them doesn't work, for a reason I can't figure out.
This doesn't work (the go-routines are started from the function directly):
https://play.golang.org/p/EHceKjZZ-G
This works (the go-routines are started from a separate function):
https://play.golang.org/p/QysTAVxbVc
I'm totally lost: I don't see the difference and can't understand why the first example doesn't work. Does anyone have any idea?
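You're capturing the loop variable fn in a goroutine closure, and that variable is overwritten on every iteration (before Go 1.22, a range loop reuses a single variable for all iterations). By the time the goroutines run, they all see the latest job in funcs. Change the loop in your Pipe function to the following, which passes fn, in and out to each goroutine as arguments: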
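for _, fn := range funcs {
	out = make(chan interface{})
	wg.Add(1)
	// fn, in and out are passed as arguments, so each goroutine gets its own copies
	go func(f job, inx, outx chan interface{}) {
		f(inx, outx)
		close(outx) // signal the next stage that no more values are coming
		wg.Done()
	}(fn, in, out)
	in = out
}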
It's one of the most common mistakes in Go.
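For reference, here is a self-contained sketch of a Pipe function built around the corrected loop. Since the playground code isn't reproduced here, the job type and the way the first and last stages behave are assumptions: the first stage receives an already-closed input channel, and the last stage must not send, because nothing reads its output.

```go
package main

import (
	"fmt"
	"sync"
)

// job is assumed to match the signature used in the answer: f(in, out).
type job func(in, out chan interface{})

// Pipe runs every job in its own goroutine, wiring each job's out channel to the next job's in channel.
func Pipe(funcs ...job) {
	var wg sync.WaitGroup
	in := make(chan interface{})
	close(in) // the first stage has no upstream, so it gets an already-closed input
	for _, fn := range funcs {
		out := make(chan interface{})
		wg.Add(1)
		// fn, in and out are passed as arguments, so each goroutine keeps its own values
		go func(f job, inx, outx chan interface{}) {
			defer wg.Done()
			f(inx, outx)
			close(outx) // tell the next stage that no more values are coming
		}(fn, in, out)
		in = out
	}
	wg.Wait()
}

func main() {
	var results []int
	Pipe(
		func(_, out chan interface{}) { // source
			for i := 1; i <= 3; i++ {
				out <- i
			}
		},
		func(in, out chan interface{}) { // square each value
			for v := range in {
				n := v.(int)
				out <- n * n
			}
		},
		func(in, _ chan interface{}) { // sink: collects values, never sends
			for v := range in {
				results = append(results, v.(int))
			}
		},
	)
	fmt.Println(results) // [1 4 9]
}
```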

Kill a method in an infinite loop (golang)

I am working with a piece of code that has an intentional infinite loop, and I can't modify that code. I want to write some tests for that method (e.g. make sure it triggers actions at the right times), but I don't want to orphan a bunch of goroutines. So I am trying to find a way to kill/interrupt that goroutine.
I was thinking of trying to wrap it in a wrapper function that would kill it after a signal. Like this (doesn't work).
func wrap(inf func()) func() {
	return func() {
		select {
		case inf():
		case <-time.After(5 * time.Second):
		}
	}
}
func main() {
	go wrap(inf())()
	// do things
}
None of the variations I can think of really work. I was thinking of wrapping inf in a function that writes to a channel (which would never happen, since inf never returns), or something with a return statement, so that the select could read from that. The problem is that you then have to launch it. If you do it in this goroutine, you never reach the select. If you do it in another goroutine, you've just made the problem worse.
So, is there a way that I can kill that routine?
(yes - I would rather change the infinite loop code, but can't here)
If you can't change the loop code, you can't kill the loop. Go is rather intentionally designed such that there's no way to kill a goroutine from outside of the goroutine, short of actually terminating the program itself.
If you can change the loop itself, the typical method of killing a routine is to provide a quit channel to the goroutine, and then close (or send on) that channel to tell the loop to exit. Example:
quitCh := make(chan struct{})
go func() {
	for {
		select {
		case <-quitCh:
			return
		// other cases to handle what you need to do
		}
	}
}()
// Once you're done
close(quitCh) // goroutine exits
But without some way of coding that closing behaviour into the loop itself, there's no way (to my knowledge) of specifically killing that goroutine or terminating the loop within it (unless you can trigger a panic in it, but that's a terrible way to handle the issue).

Go GC stopping my goroutine?

I have been trying to get into Go from the more traditional languages such as Java and C and so far I've been enjoying the well-thought out design choices that Go offers. When I started my first "real" project though, I ran into a problem that almost nobody seems to have.
My project is a simple networking implementation that sends and receives packets. The general structure is something like this (of course simplified):
A Client manages the net.Conn with the server. This Client creates a PacketReader and a PacketWriter. Both run infinite loops, each in its own goroutine. The PacketReader takes an interface with a single OnPacketReceived function, which is implemented by the Client.
The PacketReader code looks something like this:
go func() {
	for {
		bytes, err := reader.ReadBytes(10) // Blocks the current routine until bytes are available.
		if err != nil {
			panic(err) // Handle error
		}
		reader.handler.OnPacketReceived(reader.parseBytes(bytes))
	}
}()
The PacketWriter code looks something like this:
go func() {
	for {
		if len(reader.packetQueue) > 0 {
			// Write packet
		}
	}
}()
In order to make Client blocking, the client makes a channel that gets filled by OnPacketReceived, something like this:
type Client struct {
	callbacks map[int]chan interface{}
	// More fields
}
func (c *Client) OnPacketReceived(packet *Packet) {
	c.callbacks[packet.Id] <- packet.Data
}
func (c *Client) SendDataBlocking(id int, data interface{}) interface{} {
	c.PacketWriter.QueuePacket(data)
	return <-c.callbacks[id]
}
Now here is my problem: the reader.parseBytes function does some intensive decoding that creates quite a lot of objects (to the point that the GC decides to run). The GC, however, pauses the reader goroutine while it is decoding the bytes, and then the program hangs. A problem that seems similar to mine is described here. I have confirmed that it is actually the GC causing it, because running with GOGC=off works.
At this point, my 3 routines look like this:
- Client (main routine): Waiting for channel
- Writer: Still running, waiting for new data in queue
- Reader: Set as runnable, but not actually running
Somehow, the GC is either not able to stop all routines in order to run, or it does not resume said goroutines after it stopped them.
So my question is this: is there any way to fix this? I am new to Go, so I don't really know whether my design choices are even remotely conventional, and I'm open to changing the structure of my program. Do I need to change the way I handle packet-reading callbacks, or do I need to make the packet decoder less intensive? Thanks!
Edit: I am running Go 1.5.1, I'll try to get a working example later today.
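As per mrd0ll4r's comment, I changed the writer to use a channel instead of a slice (I don't even know why I did that in the first place). That seemed to give the GC enough "mobility" to stop the goroutines. Adding a runtime.Gosched() call while still using slices also worked, but the channel version seemed more idiomatic. The likely explanation: before Go 1.14, goroutines could only be preempted at function calls, so the writer's busy loop, which calls nothing except the built-in len, could never be paused for the GC's stop-the-world phase. Blocking on a channel (or calling runtime.Gosched) gives the scheduler a point at which to stop the goroutine; Go 1.14 and later also preempt such loops asynchronously.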
