Race conditions in io.Pipe?

I have a function which returns the Reader end of an io.Pipe and kicks off a goroutine which writes data to the Writer end of it, and then closes the pipe.
func GetPipeReader() io.ReadCloser {
    r, w := io.Pipe()
    go func() {
        _, err := io.CopyN(w, SomeReaderOfSize(N), N)
        w.CloseWithError(err)
    }()
    return r
}

func main() {
    var buf bytes.Buffer
    io.Copy(&buf, GetPipeReader())
    println("got", buf.Len(), "bytes")
}
https://play.golang.org/p/OAijIwmtRr
This seems to always work in my testing, in that I get all the data I wrote. But the API docs are a bit worrying to me:
func Pipe() (*PipeReader, *PipeWriter)
Pipe creates a synchronous in-memory pipe. [...] Reads on one end are
matched with writes on the other, [...] there is no internal
buffering.
func (w *PipeWriter) CloseWithError(err error) error
CloseWithError closes the writer; subsequent reads from the read half
of the pipe will return no bytes and the error err, or EOF if err is
nil.
What I want to know is: what are the possible race conditions here? Is it plausible that my goroutine will write a bunch of data and then close the pipe before I can read it all?
Do I need to use a channel for some signalling of when to close? What can go wrong, basically.

No, there are no race conditions. As the documentation mentions, reads on one end are matched with writes on the other. So, when CloseWithError() is reached, it means every Write has successfully completed and been matched with a corresponding Read - so the other end must have read everything there was to read.
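To make the synchronous hand-off visible, here is a minimal sketch (my own, not from the answer above). The Write call does not return until the reader has consumed the bytes, so CloseWithError can never overtake a pending read:

package main

import (
    "fmt"
    "io"
    "time"
)

func main() {
    r, w := io.Pipe()
    go func() {
        w.Write([]byte("hello")) // blocks until the reader consumes the bytes
        fmt.Println("write returned")
        w.Close()
    }()
    time.Sleep(100 * time.Millisecond) // a deliberately slow reader
    buf := make([]byte, 8)
    n, _ := r.Read(buf)
    fmt.Println("read:", string(buf[:n]))
    io.Copy(io.Discard, r) // drain until Close so the goroutine finishes
}

"write returned" is only printed after the Read call has consumed the data, however slow the reader is.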

Related

Synchronizing two goroutines with a size 1 channel

In an effort to learn golang, I was looking through the go source for reverseproxy:
https://golang.org/src/net/http/httputil/reverseproxy.go
I found this block of code (truncated):
...
    errc := make(chan error, 1)
    spc := switchProtocolCopier{user: conn, backend: backConn}
    go spc.copyToBackend(errc)
    go spc.copyFromBackend(errc)
    <-errc
    return
}
// switchProtocolCopier exists so goroutines proxying data back and
// forth have nice names in stacks.
type switchProtocolCopier struct {
    user, backend io.ReadWriter
}

func (c switchProtocolCopier) copyFromBackend(errc chan<- error) {
    _, err := io.Copy(c.user, c.backend)
    errc <- err
}

func (c switchProtocolCopier) copyToBackend(errc chan<- error) {
    _, err := io.Copy(c.backend, c.user)
    errc <- err
}
The portion that caught my attention was the creation of the errc buffered channel. I thought (probably naively) that we would use an unbuffered channel and the later receive from errc would need to run twice, like this:
<-errc
<-errc
As written, I understand that reading from the channel will ensure at least one of the copy methods has run. I also understand that the first send to the channel will not block, while the second will block only if the first one has not yet been received.
What I don't understand is why it is written like this. Is it to ensure that only one of the methods completes? If that is the case, couldn't they technically both run?
Thanks!
The channel of size one helps realize a binary semaphore.
Since at most one value is consumed from the channel (on line 549), changing the size of the channel to be greater than one will not affect the currently exhibited behavior, which is to wait until at least one of the two goroutines completes its Copy operation.
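A standalone sketch of the pattern (the names are mine, not from reverseproxy.go): with capacity 1, the first send is received by main and the second simply lands in the buffer, so neither sender blocks forever:

package main

import (
    "fmt"
    "time"
)

func worker(name string, d time.Duration, errc chan<- error) {
    time.Sleep(d) // simulate Copy operations of different durations
    errc <- fmt.Errorf("%s finished", name)
}

func main() {
    errc := make(chan error, 1) // capacity 1: the slower send is buffered, not leaked
    go worker("fast", 10*time.Millisecond, errc)
    go worker("slow", 50*time.Millisecond, errc)
    fmt.Println(<-errc) // unblocks as soon as the first worker finishes
}

In a long-running program like the proxy, an unbuffered channel would leave the slower goroutine blocked on its send forever; the one-slot buffer lets it complete and exit.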

Is it safe to call a method from two different goroutines at the same time?

I have two goroutines, go doProcess_A() and go doProcess_B(). Both can call saveData(), an ordinary function (not started as a goroutine).
Should I use go saveData() instead of saveData()?
Which one is safe?
var waitGroup sync.WaitGroup

func main() {
    for i := 0; i < 4; i++ {
        waitGroup.Add(2)
        go doProcess_A(i)
        go doProcess_B(i)
    }
    waitGroup.Wait()
}

func doProcess_A(i int) {
    // do process
    // the result will be stored in the data variable
    data := "processed data-A as string"
    uniqueFileName := "file_A_" + strconv.Itoa(i) + ".txt"
    saveData(uniqueFileName, data)
    waitGroup.Done()
}

func doProcess_B(i int) {
    // do some process
    // the result will be stored in the data variable
    data := "processed data-B as string"
    uniqueFileName := "file_B_" + strconv.Itoa(i) + ".txt"
    saveData(uniqueFileName, data)
    waitGroup.Done()
}

// write text file
func saveData(fileName, dataStr string) {
    // file names are unique; no two calls ever use the same file name
    err := ioutil.WriteFile("out/"+fileName, []byte(dataStr), 0644)
    if err != nil {
        panic(err)
    }
}
Here, does one goroutine wait for the disk file operation while the other goroutine is doing its own?
Or do the two goroutines each make their own copy of saveData()?
Goroutines typically don't wait for anything unless you explicitly tell them to, or unless an operation blocks on a channel or some other blocking operation. In your code there is the possibility of a race condition with unwanted results if multiple goroutines call saveData() with the same filename. But your two goroutines write to different files, so as long as the filenames are unique, calling saveData() from a goroutine is safe. It doesn't make sense to use a goroutine just to call saveData(); don't unnecessarily complicate your life, just call it directly in the doProcess_X functions.
Read more about goroutines and make sure you use them where it is actually necessary: https://gobyexample.com/goroutines
Note: Just because you are writing a Go application doesn't mean you
should litter it with goroutines. Read and understand what problem they
solve so as to know the best time to use them.
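For contrast, here is a hypothetical sketch (not from the question) of the kind of function that would not be safe to call from two goroutines without synchronization, because it mutates shared state; the mutex restores safety:

package main

import (
    "fmt"
    "sync"
)

var (
    mu    sync.Mutex
    total int // shared state: this is what makes unsynchronized concurrent calls unsafe
)

func addToTotal(n int) {
    mu.Lock() // without this lock, concurrent calls would race on total
    defer mu.Unlock()
    total += n
}

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            addToTotal(1)
        }()
    }
    wg.Wait()
    fmt.Println(total) // always 4 with the mutex; racy without it
}

saveData() in the question avoids this entirely: it only uses its arguments and local variables, so concurrent calls don't interfere.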

How to set timeout in *os.File/io.Read in golang

I know there is a SetReadDeadline method that can set a timeout on socket (net.Conn) reads, but plain io.Read cannot. One way is to start another goroutine as a timer, but that brings another problem: the reader goroutine (blocked in io.Read) still exists even after the timeout:
func (self *TimeoutReader) Read(buf []byte) (n int, err error) {
    ch := make(chan bool)
    n = 0
    err = nil
    go func() { // this goroutine still exists even after a timeout
        n, err = self.reader.Read(buf)
        ch <- true
    }()
    select {
    case <-ch:
        return
    case <-time.After(self.timeout):
        return 0, errors.New("Timeout")
    }
}
A similar question was asked in this post, but the answer is unclear.
Do you guys have any good ideas to solve this problem?
Instead of setting a timeout directly on the read, you can close the os.File after a timeout. As written in https://golang.org/pkg/os/#File.Close:
Close closes the File, rendering it unusable for I/O. On files that support SetDeadline, any pending I/O operations will be canceled and return immediately with an error.
This should cause your read to fail immediately.
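A minimal sketch of that idea (the helper name and the timeout value are my own): arm a timer that closes the file, and stop the timer if the read finishes first:

package main

import (
    "fmt"
    "os"
    "time"
)

// readWithTimeout closes f if the read has not completed within timeout.
// On files that support SetDeadline, the pending Read then returns
// immediately with an error.
func readWithTimeout(f *os.File, buf []byte, timeout time.Duration) (int, error) {
    timer := time.AfterFunc(timeout, func() { f.Close() })
    defer timer.Stop() // don't close the file if the read finished in time
    return f.Read(buf)
}

func main() {
    buf := make([]byte, 64)
    // os.Stdin may block indefinitely if nothing is piped or typed in.
    n, err := readWithTimeout(os.Stdin, buf, 2*time.Second)
    fmt.Println(n, err)
}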
Your mistake here is something different: when you read from the reader, you only read once, and that is wrong:
go func() {
    n, err = self.reader.Read(buf) // this Read needs to be in a loop
    ch <- true
}()
Here is a simple example (https://play.golang.org/p/2AnhrbrhLrv)
buf := bytes.NewBufferString("0123456789")
r := make([]byte, 3)
n, err := buf.Read(r)
fmt.Println(string(r), n, err)
// Output: 012 3 <nil>
An io.Reader reads at most as many bytes per call as the given slice can hold. If you logged the n variable in your code, you would see that not the whole file is read in one call. Also, the select statement outside of your goroutine is in the wrong place:
go func() {
    var a []byte
    chunk := make([]byte, 1024)
    for {
        select {
        case <-quit:
            result <- []byte{}
            return
        default:
            n, err := self.reader.Read(chunk)
            a = append(a, chunk[:n]...) // accumulate everything read so far
            if err == io.EOF {
                result <- a
                return
            }
        }
    }
}()
But there is something more! You want to implement the io.Reader interface, and Read() will be called repeatedly until the file ends, so you should not start a goroutine inside it; each call only reads a chunk of the file.
Also, a timeout inside the Read() method doesn't help, because the timeout applies to each call, not to the whole file.
In addition to #apxp's point about looping over Read, you could use a buffer size of 1 byte so that you never block as long as there is data to read.
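A sketch of that idea (the helper is mine; it assumes import "io"): with a 1-byte buffer, each Read returns as soon as a single byte is available, so a call only blocks while no data has arrived at all:

// readBytewise drains r one byte at a time until EOF.
func readBytewise(r io.Reader) ([]byte, error) {
    var out []byte
    one := make([]byte, 1)
    for {
        n, err := r.Read(one)
        out = append(out, one[:n]...)
        if err == io.EOF {
            return out, nil
        }
        if err != nil {
            return out, err
        }
    }
}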
When interacting with external resources anything can happen. It is possible for any given io.Reader implementation to simply block forever. Here, I'll write one for you...
type BlockingReader struct{}

func (BlockingReader) Read(b []byte) (int, error) {
    <-make(chan struct{})
    return 0, nil
}
Remember anyone can implement an interface, so you can't make any assumptions that it will behave like *os.File or any other standard library io.Reader. In addition to asinine coding like mine above, an io.Reader could legitimately connect to a resource that can block forever.
You cannot kill goroutines, so if an io.Reader truly blocks forever, the blocked goroutine will continue to consume resources until your application terminates. However, this shouldn't be a problem: a blocked goroutine does not consume much in the way of resources, and should be fine as long as you don't blindly retry blocked Reads by spawning more goroutines.

Why is my code causing a stall or race condition?

For some reason, once I started adding strings through a channel in my goroutine, the code stalls when I run it. I thought that it was a scope/closure issue so I moved all code directly into the function to no avail. I have looked through Golang's documentation and all examples look similar to mine so I am kind of clueless as to what is going wrong.
func getPage(url string, c chan<- string, swg sizedwaitgroup.SizedWaitGroup) {
    defer swg.Done()
    doc, err := goquery.NewDocument(url)
    if err != nil {
        fmt.Println(err)
    }
    nodes := doc.Find(".v-card .info")
    for i := range nodes.Nodes {
        el := nodes.Eq(i)
        var name string
        if el.Find("h3.n span").Size() != 0 {
            name = el.Find("h3.n span").Text()
        } else if el.Find("h3.n").Size() != 0 {
            name = el.Find("h3.n").Text()
        }
        address := el.Find(".adr").Text()
        phoneNumber := el.Find(".phone.primary").Text()
        website, _ := el.Find(".track-visit-website").Attr("href")
        //c <- map[string]string{"name": name, "address": address, "Phone Number": phoneNumber, "website": website}
        c <- fmt.Sprint("%s%s%s%s", name, address, phoneNumber, website)
        fmt.Println([]string{name, address, phoneNumber, website})
    }
}

func getNumPages(url string) int {
    doc, err := goquery.NewDocument(url)
    if err != nil {
        fmt.Println(err)
    }
    pagination := strings.Split(doc.Find(".pagination p").Contents().Eq(1).Text(), " ")
    numItems, _ := strconv.Atoi(pagination[len(pagination)-1])
    return int(math.Ceil(float64(numItems) / 30))
}

func main() {
    arrChan := make(chan string)
    swg := sizedwaitgroup.New(8)
    zips := []string{"78705", "78710", "78715"}
    for _, item := range zips {
        swg.Add()
        go getPage(fmt.Sprintf(base_url, item, 1), arrChan, swg)
    }
    swg.Wait()
}
func main() {
arrChan := make(chan string)
swg := sizedwaitgroup.New(8)
zips := []string{"78705","78710","78715"}
for _, item := range zips{
swg.Add()
go getPage(fmt.Sprintf(base_url,item,1),arrChan,swg)
}
swg.Wait()
}
Edit:
So I fixed it by passing the sizedwaitgroup as a reference, but when I remove the buffer it doesn't work. Does that mean I need to know how many elements will be sent to the channel in advance?
Issue
Building off of Colin Stewart's answer, from the code you have posted, as far as I can tell, your issue is actually with reading your arrChan. You write into it, but there's no place where you read from it in your code.
From the documentation:
If the channel is unbuffered, the sender blocks until the receiver has received the value. If the channel has a buffer, the sender blocks only until the value has been copied to the buffer; if the buffer is full, this means waiting until some receiver has retrieved a value.
By making the channel buffered, what's happening is that your code no longer blocks on the channel write operation, i.e. the line that looks like:
c <- fmt.Sprint("%s%s%s%s", name, address, phoneNumber, website)
My guess is that if you're still hanging when the channel has a size of 5000, it's because you have more than 5000 values returned across all of your loops over nodes.Nodes. Once your buffered channel is full, the operations block until the channel has space, just like if you were writing to an unbuffered channel.
Fix
Here's a minimal example showing how you would fix something like this (basically, just add a reader):
package main

import "sync"

func getPage(item string, c chan<- string) {
    c <- item
}

func readChannel(c <-chan string) {
    for {
        <-c
    }
}

func main() {
    arrChan := make(chan string)
    wg := sync.WaitGroup{}
    zips := []string{"78705", "78710", "78715"}
    for _, item := range zips {
        wg.Add(1)
        go func(item string) { // pass item in to avoid capturing the loop variable
            defer wg.Done()
            getPage(item, arrChan)
        }(item)
    }
    go readChannel(arrChan) // comment this out and you'll deadlock
    wg.Wait()
}
Your channel has no buffer, so writes will block until the value can be read, and at least in the code you have posted, there are no readers.
You don't need to know the size to make it work, but you might in order to exit cleanly. That can be a bit tricky to observe at times, because your program exits once your main function returns, and any goroutines still running are killed immediately, finished or not.
As a warm-up example, change readChannel in photoionized's response to this:
func readChannel(c <-chan string) {
    for {
        url := <-c
        fmt.Println(url)
    }
}
It only adds printing to the original code, but now you'll see better what is actually happening. Notice how it usually only prints two strings when the code actually writes 3. This is because the program exits once all writing goroutines finish, and the reading goroutine is aborted midway as a result. You can "fix" it by removing the "go" before readChannel (which is the same as reading the channel in the main function). Then you'll see 3 strings printed, but the program crashes with a deadlock, as readChannel is still reading from the channel but nobody writes into it anymore. You can fix that too by reading exactly 3 strings in readChannel(), but that requires knowing how many strings you expect to receive.
Here is my minimal working example (I'll use it to illustrate the rest):
package main

import (
    "fmt"
    "sync"
)

func getPage(url string, c chan<- string, wg *sync.WaitGroup) {
    defer wg.Done()
    c <- fmt.Sprintf("Got page for %s\n", url)
}

func readChannel(c chan string, wg *sync.WaitGroup) {
    defer wg.Done()
    var url string
    ok := true
    for ok {
        url, ok = <-c
        if ok {
            fmt.Printf("Received: %s\n", url)
        } else {
            fmt.Println("Exiting readChannel")
        }
    }
}

func main() {
    arrChan := make(chan string)
    var swg sync.WaitGroup
    base_url := "http://test/%s/%d"
    zips := []string{"78705", "78710", "78715"}
    for _, item := range zips {
        swg.Add(1)
        go getPage(fmt.Sprintf(base_url, item, 1), arrChan, &swg)
    }
    var wg2 sync.WaitGroup
    wg2.Add(1)
    go readChannel(arrChan, &wg2)
    swg.Wait()
    // All written, signal the end to readChannel by closing the channel
    close(arrChan)
    wg2.Wait()
}
Here I close the channel to signal to readChannel that there is nothing left to read, so it can exit cleanly at the proper time. But sometimes you might instead want to tell readChannel to read exactly 3 strings and finish (see the sketch below). Or maybe you would want to start one reader for each writer, with each reader reading exactly one string... Well, there are many ways to skin a cat, and the choice is all yours.
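For example, here is a sketch of the read-exactly-N-strings variant (the function is mine, meant as a drop-in for readChannel above; main would call readN(arrChan, 3, &wg2) and skip close(arrChan)):

func readN(c <-chan string, n int, wg *sync.WaitGroup) {
    defer wg.Done()
    for i := 0; i < n; i++ {
        fmt.Printf("Received: %s\n", <-c)
    }
    // No channel close is needed: this reader stops by itself after n strings.
}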
Note: if you remove the wg2.Wait() line, your code becomes equivalent to photoionized's response and will only print two strings while writing 3. This is because the program exits once all writers finish (ensured by swg.Wait()), but it does not wait for readChannel to finish.
If you remove the close(arrChan) line instead, your code will crash with a deadlock after printing 3 lines, as the code waits for readChannel to finish, but readChannel waits to read from a channel that nobody writes to anymore.
If you just remove the "go" before the readChannel call, it becomes equivalent to reading from the channel inside the main function. It will again crash with a deadlock after printing 3 strings, because readChannel is still reading when all writers have already finished (and readChannel has already read everything they wrote). A tricky point here is that the swg.Wait() line will never be reached by this code, as readChannel never exits.
If you move the readChannel call after swg.Wait(), the code will crash before printing even a single string. But this is a different deadlock. This time the code reaches swg.Wait() and stops there waiting for the writers. The first writer succeeds, but the channel is not buffered, so the next writer blocks until someone reads the already-written data from the channel. The trouble is, nobody reads from the channel yet, as readChannel has not been called. So it stalls and crashes with a deadlock. This particular issue can be "fixed" by making the channel buffered, as in make(chan string, 3), since that allows writers to keep writing even though nobody is reading from that channel yet. And sometimes this is what you want. But here again you have to know the maximum number of messages that will ever be in the channel buffer. And most of the time buffering only defers the problem: just add one more writer and you are back where you started, with the code stalling and crashing because the channel buffer is full and that one extra writer is waiting for someone to read from the buffer.
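Here is a minimal sketch of that buffered behavior (values made up): three sends succeed with no receiver at all, and only a fourth would deadlock:

package main

import "fmt"

func main() {
    ch := make(chan string, 3) // buffer of 3: three sends succeed with no receiver
    ch <- "a"
    ch <- "b"
    ch <- "c" // a fourth send here would block forever and deadlock
    fmt.Println(<-ch, <-ch, <-ch)
}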
Well, this should cover all bases. So, check your code and see which case is yours.

Concurrent writing to file from ajax request

I want to write requests to one file from some ajax script. The problem arises when there are many of them per second, so that writing to the file takes longer than the gap between requests, or when two requests arrive at the same time.
How could I solve this?
I've come up with using a mutex, like:
var mu sync.Mutex

func writeToFile() {
    mu.Lock()
    defer mu.Unlock()
    // write to file
}
But it makes the whole thing synchronous and I don't really know what happens when there are two requests at the same time. And it still does not lock the file itself.
Uh, what's the proper way to do this?
You only need to make writing to the file "sequential", meaning don't allow 2 concurrent goroutines to write to the file. Yes, if you use locking in the writeToFile() function, serving your ajax requests may become (partially) sequential too.
What I suggest is to open the file once, when your application starts, and designate a single goroutine responsible for writing to the file; no other goroutine should do it.
And use a buffered channel to send the data that should be written to the file. This will make serving ajax requests non-blocking, and still the file will not be written concurrently or in parallel.
Note that this way ajax requests won't even have to wait while the data is actually written to the file (faster response time). This may or may not be a problem. For example, if writing later fails, your ajax response might already be committed, so there is no chance to signal the failure to the client.
An example of how to do it:
var (
    f      *os.File
    datach = make(chan []byte, 100) // Buffered channel
)

func init() {
    // Open the file for appending (create it if it doesn't exist)
    var err error
    f, err = os.OpenFile("data.txt", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0666)
    if err != nil {
        panic(err)
    }
    // Start the goroutine which writes to the file
    go writeToFile()
}

func writeToFile() {
    // Loop through any data that needs to be written:
    for data := range datach {
        if _, err := f.Write(data); err != nil {
            // handle error!
        }
    }
    // We get here if datach is closed: shutdown
    f.Close()
}

func ajaxHandler(w http.ResponseWriter, r *http.Request) {
    // Assemble the data that needs to be written (appended) to the file
    data := []byte{1, 2, 3}
    // And send it:
    datach <- data
}
To gracefully exit from the app, you should close the datach channel: when it's closed, the loop in writeToFile() will terminate, and the file will be closed (flushing any cached data and releasing OS resources).
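Here is a sketch of what that shutdown could look like (the done channel and the shutdown helper are my own additions on top of the snippet above; writeToFile is unchanged apart from the defer):

var done = make(chan struct{}) // closed by writeToFile once everything is flushed

func writeToFile() {
    defer close(done)
    for data := range datach {
        if _, err := f.Write(data); err != nil {
            // handle error!
        }
    }
    f.Close()
}

func shutdown() {
    close(datach) // no more writes will be queued
    <-done        // wait until everything queued so far is written and the file is closed
}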
If you want to write text to the file, you may declare the data channel like this:
var datach = make(chan string, 100) // Buffered channel
And you may use File.WriteString() to write it to the file:
if _, err := f.WriteString(data); err != nil {
    // handle error!
}
