I've just started learning Go. I'm writing a small server application, and the function (method) that handles the requests (registered through http.HandleFunc) writes to a file - always the same file. Since, as I understand, the HTTP server handles each request in its own goroutine, I'm worried that the file writes might end up interfering with each other in some way - by blocking each other or just overlapping.
Looking at the actual output this problem has not arisen so far, but could it arise, and if so how do I fix it?
Here's a cleaned up version of my code:
package main

import (
	"net/http"
	"os"
)

type Service struct {
	file *os.File
}

func (ser *Service) handleRequest(w http.ResponseWriter, req *http.Request) {
	// do lots of stuff that does not affect the file
	message := ...
	n, err := ser.file.Write(message) // This is what I'm worried about
	// handle error and wrap up
}

func main() {
	m := http.NewServeMux()
	// os.Open returns a read-only handle; a writable one is needed here
	fi, err := os.OpenFile("/boolanger/file.txt", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0666)
	// handle error
	ser := &Service{file: fi}
	m.HandleFunc("/service/", ser.handleRequest)
	server := http.Server{
		Addr:    ":8080",
		Handler: m,
	}
	serverError := server.ListenAndServe()
	// handle serverError
}
Ideally I'd like the file writes to be made in the order the requests came in, but this is not that important. I'm more worried about the different file writes interfering in some way.
File writes are blocking, and each call to Write is issued as a single write system call, which the operating system performs atomically on regular files (per POSIX). So concurrent writes will wait for each other and will not corrupt each other's bytes, though output from separate Write calls may land in the file in any order. If you want more control, wrap your writes with a sync.Mutex to ensure that one goroutine completes all its writes before the next goroutine starts its writes.
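For example, here is a minimal sketch of that mutex approach applied to the Service type from the question (the mu field and the example message are additions for illustration):

type Service struct {
	mu   sync.Mutex // guards file
	file *os.File
}

func (ser *Service) handleRequest(w http.ResponseWriter, req *http.Request) {
	message := []byte("handled " + req.URL.Path + "\n")

	// Only one goroutine at a time can hold the lock, so each
	// request's message is written in full before the next begins.
	ser.mu.Lock()
	_, err := ser.file.Write(message)
	ser.mu.Unlock()

	if err != nil {
		http.Error(w, "write failed", http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
}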
In an effort to learn Go, I was looking through the Go source for reverseproxy:
https://golang.org/src/net/http/httputil/reverseproxy.go
I found this block of code (truncated):
...
	errc := make(chan error, 1)
	spc := switchProtocolCopier{user: conn, backend: backConn}
	go spc.copyToBackend(errc)
	go spc.copyFromBackend(errc)
	<-errc
	return
}

// switchProtocolCopier exists so goroutines proxying data back and
// forth have nice names in stacks.
type switchProtocolCopier struct {
	user, backend io.ReadWriter
}

func (c switchProtocolCopier) copyFromBackend(errc chan<- error) {
	_, err := io.Copy(c.user, c.backend)
	errc <- err
}

func (c switchProtocolCopier) copyToBackend(errc chan<- error) {
	_, err := io.Copy(c.backend, c.user)
	errc <- err
}
The portion that caught my attention was the creation of the errc buffered channel. I thought (probably naively) that we would use an unbuffered channel and the later receive from errc would need to run twice, like this:
<-errc
<-errc
As written, I understand that reading from the channel will ensure at least one of the copy methods has run. I also understand that the first send to the channel will not block, while the second will block only if the first one has not yet been received.
What I don't understand, is why it is written like this. Is it to ensure that only one of the methods completes? If that is the case, couldn't they technically both run?
Thanks!
The buffered channel of size one is there so that neither goroutine leaks. Since at most one value is ever received from the channel (the single <-errc above), an unbuffered channel would leave the second, slower goroutine blocked forever on its send, because nothing ever receives that second value. With a buffer of one, the losing goroutine's send completes into the buffer and the goroutine can exit. Both goroutines do run to completion; the code simply returns as soon as the first one finishes. Increasing the size of the channel beyond one would not change the currently exhibited behavior, which is: wait until at least one of the two goroutines completes its Copy operation.
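Here is a standalone sketch of the same pattern (the names and timings are made up for illustration):

package main

import (
	"fmt"
	"time"
)

func work(d time.Duration, name string, errc chan<- error) {
	time.Sleep(d)
	// With an unbuffered channel and only one receive in main, the
	// slower goroutine's send would block forever and leak. The
	// one-slot buffer lets that send complete so the goroutine can exit.
	errc <- fmt.Errorf("%s finished", name)
}

func main() {
	errc := make(chan error, 1)
	go work(10*time.Millisecond, "fast", errc)
	go work(50*time.Millisecond, "slow", errc)

	// Returns as soon as at least one of the goroutines completes.
	fmt.Println(<-errc)
}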
I want to write requests to one file from some ajax script. The problem arises when there are many requests per second and writing to the file takes more time than the gap between requests, or when two requests arrive at the same time.
How could I solve this?
I've come up with using a mutex, like:
var mu sync.Mutex

func writeToFile() {
	mu.Lock()
	defer mu.Unlock()
	// write to file
}
But it makes the whole thing synchronous and I don't really know what happens when there are two requests at the same time. And it still does not lock the file itself.
Uh, what's the proper way to do this?
You only need to make writing to the file "sequential", meaning don't allow two goroutines to write to the file concurrently. Yes, if you use locking in the writeToFile() function, serving your ajax requests may become (partially) sequential too.
What I suggest is to open the file once, when your application starts, and designate a single goroutine to be responsible for writing to the file; no other goroutine should do it.
And use a buffered channel to send the data that should be written to the file. This makes serving ajax requests non-blocking, while the file is still never written concurrently or in parallel.
Note that this way ajax requests won't even have to wait while the data is actually written to the file (faster response time). This may or may not be a problem. For example, if the write later fails, your ajax response might already be committed => no chance to signal failure to the client.
Example how to do it:
var (
	f      *os.File
	datach = make(chan []byte, 100) // Buffered channel
)

func init() {
	// Open file for appending (create if it doesn't exist)
	var err error
	f, err = os.OpenFile("data.txt", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0666)
	if err != nil {
		panic(err)
	}
	// Start the goroutine which writes to the file
	go writeToFile()
}

func writeToFile() {
	// Loop through any data that needs to be written:
	for data := range datach {
		if _, err := f.Write(data); err != nil {
			// handle error!
		}
	}
	// We get here if datach is closed: shutdown
	f.Close()
}

func ajaxHandler(w http.ResponseWriter, r *http.Request) {
	// Assemble data that needs to be written (appended) to the file
	data := []byte{1, 2, 3}
	// And send it:
	datach <- data
}
To gracefully exit from the app, you should close the datach channel: when it's closed, the loop in writeToFile() will terminate, and the file will be closed (flushing any cached data, releasing OS resources).
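For example, here is a hedged sketch of such a shutdown, building on the code above (the done channel and the shutdown function are additions for illustration, and writeToFile() gains one extra line):

var done = make(chan struct{})

func writeToFile() {
	for data := range datach {
		if _, err := f.Write(data); err != nil {
			// handle error!
		}
	}
	// datach was closed and drained: close the file and signal completion
	f.Close()
	close(done)
}

func shutdown() {
	// Callers must have stopped sending on datach by this point,
	// or their sends will panic.
	close(datach)
	<-done // wait until all buffered data has been written out
}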
If you want to write text to the file, you may declare the data channel like this:
var datach = make(chan string, 100) // Buffered channel
And you may use File.WriteString() to write it to the file:
if _, err := f.WriteString(data); err != nil {
	// handle error!
}
I have a function which returns the Reader end of an io.Pipe and kicks off a goroutine which writes data to the Writer end of it, and then closes the pipe.
func GetPipeReader() io.ReadCloser {
	r, w := io.Pipe()
	go func() {
		_, err := io.CopyN(w, SomeReaderOfSize(N), N)
		w.CloseWithError(err)
	}()
	return r
}

func main() {
	var buf bytes.Buffer
	io.Copy(&buf, GetPipeReader())
	println("got", buf.Len(), "bytes")
}
https://play.golang.org/p/OAijIwmtRr
This seems to always work in my testing, in that I get all the data I wrote. But the API docs are a bit worrying to me:
func Pipe() (*PipeReader, *PipeWriter)
Pipe creates a synchronous in-memory pipe. [...] Reads on one end are
matched with writes on the other, [...] there is no internal
buffering.
func (w *PipeWriter) CloseWithError(err error) error
CloseWithError closes the writer; subsequent reads from the read half
of the pipe will return no bytes and the error err, or EOF if err is
nil.
What I want to know is, what are the possible race conditions here? Is it plausible that my goroutine will write a bunch of data and then close the pipe before I can read it all?
Do I need to use a channel for some signalling on when to close? What can go wrong, basically.
No, there are no race conditions here. As the documentation says, the pipe is fully synchronous: each Write blocks until the data it carries has been consumed by Reads on the other end. So by the time CloseWithError() runs, every Write has already completed and been matched by corresponding Reads; the reading side must have received everything there was to read.
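A runnable sketch that demonstrates the guarantee (the string content is arbitrary):

package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

func main() {
	r, w := io.Pipe()
	go func() {
		// Each Write inside Copy blocks until the reader has consumed it.
		_, err := io.Copy(w, strings.NewReader("hello, pipe"))
		w.CloseWithError(err) // err is nil here, so the reader sees io.EOF
	}()

	var buf bytes.Buffer
	n, err := io.Copy(&buf, r)
	fmt.Println(n, err, buf.String()) // prints: 11 <nil> hello, pipe
}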
In go, how can I control the concurrent writing to a text file?
I ask this because I will have multiple goroutines writing to a text file using the same file handle.
I wrote this bit of code to try and see what happens but I'm not sure if I did it "right":
package main

import (
	"fmt"
	"math"
	"math/rand"
	"os"
	"sync"
	"time"
)

func WriteToFile(i int, f *os.File, w *sync.WaitGroup) {
	// sleep for either 200 or 201 milliseconds
	randSleep := int(math.Floor(200 + (2 * rand.Float64())))
	fmt.Printf("Thread %d waiting %d\n", i, randSleep)
	time.Sleep(time.Duration(randSleep) * time.Millisecond)
	// write to the file
	fmt.Fprintf(f, "Printing out: %d\n", i)
	// write to stdout
	fmt.Printf("Printing out: %d\n", i)
	w.Done()
}

func main() {
	rand.Seed(time.Now().UnixNano())
	d, err := os.Getwd()
	if err != nil {
		fmt.Println(err)
	}
	filename := d + "/log.txt"
	f, err := os.OpenFile(filename, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0666)
	if err != nil {
		fmt.Println(err)
	}
	var w *sync.WaitGroup = new(sync.WaitGroup)
	w.Add(10)
	// start 10 writers to the file
	for i := 1; i <= 10; i++ {
		go WriteToFile(i, f, w)
	}
	// wait for writers to finish
	w.Wait()
}
I half expected that the output would show something like this in the file instead of the coherent output I got:
Printing Printing out: 2
out: 5
Poriuntitng: 6
Essentially, I expected the characters to come out garbled and interleaved due to a lack of synchronization. Did I not write code that would coax this behavior out? Or is some mechanism during calls to fmt.Fprintf synchronizing the writing?
A simple approach to controlling concurrent access is via a service goroutine, receiving messages from a channel. This goroutine would have sole access to the file. Access would therefore be sequential, without any race problems.
Channels do a good job of serializing requests: the clients write to the channel instead of directly to the file, and each message arrives at the service goroutine whole, one at a time, so messages are never mixed together.
The benefit of this approach over simply using a Mutex is that you start viewing your program as a collection of microservices. This is the CSP way and leads to easy composition of large systems from smaller components.
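A compact sketch of that service-goroutine approach (the names are illustrative):

package main

import (
	"fmt"
	"os"
	"sync"
)

func main() {
	f, err := os.Create("log.txt")
	if err != nil {
		panic(err)
	}

	lines := make(chan string)
	done := make(chan struct{})

	// The service goroutine is the sole owner of the file handle.
	go func() {
		for line := range lines {
			fmt.Fprintln(f, line)
		}
		f.Close()
		close(done)
	}()

	// Clients send messages on the channel instead of writing to the file.
	var wg sync.WaitGroup
	for i := 1; i <= 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			lines <- fmt.Sprintf("Printing out: %d", i)
		}(i)
	}

	wg.Wait()
	close(lines)
	<-done // wait for the writer to flush and close the file
}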
There are many ways to control concurrent access. The easiest is to use a Mutex:
var mu sync.Mutex

func WriteToFile(i int, f *os.File, w *sync.WaitGroup) {
	mu.Lock()
	defer mu.Unlock()
	// etc...
}
As to why you're not seeing problems, Go uses operating system calls to implement file access, and those system calls are thread safe (emphasis added):
According to POSIX.1-2008/SUSv4 Section XSI 2.9.7 ("Thread Interactions with Regular File Operations"):
All of the following functions shall be atomic with respect to
each other in the effects specified in POSIX.1-2008 when they
operate on regular files or symbolic links: ...
Among the APIs subsequently listed are write() and writev(2). And
among the effects that should be atomic across threads (and
processes) are updates of the file offset. However, on Linux before
version 3.14, this was not the case: if two processes that share an
open file description (see open(2)) perform a write() (or writev(2))
at the same time, then the I/O operations were not atomic with
respect to updating the file offset, with the result that the blocks of
data output by the two processes might (incorrectly) overlap. This
problem was fixed in Linux 3.14.
I would still use a lock though, since Go code is not automatically thread-safe (two goroutines modifying the same variable without synchronization will result in strange behavior).
I thought I'd found an easy way to return an HTTP response immediately and then do some work in the background without blocking. However, this doesn't work.
func MyHandler(w http.ResponseWriter, r *http.Request) {
	// handle form values
	go doSomeBackgroundWork() // this will take 2 or 3 seconds
	w.WriteHeader(http.StatusOK)
}
It works the first time: the response is returned immediately and the background work starts. However, any further requests hang until the background goroutine completes. Is there a better way to do this, one that doesn't involve setting up a message queue and a separate background process?
I know this question was posted 4 years ago, but I hope someone finds this useful.
Here is a way to do it.
There is a pattern called a worker pool (https://gobyexample.com/worker-pools) that uses goroutines and channels.
In the following code I adapt it to a handler. (For simplicity, I'm ignoring errors and using jobs as a global variable.)
package main

import (
	"fmt"
	"net/http"
	"time"
)

var jobs chan int

func worker(jobs <-chan int) {
	fmt.Println("Register the worker")
	for i := range jobs {
		fmt.Println("worker processing job", i)
		time.Sleep(time.Second * 5)
	}
}

func handler(w http.ResponseWriter, r *http.Request) {
	jobs <- 1
	fmt.Fprintln(w, "hello world")
}

func main() {
	jobs = make(chan int, 100)
	go worker(jobs)
	http.HandleFunc("/request", handler)
	http.ListenAndServe(":9090", nil)
}
The explanation:
main()
Runs the worker in the background using a goroutine.
Starts the server with my handler.
Note that at this point the worker is ready to receive jobs.
worker()
It is a goroutine that receives jobs from a channel.
The for loop never ends because the channel is never closed.
When the channel contains a job, it does some work (here, sleeping for 5 seconds).
handler()
Writes to the channel to activate a job.
Immediately returns, printing "hello world" to the page.
The cool thing is that you can send as many requests as you want, but because this scenario contains only one worker, the next job will wait until the previous one has finished.
This is awesome Go!
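If you need more throughput, a hypothetical tweak is to start several workers on the same channel, so jobs are processed concurrently:

// In main(), start 3 workers instead of 1; the jobs channel
// distributes work among them, so up to 3 jobs run at once.
for i := 0; i < 3; i++ {
	go worker(jobs)
}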
Go multiplexes goroutines onto the available OS threads, the number of which is determined by the GOMAXPROCS setting. As a result, if this is set to 1, a single goroutine can hog the single thread Go has available to it until it yields control back to the Go runtime. More than likely doSomeBackgroundWork is hogging all the time on a single thread, which is preventing the http handler from getting scheduled.
There are a number of ways to fix this.
First, as a general rule when using goroutines, you should set GOMAXPROCS to the number of CPUs your system has (since Go 1.5 this is the default, so on a modern Go version there is nothing to do).
Second, you can yield control in a goroutine by doing any of the following:
runtime.Gosched()
ch <- foo
foo := <-ch
select { ... }
mutex.Lock()
mutex.Unlock()
All of these will yield back to the Go runtime scheduler giving other goroutines a chance to work.
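For instance, a CPU-bound loop can be made cooperative with runtime.Gosched() (a sketch only; note that since Go 1.14 the runtime can preempt such loops itself, so this matters mainly on older versions):

go func() {
	for {
		doSomeBackgroundWork() // hypothetical CPU-bound step
		runtime.Gosched()      // yield so other goroutines can run
	}
}()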