In go, how can I control the concurrent writing to a text file?
I ask this because I will have multiple goroutines writing to a text file using the same file handler.
I wrote this bit of code to try and see what happens but I'm not sure if I did it "right":
package main

import (
    "fmt"
    "math"
    "math/rand"
    "os"
    "sync"
    "time"
)

func WriteToFile(i int, f *os.File, w *sync.WaitGroup) {
    // sleep for either 200 or 201 milliseconds
    randSleep := int(math.Floor(200 + (2 * rand.Float64())))
    fmt.Printf("Thread %d waiting %d\n", i, randSleep)
    time.Sleep(time.Duration(randSleep) * time.Millisecond)
    // write to the file
    fmt.Fprintf(f, "Printing out: %d\n", i)
    // write to stdout
    fmt.Printf("Printing out: %d\n", i)
    w.Done()
}

func main() {
    rand.Seed(time.Now().UnixNano())
    d, err := os.Getwd()
    if err != nil {
        fmt.Println(err)
    }
    filename := d + "/log.txt"
    f, err := os.OpenFile(filename, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0666)
    if err != nil {
        fmt.Println(err)
    }
    w := new(sync.WaitGroup)
    w.Add(10)
    // start 10 writers to the file
    for i := 1; i <= 10; i++ {
        go WriteToFile(i, f, w)
    }
    // wait for writers to finish
    w.Wait()
}
I half expected that the output would show something like this in the file instead of the coherent output I got:
Printing Printing out: 2
out: 5
Poriuntitng: 6
Essentially, I expected the characters to come out incoherently and interleaved due to a lack of synchronization. Did I not write code that would coax this behavior out? Or is some mechanism during calls to fmt.Fprintf synchronizing the writing?
A simple approach to controlling concurrent access is via a service goroutine, receiving messages from a channel. This goroutine would have sole access to the file. Access would therefore be sequential, without any race problems.
Channels do a good job of serializing requests: the clients write to the channel instead of directly to the file, and the messages are merged into a single ordered stream for you.
The benefit of this approach over simply using a Mutex is that you start viewing your program as a collection of microservices. This is the CSP way and leads to easy composition of large systems from smaller components.
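Here is a minimal sketch of that approach (names like fileWriter and the lines channel are my own, not from the question): one goroutine owns the file, and every other goroutine sends lines over a channel.

package main

import (
    "fmt"
    "log"
    "os"
    "sync"
)

// fileWriter is the service goroutine: it alone touches f.
func fileWriter(f *os.File, lines <-chan string, done chan<- struct{}) {
    for line := range lines {
        fmt.Fprintln(f, line) // sequential writes, no races
    }
    close(done)
}

func main() {
    f, err := os.Create("log.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    lines := make(chan string)
    done := make(chan struct{})
    go fileWriter(f, lines, done)

    var wg sync.WaitGroup
    for i := 1; i <= 10; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            lines <- fmt.Sprintf("Printing out: %d", i)
        }(i)
    }
    wg.Wait()

    close(lines) // no more writers
    <-done       // wait for the service goroutine to drain and exit
}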
There are many ways to control concurrent access. The easiest is to use a Mutex:
var mu sync.Mutex

func WriteToFile(i int, f *os.File, w *sync.WaitGroup) {
    mu.Lock()
    defer mu.Unlock()
    // etc...
}
As to why you're not seeing problems: Go uses operating system calls to implement file access, and those system calls are thread-safe:
According to POSIX.1-2008/SUSv4 Section XSI 2.9.7 ("Thread Interactions with Regular File Operations"):

    All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links: ...

Among the APIs subsequently listed are write() and writev(2). And among the effects that should be atomic across threads (and processes) are updates of the file offset. However, on Linux before version 3.14 this was not the case: if two processes that share an open file description (see open(2)) perform a write() (or writev(2)) at the same time, then the I/O operations were not atomic with respect to updating the file offset, with the result that the blocks of data output by the two processes might (incorrectly) overlap. This problem was fixed in Linux 3.14.
I would still use a lock, though, since Go code is not automatically thread-safe (two goroutines modifying the same variable without synchronization is a data race and will result in strange behavior).
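As an aside, the Go race detector is a quick way to confirm suspicions like this; it flags unsynchronized access at runtime:

$ go run -race main.go

(main.go stands for whatever file holds the program above.)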
Related
Hi, I'm having a problem with a control channel (of sorts).
The essence of my program:
I do not know how many goroutines I will be running at runtime.
I will need to restart these goroutines at set times; however, they could also potentially error out (and then be restarted), so their timing will not be predictable.
These goroutines will be putting messages onto a single channel.
So what I've done is create a simple random message generator to put messages onto a channel.
When the timer is up (a random duration, for testing) I put a message onto a control channel; the message is a struct payload, so I know there was a close signal and which goroutine it came from. In reality I'd then do some other stuff before starting the goroutines again.
My problem is:
I receive the control message within my reflect.Select loop.
I do not (or am unable to) receive it in my randmsgs() loop.
Therefore I cannot stop my randmsgs() goroutine.
I believe I'm right in understanding that multiple goroutines can read from a single channel, so I think I'm misunderstanding how reflect.SelectCase fits into all of this.
My code:
package main

import (
    "fmt"
    "math/rand"
    "reflect"
    "time"
)

type testing struct {
    control bool
    market  string
}

func main() {
    rand.Seed(time.Now().UnixNano())
    // explicitly define chanids for tests.
    chanids := []string{"GR I", "GR II", "GR III", "GR IV"}
    stream := make(chan string)
    control := make([]chan testing, len(chanids))
    reflectCases := make([]reflect.SelectCase, len(chanids)+1)
    // MAKE REFLECT SELECTS FOR 4 CONTROL CHANS AND 1 DATA CHANNEL
    for i := range chanids {
        control[i] = make(chan testing)
        reflectCases[i] = reflect.SelectCase{Dir: reflect.SelectRecv, Chan: reflect.ValueOf(control[i])}
    }
    reflectCases[len(chanids)] = reflect.SelectCase{Dir: reflect.SelectRecv, Chan: reflect.ValueOf(stream)}
    // START GO ROUTINES
    for i, val := range chanids {
        runningFunc(control[i], val, stream, 1+rand.Intn(30-1))
    }
    // READ DATA
    for {
        o, received, ok := reflect.Select(reflectCases)
        if !ok {
            fmt.Println("You really buggered this one up...")
        }
        ty, isControl := received.Interface().(testing)
        if isControl {
            fmt.Printf("Read from: %v, and received close signal from: %s\n", o, ty.market)
            // close control & stream here.
        } else {
            s := received.Interface().(string)
            fmt.Printf("Read from: %v, and received value from: %s\n", o, s)
        }
    }
}

// THE GO ROUTINES - TIMER AND RANDMSGS
func runningFunc(q chan testing, chanid string, stream chan string, dur int) {
    go timer(q, dur, chanid)
    go randmsgs(q, chanid, stream)
}

func timer(q chan testing, t int, message string) {
    for t > 0 {
        time.Sleep(time.Second)
        t--
    }
    q <- testing{true, message}
}

func randmsgs(q chan testing, chanid string, stream chan string) {
    for {
        select {
        case <-q:
            fmt.Println("Just sitting by the mailbox. :(")
            return
        default:
            secondsToWait := 1 + rand.Intn(5-1)
            time.Sleep(time.Second * time.Duration(secondsToWait))
            stream <- fmt.Sprintf("%s: %d", chanid, secondsToWait)
        }
    }
}
I apologise for the wall of text, but I'm all out of ideas :(!
K/Regards,
C.
Your channels q in the second half are the same as control[0...3] in the first.
Your reflect.Select loop also reads from all of these channels, with no delay.
The problem, I think, comes down to your reflect.Select simply running too fast and "stealing" all the channel output right away. This is why randmsgs never gets to read the messages.
You'll notice that if you remove the default case from randmsgs, the function is able to (potentially) read some of the messages from q.
select {
case <-q:
    fmt.Println("Just sitting by the mailbox. :(")
    return
}
This is because randmsgs now blocks waiting for a message on q, and thus has a chance to beat the reflect.Select in the race.
If you read from the same channel in multiple goroutines, then the data passed will simply go to whatever goroutine reads it first.
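A tiny sketch of that point (the names here are mine, not from the question): two goroutines read from one channel, and each value is delivered to exactly one of them.

package main

import (
    "fmt"
    "sync"
)

func reader(name string, ch <-chan int, wg *sync.WaitGroup) {
    defer wg.Done()
    for v := range ch {
        fmt.Printf("reader %s got %d\n", name, v) // each v reaches only one reader
    }
}

func main() {
    ch := make(chan int)
    var wg sync.WaitGroup
    wg.Add(2)
    go reader("A", ch, &wg)
    go reader("B", ch, &wg)
    for i := 0; i < 6; i++ {
        ch <- i // whichever reader is waiting first receives i
    }
    close(ch)
    wg.Wait()
}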
This program appears to just be an experiment / learning experience, but I'll offer some criticism that may help.
Again, generally you don't have multiple goroutines reading from the same channel if both goroutines are doing different tasks. You're creating a mostly non-deterministic race as to which goroutine fetches the data first.
Second, this is a common beginner's anti-pattern with select that you should avoid:
for {
    select {
    case v := <-myChan:
        doSomething(v)
    default:
        // Oh no, there wasn't anything! Guess we have to wait and try again.
        time.Sleep(time.Second)
    }
}
This code is redundant because select already behaves in such a way that if no case is initially ready, it blocks until one is and then proceeds with it. The default: sleep effectively makes your select loop slower while spending less time actually waiting on the channel (because 99.999...% of the time is spent in time.Sleep).
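The idiomatic version just lets the receive block. With a single channel you don't even need select; a range loop does the same thing (doSomething and myChan are the placeholders from the snippet above):

for v := range myChan {
    doSomething(v) // blocks until a value arrives; no polling, no sleep
}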
I've just started learning Go. I'm writing a small server application, and the function (method) that handles the requests (through http.HandleFunc) writes to a file, always the same file. Since, as I understand it, the HTTP server runs each request handler in a new goroutine, I'm worried that the file writes might end up interfering with each other in some way, by blocking each other or just overlapping.
Looking at the actual output, this problem has not arisen so far, but could it, and if so, how do I fix it?
Here's a cleaned up version of my code:
package main

import (
    "net/http"
    "os"
)

type Service struct {
    file *os.File
}

func (ser *Service) handleRequest(w http.ResponseWriter, req *http.Request) {
    // do lots of stuff that does not affect file
    message := ...
    n, err := ser.file.Write(message) // This is what I'm worried about
    // handle error and wrap up
}

func main() {
    m := http.NewServeMux()
    // note: the file must be opened for writing; os.Open returns a read-only handle
    fi, err := os.OpenFile("/boolanger/file.txt", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0644)
    // handle error
    ser := &Service{file: fi}
    m.HandleFunc("/service/", ser.handleRequest)
    server := http.Server{
        Addr:    ":8080",
        Handler: m,
    }
    serverError := server.ListenAndServe()
}
Ideally I'd like the file writes to be made in the order the requests came in, but this is not that important. I'm more worried about the different file writes interfering in some way.
File writes are blocking system calls and, on regular files, effectively atomic at the granularity of a single Write call: concurrent writes wait for each other and will not corrupt each other, though whole writes from different goroutines may come out in any order. If you want more control, wrap your writes with a sync.Mutex to ensure that one goroutine completes all of its writes before the next one starts.
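A minimal sketch of the mutex version, reusing the Service type from the question (the mu field and the writeMessage helper are my additions, not part of the original code):

package main

import (
    "os"
    "sync"
)

type Service struct {
    mu   sync.Mutex // guards file
    file *os.File
}

// writeMessage lets one goroutine finish its write before the next starts.
func (ser *Service) writeMessage(message []byte) error {
    ser.mu.Lock()
    defer ser.mu.Unlock()
    _, err := ser.file.Write(message)
    return err
}

func main() {
    f, err := os.OpenFile("file.txt", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0644)
    if err != nil {
        panic(err)
    }
    defer f.Close()
    ser := &Service{file: f}
    _ = ser.writeMessage([]byte("hello\n"))
}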
I have a hard time understanding concurrency/parallelism. In my code I made a loop of 5 cycles. Inside the loop I added wg.Add(1), so in total I have 5 Adds. Here's the code:
package main

import (
    "fmt"
    "sync"
)

func main() {
    var list []int
    wg := sync.WaitGroup{}
    for i := 0; i < 5; i++ {
        wg.Add(1)
        go func(c *[]int, i int) {
            *c = append(*c, i)
            wg.Done()
        }(&list, i)
    }
    wg.Wait()
    fmt.Println(len(list))
}
The main func waits until all the goroutines finish, but when I try to print the length of the slice I get random results, e.g. 1, 3, etc. Is there something missing for it to produce the expected result, i.e. 5?
Is there something missing for it to produce the expected result, i.e. 5?
Yes, proper synchronization. If multiple goroutines access the same variable and at least one of the accesses is a write, you need explicit synchronization.
Your example can be "secured" with a single mutex:
var list []int
wg := sync.WaitGroup{}
mu := &sync.Mutex{} // a mutex protecting list
for i := 0; i < 5; i++ {
    wg.Add(1)
    go func(c *[]int, i int) {
        mu.Lock() // must lock before accessing the shared slice
        *c = append(*c, i)
        mu.Unlock() // unlock when we're done with it
        wg.Done()
    }(&list, i)
}
wg.Wait()
fmt.Println(len(list))
This will always print 5.
Note: the same slice is read at the end to print its length, yet we are not using the mutex there. This is because the WaitGroup ensures we can only get to that point after all goroutines that modify the slice have finished, so no data race can occur there. But in general both reads and writes have to be synchronized.
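If the goal is simply to have each goroutine contribute one element, a mutex-free alternative is to pre-allocate the slice and give each goroutine its own index (a sketch; compare the "Can I concurrently write different slice elements" question linked below). Distinct elements are distinct memory locations, so the goroutines share nothing:

package main

import (
    "fmt"
    "sync"
)

func main() {
    list := make([]int, 5) // pre-allocated: goroutine i owns list[i]
    var wg sync.WaitGroup
    for i := 0; i < 5; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            list[i] = i // no append, no shared slice header, no race
        }(i)
    }
    wg.Wait()
    fmt.Println(list) // always [0 1 2 3 4]
}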
See possible duplicates:
go routine not collecting all objects from channel
Server instances with multiple users
Why does this code cause data race?
How safe are Golang maps for concurrent Read/Write operations?
golang struct concurrent read and write without Lock is also running ok?
See related questions:
Can I concurrently write different slice elements
If I am using channels properly should I need to use mutexes?
Is it safe to read a function pointer concurrently without a lock?
Concurrent access to maps with 'range' in Go
For an assignment we are using Go, and one of the things we are going to do is parse a UniProt database file line-by-line to collect UniProt records.
I prefer not to share too much code, but I have a working code snippet that does parse such a file (2.5 GB) correctly in 48 s (measured using the time Go package). It parses the file iteratively, adding lines to a record until a record end signal is reached (a full record), at which point metadata on the record is created. Then the record string is cleared and a new record is collected line-by-line. Then I thought I would try goroutines.
I have got some tips before from Stack Overflow, and to the original code I simply added a function to handle everything concerning the metadata creation.
So, the code does the following:
1. create an empty record,
2. iterate the file and add lines to the record,
3. if a record stop signal is found (now we have a full record), give the record to a goroutine to create the metadata,
4. clear the record string and continue from 2).
I also added a sync.WaitGroup() to make sure that I waited (in the end) for each routine to finish. I thought that this would actually lower the time spent parsing the database file, as it could continue parsing while the goroutines act on each record. However, the code seems to run for more than 20 minutes, indicating that something is wrong or the overhead went crazy. Any suggestions?
package main

import (
    "bufio"
    "crypto/sha1"
    "fmt"
    "io"
    "log"
    "os"
    "strings"
    "sync"
    "time"
)

type producer struct {
    parser uniprot
}

type unit struct {
    tag string
}

type uniprot struct {
    filenames     []string
    recordUnits   chan unit
    recordStrings map[string]string
}

func main() {
    p := producer{parser: uniprot{}}
    p.parser.recordUnits = make(chan unit, 1000000)
    p.parser.recordStrings = make(map[string]string)
    p.parser.collectRecords(os.Args[1])
}

func (u *uniprot) collectRecords(name string) {
    fmt.Println("file to open ", name)
    t0 := time.Now()
    wg := new(sync.WaitGroup)
    record := []string{}
    file, err := os.Open(name)
    errorCheck(err)
    scanner := bufio.NewScanner(file)
    for scanner.Scan() { // scan the file
        retText := scanner.Text()
        if strings.HasPrefix(retText, "//") {
            wg.Add(1)
            go u.handleRecord(record, wg)
            record = []string{}
        } else {
            record = append(record, retText)
        }
    }
    file.Close()
    wg.Wait()
    t1 := time.Now()
    fmt.Println(t1.Sub(t0))
}

func (u *uniprot) handleRecord(record []string, wg *sync.WaitGroup) {
    defer wg.Done()
    recString := strings.Join(record, "\n")
    t := hashfunc(recString)
    u.recordUnits <- unit{tag: t}
    u.recordStrings[t] = recString
}

func hashfunc(record string) (hashtag string) {
    hash := sha1.New()
    io.WriteString(hash, record)
    hashtag = string(hash.Sum(nil))
    return
}

func errorCheck(err error) {
    if err != nil {
        log.Fatal(err)
    }
}
First of all: your code is not thread-safe, mainly because you're accessing a map concurrently. Maps are not safe for concurrent use in Go and need to be locked. The faulty line in your code:
u.recordStrings[t] = recString
As this will blow up when you're running Go with GOMAXPROCS > 1, I'm assuming that you're not doing that. Make sure you're running your application with GOMAXPROCS=2 or higher to achieve parallelism.
The default value was 1 at the time (since Go 1.5 it defaults to the number of CPUs), so your code runs on a single OS thread, which of course can't be scheduled on two CPUs or CPU cores simultaneously. Example:
$ GOMAXPROCS=2 go run udb.go uniprot_sprot_viruses.dat
At last: pull the values from the channel, or otherwise your program will not terminate. You're creating a deadlock if the number of records exceeds the channel's buffer. I tested with a 76 MiB file of data, and you said your file was about 2.5 GB; I got 16347 entries. Assuming linear growth, your file will exceed the 1e6 buffer slots, so sends on the channel will block and your program will deadlock, giving no result while accumulating goroutines that never finish, only to fail at the end (miserably).
So the solution should be to add a goroutine which pulls the values from the channel and does something with them.
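A sketch of such a consumer, plugged into the program above (consumeUnits and the done channel are my additions; what you do with each unit depends on the assignment):

// consumeUnits drains recordUnits so the channel buffer can never fill up.
func (u *uniprot) consumeUnits(done chan<- struct{}) {
    for rec := range u.recordUnits {
        _ = rec.tag // e.g. write it out, count it, index it...
    }
    close(done)
}

And in main, start it before parsing and close the channel once all producers are done:

done := make(chan struct{})
go p.parser.consumeUnits(done)
p.parser.collectRecords(os.Args[1]) // returns after wg.Wait, so all sends have happened
close(p.parser.recordUnits)
<-done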
As a side note: if you're worried about performance, do not use strings here, as they're always copied; use []byte instead.
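For instance, hashfunc could take []byte directly; sha1.Sum from crypto/sha1 accepts a byte slice, so the io.WriteString detour disappears (a sketch):

// hashfunc hashes raw bytes without an intermediate string copy.
func hashfunc(record []byte) string {
    sum := sha1.Sum(record) // sum is a [20]byte array
    return string(sum[:])
}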
I am learning the Go programming language. Please consider the following program:
package main

import (
    "bytes"
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
    "sync"
)

var wg sync.WaitGroup

func grep(file string) {
    defer wg.Done()
    cmd := exec.Command("grep", "-H", "--color=always", "add", file)
    var out bytes.Buffer
    cmd.Stdout = &out
    cmd.Run()
    fmt.Printf("%s\n", out.String())
}

func walkFn(path string, info os.FileInfo, err error) error {
    if err != nil {
        return err
    }
    if !info.IsDir() {
        wg.Add(1)
        go grep(path)
    }
    return nil
}

func main() {
    filepath.Walk("/tmp/", walkFn)
    wg.Wait()
}
This program walks all the files in the /tmp directory and runs grep on each file in its own goroutine, so it spawns n goroutines where n is the number of files in /tmp. main waits until all the goroutines finish.
Interestingly, this program takes about the same time to execute with and without goroutines: try changing go grep(path) to a plain grep(path) call and compare.
I was expecting the goroutine version to run faster since multiple greps run concurrently, but it executes in almost the same time. I am wondering why this happens?
Try using more cores. Also, use a better root directory for comparative purposes, like the Go directory. An SSD makes a big difference too. For example,
func main() {
    runtime.GOMAXPROCS(runtime.NumCPU())
    goroot := "/home/peter/go/"
    filepath.Walk(goroot, walkFn)
    wg.Wait()
    fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
GOMAXPROCS: 1
real    0m10.137s
user    0m2.628s
sys     0m6.472s

GOMAXPROCS: 4
real    0m3.284s
user    0m2.492s
sys     0m5.116s
Your program's performance is bound to the speed of the disk (or RAM, if /tmp is a RAM disk): the computation is I/O-bound. No matter how many goroutines run in parallel, it can't read faster than that.