Why don't goroutines write in parallel using WriteAt?

I'm experimenting a bit with reading and writing from a file.
To write to a file concurrently I created the following function:
func write(f *os.File, b []byte, off int64, c chan int) {
    _, err := f.WriteAt(b, off)
    check(err)
    c <- 0
}
I then create a file and 100000 goroutines to perform the write operations.
They each write an array of 16384 bytes to the hard disk:
func main() {
    path := "E:/test"
    f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0666)
    check(err)
    size := int64(16384)
    ones := make([]byte, size)
    n := int64(100000)
    c := make(chan int, n)
    for i := int64(0); i < size; i++ {
        ones[i] = 1
    }
    // Start timing
    start := time.Now()
    for i := int64(0); i < n; i++ {
        go write(f, ones, size*i, c)
    }
    for i := int64(0); i < n; i++ {
        <-c
    }
    // Check elapsed time
    fmt.Println(time.Now().Sub(start))
    err = f.Sync()
    check(err)
    err = f.Close()
    check(err)
}
In this case about 1.6 GB is written, with each goroutine writing to a non-overlapping byte range. The documentation for the io package states that clients of WriteAt can execute parallel WriteAt calls on the same destination if the ranges do not overlap.
So what I expected to see is that go write(f, ones, 0, c) would take much longer, since all write operations would then target the same byte range.
However, after testing this my results are quite unexpected:
Using go write(f, ones, size*i, c) took an average of about 3s.
But using go write(f, ones, 0, c) only took an average of about 480ms.
Am I using the WriteAt function in the wrong way? How could I achieve concurrent writing to non-overlapping byte ranges?
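As for the second question, here is a minimal sketch of one way to write non-overlapping ranges concurrently without spawning 100000 goroutines: a small fixed pool of workers, each owning a disjoint set of chunks. The worker count of 8 is an arbitrary choice, not a recommendation:

package main

import (
    "os"
    "sync"
)

func main() {
    f, err := os.OpenFile("E:/test", os.O_RDWR|os.O_CREATE, 0666)
    if err != nil {
        panic(err)
    }
    defer f.Close()

    const (
        chunk   = 16384  // bytes per write, as in the question
        chunks  = 100000 // total number of writes
        workers = 8      // arbitrary bounded concurrency
    )
    buf := make([]byte, chunk)
    for i := range buf {
        buf[i] = 1
    }

    var wg sync.WaitGroup
    wg.Add(workers)
    for w := 0; w < workers; w++ {
        go func(w int) {
            defer wg.Done()
            // Worker w writes every workers-th chunk, so ranges never overlap.
            for i := w; i < chunks; i += workers {
                if _, err := f.WriteAt(buf, int64(i)*chunk); err != nil {
                    panic(err)
                }
            }
        }(w)
    }
    wg.Wait()
    if err := f.Sync(); err != nil {
        panic(err)
    }
}

Each worker issues many WriteAt calls on its own stripe of the file, so no two goroutines ever touch the same byte range.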

How to free memory manually in golang

Below is code to calculate C(36,8) and save the result to a file:
func combine_dfs(n int, k int) (ans [][]int) {
    temp := []int{}
    var dfs func(int)
    dfs = func(cur int) {
        if len(temp)+(n-cur+1) < k {
            return
        }
        if len(temp) == k {
            comb := make([]int, k)
            copy(comb, temp)
            ans = append(ans, comb)
            return
        }
        temp = append(temp, cur)
        dfs(cur + 1)
        temp = temp[:len(temp)-1]
        dfs(cur + 1)
    }
    dfs(1)
    return
}
func DoCombin() {
    fmt.Printf("%v\n", "calculator...")
    cst := []byte{}
    for i := 'a'; i <= 'z'; i++ {
        cst = append(cst, byte(i))
    }
    for i := '0'; i <= '9'; i++ {
        cst = append(cst, byte(i))
    }
    n := 36
    k := 8
    arr := combine_dfs(n, k)
    fmt.Printf("%v\n", "writefile...")
    file, _ := os.OpenFile("result.txt", os.O_CREATE|os.O_TRUNC|os.O_RDWR|os.O_APPEND, 0666)
    defer file.Close()
    for _, m := range arr {
        b := bytes.Buffer{}
        b.Reset()
        for _, i := range m {
            b.WriteByte(cst[i-1])
        }
        b.WriteByte('\n')
        file.Write(b.Bytes())
    }
}
but writing the file this way is very slow,
so I want to use goroutines to write it (using a pool to limit the number of goroutines):
func DoCombin2() {
    fmt.Printf("%v\n", "calculator...")
    cst := []byte{}
    for i := 'a'; i <= 'z'; i++ {
        cst = append(cst, byte(i))
    }
    for i := '0'; i <= '9'; i++ {
        cst = append(cst, byte(i))
    }
    n := 36
    k := 8
    arr := combine_dfs(n, k)
    fmt.Printf("%v\n", "writefile...")
    file, _ := os.OpenFile("result.txt", os.O_CREATE|os.O_TRUNC|os.O_RDWR|os.O_APPEND, 0666)
    defer file.Close()
    pool := make(chan int, 100)
    for _, m := range arr {
        // Note: a goroutine is spawned for every m immediately; pool only
        // limits how many of them get past this send at a time.
        go func(m []int) {
            pool <- 1
            b := bytes.Buffer{}
            b.Reset()
            for _, i := range m {
                b.WriteByte(cst[i-1])
            }
            b.WriteByte('\n')
            file.Write(b.Bytes())
            <-pool
        }(m)
    }
}
but the memory usage exploded.
I tried using sync.Pool to avoid it, but that failed too:
var bufPool = sync.Pool{
    New: func() interface{} {
        return new(bytes.Buffer)
    },
}

func DoCombin() {
    fmt.Printf("%v\n", "calculator...")
    cst := []byte{}
    for i := 'a'; i <= 'z'; i++ {
        cst = append(cst, byte(i))
    }
    for i := '0'; i <= '9'; i++ {
        cst = append(cst, byte(i))
    }
    n := 36
    k := 8
    arr := combine_dfs(n, k)
    fmt.Printf("%v\n", "writefile...")
    file, _ := os.OpenFile("result.txt", os.O_CREATE|os.O_TRUNC|os.O_RDWR|os.O_APPEND, 0666)
    defer file.Close()
    pool := make(chan int, 100)
    for _, m := range arr {
        go func(m []int) {
            pool <- 1
            b, _ := bufPool.Get().(*bytes.Buffer)
            b.Reset()
            for _, i := range m {
                b.WriteByte(cst[i-1])
            }
            b.WriteByte('\n')
            // Note: the buffer is returned to the pool before its bytes are
            // written, so another goroutine may grab and reuse it concurrently.
            bufPool.Put(b)
            file.Write(b.Bytes())
            <-pool
        }(m)
    }
}
Is there any way to avoid the memory explosion?
1. Why can't I avoid it even after using sync.Pool?
2. Is there any way to limit memory usage on Windows (on Linux I know how)?
3. Is there another way to avoid the memory explosion?
4. Is the memory explosion caused by bytes.Buffer? How can I free a bytes.Buffer manually?
Update 02/20/2023
The arenas proposal is on hold indefinitely due to serious API concerns. The GOEXPERIMENT=arenas code may be changed incompatibly or removed at any time, and we do not recommend its use in production.
Per the proposal arena: new package providing memory arenas:
We propose the addition of a new arena package to the Go standard library. The arena package will allow the allocation of any number of arenas. Objects of arbitrary type can be allocated from the memory of the arena, and an arena automatically grows in size as needed. When all objects in an arena are no longer in use, the arena can be explicitly freed to reclaim its memory efficiently without general garbage collection. We require that the implementation provide safety checks, such that, if an arena free operation is unsafe, the program will be terminated before any incorrect behavior happens.
This feature has been merged into the master branch under the arena package and may be released in Go 1.20. With the arena package, you can allocate memory yourself and free it manually once it is no longer in use.
Sample code (adapted into a test function so it compiles; T1 stands for any struct type, e.g. type T1 struct{ n int }, and building requires GOEXPERIMENT=arenas):

func TestArena(t *testing.T) {
    a := arena.NewArena()
    defer a.Free() // releases every allocation below at once

    tt := arena.New[T1](a)
    tt.n = 1

    ts := arena.MakeSlice[T1](a, 99, 100)
    if len(ts) != 99 {
        t.Errorf("Slice() len = %d, want 99", len(ts))
    }
    if cap(ts) != 100 {
        t.Errorf("Slice() cap = %d, want 100", cap(ts))
    }
    ts[1].n = 42
}
In Go 1.19:
The garbage collector has added support for a soft memory limit, discussed in detail in the new garbage collection guide. The limit can be particularly helpful for optimizing Go programs to run as efficiently as possible in containers with dedicated amounts of memory.
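As a sketch of how that limit can be set (the 512 MiB value is an arbitrary example, not a recommendation): it can be supplied either through the GOMEMLIMIT environment variable or from code via runtime/debug, and it works on Windows as well, which speaks to question 2 above:

package main

import (
    "fmt"
    "runtime/debug"
)

func main() {
    // Same effect as launching the process with GOMEMLIMIT=512MiB: the GC
    // works harder as the live heap approaches this soft limit.
    prev := debug.SetMemoryLimit(512 << 20) // limit in bytes
    fmt.Println("previous limit:", prev)    // math.MaxInt64 when unset

    // ... the combination-writing work would go here ...
}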

Why is the O(n^2) solution faster? Day 1 Advent of Code 2020

I have two solutions for the first problem of Advent of Code 2020. The first solution (p1) has time complexity O(n); the second (p2) is O(n^2). So why is the second faster?
https://adventofcode.com/2020/day/1
BenchmarkP1    12684    92239 ns/op
BenchmarkP2     3161    90705 ns/op
// O(n)
func p1(value int) (int, int) {
    m := make(map[int]int)
    f, err := os.Open("nums.txt")
    printError(err)
    defer f.Close()
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        intVar, err := strconv.Atoi(scanner.Text())
        printError(err)
        m[intVar] = intVar
    }
    for _, key := range m {
        l, ok := m[value-key]
        if ok {
            return l, key
        }
    }
    return 0, 0
}
// O(n^2)
func p2(value int) (int, int) {
    var data []int
    f, err := os.Open("nums.txt")
    printError(err)
    defer f.Close()
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        intVar, err := strconv.Atoi(scanner.Text())
        printError(err)
        data = append(data, intVar)
    }
    for ki, i := range data {
        for kj, j := range data {
            if ki != kj && i+j == value {
                return i, j
            }
        }
    }
    return 0, 0
}
Just try more test data:
func main() {
    // generate txt
    generateTxt(10000)
    // test (note: p1 is the O(n) solution, p2 the O(n^2) one)
    elapsed := countTime(p1)
    fmt.Println("time cost of O(n): ", elapsed)
    elapsed = countTime(p2)
    fmt.Println("time cost of O(n^2): ", elapsed)
}

func countTime(f func(int) (int, int)) int64 {
    tick := time.Now().UnixNano()
    fmt.Println(tick)
    f(2020)
    tock := time.Now().UnixNano()
    return tock - tick
}
Result (ns):

data    O(n)     O(n^2)
500     510700   529900
5000    787900   4589600
Explanation:
Big-O describes how the time cost grows as the data scale increases, so small inputs cannot show the difference well.
Also note: p1 has to build a map, which costs time of its own, and if you run p2 right after p1, p2's file I/O may benefit from the OS cache as well.
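For what it's worth, here is a sketch of reproducing such numbers with the standard benchmark harness rather than one-shot timing; it assumes p1, p2, and nums.txt from the question and lives in a _test.go file. The harness picks b.N so the reported ns/op is averaged over many runs:

package main

import "testing"

// Each benchmark reads nums.txt and solves the puzzle b.N times.
func BenchmarkP1(b *testing.B) {
    for i := 0; i < b.N; i++ {
        p1(2020)
    }
}

func BenchmarkP2(b *testing.B) {
    for i := 0; i < b.N; i++ {
        p2(2020)
    }
}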

for range vs static channel length golang

I have a channel receiving events parsed from a log file, and another one used for synchronization. There were 8 events for the purpose of my test.
When using the for range syntax, I get 4 events. When using the known number (8), I get all of them.
func TestParserManyOpinit(t *testing.T) {
    ch := make(chan event.Event, 1000)
    done := make(chan bool)
    go parser.Parse("./test_data/many_opinit", ch, done)
    count := 0
    exp := 8
    evtList := []event.Event{}
    <-done
    close(ch)
    // This gets all the events
    for i := 0; i < 8; i++ {
        evtList = append(evtList, <-ch)
        count++
    }
    // This only gives me four
    //for range ch {
    //    evtList = append(evtList, <-ch)
    //    count++
    //}
    if count != exp || count != len(evtList) {
        t.Errorf("Not proper length, got %d, exp %d, evtList %d", count, exp, len(evtList))
    }
}
func Parse(filePath string, evtChan chan event.Event, done chan bool) {
    log.Info(fmt.Sprintf("(thread) Parsing file %s", filePath))
    file, err := os.Open(filePath)
    defer file.Close()
    if err != nil {
        log.Error("Cannot read file " + filePath)
    }
    count := 0
    scan := bufio.NewScanner(file)
    scan.Split(splitFunc)
    scan.Scan() // Skip log file header
    for scan.Scan() {
        text := scan.Text()
        text = strings.Trim(text, "\n")
        splitEvt := strings.Split(text, "\n")
        // Some parsing ...
        count++
        evtChan <- evt
    }
    fmt.Println("Done ", count) // gives 8
    done <- true
}
I must be missing something related to for loops on a channel.
I've tried adding a time.Sleep just before the done <- true part. It didn't change the result.
When you use for range, each loop iteration already receives from the channel, and you never use that received value; the body then receives again with <-ch. Hence, half the values are discarded. It should be:

for ev := range ch {
    evtList = append(evtList, ev)
    count++
}

so that the value received by the range loop is actually used.
Ranging over channels is demonstrated in the Tour of Go and detailed in the Go spec.
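A self-contained illustration of the same point, using hypothetical values rather than the question's parser:

package main

import "fmt"

func main() {
    ch := make(chan int, 8)
    for i := 0; i < 8; i++ {
        ch <- i
    }
    close(ch)
    // range performs the receive itself; an extra <-ch in the body
    // would consume (and here drop) every second value.
    for v := range ch {
        fmt.Println(v) // prints 0 through 7, each exactly once
    }
}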

how to limit goroutine

I'm developing a gmail client based on google api.
I have a list of labels obtained through this call
r, err := s.gClient.Service.Users.Labels.List(s.gClient.User).Do()
Then, for every label, I need to get its details:

for _, l := range r.Labels {
    d, err := s.gClient.Service.Users.Labels.Get(s.gClient.User, l.Id).Do()
}
I'd like to handle the loop in a more efficient way, so I spawned a goroutine inside the loop:
ch := make(chan label.Label)
for _, l := range r.Labels {
    go func(gmailLabels *gmailclient.Label, gClient *gmail.Client, ch chan<- label.Label) {
        d, err := gClient.Service.Users.Labels.Get(gClient.User, gmailLabels.Id).Do()
        if err != nil {
            panic(err)
        }
        // Performs some operation with the label `d`
        preparedLabel := ....
        ch <- preparedLabel
    }(l, s.gClient, ch)
}
for i := 0; i < len(r.Labels); i++ {
    lab := <-ch
    fmt.Printf("Processed %v\n", lab.LabelID)
}
The problem with this code is that the gmail API has a rate limit, so I get this error:
panic: googleapi: Error 429: Too many concurrent requests for user, rateLimitExceeded
What is the correct way to handle this situation?
How about starting only e.g. 10 goroutines and passing the values in from a for loop running in another goroutine? The channels get a small buffer to reduce synchronisation time.
chIn := make(chan *gmailclient.Label, 20)
chOut := make(chan label.Label, 20)
for i := 0; i < 10; i++ {
    go func(gClient *gmail.Client, chIn <-chan *gmailclient.Label, chOut chan<- label.Label) {
        for gmailLabels := range chIn {
            d, err := gClient.Service.Users.Labels.Get(gClient.User, gmailLabels.Id).Do()
            if err != nil {
                panic(err)
            }
            // Performs some operation with the label `d`
            preparedLabel := ....
            chOut <- preparedLabel
        }
    }(s.gClient, chIn, chOut)
}
go func(chIn chan<- *gmailclient.Label) {
    defer close(chIn)
    for _, l := range r.Labels {
        chIn <- l
    }
}(chIn)
for i := 0; i < len(r.Labels); i++ {
    lab := <-chOut
    fmt.Printf("Processed %v\n", lab.LabelID)
}
EDIT:
Here is a playground sample.
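Since the 429 is about request rate as much as concurrency, a complementary (hedged) approach is a token-bucket limiter such as golang.org/x/time/rate. The snippet below is a self-contained sketch: fetchLabel is a hypothetical stand-in for the real Labels.Get(...).Do() call, and the 5 requests/second figure is purely illustrative:

package main

import (
    "context"
    "fmt"
    "time"

    "golang.org/x/time/rate"
)

// fetchLabel stands in for the real API call from the question.
func fetchLabel(id int) string {
    return fmt.Sprintf("label-%d", id)
}

func main() {
    limiter := rate.NewLimiter(rate.Limit(5), 1) // 5 requests/second, burst of 1
    start := time.Now()
    for i := 0; i < 10; i++ {
        // Wait blocks until the limiter allows the next request.
        if err := limiter.Wait(context.Background()); err != nil {
            panic(err)
        }
        fmt.Println(fetchLabel(i), time.Since(start).Round(time.Millisecond))
    }
}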

"fan in" - one "fan out" behavior

Say we have three methods implementing "fan in" behavior:
func MakeChannel(tries int) chan int {
    ch := make(chan int)
    go func() {
        for i := 0; i < tries; i++ {
            ch <- i
        }
        close(ch)
    }()
    return ch
}
func MergeByReflection(channels ...chan int) chan int {
    length := len(channels)
    out := make(chan int)
    cases := make([]reflect.SelectCase, length)
    for i, ch := range channels {
        cases[i] = reflect.SelectCase{Dir: reflect.SelectRecv, Chan: reflect.ValueOf(ch)}
    }
    go func() {
        for length > 0 {
            i, line, opened := reflect.Select(cases)
            if !opened {
                cases[i].Chan = reflect.ValueOf(nil)
                length -= 1
            } else {
                out <- int(line.Int())
            }
        }
        close(out)
    }()
    return out
}
func MergeByCode(channels ...chan int) chan int {
    length := len(channels)
    out := make(chan int)
    go func() {
        var i int
        var ok bool
        // Note: this variant is hard-coded for exactly five input channels,
        // and it forwards the zero value once as each channel closes
        // (harmless for this sum, since it only adds 0).
        for length > 0 {
            select {
            case i, ok = <-channels[0]:
                out <- i
                if !ok {
                    channels[0] = nil
                    length -= 1
                }
            case i, ok = <-channels[1]:
                out <- i
                if !ok {
                    channels[1] = nil
                    length -= 1
                }
            case i, ok = <-channels[2]:
                out <- i
                if !ok {
                    channels[2] = nil
                    length -= 1
                }
            case i, ok = <-channels[3]:
                out <- i
                if !ok {
                    channels[3] = nil
                    length -= 1
                }
            case i, ok = <-channels[4]:
                out <- i
                if !ok {
                    channels[4] = nil
                    length -= 1
                }
            }
        }
        close(out)
    }()
    return out
}
func MergeByGoRoutines(channels ...chan int) chan int {
    var group sync.WaitGroup
    group.Add(len(channels)) // add before spawning, so no Done can run first
    out := make(chan int)
    for _, ch := range channels {
        go func(ch chan int) {
            for i := range ch {
                out <- i
            }
            group.Done()
        }(ch)
    }
    go func() {
        group.Wait()
        close(out)
    }()
    return out
}
type MergeFn func(...chan int) chan int
func main() {
    length := 5
    tries := 1000000
    channels := make([]chan int, length)
    fns := []MergeFn{MergeByReflection, MergeByCode, MergeByGoRoutines}
    for _, fn := range fns {
        sum := 0
        t := time.Now()
        for i := 0; i < length; i++ {
            channels[i] = MakeChannel(tries)
        }
        for i := range fn(channels...) {
            sum += i
        }
        fmt.Println(time.Since(t))
        fmt.Println(sum)
    }
}
Results at 1 CPU (using runtime.GOMAXPROCS(1)):

19.869s (MergeByReflection)
2499997500000
8.483s (MergeByCode)
2499997500000
4.977s (MergeByGoRoutines)
2499997500000

Results at 2 CPUs (using runtime.GOMAXPROCS(2)):

44.94s (MergeByReflection)
2499997500000
10.853s (MergeByCode)
2499997500000
3.728s (MergeByGoRoutines)
2499997500000
I understand why MergeByReflection is the slowest, but what explains the difference between MergeByCode and MergeByGoRoutines?
And why does select (used via reflect.Select in MergeByReflection and as a select statement in MergeByCode) become slower when we increase the number of CPUs?
Here is a preliminary remark: the channels in your examples are all unbuffered, so they will likely block on send or receive.
In this example there is almost no processing apart from channel management, so performance is dominated by the synchronization primitives; actually, very little of this code can be parallelized at all.
In the MergeByReflection and MergeByCode functions, select is used to listen to multiple input channels, but nothing takes the output channel into account (which may therefore block even while events are available on the input channels).
In the MergeByGoRoutines function, this situation cannot happen: when the output channel blocks, it does not prevent another input channel from being read by another goroutine. There are therefore better opportunities for the runtime to parallelize the goroutines, and less contention on the input channels.
The MergeByReflection code is the slowest because it carries the overhead of reflection, and almost nothing in it can be parallelized.
The MergeByGoRoutines function is the fastest because it reduces contention (less synchronization is needed) and because output contention has a lesser impact on input throughput; it can therefore benefit from a small improvement when running with multiple cores, contrary to the two other methods.
There is so much synchronization activity in MergeByReflection and MergeByCode that running on multiple cores negatively impacts their performance. You could get different performance by using buffered channels, though.
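As a minimal sketch of that last remark (the buffer size of 128 is an arbitrary choice), buffering the producer channels lets senders run ahead instead of blocking on every hand-off; the same change would apply to the out channel inside each merge function, e.g. out := make(chan int, 128):

func MakeBufferedChannel(tries int) chan int {
    ch := make(chan int, 128) // buffered: the producer rarely blocks
    go func() {
        for i := 0; i < tries; i++ {
            ch <- i
        }
        close(ch)
    }()
    return ch
}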
