data-race, two goroutines plus same val [closed]

Consider the code below. In my opinion, val should end up somewhere between 100 and 200, but it's always 200.
package main

import (
    "fmt"
    "runtime"
    "time"
)

var val = 0

func main() {
    num := runtime.NumCPU()
    fmt.Println("number of CPUs", num)
    go add("A")
    go add("B")
    time.Sleep(1 * time.Second)
    fmt.Println("final value of val", val)
}

func add(proc string) {
    for i := 0; i < 100; i++ {
        val++
        fmt.Printf("execute process[%s] and val is %d\n", proc, val)
        time.Sleep(5 * time.Millisecond)
    }
}
Why is val always 200 at the end?

There are 2 problems with your code:
You have a data race: you are concurrently writing and reading val without synchronization. Its presence makes reasoning about the program's outcome meaningless.
The 1-second sleep in main() is too short; the goroutines may not be done yet after 1 second. You expect fmt.Printf to take no time at all, but console output takes a significant amount of time (on some OSes longer than others). So the loop won't take 100 * 5 = 500 milliseconds, but much, much longer.
Here's a fixed version that atomically increments val and properly waits for both goroutines to finish instead of assuming that they will be done within 1 second.
package main

import (
    "fmt"
    "runtime"
    "sync"
    "sync/atomic"
    "time"
)

var val = int32(0)

func main() {
    num := runtime.NumCPU()
    fmt.Println("number of CPUs", num)
    var wg sync.WaitGroup
    wg.Add(2)
    go add("A", &wg)
    go add("B", &wg)
    wg.Wait()
    fmt.Println("final value of val", atomic.LoadInt32(&val))
}

func add(proc string, wg *sync.WaitGroup) {
    for i := 0; i < 100; i++ {
        tmp := atomic.AddInt32(&val, 1)
        fmt.Printf("execute process[%s] and val is %d\n", proc, tmp)
        time.Sleep(5 * time.Millisecond)
    }
    wg.Done()
}

Incrementing the integer takes only on the order of a few nanoseconds, while each goroutine waits 5 milliseconds between increments.
That means each goroutine spends only about a millionth of its time actually doing the operation; the rest is spent sleeping. The likelihood of interference is therefore quite low, since both goroutines would need to be performing the increment simultaneously.
Even if both goroutines are on equal timers, the practical precision of the time library is nowhere near the nanosecond scale required to produce collisions consistently. On top of that, the goroutines are doing some printing, which further diverges their timings.
As pointed out in the comments, your code does still have a data race (seemingly intentionally), meaning that we cannot say anything for certain about the output of your program, despite any observations. It could output any number from 100 to 200, or something else entirely.
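For completeness, here is a mutex-based variant (a sketch, not from the original answers) that behaves equivalently to the atomic version; the race detector (go run -race) flags the original program but not this one:

package main

import (
    "fmt"
    "sync"
    "time"
)

var (
    mu  sync.Mutex
    val int
)

func add(proc string, wg *sync.WaitGroup) {
    defer wg.Done()
    for i := 0; i < 100; i++ {
        mu.Lock()
        val++
        tmp := val // copy while holding the lock
        mu.Unlock()
        fmt.Printf("execute process[%s] and val is %d\n", proc, tmp)
        time.Sleep(5 * time.Millisecond)
    }
}

func main() {
    var wg sync.WaitGroup
    wg.Add(2)
    go add("A", &wg)
    go add("B", &wg)
    wg.Wait()
    fmt.Println("final value of val", val) // always 200; Wait establishes a happens-before edge
}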

Related

How to use channels efficiently [closed]

I read in Uber's style guide that one should use channels with a buffer size of at most 1.
Although it's clear to me that a buffer size of 100 or 1000 is very bad practice, I was wondering why a buffer size of 10 isn't considered a valid option. I'm missing some piece needed to reach the right conclusion.
Below, you can follow my arguments (and counter arguments) backed by some benchmark test.
I understand that if both goroutines responsible for writing to or reading from the channel are interrupted between sequential writes or reads by some other IO action, no gain is to be expected from a larger channel buffer, and I agree that 1 is the best option in that case.
But let's say there is no significant other goroutine switching needed apart from the implicit locking and unlocking caused by writing to and reading from the channel. Then I would conclude the following:
Consider the number of goroutine switches when processing 100 values on a channel with a buffer of size 1 versus size 10 (GR = goroutine):
Buffer=1: (GR1 inserts 1 value, GR2 reads 1 value) x 100 ~ 200 goroutine switches
Buffer=10: (GR1 inserts 10 values, GR2 reads 10 values) x 10 ~ 20 goroutine switches
I did some benchmarking to prove that this actually goes faster:
package main

import (
    "testing"
)

type a struct {
    b [100]int64
}

func BenchmarkBuffer1(b *testing.B) {
    count := 0
    c := make(chan a, 1)
    go func() {
        for i := 0; i < b.N; i++ {
            c <- a{}
        }
        close(c)
    }()
    for v := range c {
        for i := range v.b {
            count += i
        }
    }
}

func BenchmarkBuffer10(b *testing.B) {
    count := 0
    c := make(chan a, 10)
    go func() {
        for i := 0; i < b.N; i++ {
            c <- a{}
        }
        close(c)
    }()
    for v := range c {
        for i := range v.b {
            count += i
        }
    }
}
Results (via go test -bench .) when comparing simple reading and writing plus non-blocking processing:
BenchmarkBuffer1-12 5072902 266 ns/op
BenchmarkBuffer10-12 6029602 179 ns/op
PASS
BenchmarkBuffer1-12 5228782 256 ns/op
BenchmarkBuffer10-12 5392410 216 ns/op
PASS
BenchmarkBuffer1-12 4806208 287 ns/op
BenchmarkBuffer10-12 4637842 233 ns/op
PASS
However, if I add a sleep every 10 reads, it doesn't yield any better results.
package main

import (
    "testing"
    "time"
)

func BenchmarkBuffer1WithSleep(b *testing.B) {
    count := 0
    c := make(chan int, 1)
    go func() {
        for i := 0; i < b.N; i++ {
            c <- i
        }
        close(c)
    }()
    for a := range c {
        count++
        if count%10 == 0 {
            time.Sleep(time.Duration(a) * time.Nanosecond)
        }
    }
}

func BenchmarkBuffer10WithSleep(b *testing.B) {
    count := 0
    c := make(chan int, 10)
    go func() {
        for i := 0; i < b.N; i++ {
            c <- i
        }
        close(c)
    }()
    for a := range c {
        count++
        if count%10 == 0 {
            time.Sleep(time.Duration(a) * time.Nanosecond)
        }
    }
}
Results when adding a sleep every 10 reads:
BenchmarkBuffer1WithSleep-12 856886 53219 ns/op
BenchmarkBuffer10WithSleep-12 929113 56939 ns/op
FYI: I also did the test again with only one CPU and got the following results:
BenchmarkBuffer1 5831193 207 ns/op
BenchmarkBuffer10 6226983 180 ns/op
BenchmarkBuffer1WithSleep 556635 35510 ns/op
BenchmarkBuffer10WithSleep 984472 61434 ns/op
Absolutely nothing is wrong with a channel of cap 500, e.g. if the channel is used as a semaphore.
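As an illustration of that semaphore pattern (a minimal sketch, with a hypothetical limit of 500 concurrent tasks and made-up work):

package main

import (
    "fmt"
    "sync"
)

func main() {
    const limit = 500 // the hypothetical cap
    sem := make(chan struct{}, limit)
    var wg sync.WaitGroup

    for i := 0; i < 2000; i++ {
        sem <- struct{}{} // acquire: blocks while `limit` tasks are in flight
        wg.Add(1)
        go func(n int) {
            defer wg.Done()
            defer func() { <-sem }() // release the slot
            _ = n * n                // stand-in for real work
        }(i)
    }
    wg.Wait()
    fmt.Println("all tasks finished")
}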
The style guide you read recommends not using buffered channels of, say, cap 64 "because this looks like a nice number". But this recommendation is not about performance! (Btw, your microbenchmarks are useless; they do not measure anything relevant.)
An unbuffered channel is a kind of synchronisation primitive and as such very useful.
A buffered channel may buffer between sender and receiver, and this buffering can be problematic for observing, tuning and debugging the code (because creation and consumption are further decoupled). That's why the style guide recommends unbuffered channels (or at most a cap of 1, as this is sometimes needed for correctness!).
It also doesn't prohibit larger buffer caps:
Any other [than 0 or 1] size must be subject to a high level of scrutiny. Consider how the size is determined, what prevents the channel from filling up under load and blocking writers, and what happens when this occurs. [emph. mine]
You may use a cap of 27 if you can explain why 27 (and not 22 or 31) and how it will influence program behaviour (not only performance!) when the buffer fills up.
Most people overrate performance. Correctness, operational stability and maintainability come first, and that is what the style guide is about here.

Unexpected Addition Behaviour of two Uint64 [closed]

Problem
In the code below, I have a couple of goroutines, and the one I'm facing issues with is the calculateMemoryUsage() goroutine, where the average memory usage is to be calculated every half a second. For some reason, the total keeps giving me weird values. Here are the print logs:
Memory Usage: 162224
Average: 162224
Total: 162224
Iteration Count: 1
Memory Usage: 181200
Average: 171712
Total: 343424
Iteration Count: 2
Memory Usage: 187864
Average: 119858
Total: 359576
Iteration Count: 3
As seen, from the third iteration on the average goes wrong because the total doesn't add up. After debugging, I can see that the memory usage is read fine, but the total seems to be the problem. I suspected the GC, but during this run the LastGC value is 0, meaning no GC has been performed. Any suggestions would be highly appreciated! :)
Code
func main() {
    db, err := sql.Open("mysql", "<credentials_removed>##tcp(127.0.0.1:3306)/rts")
    err = db.Ping()
    if err != nil {
        panic(err.Error()) // proper error handling instead of panic in your app
    }
    showStocksChannel := make(chan bool)
    showBestPerformingChannel := make(chan bool)
    go calculateMemoryUsage()
    go showStocks(showStocksChannel, db)
    go changeStockPrices(showStocksChannel, showBestPerformingChannel, db)
    go displayBestPerformingStocks(showBestPerformingChannel, db)
    showStocksChannel <- true
    select {}
}

func calculateMemoryUsage() {
    var averageMemoryUsage uint64 = 0
    var iterations uint64 = 0
    var usage uint64 = 0
    var total uint64 = 0
    for iterations <= 200 {
        var memoryStats runtime.MemStats
        runtime.ReadMemStats(&memoryStats)
        iterations = iterations + 1
        usage = memoryStats.Alloc
        total = (averageMemoryUsage + usage)
        averageMemoryUsage = total / iterations
        fmt.Printf("\nMemory Usage: %v\nAverage: %v\nTotal: %v\nIteration Count: %v\n\n", usage, averageMemoryUsage, total, iterations)
        time.Sleep(time.Millisecond * 1000)
        //fmt.Printf("\nLast GC: %v\nNext GC: %v\n\n", memoryStats.LastGC, memoryStats.NextGC)
    }
    fmt.Printf("\nAverage Memory Usage: %v bytes\n\n", averageMemoryUsage)
}
The bug is this line:
total = (averageMemoryUsage + usage)
It should be:
total = total + usage
You are adding the current reading to the previous average instead of to the running total, which is why the total (and therefore the average) goes wrong from the third iteration onwards.
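With that fix, the loop keeps a true running total and derives the average from it. A minimal self-contained sketch of the corrected logic (the sample readings are made up for illustration):

package main

import "fmt"

func main() {
    samples := []uint64{162224, 181200, 187864} // made-up memory readings
    var total, average uint64
    for i, usage := range samples {
        iterations := uint64(i + 1)
        total += usage               // accumulate the running total
        average = total / iterations // integer average so far
        fmt.Printf("Usage: %v Average: %v Total: %v Iteration: %v\n",
            usage, average, total, iterations)
    }
}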

Why is my code running slower after trying Goroutines?

I decided to try to figure out goroutines and channels. I made a function that takes a list and adds 10 to every element. I then made another function that attempts to incorporate channels and goroutines. When I timed the code, it ran much slower. I tried doing some research but was unable to figure anything out.
Here is my code with channels:
package main

import (
    "fmt"
    "time"
)

func addTen(channel chan int) {
    channel <- 10 + <-channel
}

func listPlusTen(list []int) []int {
    channel := make(chan int)
    for i := 0; i < len(list); i++ {
        go addTen(channel)
        channel <- list[i]
        list[i] = <-channel
    }
    return list
}

func main() {
    var size int
    list := make([]int, 0)
    fmt.Print("Enter the list size: ")
    fmt.Scanf("%d", &size)
    for i := 0; i <= size; i++ {
        list = append(list, i)
    }
    start := time.Now()
    list = listPlusTen(list)
    end := time.Now()
    fmt.Println(end.Sub(start))
}
You are adding a lot of synchronization overhead to the baseline algorithm. You have len(list) goroutines, all waiting to read from a common channel. When you write to the channel, the scheduler picks one of those goroutines; that goroutine adds 10 and writes the result back to the channel, which unblocks the main goroutine again. It is hard to speculate without actually measuring, but if you move the goroutine creation outside the for loop, you will have only one goroutine, reducing the scheduler overhead (see the sketch below). In any comparison with the baseline algorithm this will still be slower, however, because each element involves two synchronizations and two context switches, which take more time than the addition itself.
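A sketch of that single-worker variant (an illustration of the suggestion above, not code from the question), using separate input and output channels so the worker can be shut down cleanly:

package main

import "fmt"

// addTenWorker reads values from in, adds 10, and writes the results to out.
// It exits when in is closed.
func addTenWorker(in <-chan int, out chan<- int) {
    for v := range in {
        out <- v + 10
    }
    close(out)
}

func listPlusTen(list []int) []int {
    in := make(chan int)
    out := make(chan int)
    go addTenWorker(in, out) // one goroutine for the whole list
    for i := range list {
        in <- list[i]
        list[i] = <-out
    }
    close(in)
    return list
}

func main() {
    fmt.Println(listPlusTen([]int{1, 2, 3})) // [11 12 13]
}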
I had a similar experience with goroutines where I was making 20 HTTP requests using the same function.
Calling the function in a simple loop was much faster than using wait groups.

Channels and Parallelism confusion

I'm teaching myself Go, and I'm a bit confused about parallelism and how it is implemented in Go.
Given the following example:
package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"
)

const (
    workers    = 1
    rand_count = 5000000
)

func start_rand(ch chan int) {
    defer close(ch)
    var wg sync.WaitGroup
    wg.Add(workers)
    rand_routine := func(counter int) {
        defer wg.Done()
        for i := 0; i < counter; i++ {
            seed := time.Now().UnixNano()
            rand.Seed(seed)
            ch <- rand.Intn(5000)
        }
    }
    for i := 0; i < workers; i++ {
        go rand_routine(rand_count / workers)
    }
    wg.Wait()
}

func main() {
    start_time := time.Now()
    mychan := make(chan int, workers)
    go start_rand(mychan)

    var wg sync.WaitGroup
    wg.Add(workers)
    work_handler := func() {
        defer wg.Done()
        for {
            v, isOpen := <-mychan
            if !isOpen {
                break
            }
            fmt.Println(v)
        }
    }
    for i := 0; i < workers; i++ {
        go work_handler()
    }
    wg.Wait()

    elapsed_time := time.Since(start_time)
    fmt.Println("Done", elapsed_time)
}
This piece of code takes about one minute to run on my MacBook. I assumed that increasing the workers constant would launch additional goroutines and, since my laptop has multiple cores, shorten the execution time.
This is not the case, however. Increasing the workers does not reduce the execution time.
I was thinking that setting workers to 1 would create one goroutine to generate the random numbers, and setting it to 4 would create four goroutines. Given the multicore nature of my laptop, I was expecting four workers to run on different cores and therefore increase performance.
However, I see increased load on all my cores even when workers is set to 1. What am I missing here?
Your code has some issues which make it inherently slow:
You are seeding inside the loop. This only needs to be done once.
You are using the same source of random numbers for all workers. This source is thread-safe, but that takes away any performance gains for concurrent workers. You could create a source per worker with rand.New.
You are printing a lot. Printing is thread-safe too, so that also takes away any speed gains for concurrent workers.
As Zak already pointed out: the concurrent work inside the goroutines is very cheap and the communication is expensive.
You could rewrite your program like this; then you will see some speed gains when you change the number of workers:
package main

import (
    "fmt"
    "math/rand"
    "time"
)

const (
    workers   = 1
    randCount = 5000000
)

var results = [randCount]int{}

func randRoutine(start, counter int, c chan bool) {
    r := rand.New(rand.NewSource(time.Now().UnixNano()))
    for i := 0; i < counter; i++ {
        results[start+i] = r.Intn(5000)
    }
    c <- true
}

func main() {
    startTime := time.Now()

    c := make(chan bool)
    start := 0
    for w := 0; w < workers; w++ {
        go randRoutine(start, randCount/workers, c)
        start += randCount / workers
    }
    for i := 0; i < workers; i++ {
        <-c
    }
    elapsedTime := time.Since(startTime)

    for _, i := range results {
        fmt.Println(i)
    }
    fmt.Println("Time calculating", elapsedTime)

    elapsedTime = time.Since(startTime)
    fmt.Println("Total time", elapsedTime)
}
This program does a lot of work per goroutine and communicates minimally. It also uses a different random source for each goroutine.
Your code does not have just a single goroutine, even though you set workers to 1:
There is 1 goroutine from the call go start_rand(...).
That goroutine creates N (workers) goroutines with go rand_routine(...) and waits for them to finish.
Then you also start N (workers) goroutines with go work_handler().
And there is the goroutine running main() itself.
So 1 + 2N + 1 goroutines are running for any given N, where N == workers.
On top of that, the work you are doing in the goroutines is pretty cheap (fast to execute): you are just generating random numbers.
If you look at the blocking and scheduler latency profiles of the program (the profile images are omitted here), you can see that most of the time is spent in the concurrency constructs. This suggests there is a lot of contention in your program. While goroutines are cheap, there is still some blocking and synchronisation that has to happen when sending a value over a channel, and this can take up a large proportion of the program's time when the work done by the producer is very cheap.
To answer your original question: you see load on many cores because you have more than a single goroutine running.
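One way to verify the goroutine count empirically (a hypothetical check, not part of the original answer) is to log runtime.NumGoroutine():

package main

import (
    "fmt"
    "runtime"
)

func main() {
    done := make(chan struct{})
    for i := 0; i < 4; i++ {
        go func() { <-done }() // four parked worker goroutines
    }
    // Reports 5: the four workers plus the main goroutine.
    fmt.Println("goroutines:", runtime.NumGoroutine())
    close(done)
}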

Reasonable use of goroutines in Go programs

My program has a long-running task. I have a list jdIdList that is very big, up to 1,000,000 items, so the code below doesn't work. It seems I have too many goroutines running, which makes my program fail.
What is a reasonable number of goroutines to have running?
var wg sync.WaitGroup
wg.Add(len(jdIdList))
c := make(chan string)
// just think of jdIdList as [0...1000000]
for _, jdId := range jdIdList {
    go func(jdId string) {
        defer wg.Done()
        for _, itemId := range itemIdList {
            // the following code does some computation which consumes much time
            // (you can just replace it with time.Sleep(time.Second * 1))
            cvVec, ok := cvVecMap[itemId]
            if !ok {
                continue
            }
            jdVec, ok := jdVecMap[jdId]
            if !ok {
                continue
            }
            // long-running computation
            _ = 0.3*computeDist(jdVec.JdPosVec, cvVec.CvPosVec) + 0.7*computeDist(jdVec.JdDescVec, cvVec.CvDescVec)
        }
        c <- fmt.Sprintf("done %s", jdId)
    }(jdId)
}
go func() {
    for resp := range c {
        fmt.Println(resp)
    }
}()
It looks like you're running too many things concurrently, making your computer run out of memory.
Here's a version of your code that uses a limited number of worker goroutines instead of a million goroutines as in your example. Since only a few goroutines run at once, each has much more memory available before the system starts to swap. Make sure that the memory each small computation requires, times the number of concurrent goroutines, is less than the memory in your system: if the code inside the for jdId := range work loop requires less than 1 GB of memory, and you have 4 cores and at least 4 GB of RAM, setting clvl to 4 should work fine.
Channels do most of the synchronization here: a for range loop over a channel reads from that channel until it is closed, which is how the workers learn there is no more work. A small WaitGroup remains only so we know when all workers have finished and the output channel can be closed.
https://play.golang.org/p/Sy3i77TJjA
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

func main() {
    runtime.GOMAXPROCS(runtime.NumCPU()) // not needed on Go 1.5 or later

    c := make(chan string)
    work := make(chan int, 1) // increasing 1 to a higher number will probably increase performance
    clvl := 4                 // runtime.NumCPU() // simulating 4 cores; use NumCPU otherwise
    var wg sync.WaitGroup
    wg.Add(clvl)
    for i := 0; i < clvl; i++ {
        go func(i int) {
            for jdId := range work {
                time.Sleep(time.Millisecond * 100)
                c <- fmt.Sprintf("done %d", jdId)
            }
            wg.Done()
        }(i)
    }

    // give workers something to do
    go func() {
        for i := 0; i < 10; i++ {
            work <- i
        }
        close(work)
    }()

    // close output channel when all workers are done
    go func() {
        wg.Wait()
        close(c)
    }()

    count := 0
    for resp := range c {
        fmt.Println(resp, count)
        count += 1
    }
}
This generated the following output on the Go playground, simulating four CPU cores:
done 1 0
done 0 1
done 3 2
done 2 3
done 5 4
done 4 5
done 7 6
done 6 7
done 9 8
done 8 9
Notice how the ordering is not guaranteed. The jdId variable holds the value you want. You should always test your concurrent programs with the Go race detector.
Also note that if you are using Go 1.4 or earlier and haven't set the GOMAXPROCS environment variable to the number of cores, you should do that, or add runtime.GOMAXPROCS(runtime.NumCPU()) at the beginning of your program.
