Unexpected Addition Behaviour of two Uint64 [closed]

Problem
In the code below I have a couple of goroutines, and the one I'm facing issues with is the calculateMemoryUsage() goroutine, where the average memory usage is calculated once a second. For some reason, the total keeps giving me weird values. Here are the print logs:
Memory Usage: 162224
Average: 162224
Total: 162224
Iteration Count: 1
Memory Usage: 181200
Average: 171712
Total: 343424
Iteration Count: 2
Memory Usage: 187864
Average: 119858
Total: 359576
Iteration Count: 3
As seen, from the third iteration onward the average is wrong because the total doesn't add up. After debugging, I can see that the memory usage is read fine, but the total gives me issues. I suspected the GC, but while this happens the LastGC value is 0, meaning no GC has run. Any suggestions would be highly appreciated! :)
Code
func main() {
    db, err := sql.Open("mysql", "<credentials_removed>##tcp(127.0.0.1:3306)/rts")
    err = db.Ping()
    if err != nil {
        panic(err.Error()) // proper error handling instead of panic in your app
    }

    showStocksChannel := make(chan bool)
    showBestPerformingChannel := make(chan bool)

    go calculateMemoryUsage()
    go showStocks(showStocksChannel, db)
    go changeStockPrices(showStocksChannel, showBestPerformingChannel, db)
    go displayBestPerformingStocks(showBestPerformingChannel, db)

    showStocksChannel <- true
    select {}
}

func calculateMemoryUsage() {
    var averageMemoryUsage uint64 = 0
    var iterations uint64 = 0
    var usage uint64 = 0
    var total uint64 = 0

    for iterations <= 200 {
        var memoryStats runtime.MemStats
        runtime.ReadMemStats(&memoryStats)

        iterations = iterations + 1
        usage = memoryStats.Alloc
        total = (averageMemoryUsage + usage)
        averageMemoryUsage = total / iterations

        fmt.Printf("\nMemory Usage: %v\nAverage: %v\nTotal: %v\nIteration Count: %v\n\n", usage, averageMemoryUsage, total, iterations)
        time.Sleep(time.Millisecond * 1000)
        //fmt.Printf("\nLast GC: %v\nNext GC: %v\n\n", memoryStats.LastGC, memoryStats.NextGC)
    }

    fmt.Printf("\nAverage Memory Usage: %v bytes\n\n", averageMemoryUsage)
}

This line:
total = (averageMemoryUsage + usage)
should be:
total = (total + usage)
averageMemoryUsage holds the running average, not the running sum, so adding the latest usage to the average produces a bogus total, which is why the numbers only go wrong from the third iteration onward.
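For reference, a minimal self-contained sketch of the corrected accumulation (my reconstruction, keeping the question's variable names):

package main

import (
    "fmt"
    "runtime"
    "time"
)

func calculateMemoryUsage() {
    var total, iterations uint64

    for iterations <= 200 {
        var memoryStats runtime.MemStats
        runtime.ReadMemStats(&memoryStats)

        iterations++
        usage := memoryStats.Alloc
        total += usage // accumulate the running sum, not the running average
        averageMemoryUsage := total / iterations

        fmt.Printf("Memory Usage: %v\nAverage: %v\nTotal: %v\nIteration Count: %v\n\n",
            usage, averageMemoryUsage, total, iterations)
        time.Sleep(time.Second)
    }
}

func main() {
    calculateMemoryUsage()
}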

Related

data race: two goroutines incrementing the same val [closed]

Consider the code below. In my opinion, val should end up somewhere between 100 and 200, but it's always 200.
package main

import (
    "fmt"
    "runtime"
    "time"
)

var val = 0

func main() {
    num := runtime.NumCPU()
    fmt.Println("number of CPUs:", num)

    go add("A")
    go add("B")

    time.Sleep(1 * time.Second)
    fmt.Println("final value of val:", val)
}

func add(proc string) {
    for i := 0; i < 100; i++ {
        val++
        fmt.Printf("execute process[%s] and val is %d\n", proc, val)
        time.Sleep(5 * time.Millisecond)
    }
}
Why is val always 200 at the end?
There are two problems with your code:
1. You have a data race: val is written and read concurrently without synchronization. Its presence makes reasoning about the program's outcome meaningless.
2. The one-second sleep in main() is too short; the goroutines may not be done yet after one second. You expect fmt.Printf to take no time at all, but console output takes a significant amount of time (on some OSes longer than others), so the loop won't take 100 * 5 = 500 milliseconds, but much, much longer.
Here's a fixed version that atomically increments val and properly waits for both goroutines to finish instead of assuming that they will be done within 1 second.
package main

import (
    "fmt"
    "runtime"
    "sync"
    "sync/atomic"
    "time"
)

var val = int32(0)

func main() {
    num := runtime.NumCPU()
    fmt.Println("number of CPUs:", num)

    var wg sync.WaitGroup
    wg.Add(2)
    go add("A", &wg)
    go add("B", &wg)
    wg.Wait()

    fmt.Println("final value of val:", atomic.LoadInt32(&val))
}

func add(proc string, wg *sync.WaitGroup) {
    defer wg.Done()
    for i := 0; i < 100; i++ {
        tmp := atomic.AddInt32(&val, 1)
        fmt.Printf("execute process[%s] and val is %d\n", proc, tmp)
        time.Sleep(5 * time.Millisecond)
    }
}
Incrementing the integer takes only on the order of a few nanoseconds, while each goroutine waits 5 milliseconds between increments.
That means each goroutine spends only about a millionth of its time actually doing the operation; the rest of the time it is sleeping. The likelihood of interference is therefore quite low, since both goroutines would need to perform the operation simultaneously.
Even if both goroutines are on equal timers, the practical precision of the time library is nowhere near the nanosecond scale required to produce collisions consistently. Plus, the goroutines are doing some printing, which further diverges the timings.
As pointed out in the comments, your code still has a data race (seemingly by intention), meaning we cannot say anything for certain about the output of your program, despite any observations. It could output any number from 100 to 200, or different numbers entirely.
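Incidentally, the race detector confirms the first point: running the original program with go run -race reports a data race on val. And if you prefer a mutex to atomics, here is a minimal sketch of the same fix (my own variant, not from the answer above):

package main

import (
    "fmt"
    "sync"
)

var (
    mu  sync.Mutex
    val int
)

func main() {
    var wg sync.WaitGroup
    wg.Add(2)
    go add("A", &wg)
    go add("B", &wg)
    wg.Wait()

    fmt.Println("final value of val:", val) // safe: both goroutines have finished
}

func add(proc string, wg *sync.WaitGroup) {
    defer wg.Done()
    for i := 0; i < 100; i++ {
        mu.Lock()
        val++
        tmp := val // capture the value while still holding the lock
        mu.Unlock()
        fmt.Printf("execute process[%s] and val is %d\n", proc, tmp)
    }
}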

Why is the length of this Go slice 4, and why does the output have a space in the slice? [closed]

I am new to Go, and while running this code snippet I get a len of 4; I'm trying to understand why.
package main

import "fmt"

type phone struct {
    model  string
    camera Camera
    ram    int
}

type Camera struct {
    lens      string
    aparature int
}

func main() {
    var m = make(map[string]phone)
    myphn1 := phone{model: "iphone", camera: Camera{"20", 4}, ram: 6}
    myphn2 := phone{model: "pixel", camera: Camera{"50", 2}, ram: 6}
    m["myphn1"] = myphn1
    m["myphn2"] = myphn2

    var k = make([]string, len(m))
    for key, _ := range m {
        k = append(k, key)
    }

    fmt.Println(k)
    fmt.Println(len(k))
}
I understand this sets a size of 2 at creation, but printing gives something like the output below. Is the space in the output from the 2 unallocated entries?
[ myphn2 myphn1]
4
This creates a slice of length 2 (len(m) is 2 here):
var k = make([]string, len(m))
This adds two elements to it, for a total of 4:
for key, _ := range m {
    k = append(k, key)
}
If you want to preallocate a slice, you need to provide a length of zero along with the desired capacity:
var k = make([]string, 0, len(m))
This is covered with examples in the Tour of Go.
You created a slice with length 2, then appended two more elements to it, so the length is 4.
What you probably want is to create a slice with length 0 and capacity 2:
var k = make([]string, 0, len(m))
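To make the difference concrete, here is a minimal standalone sketch (my own example) showing how length and capacity behave with make and append:

package main

import "fmt"

func main() {
    a := make([]string, 2)  // len 2, cap 2: two empty strings already present
    a = append(a, "x", "y") // appends after the two empty strings
    fmt.Println(len(a), a)  // 4 [  x y]

    b := make([]string, 0, 2) // len 0, cap 2: empty, but room for two
    b = append(b, "x", "y")
    fmt.Println(len(b), b) // 2 [x y]
}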

How to use channels efficiently [closed]

I read in Uber's Go style guide that one should use channels with a buffer size of at most 1.
Although it's clear to me that using a buffer size of 100 or 1000 is very bad practice, I was wondering why a buffer size of 10 isn't considered a valid option. I'm missing some piece needed to reach the right conclusion.
Below you can follow my arguments (and counter-arguments), backed by some benchmark tests.
I understand that if both goroutines, the one writing to and the one reading from the channel, are interrupted between sequential writes or reads by some other IO action, no gain is to be expected from a larger channel buffer, and I agree that 1 is the best option.
But let's say there is no significant other goroutine switching needed, apart from the implicit locking and unlocking caused by writing to and reading from the channel. Then I would conclude the following:
Consider the number of context switches when processing 100 values on a channel with a buffer of size 1 versus size 10 (GR = goroutine):
Buffer=1: (GR1 inserts 1 value, GR2 reads 1 value) x 100 ~ 200 goroutine switches
Buffer=10: (GR1 inserts 10 values, GR2 reads 10 values) x 10 ~ 20 goroutine switches
I did some benchmarking to prove that this actually goes faster:
package main

import (
    "testing"
)

type a struct {
    b [100]int64
}

func BenchmarkBuffer1(b *testing.B) {
    count := 0
    c := make(chan a, 1)

    go func() {
        for i := 0; i < b.N; i++ {
            c <- a{}
        }
        close(c)
    }()

    for v := range c {
        for i := range v.b {
            count += i
        }
    }
}

func BenchmarkBuffer10(b *testing.B) {
    count := 0
    c := make(chan a, 10)

    go func() {
        for i := 0; i < b.N; i++ {
            c <- a{}
        }
        close(c)
    }()

    for v := range c {
        for i := range v.b {
            count += i
        }
    }
}
Results when comparing simple reading & writing + non-blocking processing:
BenchmarkBuffer1-12 5072902 266 ns/op
BenchmarkBuffer10-12 6029602 179 ns/op
PASS
BenchmarkBuffer1-12 5228782 256 ns/op
BenchmarkBuffer10-12 5392410 216 ns/op
PASS
BenchmarkBuffer1-12 4806208 287 ns/op
BenchmarkBuffer10-12 4637842 233 ns/op
PASS
However, if I add a sleep every 10 reads, it doesn't yield any better results.
import (
    "testing"
    "time"
)

func BenchmarkBuffer1WithSleep(b *testing.B) {
    count := 0
    c := make(chan int, 1)

    go func() {
        for i := 0; i < b.N; i++ {
            c <- i
        }
        close(c)
    }()

    for a := range c {
        count++
        if count%10 == 0 {
            time.Sleep(time.Duration(a) * time.Nanosecond)
        }
    }
}

func BenchmarkBuffer10WithSleep(b *testing.B) {
    count := 0
    c := make(chan int, 10)

    go func() {
        for i := 0; i < b.N; i++ {
            c <- i
        }
        close(c)
    }()

    for a := range c {
        count++
        if count%10 == 0 {
            time.Sleep(time.Duration(a) * time.Nanosecond)
        }
    }
}
Results when adding a sleep every 10 reads:
BenchmarkBuffer1WithSleep-12 856886 53219 ns/op
BenchmarkBuffer10WithSleep-12 929113 56939 ns/op
FYI: I also did the test again with only one CPU and got the following results:
BenchmarkBuffer1 5831193 207 ns/op
BenchmarkBuffer10 6226983 180 ns/op
BenchmarkBuffer1WithSleep 556635 35510 ns/op
BenchmarkBuffer10WithSleep 984472 61434 ns/op
Absolutely nothing is wrong with a channel of cap 500, e.g. if this channel is used as a semaphore.
The style guide you read recommends not using buffered channels of, say, cap 64 "because this looks like a nice number". But this recommendation is not about performance! (Btw: your microbenchmarks are useless microbenchmarks; they do not measure anything relevant.)
An unbuffered channel is a kind of synchronisation primitive, and as such is very useful.
A buffered channel, well, may buffer between sender and receiver, and this buffering can be problematic for observing, tuning and debugging the code (because creation and consumption are further decoupled). That's why the style guide recommends unbuffered channels (or at most a cap of 1, as this is sometimes needed for correctness!).
It also doesn't prohibit larger buffer caps:
Any other [than 0 or 1] size must be subject to a high level of scrutiny. Consider how the size is determined, what prevents the channel from filling up under load and blocking writers, and what happens when this occurs. [emph. mine]
You may use a cap of 27 if you can explain why 27 (and not 22 or 31) and how this will influence program behaviour (not only performance!) if the buffer fills up.
Most people overrate performance. Correctness, operational stability and maintainability come first. And that is what the style guide is about here.
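To illustrate the semaphore point above, a minimal sketch (my own example, not from the style guide) where the buffer size is a deliberate, explainable limit on concurrency:

package main

import (
    "fmt"
    "sync"
)

func main() {
    // The capacity is a deliberate, explainable limit:
    // at most 3 jobs may run concurrently (3 is arbitrary for this example).
    sem := make(chan struct{}, 3)
    var wg sync.WaitGroup

    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            sem <- struct{}{}        // acquire a slot; blocks while 3 jobs are in flight
            defer func() { <-sem }() // release the slot when done
            fmt.Println("working on job", id)
        }(i)
    }
    wg.Wait()
}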

Precision loss when printing floats as strings [closed]

It just happened to me that if I store int values in a struct and later divide them, the precision appears to be lost.
package main

import "fmt"

func main() {
    var x = 94911151
    var y = 94911150

    // If we use the values to calculate the division directly, it is fine.
    var result1 = float64(94911151) / 94911150
    var result2 = float64(x) / float64(y)
    fmt.Println(result1, result2)

    // If we pass the values directly as parameters into a function and then divide, it is fine.
    getParas(x, y)

    // If we pass the values into a struct, then retrieve them from the struct and divide,
    // the precision appears to be lost.
    getLinearParas(Point{x, y}, Point{0, 0})
}

func getParas(a int, b int) {
    diffX := a - 0
    diffY := b - 0
    c := float64(diffX) / float64(diffY)
    fmt.Println(c)
}

type Point struct {
    X int
    Y int
}

func getLinearParas(point1 Point, point2 Point) {
    diffX := point1.X - point2.X
    diffY := point1.Y - point2.Y
    a := float64(diffX) / float64(diffY)
    fmt.Printf("diffY: %d; diffX:%d ; a:%f \n", diffY, diffX, a)
}
As the code shows, if I put int values into a struct and later divide them, the precision is somehow lost.
The result of running the above code is:
1.00000001053617 1.00000001053617
1.00000001053617
diffY: 94911150; diffX:94911151 ; a:1.000000
Or you can try it yourself in the playground:
https://play.golang.org/p/IDS18rfv9e6
Could anyone explain why this happens, and how to avoid such loss?
Thank you very much.
Change %f in the format string to %v, %g, or %.14f. fmt.Println prints things with the equivalent of %v, and %v for float64 is treated as %g. %f prints values with six digits after the decimal point by default, while %g uses "the smallest number of digits necessary to represent the value uniquely".
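A quick demonstration with the value from the question, showing that nothing is lost, only printed differently:

package main

import "fmt"

func main() {
    a := float64(94911151) / float64(94911150)

    fmt.Printf("%f\n", a)    // 1.000000          (%f defaults to 6 digits after the point)
    fmt.Printf("%v\n", a)    // 1.00000001053617  (%v uses %g for float64)
    fmt.Printf("%g\n", a)    // 1.00000001053617
    fmt.Printf("%.14f\n", a) // 1.00000001053617
}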

Why is the first run of a memory copy slow?

What I found:
I printed the time cost of Go's copy, and it shows that the first memory copy is slow, but the second is much faster, even when I run copy on a different memory address.
Here is my test code:
package main

import (
    "fmt"
    "math/rand"
    "testing"
    "time"
)

// fillRandom fills the slice with random bytes (definition assumed;
// the original post only referenced it via the comment below).
func fillRandom(p []byte) {
    rand.Read(p)
}

func TestCopyLoop1x32M(t *testing.T) {
    copyLoopSameDst(32*1024*1024, 1)
}

func TestCopyLoopOnex32M(t *testing.T) {
    copyLoopSameDst(32*1024*1024, 1)
}

func copyLoopSameDst(size, loops int) {
    in := make([]byte, size)
    out := make([]byte, size)
    rand.Seed(0)
    fillRandom(in) // fill the slice with random bytes

    now := time.Now()
    for i := 0; i < loops; i++ {
        copy(out, in)
    }
    cost := time.Since(now)
    fmt.Println(cost.Seconds() / float64(loops))
}
func TestCopyDiffLoop1x32M(t *testing.T) {
    copyLoopDiffDst(32*1024*1024, 1)
}

func copyLoopDiffDst(size, loops int) {
    ins := make([][]byte, loops)
    outs := make([][]byte, loops)
    for i := 0; i < loops; i++ {
        out := make([]byte, size)
        outs[i] = out
        in := make([]byte, size)
        rand.Seed(0)
        fillRandom(in)
        ins[i] = in
    }

    now := time.Now()
    for i := 0; i < loops; i++ {
        copy(outs[i], ins[i])
    }
    cost := time.Since(now)
    fmt.Println(cost.Seconds() / float64(loops))
}
The results (on an i5-4278U):
Running all three cases:
TestCopyLoop1x32M : 0.023s
TestCopyLoopOnex32M : 0.0038s
TestCopyDiffLoop1x32M : 0.0038s
Running the first and second cases:
TestCopyLoop1x32M : 0.023s
TestCopyLoopOnex32M : 0.0038s
Running the first and third cases:
TestCopyLoop1x32M : 0.023s
TestCopyDiffLoop1x32M : 0.023s
My questions:
They have different memory addresses and different data; how can the next case benefit from the first one?
Why is result 3 not the same as result 2? Don't they do the same thing?
If I raise the loop count in copyLoopSameDst, I know the next iterations will be faster because of the cache, but my CPU's L3 cache is only 3 MB, so I can't explain the huge improvement.
Why does copyLoopDiffDst speed up after the other two cases?
My guesses:
The instruction cache helps performance, but it can't explain question 2.
The CPU cache works beyond my imagination, but it can't explain question 2 either.
After more research and testing, I think I can answer part of my questions.
The reason the cache helps in the next test case is Go's memory allocation (other languages may do the same, since allocating memory involves a system call).
When the data is big, the kernel will reuse a block that has just been freed.
I printed the in and out []byte addresses (in Go, the first 8 bytes of a slice header are its data pointer, so I wrote a little assembly to read the address):
addr: [0 192 8 32 196 0 0 0] [0 192 8 34 196 0 0 0]
cost: 0.019228028
addr: [0 192 8 36 196 0 0 0] [0 192 8 32 196 0 0 0]
cost: 0.003770281
addr: [0 192 8 34 196 0 0 0] [0 192 8 32 196 0 0 0]
cost: 0.003806502
You can see the program reusing some memory addresses, so write hits happen in the next copy.
If I create in/out outside the function, the reuse does not happen, and it slows down.
But if you make the block very small (for example, under 32KB), you will find the speed-up again, even though the kernel gives you a new memory address. In my opinion, the main reason is that the data is not aligned to 64 bytes, so the next loop's data (located near the first one) will be pulled into the cache, while the first loop wastes much time filling the cache. The next loop can then use the instruction cache and other cached data to run the function. When the data is small, these little caches make a big difference.
I still feel amazed: the data size is 10x my CPU cache size, but the cache still helps a lot. Anyway, that's another question.
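As an aside on the measurement itself: Go's benchmark harness is a more reliable way to time this kind of thing than hand-rolled timers. A minimal sketch (my own, to be placed in a _test.go file; the 32 MiB size comes from the question):

package main

import "testing"

// BenchmarkCopy32M measures copy on two fresh 32 MiB slices.
// b.SetBytes makes `go test -bench .` report MB/s, and b.ResetTimer
// keeps the allocations out of the measurement.
func BenchmarkCopy32M(b *testing.B) {
    const size = 32 * 1024 * 1024
    in := make([]byte, size)
    out := make([]byte, size)

    b.SetBytes(size)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        copy(out, in)
    }
}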
