How to detect what is preventing multiple cores being used in golang? - go

So, I have a piece of code that is concurrent and is meant to run on each CPU/core.
There are two large vectors with input/output values:
var (
	input  = make([]float64, rowCount)
	output = make([]float64, rowCount)
)
These are filled, and I want to compute the distance (error) between each input-output pair. Since the pairs are independent, a possible concurrent version is the following:
var d float64 // Error to be computed
// Set up a worker "for each CPU"
ch := make(chan float64)
nw := runtime.NumCPU()
for w := 0; w < nw; w++ {
	go func(id int) {
		var wd float64
		// e.g. nw = 4
		// worker0, i = 0, 4, 8, 12...
		// worker1, i = 1, 5, 9, 13...
		// worker2, i = 2, 6, 10, 14...
		// worker3, i = 3, 7, 11, 15...
		for i := id; i < rowCount; i += nw {
			res := compute(input[i])
			wd += distance(res, output[i])
		}
		ch <- wd
	}(w)
}
// Compute total distance
for w := 0; w < nw; w++ {
	d += <-ch
}
The idea is to have a single worker for each CPU/core, and each worker processes a subset of the rows.
The problem I'm having is that this code is no faster than the serial code.
Now, I'm using Go 1.7, so GOMAXPROCS should already default to runtime.NumCPU(), but even setting it explicitly with runtime.GOMAXPROCS does not improve performance.
distance is just (a-b)*(a-b);
compute is a bit more complex, but should be reentrant and use global data only for reading (and uses math.Pow and math.Sqrt functions);
no other goroutine is running.
So, besides accessing the global data (input/output) for reading, there are no locks/mutexes that I am aware of (not using math/rand, for example).
I also compiled with -race and nothing emerged.
My host has 4 virtual cores, but when I run this code htop shows CPU usage at about 102%, whereas I expected something around 380%, as happened in the past with other Go code that used all the cores.
I would like to investigate, but I don't know how the runtime allocates threads and schedules goroutines.
How can I debug this kind of issue? Can pprof help me in this case? What about the runtime package?
Thanks in advance

Sorry, but in the end I got the measurement wrong. @JimB was right, and I had a minor leak, but not enough to justify a slowdown of this magnitude.
My expectations were too high: the function I was making concurrent was called only at the beginning of the program, so the overall performance improvement was minor.
After applying the pattern to other sections of the program, I got the expected results. My mistake was in evaluating which section was the most important.
Anyway, I learned a lot of interesting things meanwhile, so thanks a lot to all the people trying to help!
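Since the problem turned out to be a measurement error, one way to avoid it is to time the concurrent section in isolation with one worker and with one worker per core. The following is a minimal sketch under stated assumptions: compute here is a hypothetical stand-in (the real one is not shown in the question), and totalError is just the question's loop wrapped in a function. For finer detail, go test -bench together with -cpuprofile can feed pprof.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// compute is a hypothetical stand-in for the question's compute function.
func compute(x float64) float64 { return 2 * x }

// distance is the squared error, as described in the question.
func distance(a, b float64) float64 { return (a - b) * (a - b) }

// totalError sums distance(compute(input[i]), output[i]) using nw workers,
// striding over the rows the same way the question's code does.
func totalError(input, output []float64, nw int) float64 {
	ch := make(chan float64)
	for w := 0; w < nw; w++ {
		go func(id int) {
			var wd float64
			for i := id; i < len(input); i += nw {
				wd += distance(compute(input[i]), output[i])
			}
			ch <- wd
		}(w)
	}
	var d float64
	for w := 0; w < nw; w++ {
		d += <-ch
	}
	return d
}

func main() {
	const rowCount = 1 << 20 // arbitrary size chosen for the sketch
	input := make([]float64, rowCount)
	output := make([]float64, rowCount)
	for i := range input {
		input[i] = float64(i % 100)
		output[i] = float64(i % 7)
	}
	// GOMAXPROCS(0) only reports the current setting; it does not change it.
	fmt.Println("NumCPU:", runtime.NumCPU(), "GOMAXPROCS:", runtime.GOMAXPROCS(0))
	for _, nw := range []int{1, runtime.NumCPU()} {
		start := time.Now()
		d := totalError(input, output, nw)
		fmt.Printf("workers=%d error=%g elapsed=%v\n", nw, d, time.Since(start))
	}
}
```

If the two elapsed times are close, the section itself is not scaling; if they differ as expected but the whole program does not speed up, the section is probably a small share of the total run time, which is what happened here.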

Related

Disable array/slice bounds checking in Golang to improve performance

I'm writing a NES/Famicom emulator. I register a callback function that will be called every time a pixel is rendered. This means that my callback function will be called about 3.7 million times per second (256 width * 240 height * 60 fps).
In my callback function, there are many array/slice operations, and I found that Go does bounds checking every time I index an element. But the indexes are the results of bitwise AND operations, so I can tell that they will NOT fall outside the bounds.
So, I'm here to ask if there is a way to disable bounds checking.
Thank you.
Using gcflags you can disable bounds checking.
go build -gcflags=-B .
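Because the question says the indexes come from bitwise AND operations, it may be worth checking whether the compiler already removes the check by itself. The sketch below assumes a hypothetical 64-entry table named palette; when a fixed-size array is indexed with a value masked to its length minus one, the compiler can usually prove the index is in range. Whether it does so depends on the Go version, so treat this as something to verify (for instance with go build -gcflags="-d=ssa/check_bce/debug=1", which reports the checks that remain) rather than a guarantee.

```go
package main

import "fmt"

// palette is a hypothetical lookup table; its length is a power of two.
var palette [64]byte

// lookup masks the index with len(palette)-1, so the result is always
// in the range 0..63 and cannot be out of bounds.
func lookup(i int) byte {
	return palette[i&63]
}

func main() {
	for i := range palette {
		palette[i] = byte(i * 2)
	}
	fmt.Println(lookup(5), lookup(64+5)) // both reads hit element 5
}
```

This only works for arrays (or slices whose length the compiler can see); for a slice of unknown length the check generally stays.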
If you really need to avoid the bounds check, you can use the unsafe package and use C-style pointer arithmetic to perform your lookups:
index := 2
size := unsafe.Sizeof(YourStruct{})
p := unsafe.Pointer(&yourStructSlice[0])
indexp := (unsafe.Pointer)(uintptr(p) + size*uintptr(index))
yourStructPtr := (*YourStruct)(indexp)
https://play.golang.org/p/GDNphKsJPOv
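For readers who cannot open the playground link, here is a self-contained sketch of the same pointer arithmetic. The pixel type and the elementAt helper are hypothetical names introduced for illustration; the caller is responsible for keeping the index in range, since nothing checks it.

```go
package main

import (
	"fmt"
	"unsafe"
)

// pixel is a hypothetical element type standing in for YourStruct.
type pixel struct {
	X, Y int32
}

// elementAt returns a pointer to s[index] without a bounds check on index.
// The caller must guarantee 0 <= index < len(s); s must not be empty.
func elementAt(s []pixel, index int) *pixel {
	size := unsafe.Sizeof(pixel{})
	// The uintptr round trip happens in a single expression, as the
	// unsafe package rules require.
	return (*pixel)(unsafe.Pointer(uintptr(unsafe.Pointer(&s[0])) + size*uintptr(index)))
}

func main() {
	s := []pixel{{1, 2}, {3, 4}, {5, 6}}
	p := elementAt(s, 2)
	p.X = 9 // writes through to the slice
	fmt.Println(*p, s[2])
}
```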
You should time it to determine how much CPU time you actually save by doing this, but it is probably possible to make the code faster with this approach.
Also, you may want to look at the generated instructions to make sure that what you are producing is actually more efficient. Doing lookups without bounds checks may well be more trouble than it's worth. Some info on how to do that is here: https://github.com/teh-cmc/go-internals/blob/master/chapter1_assembly_primer/README.md
Another common approach is to write performance critical code in assembly (see https://golang.org/doc/asm). Ain't no automatic bounds checking in asm :)
The XY Problem
The XY problem is asking about your attempted solution rather than your actual problem.
Your real problem is overall performance. Let's see some benchmarks that show bounds checking is a significant problem; it may not be. For example, in the benchmark below, where one op renders 60 frames (one second of video), the difference is less than one millisecond per second:
Bounds check:
BenchmarkPixels-4 300 4034580 ns/op
No bounds check:
BenchmarkPixels-4 500 3150985 ns/op
bounds_test.go:
package main

import (
	"testing"
)

const (
	width  = 256
	height = 240
	frames = 60
)

var pixels [width * height]byte

func writePixel(w, h int) {
	pixels[w*height+h] = 42
}

func BenchmarkPixels(b *testing.B) {
	for N := 0; N < b.N; N++ {
		for f := 0; f < frames; f++ {
			for w := 0; w < width; w++ {
				for h := 0; h < height; h++ {
					writePixel(w, h)
				}
			}
		}
	}
}

How to implement "i++ and i>=max ? 0: i" that only use atomic in Go

Using only atomic operations, I want to implement the following code:
const Max = 8
var index int
func add() int {
	index++
	if index >= Max {
		index = 0
	}
	return index
}
For example:
func add() int {
	atomic.AddUint32(&index, 1)
	// error: race condition
	atomic.CompareAndSwapUint32(&index, Max, 0)
	return index
}
But this is wrong: there is a race condition.
Can it be implemented without using a lock?
Solving it without loops and locks
A simple implementation may look like this:
const Max = 8
var index int64
func Inc() int64 {
	value := atomic.AddInt64(&index, 1)
	if value < Max {
		return value // We're done
	}
	// Must normalize, optionally reset:
	value %= Max
	if value == 0 {
		atomic.AddInt64(&index, -Max) // Reset
	}
	return value
}
How does it work?
It simply adds 1 to the counter; atomic.AddInt64() returns the new value. If it's less than Max, "we're done", we can return it.
If it's greater than or equal to Max, then we have to normalize the value (make sure it's in the range [0..Max)) and reset the counter.
Reset may only be done by a single caller (a single goroutine), which will be selected by the counter's value. The winner will be the one that caused the counter to reach Max.
And the trick to avoid the need of locks is to reset it by adding -Max, not by setting it to 0. Since the counter's value is normalized, it won't cause any problems if other goroutines are calling it and incrementing it concurrently.
Of course, with many goroutines calling Inc() concurrently, the counter may be incremented more than Max times before the goroutine that ought to reset it can actually carry out the reset, which would cause the counter to reach or exceed 2 * Max or even 3 * Max (in general: n * Max). We handle this by using a value % Max == 0 condition to decide whether a reset should happen, which again will be true in only a single goroutine for each possible value of n.
Simplification
Note that the normalization does not change values already in the range [0..Max), so you may opt to always perform the normalization. If you want to, you may simplify it to this:
func Inc() int64 {
	value := atomic.AddInt64(&index, 1) % Max
	if value == 0 {
		atomic.AddInt64(&index, -Max) // Reset
	}
	return value
}
Reading the counter without incrementing it
The index variable should not be accessed directly. If there's a need to read the counter's current value without incrementing it, the following function may be used:
func Get() int64 {
	return atomic.LoadInt64(&index) % Max
}
Extreme scenario
Let's analyze an "extreme" scenario. In this, Inc() is called 7 times, returning the numbers 1..7. Now the next call to Inc() after the increment will see that the counter is at 8 = Max. It will then normalize the value to 0 and wants to reset the counter. Now let's say before the reset (which is to add -8) is actually executed, 8 other calls happen. They will increment the counter 8 times, and the last one will again see that the counter's value is 16 = 2 * Max. All the calls will normalize the values into the range 0..7, and the last call will again go on to perform a reset. Let's say this reset is again delayed (e.g. for scheduling reasons), and yet another 8 calls come in. For the last, the counter's value will be 24 = 3 * Max, the last call again will go on to perform a reset.
Note that all calls will only return values in the range [0..Max). Once all reset operations are executed, the counter's value will be 0, properly, because it had a value of 24 and there were 3 "pending" reset operations. In practice there's only a slight chance for this to happen, but this solution handles it nicely and efficiently.
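The behavior described above can be exercised with a small stress test. This is a sketch under assumptions: the counter struct and the hammer helper are hypothetical names added here so that the counter is not a package variable; the Inc and Get bodies follow the simplified version from the answer. An out-of-range return value would make the array index panic.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const Max = 8

// counter holds the shared index (hypothetical wrapper; the answer uses
// a package-level variable instead).
type counter struct {
	index int64
}

// Inc increments the counter and returns a value in [0, Max).
func (c *counter) Inc() int64 {
	value := atomic.AddInt64(&c.index, 1) % Max
	if value == 0 {
		atomic.AddInt64(&c.index, -Max) // Reset by subtracting, not by storing 0
	}
	return value
}

// Get reads the normalized value without incrementing.
func (c *counter) Get() int64 {
	return atomic.LoadInt64(&c.index) % Max
}

// hammer calls Inc from several goroutines and tallies how often each
// value was returned.
func hammer(c *counter, goroutines, callsEach int) [Max]int64 {
	var seen [Max]int64
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < callsEach; i++ {
				atomic.AddInt64(&seen[c.Inc()], 1)
			}
		}()
	}
	wg.Wait()
	return seen
}

func main() {
	var c counter
	seen := hammer(&c, 16, 1000)
	fmt.Println("times each value was returned:", seen)
	fmt.Println("value after all calls:", c.Get())
}
```

Because the total number of calls (16000) is a multiple of Max, the counter should read 0 once every pending reset has run, matching the "3 pending resets" argument above.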
I assume your goal is to never let index have a value equal to or greater than Max. This can be solved using a CAS (Compare-And-Swap) loop:
const Max = 8
var index int32
func add() int32 {
	var next int32
	for {
		prev := atomic.LoadInt32(&index)
		next = prev + 1
		if next >= Max {
			next = 0
		}
		if atomic.CompareAndSwapInt32(&index, prev, next) {
			break
		}
	}
	return next
}
CAS can be used to implement almost any operation atomically like this. The algorithm is:
1. Load the value
2. Perform the desired operation
3. Use CAS, goto 1 on failure.
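The three steps can be factored into a small helper that accepts any pure function. This is only a sketch: update and wrapInc are hypothetical names, not part of sync/atomic.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// update applies f to *addr atomically with a CAS loop and returns the
// stored value. f may run more than once when there is contention, so
// it should have no side effects.
func update(addr *int32, f func(int32) int32) int32 {
	for {
		prev := atomic.LoadInt32(addr)                  // 1. load the value
		next := f(prev)                                 // 2. perform the operation
		if atomic.CompareAndSwapInt32(addr, prev, next) { // 3. CAS, retry on failure
			return next
		}
	}
}

// wrapInc returns a step function that increments and wraps to 0 at limit.
func wrapInc(limit int32) func(int32) int32 {
	return func(v int32) int32 {
		if v+1 >= limit {
			return 0
		}
		return v + 1
	}
}

func main() {
	var index int32
	step := wrapInc(8)
	var wg sync.WaitGroup
	for g := 0; g < 4; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				update(&index, step)
			}
		}()
	}
	wg.Wait()
	fmt.Println("final index:", atomic.LoadInt32(&index))
}
```

The same helper covers other read-modify-write operations (an atomic maximum, a saturating add) by swapping the function passed in.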

How to return the port number in 2 bytes to client in socks5 proxy?

I am trying to implement socks5 proxy server.
Most things are clear according to the rfc but I'm stuck interpreting client port and writing my port number in bytes.
I made a function that takes an int and returns 2 bytes. It first converts the number into a binary string, then literally splits the bits as a string, then converts the pieces back to bytes. However, this seems wrong, because if the rightmost bits are 0 they are lost.
Here is the function
func getBytesOfInt(i int) []byte {
	binary := fmt.Sprintf("%b", i)
	if i < 255 {
		return []byte{byte(i)}
	}
	first := binary[:8]
	last := binary[9:]
	fmt.Println(binary, first, last)
	i1, _ := strconv.ParseInt(first, 2, 64)
	i2, _ := strconv.ParseInt(last, 2, 64)
	return []byte{byte(i1), byte(i2)}
}
Can you please explain how I am supposed to split the number into 2 bytes and, most importantly, how I am going to convert them back to an integer?
Currently, if you give 1024 to this function it returns []byte{0x80, 0x0}, which is 128 in decimal; as you can see, the rightmost bits are lost and there is only a single 0 byte, which is useless.
Your code has multiple problems. First, the slices [:8] and [9:] skip an element (index 8), see: https://play.golang.org/p/yuhh4ZeJFNL
Also, you should interpret the second byte as the low byte of the int and the first as the high byte, not literally cut the binary string. For example, 4 should be encoded as [0x0,0x4] instead of [0x4,0x0], which would be 1024.
If you want to keep using strconv you should use:
n := len(binary)
first := binary[:n-8]
last := binary[n-8:]
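However, this is very inefficient.
I would suggest b[0], b[1] = byte(i>>8), byte(i&255) to encode, and i = int(b[0])<<8 | int(b[1]) to decode. The conversions matter: shifting a byte left by 8 without first converting it to int always yields 0.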
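The standard library's encoding/binary package performs the same big-endian conversion, which matches the network byte order that RFC 1928 uses for ports. The helper names portToBytes and bytesToPort below are hypothetical; this is a sketch, not the only way to do it.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// portToBytes encodes a port as 2 bytes, high byte first (network order).
func portToBytes(port uint16) [2]byte {
	var b [2]byte
	binary.BigEndian.PutUint16(b[:], port)
	return b
}

// bytesToPort decodes the first 2 bytes of b as a big-endian port.
// b must hold at least 2 bytes, otherwise Uint16 panics.
func bytesToPort(b []byte) uint16 {
	return binary.BigEndian.Uint16(b)
}

func main() {
	b := portToBytes(1024)
	fmt.Printf("% x\n", b[:])        // the two bytes on the wire
	fmt.Println(bytesToPort(b[:]))   // back to the original port
}
```

Using uint16 for the port also rules out values that cannot fit in the 2-byte field.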

Implement serial in to parallel out shift register with AVR microcontroller

On the internet there are quite a number of tutorials of how to control a shift register with a microcontroller, but is it actually possible to implement the shift register function with only the microcontroller?
If you have enough pins, I don't see why the naive way wouldn't work...
For an n-bit shift in register, you need n+2 pins:
One clock-in
One data-in
n data-out
The pseudocode of the implementation is:
var byte r := 0 // Assuming n=8, so 8 bits fit into a single byte
var byte i := 0
forever:
    wait for clock-in = low
    wait for clock-in = high
    r := (r << 1) | data-in // shift left by one bit, then add the new bit
    i := i + 1
    if i = n:
        data-out<1..n> := r
        i := 0
If you want to make sure that data-out is updated synchronously, make sure you use pins of a single port: then the data-out<1..n> := r statement can literally be a single port register assignment.
If you want to run this concurrently with other code, you should be able to use a pin for clock-in that can trigger an interrupt.
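To check the logic of the pseudocode before porting it to an AVR, it can be simulated on a desktop machine. The following Go sketch is only a simulation under assumptions (n = 8, one call per rising clock edge, hypothetical names shiftRegister and clock); it does not touch any real port registers.

```go
package main

import "fmt"

// shiftRegister simulates an 8-bit serial-in, parallel-out register.
type shiftRegister struct {
	r   byte // bits shifted in so far
	i   int  // bits received since the last latch
	out byte // latched parallel output (stands in for data-out<1..n>)
}

// clock models one rising edge on clock-in with the given data-in level.
func (s *shiftRegister) clock(dataIn bool) {
	s.r <<= 1
	if dataIn {
		s.r |= 1
	}
	s.i++
	if s.i == 8 {
		s.out = s.r // on real hardware: a single port register write
		s.i = 0
	}
}

func main() {
	var s shiftRegister
	// Bits arrive most significant first.
	for _, bit := range []bool{true, false, true, true, false, false, true, false} {
		s.clock(bit)
	}
	fmt.Printf("%08b\n", s.out) // prints 10110010
}
```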

Project Euler 16 - Help in solving it

I'm solving Project Euler problem 16. I've ended up with code that should logically solve it, but it fails to produce the right result, I believe because it's overflowing. I tried int64 in place of int, but it just prints 0,0. If I change the power to anything below 30 it works, but above 30 it does not. Can anyone point out my mistake? I believe it's not able to calculate 2^1000.
// PE_16 project main.go
package main

import (
	"fmt"
)

func power(x, y int) int {
	var pow int
	var final int
	final = 1
	for pow = 1; pow <= y; pow++ {
		final = final * x
	}
	return final
}

func main() {
	var stp int
	var sumfdigits int
	var u, t, h, th, tth, l int
	stp = power(2, 1000)
	fmt.Println(stp)
	u = stp / 1 % 10
	t = stp / 10 % 10
	h = stp / 100 % 10
	th = stp / 1000 % 10
	tth = stp / 10000 % 10
	l = stp / 100000 % 10
	sumfdigits = u + t + h + th + tth + l
	fmt.Println(sumfdigits)
}
Your approach to this problem requires exact integer math on numbers up to about 1000 bits in size, but you're using int, which is 32 or 64 bits. math/big.Int can handle such a task. I intentionally do not provide a ready-made solution using big.Int, as I assume your goal is to learn by doing it yourself, which I believe is the intent of Project Euler.
As noted by @jnml, ints aren't large enough; if you wish to calculate 2^1000 in Go, big.Ints are a good choice here. Note that math/big provides the Exp() method, which will be easier to use than converting your power function to big.Ints.
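In keeping with the answers above, the following sketch only shows how Exp() is called, using a smaller exponent and leaving the digit sum as the exercise. The pow helper is a hypothetical name.

```go
package main

import (
	"fmt"
	"math/big"
)

// pow returns base raised to exp as an arbitrary-precision integer.
func pow(base, exp int64) *big.Int {
	// A nil modulus makes Exp compute the plain power (no "mod m" step).
	return new(big.Int).Exp(big.NewInt(base), big.NewInt(exp), nil)
}

func main() {
	p := pow(2, 64) // one more than the largest uint64
	fmt.Println(p)
	fmt.Println(len(p.String()), "decimal digits")
}
```

String() gives the decimal representation, which is often the easiest form to inspect digit by digit.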
I worked through some Project Euler problems about a year ago, doing them in Go to get to know the language. I didn't like the ones that required big.Ints, which aren't so easy to work with in Go. For this one, I "cheated" and did it in one line of Ruby:
Removed because I remembered it was considered bad form to show a working solution, even in a different language.
Anyway, my Ruby example shows another thing I learned with Go's big.Ints: sometimes it's easier to convert them to a string and work with that string than to work with the big.Int itself. This problem strikes me as one of those cases.
Converting my Ruby algorithm to Go, I only work with big.Ints on one line, then it's easy to work with the string and get the answer in just a few lines of code.
You don't need to use math/big. Below is a school boy maths way of doubling a decimal number as a hint!
xs holds the decimal digits in least significant first order. Pass in a pointer to the digits (pxs) as the slice might need to get bigger.
func double(pxs *[]int) {
	xs := *pxs
	carry := 0
	for i, x := range xs {
		n := x*2 + carry
		if n >= 10 {
			carry = 1
			n -= 10
		} else {
			carry = 0
		}
		xs[i] = n
	}
	if carry != 0 {
		*pxs = append(xs, carry)
	}
}
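As a usage sketch (still leaving the actual problem to the reader), the function can be driven like this: start from the single digit 1 and double it ten times to get 2^10. The double function is repeated from above so the example is self-contained; main is an addition made here.

```go
package main

import "fmt"

// double multiplies the decimal number held in *pxs by two.
// Digits are stored least significant first.
func double(pxs *[]int) {
	xs := *pxs
	carry := 0
	for i, x := range xs {
		n := x*2 + carry
		if n >= 10 {
			carry = 1
			n -= 10
		} else {
			carry = 0
		}
		xs[i] = n
	}
	if carry != 0 {
		*pxs = append(xs, carry)
	}
}

func main() {
	xs := []int{1} // the number 1
	for i := 0; i < 10; i++ {
		double(&xs)
	}
	// Print most significant digit first.
	for i := len(xs) - 1; i >= 0; i-- {
		fmt.Print(xs[i])
	}
	fmt.Println() // prints 1024
}
```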
