Can I optimise this further so that it runs faster? - performance

As you can see in the following pprof output, I have these nested for loops which take most of my program's time. The source is Go; the relevant code is shown below:
  8.55mins   1.18hrs   20:  for k := range mapSource {
  4.41mins   1.20hrs   21:      if positions, found := mapTarget[k]; found {
         .         .   22:          // save all matches
  1.05mins  1.05mins   23:          for _, targetPos := range positions {
  2.25mins  2.33mins   24:              for _, sourcePos := range mapSource[k] {
     1.28s    15.78s   25:                  matches = append(matches, match{int32(targetPos), int32(sourcePos)})
         .         .   26:              }
         .         .   27:          }
         .         .   28:      }
         .         .   29:  }
At the moment the structures I'm using are two map[int32][]int32 values, targetMap and sourceMap.
These maps contain, for a given key, a slice of ints. I want to find the keys present in both maps, and save all combinations of the elements in the two slices.
So for example:
sourceMap[1] = [3,4]
sourceMap[5] = [9,10]
targetMap[1] = [1,2,3]
targetMap[2] = [2,3]
targetMap[3] = [1,2]
The only key in common is 1 and the result will be [(3,1), (3,2), (3,3), (4,1), (4,2), (4,3)]
Is there any possible way (a more appropriate data structure or whatever) that could improve the speed of my program?
In my case, maps can contain somewhere between 1000 and 150000 keys, while the arrays inside are usually pretty small.
EDIT: Concurrency is not an option as this is already being run several times in several threads at the same time.

Can I optimise this further so that it runs faster?
Is there any possible way (a more appropriate data structure or whatever) that could improve the speed of my program?
Probably.
This is an instance of the XY problem: asking about your attempted solution rather than your actual problem. That leads to enormous amounts of wasted time and energy, both on the part of people asking for help and on the part of those providing it.
We don't have even the most basic information about your problem: a description of the form, content, and frequency of your original input data, and your desired output. What original data should drive a benchmark?
I created some fictional original data, which produced some fictional output and results:
BenchmarkPeterSO-4   30   44089894 ns/op    5776666 B/op    31 allocs/op
BenchmarkIvan-4      10  152300554 ns/op   26023924 B/op  6022 allocs/op
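Absent the real input, a harness along these lines can at least make numbers reproducible. The data shape used here (key count, two positions per key, random key spread) is purely an assumption standing in for the asker's actual distribution:

```go
package main

import (
	"fmt"
	"math/rand"
	"testing"
)

type match struct{ target, source int32 }

// buildMaps generates synthetic input; the key count and positions-per-key
// are guesses standing in for the asker's real data.
func buildMaps(keys int) (src, tgt map[int32][]int32) {
	rng := rand.New(rand.NewSource(1)) // fixed seed for reproducibility
	src = make(map[int32][]int32, keys)
	tgt = make(map[int32][]int32, keys)
	for i := 0; i < keys; i++ {
		src[int32(rng.Intn(keys*2))] = []int32{int32(i), int32(i + 1)}
		tgt[int32(rng.Intn(keys*2))] = []int32{int32(i)}
	}
	return src, tgt
}

func main() {
	mapSource, mapTarget := buildMaps(10000)
	// Benchmark exactly the asker's nested loop.
	r := testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			var matches []match
			for k := range mapSource {
				if positions, found := mapTarget[k]; found {
					for _, t := range positions {
						for _, s := range mapSource[k] {
							matches = append(matches, match{t, s})
						}
					}
				}
			}
			_ = matches
		}
	})
	fmt.Println(r)
}
```

With a fixed seed, any candidate rewrite can be compared against the same synthetic data.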
It is possible that your algorithms are slow.

I would probably do it like this so that I can do some of the work concurrently:
https://play.golang.org/p/JHAmPRh7jr
package main

import (
	"fmt"
	"sync"
)

var final [][]int32
var wg sync.WaitGroup
var receiver chan []int32

func main() {
	final = [][]int32{}
	mapTarget := make(map[int32][]int32)
	mapSource := make(map[int32][]int32)

	mapSource[1] = []int32{3, 4}
	mapSource[5] = []int32{9, 10}
	mapTarget[1] = []int32{1, 2, 3}
	mapTarget[2] = []int32{2, 3}
	mapTarget[3] = []int32{1, 2}

	wg = sync.WaitGroup{}
	receiver = make(chan []int32)

	go func() {
		for elem := range receiver {
			final = append(final, elem)
			wg.Done()
		}
	}()

	for k := range mapSource {
		if _, ok := mapTarget[k]; ok {
			wg.Add(1)
			go permutate(mapSource[k], mapTarget[k])
		}
	}

	wg.Wait()
	fmt.Println(final)
}

func permutate(a, b []int32) {
	for i := 0; i < len(a); i++ {
		for j := 0; j < len(b); j++ {
			wg.Add(1)
			receiver <- []int32{a[i], b[j]}
		}
	}
	wg.Done()
}
You may even want to see if you get any benefit from this:
for k := range mapSource {
	wg.Add(1)
	go func(k int32) {
		if _, ok := mapTarget[k]; ok {
			wg.Add(1)
			go permutate(mapSource[k], mapTarget[k])
		}
		wg.Done()
	}(k)
}

The best optimization probably involves changing the source and target data structures up front so you don't have to iterate as much, but it's hard to be sure without knowing more about the underlying problem you're solving and how the maps are generated.
However, there is an optimization that should get you a roughly 2x boost (just an educated guess), depending on the exact numbers.
// First pass: compute the exact output size so matches is allocated once.
total := 0
for k, srcPositions := range mapSource {
	if tgtPositions, found := mapTarget[k]; found {
		total += len(srcPositions) * len(tgtPositions)
	}
}
matches = make([]match, total)
i := 0
// Second pass: emit the combinations per common key, with no reallocation.
for k, srcPositions := range mapSource {
	if tgtPositions, found := mapTarget[k]; found {
		for _, t := range tgtPositions {
			for _, s := range srcPositions {
				matches[i] = match{t, s}
				i++
			}
		}
	}
}
The general idea is to minimize the amount of copying that has to be done and to improve the locality of memory references. I think this is about the best you can do with this data structure. My hunch is that this isn't the best data structure for the underlying problem to begin with, and that there are much bigger gains to be had.
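The cost being removed here is append's repeated growth and copying. A small self-contained sketch (sizes arbitrary) that makes the difference visible with testing.AllocsPerRun:

```go
package main

import (
	"fmt"
	"testing"
)

type match struct{ target, source int32 }

// grow appends starting from a nil slice, so append must
// repeatedly reallocate and copy as the slice grows.
func grow(n int) []match {
	var m []match
	for i := 0; i < n; i++ {
		m = append(m, match{int32(i), int32(i)})
	}
	return m
}

// presized allocates the full capacity up front, so every
// append is a plain store: exactly one allocation total.
func presized(n int) []match {
	m := make([]match, 0, n)
	for i := 0; i < n; i++ {
		m = append(m, match{int32(i), int32(i)})
	}
	return m
}

func main() {
	g := testing.AllocsPerRun(100, func() { grow(10000) })
	p := testing.AllocsPerRun(100, func() { presized(10000) })
	fmt.Printf("grow: %.0f allocs, presized: %.0f allocs\n", g, p)
}
```

The presized version also writes matches into one contiguous block, which is where the locality benefit comes from.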

At first I was thinking:
1) Calculate the keys in common in one batch, and from that the final slice size.
2) Make a slice with the capacity calculated in step 1.
3) Append one by one.
Then I considered the next structure: it would not generate the final result as an array; instead, all the append work would just be linking nodes.
type node struct {
	val    int
	parent *node
	next   *node
	child  *node
}

type tree struct {
	root  *node
	level int
}

var sourceMap map[int]*tree
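A minimal sketch of that node-linking idea, with hypothetical matchNode/matchList types (not the asker's): appending becomes a single pointer assignment and never copies, at the price of the memory locality a slice gives you.

```go
package main

import "fmt"

// matchNode links results instead of appending them to a growing slice,
// so "appending" is just pointer assignment and never reallocates.
type matchNode struct {
	target, source int32
	next           *matchNode
}

// matchList keeps head/tail so appends are O(1).
type matchList struct {
	head, tail *matchNode
	n          int
}

func (l *matchList) add(t, s int32) {
	node := &matchNode{target: t, source: s}
	if l.tail == nil {
		l.head = node
	} else {
		l.tail.next = node
	}
	l.tail = node
	l.n++
}

func main() {
	// The example maps from the question: only key 1 is shared.
	src := map[int32][]int32{1: {3, 4}, 5: {9, 10}}
	tgt := map[int32][]int32{1: {1, 2, 3}, 2: {2, 3}, 3: {1, 2}}
	var l matchList
	for k, srcPos := range src {
		if tgtPos, ok := tgt[k]; ok {
			for _, t := range tgtPos {
				for _, s := range srcPos {
					l.add(t, s)
				}
			}
		}
	}
	fmt.Println(l.n, "matches")
}
```

Whether the per-node allocations beat one presized slice is exactly the kind of question a benchmark on real data would have to settle.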

Related

Altering my usage of channels in mergesort kills my program; OR am I misunderstanding scope when dealing with goroutines?

A few days ago, I posted this topic on the Code Review site. In it, I detailed my first attempt at implementing goroutines in my mergesort code, and while it worked fine, I was hoping for a better implementation. As I thought about it more, I had what I thought was a solid idea: instead of constantly waiting for both the left and right side to complete before merging the two sides together, why not take the (presumably) sorted singleton chunks from the left side as it sorts itself and the sorted singleton chunks from the right, and merge those as they come?
I attempted to restructure my code but I've run into a bit of an issue: from what I can tell, my implementation of the base case has caused massive problems, or I am misunderstanding the scope of goroutines and am telling channels to close when they, in a different sort block, are still being used. I was hoping someone could help me refine my understanding or, on the chance that my code is broken in a simple fashion, help me to understand an issue I will present after this code:
package main

import (
	"crypto/rand"
	"fmt"
	"os"
	"strconv"
)

var (
	nums    []byte // the slice of numbers we want to sort
	numVals int = -1
)

// User can optionally pass a parameter that determines how many random
// numbers will be sorted; if none is provided, a small default is used.
func main() {
	if len(os.Args) >= 2 {
		numVals, _ = strconv.Atoi(os.Args[1])
	} else {
		numVals = 2
	}
	nums = initSlice()
	ms := make(chan byte)
	go mergeSort(nums, ms)
	pos := 0
	for val := range ms {
		nums[pos] = val
		pos++
	}
	for _, value := range nums {
		fmt.Printf("%d\n", value)
	}
}

func initSlice() []byte {
	vals := make([]byte, numVals)
	_, err := rand.Read(vals)
	if err != nil {
		panic(err)
	}
	return vals
}

func mergeSort(arr []byte, ms chan byte) {
	if len(arr) <= 1 {
		if len(arr) == 1 { // base case
			ms <- arr[0]
		}
		close(ms)
		return
	}
	leftMS := make(chan byte)
	go mergeSort(arr[:len(arr)/2], leftMS)
	rightMS := make(chan byte)
	go mergeSort(arr[len(arr)/2:], rightMS)
	left, lOK := <-leftMS
	right, rOK := <-rightMS
	for lOK && rOK {
		leftLeast := left <= right
		if leftLeast {
			ms <- left
			left, lOK = <-leftMS
		} else {
			ms <- right
			right, lOK = <-rightMS
		}
	}
	if lOK {
		ms <- left
		for val := range leftMS {
			ms <- val
		}
	}
	if rOK {
		ms <- right
		for val := range rightMS {
			ms <- val
		}
	}
	close(ms)
}
Overall, my biggest question would be this. Let's say we have the following sort (the mergesort recursion diagram is omitted here): if I am currently working through the '38' and '27' pairing and I close that ms channel, I would expect that it is not the same channel as the one that starts everything off in main? If not, is there a way I can create new channels recursively while still keeping the name?
Hope this all makes sense and thanks for the help.
Your channel use is not your problem. There are two problems with your program.
First, you have to collect the results in a separate array in the main goroutine, otherwise, you'll be modifying the array you're sorting as it is being sorted.
Second, this block:
	} else {
		ms <- right
		right, lOK = <-rightMS
It should be
		right, rOK = <-rightMS
You're setting lOK with rightMS, not rOK.
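With that rOK fix applied, the merge loop drains both channels correctly. A reduced, self-contained sketch of just the merge step (feed is a test helper, not part of the original program):

```go
package main

import "fmt"

// mergeChans merges two ascending channels into one sorted slice,
// using the corrected rOK assignment from the answer above.
func mergeChans(leftMS, rightMS <-chan byte) []byte {
	var out []byte
	left, lOK := <-leftMS
	right, rOK := <-rightMS
	for lOK && rOK {
		if left <= right {
			out = append(out, left)
			left, lOK = <-leftMS
		} else {
			out = append(out, right)
			right, rOK = <-rightMS // rOK, not lOK
		}
	}
	// Drain whichever side still has values.
	if lOK {
		out = append(out, left)
		for v := range leftMS {
			out = append(out, v)
		}
	}
	if rOK {
		out = append(out, right)
		for v := range rightMS {
			out = append(out, v)
		}
	}
	return out
}

// feed sends the given values on a channel, then closes it.
func feed(vals ...byte) <-chan byte {
	ch := make(chan byte)
	go func() {
		for _, v := range vals {
			ch <- v
		}
		close(ch)
	}()
	return ch
}

func main() {
	fmt.Println(mergeChans(feed(3, 27, 38), feed(9, 10, 43, 82))) // prints [3 9 10 27 38 43 82]
}
```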

LoadOrStore in a sync.Map without creating a new structure each time

Is it possible to LoadOrStore into a Go sync.Map without creating a new structure every time? If not, what alternatives are available?
The use case here is using the sync.Map as a cache where cache misses are rare (but possible), and on a cache miss I want to add to the map. As it stands, I need to initialize a structure every single time LoadOrStore is called, rather than creating the struct only when needed. I'm worried this will hurt the GC, initializing hundreds of thousands of structures that will not be needed.
In Java this can be done using computeIfAbsent.
You can try:
	var m sync.Map
	s, ok := m.Load("key")
	if !ok {
		s, _ = m.LoadOrStore("key", "value")
	}
	fmt.Println(s)
play demo
This is my solution: use sync.Map and sync.Once.
type syncData struct {
	data interface{}
	once *sync.Once
}

func LoadOrStore(m *sync.Map, key string, f func() (interface{}, error)) (interface{}, error) {
	temp, _ := m.LoadOrStore(key, &syncData{
		data: nil,
		once: &sync.Once{},
	})
	d := temp.(*syncData)
	var err error
	if d.data == nil {
		d.once.Do(func() {
			d.data, err = f()
			if err != nil {
				// if it failed, try again next time with a new sync.Once
				d.once = &sync.Once{}
			}
		})
	}
	return d.data, err
}
Package sync
import "sync"
type Map
Map is like a Go map[interface{}]interface{} but is safe for
concurrent use by multiple goroutines without additional locking or
coordination. Loads, stores, and deletes run in amortized constant
time.
The Map type is specialized. Most code should use a plain Go map
instead, with separate locking or coordination, for better type safety
and to make it easier to maintain other invariants along with the map
content.
The Map type is optimized for two common use cases: (1) when the entry
for a given key is only ever written once but read many times, as in
caches that only grow, or (2) when multiple goroutines read, write,
and overwrite entries for disjoint sets of keys. In these two cases,
use of a Map may significantly reduce lock contention compared to a Go
map paired with a separate Mutex or RWMutex.
The usual way to solve these problems is to construct a usage model and then benchmark it.
For example, since "cache misses are rare", assume that Load will work most of the time, and only call LoadOrStore (with value allocation and initialization) when necessary.
$ go test map_test.go -bench=. -benchmem
BenchmarkHit-4    2   898810447 ns/op      44536 B/op      1198 allocs/op
BenchmarkMiss-4   1  2958103053 ns/op  483957168 B/op  43713042 allocs/op
$
map_test.go:
package main

import (
	"strconv"
	"sync"
	"testing"
)

func BenchmarkHit(b *testing.B) {
	for N := 0; N < b.N; N++ {
		var m sync.Map
		for i := 0; i < 64*1024; i++ {
			for k := 0; k < 256; k++ {
				// Assume cache hit
				v, ok := m.Load(k)
				if !ok {
					// allocate and initialize value
					v = strconv.Itoa(k)
					a, loaded := m.LoadOrStore(k, v)
					if loaded {
						v = a
					}
				}
				_ = v
			}
		}
	}
}

func BenchmarkMiss(b *testing.B) {
	for N := 0; N < b.N; N++ {
		var m sync.Map
		for i := 0; i < 64*1024; i++ {
			for k := 0; k < 256; k++ {
				// Assume cache miss
				// allocate and initialize value
				var v interface{} = strconv.Itoa(k)
				a, loaded := m.LoadOrStore(k, v)
				if loaded {
					v = a
				}
				_ = v
			}
		}
	}
}

Slice merge in golang recommendation

Is there a way to make this golang code shorter?
func MergeSlices(s1 []float32, s2 []int32) []int {
	var slice []int
	for i := range s1 {
		slice = append(slice, int(s1[i]))
	}
	for i := range s2 {
		slice = append(slice, int(s2[i]))
	}
	return slice
}
You can't eliminate the loops to convert each element to int individually, because you can't convert whole slices of different element types. For explanation, see this question: Type converting slices of interfaces in go
The most you can do is use named result type, and a for range with 2 iteration values, where you can omit the first (the index) by assigning it to the blank identifier, and the 2nd will be the value:
func MergeSlices(s1 []float32, s2 []int32) (s []int) {
	for _, v := range s1 {
		s = append(s, int(v))
	}
	for _, v := range s2 {
		s = append(s, int(v))
	}
	return
}
But know that your code is fine as-is. My code is not something to always follow, it was to answer your question: how to make your code shorter. If you want to improve your code, you could start by looking at its performance, or even refactoring your code to not end up needing to merge slices of different types.
Your code should be correct, maintainable, readable, and reasonably efficient. Note that shortness of code is not one of the important goals. For good reason, Stack Exchange has another site for Code Golf questions: Programming Puzzles & Code Golf.
Your code could be improved; it's inefficient. For example, merging two len(256) slices,
BenchmarkMergeSlices 200000 8350 ns/op 8184 B/op 10 allocs/op
Here's a more efficient (and longer) version:
BenchmarkMergeSlices 300000 4420 ns/op 4096 B/op 1 allocs/op
func MergeSlices(s1 []float32, s2 []int32) []int {
	slice := make([]int, 0, len(s1)+len(s2))
	for i := range s1 {
		slice = append(slice, int(s1[i]))
	}
	for i := range s2 {
		slice = append(slice, int(s2[i]))
	}
	return slice
}
Use the Go Code Review Comments for Named Result Parameters. For example: "Don't name result parameters just to avoid declaring a var inside the function; that trades off a minor implementation brevity at the cost of unnecessary API verbosity. Clarity of docs is always more important than saving a line or two in your function."
var s1 []int
var s2 []int
newSlice := append(s1, s2...)
The code can't get any shorter, but that's a goal of dubious value to begin with; it's not overly verbose as-is. You can, however, likely improve performance by eliminating the intermediate allocations. Every time you call append, if the target slice doesn't have enough space, it expands it, guessing at the necessary size since you haven't told it how much space it will need.
The simplest would just be to presize your target slice (replace var slice []int with slice := make([]int, 0, len(s1)+len(s2))); that way the appends never have to expand it. Setting the second parameter to 0 is important: it sets the length to zero and the capacity to the total size needed, so that your appends will work as expected.
Once you've presized it though, you can get rid of the appends entirely, and directly set each index:
func MergeSlices(s1 []float32, s2 []int32) []int {
	slice := make([]int, len(s1)+len(s2))
	for i, v := range s1 {
		slice[i] = int(v)
	}
	for i, v := range s2 {
		slice[i+len(s1)] = int(v)
	}
	return slice
}
Playground link

Saving results from a parallelized goroutine

I am trying to parallelize an operation in golang and save the results in a manner that I can iterate over to sum up afterwards.
I have managed to set up the parameters so that no deadlock occurs, and I have confirmed that the operations are working and being saved correctly within the function. But when I iterate over the slice of my struct and try to sum up the results of the operation, they all remain 0. I have tried passing by reference, with pointers, and with channels (which caused a deadlock).
I have only found this example for help: https://golang.org/doc/effective_go.html#parallel. But this seems outdated now, as Vector has been deprecated. I also have not found any references to the way the function in the example was constructed (with the func (u Vector) receiver before the name). I tried replacing this with a slice but got compile-time errors.
Any help would be very appreciated. Here is the key parts of my code:
type Job struct {
	a      int
	b      int
	result *big.Int
}

func choose(jobs []Job, c chan int) {
	temp := new(big.Int)
	for _, job := range jobs {
		job.result = // perform operation on job.a and job.b
		//fmt.Println(job.result)
	}
	c <- 1
}
func main() {
	num := 100 // can be very large (why we need big.Int)
	n := num
	k := 0
	const numCPU = 6 // runtime.NumCPU()
	count := new(big.Int)
	// create a 2d slice of jobs, one for each core
	jobs := make([][]Job, numCPU)
	for float64(k) <= math.Ceil(float64(num/2)) {
		// add one job to each core, alternating so that
		// job set is similar in difficulty
		for i := 0; i < numCPU; i++ {
			if !(float64(k) <= math.Ceil(float64(num / 2))) {
				break
			}
			jobs[i] = append(jobs[i], Job{n, k, new(big.Int)})
			n -= 1
			k += 1
		}
	}
	c := make(chan int, numCPU)
	for i := 0; i < numCPU; i++ {
		go choose(jobs[i], c)
	}
	// drain the channel
	for i := 0; i < numCPU; i++ {
		<-c
	}
	// computations are done
	for i := range jobs {
		for _, job := range jobs[i] {
			//fmt.Println(job.result)
			count.Add(count, job.result)
		}
	}
	fmt.Println(count)
}
Here is the code running on the go playground https://play.golang.org/p/X5IYaG36U-
As long as the []Job slice is only modified by one goroutine at a time, there's no reason you can't modify the job in place.
for i, job := range jobs {
	jobs[i].result = temp.Binomial(int64(job.a), int64(job.b))
}
https://play.golang.org/p/CcEGsa1fLh
You should also use a WaitGroup, rather than rely on counting tokens in a channel yourself.

code block in goroutine produces strange wrong results

I have a big N*1 name array, and I am currently using goroutines to calculate the edit distance of the names among each other.
The problem is that the results at [B] and [C] are different, for example:
ABC BCD 7
ABC BCD 3
There are 20000 records in names.
	var names []string
Divide names into two chunks:
	nameCount := len(names)
	procs := 2
	chunkSize := nameCount / procs
Channel:
	ch := make(chan int)
	var wg sync.WaitGroup
	for i := 0; i < procs; i++ { // create two goroutines
		start := i * chunkSize
		end := (i+1)*chunkSize - 1
		fmt.Println(start, end) // get slice start and end
		wg.Add(1)
		go func(slices []string, allnames []string) {
			for _, slice := range slices {
				minDistance = 256
				distance := 0
				sum := 0
				for _, name := range allnames {
					distance = calcEditDist(slice, name) // get the LD [A]
					sum += 1
					if distance > 0 && distance < minDistance {
						minDistance = distance
						fmt.Println(slice, name, distance)                  // [B]
						fmt.Println(slice, name, calcEditDist(slice, name)) // [C]
					} else if distance == minDistance {
						fmt.Println(slice, name, distance)
						fmt.Println(slice, name, calcEditDist(slice, name))
					}
				}
				// for _, name := range allnames {
				//     fmt.Println(slice, name)
				// }
				ch <- sum
				// fmt.Println(len(allnames), slice)
				break
			}
			wg.Done()
		}(names[start:end], names)
	}
I placed calcEditDist at https://github.com/copywrite/keyboardDistance/blob/master/parallel.go
PS: if I declare
	var dp [max][max]int
in calcEditDist as a local variable instead of a global, the results are right, but it is incredibly slow.
UPDATE 1
Thanks everybody, I took the great advice below in three steps:
1) I shrank dp to a much more reasonable size, like 100 or even smaller. DONE
2) I put the dp declaration in each goroutine and passed its pointer, as Nick said. DONE
3) Later I will try to allocate dp dynamically. LATER
The performance improved steeply ╰(°▽°)╯
As you've identified in your posting, having dp as a global variable is the problem.
Allocating it each time in CalcEditDistance is too slow.
You have two possible solutions.
1) You only need one dp array per goroutine, so allocate it in the for loop and pass a pointer to it (don't pass the array directly, as arrays are passed by value, which involves a lot of copying!):
for i := 0; i < procs; i++ { // create two goroutines
	start := i * chunkSize
	end := (i+1)*chunkSize - 1
	fmt.Println(start, end) // get slice start and end
	wg.Add(1)
	go func(slices []string, allnames []string) {
		var dp [max][max]int // allocate
		for _, slice := range slices {
			minDistance = 256
			distance := 0
			sum := 0
			for _, name := range allnames {
				distance = calcEditDist(slice, name, &dp) // pass dp pointer here
Change calcEditDist to take the dp pointer:
	func CalcEditDist(A string, B string, dp *[max][max]int) int {
		lenA := len(A)
		lenB := len(B)
2) Re-write your calcEditDist so it doesn't need the massive O(N^2) dp array.
If you study the function carefully, it only ever accesses one row up and one column to the left, so all the storage you actually need is the previous row and previous column, which you could allocate dynamically at very little cost. This would also make it scale to strings of any length.
That would need a bit of careful thought though!
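A sketch of that rewrite: plain Levenshtein distance with two rolling rows, so the extra storage is O(len(B)) and no fixed-size global array is needed. (This computes the standard edit distance; the linked calcEditDist may weight keys differently.)

```go
package main

import "fmt"

// editDist computes Levenshtein distance using only two rows,
// so no max×max global array is needed and any string length works.
func editDist(a, b string) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j // distance from empty prefix of a
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			// min of deletion, insertion, substitution
			curr[j] = min(prev[j]+1, min(curr[j-1]+1, prev[j-1]+cost))
		}
		prev, curr = curr, prev // roll the rows
	}
	return prev[len(b)]
}

func min(x, y int) int {
	if x < y {
		return x
	}
	return y
}

func main() {
	fmt.Println(editDist("kitten", "sitting")) // prints 3
}
```

Because each goroutine allocates its own two small rows, the shared-global race from the question disappears as well.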
