Code block in goroutine produces strange wrong results (Go)

I have a big N*1 name array, and I am currently using goroutines to calculate the edit distance between each name and all the others.
The problem is that the results at [B] and [C] differ, for example:
ABC BCD 7
ABC BCD 3
There are 20,000 records in names.
var names []string
// divide names into two chunks
nameCount := len(names)
procs := 2
chunkSize := nameCount / procs
// channel
ch := make(chan int)
var wg sync.WaitGroup
for i := 0; i < procs; i++ { //create two goroutines
start := i * chunkSize
end := (i+1)*chunkSize - 1
fmt.Println(start, end) //get slice start and end
wg.Add(1)
go func(slices []string, allnames []string) {
for _, slice := range slices {
minDistance = 256
distance := 0
sum := 0
for _, name := range allnames {
distance = calcEditDist(slice, name) //get the LD [A]
sum += 1
if distance > 0 && distance < minDistance {
minDistance = distance
fmt.Println(slice, name, distance) //[B]
fmt.Println(slice, name, calcEditDist(slice, name)) //[C]
} else if distance == minDistance {
fmt.Println(slice, name, distance)
fmt.Println(slice, name, calcEditDist(slice, name))
}
}
// for _, name := range allnames {
// fmt.Println(slice, name)
// }
ch <- sum
// fmt.Println(len(allnames), slice)
break
}
wg.Done()
}(names[start:end], names)
}
I put the calcEditDist implementation at https://github.com/copywrite/keyboardDistance/blob/master/parallel.go
PS:
If I declare
var dp [max][max]int
in calcEditDist as a local variable instead of a global, the results are right, but it is incredibly slow.
UPDATE 1
Thanks, everyone. I took the great advice below, in three steps:
1) I shrank dp to a much more reasonable size, like 100 or even smaller. DONE
2) I put the dp declaration in each goroutine and pass a pointer to it, as Nick said. DONE
3) Later I will try to allocate dp dynamically. LATER
The performance improved steeply, ╰(°▽°)╯

As you've identified in your posting, having dp as a global variable is the problem: every goroutine scribbles over the same scratch array at once.
Allocating it on every call to calcEditDist is too slow.
You have two possible solutions.
1) You only need one dp array per goroutine, so allocate it inside the goroutine and pass a pointer to it (don't pass the array directly, as arrays are passed by value, which would involve a lot of copying!):
for i := 0; i < procs; i++ { //create two goroutines
start := i * chunkSize
end := (i+1)*chunkSize - 1
fmt.Println(start, end) //get slice start and end
wg.Add(1)
go func(slices []string, allnames []string) {
var dp [max][max]int // allocate
for _, slice := range slices {
minDistance = 256
distance := 0
sum := 0
for _, name := range allnames {
distance = calcEditDist(slice, name, &dp) // pass dp pointer here
Then change calcEditDist to take the dp pointer:
func CalcEditDist(A string, B string, dp *[max][max]int) int {
lenA := len(A)
lenB := len(B)
2) Rewrite calcEditDist so it doesn't need the massive O(N^2) dp array.
If you study the function carefully, it only ever accesses the row above and the column to the left, so all the storage you actually need is the previous row and the previous column, which you could allocate dynamically at very little cost. That would also make it scale to strings of any length.
It would need a bit of careful thought, though!
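For concreteness, here is a sketch of option 2, assuming a plain Levenshtein distance (the linked parallel.go may weight distances differently). It keeps only two rows, so storage is O(len(B)) per goroutine and there is no fixed-size global array to race on:

```go
package main

import "fmt"

// calcEditDist computes the Levenshtein distance between A and B
// using only two rows of storage instead of a full len(A) x len(B) table.
func calcEditDist(A, B string) int {
	prev := make([]int, len(B)+1)
	curr := make([]int, len(B)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of A
	}
	for i := 1; i <= len(A); i++ {
		curr[0] = i
		for j := 1; j <= len(B); j++ {
			cost := 1
			if A[i-1] == B[j-1] {
				cost = 0
			}
			curr[j] = min(prev[j]+1, min(curr[j-1]+1, prev[j-1]+cost))
		}
		prev, curr = curr, prev // reuse the two rows; no per-cell allocation
	}
	return prev[len(B)]
}

func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func main() {
	fmt.Println(calcEditDist("ABC", "BCD")) // 2
}
```

Because the two row slices live inside the function, each goroutine gets its own scratch space automatically and nothing is shared.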

Related

Creating a parallel word counter in Go

I am trying to create a word counter that returns an array of the number of times each word in a text file appears. Moreover, I have been assigned to parallelize this program.
My initial attempt at this task was as follows.
Implementation 1
func WordCount(words []string, startWord int, endWord int, waitGroup *sync.WaitGroup, freqsChannel chan<- map[string]int) {
freqs := make(map[string]int)
for i := startWord; i < endWord; i++ {
word := words[i]
freqs[word]++
}
freqsChannel <- freqs
waitGroup.Done()
}
func ParallelWordCount(text string) map[string]int {
// Split text into string array of the words in text.
text = strings.ToLower(text)
text = strings.ReplaceAll(text, ",", "")
text = strings.ReplaceAll(text, ".", "")
words := strings.Fields(text)
length := len(words)
threads := 28
freqsChannel := make(chan map[string]int, threads)
var waitGroup sync.WaitGroup
waitGroup.Add(threads)
defer waitGroup.Wait()
wordsPerThread := length / threads // always rounds down
wordsInLastThread := length - (threads-1)*wordsPerThread
startWord := -wordsPerThread
endWord := 0
for i := 1; i <= threads; i++ {
if i < threads {
startWord += wordsPerThread
endWord += wordsPerThread
} else {
startWord += wordsInLastThread
endWord += wordsInLastThread
}
go WordCount(words, startWord, endWord, &waitGroup, freqsChannel)
}
freqs := <-freqsChannel
for i := 1; i < threads; i++ {
subFreqs := <-freqsChannel
for word, count := range subFreqs {
freqs[word] += count
}
}
return freqs
}
According to my teaching assistant, this was not a good solution as the pre-processing of the text file carried out by
text = strings.ToLower(text)
text = strings.ReplaceAll(text, ",", "")
text = strings.ReplaceAll(text, ".", "")
words := strings.Fields(text)
in ParallelWordCount goes against the idea of parallel processing.
Now, to fix this, I have moved the responsibility of processing the text file into an array of words into the WordCount function, which is called on separate goroutines for different parts of the text file. Below is the code for my second implementation.
Implementation 2
func WordCount(text string, waitGroup *sync.WaitGroup, freqsChannel chan<- map[string]int) {
freqs := make(map[string]int)
text = strings.ToLower(text)
text = strings.ReplaceAll(text, ",", "")
text = strings.ReplaceAll(text, ".", "")
words := strings.Fields(text)
for _, value := range words {
freqs[value]++
}
freqsChannel <- freqs
waitGroup.Done()
}
func splitCount(str string, subStrings int, waitGroup *sync.WaitGroup, freqsChannel chan<- map[string]int) {
if subStrings != 1 {
length := len(str)
charsPerSubstring := length / subStrings
i := 0
for str[charsPerSubstring+i] != ' ' {
i++
}
subString := str[0 : charsPerSubstring+i+1]
go WordCount(subString, waitGroup, freqsChannel)
splitCount(str[charsPerSubstring+i+1:length], subStrings-1, waitGroup, freqsChannel)
} else {
go WordCount(str, waitGroup, freqsChannel)
}
}
func ParallelWordCount(text string) map[string]int {
threads := 28
freqsChannel := make(chan map[string]int, threads)
var waitGroup sync.WaitGroup
waitGroup.Add(threads)
defer waitGroup.Wait()
splitCount(text, threads, &waitGroup, freqsChannel)
// Collect and return frequences
freqs := <-freqsChannel
for i := 1; i < threads; i++ {
subFreqs := <-freqsChannel
for word, count := range subFreqs {
freqs[word] += count
}
}
return freqs
}
The average runtime of this implementation is 3 ms, compared to the old average of 5 ms. But have I thoroughly addressed the issue raised by my teaching assistant, or does the second implementation also fail to take full advantage of parallel processing to efficiently count the words of a text file?
Two things that I see:
The second example is better, as you have split the text parsing and word counting across several goroutines. One thing you can try is to not count words in the WordCount method, but just push them to the channel and increment the counts in the main counter. Check whether that is any faster; I'm not sure. Also, look at the fan-in pattern for more details.
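A minimal sketch of that fan-in idea (the function and names here are illustrative, not the poster's code): producers only parse and push words; a single goroutine owns the map, so no mutex is needed.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// countWords fans words in from several producer goroutines to a
// single consumer that owns the map, so no locking is required.
func countWords(chunks []string) map[string]int {
	words := make(chan string)
	var wg sync.WaitGroup
	for _, chunk := range chunks {
		wg.Add(1)
		go func(text string) { // producer: parse only, don't count
			defer wg.Done()
			for _, w := range strings.Fields(strings.ToLower(text)) {
				words <- w
			}
		}(chunk)
	}
	go func() { // close the channel once all producers finish
		wg.Wait()
		close(words)
	}()
	freqs := make(map[string]int) // only this goroutine touches the map
	for w := range words {
		freqs[w]++
	}
	return freqs
}

func main() {
	fmt.Println(countWords([]string{"a b a", "b c"})) // map[a:2 b:2 c:1]
}
```

Note that pushing every word through a channel adds synchronization overhead, which is exactly why the suggestion above is to benchmark it against per-goroutine maps.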
Parallel processing might still not be fully utilized, because I don't believe you have 28 CPU cores available :). The number of cores determines how many WordCount goroutines run in parallel; the rest of them will be scheduled concurrently based on the available resources (CPU cores). Here is a great article explaining this.
Implementation 2 issues:
In splitCount(), what if the total length of the string is less than 28? Then WordCount() will only be called a number of times equal to the number of words.
It will also fail in that scenario, because waitGroup.Done() is expected 28 times.
Recursively calling splitCount() makes it slow. We should split and call in a loop.
The number of threads should not always be 28, as we don't know how many words are in the string.
I will try to develop a more optimised approach and update the answer.
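As a rough illustration of splitting in a loop rather than recursively (the helper name and chunking policy are my own, not a drop-in fix), each cut is advanced to the next space so no word is split in half:

```go
package main

import "fmt"

// splitOnSpaces cuts text into roughly n contiguous chunks, moving each
// cut forward to the next space so words are never split. It may return
// fewer than n chunks for short inputs.
func splitOnSpaces(text string, n int) []string {
	var chunks []string
	size := len(text) / n
	start := 0
	for len(chunks) < n-1 && start < len(text) {
		end := start + size
		if end >= len(text) {
			break
		}
		for end < len(text) && text[end] != ' ' {
			end++ // advance the cut to a word boundary
		}
		chunks = append(chunks, text[start:end])
		start = end
	}
	chunks = append(chunks, text[start:]) // remainder goes to the last chunk
	return chunks
}

func main() {
	fmt.Println(splitOnSpaces("the quick brown fox jumps over the lazy dog", 3))
}
```

Each chunk could then be handed to a WordCount goroutine, and wg.Add could be called once per chunk actually produced instead of a fixed 28 times.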

Combine multiple maps into a map whose value for a given key is the sum of the values of the key in the combined maps

I have written a program that identifies all unique words in a text document and counts how many times each of the words appears. To improve the performance of my program, I am trying to break up the word counting into several goroutines that can run in parallel.
Initially, I tried using a single map that was passed by reference to each goroutine, where each goroutine would count the words in part of the document. This caused a panic because the program was trying to write to the same map from multiple goroutines simultaneously. To solve this issue, I created a mutex to prevent multiple goroutines from writing to the map simultaneously. At this point, the program functioned as expected, but with no performance difference compared to the original sequential implementation of the WordCount function. On second thought, this is not surprising, given that the mutex forces the other goroutines to wait before writing to the map, thereby preventing parallel computation.
Below is the code that uses a mutex to avoid the described runtime panic, but also fails to count words in parallel.
func WordCount(words []string, startWord int, endWord int, freqs map[string]int, waitGroup *sync.WaitGroup, mutex *sync.Mutex) {
mutex.Lock()
for i := startWord; i < endWord; i++ {
word := words[i]
freqs[word]++
}
mutex.Unlock()
waitGroup.Done()
}
func ParallelWordCount(text string) map[string]int {
// Split text into string array of the words in text.
text = strings.ToLower(text)
text = strings.ReplaceAll(text, ",", "")
text = strings.ReplaceAll(text, ".", "")
words := strings.Fields(text)
length := len(words)
freqs := make(map[string]int)
var mutex sync.Mutex
var waitGroup sync.WaitGroup
waitGroup.Add(2)
defer waitGroup.Wait()
threads := 2
wordsPerThread := length / threads // always rounds down
wordsInLastThread := length - (threads-1)*wordsPerThread
startWord := -wordsPerThread
var endWord int
for i := 1; i <= threads; i++ {
if i < threads {
startWord += wordsPerThread * i
endWord += wordsPerThread * i
} else {
startWord += wordsInLastThread
endWord += wordsInLastThread
}
go WordCount(words, startWord, endWord, freqs, &waitGroup, &mutex)
}
return freqs
}
I believe that I could achieve parallel word counting if I created a local map of word frequencies for each goroutine and, in the end, combined the local frequency maps into a single map with the word counts for the entire text file. The problem I am currently facing is how to combine the local frequency maps. Concretely, I need to know how to combine multiple maps into a map whose value for a given key is the sum of the values of that key in the maps to be combined.
To clarify the underlying logic of what I am trying to do I have included the below example. The ConcurrentSum function returns the sum of the elements in an array by computing the lower and upper halves of the array concurrently. In my case, I want to, in parallel, count words in different parts of my text file and, ultimately, combine the word counts into a single map of word counts representative of the entire document.
func sum(a []int, res chan<- int) {
var sum int
for i := 0; i < len(a); i++ {
sum += a[i]
}
res <- sum
}
// concurrently sum the array a.
func ConcurrentSum(a []int) int {
n := len(a)
ch := make(chan int)
go sum(a[:n/2], ch)
go sum(a[n/2:], ch)
return <-ch + <-ch
}
I believe you could create an array of maps, one for each goroutine, and then read in each map, using a list to keep track of the words you've already counted. Assuming each word is a key to the number of times it was counted, that's how it would look.
Parallel processing might not be the best choice here, considering the concurrency side, since everything needs to be kept separate for a real performance increase. If you have the storage space, then you most certainly can use a list and get at worst O(N) efficiency from the integration of the maps. You will need to keep the integration of the maps in a single thread or a single process.
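To make the single-threaded merge step concrete, here is a minimal sketch of combining maps by summing per-key values (the helper name is illustrative):

```go
package main

import "fmt"

// mergeCounts sums the values of each key across all input maps,
// producing one combined frequency map.
func mergeCounts(maps ...map[string]int) map[string]int {
	total := make(map[string]int)
	for _, m := range maps {
		for word, count := range m {
			total[word] += count // missing keys read as zero
		}
	}
	return total
}

func main() {
	a := map[string]int{"go": 2, "fun": 1}
	b := map[string]int{"go": 1, "fast": 3}
	fmt.Println(mergeCounts(a, b)) // map[fast:3 fun:1 go:3]
}
```

Run in the collecting goroutine after all workers have sent their local maps, this is the map analogue of the `<-ch + <-ch` step in ConcurrentSum.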

Can I optimise this further so that it runs faster?

As you can see in the following pprof output, I have these nested for loops which take most of the time of my program. The source is in Go, and the code is explained below:
8.55mins 1.18hrs 20: for k := range mapSource {
4.41mins 1.20hrs 21: if positions, found := mapTarget[k]; found {
. . 22: // save all matches
1.05mins 1.05mins 23: for _, targetPos := range positions {
2.25mins 2.33mins 24: for _, sourcePos := range mapSource[k] {
1.28s 15.78s 25: matches = append(matches, match{int32(targetPos), int32(sourcePos)})
. . 26: }
. . 27: }
. . 28: }
. . 29: }
At the moment the structures I'm using are 2 map[int32][]int32, targetMap and sourceMap.
These maps contain, for a given key, an array of ints. Now I want to find the keys that match in both maps, and save the combinations of the elements in the arrays.
So for example:
sourceMap[1] = [3,4]
sourceMap[5] = [9,10]
targetMap[1] = [1,2,3]
targetMap[2] = [2,3]
targetMap[3] = [1,2]
The only key in common is 1 and the result will be [(3,1), (3,2), (3,3), (4,1), (4,2), (4,3)]
Is there any possible way (a more appropriate data structure or whatever) that could improve the speed of my program?
In my case, maps can contain somewhere between 1000 and 150000 keys, while the arrays inside are usually pretty small.
EDIT: Concurrency is not an option as this is already being run several times in several threads at the same time.
Can I optimise this further so that it runs faster?
Is there any possible way (a more appropriate data structure or whatever) that could improve the speed of my program?
Probably.
The XY problem is asking about your attempted solution rather than your actual problem. This leads to enormous amounts of wasted time and energy, both on the part of people asking for help and on the part of those providing help.
We don't have even the most basic information about your problem: a description of the form, content, and frequency of your original input data, and your desired output. What original data should drive a benchmark?
I created some fictional original data, which produced some fictional output and results:
BenchmarkPeterSO-4 30 44089894 ns/op 5776666 B/op 31 allocs/op
BenchmarkIvan-4 10 152300554 ns/op 26023924 B/op 6022 allocs/op
It is possible that your algorithms are slow.
I would probably do it like this so that I can do some of the work concurrently:
https://play.golang.org/p/JHAmPRh7jr
package main
import (
"fmt"
"sync"
)
var final [][]int32
var wg sync.WaitGroup
var receiver chan []int32
func main() {
final = [][]int32{}
mapTarget := make(map[int32][]int32)
mapSource := make(map[int32][]int32)
mapSource[1] = []int32{3, 4}
mapSource[5] = []int32{9, 10}
mapTarget[1] = []int32{1, 2, 3}
mapTarget[2] = []int32{2, 3}
mapTarget[3] = []int32{1, 2}
wg = sync.WaitGroup{}
receiver = make(chan []int32)
go func() {
for elem := range receiver {
final = append(final, elem)
wg.Done()
}
}()
for k := range mapSource {
if _, ok := mapTarget[k]; ok {
wg.Add(1)
go permutate(mapSource[k], mapTarget[k])
}
}
wg.Wait()
fmt.Println(final)
}
func permutate(a, b []int32) {
for i := 0; i < len(a); i++ {
for j := 0; j < len(b); j++ {
wg.Add(1)
receiver <- []int32{a[i], b[j]}
}
}
wg.Done()
}
You may even want to see if you get any benefit from this:
for k := range mapSource {
wg.Add(1)
go func(k int32) {
if _, ok := mapTarget[k]; ok {
wg.Add(1)
go permutate(mapSource[k], mapTarget[k])
}
wg.Done()
}(k)
}
The best optimization probably involves changing the source and target data structures to begin with so you don't have to iterate as much, but it's hard to be sure without knowing more about the underlying problem you're solving and how the maps are generated.
However, there is an optimization that should get you a roughly 2x boost (just an educated guess), depending on the exact numbers.
var sources, targets []int32
for k, srcPositions := range mapSource {
if tgtPositions, found := mapTarget[k]; found {
sources = append(sources, srcPositions...)
targets = append(targets, tgtPositions...)
}
}
matches = make([]match, len(sources) * len(targets))
i := 0
for _, s := range(sources) {
for _, t := range(targets) {
matches[i] = match{t, s} // keep the original {targetPos, sourcePos} field order
i++
}
}
The general idea is to minimize the amount of copying that has to be done, and improve the locality of memory references. I think this is about the best you can do with this data structure. My hunch is this isn't the best datastructure to begin with for the underlying problem, and there are much bigger gains to be had.
At first I was thinking:
Calculate keys in common in one batch, and calculate the final slice size.
Make slice with capacity that step 1 calculated.
Append one by one.
Then I considered the next structure; it would not generate the final result as an array, since all the append work would just be linking nodes.
type node struct {
val int
parent *node
next *node
child *node
}
type tree struct {
root *node
level int
}
var sourceMap map[int]*tree

Saving results from a parallelized goroutine

I am trying to parallelize an operation in Go and save the results in a manner that I can iterate over to sum up afterwards.
I have managed to set up the parameters so that no deadlock occurs, and I have confirmed that the operations are working and being saved correctly within the function. But when I iterate over the slice of my struct and try to sum up the results of the operation, they all remain 0. I have tried passing by reference, with pointers, and with channels (which caused a deadlock).
I have only found this example for help: https://golang.org/doc/effective_go.html#parallel. But this seems outdated now, as Vector has been deprecated. I also have not found any references to the way the function in the example was constructed (with the func (u Vector) before the name, i.e. a method receiver). I tried replacing this with a slice but got compile-time errors.
Any help would be very appreciated. Here are the key parts of my code:
type Job struct {
a int
b int
result *big.Int
}
func choose(jobs []Job, c chan int) {
temp := new(big.Int)
for _,job := range jobs {
job.result = //perform operation on job.a and job.b
//fmt.Println(job.result)
}
c <- 1
}
func main() {
num := 100 //can be very large (why we need big.Int)
n := num
k := 0
const numCPU = 6 //runtime.NumCPU
count := new(big.Int)
// create a 2d slice of jobs, one for each core
jobs := make([][]Job, numCPU)
for (float64(k) <= math.Ceil(float64(num / 2))) {
// add one job to each core, alternating so that
// job set is similar in difficulty
for i := 0; i < numCPU; i++ {
if !(float64(k) <= math.Ceil(float64(num / 2))) {
break
}
jobs[i] = append(jobs[i], Job{n, k, new(big.Int)})
n -= 1
k += 1
}
}
c := make(chan int, numCPU)
for i := 0; i < numCPU; i++ {
go choose(jobs[i], c)
}
// drain the channel
for i := 0; i < numCPU; i++ {
<-c
}
// computations are done
for i := range jobs {
for _,job := range jobs[i] {
//fmt.Println(job.result)
count.Add(count, job.result)
}
}
fmt.Println(count)
}
Here is the code running on the go playground https://play.golang.org/p/X5IYaG36U-
As long as the []Job slice is only modified by one goroutine at a time, there's no reason you can't modify the job in place.
for i, job := range jobs {
jobs[i].result = temp.Binomial(int64(job.a), int64(job.b))
}
https://play.golang.org/p/CcEGsa1fLh
You should also use a WaitGroup, rather than rely on counting tokens in a channel yourself.
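A sketch of what both suggestions look like together for the poster's example, with big.Int's Binomial standing in for the elided operation: results are written in place via the index, and a WaitGroup replaces the hand-counted channel tokens.

```go
package main

import (
	"fmt"
	"math/big"
	"sync"
)

type Job struct {
	a, b   int
	result *big.Int
}

// choose fills in results for its own chunk of jobs. Writing through
// jobs[i] mutates the shared backing array; `job` itself is a copy.
func choose(jobs []Job, wg *sync.WaitGroup) {
	defer wg.Done()
	temp := new(big.Int)
	for i, job := range jobs {
		jobs[i].result = new(big.Int).Set(temp.Binomial(int64(job.a), int64(job.b)))
	}
}

func sumJobs(jobs [][]Job) *big.Int {
	var wg sync.WaitGroup
	for i := range jobs {
		wg.Add(1)
		go choose(jobs[i], &wg)
	}
	wg.Wait() // replaces draining the token channel
	count := new(big.Int)
	for i := range jobs {
		for _, job := range jobs[i] {
			count.Add(count, job.result)
		}
	}
	return count
}

func main() {
	jobs := [][]Job{
		{{4, 2, nil}, {5, 2, nil}},
		{{6, 3, nil}},
	}
	fmt.Println(sumJobs(jobs)) // C(4,2)+C(5,2)+C(6,3) = 6+10+20 = 36
}
```

The copy via new(big.Int).Set matters because temp is reused across iterations; storing temp itself would make every job point at the same value.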

unexpected slice append behaviour

I encountered weird behaviour in Go code today: when I append elements to a slice in a loop and then try to create new slices based on the result of the loop, the last append overrides the slices from the previous appends.
In this particular example, it means that the last elements of the sliceFromLoop slices j, g, and h are not 100, 101, and 102 respectively, but... always 102!
The second example, sliceFromLiteral, behaves as expected.
package main
import "fmt"
func create(iterations int) []int {
a := make([]int, 0)
for i := 0; i < iterations; i++ {
a = append(a, i)
}
return a
}
func main() {
sliceFromLoop()
sliceFromLiteral()
}
func sliceFromLoop() {
fmt.Printf("** NOT working as expected: **\n\n")
i := create(11)
fmt.Println("initial slice: ", i)
j := append(i, 100)
g := append(i, 101)
h := append(i, 102)
fmt.Printf("i: %v\nj: %v\ng: %v\nh:%v\n", i, j, g, h)
}
func sliceFromLiteral() {
fmt.Printf("\n\n** working as expected: **\n")
i := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
fmt.Println("initial slice: ", i)
j := append(i, 100)
g := append(i, 101)
h := append(i, 102)
fmt.Printf("i: %v\nj: %v\ng: %v\nh:%v\n", i, j, g, h)
}
link to play.golang:
https://play.golang.org/p/INADVS3Ats
After some reading, digging and experimenting, I found that this problem originates in the slices referencing the same underlying array values, and it can be solved by copying the slice to a new one before appending anything. However, that looks quite... hesitant.
What's the idiomatic way to create many new slices based on old ones without worrying about changing the values of the old slices?
Don't assign append to anything other than itself.
As you mention in the question, the confusion is due to the fact that append both changes the underlying array and returns a new slice (since the length might be changed). You'd imagine that it copies the backing array, but it doesn't; it just allocates a new slice header that points at it. Since i never changes, all those appends keep changing the value of backingArray[11] to a different number.
Contrast this to appending to an array, which allocates a new literal array every time.
So yes, you need to copy the slice before you can work on it.
func makeFromSlice(sl []int) []int {
result := make([]int, len(sl))
copy(result, sl)
return result
}
func main() {
i := make([]int, 0)
for ii:=0; ii<11; ii++ {
i = append(i, ii)
}
j := append(makeFromSlice(i), 100) // works fine
}
The slice literal behavior is explained because a new array is allocated if the append would exceed the cap of the backing array. This has nothing to do with slice literals and everything to do with the internals of how exceeding the cap works.
a := []int{1,2,3,4,5,6,7}
fmt.Printf("len(a) %d, cap(a) %d\n", len(a), cap(a))
// len(a) 7, cap(a) 7
b := make([]int, 0)
for i := 1; i < 8; i++ {
b = append(b, i)
} // b := []int{1,2,3,4,5,6,7}
// len(b) 7, cap(b) 8
b = append(b, 1) // any number, just so it hits cap
i := append(b, 100)
j := append(b, 101)
k := append(b, 102) // these work as expected now
If you need a copy of a slice, there's no other way to do it other than, copying the slice. You should almost never assign the result of append to a variable other than the first argument of append. It leads to hard to find bugs, and will behave differently depending on whether the slice has the required capacity or not.
This isn't a commonly needed pattern, but as with all things of this nature, if you need to repeat a few lines of code multiple times, you can use a small helper function:
func copyAndAppend(i []int, vals ...int) []int {
j := make([]int, len(i), len(i)+len(vals))
copy(j, i)
return append(j, vals...)
}
https://play.golang.org/p/J99_xEbaWo
There is also a little bit simpler way to implement copyAndAppend function:
func copyAndAppend(source []string, items ...string) []string {
l := len(source)
return append(source[:l:l], items...)
}
Here we just make sure that source has no available capacity and so copying is forced.
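A quick demonstration of why the full slice expression works (variable names are illustrative): with len equal to cap, each append must allocate a fresh backing array, so the two results no longer share storage.

```go
package main

import "fmt"

func main() {
	base := []int{0, 1, 2}
	capped := base[:3:3] // full slice expression: len 3, cap 3

	// Both appends exceed cap, so each allocates its own backing array.
	x := append(capped, 100)
	y := append(capped, 101) // does not clobber x's last element

	fmt.Println(x, y) // [0 1 2 100] [0 1 2 101]
}
```

Without the :3 cap limit, whether the appends interfere would depend on how much spare capacity base happened to have.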
