I am trying to scrape the related words for a given word, using BFS: starting from the given word, I look up each related word on dictionary.com.
I have tried this code without concurrency and it works just fine, but it takes a lot of time, so I tried using goroutines. Now my code gets stuck after the first iteration: the first level of the BFS works just fine, but the second level hangs!
package main

import (
    "fmt"
    "github.com/gocolly/colly"
    "sync"
)

var wg sync.WaitGroup

func buildURL(word string) string {
    return "https://www.dictionary.com/browse/" + string(word)
}

func get(url string) []string {
    c := colly.NewCollector()
    c.IgnoreRobotsTxt = true
    var ret []string
    c.OnHTML("a.css-cilpq1.e15p0a5t2", func(e *colly.HTMLElement) {
        ret = append(ret, string(e.Text))
    })
    c.Visit(url)
    c.Wait()
    return ret
}

func threading(c chan []string, word string) {
    defer wg.Done()
    var words []string
    for _, w := range get(buildURL(word)) {
        words = append(words, w)
    }
    c <- words
}

func main() {
    fmt.Println("START")
    word := "jump"
    maxDepth := 2
    //bfs
    var q map[string]int
    nq := map[string]int{
        word: 0,
    }
    vis := make(map[string]bool)
    queue := make(chan []string, 5000)
    for i := 1; i <= maxDepth; i++ {
        fmt.Println(i)
        q, nq = nq, make(map[string]int)
        for word := range q {
            if _, ok := vis[word]; !ok {
                wg.Add(1)
                vis[word] = true
                go threading(queue, word)
                for v := range queue {
                    fmt.Println(v)
                    for _, w := range v {
                        nq[w] = i
                    }
                }
            }
        }
    }
    wg.Wait()
    close(queue)
    fmt.Println("END")
}
OUTPUT:
START
1
[plunge dive rise upsurge bounce hurdle fall vault drop advance upturn inflation increment spurt boost plummet skip bound surge take]
It hangs right here forever; the counter value 2 is never printed!
Can check here https://www.dictionary.com/browse/jump for the related words.
According to the Tour of Go:
Sends to a buffered channel block only when the buffer is full.
Receives block when the buffer is empty.
So, in this case, you are creating a buffered channel with a capacity of 5000.
for i := 1; i <= maxDepth; i++ {
    fmt.Println(i)
    q, nq = nq, make(map[string]int)
    for word := range q { // for each word
        if _, ok := vis[word]; !ok { // if not visited, visit it
            wg.Add(1)                 // add a worker
            vis[word] = true
            go threading(queue, word) // fetch concurrently
            for v := range queue {    // <<< blocks here once the queue is empty
                fmt.Println(v)
                for _, w := range v {
                    nq[w] = i
                }
            }
        }
    }
}
As I've noted in the code comment, after the first receive the range loop blocks until the channel has another value, and once the channel stays empty it blocks forever. In this case, after fetching jump the goroutine sends the slice of similar words, but then, as zerkems explains, the range loop keeps blocking, so you never reach the next iteration (i = 2). You could close the channel to end the blocking range loop, but since multiple goroutines write to the same channel, closing it while they are still sending would panic.
To overcome this, we can use a simple workaround:
We know exactly how many unvisited items we are fetching.
We now know where the block is.
First, we count the unvisited words, and then we can receive exactly that many times:
vis := make(map[string]bool)
queue := make(chan []string, 5000)
for i := 1; i <= maxDepth; i++ {
    fmt.Println(i)
    q, nq = nq, make(map[string]int)
    unvisited := 0
    for word := range q {
        if _, ok := vis[word]; !ok {
            vis[word] = true
            unvisited++
            wg.Add(1)
            go threading(queue, word)
        }
    }
    wg.Wait()                        // wait until all jobs are done
    for j := 0; j < unvisited; j++ { // << does not block: we know how many results to expect
        v := <-queue                 // receive exactly `unvisited` slices
        fmt.Println(v)
        for _, w := range v {
            nq[w] = i
        }
    }
}
In this version, we simply count the minimum number of receives needed to collect all the results. You can also see that I've moved the collecting loop out of the dispatching loop and use the original loop only to hand words to workers: it starts the fetch for every word, and the following loop then collects the results without blocking.
The latter loop waits until all workers are done. After that, the next iteration works and the next level of the BFS is reached.
Summary
Distribute workload
Wait for results
Don't do both at the same time
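Put together, the pattern reduces to a small, self-contained sketch. Note that the fetch function here is a made-up stand-in for the colly scrape (it just invents "related words"), and that the per-level channel is buffered with one slot per worker so that no send can ever block:

```go
package main

import (
    "fmt"
    "sync"
)

// fetch simulates the scraper: it returns placeholder "related words"
// for the given word. (A stand-in for the colly call in the question.)
func fetch(word string) []string {
    return []string{word + "-a", word + "-b"}
}

// level runs one BFS level: it distributes all words to goroutines
// first, waits for them all, and only then collects the results.
func level(words []string) []string {
    var wg sync.WaitGroup
    results := make(chan []string, len(words)) // one buffer slot per worker

    for _, w := range words {
        wg.Add(1)
        go func(w string) {
            defer wg.Done()
            results <- fetch(w) // never blocks: buffer is large enough
        }(w)
    }
    wg.Wait()      // every send has completed here
    close(results) // safe: closed once, by the only non-sender

    var next []string
    for r := range results { // does not block: channel is closed
        next = append(next, r...)
    }
    return next
}

func main() {
    fmt.Println(level([]string{"jump"}))
}
```

Because the buffer holds one slot per goroutine, the distribute and collect phases never overlap, which is exactly the "don't do both at the same time" rule above.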
Hope this helps.
Related
The code below starts a few workers. Each worker receives a value via a channel which is added to a map where the key is the worker ID and value is the number received. Finally, when I add all the values received, I should get an expected result (in this case 55 because that is what you get when you add from 1..10). In most cases, I am not seeing the expected output. What am I doing wrong here? I do not want to solve it by adding a sleep. I would like to identify the issue programmatically and fix it.
type counter struct {
    value int
    count int
}

var data map[string]counter
var lock sync.Mutex

func adder(wid string, n int) {
    defer lock.Unlock()
    lock.Lock()
    d := data[wid]
    d.count++
    d.value += n
    data[wid] = d
    return
}

func main() {
    fmt.Println(os.Getpid())
    data = make(map[string]counter)
    c := make(chan int)
    for w := 1; w <= 3; w++ { // starting 3 workers here
        go func(wid string) {
            data[wid] = counter{}
            for {
                v, k := <-c
                if !k {
                    continue
                }
                adder(wid, v)
            }
        }(strconv.Itoa(w)) // worker is given an ID
    }
    time.Sleep(1 * time.Second) // If this is not added, only one goroutine is recorded.
    for i := 1; i <= 10; i++ {
        c <- i
    }
    close(c)
    total := 0
    for i, v := range data {
        fmt.Println(i, v)
        total += v.value
    }
    fmt.Println(total)
}
Your code has two significant races:
The initialization of data[wid] = counter{} is not synchronized with other goroutines that may be reading and rewriting data.
The worker goroutines do not signal when they are done modifying data, which means your main goroutine may read data before they finish writing.
You also have a strange construct:
for {
    v, k := <-c
    if !k {
        continue
    }
    adder(wid, v)
}
k will only be false when the channel c is closed, after which the goroutine spins as much as it can. This would be better written as for v := range c.
To fix the reading code in the main goroutine, we'll use the more normal for ... range c idiom and add a sync.WaitGroup, having each worker invoke Done() on the wait-group; the main goroutine then waits for them all to finish. To fix the initialization, we'll lock the map (there are other ways to do this, e.g., setting up the map before starting any of the goroutines, or relying on the fact that empty map slots read as zero, but this one is straightforward). I also took out the extra debug output. The result is this code, also available on the Go Playground.
package main

import (
    "fmt"
    // "os"
    "strconv"
    "sync"
    // "time"
)

type counter struct {
    value int
    count int
}

var data map[string]counter
var lock sync.Mutex
var wg sync.WaitGroup

func adder(wid string, n int) {
    defer lock.Unlock()
    lock.Lock()
    d := data[wid]
    d.count++
    d.value += n
    data[wid] = d
}

func main() {
    // fmt.Println(os.Getpid())
    data = make(map[string]counter)
    c := make(chan int)
    for w := 1; w <= 3; w++ { // starting 3 workers here
        wg.Add(1)
        go func(wid string) {
            lock.Lock()
            data[wid] = counter{}
            lock.Unlock()
            for v := range c {
                adder(wid, v)
            }
            wg.Done()
        }(strconv.Itoa(w)) // worker is given an ID
    }
    for i := 1; i <= 10; i++ {
        c <- i
    }
    close(c)
    wg.Wait()
    total := 0
    for i, v := range data {
        fmt.Println(i, v)
        total += v.value
    }
    fmt.Println(total)
}
(This can be improved easily, e.g., there's no reason for wg to be global.)
Well, I like torek's answer, but I wanted to post this one as it contains a bunch of improvements:
Reduced lock usage (for such simple tasks, avoid locks where possible; if you benchmark it, you'll notice a good difference, because my code takes the lock only numworkers times).
Improved variable naming.
No global variables (the use of globals should always be kept to a minimum).
The following code adds the numbers from minWork to maxWork using numworkers spawned goroutines.
package main

import (
    "fmt"
    "sync"
)

const (
    bufferSize = 1        // Buffer size for numChan
    numworkers = 3        // Number of workers doing addition
    minWork    = 1        // Sum from [minWork] (inclusive)
    maxWork    = 10000000 // Sum up to [maxWork] (inclusive)
)

// worker stats
type worker struct {
    workCount int // Number of times the worker worked
    workDone  int // Amount of work done; numbers added
}

// workerMap holds a map for worker(s)
type workerMap struct {
    mu sync.Mutex     // Guards m for safe, concurrent r/w
    m  map[int]worker // Map holding worker id to worker mapping
}

func main() {
    var (
        totalWorkDone int                          // Total work done
        wm            workerMap                    // WorkerMap
        wg            sync.WaitGroup               // WaitGroup
        numChan       = make(chan int, bufferSize) // Channel for nums
    )
    wm.m = make(map[int]worker, numworkers)
    for wid := 0; wid < numworkers; wid++ {
        wg.Add(1)
        go func(id int) {
            var wk worker
            // Wait for numbers
            for n := range numChan {
                wk.workCount++
                wk.workDone += n
            }
            // Fill worker stats
            wm.mu.Lock()
            wm.m[id] = wk
            wm.mu.Unlock()
            wg.Done()
        }(wid)
    }
    // Send numbers for addition by multiple workers
    for i := minWork; i <= maxWork; i++ {
        numChan <- i
    }
    // Close the channel
    close(numChan)
    // Wait for goroutines to finish
    wg.Wait()
    // Print stats
    for k, v := range wm.m {
        fmt.Printf("WorkerID: %d; Work: %+v\n", k, v)
        totalWorkDone += v.workDone
    }
    // Print total work done by all workers
    fmt.Printf("Work Done: %d\n", totalWorkDone)
}
I am practicing Go by doing coding problems on LeetCode. I am trying to solve a simple sudoku puzzle (it just validates the board): no rows with repeated digits, no columns with repeated digits, no 3x3 blocks with repeated digits. I am trying to use concurrency to learn goroutines, channels, etc.
I can't get the WaitGroup to release.
import (
    "fmt"
    "sync"
)

func isValidSlice(slice []byte, results chan<- bool, wg *sync.WaitGroup) {
    fmt.Println(slice)
    seen := make(map[byte]bool)
    for _, val := range slice {
        if seen[val] {
            if val != '.' {
                results <- false
                defer wg.Done()
                return
            }
        } else {
            seen[val] = true
        }
    }
    results <- true
    defer wg.Done()
}

func isValidSudoku(board [][]byte) bool {
    // Channel to receive solution
    c := make(chan bool)
    // Number of routines that will run (9 for rows, 9 for cols, 9 for 3x3 blocks)
    var wg sync.WaitGroup
    // Check every row
    for x := 0; x < 9; x++ {
        wg.Add(1)
        go isValidSlice(append([]byte{}, board[x]...), c, &wg)
    }
    for y := 0; y < 9; y++ {
        wg.Add(1)
        go isValidSlice(append([]byte{}, board[0:9][y]...), c, &wg)
    }
    // Check every 3x3 block
    for x := 0; x <= 6; x += 3 {
        for y := 0; y <= 6; y += 3 {
            block_digits := append([]byte{}, board[x][y:y+3]...)
            block_digits = append(block_digits, board[x+1][y:y+3]...)
            block_digits = append(block_digits, board[x+2][y:y+3]...)
            wg.Add(1)
            go isValidSlice(block_digits, c, &wg)
        }
    }
    fmt.Println("got here")
    wg.Wait()
    fmt.Println("never got here")
    for result := range c {
        if !result {
            return false
        }
    }
    return true
}
I'm expecting the wg.Wait() lock to release and the code to move forward. Then I expect one of the results in the channel to be false, and to return false if so. Otherwise, after all elements in the channel are traversed and no false was found, I would expect true.
Your goroutines cannot call wg.Done() because they all wait to add their value in the channel. But since you only consume the values from the channel after wg.Wait(), all goroutines but one never get to call wg.Done().
You actually don't need a WaitGroup, just remove it.
Other comments:
You should move defer wg.Done() to the first line of isValidSlice. Calling defer on the last line of your function does not make much sense.
You only need a WaitGroup if you want to close the channel properly, you can do that in an additional goroutine, see below for an example on how to do it.
func isValidSudoku(board [][]byte) bool {
    // ...
    fmt.Println("got here")
    go func() {
        wg.Wait()
        close(c)
    }()
    fmt.Println("never got here")
    for result := range c {
        if !result {
            go func() {
                for range c { // drain the channel so the remaining goroutines can finish
                }
            }()
            return false
        }
    }
    return true
}
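Another option, since the number of goroutines is fixed and known up front, is to give the channel one buffer slot per goroutine; then no send can block, wg.Wait() returns, and no draining goroutine is needed. A minimal sketch of the idea (the helper names and the tiny test input are mine, not from the question):

```go
package main

import (
    "fmt"
    "sync"
)

// check reports via the channel whether the slice contains a
// duplicate non-'.' byte. The buffered channel guarantees the
// send completes even before anyone receives.
func check(slice []byte, results chan<- bool, wg *sync.WaitGroup) {
    defer wg.Done()
    seen := make(map[byte]bool)
    for _, val := range slice {
        if val != '.' && seen[val] {
            results <- false
            return
        }
        seen[val] = true
    }
    results <- true
}

// allValid runs one goroutine per slice and reports whether
// every slice passed the duplicate check.
func allValid(slices [][]byte) bool {
    var wg sync.WaitGroup
    // One buffer slot per goroutine: every send completes immediately.
    results := make(chan bool, len(slices))
    for _, s := range slices {
        wg.Add(1)
        go check(s, results, &wg)
    }
    wg.Wait()      // cannot deadlock: no send ever blocks
    close(results) // lets the range below terminate
    for ok := range results {
        if !ok {
            return false
        }
    }
    return true
}

func main() {
    fmt.Println(allValid([][]byte{[]byte("123"), []byte("4.4")}))
}
```

The trade-off versus the unbuffered version above is that all 27 checks always run to completion; there is no early exit on the first false.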
I am trying to create a word counter that returns an array of the number of times each word in a text file appears. Moreover, I have been assigned to parallelize this program.
My initial attempt at this task was as follows
Implementation 1
func WordCount(words []string, startWord int, endWord int, waitGroup *sync.WaitGroup, freqsChannel chan<- map[string]int) {
    freqs := make(map[string]int)
    for i := startWord; i < endWord; i++ {
        word := words[i]
        freqs[word]++
    }
    freqsChannel <- freqs
    waitGroup.Done()
}

func ParallelWordCount(text string) map[string]int {
    // Split text into a string slice of the words in text.
    text = strings.ToLower(text)
    text = strings.ReplaceAll(text, ",", "")
    text = strings.ReplaceAll(text, ".", "")
    words := strings.Fields(text)
    length := len(words)
    threads := 28
    freqsChannel := make(chan map[string]int, threads)
    var waitGroup sync.WaitGroup
    waitGroup.Add(threads)
    defer waitGroup.Wait()
    wordsPerThread := length / threads // always rounds down
    wordsInLastThread := length - (threads-1)*wordsPerThread
    startWord := -wordsPerThread
    endWord := 0
    for i := 1; i <= threads; i++ {
        if i < threads {
            startWord += wordsPerThread
            endWord += wordsPerThread
        } else {
            startWord += wordsInLastThread
            endWord += wordsInLastThread
        }
        go WordCount(words, startWord, endWord, &waitGroup, freqsChannel)
    }
    freqs := <-freqsChannel
    for i := 1; i < threads; i++ {
        subFreqs := <-freqsChannel
        for word, count := range subFreqs {
            freqs[word] += count
        }
    }
    return freqs
}
According to my teaching assistant, this was not a good solution as the pre-processing of the text file carried out by
text = strings.ToLower(text)
text = strings.ReplaceAll(text, ",", "")
text = strings.ReplaceAll(text, ".", "")
words := strings.Fields(text)
in ParallelWordCount goes against the idea of parallel processing.
Now, to fix this, I have moved the responsibility of processing the text file into an array of words into the WordCount function, which is called in separate goroutines for different parts of the text file. Below is the code for my second implementation.
Implementation 2
func WordCount(text string, waitGroup *sync.WaitGroup, freqsChannel chan<- map[string]int) {
    freqs := make(map[string]int)
    text = strings.ToLower(text)
    text = strings.ReplaceAll(text, ",", "")
    text = strings.ReplaceAll(text, ".", "")
    words := strings.Fields(text)
    for _, value := range words {
        freqs[value]++
    }
    freqsChannel <- freqs
    waitGroup.Done()
}

func splitCount(str string, subStrings int, waitGroup *sync.WaitGroup, freqsChannel chan<- map[string]int) {
    if subStrings != 1 {
        length := len(str)
        charsPerSubstring := length / subStrings
        i := 0
        for str[charsPerSubstring+i] != ' ' {
            i++
        }
        subString := str[0 : charsPerSubstring+i+1]
        go WordCount(subString, waitGroup, freqsChannel)
        splitCount(str[charsPerSubstring+i+1:length], subStrings-1, waitGroup, freqsChannel)
    } else {
        go WordCount(str, waitGroup, freqsChannel)
    }
}

func ParallelWordCount(text string) map[string]int {
    threads := 28
    freqsChannel := make(chan map[string]int, threads)
    var waitGroup sync.WaitGroup
    waitGroup.Add(threads)
    defer waitGroup.Wait()
    splitCount(text, threads, &waitGroup, freqsChannel)
    // Collect and return frequencies
    freqs := <-freqsChannel
    for i := 1; i < threads; i++ {
        subFreqs := <-freqsChannel
        for word, count := range subFreqs {
            freqs[word] += count
        }
    }
    return freqs
}
The average runtime of this implementation is 3 ms, compared to the old average of 5 ms. But have I thoroughly addressed the issue raised by my teaching assistant, or does the second implementation also fail to take full advantage of parallel processing to efficiently count the words of a text file?
Two things that I see:
The second example is better, as you have split the text parsing and word counting across several goroutines. One thing you can try is not counting the words in the WordCount method, but just pushing them onto the channel and incrementing the counts in the main counter; you can check whether that is any faster, I'm not sure. Also, check the fan-in pattern for more details.
Parallel processing might still not be fully utilized, because I don't believe you have 28 CPU cores available :). The number of cores determines how many WordCount goroutines run in parallel; the rest of them are scheduled concurrently based on the available resources (available CPU cores). Here is a great article explaining this.
Implementation 2 Issues
In the splitCount() method, what if the total length of the string is less than 28? It will still call WordCount() as many times as there are words.
Also, it will fail in that scenario, as we are calling waitGroup.Done() 28 times.
Recursively calling splitCount() makes it slow; we should split and call in a loop.
The number of threads should not always be 28, as we don't know how many words the string contains.
Will try to develop a more optimised approach and will update the answer.
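As a rough illustration of the loop-based split mentioned above (the function name and chunking logic here are my own, not from the question): each cut point is advanced to the next space so that no word is split in half, and the remainder becomes the last chunk.

```go
package main

import (
    "fmt"
    "strings"
)

// splitOnWords cuts text into at most n chunks, moving each cut
// forward to the next space so no word is split in half.
func splitOnWords(text string, n int) []string {
    var chunks []string
    size := len(text) / n // rough target size per chunk
    for len(text) > 0 && len(chunks) < n-1 {
        end := size
        if end >= len(text) {
            break // remainder is smaller than one chunk
        }
        for end < len(text) && text[end] != ' ' {
            end++ // extend the chunk to the next word boundary
        }
        chunks = append(chunks, text[:end])
        text = strings.TrimLeft(text[end:], " ")
    }
    if len(text) > 0 {
        chunks = append(chunks, text) // last chunk takes the rest
    }
    return chunks
}

func main() {
    fmt.Println(splitOnWords("the quick brown fox jumps over the lazy dog", 3))
}
```

Because the chunk count adapts to the input (short texts simply yield fewer chunks), the caller can wg.Add() once per returned chunk instead of hard-coding 28.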
I am trying to familiarize myself with goroutines. I have written the following simple program to store the squares of the numbers 1-10 in a map.
func main() {
    squares := make(map[int]int)
    var wg sync.WaitGroup
    for i := 1; i <= 10; i++ {
        go func(n int, s map[int]int) {
            s[n] = n * n
        }(i, squares)
    }
    wg.Wait()
    fmt.Println("Squares::: ", squares)
}
At the end, it prints an empty map. But in Go, maps are reference types. Why is it printing an empty map?
As pointed out in the comments, you need to synchronize access to the map and your usage of sync.WaitGroup is incorrect.
Try this instead:
func main() {
    squares := make(map[int]int)
    var lock sync.Mutex
    var wg sync.WaitGroup
    for i := 1; i <= 10; i++ {
        wg.Add(1) // Increment the wait group count
        go func(n int, s map[int]int) {
            lock.Lock() // Lock the map
            s[n] = n * n
            lock.Unlock()
            wg.Done() // Decrement the wait group count
        }(i, squares)
    }
    wg.Wait()
    fmt.Println("Squares::: ", squares)
}
sync.Map is what you are actually looking for; I modified the code to suit your use case here:
https://play.golang.org/p/DPLHiMsH5R8
P.S. I had to add some sleep so that the program does not finish before all the goroutines have run.
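For reference, here is a minimal sketch of the same program on sync.Map, but with the WaitGroup used correctly so no sleep is needed (the helper name computeSquares is mine, not from the linked playground):

```go
package main

import (
    "fmt"
    "sync"
)

// computeSquares stores i -> i*i for i in 1..n from n goroutines.
// sync.Map handles the concurrent writes; the WaitGroup (Add before
// each goroutine, Done inside it) replaces the sleep.
func computeSquares(n int) map[int]int {
    var m sync.Map
    var wg sync.WaitGroup
    for i := 1; i <= n; i++ {
        wg.Add(1)
        go func(k int) {
            defer wg.Done()
            m.Store(k, k*k)
        }(i)
    }
    wg.Wait() // all Stores have completed here

    // Copy into a plain map for convenient printing.
    out := make(map[int]int)
    m.Range(func(k, v interface{}) bool {
        out[k.(int)] = v.(int)
        return true
    })
    return out
}

func main() {
    fmt.Println("Squares::: ", computeSquares(10))
}
```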
Say, we have three methods to implement "fan in" behavior
func MakeChannel(tries int) chan int {
    ch := make(chan int)
    go func() {
        for i := 0; i < tries; i++ {
            ch <- i
        }
        close(ch)
    }()
    return ch
}

func MergeByReflection(channels ...chan int) chan int {
    length := len(channels)
    out := make(chan int)
    cases := make([]reflect.SelectCase, length)
    for i, ch := range channels {
        cases[i] = reflect.SelectCase{Dir: reflect.SelectRecv, Chan: reflect.ValueOf(ch)}
    }
    go func() {
        for length > 0 {
            i, line, opened := reflect.Select(cases)
            if !opened {
                cases[i].Chan = reflect.ValueOf(nil)
                length -= 1
            } else {
                out <- int(line.Int())
            }
        }
        close(out)
    }()
    return out
}

func MergeByCode(channels ...chan int) chan int {
    length := len(channels)
    out := make(chan int)
    go func() {
        var i int
        var ok bool
        for length > 0 {
            select {
            case i, ok = <-channels[0]:
                out <- i
                if !ok {
                    channels[0] = nil
                    length -= 1
                }
            case i, ok = <-channels[1]:
                out <- i
                if !ok {
                    channels[1] = nil
                    length -= 1
                }
            case i, ok = <-channels[2]:
                out <- i
                if !ok {
                    channels[2] = nil
                    length -= 1
                }
            case i, ok = <-channels[3]:
                out <- i
                if !ok {
                    channels[3] = nil
                    length -= 1
                }
            case i, ok = <-channels[4]:
                out <- i
                if !ok {
                    channels[4] = nil
                    length -= 1
                }
            }
        }
        close(out)
    }()
    return out
}

func MergeByGoRoutines(channels ...chan int) chan int {
    var group sync.WaitGroup
    out := make(chan int)
    for _, ch := range channels {
        go func(ch chan int) {
            for i := range ch {
                out <- i
            }
            group.Done()
        }(ch)
    }
    group.Add(len(channels))
    go func() {
        group.Wait()
        close(out)
    }()
    return out
}

type MergeFn func(...chan int) chan int

func main() {
    length := 5
    tries := 1000000
    channels := make([]chan int, length)
    fns := []MergeFn{MergeByReflection, MergeByCode, MergeByGoRoutines}
    for _, fn := range fns {
        sum := 0
        t := time.Now()
        for i := 0; i < length; i++ {
            channels[i] = MakeChannel(tries)
        }
        for i := range fn(channels...) {
            sum += i
        }
        fmt.Println(time.Since(t))
        fmt.Println(sum)
    }
}
Results are (at 1 CPU, I have used runtime.GOMAXPROCS(1)):
19.869s (MergeByReflection)
2499997500000
8.483s (MergeByCode)
2499997500000
4.977s (MergeByGoRoutines)
2499997500000
Results are (at 2 CPU, I have used runtime.GOMAXPROCS(2)):
44.94s (MergeByReflection)
2499997500000
10.853s (MergeByCode)
2499997500000
3.728s (MergeByGoRoutines)
2499997500000
I understand why MergeByReflection is the slowest, but what about the difference between MergeByCode and MergeByGoRoutines?
And when we increase the number of CPUs, why does the select clause (used indirectly in MergeByReflection and directly in MergeByCode) become slower?
Here is a preliminary remark: the channels in your examples are all unbuffered, which means they will likely block at send or receive time.
In this example there is almost no processing apart from channel management, so the performance is dominated by the synchronization primitives. Actually, very little of this code can be parallelized.
In the MergeByReflection and MergeByCode functions, select is used to listen to multiple input channels, but nothing takes the output channel into account (which may therefore block, while an event could be available on one of the input channels).
In the MergeByGoRoutines function, this situation cannot happen: when the output channel blocks, it does not prevent another input channel from being read by another goroutine. There are therefore better opportunities for the runtime to parallelize the goroutines, and less contention on the input channels.
The MergeByReflection code is the slowest because it has the overhead of reflection, and almost nothing can be parallelized.
The MergeByGoRoutines function is the fastest because it reduces contention (less synchronization is needed), and because output contention has a lesser impact on input performance. It can therefore benefit from a small improvement when running on multiple cores (contrary to the other two methods).
There is so much synchronization activity in MergeByReflection and MergeByCode that running on multiple cores negatively impacts their performance. You could get different performance by using buffered channels, though.
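A quick sketch of that last suggestion, with buffer sizes picked arbitrarily: buffering the output (and the sources) lets senders complete without waiting for the consumer, which is exactly the contention the answer describes. Note this version also calls group.Add before starting the goroutines, avoiding a potential Add/Done race present in the original:

```go
package main

import (
    "fmt"
    "sync"
)

// mergeBuffered is MergeByGoRoutines with a buffered output channel:
// senders only block once the buffer fills, reducing contention.
func mergeBuffered(buf int, channels ...chan int) chan int {
    var group sync.WaitGroup
    out := make(chan int, buf)
    group.Add(len(channels)) // Add before launching, so Wait can't race Done
    for _, ch := range channels {
        go func(ch chan int) {
            defer group.Done()
            for v := range ch {
                out <- v
            }
        }(ch)
    }
    go func() {
        group.Wait()
        close(out)
    }()
    return out
}

// source produces 0..n-1 on a buffered channel, then closes it.
func source(n int) chan int {
    ch := make(chan int, 16)
    go func() {
        for i := 0; i < n; i++ {
            ch <- i
        }
        close(ch)
    }()
    return ch
}

func main() {
    sum := 0
    for v := range mergeBuffered(64, source(100), source(100)) {
        sum += v
    }
    fmt.Println(sum) // 2 * (0+1+...+99) = 9900
}
```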