Missing data when using a Go channel - go

I encountered a strange problem. The script is below.
package main

import (
	"fmt"
	"sync"
)

type Data struct {
	data []int
}

func main() {
	ws := 5
	ch := make(chan *Data, ws)
	var wg sync.WaitGroup
	for i := 0; i < ws; i++ {
		wg.Add(1)
		go func(wg *sync.WaitGroup, ch chan *Data) {
			defer wg.Done()
			for {
				char, ok := <-ch
				if !ok {
					return
				}
				fmt.Printf("Get: %d\n", len(char.data))
			}
		}(&wg, ch)
	}
	var d Data
	ar := []int{1}
	for i := 0; i < ws; i++ {
		d.data = []int{}
		for j := 0; j < 1000; j++ {
			d.data = append(d.data, ar[0])
		}
		ch <- &d
		// time.Sleep(time.Second / 1000) // When this line is enabled, the Put and Get counts become the same.
		fmt.Printf("Put: %d\n", len(d.data))
	}
	close(ch)
	wg.Wait()
}
When this is run, the following result is expected: the counts for "Put" and "Get" are the same.
Put: 1000
Get: 1000
Put: 1000
Get: 1000
Put: 1000
Get: 1000
Put: 1000
Get: 1000
Put: 1000
Get: 1000
But I do not get this result every time. Sample results are below; the "Put" and "Get" counts differ from run to run.
Try 1
Put: 1000
Get: 1000
Put: 1000
Get: 1000
Put: 1000
Get: 1000
Put: 1000
Get: 1000
Put: 1000
Get: 1000
Try 2
Put: 1000
Get: 1000
Put: 1000
Get: 1000
Put: 1000
Get: 1000
Put: 1000
Get: 16
Put: 1000
Get: 0
Try 3
Get: 1000
Put: 1000
Put: 1000
Get: 1
Put: 1000
Get: 1000
Put: 1000
Get: 1
Put: 1000
Get: 1000
Although on my PC the "Put" and "Get" counts differ on every run, at play.golang.org both counts are always the same: https://play.golang.org/p/QFSuZmZk7d Why?
If time.Sleep(time.Second / 1000) is used in the script, both counts become the same. If you know anything about this problem, please explain it to me. Thank you so much for your time.

What you observe is an example of a data race.
It happens when the same piece of data is accessed concurrently, with at least one of those accesses being a write.
You put a reference to the same structure into the channel every time. What happens next is one of a few possibilities:
it was read on the other side of the channel before you mutated it (the "expected" scenario);
you started mutating it before it was read. In this case the receiver may observe any number of Data.data items, from 0 to 1000, depending on when exactly the read happened.
There are multiple solutions to the problem:
You may create a new instance of Data every iteration. For that, simply move the var d Data declaration inside the loop body. Then a new structure is created each iteration, so you cannot mutate the previous one by mistake.
You may declare a channel of Data values rather than pointers: chan Data. In this case the Data instance is implicitly copied every time you send it to the channel (everything in Go is passed by value, i.e. copied on assignment).
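For illustration, here is a minimal sketch of the first fix. It keeps the shape of the original program but is simplified slightly (range over the channel, closure capture instead of explicit parameters); those simplifications are mine, not part of the original post.

package main

import (
	"fmt"
	"sync"
)

type Data struct {
	data []int
}

func main() {
	ws := 5
	ch := make(chan *Data, ws)
	var wg sync.WaitGroup
	for i := 0; i < ws; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for d := range ch {
				fmt.Printf("Get: %d\n", len(d.data))
			}
		}()
	}
	for i := 0; i < ws; i++ {
		var d Data // a fresh Data per iteration: the sender never mutates a value the receiver may still be reading
		for j := 0; j < 1000; j++ {
			d.data = append(d.data, 1)
		}
		ch <- &d
		fmt.Printf("Put: %d\n", len(d.data))
	}
	close(ch)
	wg.Wait()
}

The second option works for a similar reason: with chan Data each send copies the struct, and because the sender builds a brand-new slice every iteration it never mutates the backing array it has already handed over.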

package main

import (
	"fmt"
	"sync"
)

/*
A semaphore-based take: after each put, the sender must wait until the get side
has received the value before it moves on to the next iteration.
*/

type Data struct {
	data []int
}

func main() {
	ws := 5
	ch := make(chan *Data, ws)
	sem := make(chan bool)
	var wg sync.WaitGroup
	for i := 0; i < ws; i++ {
		wg.Add(1)
		go func(wg *sync.WaitGroup, ch chan *Data) {
			defer wg.Done()
			for {
				char, ok := <-ch
				if !ok {
					return
				}
				fmt.Printf("Get: %d\n", len(char.data))
				sem <- true
			}
		}(&wg, ch)
	}
	var d Data
	ar := []int{1}
	// ws = 5
	for i := 0; i < ws; i++ {
		d.data = []int{}
		for j := 0; j < 1000; j++ {
			d.data = append(d.data, ar[0])
		}
		ch <- &d
		fmt.Printf("Put: %d\n", len(d.data))
		<-sem // a semaphore: wait for the get to finish before the next put
	}
	close(ch)
	wg.Wait()
}

Related

How to properly delay between executing a pool of workers

Good day,
I'm trying to implement a proper delay between the execution of workers. For example, the workers should complete 30 tasks and then sleep for 5 seconds. How can I track in the code that exactly 30 tasks have been completed, and only then sleep for 5 seconds?
Below is the code that creates a pool of 30 workers, which in turn process the tasks 30 at a time in an unordered manner:
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

type Job struct {
	id       int
	randomno int
}

type Result struct {
	job         Job
	sumofdigits int
}

var jobs = make(chan Job, 10)
var results = make(chan Result, 10)

func digits(number int) int {
	sum := 0
	no := number
	for no != 0 {
		digit := no % 10
		sum += digit
		no /= 10
	}
	time.Sleep(2 * time.Second)
	return sum
}

func worker(wg *sync.WaitGroup) {
	for job := range jobs {
		output := Result{job, digits(job.randomno)}
		results <- output
	}
	wg.Done()
}

func createWorkerPool(noOfWorkers int) {
	var wg sync.WaitGroup
	for i := 0; i < noOfWorkers; i++ {
		wg.Add(1)
		go worker(&wg)
	}
	wg.Wait()
	close(results)
}

func allocate(noOfJobs int) {
	for i := 0; i < noOfJobs; i++ {
		if i != 0 && i%30 == 0 {
			fmt.Printf("SLEEPAGE 5 sec...")
			time.Sleep(10 * time.Second)
		}
		randomno := rand.Intn(999)
		job := Job{i, randomno}
		jobs <- job
	}
	close(jobs)
}

func result(done chan bool) {
	for result := range results {
		fmt.Printf("Job id %d, input random no %d , sum of digits %d\n", result.job.id, result.job.randomno, result.sumofdigits)
	}
	done <- true
}

func main() {
	startTime := time.Now()
	noOfJobs := 100
	go allocate(noOfJobs)
	done := make(chan bool)
	go result(done)
	noOfWorkers := 30
	createWorkerPool(noOfWorkers)
	<-done
	endTime := time.Now()
	diff := endTime.Sub(startTime)
	fmt.Println("total time taken ", diff.Seconds(), "seconds")
}
play: https://go.dev/play/p/lehl7hoo-kp
It is not clear to me exactly how to make sure that 30 tasks have been completed, and where to insert the delay. I will be grateful for any help.
Okay, so let's start with this working example:
func Test_t(t *testing.T) {
	// just a publisher: it publishes a result on a chan
	publish := func(s int, ch chan int, wg *sync.WaitGroup) {
		ch <- s // this is blocking!!!
		wg.Done()
	}
	wg := &sync.WaitGroup{}
	wg.Add(100)
	// we'll use the done channel to be notified when the work is done
	res := make(chan int)
	done := make(chan struct{})
	// create a worker that will notify us once all results were published
	go func() {
		wg.Wait()
		done <- struct{}{}
	}()
	// let's create jobs that publish on our res chan
	// please note all goroutines are created immediately
	for i := 0; i < 100; i++ {
		go publish(i, res, wg)
	}
	// let's get 30 results and then wait
	var resCounter int
forloop:
	for {
		select {
		case ss := <-res:
			println(ss)
			resCounter += 1
			if resCounter%30 == 0 {
				// after receiving 30 results we block this thread:
				// no more results will be taken from the channel for 5 seconds
				println("received 30 results, waiting...")
				time.Sleep(5 * time.Second)
			}
		case <-done:
			// we are done here, let's break this infinite loop
			break forloop
		}
	}
}
I hope this shows roughly how it can be done.
So, what's the problem with your code?
To be honest, it looks fine (I mean 30 results are published, then the code waits, then another 30 results, and so on), but the question is: where would you like to wait?
There are a few possibilities, I guess:
creating workers (this is how your code works now: as far as I can see, it publishes jobs in packs of 30; note that the 2-second delay in the digits function applies only to the goroutine in which it is executed)
triggering workers (the "wait" code would live in the worker function, preventing more workers from running, so it must watch how many results have been published)
handling results (this is how my code works; the synchronization happens in the forloop)
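For completeness, here is a hedged sketch of yet another arrangement that matches the wording of the question ("exactly 30 tasks completed, then sleep"): each job carries a per-batch WaitGroup, and the dispatcher blocks on it before sleeping. The names (batchJob, processJob) and the shortened sleep inside processJob are assumptions for this sketch, not part of the original code.

package main

import (
	"fmt"
	"sync"
	"time"
)

// batchJob pairs a task id with the WaitGroup of the batch it belongs to.
type batchJob struct {
	id   int
	done *sync.WaitGroup
}

// processJob stands in for the real work (digits + sending a Result).
func processJob(id int) {
	time.Sleep(200 * time.Millisecond)
	fmt.Println("finished job", id)
}

func main() {
	const (
		totalJobs = 100
		batchSize = 30
		pause     = 5 * time.Second
		workers   = 30
	)

	jobs := make(chan batchJob)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				processJob(j.id)
				j.done.Done() // report completion to this job's batch
			}
		}()
	}

	// Dispatch in batches: hand out batchSize jobs, wait until all of them
	// have actually completed, then sleep before the next batch.
	for start := 0; start < totalJobs; start += batchSize {
		end := start + batchSize
		if end > totalJobs {
			end = totalJobs
		}
		var batch sync.WaitGroup
		batch.Add(end - start)
		for id := start; id < end; id++ {
			jobs <- batchJob{id: id, done: &batch}
		}
		batch.Wait() // exactly this batch is finished now
		if end < totalJobs {
			fmt.Println("batch done, sleeping...")
			time.Sleep(pause)
		}
	}

	close(jobs)
	wg.Wait()
}

The trade-off versus the forloop above is that the throttling happens on the dispatch side, so the workers themselves stay oblivious to the batching.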

Concurrent Bubble sort in golang

Can someone explain to me how the goroutines behave in the following code? I wrote it, by the way.
When I do BubbleSortVanilla, it takes roughly 15s for a list of size 100000.
When I do BubbleSortOdd followed by BubbleSortEven using the odd-even phases, it takes roughly 7s. But when I just do ConcurrentBubbleSort it only takes roughly 1.4s.
I can't really understand why the single ConcurrentBubbleSort is faster.
Is it because of the overhead of creating the two goroutines, even though each one only processes about half the length of the list?
I tried profiling the code, but I am not really sure how to see how many goroutines are being created, the memory usage of each, etc.
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

func BubbleSortVanilla(intList []int) {
	for i := 0; i < len(intList)-1; i += 1 {
		if intList[i] > intList[i+1] {
			intList[i], intList[i+1] = intList[i+1], intList[i]
		}
	}
}

func BubbleSortOdd(intList []int, wg *sync.WaitGroup, c chan []int) {
	for i := 1; i < len(intList)-2; i += 2 {
		if intList[i] > intList[i+1] {
			intList[i], intList[i+1] = intList[i+1], intList[i]
		}
	}
	wg.Done()
}

func BubbleSortEven(intList []int, wg *sync.WaitGroup, c chan []int) {
	for i := 0; i < len(intList)-1; i += 2 {
		if intList[i] > intList[i+1] {
			intList[i], intList[i+1] = intList[i+1], intList[i]
		}
	}
	wg.Done()
}

func ConcurrentBubbleSort(intList []int, wg *sync.WaitGroup, c chan []int) {
	for i := 0; i < len(intList)-1; i += 1 {
		if intList[i] > intList[i+1] {
			intList[i], intList[i+1] = intList[i+1], intList[i]
		}
	}
	wg.Done()
}

func main() {
	// defer profile.Start(profile.MemProfile).Stop()
	rand.Seed(time.Now().Unix())
	intList := rand.Perm(100000)
	fmt.Println("Read a sequence of", len(intList), "elements")
	c := make(chan []int, len(intList))
	var wg sync.WaitGroup
	start := time.Now()
	for j := 0; j < len(intList)-1; j++ {
		// BubbleSortVanilla(intList) // takes roughly 15s
		// wg.Add(2)
		// go BubbleSortOdd(intList, &wg, c) // takes roughly 7s
		// go BubbleSortEven(intList, &wg, c)
		wg.Add(1)
		go ConcurrentBubbleSort(intList, &wg, c) // takes roughly 1.4s
	}
	wg.Wait()
	elapsed := time.Since(start)
	// Print the sorted integers
	fmt.Println("Sorted List: ", len(intList), "in", elapsed)
}
Your code is not working at all. Both ConcurrentBubbleSort and BubbleSortOdd + BubbleSortEven cause a data race. Try running your code with go run -race main.go. Because of the data race, the contents of the array are incorrect after the "sort", and they are not sorted either.
Why is it slow? I guess it is because of the data race, and because there are far too many goroutines, all of which participate in that race.
The Thread Analyzer detects data races that occur during the execution of a multi-threaded process. A data race occurs when:
two or more threads in a single process access the same memory location concurrently, and
at least one of the accesses is for writing, and
the threads are not using any exclusive locks to control their accesses to that memory.
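For reference, the odd/even idea itself can be made race-free: keep the parallelism inside a single phase, whose compare-exchange pairs never overlap, and put a barrier between phases. Here is a hedged sketch of that scheme; the striding, the worker count, and the smaller input size are assumptions, not taken from the original post.

package main

import (
	"fmt"
	"math/rand"
	"sort"
	"sync"
)

// sortPhase runs one phase of odd-even transposition sort.
// start is 0 for the even phase (pairs 0-1, 2-3, ...) and 1 for the odd phase
// (pairs 1-2, 3-4, ...). The pairs inside one phase are disjoint, so splitting
// them across goroutines does not race as long as phases never overlap.
func sortPhase(a []int, start, workers int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := start + 2*w; i+1 < len(a); i += 2 * workers {
				if a[i] > a[i+1] {
					a[i], a[i+1] = a[i+1], a[i]
				}
			}
		}(w)
	}
	wg.Wait() // barrier: the next phase must not start until this one is done
}

func main() {
	a := rand.Perm(10000)
	// n alternating even/odd phases are enough to sort n elements.
	for p := 0; p < len(a); p++ {
		sortPhase(a, p%2, 4)
	}
	fmt.Println("sorted:", sort.IntsAreSorted(a))
}

Note that this is still O(n^2) compare-exchanges overall; the goroutines only shave a constant factor off each phase, which is why a library sort (or a concurrent merge/quick sort) will beat any bubble-sort variant.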

Values sent over channel are not seen as received

The code below starts a few workers. Each worker receives values via a channel and adds them into a map, where the key is the worker ID. Finally, when I add up all the values received, I should get the expected result (in this case 55, since that is the sum of 1..10). In most cases, I am not seeing the expected output. What am I doing wrong here? I do not want to solve it by adding a sleep; I would like to identify the issue programmatically and fix it.
package main

import (
	"fmt"
	"os"
	"strconv"
	"sync"
	"time"
)

type counter struct {
	value int
	count int
}

var data map[string]counter
var lock sync.Mutex

func adder(wid string, n int) {
	defer lock.Unlock()
	lock.Lock()
	d := data[wid]
	d.count++
	d.value += n
	data[wid] = d
	return
}

func main() {
	fmt.Println(os.Getpid())
	data = make(map[string]counter)
	c := make(chan int)
	for w := 1; w <= 3; w++ { // starting 3 workers here
		go func(wid string) {
			data[wid] = counter{}
			for {
				v, k := <-c
				if !k {
					continue
				}
				adder(wid, v)
			}
		}(strconv.Itoa(w)) // worker is given an ID
	}
	time.Sleep(1 * time.Second) // If this is not added, only one goroutine is recorded.
	for i := 1; i <= 10; i++ {
		c <- i
	}
	close(c)
	total := 0
	for i, v := range data {
		fmt.Println(i, v)
		total += v.value
	}
	fmt.Println(total)
}
Your code has two significant races:
The initialization of data[wid] = counter{} is not synchronized with other goroutines that may be reading and rewriting data.
The worker goroutines do not signal when they are done modifying data, which means your main goroutine may read data before they finish writing.
You also have a strange construct:
for {
	v, k := <-c
	if !k {
		continue
	}
	adder(wid, v)
}
k will only be false when the channel c is closed, after which the goroutine spins as much as it can. This would be better written as for v := range c.
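That is, the receiving loop collapses to the equivalent (this is exactly what the fixed code below uses):

for v := range c {
	adder(wid, v)
}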
To fix the reading code in the main goroutine, we'll use the more normal for ... range c idiom and add a sync.WaitGroup, and have each worker invoke Done() on the wait-group. The main goroutine will then wait for them to finish. To fix the initialization, we'll lock the map (there are other ways to do this, e.g., to set up the map before starting any of the goroutines, or to rely on the fact that empty map slots read as zero, but this one is straightforward). I also took out the extra debug. The result is this code, also available on the Go Playground.
package main

import (
	"fmt"
	// "os"
	"strconv"
	"sync"
	// "time"
)

type counter struct {
	value int
	count int
}

var data map[string]counter
var lock sync.Mutex
var wg sync.WaitGroup

func adder(wid string, n int) {
	defer lock.Unlock()
	lock.Lock()
	d := data[wid]
	d.count++
	d.value += n
	data[wid] = d
}

func main() {
	// fmt.Println(os.Getpid())
	data = make(map[string]counter)
	c := make(chan int)
	for w := 1; w <= 3; w++ { // starting 3 workers here
		wg.Add(1)
		go func(wid string) {
			lock.Lock()
			data[wid] = counter{}
			lock.Unlock()
			for v := range c {
				adder(wid, v)
			}
			wg.Done()
		}(strconv.Itoa(w)) // worker is given an ID
	}
	for i := 1; i <= 10; i++ {
		c <- i
	}
	close(c)
	wg.Wait()
	total := 0
	for i, v := range data {
		fmt.Println(i, v)
		total += v.value
	}
	fmt.Println(total)
}
(This can be improved easily, e.g., there's no reason for wg to be global.)
Well, I like torek's answer, but I wanted to post this answer as it contains a bunch of improvements:
Reduce the usage of locks (for such simple tasks, avoid locks where possible; if you benchmark it, you'll notice a good difference, because my code takes the lock only numworkers times).
Improve the naming of variables.
Remove the global variables (use of globals should be kept to a minimum).
The following code adds the numbers from minWork to maxWork using numworkers spawned goroutines.
package main

import (
	"fmt"
	"sync"
)

const (
	bufferSize = 1        // Buffer for numChan
	numworkers = 3        // Number of workers doing addition
	minWork    = 1        // Sum from [minWork] (inclusive)
	maxWork    = 10000000 // Sum upto [maxWork] (inclusive)
)

// worker stats
type worker struct {
	workCount int // Number of times, worker worked
	workDone  int // Amount of work done; numbers added
}

// workerMap holds a map for worker(s)
type workerMap struct {
	mu sync.Mutex     // Guards m for safe, concurrent r/w
	m  map[int]worker // Map to hold worker id to worker mapping
}

func main() {
	var (
		totalWorkDone int                          // Total Work Done
		wm            workerMap                    // WorkerMap
		wg            sync.WaitGroup               // WaitGroup
		numChan       = make(chan int, bufferSize) // Channel for nums
	)
	wm.m = make(map[int]worker, numworkers)
	for wid := 0; wid < numworkers; wid++ {
		wg.Add(1)
		go func(id int) {
			var wk worker
			// Wait for numbers
			for n := range numChan {
				wk.workCount++
				wk.workDone += n
			}
			// Fill worker stats
			wm.mu.Lock()
			wm.m[id] = wk
			wm.mu.Unlock()
			wg.Done()
		}(wid)
	}
	// Send numbers for addition by multiple workers
	for i := minWork; i <= maxWork; i++ {
		numChan <- i
	}
	// Close the channel
	close(numChan)
	// Wait for goroutines to finish
	wg.Wait()
	// Print stats
	for k, v := range wm.m {
		fmt.Printf("WorkerID: %d; Work: %+v\n", k, v)
		totalWorkDone += v.workDone
	}
	// Print total work done by all workers
	fmt.Printf("Work Done: %d\n", totalWorkDone)
}

Getting deadlock as I try to emulate fan in - fan out with factorial calculations

I am trying the fan-in/fan-out pattern with a factorial problem. But I am getting:
fatal error: all goroutines are asleep - deadlock!
and I am unable to identify the reason for the deadlock.
I am trying to concurrently calculate the factorial of 100 numbers using the fan-in/fan-out pattern.
package main

import (
	"fmt"
)

func main() {
	_inChannel := _inListener(generator())
	for val := range _inChannel {
		fmt.Print(val, " -- ")
	}
}

func generator() chan int { // NEED TO CALCULATE FACTORIAL FOR 100 NUMBERS
	ch := make(chan int) // CREATE CHANNEL TO INPUT NUMBERS
	go func() {
		for i := 1; i <= 100; i++ {
			ch <- i
		}
		close(ch) // CLOSE CHANNEL WHEN ALL NUMBERS HAVE BEEN WRITTEN
	}()
	return ch
}

func _inListener(ch chan int) chan int {
	rec := make(chan int) // CHANNEL RECEIVED FROM GENERATOR
	go func() {
		for num := range ch { // RECEIVE THE INPUT NUMBERS FROM GENERATOR
			result := factorial(num) // RESULT IS A NEW CHANNEL CREATED
			rec <- <-result          // MERGE INTO A SINGLE CHANNEL; rec
			close(result)
		}
		close(rec)
	}()
	return rec // RETURN THE DEDICATED CHANNEL TO RECEIVE ALL OUTPUTS
}

func factorial(n int) chan int {
	ch := make(chan int) // MAKE A NEW CHANNEL TO OUTPUT THE RESULT OF FACTORIAL
	total := 1
	for i := n; i > 0; i-- {
		total *= i
	}
	ch <- total
	return ch // RETURN THE CHANNEL HAVING THE FACTORIAL CALCULATED
}
I have put in comments so that it is easier to follow the code.
I'm no expert in channels; I've taken this on to try and get more familiar with Go.
Another issue is that int isn't large enough to hold factorials beyond 20 or so (see the math/big sketch after the fixed code below).
As you can see below, I added a defer close as well as a channel called done in the generator func. The rest of the changes probably aren't needed. With channels, you need to make sure something is ready to take a value off the channel whenever you put something on it; otherwise you deadlock. That is exactly what happens in your original factorial: it sends on a brand-new unbuffered channel before returning it, so no receiver can exist yet and the send blocks forever. Also, using
go run -race main.go
helps to at least see which line(s) are causing problems.
I hope this helps and isn't removed for being off topic.
I was able to remove the deadlock by doing this:
package main

import (
	"fmt"
)

func main() {
	_gen := generator()
	_inChannel := _inListener(_gen)
	for val := range _inChannel {
		fmt.Print(val, " -- \n")
	}
}

func generator() chan int { // NEED TO CALCULATE FACTORIAL FOR 100 NUMBERS
	ch := make(chan int) // CREATE CHANNEL TO INPUT NUMBERS
	done := make(chan bool)
	go func() {
		defer close(ch)
		for i := 1; i <= 100; i++ {
			ch <- i
		}
		//close(ch) // CLOSE CHANNEL WHEN ALL NUMBERS HAVE BEEN WRITTEN
		done <- true
	}()
	// this function will pull off the done for each function call above.
	go func() {
		for i := 1; i < 100; i++ {
			<-done
		}
	}()
	return ch
}

func _inListener(ch chan int) chan int {
	rec := make(chan int) // CHANNEL RECEIVED FROM GENERATOR
	go func() {
		for num := range ch { // RECEIVE THE INPUT NUMBERS FROM GENERATOR
			result := factorial(num) // RESULT IS NOW A PLAIN int
			rec <- result            // MERGE INTO A SINGLE CHANNEL; rec
		}
		close(rec)
	}()
	return rec // RETURN THE DEDICATED CHANNEL TO RECEIVE ALL OUTPUTS
}

func factorial(n int) int { // CALCULATE THE FACTORIAL
	total := 1
	for i := n; i > 0; i-- {
		total *= i
	}
	return total // RETURN THE CALCULATED FACTORIAL
}
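On the overflow point above: int silently wraps for n greater than 20, so the factorials of 1..100 come out wrong even once the concurrency is fixed. A minimal sketch using math/big (not part of the original answer) would look like this:

package main

import (
	"fmt"
	"math/big"
)

// bigFactorial computes n! exactly using arbitrary-precision integers,
// so results past 20! are not truncated the way int values are.
func bigFactorial(n int64) *big.Int {
	result := big.NewInt(1)
	for i := int64(2); i <= n; i++ {
		result.Mul(result, big.NewInt(i))
	}
	return result
}

func main() {
	fmt.Println(bigFactorial(25)) // 15511210043330985984000000
}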

Go: channel many slow API queries into single SQL transaction

I wonder what would be the idiomatic way to do the following.
I have N slow API queries and one database connection. I want to have a buffered channel where the responses will arrive, and one database transaction which I will use to write the data.
I could only come up with a semaphore-style approach, as in the following made-up example:
func myFunc() {
	// 10 concurrent API calls
	sem := make(chan bool, 10)
	// A concurrent-safe map as a buffer
	var myMap MyConcurrentMap
	for i := 0; i < N; i++ {
		sem <- true
		go func(i int) {
			defer func() { <-sem }()
			resp := slowAPICall(fmt.Sprintf("http://slow-api.me?%d", i))
			myMap.Put(resp)
		}(i)
	}
	for j := 0; j < cap(sem); j++ {
		sem <- true
	}
	tx, _ := db.Begin()
	for data := range myMap {
		tx.Exec("Insert data into database")
	}
	tx.Commit()
}
I am nearly sure there is a simpler, cleaner and more proper solution, but it seems complicated for me to grasp.
EDIT:
Well, I came up with the following solution. This way I do not need the buffer map: once data arrives on the resp channel it is printed, or it could be used to insert into a database. It works, although I am still not sure whether everything is OK; at least there are no races.
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// Global WaitGroup
var wg sync.WaitGroup

func init() {
	// just for fun's sake, seed the RNG
	rand.Seed(time.Now().UnixNano())
}

// Emulate a slow API call
func verySlowAPI(id int) int {
	n := rand.Intn(5)
	time.Sleep(time.Duration(n) * time.Second)
	return n
}

func main() {
	// Amount of tasks
	N := 100
	// Concurrency level
	concur := 10
	// Channel for tasks
	tasks := make(chan int, N)
	// Channel for responses
	resp := make(chan int, 10)
	// 10 concurrent goroutines
	wg.Add(concur)
	for i := 1; i <= concur; i++ {
		go worker(tasks, resp)
	}
	// Add tasks
	for i := 0; i < N; i++ {
		tasks <- i
	}
	// Collect data from goroutines
	for i := 0; i < N; i++ {
		fmt.Printf("%d\n", <-resp)
	}
	// Close the tasks channel
	close(tasks)
	// Wait till finished
	wg.Wait()
}

func worker(task chan int, resp chan<- int) {
	defer wg.Done()
	for {
		task, ok := <-task
		if !ok {
			return
		}
		n := verySlowAPI(task)
		resp <- n
	}
}
There's no need to use channels for a semaphore; sync.WaitGroup was made for waiting for a set of goroutines to complete.
If you're using the channel to limit throughput, you're better off with a worker pool, and using the channel to pass jobs to the workers:
type job struct {
	i int
}

func myFunc(N int) {
	// Adjust as needed for total number of tasks
	work := make(chan job, 10)
	// res being whatever type slowAPICall returns
	results := make(chan res, 10)
	resBuff := make([]res, 0, N)
	wg := new(sync.WaitGroup)

	// 10 concurrent API calls
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			for j := range work {
				resp := slowAPICall(fmt.Sprintf("http://slow-api.me?%d", j.i))
				results <- resp
			}
			wg.Done()
		}()
	}

	go func() {
		for r := range results {
			resBuff = append(resBuff, r)
		}
	}()

	for i := 0; i < N; i++ {
		work <- job{i}
	}
	close(work)
	wg.Wait()
	close(results)
}
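To connect this back to the single SQL transaction the question asks about, a hedged sketch of the final step could look like the function below. It assumes database/sql is imported, that res is the placeholder response type from the sketch above, and that the table and column names (results, payload) as well as the ? placeholder syntax are made up for illustration (placeholder syntax depends on your driver). Note also that resBuff must only be read after the collecting goroutine has finished draining results; close(results) alone does not guarantee that, so in practice that goroutine should signal on its own done channel first.

// writeInOneTx writes every collected response in a single transaction.
// Table and column names are illustrative only.
func writeInOneTx(db *sql.DB, resBuff []res) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	for _, r := range resBuff {
		if _, err := tx.Exec("INSERT INTO results (payload) VALUES (?)", r); err != nil {
			tx.Rollback() // roll back the whole batch on the first failure
			return err
		}
	}
	return tx.Commit()
}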
Maybe this will work for you. Now you can get rid of your concurrent map. Here is a code snippet:
func myFunc() {
	// 10 concurrent API calls
	sem := make(chan bool, 10)
	respCh := make(chan YOUR_RESP_TYPE, 10)
	var responses []YOUR_RESP_TYPE
	for i := 0; i < N; i++ {
		sem <- true
		go func(i int) {
			defer func() {
				<-sem
			}()
			resp := slowAPICall(fmt.Sprintf("http://slow-api.me?%d", i))
			respCh <- resp
		}(i)
	}
	respCollected := make(chan struct{})
	go func() {
		for i := 0; i < N; i++ {
			responses = append(responses, <-respCh)
		}
		close(respCollected)
	}()
	<-respCollected
	tx, _ := db.Begin()
	for _, data := range responses {
		tx.Exec("Insert data into database")
	}
	tx.Commit()
}
Then we use one more goroutine that collects all responses from the response channel into a slice (or a map), and only after that do we run the transaction.
