How to properly prevent a data race? - go

According to the docs, the following prevents a data race:
var wg sync.WaitGroup
wg.Add(5)
for i := 0; i < 5; i++ {
go func(j int) {
fmt.Println(j) // Good. Read local copy of the loop counter.
wg.Done()
}(i)
}
wg.Wait()
However, if I have code such as:
type Job struct {
Work []string
}
var Queue chan Job
func work(id string, typ int32) { // renamed from "type", which is a reserved word in Go
job := Job{Work: []string{"A","B","C"}}
go func(j Job, ty int32) {
if ty == 1 {
// do some other task
}
Queue <- j
} (job, typ) // RACE ON THIS LINE
if typ == 1 {
// do something
}
return
}
work is called via gRPC.
The data race detector tells me that the goroutine is producing a data race. I think I am missing something here, but I can't work out what it is.

Related

How to handle multiple goroutines that share the same channel

I've been searching a lot but have not found an answer to my problem yet.
I need to make multiple concurrent calls to an external API, each with different parameters.
For each call I need to initialize a struct for the dataset and process the data I receive from the API call. Bear in mind that I read the incoming response line by line and immediately send each line to the channel.
The first problem I encountered, which was not obvious at the beginning because of the large quantity of data I'm receiving, is that each goroutine does not receive all the data that goes through the channel (as I learned from my research). So what I need is a way of requeuing or redirecting that data to the correct goroutine.
Below is the function that sends the streamed response from a single dataset.
(I've cut the parts of the code that are not relevant here.)
func (api *API) RequestData(ctx context.Context, c chan DWeatherResponse, dataset string, wg *sync.WaitGroup) error {
for {
line, err := reader.ReadBytes('\n')
s := string(line)
if err != nil {
log.Printf("End of %s", dataset) // Printf, not Println: the message contains a format verb
return err
}
data, err := extractDataFromStreamLine(s, dataset)
if err != nil {
continue
}
c <- *data
}
}
The function that will process the incoming data
func (s *StrikeStruct) Process(ch, requeue chan dweather.DWeatherResponse) {
for {
data, more := <-ch
if !more {
break
}
// data contains {dataset string, value float64, date time.Time}
// The s.Parameter needs to match the dataset
// IMPORTANT PART, checks if the received data is part of this struct dataset
// If not, I want to send it to another goroutine until it gets to the correct one.
// There will be a max of 4 datasets, but this still might not be the best approach.
if !api.GetDataset(s.Parameter, data.Dataset) {
requeue <- data
continue
}
// Do stuff with the data from this point
}
}
Now on my own API endpoint I have the following:
ch := make(chan dweather.DWeatherResponse, 2)
requeue := make(chan dweather.DWeatherResponse)
final := make(chan strike.StrikePerYearResponse)
var wg sync.WaitGroup
for _, s := range args.Parameters.Strikes {
strike := strike.StrikePerYear{
Parameter: strike.Parameter(s.Dataset),
StrikeValue: s.Value,
}
// I receive and process the data in here
go strike.ProcessStrikePerYear(ch, requeue, final, string(s.Dataset))
}
go func() {
for {
data, _ := <-requeue
ch <- data
}
}()
// Creates a goroutine for each dataset
for _, dataset := range api.Params.Dataset {
wg.Add(1)
go api.RequestData(ctx, ch, dataset, &wg)
}
wg.Wait()
close(ch)
//Once the data is all processed it is all appended
var strikes []strike.StrikePerYearResponse
for range args.Fetch.Datasets {
strikes = append(strikes, <-final)
}
return strikes
The issue with this code is that as soon as I start receiving data from more than one endpoint, the requeue blocks and nothing more happens. If I remove the requeue logic, data is lost whenever it does not land on the correct goroutine.
My two questions are:
Why is the requeue blocking if it has a goroutine always ready to receive?
Should I take a different approach on how I'm processing the incoming data?
This is not a good way to solve your problem; I would change the design. I suggest an implementation like the one below:
import (
"fmt"
"sync"
)
// answer for https://stackoverflow.com/questions/68454226/how-to-handle-multiple-goroutines-that-share-the-same-channel
var (
finalResult = make(chan string)
)
// IData use for message dispatcher that all struct must implement its method
type IData interface {
IsThisForMe() bool
Process(*sync.WaitGroup)
}
//MainData can be your main struct like StrikePerYear
type MainData struct {
// add any props
Id int
Name string
}
type DataTyp1 struct {
MainData *MainData
}
func (d DataTyp1) IsThisForMe() bool {
// you can check your condition here to checking incoming data
if d.MainData.Id == 2 {
return true
}
return false
}
func (d DataTyp1) Process(wg *sync.WaitGroup) {
d.MainData.Name = "processed by DataTyp1"
// send result to final channel, you can change it as you want
finalResult <- d.MainData.Name
wg.Done()
}
type DataTyp2 struct {
MainData *MainData
}
func (d DataTyp2) IsThisForMe() bool {
// you can check your condition here to checking incoming data
if d.MainData.Id == 3 {
return true
}
return false
}
func (d DataTyp2) Process(wg *sync.WaitGroup) {
d.MainData.Name = "processed by DataTyp2"
// send result to final channel, you can change it as you want
finalResult <- d.MainData.Name
wg.Done()
}
//dispatcher will run new go routine for each request.
//you can implement a worker pool to preventing running too many go routines.
func dispatcher(incomingData *MainData, wg *sync.WaitGroup) {
// based on your requirements you can remove this go routing or not
go func() {
var p IData
p = DataTyp1{incomingData}
if p.IsThisForMe() {
go p.Process(wg)
return
}
p = DataTyp2{incomingData}
if p.IsThisForMe() {
go p.Process(wg)
return
}
}()
}
func main() {
dummyDataArray := []MainData{
MainData{Id: 2, Name: "this data #2"},
MainData{Id: 3, Name: "this data #3"},
}
wg := sync.WaitGroup{}
for i := range dummyDataArray {
wg.Add(1)
dispatcher(&dummyDataArray[i], &wg)
}
result := make([]string, 0)
done := make(chan struct{})
// data collector
go func() {
loop:for {
select {
case <-done:
break loop
case r := <-finalResult:
result = append(result, r)
}
}
}()
wg.Wait()
done<- struct{}{}
for _, s := range result {
fmt.Println(s)
}
}
Note: this is only meant to point you toward a better solution; it is certainly not production-ready code.
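On the first question, a likely cause of the blocking: every processor can end up waiting to send on the unbuffered requeue channel while the relay goroutine is itself waiting to send into ch, which nobody is reading any more because all of its readers are stuck on requeue. A different approach that avoids requeuing altogether is to give each dataset its own channel and let a single router forward each item to the right one. The sketch below assumes a hypothetical Reading type in place of dweather.DWeatherResponse and uses a running sum as a placeholder for the real processing.

```go
package main

import (
	"fmt"
	"sync"
)

// Reading is a hypothetical stand-in for dweather.DWeatherResponse.
type Reading struct {
	Dataset string
	Value   float64
}

// route starts one worker per dataset and forwards every reading from in to
// the channel owned by its dataset, so no worker ever sees foreign data.
// It returns the per-dataset sums once in is closed and all workers are done.
func route(in <-chan Reading, datasets []string) map[string]float64 {
	chans := make(map[string]chan Reading, len(datasets))
	sums := make(map[string]float64, len(datasets))
	var mu sync.Mutex // guards sums
	var wg sync.WaitGroup

	for _, name := range datasets {
		ch := make(chan Reading)
		chans[name] = ch
		wg.Add(1)
		go func(name string, ch <-chan Reading) {
			defer wg.Done()
			total := 0.0
			for r := range ch {
				total += r.Value // placeholder for the real processing
			}
			mu.Lock()
			sums[name] = total
			mu.Unlock()
		}(name, ch)
	}

	// The router: the only goroutine that reads from in.
	for r := range in {
		if ch, ok := chans[r.Dataset]; ok {
			ch <- r
		}
		// Readings for unknown datasets are dropped in this sketch.
	}
	for _, ch := range chans {
		close(ch)
	}
	wg.Wait()
	return sums
}

func main() {
	in := make(chan Reading)
	go func() {
		defer close(in)
		for i := 1; i <= 3; i++ {
			in <- Reading{Dataset: "rain", Value: float64(i)}
			in <- Reading{Dataset: "wind", Value: float64(10 * i)}
		}
	}()
	fmt.Println(route(in, []string{"rain", "wind"}))
}
```

With this layout the producers only ever write to in, and a slow worker delays the router rather than deadlocking it, because no worker sends back into a channel that it also reads from.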

Watch for changes in a queue containing struct

I have two goroutines:
the first one adds tasks to a queue
the second one cleans up the queue based on task status
Add and cleanup might not run simultaneously.
If the status of a task is success, I want to delete the task from the queue; if not, I will retry until the status is success (with a time limit). If that fails, I will log it and delete the task from the queue.
The add and delete goroutines can't communicate with each other, because that is not how the real-world scenario works.
I want something like a watcher that monitors additions to the queue and performs the cleanup described above. To increase complexity, Add might be adding tasks even while cleanup is happening (not shown here). I want to implement it without using external packages.
How can I achieve this?
type Task struct {
name string
status string //completed, failed
}
var list []*Task
func main() {
done := make(chan bool)
go Add()
time.Sleep(15)
go clean(done)
<-done
}
func Add() {
t1 := &Task{"test1", "completed"}
t2 := &Task{"test2", "failed"}
list = append(list, t1, t2)
}
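func clean(done chan bool) {
// Iterate backwards so that removing an element does not shift unvisited indexes.
for k := len(list) - 1; k >= 0; k-- {
if list[k].status == "completed" {
list = RemoveIndex(list, k)
} else {
//for now consider this as retry
list[k].status = "completed"
}
}
if len(list) > 0 {
clean(done)
return
}
// Signal main that the queue is empty.
done <- true
}
func RemoveIndex(s []*Task, index int) []*Task {
return append(s[:index], s[index+1:]...)
}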
I found a solution that works for me, and I am posting it here for anyone who might find it helpful.
In main I added a ticker that fires every x seconds to check whether something has been added to the queue.
type Task struct {
name string
status string //completed, failed
}
var list []*Task
func main() {
done := make(chan bool)
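c := make(chan os.Signal, 2)
// Assumed interval: the answer only says the ticker fires "every x seconds".
ticker := time.NewTicker(5 * time.Second)
defer ticker.Stop()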
go Add()
go func() {
for {
select {
// case <-done:
// Cleaner(k)
case <-ticker.C:
Monitor(done)
}
}
}()
signal.Notify(c, os.Interrupt, syscall.SIGTERM)
<-c
//waiting for interrupt here
}
func Add() {
t1 := &Task{"test1", "completed"}
t2 := &Task{"test2", "failed"}
list = append(list, t1, t2)
}
func Monitor(done chan bool) {
if len(list) > 0 {
cleaner()
}
}
func cleaner() {
//do cleaning here
// pop each element from queue and delete
}
func RemoveIndex(s []*Task, index int) []*Task {
return append(s[:index], s[index+1:]...)
}
This solution does not depend on communication between goroutines.
In a real-world scenario the program never exits and keeps adding and cleaning based on the use case. Because Add and the cleaner run in different goroutines, additions to and deletions from the queue should be guarded by a lock (for example a sync.Mutex).
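As a rough illustration of the locking mentioned above, here is a minimal sketch that wraps the slice in a hypothetical Queue type with a sync.Mutex, so Add and Clean can be called from different goroutines at any time. The "retry" is the same stand-in used in the question (the status is simply set to completed), and the 10 ms ticker interval is an arbitrary choice for the demo.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type Task struct {
	name   string
	status string // "completed" or "failed"
}

// Queue is a slice of tasks guarded by a mutex.
type Queue struct {
	mu    sync.Mutex
	tasks []*Task
}

// Add appends a task; safe to call while Clean is running.
func (q *Queue) Add(t *Task) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.tasks = append(q.tasks, t)
}

// Len reports the number of queued tasks.
func (q *Queue) Len() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.tasks)
}

// Clean removes completed tasks and returns how many were removed.
// Other tasks stay queued; the "retry" here just marks them completed,
// so the next pass removes them.
func (q *Queue) Clean() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	kept := q.tasks[:0] // filter in place, reusing the backing array
	removed := 0
	for _, t := range q.tasks {
		if t.status == "completed" {
			removed++
			continue
		}
		t.status = "completed" // stand-in for a real retry
		kept = append(kept, t)
	}
	// Clear the unused tail so removed tasks can be garbage collected.
	for i := len(kept); i < len(q.tasks); i++ {
		q.tasks[i] = nil
	}
	q.tasks = kept
	return removed
}

func main() {
	q := &Queue{}
	q.Add(&Task{"test1", "completed"})
	q.Add(&Task{"test2", "failed"})

	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for q.Len() > 0 {
		<-ticker.C
		fmt.Println("removed", q.Clean())
	}
}
```

A real retry with a time limit would replace the single assignment inside Clean; the locking structure stays the same.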

Check if someone has read from go channel

How can we set something like a listener on a Go channel, so that we are notified when someone has read something from the channel?
Imagine we have a sequence number for channel entries and we want to decrement it when someone reads a value from our channel somewhere outside our package.
Unbuffered channels hand off data synchronously, so you already know when the data is read. Buffered channels block in a similar way only when the buffer is full (a send then waits until a reader frees a slot); otherwise a send returns immediately without waiting for a reader, so this approach would not tell you the same thing. Depending on what your needs really are, consider also using tools like sync.WaitGroup.
ch = make(chan Data)
⋮
for {
⋮
// make data available
ch <- data
// now you know it was read
sequenceNumber--
⋮
}
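The outline above can be turned into a small runnable sketch. The sendAll helper and the pending counter are illustrative names, and the consumer goroutine stands in for code outside the package.

```go
package main

import (
	"fmt"
	"sync"
)

// sendAll sends n values on an unbuffered channel and decrements pending
// each time a send returns. It returns the final value of pending.
func sendAll(n int) int {
	// Unbuffered: a send completes only once a receiver has taken the value.
	ch := make(chan int)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // consumer, standing in for code outside the package
		defer wg.Done()
		for range ch {
		}
	}()

	pending := n
	for i := 0; i < n; i++ {
		ch <- i
		pending-- // the send returned, so value i has been received
	}
	close(ch)
	wg.Wait()
	return pending
}

func main() {
	fmt.Println(sendAll(5))
}
```

If ch were created with a buffer, the decrement would only mean that the value was placed in the buffer, not that anyone had read it.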
You could create a channel relay mechanism, to capture read events in realtime.
So for example:
func relayer(in <-chan MyStruct) <-chan MyStruct {
out := make(chan MyStruct) // non-buffered chan (see below)
go func() {
defer close(out)
readCountLimit := 10
for item := range in {
out <- item
// ^^^^ so this will block until some worker has read from 'out'
readCountLimit--
}
}()
return out
}
Usage:
type MyStruct struct {
// put your data fields here
}
ch := make(chan MyStruct) // <- original channel - used by producer to write to
rch := relayer(ch) // <- relay channel - used to read from
// consumers
go worker("worker 1", rch)
go worker("worker 2", rch)
// producer
for { ch <- MyStruct{} }
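A self-contained variant of the relay idea, for experimenting: it counts completed reads with sync/atomic so the counter can be inspected safely from another goroutine. The names relay and countReads are made up for this sketch, and int is used in place of MyStruct.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// relay forwards values from in to the returned channel and increments
// *reads each time a consumer has taken a value from it.
func relay(in <-chan int, reads *int64) <-chan int {
	out := make(chan int) // must stay unbuffered for the count to mean "read"
	go func() {
		defer close(out)
		for v := range in {
			out <- v // blocks until some consumer receives v
			atomic.AddInt64(reads, 1)
		}
	}()
	return out
}

// countReads pushes n values through a relay consumed by the given number
// of workers and returns the number of reads the relay observed.
func countReads(n, workers int) int64 {
	in := make(chan int)
	var reads int64
	out := relay(in, &reads)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range out {
			}
		}()
	}
	for i := 0; i < n; i++ {
		in <- i
	}
	close(in)
	wg.Wait() // workers exit only after the relay has closed out
	return atomic.LoadInt64(&reads)
}

func main() {
	fmt.Println(countReads(20, 3))
}
```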
You can also do it manually by attaching some sort of ACK marker to the message.
Something like this:
type Msg struct {
Data int
ack bool
}
func (m *Msg) Ack() {
m.ack = true
}
func (m *Msg) Acked() bool {
return m.ack
}
func main() {
ch := make(chan *Msg)
msg := &Msg{Data: 1}
go func() {
for {
if msg.Acked() {
// do smth
}
time.Sleep(10 * time.Second)
}
}()
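// Send from a separate goroutine: an unbuffered send in main would block
// forever, because the receive loop below would never start.
go func() {
ch <- msg
close(ch)
}()
for msg := range ch {
msg.Ack()
}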
}
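Code not tested. Note that ack is written and read from different goroutines, so real code needs to synchronize access to it (for example with sync/atomic or a mutex).
You can also add additional information to the Ack() method, such as metadata about the package and function from which Ack() was called. This answer may be related: https://stackoverflow.com/a/35213181/3782382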
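One way to avoid both the polling loop and the unsynchronized bool is to let each message carry its own acknowledgement channel. This is only a sketch of that idea, not the answer's original design: NewMsg and sendAndWait are hypothetical helpers, and doubling the payload is a placeholder for real work.

```go
package main

import "fmt"

// Msg carries a payload plus a channel the consumer closes to acknowledge it.
type Msg struct {
	Data int
	done chan struct{}
}

func NewMsg(data int) *Msg {
	return &Msg{Data: data, done: make(chan struct{})}
}

// Ack must be called at most once per message; closing twice panics.
func (m *Msg) Ack() { close(m.done) }

// Acked returns a channel that is closed once Ack has been called.
func (m *Msg) Acked() <-chan struct{} { return m.done }

// sendAndWait sends one message, blocks until the consumer acknowledges it,
// and returns the value the consumer produced.
func sendAndWait(data int) int {
	ch := make(chan *Msg, 1) // buffered, so the send alone says nothing about reads
	results := make(chan int, 1)

	go func() { // consumer
		for m := range ch {
			results <- m.Data * 2 // placeholder for real processing
			m.Ack()
		}
	}()

	msg := NewMsg(data)
	ch <- msg
	close(ch)
	<-msg.Acked() // returns only after the consumer called Ack
	return <-results
}

func main() {
	fmt.Println(sendAndWait(21))
}
```

Because closing a channel is a synchronization point, the sender can wait on Acked() in a select alongside a timeout instead of sleeping and re-checking a flag.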

Mutex write locking of a channel value

I have a channel of thousands of IDs that need to be processed in parallel inside goroutines. How could I implement a lock so that goroutines cannot process the same id at the same time, should it be repeated in the channel?
package main
import (
"fmt"
"sync"
"strconv"
"time"
)
var wg sync.WaitGroup
func main() {
var data []string
for d := 0; d < 30; d++ {
data = append(data, "id1")
data = append(data, "id2")
data = append(data, "id3")
}
chanData := createChan(data)
for i := 0; i < 10; i++ {
wg.Add(1)
process(chanData, i)
}
wg.Wait()
}
func createChan(data []string) <-chan string {
var out = make(chan string)
go func() {
for _, val := range data {
out <- val
}
close(out)
}()
return out
}
func process(ids <-chan string, i int) {
go func() {
defer wg.Done()
for id := range ids {
fmt.Println(id + " (goroutine " + strconv.Itoa(i) + ")")
time.Sleep(1 * time.Second)
}
}()
}
Edit:
All values need to be processed, in any order, but "id1", "id2" and "id3" need to block so that none of them can be processed by more than one goroutine at the same time.
The simplest solution here is to not send the duplicate values at all, and then no synchronization is required.
func createChan(data []string) <-chan string {
seen := make(map[string]bool)
var out = make(chan string)
go func() {
for _, val := range data {
if seen[val] {
continue
}
seen[val] = true
out <- val
}
close(out)
}()
return out
}
I've found a solution. Someone has written a package (github.com/EagleChen/mapmutex) to do exactly what I needed:
package main
import (
"fmt"
"github.com/EagleChen/mapmutex"
"strconv"
"sync"
"time"
)
var wg sync.WaitGroup
var mutex *mapmutex.Mutex
func main() {
mutex = mapmutex.NewMapMutex()
var data []string
for d := 0; d < 30; d++ {
data = append(data, "id1")
data = append(data, "id2")
data = append(data, "id3")
}
chanData := createChan(data)
for i := 0; i < 10; i++ {
wg.Add(1)
process(chanData, i)
}
wg.Wait()
}
func createChan(data []string) <-chan string {
var out = make(chan string)
go func() {
for _, val := range data {
out <- val
}
close(out)
}()
return out
}
func process(ids <-chan string, i int) {
go func() {
defer wg.Done()
for id := range ids {
if mutex.TryLock(id) {
fmt.Println(id + " (goroutine " + strconv.Itoa(i) + ")")
time.Sleep(1 * time.Second)
mutex.Unlock(id)
}
}
}()
}
Your problem as stated is difficult by definition and my first choice would be to re-architect the application to avoid it, but if that's not an option:
First, I assume that if a given ID is repeated you still want it processed twice, but not in parallel (if that's not the case and the 2nd instance must be ignored, it becomes even more difficult, because you have to remember every ID you have processed forever, so you don't run the task over it twice).
To achieve your goal, you must keep track of every ID that is currently being acted upon in a goroutine. A Go map is your best option here (note that its size will grow up to as many goroutines as you spin up in parallel!). The map itself must be protected by a lock, as it is modified from multiple goroutines.
Another simplification I'd make is to allow an ID removed from the channel to be added back to it if it is found to be currently processed by another goroutine. Then we need a map[string]bool as the tracking device, plus a sync.Mutex to guard it. For simplicity, I assume the map, mutex and channel are global variables, but that may not be convenient for you; arrange access to them as you see fit (arguments to the goroutine, closure, etc.).
import "sync"
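var idmap = make(map[string]bool) // IDs currently being processed; writing to a nil map would panic
var mtx sync.Mutex // guards idmap
var queue chan string // the work channel; create it with make before starting the workers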
func process_one_id(id string) {
busy := false
mtx.Lock()
if idmap[id] {
busy = true
} else {
idmap[id] = true
}
mtx.Unlock()
if busy { // put the ID back on the queue and exit
queue <- id
return
}
// ensure the 'busy' mark is cleared at the end:
defer func() { mtx.Lock(); delete(idmap, id); mtx.Unlock() }()
// do your processing here
// ....
}
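As an alternative to putting busy IDs back on the queue, a worker can simply wait for the ID to become free. The sketch below uses a hypothetical keyedMutex type built only on the standard library: one sync.Mutex per ID, created on demand. It never removes entries, so it assumes the set of distinct IDs is bounded. The run function exists only to demonstrate and measure the behaviour.

```go
package main

import (
	"fmt"
	"sync"
)

// keyedMutex hands out one mutex per key. Entries are never removed,
// which is acceptable only when the set of keys is bounded.
type keyedMutex struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newKeyedMutex() *keyedMutex {
	return &keyedMutex{locks: make(map[string]*sync.Mutex)}
}

// Lock blocks until key is free and returns the function that releases it.
func (k *keyedMutex) Lock(key string) (unlock func()) {
	k.mu.Lock()
	m, ok := k.locks[key]
	if !ok {
		m = &sync.Mutex{}
		k.locks[key] = m
	}
	k.mu.Unlock()
	m.Lock()
	return m.Unlock
}

// run processes ids with the given number of workers. It returns how often
// each ID was processed and the highest number of goroutines that were ever
// working on the same ID at once (1 if the locking works).
func run(ids []string, workers int) (processed map[string]int, maxPerID int) {
	km := newKeyedMutex()
	queue := make(chan string)
	var statsMu sync.Mutex // guards active, processed and maxPerID
	active := make(map[string]int)
	processed = make(map[string]int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range queue {
				unlock := km.Lock(id)
				statsMu.Lock()
				active[id]++
				if active[id] > maxPerID {
					maxPerID = active[id]
				}
				statsMu.Unlock()

				// Real processing of id would go here.

				statsMu.Lock()
				active[id]--
				processed[id]++
				statsMu.Unlock()
				unlock()
			}
		}()
	}
	for _, id := range ids {
		queue <- id
	}
	close(queue)
	wg.Wait()
	return processed, maxPerID
}

func main() {
	var ids []string
	for i := 0; i < 30; i++ {
		ids = append(ids, "id1", "id2", "id3")
	}
	fmt.Println(run(ids, 10))
}
```

The trade-off is that a worker waiting for a busy ID sits idle instead of picking up a different one, so throughput drops when the same ID appears many times in a row.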

Dealing with slices concurrently is not working as expected without mutexes

The functions WithMutex and WithoutMutex give different results.
The WithoutMutex implementation loses values even though I have a WaitGroup set up.
What could be wrong?
Do not run on the Playground.
P.S. I am on Windows 10 and Go 1.8.1
package main
import (
"fmt"
"sync"
)
var p = fmt.Println
type MuType struct {
list []int
*sync.RWMutex
}
var muData *MuType
var data *NonMuType
type NonMuType struct {
list []int
}
func (data *MuType) add(i int, wg *sync.WaitGroup) {
data.Lock()
defer data.Unlock()
data.list = append(data.list, i)
wg.Done()
}
func (data *MuType) read() []int {
data.RLock()
defer data.RUnlock()
return data.list
}
func (nonmu *NonMuType) add(i int, wg *sync.WaitGroup) {
nonmu.list = append(nonmu.list, i)
wg.Done()
}
func (nonmu *NonMuType) read() []int {
return nonmu.list
}
func WithoutMutex() {
nonmu := &NonMuType{}
nonmu.list = make([]int, 0)
var wg = sync.WaitGroup{}
for i := 0; i < 10; i++ {
wg.Add(1)
go nonmu.add(i, &wg)
}
wg.Wait()
data = nonmu
p(data.read())
}
func WithMutex() {
mtx := &sync.RWMutex{}
withMu := &MuType{list: make([]int, 0)}
withMu.RWMutex = mtx
var wg = sync.WaitGroup{}
for i := 0; i < 10; i++ {
wg.Add(1)
go withMu.add(i, &wg)
}
wg.Wait()
muData = withMu
p(muData.read())
}
func stressTestWOMU(max int) {
p("Without Mutex")
for ii := 0; ii < max; ii++ {
WithoutMutex()
}
}
func stressTest(max int) {
p("With Mutex")
for ii := 0; ii < max; ii++ {
WithMutex()
}
}
func main() {
stressTestWOMU(20)
stressTest(20)
}
Slices are not safe for concurrent writes, so I am not surprised that WithoutMutex is inconsistent and drops items. append reads the slice's length, writes the new element, and stores the updated slice header back; when two goroutines do this at the same time, one result can overwrite the other.
The WithMutex version consistently has 10 items, but in jumbled order. This is also to be expected: the mutex ensures that only one goroutine can append at a time, but there is no guarantee as to the order in which the goroutines run, so it is a race to see which of the rapidly spawned goroutines gets to append first.
The WaitGroup does not do anything to control access or enforce ordering. It merely provides a signal at the end that everything is done.
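When the number of results is known up front, another option is to avoid append altogether: allocate the slice once and let each goroutine write only to its own index. This is a sketch of that alternative, not something from the question; collect is a hypothetical helper and i*i stands in for real work.

```go
package main

import (
	"fmt"
	"sync"
)

// collect fills a slice of length n from n goroutines. Each goroutine writes
// only its own element, so no two goroutines touch the same memory and no
// mutex is needed. The result order is fixed by the index, not by scheduling.
func collect(n int) []int {
	out := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			out[i] = i * i // placeholder for real work
		}(i)
	}
	wg.Wait() // after Wait returns, all writes are visible to the caller
	return out
}

func main() {
	fmt.Println(collect(10))
}
```

Here the WaitGroup is enough on its own, because the slice header never changes and the elements are disjoint; with append, the header itself is shared state and needs the mutex.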
