Shared memory across goroutines - go

The following test in Go fails:
type A struct {
    b bool
}

func TestWG(t *testing.T) {
    var wg sync.WaitGroup
    a := update(&wg)
    wg.Wait()
    if !a.b {
        t.Errorf("error")
    }
}

// Does not work
func update(group *sync.WaitGroup) A {
    a := A{
        b: false,
    }
    group.Add(1)
    go func() {
        a.b = true
        group.Done()
    }()
    return a
}
Initially I thought this might be happening due to the absence of a memory barrier in WaitGroup.Done(), which might explain why changing update to
// works
func update(group *sync.WaitGroup) A {
    a := A{
        b: false,
    }
    group.Add(1)
    go func() {
        a.b = true
        group.Done()
    }()
    time.Sleep(1 * time.Second)
    return a
}
works. But then changing the return type to a pointer also makes it work:
// works
func update(group *sync.WaitGroup) *A {
    a := A{
        b: false,
    }
    group.Add(1)
    go func() {
        a.b = true
        group.Done()
    }()
    return &a
}
Can someone tell me what is happening here?

Your first example has a data race!
Returning a requires reading its value, and the goroutine you just launched writes it concurrently without synchronization, so the result is undefined. go test -race confirms this.
The same thing goes when you add a sleep: the data race remains (time.Sleep() is not a synchronization tool), so the result is still undefined. go test -race confirms this again.
When you change the return type to a pointer, you only return a pointer to a, which does not involve reading the value of a, and the launched goroutine does not modify the pointer, only the pointed value. The caller TestWG() then properly waits on the WaitGroup before reading a.b, so no data race occurs.
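A minimal sketch (my own, not part of the original answer) of one way to keep the value return type and stay race-free: have update wait on the WaitGroup itself, so the goroutine's write happens before return a reads the struct:
func update(group *sync.WaitGroup) A {
    a := A{b: false}
    group.Add(1)
    go func() {
        a.b = true
        group.Done()
    }()
    group.Wait() // synchronizes with Done(), so `return a` no longer races
    return a
}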

Related

How to handle multiple goroutines that share the same channel

I've searched a lot but could not find an answer to my problem yet.
I need to make multiple calls to an external API, but with different parameters, concurrently.
Then, for each call, I need to initialize a struct for each dataset and process the data I receive from the API call. Bear in mind that I read each line of the incoming stream and immediately send it to the channel.
The first problem I encountered, which was not obvious at the beginning due to the large quantity of data I'm receiving, is that each goroutine does not receive all the data that goes through the channel (which I learned through research). So what I need is a way of requeuing/redirecting that data to the correct goroutine.
The function that sends the streamed response from a single dataset.
(I've cut parts of the code that are not relevant here.)
func (api *API) RequestData(ctx context.Context, c chan DWeatherResponse, dataset string, wg *sync.WaitGroup) error {
    for {
        line, err := reader.ReadBytes('\n')
        s := string(line)
        if err != nil {
            log.Printf("End of %s", dataset)
            return err
        }
        data, err := extractDataFromStreamLine(s, dataset)
        if err != nil {
            continue
        }
        c <- *data
    }
}
The function that will process the incoming data
func (s *StrikeStruct) Process(ch, requeue chan dweather.DWeatherResponse) {
    for {
        data, more := <-ch
        if !more {
            break
        }
        // data contains {dataset string, value float64, date time.Time}
        // The s.Parameter needs to match the dataset.
        // IMPORTANT PART: checks if the received data belongs to this struct's dataset.
        // If not, I want to send it to another goroutine until it reaches the correct one.
        // There will be a max of 4 datasets, but this still may not be the best approach.
        if !api.GetDataset(s.Parameter, data.Dataset) {
            requeue <- data
            continue
        }
        // Do stuff with the data from this point
    }
}
Now on my own API endpoint I have the following:
ch := make(chan dweather.DWeatherResponse, 2)
requeue := make(chan dweather.DWeatherResponse)
final := make(chan strike.StrikePerYearResponse)
var wg sync.WaitGroup

for _, s := range args.Parameters.Strikes {
    strike := strike.StrikePerYear{
        Parameter:   strike.Parameter(s.Dataset),
        StrikeValue: s.Value,
    }
    // I receive and process the data in here
    go strike.ProcessStrikePerYear(ch, requeue, final, string(s.Dataset))
}

go func() {
    for {
        data, _ := <-requeue
        ch <- data
    }
}()

// Creates a goroutine for each dataset
for _, dataset := range api.Params.Dataset {
    wg.Add(1)
    go api.RequestData(ctx, ch, dataset, &wg)
}

wg.Wait()
close(ch)

// Once the data is all processed, it is all appended
var strikes []strike.StrikePerYearResponse
for range args.Fetch.Datasets {
    strikes = append(strikes, <-final)
}
return strikes
The issue with this code is that as soon as I start receiving data from more than one endpoint, the requeue blocks and nothing more happens. If I remove the requeue logic, data is lost when it does not land on the correct goroutine.
My two questions are:
Why is the requeue blocking if it has a goroutine always ready to receive?
Should I take a different approach on how I'm processing the incoming data?
This is not a good way to solve your problem; you should change your approach. I suggest an implementation like the one below:
package main

import (
    "fmt"
    "sync"
)

// answer for https://stackoverflow.com/questions/68454226/how-to-handle-multiple-goroutines-that-share-the-same-channel

var (
    finalResult = make(chan string)
)

// IData is used by the message dispatcher; every data struct must implement its methods
type IData interface {
    IsThisForMe() bool
    Process(*sync.WaitGroup)
}

// MainData can be your main struct, like StrikePerYear
type MainData struct {
    // add any props
    Id   int
    Name string
}

type DataTyp1 struct {
    MainData *MainData
}

func (d DataTyp1) IsThisForMe() bool {
    // you can check your condition here for the incoming data
    if d.MainData.Id == 2 {
        return true
    }
    return false
}

func (d DataTyp1) Process(wg *sync.WaitGroup) {
    d.MainData.Name = "processed by DataTyp1"
    // send the result to the final channel; change this as you need
    finalResult <- d.MainData.Name
    wg.Done()
}

type DataTyp2 struct {
    MainData *MainData
}

func (d DataTyp2) IsThisForMe() bool {
    // you can check your condition here for the incoming data
    if d.MainData.Id == 3 {
        return true
    }
    return false
}

func (d DataTyp2) Process(wg *sync.WaitGroup) {
    d.MainData.Name = "processed by DataTyp2"
    // send the result to the final channel; change this as you need
    finalResult <- d.MainData.Name
    wg.Done()
}

// dispatcher runs a new goroutine for each request.
// You can implement a worker pool to prevent running too many goroutines.
func dispatcher(incomingData *MainData, wg *sync.WaitGroup) {
    // based on your requirements you can keep or remove this goroutine
    go func() {
        var p IData
        p = DataTyp1{incomingData}
        if p.IsThisForMe() {
            go p.Process(wg)
            return
        }
        p = DataTyp2{incomingData}
        if p.IsThisForMe() {
            go p.Process(wg)
            return
        }
    }()
}

func main() {
    dummyDataArray := []MainData{
        MainData{Id: 2, Name: "this data #2"},
        MainData{Id: 3, Name: "this data #3"},
    }
    wg := sync.WaitGroup{}
    for i := range dummyDataArray {
        wg.Add(1)
        dispatcher(&dummyDataArray[i], &wg)
    }
    result := make([]string, 0)
    done := make(chan struct{})
    // data collector
    go func() {
    loop:
        for {
            select {
            case <-done:
                break loop
            case r := <-finalResult:
                result = append(result, r)
            }
        }
    }()
    wg.Wait()
    done <- struct{}{}
    for _, s := range result {
        fmt.Println(s)
    }
}
Note: this is just meant to open your mind to a better solution; it is certainly not production-ready code.
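To make the worker-pool suggestion in the dispatcher comment concrete, here is a minimal, self-contained sketch; the names job, worker, and poolSize are my own illustration, not part of the answer above. A fixed number of workers drain a jobs channel instead of launching one goroutine per request:
package main

import (
    "fmt"
    "sync"
)

type job struct {
    id   int
    name string
}

// worker drains the jobs channel; at most poolSize of these run concurrently.
func worker(jobs <-chan job, results chan<- string, wg *sync.WaitGroup) {
    defer wg.Done()
    for j := range jobs {
        // route/process here, analogous to IsThisForMe + Process above
        results <- fmt.Sprintf("job %d (%s) processed", j.id, j.name)
    }
}

func main() {
    jobs := make(chan job)
    results := make(chan string)

    const poolSize = 4
    var wg sync.WaitGroup
    for i := 0; i < poolSize; i++ {
        wg.Add(1)
        go worker(jobs, results, &wg)
    }

    // close results once every worker has exited, so the range below ends
    go func() {
        wg.Wait()
        close(results)
    }()

    // producer: queue some work, then close jobs so the workers stop
    go func() {
        for i := 0; i < 10; i++ {
            jobs <- job{id: i, name: fmt.Sprintf("data #%d", i)}
        }
        close(jobs)
    }()

    for r := range results {
        fmt.Println(r)
    }
}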

Race condition. Cannot figure out why

A race condition occurs when I run my code. It is a simple implementation of a concurrency-safe storage. The race condition disappears when I change the receiver in the get() method to (p *storageType). I'm confused. I need someone to explain this behavior to me.
package main

type storageType struct {
    fc    chan func()
    value int
}

func newStorage() *storageType {
    p := storageType{
        fc: make(chan func()),
    }
    go p.run()
    return &p
}

func (p storageType) run() {
    for {
        (<-p.fc)()
    }
}

func (p *storageType) set(s int) {
    p.fc <- func() {
        p.value = s
    }
}

func (p storageType) get() int {
    res := make(chan int)
    p.fc <- func() {
        res <- p.value
    }
    return <-res
}

func main() {
    storage := newStorage()
    for i := 0; i < 1000; i++ {
        go storage.set(i)
        go storage.get()
    }
}
In main() the storage variable is of type *storageType. If storageType.get() has a value receiver, then storage.get() means (*storage).get().
The get() call has storageType as the receiver, so the storage pointer variable has to be dereferenced to make a copy (which will be used as the receiver value). This copying means the value of the pointed storageType struct must be read. But this read is not synchronized with the run() method, which reads and writes the struct (its value field).
If you change the receiver of get() to a pointer (of type *storageType), the receiver will again be a copy, but this time it is a copy of the pointer, not of the pointed struct. So no unsynchronized read of the struct happens.
See possible duplicate: Why does the method of a struct that does not read/write its contents still cause a race case?
First one: your main function doesn't wait for all goroutines to finish. All goroutines are forced to return when main does.
Look into using a sync.WaitGroup
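A hedged sketch combining both fixes, assuming the rest of the program above plus an import of "sync": give get() a pointer receiver so no copy of the struct is read, and make main wait for the goroutines it starts:
func (p *storageType) get() int {
    res := make(chan int)
    p.fc <- func() {
        res <- p.value
    }
    return <-res
}

func main() {
    storage := newStorage()
    var wg sync.WaitGroup
    for i := 0; i < 1000; i++ {
        wg.Add(2)
        go func(i int) {
            defer wg.Done()
            storage.set(i)
        }(i)
        go func() {
            defer wg.Done()
            storage.get()
        }()
    }
    wg.Wait()
}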

Why does this OrDone Channel Implementation receive twice from Done Channel?

Been reading through Concurrency in Go and it introduces a handy "or-done" channel.
TLDR; when you're working with a channel that you're not in control of (presumably from some other part of your system), the code can get a little ugly.
// Quite nice to read
for v := range myChan {
    // ... Do Stuff
}

// Not so nice
loop:
for {
    select {
    case <-done:
        break loop
    case maybeVal, ok := <-myChan:
        if !ok {
            return
        }
        // Do something with maybeVal
    }
}
The book offers a way to simplify this with the OrDone channel. Defined as follows. What I don't understand is why, in the nested select, we need to again receive from <-done.
orDone := func(done, c <-chan interface{}) <-chan interface{} {
    valStream := make(chan interface{})
    go func() {
        defer close(valStream)
        for {
            select {
            case <-done:
                return
            case v, ok := <-c:
                if !ok {
                    return
                }
                select {
                case valStream <- v:
                case <-done: // Why do we also need to receive on done here?
                }
            }
        }
    }()
    return valStream
}
This allows you to go back to your original for loop, enhancing readability - like so:
for val := range orDone(done, myChan) {
    // Once again, do something
}
Really just adding visibility to Peter's answer.
It's because the send on valStream itself may block if whoever is receiving from valStream loses interest.
In the situation where you have received a value from channel c, you enter the nested select. At this point, if we took out the second <-done, you would have this:
select {
case valStream <- v:
}
This will block indefinitely until the value is received from valStream, even if the done channel is closed. By adding the nested check, we allow ourselves to exit the select at either point.
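A small self-contained sketch of the scenario the inner <-done case guards against (my own example; orDone is lifted to a top-level function here so it compiles on its own): the consumer reads one value, then closes done while the producer keeps sending. Without the inner case, orDone's goroutine would stay blocked on valStream <- v forever; with it, the goroutine returns and valStream gets closed:
package main

import (
    "fmt"
    "time"
)

func orDone(done, c <-chan interface{}) <-chan interface{} {
    valStream := make(chan interface{})
    go func() {
        defer close(valStream)
        for {
            select {
            case <-done:
                return
            case v, ok := <-c:
                if !ok {
                    return
                }
                select {
                case valStream <- v:
                case <-done: // lets the goroutine exit while blocked on the send
                }
            }
        }
    }()
    return valStream
}

func main() {
    done := make(chan interface{})
    c := make(chan interface{})

    vals := orDone(done, c)

    go func() {
        c <- 1
        c <- 2 // orDone receives this, then blocks trying to forward it
    }()

    fmt.Println(<-vals) // the consumer takes one value...
    close(done)         // ...then loses interest and signals cancellation

    time.Sleep(100 * time.Millisecond)
    // By now the orDone goroutine has almost certainly observed done in the
    // inner select and returned; without that inner case it would be stuck
    // on `valStream <- v` for the rest of the program: a goroutine leak.
}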
An additional question: why is the body of that "case <-done:" empty?
orDone := func(done, c <-chan interface{}) <-chan interface{} {
    valStream := make(chan interface{})
    go func() {
        defer close(valStream)
        for {
            select {
            case <-done:
                return
            case v, ok := <-c:
                if !ok {
                    return
                }
                select {
                case valStream <- v:
                case <-done:
                    return // I think there should be a "return" here
                }
            }
        }
    }()
    return valStream
}

Check if someone has read from go channel

How can we set something like a listener on Go channels, so that we are notified when someone has read something from the channel?
Imagine we have a sequence number for channel entries and we want to decrement it when someone has read a value from our channel somewhere outside of our package.
Unbuffered channels hand off data synchronously, so you already know when the data is read. Buffered channels work similarly when the buffer is full, but otherwise they don't block the same way, so this approach wouldn't tell you quite the same thing. Depending on what your needs really are, consider also using tools like sync.WaitGroup.
ch = make(chan Data)
⋮
for {
    ⋮
    // make data available
    ch <- data
    // now you know it was read
    sequenceNumber--
    ⋮
}
You could create a channel relay mechanism, to capture read events in realtime.
So for example:
func relayer(in <-chan MyStruct) <-chan MyStruct {
    out := make(chan MyStruct) // non-buffered chan (see below)
    go func() {
        defer close(out)
        readCountLimit := 10
        for item := range in {
            out <- item
            // ^^^^ this will block until some worker has read from 'out'
            readCountLimit--
        }
    }()
    return out
}
Usage:
type MyStruct struct {
// put your data fields here
}
ch := make(chan MyStruct) // <- original channel - used by producer to write to
rch := relayer(ch) // <- relay channel - used to read from
// consumers
go worker("worker 1", rch)
go worker("worker 2", rch)
// producer
for { ch <- MyStruct{} }
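The worker function in the usage snippet is not shown in the answer; a minimal placeholder (my own, assuming fmt is imported) that just drains the relay channel could be:
func worker(name string, in <-chan MyStruct) {
    for range in {
        // process the item here
        fmt.Println(name, "read one item")
    }
}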
You can do it in manual mode: implement some sort of ACK marker on the message.
Something like this:
type Msg struct {
    Data int
    ack  bool
}

func (m *Msg) Ack() {
    m.ack = true
}

func (m *Msg) Acked() bool {
    return m.ack
}

func main() {
    ch := make(chan *Msg)
    msg := &Msg{Data: 1}
    // watcher: periodically checks whether the message has been acknowledged
    go func() {
        for {
            if msg.Acked() {
                // do smth
            }
            time.Sleep(10 * time.Second)
        }
    }()
    // send from another goroutine so main can receive below
    // (sending on an unbuffered channel with no receiver would block forever)
    go func() {
        ch <- msg
    }()
    for msg := range ch {
        msg.Ack()
    }
}
Code not tested.
You can also add some additional information to the Ack() method, say meta information about the package and func from which Ack() was called; this answer may be related: https://stackoverflow.com/a/35213181/3782382
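One caveat with the sketch above: ack is written by the goroutine that calls Ack() and read by the watcher goroutine without any synchronization, which is itself a data race. A hedged variant (assuming Go 1.19+ and an import of sync/atomic) stores the flag atomically:
type Msg struct {
    Data int
    ack  atomic.Bool // requires "sync/atomic"
}

func (m *Msg) Ack() {
    m.ack.Store(true)
}

func (m *Msg) Acked() bool {
    return m.ack.Load()
}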

What happens if you read a channel to nothing?

Can you read a goroutine channel into nothing? Where does this channel read go in this statement?
go func() {
    <-ctx.Done()
    logger.Errorf("canceled: %v", ctx.Err())
}()
Addition:
Would this code be any different if I used the blank identifier?
go func() {
    _ = <-ctx.Done()
    logger.Errorf("canceled: %v", ctx.Err())
}()
Yes, you can do that.
It does not need to go anywhere.
The purpose of the Done channel is usually just to signal the done event, so the value is not relevant and can be ignored.
It is the same as when you call a function and don't assign the return values to variables.
Consider this:
func getInt() int {
    return 1
}

func main() {
    getInt() // does not "go anywhere"
}
See this playground showing those examples:
https://play.golang.org/p/CA8P7gYpok
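For completeness, a small runnable sketch of the pattern in the question (my own example, not from the answer): the receive on ctx.Done() is used purely to block until cancellation, and the received zero value is discarded:
package main

import (
    "context"
    "fmt"
    "time"
)

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
    defer cancel()

    done := make(chan struct{})
    go func() {
        <-ctx.Done() // value discarded; we only care that the channel was closed
        fmt.Println("canceled:", ctx.Err())
        close(done)
    }()

    <-done
}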
