Can channels created in a for loop be interchangeably used by subroutines running concurrently from that for loop?
Pseudocode is below:
for i := range Map {
channel := make(chan my_type, buff_size)
go subroutine(Map[i], channel)
}
func subroutine(name valueType, channel channelType) {
// Stuff here
}
Is there a way were subroutine(Map[0]) for example, could access another channel created during another iteration of the for loop, i.e., channel of subroutine(Map[1])?
Context: I'm currently working on a project where I have to simulate different populations of cells. Each cell has the ability to divide, differentiate, etc. To replicate the real system, the different populations have to run concurrently with one another. The issue is that I have to insert/remove a cell of a specific population type while working on a different population, this is where the channels come into play. I was thinking of running the populations concurrently each of them having an associated channel. And this is why I'm asking if we can use channels created in different for loop iterations.
Here is some of my code to support my context description with comments explaining the different elements. I've included channels, but I have no idea if it works. I'm hoping it helps with understanding what I'm trying to do:
func main() {
// Where envMap is a map[int]Population and Population is a struct
envMap := initialiseEnvironment(envSetupInfo)
// General simulation loop
for i := 0; i < 10; i++ {
// Loop to go through all envMap elements in parallel
for eachKey := range envMap {
channel := make(chan type)
go simulateEnv(envMap[eachKey])
}
}
}
func simulateEnv(cellPopulation Population, channel chan) {
// Each Population has a map[int]Cell where Cell is a struct with cell properties
cellMap := cellPopulation.cellMap
for eachCell := range cellMap {
go divisionTransition(eachCell, channel)
}
}
Assuming each element of the map is a struct, you could make a field to hold references to other channels. For example, if you wanted each element of your map to have a reference to the previous channel, you could do something like this:
// assumes that Map[whatever] is a struct with a "lastChan" field
var lastRef = chan my_type
for i := range Map {
channel := make(chan my_type, buff_size)
if(lastRef != nil){
Map[i].lastChan = lastRef
}
go subroutine(Map[i], channel)
lastRef = channel
}
func subroutine(name valueType, channel channelType) {
//stuff here can access the previous channel with name.lastChan
}
Depending on what you want to do and which channels you need access to, you may wish to play around with the looping or even do mutliple loops.
Related
My experience working with Go is recent and in reviewing some code, I have seen that while it is write-protected, there is a problem with reading the data. Not with the reading itself, but with possible modifications that can occur between the reading and the modification of the slice.
type ConcurrentSlice struct {
sync.RWMutex
items []Item
}
type Item struct {
Index int
Value Info
}
type Info struct {
Name string
Labels map[string]string
Failure bool
}
As mentioned, the writing is protected in this way:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
found := false
i := 0
for inList := range cs.Iter() {
if item.Name == inList.Value.Name{
cs.items[i] = item
found = true
}
i++
}
if !found {
cs.Lock()
defer cs.Unlock()
cs.items = append(cs.items, item)
}
}
func (cs *ConcurrentSlice) Iter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
cs.Lock()
defer cs.Unlock()
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
But between collecting the content of the slice and modifying it, modifications can occur.It may be that another routine modifies the same slice and when it is time to assign a value, it no longer exists: slice[i] = item
What would be the right way to deal with this?
I have implemented this method:
func GetList() *ConcurrentSlice {
if list == nil {
denylist = NewConcurrentSlice()
return denylist
}
return denylist
}
And I use it like this:
concurrentSlice := GetList()
concurrentSlice.UpdateOrAppend(item)
But I understand that between the get and the modification, even if it is practically immediate, another routine may have modified the slice. What would be the correct way to perform the two operations atomically? That the slice I read is 100% the one I modify. Because if I try to assign an item to a index that no longer exists, it will break the execution.
Thank you in advance!
The way you are doing the blocking is incorrect, because it does not ensure that the items you return have not been removed. In case of an update, the array would still be at least the same length.
A simpler solution that works could be the following:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
found := false
i := 0
cs.Lock()
defer cs.Unlock()
for _, it := range cs.items {
if item.Name == it.Name{
cs.items[i] = it
found = true
}
i++
}
if !found {
cs.items = append(cs.items, item)
}
}
Use a sync.Map if the order of the values is not important.
type Items struct {
m sync.Map
}
func (items *Items) Update(item Info) {
items.m.Store(item.Name, item)
}
func (items *Items) Range(f func(Info) bool) {
items.m.Range(func(key, value any) bool {
return f(value.(Info))
})
}
Data structures 101: always pick the best data structure for your use case. If you’re going to be looking up objects by name, that’s EXACTLY what map is for. If you still need to maintain the order of the items, you use a treemap
Concurrency 101: like transactions, your mutex should be atomic, consistent, and isolated. You’re failing isolation here because the data structure read does not fall inside your mutex lock.
Your code should look something like this:
func {
mutex.lock
defer mutex.unlock
check map or treemap for name
if exists update
else add
}
After some tests, I can say that the situation you fear can indeed happen with sync.RWMutex. I think it could happen with sync.Mutex too, but I can't reproduce that. Maybe I'm missing some informations, or maybe the calls are in order because they all are blocked and the order they redeem the right to lock is ordered in some way.
One way to keep your two calls safe without other routines getting in 'conflict' would be to use an other mutex, for every task on that object. You would lock that mutex before your read and write, and release it when you're done. You would also have to use that mutex on any other call that write (or read) to that object. You can find an implementation of what I'm talking about here in the main.go file. In order to reproduce the issue with RWMutex, you can simply comment the startTask and the endTask calls and the issue is visible in the terminal output.
EDIT : my first answer was wrong as I misinterpreted a test result, and fell in the situation described by OP.
tl;dr;
If ConcurrentSlice is to be used from a single goroutine, the locks are unnecessary, because the way algorithm written there is not going to be any concurrent read/writes to slice elements, or the slice.
If ConcurrentSlice is to be used from multiple goroutines, existings locks are not sufficient. This is because UpdateOrAppend may modify slice elements concurrently.
A safe version woule need two versions of Iter:
This can be called by users of ConcurrentSlice, but it cannot be called from `UpdateOrAppend:
func (cs *ConcurrentSlice) Iter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
cs.RLock()
defer cs.RUnlock()
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
and this is only to be called from UpdateOrAppend:
func (cs *ConcurrentSlice) internalIter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
// No locking
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
And UpdateOrAppend should be synchronized at the top level:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
cs.Lock()
defer cs.Unlock()
....
}
Here's the long version:
This is an interesting piece of code. Based on my understanding of the go memory model, the mutex lock in Iter() is only necessary if there is another goroutine working on this code, and even with that, there is a possible race in the code. However, UpdateOrAppend only modifies elements of the slice with lower indexes than what Iter is working on, so that race never manifests itself.
The race can happen as follows:
The for-loop in iter reads element 0 of the slice
The element is sent through the channel. Thus, the slice receive happens after the first step.
The receiving end potentially updates element 0 of the slice. There is no problem up to here.
Then the sending goroutine reads element 1 of the slice. This is when a race can happen. If step 3 updated index 1 of the slice, the read at step 4 is a race. That is: if step 3 reads the update done by step 4, it is a race. You can see this if you start with i:=1 in UpdateOrAppend, and running it with the -race flag.
But UpdateOrAppend always modifies slice elements that are already seen by Iter when i=0, so this code is safe, even without the lock.
If there will be other goroutines accessing and modifying the structure, you need the Mutex, but you need it to protect the complete UpdateOrAppend method, because only one goroutine should be allowed to run that. You need the mutex to protect the potential updates in the first for-loop, and that mutex has to also include the slice append case, because that may actually modify the slice of the underlying object.
If Iter is only called from UpdateOrAppend, then this single mutex should be sufficient. If however Iter can be called from multiple goroutines, then there is another race possibility. If one UpdateOrAppend is running concurrently with multiple Iter instances, then some of those Iter instances will read from the modified slice elements concurrently, causing a race. So, it should be such that multiple Iters can only run if there are no UpdateOrAppend calls. That is a RWMutex.
But Iter can be called from UpdateOrAppend with a lock, so it cannot really call RLock, otherwise it is a deadlock.
Thus, you need two versions of Iter: one that can be called outside UpdateOrAppend, and that issues RLock in the goroutine, and another that can only be called from UpdateOrAppend and does not call RLock.
I wrote a program to compute the similarity scores between a query and a target document. The sketch is like below:
type Dictionary struct {
Documents map[int][]string
Queries map[int][]string
}
type Similarity struct {
QID int
DocID int
Sim float64
}
func (dict * Dictionary) CalScore(qID, docID int) float64 {
query := dict.Queries[qID]
document := dict.Documents[docID]
score := calculation(query, document) // some counting and calculation
// for example: count how many words in query are also in document and so on,
// like tf-idf things.
return score
}
// Calculate the similarity scores for each group.
func SimWorker(index int, dict *Dictionary, simsList *[][]Similarity, wg *sync.WaitGroup) {
defer wg.Done()
for i, sim := range (*simsList)[index] {
// Retrieving words from Dictionary and compute, pretty time consuming.
(*simsList)[index][i].Sim = dict.CalScore(dict.Queries[sim.QID], dict.Documents[sim.DocID])
}
}
func main() {
dict := Dictionary{
// All data filled in.
}
simsList := [][]Similarity{
// Slice of groups of structs containing
// pairs of query id and doc id.
// All sims scores are 0.0 initially.
}
var wg sync.WaitGroup
for i := range simsList {
wg.Add(1)
go SimWorker(i, &dict, &simsList, &wg)
}
wg.Wait() // wait until all goroutines finish
// Next procedures to the simsList
}
Basically I have a slice of groups of query-doc id pairs, query ids within each group are the same while doc ids are all different. The procedure is pretty straightforward, I just get the strings from the dictionary, then compute the score applying some algorithms.
Firstly I did all these in sequence (not using goroutines) and it took several minutes to compute the scores for each group and several hours in total. Then I expected some speed improvements by introducing goroutines like above. I created a goroutine for each group since they access different parts in the Dictionary and [][]Similarity. But it turned out that the speed didn't improve and somewhat decreased a little bit (the number of goroutines I'm using is around 10). Why did this happen and how do I improve the program to actually speed up the computation?
add this line to your entry point
runtime.GOMAXPROCS(runtime.NumCPU())
it allows to use all core in your PC.
With out this line, your programme will act concurrently, but not parallely
GoDoc
I am learning concurrency in go and how it works.
What I am trying to do ?
Loop through slice of data
Create struct for required/needed data
Create channel for that struct
Call worker func using go rutine and pass that channel to that rutine
Using data from channel do some processing
Set the processed output back into channel
Wait in main thread to get output from all the channels which we kicked off
Code Which I tried
package main
import (
"fmt"
"github.com/pkg/errors"
"time"
)
type subject struct {
Name string
Class string
StartDate time.Time
EndDate time.Time
}
type workerData struct {
Subject string
Class string
Result string
Error error
}
func main () {
// Creating test data
var subjects []subject
st,_ := time.Parse("01/02/2016","01/01/2015")
et,_ := time.Parse("01/02/2016","01/01/2016")
s1 := subject{Name:"Math", Class:"3", StartDate:st,EndDate:et }
s2 := subject{Name:"Geo", Class:"3", StartDate:st,EndDate:et }
s3 := subject{Name:"Bio", Class:"3", StartDate:st,EndDate:et }
s4 := subject{Name:"Phy", Class:"3", StartDate:st,EndDate:et }
s5 := subject{Name:"Art", Class:"3", StartDate:st,EndDate:et }
subjects = append(subjects, s1)
subjects = append(subjects, s2)
subjects = append(subjects, s3)
subjects = append(subjects, s4)
subjects = append(subjects, s5)
c := make(chan workerData) // I am sure this is not how I should be creating channel
for i := 0 ; i< len(subjects) ; i++ {
go worker(c)
}
for _, v := range subjects {
// Setting required data in channel
data := workerData{Subject:v.Name, Class:v.Class}
// set the data and start the routine
c <- data // I think this will update data for all the routines ? SO how should create separate channel for each routine
}
// I want to wait till all the routines set the data in channel and return the data from workers.
for {
select {
case data := <- c :
fmt.Println(data)
}
}
}
func worker (c chan workerData) {
data := <- c
// This can be any processing
time.Sleep(100 * time.Millisecond)
if data.Subject != "Math" {
data.Result = "Pass"
} else {
data.Error = errors.New("Subject not found")
}
fmt.Println(data.Subject)
// returning processed data and error to channel
c <- data
// Rightfully this closes channel and here after I get error send on Closed channel.
close(c)
}
Playgorund Link - https://play.golang.org/p/hs1-B1UR98r
Issue I am Facing
I am not sure how to create different channel for each data item. The way I am currently doing will update the channel data for all routines. I want to know is there way to create diffrent channel for each data item in loop and pass that to the go rutine. And then wait in main rutine to get the result back from rutines from all channels.
Any pointers/ help would be great ? If any confusion feel free to comment.
"// I think this will update data for all the routines ?"
A channel (to simplify) is not a data structure to store data.
It is a structure to send and receive data over different goroutines.
As such, notice that your worker function is doing send and receive on the same channel within each goroutine instances. If you were having only one instance of such worker, this would deadlock (https://golang.org/doc/articles/race_detector.html).
In the version of the code you posted, for a beginner this might seem to work because you have many workers exchanging works to each other. But it is wrong for a correct program.
As a consequence, if a worker can not read and write the same channel, then it must consume a specific writable channel to send its results to some other routines.
// I want to wait till all the routines set the data in channel and
return the data from workers.
This is part of the synchronization mechanisms required to ensure that a pusher waits until all its workers has finished their job before proceeding further. (this blog post talks about it https://medium.com/golangspec/synchronized-goroutines-part-i-4fbcdd64a4ec)
// Rightfully this closes channel and here after I get error send on
Closed channel.
Take care that you have n routines of workers executing in parallel. The first of this worker to reach the end of its function will close the channel, making it unwritable to other workers, and false signaling its end to main.
Normally one use the close statement on the writer side to indicate that there is no more data into the channel. To indicate it has ended. This signal is consumed by readers to quit their read-wait operation of the channel.
As an example, lets review this loop
for {
select {
case data := <- c :
fmt.Println(data)
}
}
it is bad, really bad.
It is an infinite loop with no exit statement
The select is superfluous and does not contain exit statement, remember that a read on a channel is a blocking operation.
It is a bad rewrite of a standard pattern provided by the language, the range loop over a channel
The range loop over a channel is very simply written
for data := range c {
fmt.Println(data)
}
This pattern has one great advantage, it automatically detect a closed channel to exit the loop! letting you loop over only the relevant data to process. It is also much more succint.
Also, your worker is a awkward in that it read and write only one element before quitting.
Spawning go routines is cheap, but not free. You should always evaluate the trade-off between the costs of async processing and its actual workload.
Overall, your code should be closer to what is demonstrated here
https://gobyexample.com/worker-pools
While trying few experiments in channels, I came up with below code:
var strChannel = make(chan string, 30)
var mutex = &sync.Mutex{}
func main() {
go sampleRoutine()
for i := 0; i < 10; i++ {
mutex.Lock()
strChannel <- strconv.FormatInt(int64(i), 10)
mutex.Unlock()
time.Sleep(1 * time.Second)
}
time.Sleep(10 * time.Second)
}
func sampleRoutine() {
/* A: for msg := range strChannel{*/
/* B: for {
msg := <-strChannel*/
log.Println("got message ", msg, strChannel)
if msg == "3" {
mutex.Lock()
strChannel = make(chan string, 20)
mutex.Unlock()
}
}
}
Basically here while listening to a given channel, I am assigning the channel variable to a new channel in a specific condition (here when msg == 3).
When I use the code in comment block B it works as expected, i.e. the loop moves on to the newly created channel and prints 4-10.
However comment block A which I believe is just a different way to write the loop doesn't work i.e. after printing "3" it stops.
Could someone please let me know the reason for this behavior?
And also is code like this where a routine listening on a channel, creating a new one safe?
In Go, for statement evaluate the value on the right side of range before the loop begins.
That means changing the value of the variable on the right side of range will take no effects. So in your code, in Variant A, msg is ever-iterating over the orignal channel and never changed. In Varaint B, it works as intended as the channel is being evaluated per iteration.
The concept is a little bit tricky. It does not mean you cannot modify items of the slice or map on the right side of range. If you look deeper into it, you will find that in Go, map and slice stores a pointer, and modify its item does not change that pointer, so it has effects.
It is even more trickier in case of array. Modifying item of an array on the right side of range has no effects. This is due to Go's mechanism about storing array as a value.
Playground examples: https://play.golang.org/p/wzPfGHFYrnv
Sending and reading from a channel do not need to be protected by a mutex: They act as synchronisation primitives by themself (that's one of the main ideas behind sending/receiving) on a channel.
There is no difference between variant A and B because you do not close the channel. But...
Resigning a new channel to strChannel while iterating the old channel is wrong. Don't do that. Not even with the B variant. Think about range strChannel as "please range over the values in the channel currently stored in the variable strChannel". This range will "continue on the original channel" and storing a new channel in the same variable doesn't change this. Avoid such code!
When you use a map in a program with concurrent access, is there any need to use a mutex in functions to read values?
Multiple readers, no writers is okay:
https://groups.google.com/d/msg/golang-nuts/HpLWnGTp-n8/hyUYmnWJqiQJ
One writer, no readers is okay. (Maps wouldn't be much good otherwise.)
Otherwise, if there is at least one writer and at least one more either writer or reader, then all readers and writers must use synchronization to access the map. A mutex works fine for this.
sync.Map has merged to Go master as of April 27, 2017.
This is the concurrent Map we have all been waiting for.
https://github.com/golang/go/blob/master/src/sync/map.go
https://godoc.org/sync#Map
I answered your question in this reddit thread few days ago:
In Go, maps are not thread-safe. Also, data requires locking even for
reading if, for example, there could be another goroutine that is
writing the same data (concurrently, that is).
Judging by your clarification in the comments, that there are going to be setter functions too, the answer to your question is yes, you will have to protect your reads with a mutex; you can use a RWMutex. For an example you can look at the source of the implementation of a table data structure (uses a map behind the scenes) which I wrote (actually the one linked in the reddit thread).
You could use concurrent-map to handle the concurrency pains for you.
// Create a new map.
map := cmap.NewConcurrentMap()
// Add item to map, adds "bar" under key "foo"
map.Add("foo", "bar")
// Retrieve item from map.
tmp, ok := map.Get("foo")
// Checks if item exists
if ok == true {
// Map stores items as interface{}, hence we'll have to cast.
bar := tmp.(string)
}
// Removes item under key "foo"
map.Remove("foo")
if you only have one writer, then you can probably get away with using an atomic Value. The following is adapted from https://golang.org/pkg/sync/atomic/#example_Value_readMostly (the original uses locks to protect writing, so supports multiple writers)
type Map map[string]string
var m Value
m.Store(make(Map))
read := func(key string) (val string) { // read from multiple go routines
m1 := m.Load().(Map)
return m1[key]
}
insert := func(key, val string) { // update from one go routine
m1 := m.Load().(Map) // load current value of the data structure
m2 := make(Map) // create a new map
for k, v := range m1 {
m2[k] = v // copy all data from the current object to the new one
}
m2[key] = val // do the update that we need (can delete/add/change)
m.Store(m2) // atomically replace the current object with the new one
// At this point all new readers start working with the new version.
// The old version will be garbage collected once the existing readers
// (if any) are done with it.
}
Why no made use of Go concurrency model instead, there is a simple example...
type DataManager struct {
/** This contain connection to know dataStore **/
m_dataStores map[string]DataStore
/** That channel is use to access the dataStores map **/
m_dataStoreChan chan map[string]interface{}
}
func newDataManager() *DataManager {
dataManager := new(DataManager)
dataManager.m_dataStores = make(map[string]DataStore)
dataManager.m_dataStoreChan = make(chan map[string]interface{}, 0)
// Concurrency...
go func() {
for {
select {
case op := <-dataManager.m_dataStoreChan:
if op["op"] == "getDataStore" {
storeId := op["storeId"].(string)
op["store"].(chan DataStore) <- dataManager.m_dataStores[storeId]
} else if op["op"] == "getDataStores" {
stores := make([]DataStore, 0)
for _, store := range dataManager.m_dataStores {
stores = append(stores, store)
}
op["stores"].(chan []DataStore) <- stores
} else if op["op"] == "setDataStore" {
store := op["store"].(DataStore)
dataManager.m_dataStores[store.GetId()] = store
} else if op["op"] == "removeDataStore" {
storeId := op["storeId"].(string)
delete(dataManager.m_dataStores, storeId)
}
}
}
}()
return dataManager
}
/**
* Access Map functions...
*/
func (this *DataManager) getDataStore(id string) DataStore {
arguments := make(map[string]interface{})
arguments["op"] = "getDataStore"
arguments["storeId"] = id
result := make(chan DataStore)
arguments["store"] = result
this.m_dataStoreChan <- arguments
return <-result
}
func (this *DataManager) getDataStores() []DataStore {
arguments := make(map[string]interface{})
arguments["op"] = "getDataStores"
result := make(chan []DataStore)
arguments["stores"] = result
this.m_dataStoreChan <- arguments
return <-result
}
func (this *DataManager) setDataStore(store DataStore) {
arguments := make(map[string]interface{})
arguments["op"] = "setDataStore"
arguments["store"] = store
this.m_dataStoreChan <- arguments
}
func (this *DataManager) removeDataStore(id string) {
arguments := make(map[string]interface{})
arguments["storeId"] = id
arguments["op"] = "removeDataStore"
this.m_dataStoreChan <- arguments
}