Data race with list.List concurrent access with mutexes - go

I'm getting a data race and I can't quite figure out why. Running my tests with the -race flag, I've narrowed it down to accessing a list.List while another goroutine is reading from it, but my mutexes don't seem to do anything.
I have a number of *list.Lists inside of an array like so:
type MyList struct {
mutex sync.Mutex
*list.List
}
type SomeObj struct {
data string
}
var myListOfLists [10]MyList
I'm reading and writing from the list like so:
list := myListOfLists[someIndex]
list.mutex.Lock()
for e := list.Front(); e != nil; e = e.Next() {
if (...) {
list.MoveToFront(e)
}
}
list.mutex.Unlock()
and in another goroutine I'm also reading from it to build a full copy to return:
var fullCopy []*SomeObj
list := myListOfLists[someIndex]
list.mutex.Lock()
for e := list.Front(); e != nil; e = e.Next() {
fullCopy = append(fullCopy, e.Value.(SomeObj))
}
list.mutex.Unlock()

The statement list := myListOfLists[someIndex] copies the array element to variable list. This copies the mutex, thus preventing the mutex from working. The go vet command reports this problem.
You can avoid the copy by using a pointer to the array element:
list := &myListOfLists[someIndex]
Another approach is to use an array of pointers to MyList. While you are at it, you might as well use a list value instead of a list pointer in MyList:
type MyList struct {
mutex sync.Mutex
list.List
}
var myListOfLists [10]*MyList
for i := range myListOfLists {
myListOfLists[i] = &MyList{}
}
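As a minimal runnable sketch of the pointer-to-element fix: the helper names pushBack, length and fill are made up for illustration and are not from the question. Every access goes through &myListOfLists[index], so all goroutines lock the same mutex rather than a copy of it.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// MyList pairs a list with the mutex that guards it.
type MyList struct {
	mutex     sync.Mutex
	list.List // the zero value of list.List is an empty list ready to use
}

var myListOfLists [10]MyList

// pushBack (hypothetical helper) takes a pointer to the array element,
// so every caller locks the same mutex instead of a private copy.
func pushBack(index int, v string) {
	l := &myListOfLists[index]
	l.mutex.Lock()
	defer l.mutex.Unlock()
	l.PushBack(v)
}

// length (hypothetical helper) reads the list length under the same lock.
func length(index int) int {
	l := &myListOfLists[index]
	l.mutex.Lock()
	defer l.mutex.Unlock()
	return l.Len()
}

// fill appends to the list at index from n goroutines and returns
// the resulting length.
func fill(index, n int) int {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			pushBack(index, "x")
		}()
	}
	wg.Wait()
	return length(index)
}

func main() {
	fmt.Println(fill(0, 100))
}
```

Running this with the -race flag should stay quiet, whereas replacing &myListOfLists[index] with myListOfLists[index] would reintroduce the copied mutex that go vet warns about.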

Related

Lock slice before reading and modifying it

I only recently started working with Go, and while reviewing some code I noticed that, although writes are protected, there is a problem with reads. The issue is not the read itself, but the modifications that can occur between reading the slice and modifying it.
type ConcurrentSlice struct {
sync.RWMutex
items []Item
}
type Item struct {
Index int
Value Info
}
type Info struct {
Name string
Labels map[string]string
Failure bool
}
As mentioned, the writing is protected in this way:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
found := false
i := 0
for inList := range cs.Iter() {
if item.Name == inList.Value.Name{
cs.items[i] = item
found = true
}
i++
}
if !found {
cs.Lock()
defer cs.Unlock()
cs.items = append(cs.items, item)
}
}
func (cs *ConcurrentSlice) Iter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
cs.Lock()
defer cs.Unlock()
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
But between collecting the content of the slice and modifying it, modifications can occur. Another goroutine may modify the same slice, so that by the time a value is assigned, the index no longer exists: slice[i] = item
What would be the right way to deal with this?
I have implemented this method:
func GetList() *ConcurrentSlice {
if denylist == nil {
denylist = NewConcurrentSlice()
return denylist
}
return denylist
}
And I use it like this:
concurrentSlice := GetList()
concurrentSlice.UpdateOrAppend(item)
But I understand that between the get and the modification, even if it is practically immediate, another goroutine may have modified the slice. What would be the correct way to perform the two operations atomically, so that the slice I read is guaranteed to be the one I modify? If I try to assign an item to an index that no longer exists, the program will panic.
Thank you in advance!
The way you are doing the locking is incorrect, because it does not ensure that the items you return have not been removed. In the case of an update, the slice would still have at least the same length.
A simpler solution that works could be the following:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
found := false
i := 0
cs.Lock()
defer cs.Unlock()
for _, it := range cs.items {
if item.Name == it.Name{
cs.items[i] = item // store the new item, not the element just read
found = true
}
i++
}
if !found {
cs.items = append(cs.items, item)
}
}
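A fuller, runnable sketch of the same locking scheme might look like the following. It assumes the slice stores Info values directly (the question mixes Item, Info and ScalingInfo, so the types are simplified here), it stops at the first matching name, and the Snapshot method is a hypothetical addition for reading the contents safely.

```go
package main

import (
	"fmt"
	"sync"
)

// Info mirrors the question's type, minus the Labels map for brevity.
type Info struct {
	Name    string
	Failure bool
}

// ConcurrentSlice guards items with a single RWMutex.
type ConcurrentSlice struct {
	sync.RWMutex
	items []Info
}

// UpdateOrAppend holds the write lock across both the search and the
// modification, so no other goroutine can change items in between.
func (cs *ConcurrentSlice) UpdateOrAppend(item Info) {
	cs.Lock()
	defer cs.Unlock()
	for i := range cs.items {
		if cs.items[i].Name == item.Name {
			cs.items[i] = item
			return
		}
	}
	cs.items = append(cs.items, item)
}

// Snapshot (hypothetical) returns a copy of the items taken under the
// read lock, so callers never touch the shared backing array.
func (cs *ConcurrentSlice) Snapshot() []Info {
	cs.RLock()
	defer cs.RUnlock()
	return append([]Info(nil), cs.items...)
}

func main() {
	var cs ConcurrentSlice
	cs.UpdateOrAppend(Info{Name: "a"})
	cs.UpdateOrAppend(Info{Name: "b"})
	cs.UpdateOrAppend(Info{Name: "a", Failure: true}) // updates, does not append
	fmt.Println(cs.Snapshot())
}
```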
Use a sync.Map if the order of the values is not important.
type Items struct {
m sync.Map
}
func (items *Items) Update(item Info) {
items.m.Store(item.Name, item)
}
func (items *Items) Range(f func(Info) bool) {
items.m.Range(func(key, value any) bool {
return f(value.(Info))
})
}
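Data structures 101: always pick the best data structure for your use case. If you're going to be looking up objects by name, that's EXACTLY what a map is for. If you still need to maintain the order of the items, use a tree map (an ordered map; Go's standard library does not include one, so it would have to come from a third-party package or your own implementation).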
Concurrency 101: like transactions, your mutex should be atomic, consistent, and isolated. You’re failing isolation here because the data structure read does not fall inside your mutex lock.
Your code should look something like this:
func {
mutex.lock
defer mutex.unlock
check map or treemap for name
if exists update
else add
}
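In Go, the pseudocode above could be sketched roughly as follows, using a plain map guarded by a sync.Mutex. The Registry type and its method names are invented for this example, and Info is trimmed down from the question's type.

```go
package main

import (
	"fmt"
	"sync"
)

// Info is a trimmed-down version of the question's type.
type Info struct {
	Name    string
	Failure bool
}

// Registry (hypothetical name) is a mutex-guarded map keyed by Info.Name.
type Registry struct {
	mu    sync.Mutex
	items map[string]Info
}

// NewRegistry returns an empty, ready-to-use Registry.
func NewRegistry() *Registry {
	return &Registry{items: make(map[string]Info)}
}

// Upsert updates the entry if the name exists and adds it otherwise.
// The lookup and the write happen under the same lock.
func (r *Registry) Upsert(item Info) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.items[item.Name] = item
}

// Get returns the entry stored under name, if any.
func (r *Registry) Get(name string) (Info, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	item, ok := r.items[name]
	return item, ok
}

// Len returns the number of stored entries.
func (r *Registry) Len() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.items)
}

func main() {
	r := NewRegistry()
	r.Upsert(Info{Name: "a"})
	r.Upsert(Info{Name: "a", Failure: true}) // same key: update, not add
	item, ok := r.Get("a")
	fmt.Println(r.Len(), item, ok)
}
```

Because a map assignment already covers both the "exists, update" and the "else add" branch, the method body collapses to a single statement.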
After some tests, I can say that the situation you fear can indeed happen with sync.RWMutex. I think it could happen with sync.Mutex too, but I can't reproduce that. Maybe I'm missing some information, or maybe the calls happen in order because they are all blocked and the order in which they acquire the lock is deterministic in some way.
One way to keep your two calls safe without other goroutines getting in conflict would be to use another mutex for every task on that object. You would lock that mutex before your read and write, and release it when you're done. You would also have to use that mutex on any other call that writes to (or reads from) that object. You can find an implementation of what I'm talking about here in the main.go file. To reproduce the issue with RWMutex, you can simply comment out the startTask and endTask calls, and the issue becomes visible in the terminal output.
EDIT : my first answer was wrong as I misinterpreted a test result, and fell in the situation described by OP.
tl;dr;
If ConcurrentSlice is to be used from a single goroutine, the locks are unnecessary, because the way the algorithm is written, there are not going to be any concurrent reads/writes to the slice elements or to the slice.
If ConcurrentSlice is to be used from multiple goroutines, the existing locks are not sufficient. This is because UpdateOrAppend may modify slice elements concurrently.
A safe version would need two versions of Iter:
This one can be called by users of ConcurrentSlice, but it cannot be called from UpdateOrAppend:
func (cs *ConcurrentSlice) Iter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
cs.RLock()
defer cs.RUnlock()
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
and this is only to be called from UpdateOrAppend:
func (cs *ConcurrentSlice) internalIter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
// No locking
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
And UpdateOrAppend should be synchronized at the top level:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
cs.Lock()
defer cs.Unlock()
....
}
Here's the long version:
This is an interesting piece of code. Based on my understanding of the go memory model, the mutex lock in Iter() is only necessary if there is another goroutine working on this code, and even with that, there is a possible race in the code. However, UpdateOrAppend only modifies elements of the slice with lower indexes than what Iter is working on, so that race never manifests itself.
The race can happen as follows:
1. The for-loop in Iter reads element 0 of the slice.
2. The element is sent through the channel. Thus, the channel receive happens after the first step.
3. The receiving end potentially updates element 0 of the slice. There is no problem up to here.
4. Then the sending goroutine reads element 1 of the slice. This is when a race can happen. If step 3 updated index 1 of the slice, the read at step 4 is a race. That is: if step 4 reads the update done by step 3, it is a race. You can see this if you start with i := 1 in UpdateOrAppend and run it with the -race flag.
But UpdateOrAppend always modifies slice elements that are already seen by Iter when i=0, so this code is safe, even without the lock.
If there will be other goroutines accessing and modifying the structure, you need the Mutex, but you need it to protect the complete UpdateOrAppend method, because only one goroutine should be allowed to run that. You need the mutex to protect the potential updates in the first for-loop, and that mutex has to also include the slice append case, because that may actually modify the slice of the underlying object.
If Iter is only called from UpdateOrAppend, then this single mutex should be sufficient. If however Iter can be called from multiple goroutines, then there is another race possibility. If one UpdateOrAppend is running concurrently with multiple Iter instances, then some of those Iter instances will read from the modified slice elements concurrently, causing a race. So, it should be such that multiple Iters can only run if there are no UpdateOrAppend calls. That is a RWMutex.
But Iter can be called from UpdateOrAppend with a lock, so it cannot really call RLock, otherwise it is a deadlock.
Thus, you need two versions of Iter: one that can be called outside UpdateOrAppend, and that issues RLock in the goroutine, and another that can only be called from UpdateOrAppend and does not call RLock.
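The "public locked iterator plus internal unlocked iterator" split can also be sketched with callbacks instead of channels. This is a swapped-in variant, not the code from the answer: the names Range and rangeLocked are assumptions, and Info is simplified from the question.

```go
package main

import (
	"fmt"
	"sync"
)

// Info is a trimmed-down version of the question's type.
type Info struct {
	Name    string
	Failure bool
}

type ConcurrentSlice struct {
	mu    sync.RWMutex
	items []Info
}

// Range is the public iterator: it takes the read lock itself.
// The callback must not call UpdateOrAppend, or it would deadlock.
func (cs *ConcurrentSlice) Range(f func(index int, item Info) bool) {
	cs.mu.RLock()
	defer cs.mu.RUnlock()
	cs.rangeLocked(f)
}

// rangeLocked is the internal iterator: it does no locking, so the
// caller must already hold cs.mu (for reading or for writing).
func (cs *ConcurrentSlice) rangeLocked(f func(index int, item Info) bool) {
	for i, item := range cs.items {
		if !f(i, item) {
			return
		}
	}
}

// UpdateOrAppend holds the write lock for the whole operation and
// uses only the internal iterator.
func (cs *ConcurrentSlice) UpdateOrAppend(item Info) {
	cs.mu.Lock()
	defer cs.mu.Unlock()
	found := -1
	cs.rangeLocked(func(i int, existing Info) bool {
		if existing.Name == item.Name {
			found = i
			return false // stop at the first match
		}
		return true
	})
	if found >= 0 {
		cs.items[found] = item
		return
	}
	cs.items = append(cs.items, item)
}

// Len counts the items through the public iterator.
func (cs *ConcurrentSlice) Len() int {
	n := 0
	cs.Range(func(int, Info) bool { n++; return true })
	return n
}

func main() {
	var cs ConcurrentSlice
	cs.UpdateOrAppend(Info{Name: "a"})
	cs.UpdateOrAppend(Info{Name: "a", Failure: true})
	cs.UpdateOrAppend(Info{Name: "b"})
	fmt.Println(cs.Len())
}
```

A callback-based iterator also avoids the goroutine that the channel version leaves blocked, still holding the lock, if a consumer stops reading early.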

Concurrency-safe map of slices

I have a type that contains a sync.Map where the key in the map is a string and the value is a slice. My code for inserting items into the map is as follows:
newList := []*Item{item}
if result, ok := map.LoadOrStore(key, newList); ok {
resultList := result.([]*Item)
resultList = append(resultList, item)
map.Store(key, resultList)
}
This is not concurrency-safe because the slice can be loaded and modified by multiple calls concurrently. This code is very fragile, so I've attempted to modify it to be:
newList := []*Item{item}
if result, ok := map.LoadOrStore(key, &newList); ok {
resultList := result.(*[]*Item)
*resultList = append(*resultList, item)
}
All this does is make the issues occur deterministically. So, I'm trying to find a way to have a map-of-slices that can be added to concurrently. My instinct is to use sync.Mutex to lock the list while I'm adding to it but in order to maintain the concurrent access to the sync.Map I would need to create a map of sync.Mutex objects as well, like this:
newLock := sync.Mutex{}
raw, _ := lockMap.LoadOrStore(key, &newLock)
lock := raw.(*sync.Mutex)
newList := []*Item{item}
if result, ok := map.LoadOrStore(key, &newList); ok {
lock.Lock()
resultList := result.(*[]*Item)
*resultList = append(*resultList, item)
lock.Unlock()
}
Is there an easier way to go about this?
It isn't very different from your current plan, but you could save yourself the trouble of handling two maps by using a struct with an embedded mutex for the values of the map.
The struct would look something like this:
type SafeItems struct {
sync.Mutex
Items []*Item
}
And it could be used like this:
newMapEntry := SafeItems{Items: []*Item{item}}
if result, ok := map.LoadOrStore(key, &newMapEntry); ok {
mapEntry := result.(*SafeItems)
mapEntry.Lock()
mapEntry.Items = append(mapEntry.Items, item)
mapEntry.Unlock()
}
It's not a huge change but it does provide some syntactic sugar.
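Put together as a runnable sketch, the approach might look like this. The Index wrapper, its Add and Count methods, and the addConcurrently helper are hypothetical names; unlike the snippet above, this variant always stores an empty entry and appends under the entry's lock, which costs one small allocation per call but keeps a single code path.

```go
package main

import (
	"fmt"
	"sync"
)

type Item struct{ ID int }

// SafeItems bundles the slice with the mutex that guards it.
type SafeItems struct {
	sync.Mutex
	Items []*Item
}

// Index (hypothetical) wraps a sync.Map from string to *SafeItems.
type Index struct {
	m sync.Map
}

// Add appends item to the slice stored under key.
func (ix *Index) Add(key string, item *Item) {
	// LoadOrStore returns the existing entry, or stores and returns
	// the fresh empty one if the key was absent.
	entry, _ := ix.m.LoadOrStore(key, &SafeItems{})
	safe := entry.(*SafeItems)
	safe.Lock()
	defer safe.Unlock()
	safe.Items = append(safe.Items, item)
}

// Count returns the number of items under key; reads lock too.
func (ix *Index) Count(key string) int {
	entry, ok := ix.m.Load(key)
	if !ok {
		return 0
	}
	safe := entry.(*SafeItems)
	safe.Lock()
	defer safe.Unlock()
	return len(safe.Items)
}

// addConcurrently calls Add from n goroutines and waits for them all.
func addConcurrently(ix *Index, key string, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			ix.Add(key, &Item{ID: id})
		}(i)
	}
	wg.Wait()
}

func main() {
	ix := &Index{}
	addConcurrently(ix, "k", 100)
	fmt.Println(ix.Count("k"))
}
```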

How to access struct fields from list in a loop

I am inserting a struct variable in the list. I am able to retrieve that inserted item in the loop but not the individual value. I am getting the error:
e.Value.name undefined (type interface {} is interface with no methods)
Code given below:
type Item struct {
name string
value string
}
queue := list.New()
per := Item{name: "name", value: "Adnan"}
queue.PushFront(per)
for e := queue.Front(); e != nil; e = e.Next() {
fmt.Println(e.Value.name)
}
container/list.List is not generic; it stores its values as interface{}. Consider using a slice of type []*Item or []Item instead, so you won't have this problem.
If you must use list.List, you may use a type assertion:
fmt.Println(e.Value.(Item).name)
Using a slice it could look like this:
var queue []Item
per := Item{name: "name", value: "Adnan"}
queue = append(queue, per)
for _, v := range queue {
fmt.Println(v.name)
}
Note however that append() appends to the end of the slice, so it's not equivalent with List.PushFront().
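If the list might hold values of more than one type, the two-value ("comma ok") form of the type assertion avoids a panic on a mismatch. A small sketch, where the names and sampleQueue helpers are made up for illustration:

```go
package main

import (
	"container/list"
	"fmt"
)

type Item struct {
	name  string
	value string
}

// names collects the name field of every Item in l, skipping
// elements of any other type instead of panicking on them.
func names(l *list.List) []string {
	var out []string
	for e := l.Front(); e != nil; e = e.Next() {
		if it, ok := e.Value.(Item); ok {
			out = append(out, it.name)
		}
	}
	return out
}

// sampleQueue builds a small list containing two Items and one int.
func sampleQueue() *list.List {
	queue := list.New()
	queue.PushFront(Item{name: "first", value: "Adnan"})
	queue.PushFront(Item{name: "second", value: "Sam"})
	queue.PushFront(42) // not an Item; names skips it
	return queue
}

func main() {
	fmt.Println(names(sampleQueue()))
}
```

Since PushFront inserts at the head, the most recently pushed Item comes out first.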

What happens if I concurrently access a single go map? [duplicate]

When you use a map in a program with concurrent access, is there any need to use a mutex in functions to read values?
Multiple readers, no writers is okay:
https://groups.google.com/d/msg/golang-nuts/HpLWnGTp-n8/hyUYmnWJqiQJ
One writer, no readers is okay. (Maps wouldn't be much good otherwise.)
Otherwise, if there is at least one writer and at least one more either writer or reader, then all readers and writers must use synchronization to access the map. A mutex works fine for this.
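A minimal sketch of that rule, assuming a simple string-to-int counter map: readers take the read lock, writers take the write lock. The Counter type and the countConcurrently helper are hypothetical names for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter (hypothetical) guards a plain map with a sync.RWMutex.
type Counter struct {
	mu     sync.RWMutex
	counts map[string]int
}

func NewCounter() *Counter {
	return &Counter{counts: make(map[string]int)}
}

// Inc is a writer, so it takes the exclusive lock.
func (c *Counter) Inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[key]++
}

// Get is a reader; many Get calls may hold the read lock at once,
// but never while Inc holds the write lock.
func (c *Counter) Get(key string) int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.counts[key]
}

// countConcurrently increments key from n goroutines while another
// goroutine reads it, then waits for all of them.
func countConcurrently(c *Counter, key string, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc(key)
		}()
	}
	wg.Add(1)
	go func() {
		defer wg.Done()
		_ = c.Get(key) // concurrent read, safe under RLock
	}()
	wg.Wait()
}

func main() {
	c := NewCounter()
	countConcurrently(c, "hits", 100)
	fmt.Println(c.Get("hits"))
}
```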
sync.Map was merged into Go master on April 27, 2017, and was released as part of Go 1.9.
This is the concurrent Map we have all been waiting for.
https://github.com/golang/go/blob/master/src/sync/map.go
https://godoc.org/sync#Map
I answered your question in this reddit thread a few days ago:
In Go, maps are not thread-safe. Also, data requires locking even for
reading if, for example, there could be another goroutine that is
writing the same data (concurrently, that is).
Judging by your clarification in the comments, that there are going to be setter functions too, the answer to your question is yes, you will have to protect your reads with a mutex; you can use a RWMutex. For an example you can look at the source of the implementation of a table data structure (uses a map behind the scenes) which I wrote (actually the one linked in the reddit thread).
You could use concurrent-map to handle the concurrency pains for you.
// Create a new map.
map := cmap.NewConcurrentMap()
// Add item to map, adds "bar" under key "foo"
map.Add("foo", "bar")
// Retrieve item from map.
tmp, ok := map.Get("foo")
// Checks if item exists
if ok == true {
// Map stores items as interface{}, hence we'll have to cast.
bar := tmp.(string)
}
// Removes item under key "foo"
map.Remove("foo")
If you only have one writer, then you can probably get away with using an atomic.Value. The following is adapted from https://golang.org/pkg/sync/atomic/#example_Value_readMostly (the original uses locks to protect writing, so it supports multiple writers):
type Map map[string]string
var m atomic.Value // requires import "sync/atomic"
m.Store(make(Map))
read := func(key string) (val string) { // read from multiple goroutines
m1 := m.Load().(Map)
return m1[key]
}
insert := func(key, val string) { // update from one goroutine
m1 := m.Load().(Map) // load current value of the data structure
m2 := make(Map) // create a new map
for k, v := range m1 {
m2[k] = v // copy all data from the current object to the new one
}
m2[key] = val // do the update that we need (can delete/add/change)
m.Store(m2) // atomically replace the current object with the new one
// At this point all new readers start working with the new version.
// The old version will be garbage collected once the existing readers
// (if any) are done with it.
}
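For more than one writer, the same copy-on-write idea can be combined with a mutex that serializes writers only, as the linked example does. A self-contained sketch, where the COWMap type and its method names are assumptions made for this example:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type Map map[string]string

// COWMap (hypothetical name) is a copy-on-write map: readers never
// lock, writers take mu and publish a fresh copy.
type COWMap struct {
	mu sync.Mutex   // serializes writers only
	v  atomic.Value // always holds a Map
}

func NewCOWMap() *COWMap {
	c := &COWMap{}
	c.v.Store(make(Map))
	return c
}

// Get is lock-free: it reads whichever version is currently published.
func (c *COWMap) Get(key string) (string, bool) {
	m := c.v.Load().(Map)
	val, ok := m[key]
	return val, ok
}

// Set copies the current map, applies the change to the copy and
// atomically publishes it. Published maps are never mutated.
func (c *COWMap) Set(key, val string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	old := c.v.Load().(Map)
	next := make(Map, len(old)+1)
	for k, v := range old {
		next[k] = v
	}
	next[key] = val
	c.v.Store(next)
}

func main() {
	c := NewCOWMap()
	c.Set("foo", "bar")
	val, ok := c.Get("foo")
	fmt.Println(val, ok)
}
```

Every write copies the whole map, so this only pays off when reads vastly outnumber writes.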
Why not make use of Go's concurrency model instead? Here is a simple example:
type DataManager struct {
/** Connections to the known data stores **/
m_dataStores map[string]DataStore
/** This channel is used to access the dataStores map **/
m_dataStoreChan chan map[string]interface{}
}
func newDataManager() *DataManager {
dataManager := new(DataManager)
dataManager.m_dataStores = make(map[string]DataStore)
dataManager.m_dataStoreChan = make(chan map[string]interface{}, 0)
// Concurrency...
go func() {
for {
select {
case op := <-dataManager.m_dataStoreChan:
if op["op"] == "getDataStore" {
storeId := op["storeId"].(string)
op["store"].(chan DataStore) <- dataManager.m_dataStores[storeId]
} else if op["op"] == "getDataStores" {
stores := make([]DataStore, 0)
for _, store := range dataManager.m_dataStores {
stores = append(stores, store)
}
op["stores"].(chan []DataStore) <- stores
} else if op["op"] == "setDataStore" {
store := op["store"].(DataStore)
dataManager.m_dataStores[store.GetId()] = store
} else if op["op"] == "removeDataStore" {
storeId := op["storeId"].(string)
delete(dataManager.m_dataStores, storeId)
}
}
}
}()
return dataManager
}
/**
* Access Map functions...
*/
func (this *DataManager) getDataStore(id string) DataStore {
arguments := make(map[string]interface{})
arguments["op"] = "getDataStore"
arguments["storeId"] = id
result := make(chan DataStore)
arguments["store"] = result
this.m_dataStoreChan <- arguments
return <-result
}
func (this *DataManager) getDataStores() []DataStore {
arguments := make(map[string]interface{})
arguments["op"] = "getDataStores"
result := make(chan []DataStore)
arguments["stores"] = result
this.m_dataStoreChan <- arguments
return <-result
}
func (this *DataManager) setDataStore(store DataStore) {
arguments := make(map[string]interface{})
arguments["op"] = "setDataStore"
arguments["store"] = store
this.m_dataStoreChan <- arguments
}
func (this *DataManager) removeDataStore(id string) {
arguments := make(map[string]interface{})
arguments["storeId"] = id
arguments["op"] = "removeDataStore"
this.m_dataStoreChan <- arguments
}
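The same "one goroutine owns the map" idea can be written more compactly, and without the map[string]interface{} argument bags, by sending closures over the channel. This is an alternative sketch rather than a rewrite of the code above; the Store type and all of its methods are hypothetical, and plain strings stand in for DataStore.

```go
package main

import "fmt"

// Store (hypothetical) confines a map to a single goroutine; all
// access happens by sending a function for that goroutine to run.
type Store struct {
	ops chan func(map[string]string)
}

func NewStore() *Store {
	s := &Store{ops: make(chan func(map[string]string))}
	go func() {
		data := make(map[string]string) // touched by this goroutine only
		for op := range s.ops {
			op(data)
		}
	}()
	return s
}

// Set stores val under key and returns once the write has been applied.
func (s *Store) Set(key, val string) {
	done := make(chan struct{})
	s.ops <- func(m map[string]string) {
		m[key] = val
		close(done)
	}
	<-done
}

// Get returns the value under key, if any.
func (s *Store) Get(key string) (string, bool) {
	type result struct {
		val string
		ok  bool
	}
	res := make(chan result, 1)
	s.ops <- func(m map[string]string) {
		v, ok := m[key]
		res <- result{v, ok}
	}
	r := <-res
	return r.val, r.ok
}

// Close stops the owning goroutine; the Store must not be used afterwards.
func (s *Store) Close() { close(s.ops) }

func main() {
	s := NewStore()
	defer s.Close()
	s.Set("foo", "bar")
	fmt.Println(s.Get("foo"))
}
```

Because the operations are typed closures, a misspelled operation name or a wrong argument type becomes a compile error instead of a runtime panic.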

Writing generic data access functions in Go

I'm writing code that allows data access from a database. However, I find myself repeating the same code for similar types and fields. How can I write generic functions for the same?
e.g. what I want to achieve ...
type Person struct{FirstName string}
type Company struct{Industry string}
func getItems(typ string, field string, val string) ([]interface{}) {
...
}
var persons []Person
persons = getItems("Person", "FirstName", "John")
var companies []Company
companies = getItems("Company", "Industry", "Software")
So you're definitely on the right track with the idea of returning a slice of empty interface (interface{}) values. However, you're going to run into problems when you try accessing specific members or calling specific methods, because you're not going to know what type you're looking at. This is where type assertions are going to come in very handy. To extend your code a bit:
func getPerson(typ string, field string, val string) []Person {
    slice := getItems(typ, field, val)
    output := make([]Person, 0)
    for _, item := range slice {
        // Type assertion!
        thing, ok := item.(Person)
        if ok {
            output = append(output, thing)
        }
    }
    return output
}
So what that does is it performs a generic search, and then weeds out only those items which are of the correct type. Specifically, the type assertion:
thing, ok := item.(Person)
checks whether the variable item holds a value of type Person. If it does, it returns that value and true; otherwise it returns the zero value of Person and false (so checking ok tells us whether the assertion succeeded).
You can actually, if you want, take this a step further and define the getItems() function in terms of another boolean function. Basically, the idea would be to have getItems() run the function passed to it on each element in the database, and only add that element to the results if the function returns true for it:
func getItem(criteria func(interface{}) bool) []interface{} {
    output := make([]interface{}, 0)
    for _, item := range database {
        if criteria(item) {
            output = append(output, item)
        }
    }
    return output
}
(honestly, if it were me, I'd do a hybrid of the two which accepts a criteria function but also accepts the field and value strings)
joshlf13 has a great answer. I'd expand on it a little, though, to maintain some additional type safety. Instead of a criteria function, I would use a collector function.
// typed output slice, no interfaces
var output = []string{}
// collector that populates our output slice as needed
func collect(i interface{}) {
    // The only non-type-safe part of the program is limited to this function
    if val, ok := i.(string); ok {
        output = append(output, val)
    }
}
// getItem uses the collector
func getItem(collect func(interface{})) {
    for _, item := range database {
        collect(item)
    }
}
// somewhere inside a function:
getItem(collect) // perform our get and populate the output slice from above
This has the benefit of not requiring you to loop through your interface{} slice after a call to getItems and do yet another type assertion.
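On Go 1.18 or later, type parameters offer another way to keep the type assertion in one place while returning a typed slice. The sketch below is an alternative to the approaches above, not a restatement of them; the in-memory database slice is a stand-in for the real data source, and this generic getItems signature is an assumption.

```go
package main

import "fmt"

type Person struct{ FirstName string }
type Company struct{ Industry string }

// database is a stand-in for the real data source.
var database = []any{
	Person{FirstName: "John"},
	Company{Industry: "Software"},
	Person{FirstName: "Jane"},
}

// getItems returns every element of database that has type T and
// satisfies criteria. The single type assertion lives here.
func getItems[T any](criteria func(T) bool) []T {
	var out []T
	for _, item := range database {
		if v, ok := item.(T); ok && criteria(v) {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	// T is inferred from the parameter type of each criteria function.
	persons := getItems(func(p Person) bool { return p.FirstName == "John" })
	companies := getItems(func(c Company) bool { return c.Industry == "Software" })
	fmt.Println(persons, companies)
}
```

The callers get []Person and []Company back directly, so no second loop over an interface{} slice is needed.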
