What happens if I concurrently access a single go map? [duplicate] - go

When you use a map in a program with concurrent access, is there any need to use a mutex in functions to read values?

Multiple readers, no writers is okay:
https://groups.google.com/d/msg/golang-nuts/HpLWnGTp-n8/hyUYmnWJqiQJ
One writer, no readers is okay. (Maps wouldn't be much good otherwise.)
Otherwise, if there is at least one writer and at least one more either writer or reader, then all readers and writers must use synchronization to access the map. A mutex works fine for this.
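As a minimal sketch of that last rule, here is a map wrapped in a struct so that every access goes through a lock. The names SafeMap, NewSafeMap, Get and Set are made up for illustration, and sync.RWMutex is my choice here so that readers can run in parallel; a plain sync.Mutex would work the same way.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeMap is a hypothetical wrapper: every access to m goes through mu.
type SafeMap struct {
	mu sync.RWMutex
	m  map[string]int
}

func NewSafeMap() *SafeMap {
	return &SafeMap{m: make(map[string]int)}
}

// Get takes the read lock, so several readers may proceed in parallel.
func (s *SafeMap) Get(key string) (int, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

// Set takes the write lock, excluding all readers and other writers.
func (s *SafeMap) Set(key string, v int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = v
}

func main() {
	s := NewSafeMap()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.Set(fmt.Sprintf("k%d", i), i) // concurrent writers
			s.Get("k0")                     // concurrent readers
		}(i)
	}
	wg.Wait()
	v, ok := s.Get("k3")
	fmt.Println(v, ok) // 3 true
}
```

Running this with the -race flag should report nothing, while the same program with the locks removed would typically be flagged.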

sync.Map was merged into Go master on April 27, 2017 and shipped with Go 1.9.
This is the concurrent Map we have all been waiting for.
https://github.com/golang/go/blob/master/src/sync/map.go
https://godoc.org/sync#Map
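For anyone who hasn't used it yet, a small sketch of the sync.Map methods (Store, Load, Delete, Range). The helper names squares and countKeys are made up for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// squares stores i -> i*i for i in [0, n), one goroutine per key.
func squares(n int) *sync.Map {
	m := new(sync.Map) // the zero value is ready to use
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			m.Store(i, i*i) // safe without any extra locking
		}(i)
	}
	wg.Wait()
	return m
}

// countKeys ranges over the map and counts its entries.
func countKeys(m *sync.Map) int {
	n := 0
	m.Range(func(_, _ interface{}) bool {
		n++
		return true // keep iterating
	})
	return n
}

func main() {
	m := squares(4)
	if v, ok := m.Load(3); ok {
		fmt.Println(v.(int)) // 9
	}
	m.Delete(3)
	fmt.Println(countKeys(m)) // 3
}
```

Keys and values are stored as interface{}, so you need a type assertion on the way out.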

I answered your question in this reddit thread a few days ago:
In Go, maps are not thread-safe. Also, data requires locking even for
reading if, for example, there could be another goroutine that is
writing the same data (concurrently, that is).
Judging by your clarification in the comments, that there are going to be setter functions too, the answer to your question is yes, you will have to protect your reads with a mutex; you can use a RWMutex. For an example you can look at the source of the implementation of a table data structure (uses a map behind the scenes) which I wrote (actually the one linked in the reddit thread).

You could use concurrent-map to handle the concurrency pains for you.
// Create a new map.
m := cmap.NewConcurrentMap()
// Add item to map, adds "bar" under key "foo"
m.Add("foo", "bar")
// Retrieve item from map.
tmp, ok := m.Get("foo")
// Checks if item exists
if ok {
// Map stores items as interface{}, hence we'll have to use a type assertion.
bar := tmp.(string)
_ = bar // bar is now usable as a string
}
// Removes item under key "foo"
m.Remove("foo")

If you only have one writer, you can probably get away with using an atomic.Value. The following is adapted from https://golang.org/pkg/sync/atomic/#example_Value_readMostly (the original uses a lock to protect writes, so it supports multiple writers):
type Map map[string]string
var m atomic.Value // from package sync/atomic
m.Store(make(Map))
read := func(key string) (val string) { // read from multiple goroutines
m1 := m.Load().(Map)
return m1[key]
}
insert := func(key, val string) { // update from one goroutine
m1 := m.Load().(Map) // load current value of the data structure
m2 := make(Map) // create a new map
for k, v := range m1 {
m2[k] = v // copy all data from the current object to the new one
}
m2[key] = val // do the update that we need (can delete/add/change)
m.Store(m2) // atomically replace the current object with the new one
// At this point all new readers start working with the new version.
// The old version will be garbage collected once the existing readers
// (if any) are done with it.
}

Why not make use of Go's concurrency model instead? Here is a simple example:
type DataManager struct {
/** Holds the connections to the known data stores **/
m_dataStores map[string]DataStore
/** This channel is used to access the m_dataStores map **/
m_dataStoreChan chan map[string]interface{}
}
func newDataManager() *DataManager {
dataManager := new(DataManager)
dataManager.m_dataStores = make(map[string]DataStore)
dataManager.m_dataStoreChan = make(chan map[string]interface{}, 0)
// Concurrency...
go func() {
for {
select {
case op := <-dataManager.m_dataStoreChan:
if op["op"] == "getDataStore" {
storeId := op["storeId"].(string)
op["store"].(chan DataStore) <- dataManager.m_dataStores[storeId]
} else if op["op"] == "getDataStores" {
stores := make([]DataStore, 0)
for _, store := range dataManager.m_dataStores {
stores = append(stores, store)
}
op["stores"].(chan []DataStore) <- stores
} else if op["op"] == "setDataStore" {
store := op["store"].(DataStore)
dataManager.m_dataStores[store.GetId()] = store
} else if op["op"] == "removeDataStore" {
storeId := op["storeId"].(string)
delete(dataManager.m_dataStores, storeId)
}
}
}
}()
return dataManager
}
/**
* Access Map functions...
*/
func (this *DataManager) getDataStore(id string) DataStore {
arguments := make(map[string]interface{})
arguments["op"] = "getDataStore"
arguments["storeId"] = id
result := make(chan DataStore)
arguments["store"] = result
this.m_dataStoreChan <- arguments
return <-result
}
func (this *DataManager) getDataStores() []DataStore {
arguments := make(map[string]interface{})
arguments["op"] = "getDataStores"
result := make(chan []DataStore)
arguments["stores"] = result
this.m_dataStoreChan <- arguments
return <-result
}
func (this *DataManager) setDataStore(store DataStore) {
arguments := make(map[string]interface{})
arguments["op"] = "setDataStore"
arguments["store"] = store
this.m_dataStoreChan <- arguments
}
func (this *DataManager) removeDataStore(id string) {
arguments := make(map[string]interface{})
arguments["storeId"] = id
arguments["op"] = "removeDataStore"
this.m_dataStoreChan <- arguments
}
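A more compact variation on the same idea, as a sketch only: instead of passing map[string]interface{} requests and switching on an "op" string, send closures to the goroutine that owns the map. The names Store, NewStore, Set, Get and Remove are hypothetical, and the owning goroutine is never shut down here.

```go
package main

import "fmt"

// Store is a hypothetical type: the map lives only inside the goroutine
// started by NewStore, and callers reach it by sending closures.
type Store struct {
	ops chan func(map[string]string)
}

func NewStore() *Store {
	s := &Store{ops: make(chan func(map[string]string))}
	go func() {
		m := make(map[string]string) // touched by this goroutine only
		for op := range s.ops {
			op(m)
		}
	}()
	return s
}

// Set blocks until the owning goroutine has applied the write.
func (s *Store) Set(key, val string) {
	done := make(chan struct{})
	s.ops <- func(m map[string]string) {
		m[key] = val
		close(done)
	}
	<-done
}

// Remove blocks until the owning goroutine has applied the delete.
func (s *Store) Remove(key string) {
	done := make(chan struct{})
	s.ops <- func(m map[string]string) {
		delete(m, key)
		close(done)
	}
	<-done
}

type lookup struct {
	val string
	ok  bool
}

// Get asks the owning goroutine for the value and waits for the reply.
func (s *Store) Get(key string) (string, bool) {
	res := make(chan lookup, 1)
	s.ops <- func(m map[string]string) {
		v, ok := m[key]
		res <- lookup{v, ok}
	}
	r := <-res
	return r.val, r.ok
}

func main() {
	s := NewStore()
	s.Set("foo", "bar")
	v, ok := s.Get("foo")
	fmt.Println(v, ok) // bar true
}
```

Because only one goroutine ever touches the map, no mutex is needed, and the compiler checks the request types instead of runtime type assertions.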

Related

Concurrency-safe map of slices

I have a type that contains a sync.Map where the key in the map is a string and the value is a slice. My code for inserting items into the map is as follows:
newList := []*Item{item}
if result, ok := m.LoadOrStore(key, newList); ok {
resultList := result.([]*Item)
resultList = append(resultList, item)
m.Store(key, resultList)
}
This is not concurrency-safe because the slice can be loaded and modified by multiple calls concurrently. This code is very fragile so I've attempted to modify it to be:
newList := []*Item{item}
if result, ok := m.LoadOrStore(key, &newList); ok {
resultList := result.(*[]*Item)
*resultList = append(*resultList, item)
}
All this does is make the issues occur deterministically. So, I'm trying to find a way to have a map-of-slices that can be added to concurrently. My instinct is to use sync.Mutex to lock the list while I'm adding to it but in order to maintain the concurrent access to the sync.Map I would need to create a map of sync.Mutex objects as well, like this:
newLock := sync.Mutex{}
raw, _ := lockMap.LoadOrStore(key, &newLock)
lock := raw.(*sync.Mutex)
newList := []*Item{item}
if result, ok := m.LoadOrStore(key, &newList); ok {
lock.Lock()
resultList := result.(*[]*Item)
*resultList = append(*resultList, item)
lock.Unlock()
}
Is there an easier way to go about this?
It isn't very different from your current plan, but you could save yourself the trouble of handling two maps by using a struct with an embedded mutex for the values of the map.
The struct would look something like this:
type SafeItems struct {
sync.Mutex
Items []*Item
}
And it could be used like this:
newMapEntry := SafeItems{Items: []*Item{item}}
if result, ok := m.LoadOrStore(key, &newMapEntry); ok {
mapEntry := result.(*SafeItems)
mapEntry.Lock()
mapEntry.Items = append(mapEntry.Items, item)
mapEntry.Unlock()
}
It's not a huge change but it does provide some syntactic sugar.
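To make that concrete, here is a self-contained sketch of the embedded-mutex approach. The Item definition and the helpers add, count and fill are assumptions made up for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Item is a placeholder for the real element type.
type Item struct{ ID int }

type SafeItems struct {
	sync.Mutex
	Items []*Item
}

// add appends item to the list stored under key, creating the entry if needed.
func add(m *sync.Map, key string, item *Item) {
	entry := &SafeItems{Items: []*Item{item}}
	if result, loaded := m.LoadOrStore(key, entry); loaded {
		// Another entry already exists: append under its own lock.
		existing := result.(*SafeItems)
		existing.Lock()
		existing.Items = append(existing.Items, item)
		existing.Unlock()
	}
}

// count returns the number of items stored under key, or 0 if absent.
func count(m *sync.Map, key string) int {
	result, ok := m.Load(key)
	if !ok {
		return 0
	}
	e := result.(*SafeItems)
	e.Lock()
	defer e.Unlock()
	return len(e.Items)
}

// fill adds n items under one key from n goroutines and returns the final count.
func fill(n int) int {
	var m sync.Map
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			add(&m, "k", &Item{ID: i})
		}(i)
	}
	wg.Wait()
	return count(&m, "k")
}

func main() {
	fmt.Println(fill(100)) // 100
}
```

Note that the map always stores a pointer to SafeItems, so the mutex itself is never copied.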

Data race with list.List concurrent access with mutexes

I'm getting a data race and I can't quite figure out why. Running my tests with the -race flag, I've narrowed it down to one goroutine modifying a list.List while another reads from it, but my mutexes don't seem to do anything.
I have a number of *list.Lists inside of an array like so:
type MyList struct {
mutex sync.Mutex
*list.List
}
type SomeObj struct {
data string
}
var myListOfLists [10]MyList
I'm reading and writing from the list like so:
list := myListOfLists[someIndex]
list.mutex.Lock()
for e := list.Front(); e != nil; e = e.Next() {
if (...) {
list.MoveToFront(e)
}
}
list.mutex.Unlock()
and, in another goroutine, also reading from it to build a full copy to return:
var fullCopy []*SomeObj
list := myListOfLists[someIndex]
list.mutex.Lock()
for e := list.Front(); e != nil; e = e.Next() {
fullCopy = append(fullCopy, e.Value.(*SomeObj))
}
list.mutex.Unlock()
The statement list := myListOfLists[someIndex] copies the array element to variable list. This copies the mutex, thus preventing the mutex from working. The go vet command reports this problem.
You can avoid the copy by using a pointer to the array element:
list := &myListOfLists[someIndex]
Another approach is to use an array of pointers to MyList. While you are at it, you might as well use a list value instead of a list pointer in MyList:
type MyList struct {
mutex sync.Mutex
list.List
}
var myListOfLists [10]*MyList
for i := range myListOfLists {
myListOfLists[i] = &MyList{}
}
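A runnable sketch of the first fix (taking a pointer to the array element), for reference. The helpers push and length and the variable name lists are made up for this example.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

type MyList struct {
	mutex     sync.Mutex
	list.List // list value; its zero value is an empty list ready to use
}

var lists [10]MyList

// push appends v to lists[i] under that element's own mutex.
func push(i int, v int) {
	l := &lists[i] // pointer: neither the mutex nor the list is copied
	l.mutex.Lock()
	l.PushBack(v)
	l.mutex.Unlock()
}

// length reports how many elements lists[i] holds.
func length(i int) int {
	l := &lists[i]
	l.mutex.Lock()
	defer l.mutex.Unlock()
	return l.Len()
}

func main() {
	var wg sync.WaitGroup
	for n := 0; n < 100; n++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			push(3, n)
		}(n)
	}
	wg.Wait()
	fmt.Println(length(3)) // 100
}
```

If you change `l := &lists[i]` to `l := lists[i]`, every call locks its own private copy of the mutex, and go vet should flag the copy.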

Is it safe to append() to a slice from which another thread is reading?

Let's say I have many goroutines doing something like this:
func (o *Obj) Reader() {
data := o.data
for i, value := range data {
log.Printf("got data[%v] = %v", i, value)
}
}
And one doing this:
func (o *Obj) Writer() {
o.data = append(o.data, 1234)
}
If data := o.data means the internal structure of the slice is copied, this looks like it could be safe, because I'm never modifying anything in the accessible range of the copy. I'm either setting one element outside of the range and increasing the length, or allocating a completely new pointer, but the reader would be operating on the original one.
Are my assumptions correct and this is safe to do?
I'm aware that slices are not meant to be "thread-safe" in general, the question is more about how much does slice1 := slice2 actually copy.
The code in the question is unsafe because it reads a variable in one goroutine and modifies the variable in another goroutine without synchronization.
Here's one way to make the code safe:
type Obj struct {
mu sync.Mutex // add mutex
... // other fields as before
}
func (o *Obj) Reader() {
o.mu.Lock()
data := o.data
o.mu.Unlock()
for i, value := range data {
log.Printf("got data[%v] = %v", i, value)
}
}
func (o *Obj) Writer() {
o.mu.Lock()
o.data = append(o.data, 1234)
o.mu.Unlock()
}
It's safe for Reader to range over the local slice variable data because the Writer does not modify the local variable data or the backing array visible through the local variable data.
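On the "how much does it copy" part of the question: assigning a slice copies only the slice header (pointer, length, capacity), not the elements. A small single-goroutine sketch of this, with a made-up helper name snapshotLen:

```go
package main

import "fmt"

// snapshotLen copies a slice header, appends to the original variable,
// and returns the length of the copy and of the original afterwards.
func snapshotLen() (int, int) {
	orig := make([]int, 3, 8) // len 3, spare capacity for the append below
	snap := orig              // copies pointer, len and cap, not the elements
	orig = append(orig, 1234) // writes index 3 of the shared array, grows orig's len only
	return len(snap), len(orig)
}

func main() {
	a, b := snapshotLen()
	fmt.Println(a, b) // 3 4
}
```

The copy keeps its old length, so it never looks at the appended element. The data race in the question is on the o.data variable itself, which is why the read of o.data still needs the lock.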
A bit late to the party, but if your use-case is frequent reads and infrequent writes, atomic.Value is designed to solve this:
type Obj struct {
data atomic.Value // []int
mu sync.Mutex
}
func (o *Obj) Reader() {
data, _ := o.data.Load().([]int) // nil slice if nothing has been stored yet
for i, value := range data {
log.Printf("got data[%v] = %v", i, value)
}
}
func (o *Obj) Writer() {
o.mu.Lock() // serializes writers; readers never take this lock
data, _ := o.data.Load().([]int)
data = append(data, 1234)
o.data.Store(data)
o.mu.Unlock()
}
For read-heavy workloads this will generally be much faster than either a Mutex or an RWMutex, because readers never take a lock.
Note that this only works when each stored value is effectively an immutable snapshot. That holds here because readers can safely keep a reference to the previous slice while the writer appends: append() either allocates a new backing array or writes only past the length that existing readers can see. If you're mutating the elements of the slice, or using another data structure, this approach is not safe.

Can we write a generic array/slice deduplication in go?

Is there a way to write a generic array/slice deduplication in go, for []int we can have something like (from http://rosettacode.org/wiki/Remove_duplicate_elements#Go ):
func uniq(list []int) []int {
unique_set := make(map[int]bool, len(list))
for _, x := range list {
unique_set[x] = true
}
result := make([]int, len(unique_set))
i := 0
for x := range unique_set {
result[i] = x
i++
}
return result
}
But is there a way to extend it to support any array? with a signature like:
func deduplicate(a []interface{}) []interface{}
I know that you can write a function with that signature, but then you can't actually use it on a []int: you need to create a []interface{}, copy everything from the []int into it, pass it to the function, get back another []interface{}, and then go through that result and put everything into a new []int.
My question is, is there a better way to do this?
While VonC's answer probably does the closest to what you really want, the only real way to do it in native Go without gen is to define an interface
type IDList interface {
// Returns the id of the element at i
ID(i int) int
// Returns the element
// with the given id
GetByID(id int) interface{}
Len() int
// Adds the element to the list
Insert(interface{})
}
// Puts the deduplicated list in dst
func Deduplicate(dst, list IDList) {
intList := make([]int, list.Len())
for i := range intList {
intList[i] = list.ID(i)
}
uniques := uniq(intList)
for _, el := range uniques {
dst.Insert(list.GetByID(el))
}
}
Where uniq is the function from your OP.
This is just one possible example, and there are probably much better ones, but in general mapping each element to a unique "==able" ID and either constructing a new list or culling based on the deduplication of the IDs is probably the most intuitive way.
An alternate solution is to take in an []IDer where the IDer interface is just ID() int. However, that means that user code has to create the []IDer list and copy all the elements into that list, which is a bit ugly. It's cleaner for the user to wrap the list as an ID list rather than copy, but it's a similar amount of work either way.
The only way I have seen that implemented in Go is with the clipperhouse/gen project,
gen is an attempt to bring some generics-like functionality to Go, with some inspiration from C#’s Linq and JavaScript’s underscore libraries
See this test:
// Distinct returns a new Thing1s slice whose elements are unique. See: http://clipperhouse.github.io/gen/#Distinct
func (rcv Thing1s) Distinct() (result Thing1s) {
appended := make(map[Thing1]bool)
for _, v := range rcv {
if !appended[v] {
result = append(result, v)
appended[v] = true
}
}
return result
}
But, as explained in clipperhouse.github.io/gen/:
gen generates code for your types, at development time, using the command line.
gen is not an import; the generated source becomes part of your project and takes no external dependencies.
You could do something close to this via an interface. Define an interface, say "DeDupable", requiring a method such as UniqId() []byte, which you could then use to remove the duplicates. Your uniq func would then take a []DeDupable and work on that.
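For readers on newer toolchains: the answers above predate type parameters, which were added in Go 1.18. Assuming Go 1.18 or later, a generic version can be sketched like this. The name Deduplicate and the choice to keep first-seen order are mine, not taken from any of the answers.

```go
package main

import "fmt"

// Deduplicate returns the distinct elements of in, in first-seen order.
// T must be comparable so that it can be used as a map key.
func Deduplicate[T comparable](in []T) []T {
	seen := make(map[T]struct{}, len(in))
	out := make([]T, 0, len(in))
	for _, v := range in {
		if _, dup := seen[v]; dup {
			continue
		}
		seen[v] = struct{}{}
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println(Deduplicate([]int{1, 2, 2, 3, 1}))    // [1 2 3]
	fmt.Println(Deduplicate([]string{"a", "b", "a"})) // [a b]
}
```

This works directly on []int, []string, or a slice of any comparable struct type, with no conversion to []interface{}.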

Writing generic data access functions in Go

I'm writing code that allows data access from a database. However, I find myself repeating the same code for similar types and fields. How can I write generic functions for the same?
e.g. what I want to achieve ...
type Person struct{ FirstName string }
type Company struct{ Industry string }
func getItems(typ string, field string, val string) []interface{} {
...
}
var persons []Person
persons = getItems("Person", "FirstName", "John")
var companies []Company
companies = getItems("Company", "Industry", "Software")
So you're definitely on the right track with the idea of returning a slice of empty interface (interface{}) values. However, you're going to run into problems when you try accessing specific members or calling specific methods, because you're not going to know what type you're looking at. This is where type assertions are going to come in very handy. To extend your code a bit:
func getPerson(typ string, field string, val string) []Person {
slice := getItems(typ, field, val)
output := make([]Person, 0)
for _, item := range slice {
// Type assertion!
thing, ok := item.(Person)
if ok {
output = append(output, thing)
}
}
return output
}
So what that does is it performs a generic search, and then weeds out only those items which are of the correct type. Specifically, the type assertion:
thing, ok := item.(Person)
checks to see if the variable item holds a value of type Person, and if it does, it returns the value and true; otherwise it returns the zero value of Person and false (thus checking ok tells us if the assertion succeeded).
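A quick sketch of both outcomes of the two-value type assertion. The helper asPerson is made up for the example; the Person type follows the question.

```go
package main

import "fmt"

type Person struct{ FirstName string }

// asPerson reports whether item holds a Person and returns it if so.
func asPerson(item interface{}) (Person, bool) {
	p, ok := item.(Person) // two-value form: never panics
	return p, ok
}

func main() {
	p, ok := asPerson(Person{FirstName: "John"})
	fmt.Println(p.FirstName, ok) // John true

	p, ok = asPerson("not a person")
	fmt.Printf("%q %v\n", p.FirstName, ok) // "" false
}
```

The single-value form, item.(Person), panics when the assertion fails, so the two-value form is the one to use when the slice can hold mixed types.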
You can actually, if you want, take this a step further and define the getItems() function in terms of another boolean function. Basically the idea would be to have getItems() run the function passed to it on each element in the database, and only add that element to the results if the function returns true:
func getItem(criteria func(interface{}) bool) []interface{} {
output := make([]interface{}, 0)
for _, item := range database {
if criteria(item) {
output = append(output, item)
}
}
return output
}
(honestly, if it were me, I'd do a hybrid of the two which accepts a criteria function but also accepts the field and value strings)
joshlf13 has a great answer. I'd expand a little on it though, to maintain some additional type safety. Instead of a criteria function I would use a collector function.
// typed output slice, no interfaces
var output []string
// collector that populates our output slice as needed
func collect(i interface{}) {
// The only non typesafe part of the program is limited to this function
if val, ok := i.(string); ok {
output = append(output, val)
}
}
// getItem uses the collector
func getItem(collect func(interface{})) {
for _, item := range database {
collect(item)
}
}
getItem(collect) // perform our get and populate the output array from above.
This has the benefit of not requiring you to loop through your interface{} slice after a call to getItems and do yet another cast.
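Put together as something you can run, a sketch of the collector idea might look like the following. The database slice is a hypothetical stand-in for the real data source, and collectStrings is a made-up helper that wraps the collector in a closure.

```go
package main

import "fmt"

// database is a hypothetical stand-in for the real data source.
var database = []interface{}{"alice", 42, "bob", 3.14}

// getItem feeds every stored element to the collector.
func getItem(collect func(interface{})) {
	for _, item := range database {
		collect(item)
	}
}

// collectStrings returns only the string elements of database.
func collectStrings() []string {
	var output []string // typed result; no interface{} leaves this function
	getItem(func(i interface{}) {
		// The type assertion is confined to this closure.
		if s, ok := i.(string); ok {
			output = append(output, s)
		}
	})
	return output
}

func main() {
	fmt.Println(collectStrings()) // [alice bob]
}
```

Using a closure keeps output local instead of a package-level variable, which also makes it safe to run several collections independently.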

Resources