Here is a simple concurrent map that I wrote for learning purposes:
package concurrent_hashmap

import (
	"hash/fnv"
	"sync"
)

type ConcurrentMap struct {
	buckets     []ThreadSafeMap
	bucketCount uint32
}

type ThreadSafeMap struct {
	mapLock sync.RWMutex
	hashMap map[string]interface{}
}

func NewConcurrentMap(bucketSize uint32) *ConcurrentMap {
	var threadSafeMapInstance ThreadSafeMap
	var bucketOfThreadSafeMap []ThreadSafeMap
	for i := 0; i <= int(bucketSize); i++ {
		threadSafeMapInstance = ThreadSafeMap{sync.RWMutex{}, make(map[string]interface{})}
		bucketOfThreadSafeMap = append(bucketOfThreadSafeMap, threadSafeMapInstance)
	}
	return &ConcurrentMap{bucketOfThreadSafeMap, bucketSize}
}

func (cMap *ConcurrentMap) Put(key string, val interface{}) {
	bucketIndex := hash(key) % cMap.bucketCount
	bucket := cMap.buckets[bucketIndex]
	bucket.mapLock.Lock()
	bucket.hashMap[key] = val
	bucket.mapLock.Unlock()
}

// Helper
func hash(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}
I am trying to write a simple benchmark, and I find that synchronized access works correctly but concurrent access gets
fatal error: concurrent map writes
Here is my benchmark, run with go test -bench=. -race:
package concurrent_hashmap

import (
	"math/rand"
	"runtime"
	"strconv"
	"sync"
	"testing"
)
// Concurrent access does not work
func BenchmarkMyFunc(b *testing.B) {
	var wg sync.WaitGroup
	runtime.GOMAXPROCS(runtime.NumCPU())
	my_map := NewConcurrentMap(uint32(4))
	for n := 0; n < b.N; n++ {
		wg.Add(1) // Add before spawning; adding inside the goroutine races with Wait
		go insert(my_map, &wg)
	}
	wg.Wait()
}

// insert takes *sync.WaitGroup; passing a WaitGroup by value copies it,
// so Done would decrement the copy rather than the group Wait is watching.
func insert(my_map *ConcurrentMap, wg *sync.WaitGroup) {
	defer wg.Done()
	var rand_int int
	for element_num := 0; element_num < 1000; element_num++ {
		rand_int = rand.Intn(100)
		my_map.Put(strconv.Itoa(rand_int), rand_int)
	}
}
// This works
func BenchmarkMyFuncSynchronize(b *testing.B) {
	my_map := NewConcurrentMap(uint32(4))
	for n := 0; n < b.N; n++ {
		my_map.Put(strconv.Itoa(123), 123)
	}
}
The WARNING: DATA RACE report points at bucket.hashMap[key] = val as the problem, but I am confused about why that is possible, since I take the lock whenever a write happens.
I think I am missing something basic; can someone point out my mistake?
Thanks
Edit 1:
Not sure if this helps, but here is what my mutex looks like if I don't lock anything:
{{0 0} 0 0 0 0}
Here is what it looks like if I lock the write:
{{1 0} 0 0 -1073741824 0}
Not sure why my readerCount is a large negative number.
Edit 2:
I think I found where the issue is, but I am not sure why it has to be coded that way.
The issue is
type ThreadSafeMap struct {
	mapLock sync.RWMutex // This is causing the problem
	hashMap map[string]interface{}
}
it should be
type ThreadSafeMap struct {
	mapLock *sync.RWMutex
	hashMap map[string]interface{}
}
Another weird thing is that in Put, if I put print statements inside the lock:
bucket.mapLock.Lock()
fmt.Println("start")
fmt.Println(bucket)
fmt.Println(bucketIndex)
fmt.Println(bucket.mapLock)
fmt.Println(&bucket.mapLock)
bucket.hashMap[key] = val
defer bucket.mapLock.Unlock()
The following output is possible:
start
start
{0x4212861c0 map[123:123]}
{0x4212241c0 map[123:123]}
It's weird because each start printout should be followed by 4 lines of bucket info; you should never see two starts back to back, since that would indicate that multiple threads are executing the code inside the lock.
Also, for some reason each bucket.mapLock has a different address, even if I make bucketIndex static, which indicates that I am not even accessing the same lock.
But despite the above weirdness, changing the mutex to a pointer solves my problem.
I would love to find out why I need a pointer for the mutex, why the prints seem to indicate that multiple threads are inside the lock, and why each lock has a different address.
The problem is with the statement
bucket := cMap.buckets[bucketIndex]
bucket now contains a copy of the ThreadSafeMap at that index. Since sync.RWMutex is stored as a value, a copy of it is made during the assignment. But a map holds a reference to its underlying data structure, so the copy shares the same map. The code therefore locks a copy of the lock while writing to the shared map, which causes the problem.
That's why you don't face any problem when you change sync.RWMutex to *sync.RWMutex. It's better to store pointers to the structs in the slice, as shown below.
package concurrent_hashmap

import (
	"hash/fnv"
	"sync"
)

type ConcurrentMap struct {
	buckets     []*ThreadSafeMap
	bucketCount uint32
}

type ThreadSafeMap struct {
	mapLock sync.RWMutex
	hashMap map[string]interface{}
}

func NewConcurrentMap(bucketSize uint32) *ConcurrentMap {
	var threadSafeMapInstance *ThreadSafeMap
	var bucketOfThreadSafeMap []*ThreadSafeMap
	for i := 0; i < int(bucketSize); i++ { // < (not <=) so exactly bucketSize buckets are created
		threadSafeMapInstance = &ThreadSafeMap{sync.RWMutex{}, make(map[string]interface{})}
		bucketOfThreadSafeMap = append(bucketOfThreadSafeMap, threadSafeMapInstance)
	}
	return &ConcurrentMap{bucketOfThreadSafeMap, bucketSize}
}

func (cMap *ConcurrentMap) Put(key string, val interface{}) {
	bucketIndex := hash(key) % cMap.bucketCount
	bucket := cMap.buckets[bucketIndex]
	bucket.mapLock.Lock()
	bucket.hashMap[key] = val
	bucket.mapLock.Unlock()
}

// Helper
func hash(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}
You can validate this scenario by modifying the function Put as follows:
func (cMap *ConcurrentMap) Put(key string, val interface{}) {
	//fmt.Println("index", key)
	bucketIndex := 1 // fixed index: every call should hit the same bucket and the same lock
	bucket := cMap.buckets[bucketIndex]
	fmt.Printf("%p %p\n", &(bucket.mapLock), bucket.hashMap)
}
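Note that keeping buckets as []ThreadSafeMap (values) can also work if Put takes the address of the slice element instead of copying it. A minimal sketch of that alternative, assuming the original value-typed ThreadSafeMap from the question:

func (cMap *ConcurrentMap) Put(key string, val interface{}) {
	bucketIndex := hash(key) % cMap.bucketCount
	bucket := &cMap.buckets[bucketIndex] // pointer to the slice element, not a copy
	bucket.mapLock.Lock()
	bucket.hashMap[key] = val
	bucket.mapLock.Unlock()
}

Because bucket is now a *ThreadSafeMap pointing into the slice, every goroutine that hashes to the same index locks the same mutex.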
Related
I am calling a REST API that expects a nonce header. The nonce must be a unique timestamp, and every consecutive call must carry a timestamp greater than the previous one. My goal is to launch 10 goroutines and have each one make a call to the web API. Since we have no control over the goroutine execution order, we might end up making a web API call with a nonce smaller than the previous one. I have no control over the API implementation.
I have stripped my code down to something very simple which illustrates the problem:
package main

import (
	"fmt"
	"time"
)

func main() {
	count := 10
	results := make(chan string, count)
	for i := 0; i < count; i++ {
		go someWork(results)
		// Enabling the following line would give the
		// expected outcome but does look like a hack to me.
		// time.Sleep(time.Millisecond)
	}
	for i := 0; i < count; i++ {
		fmt.Println(<-results)
	}
}

func someWork(done chan string) {
	// prepare http request, do http request, send to done chan the result
	done <- time.Now().Format("15:04:05.00000")
}
From the output you can see that the timestamps are not chronologically ordered:
13:18:26.98549
13:18:26.98560
13:18:26.98561
13:18:26.98553
13:18:26.98556
13:18:26.98556
13:18:26.98557
13:18:26.98558
13:18:26.98559
13:18:26.98555
What would be the idiomatic way to achieve the expected outcome without adding the sleep line?
Thanks!
As I understand it, you only need to synchronize (serialize) the goroutines up to the request-send part; that is where the timestamp and nonce need to be sequential. Response processing can happen in parallel.
You can use a mutex for this case, as in the code below.
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	count := 10
	results := make(chan string, count)
	var mutex sync.Mutex
	for i := 0; i < count; i++ {
		go someWork(&mutex, results)
	}
	for i := 0; i < count; i++ {
		fmt.Println(<-results)
	}
}

func someWork(mut *sync.Mutex, done chan string) {
	// Lock the mutex; the goroutine holding the lock
	// is guaranteed to create the timestamp and
	// perform the request before any other
	mut.Lock()
	// Get the timestamp
	myTimeStamp := time.Now().Format("15:04:05.00000")
	// prepare http request, do http request
	// Unlock the mutex
	mut.Unlock()
	// Process response
	// send the result to the done chan
	done <- myTimeStamp
}
There may still be duplicate timestamps; you might need a more fine-grained timestamp format, but that depends on the use case.
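If the API truly requires strictly increasing nonces, one way to guarantee uniqueness (a sketch of my own, not part of the original answer; nextNonce is a hypothetical helper) is to combine the clock with an atomic compare-and-swap, so two calls can never produce the same or a smaller value:

package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

var lastNonce int64 // most recently issued nonce, in nanoseconds

// nextNonce returns a strictly increasing value based on the clock.
// If two callers land on the same nanosecond (or the clock steps
// backwards), the CAS loop bumps past the previous value instead
// of reusing it.
func nextNonce() int64 {
	for {
		prev := atomic.LoadInt64(&lastNonce)
		next := time.Now().UnixNano()
		if next <= prev {
			next = prev + 1
		}
		if atomic.CompareAndSwapInt64(&lastNonce, prev, next) {
			return next
		}
	}
}

func main() {
	for i := 0; i < 5; i++ {
		fmt.Println(nextNonce())
	}
}

Note this only fixes uniqueness and monotonicity of the values themselves; the mutex above is still what keeps the requests leaving in that order.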
I think you can use a WaitGroup, for example:
package main

import (
	"fmt"
	"sync"
	"time"
)

var wg sync.WaitGroup

func hello() {
	fmt.Printf("Hello Go %v\n", time.Now().Format("15:04:05.00000"))
	time.Sleep(10 * time.Second)
	// when you are done, call Done:
	wg.Done()
}

func main() {
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go hello()
		// Waiting inside the loop serializes the goroutines completely.
		wg.Wait()
	}
}
For example, say I have a struct with a slice and I want to do something like this:
type Paxos struct {
	peers []string
}

for _, peer := range px.peers {
	// do stuff
}
My goroutines will never modify the peers slice, just read from it. peers is a slice of server addresses, and servers may fail, but that wouldn't affect the peers slice (later RPC calls would just fail).
If no writes are involved, concurrent reads are always safe, regardless of the data structure. However, as soon as even a single concurrency-unsafe write to a variable is involved, you need to serialise concurrent access (both writes and reads) to the variable.
Moreover, you can safely write to elements of a slice or an array under the condition that no more than one goroutine writes to any given element.
For instance, if you run the following programme with the race detector on, it's likely to report a race condition, because multiple goroutines concurrently modify variable results without precautions:
package main

import (
	"fmt"
	"sync"
)

func main() {
	const n = 8
	var results []int
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		i := i
		go func() {
			defer wg.Done()
			results = append(results, square(i))
		}()
	}
	wg.Wait()
	fmt.Println(results)
}

func square(i int) int {
	return i * i
}
However, the following programme contains no such synchronization bug, because each element of the slice is modified by a single goroutine:
package main

import (
	"fmt"
	"sync"
)

func main() {
	const n = 8
	results := make([]int, n)
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		i := i
		go func() {
			defer wg.Done()
			results[i] = square(i)
		}()
	}
	wg.Wait()
	fmt.Println(results)
}

func square(i int) int {
	return i * i
}
Yes, reads are thread-safe in Go and virtually all other languages. You're just looking up an address in memory and seeing what is there. If nothing is attempting to modify that memory, then you can have as many concurrent reads as you'd like.
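To make the read-only case in the question concrete, here is a minimal sketch (the peers values are hypothetical); it is race-free because the slice is fully built before any goroutine starts and is never written afterwards:

package main

import (
	"fmt"
	"sync"
)

func main() {
	// Built once, before any goroutine starts; never modified afterwards.
	peers := []string{"srv1:8080", "srv2:8080", "srv3:8080"}

	var wg sync.WaitGroup
	for g := 0; g < 4; g++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Concurrent reads of an immutable slice need no locking.
			for _, peer := range peers {
				fmt.Printf("goroutine %d contacting %s\n", id, peer)
			}
		}(g)
	}
	wg.Wait()
}

Running it with go run -race should keep the race detector quiet.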
I am making a cache wrapper around a database. To account for possibly slow database calls, I was thinking of a mutex per key (pseudo Go code):
mutexes = map[string]*sync.Mutex // instance variable
mutexes[key].Lock()
defer mutexes[key].Unlock()

if value, ok := cache.find(key); ok {
	return value
}
value = databaseCall(key)
cache.save(key, value)
return value
However, I don't want my map to grow too much. My cache is an LRU and I want it to have a fixed size, for some other reasons not mentioned here. I would like to do something like
delete(mutexes, key)
once no goroutine holds or is waiting for the key's lock, but... that doesn't look thread-safe to me. How should I do it?
Note: I found the question In Go, can we synchronize each key of a map using a lock per key?, but it has no answer.
A map of mutexes is an efficient way to accomplish this, however the map itself must also be synchronized. A reference count can be used to keep track of entries in concurrent use and remove them when no longer needed. Here is a working map of mutexes complete with a test and benchmark.
(UPDATE: This package provides similar functionality: https://pkg.go.dev/golang.org/x/sync/singleflight )
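For the cache-miss path in the question, singleflight collapses concurrent lookups of the same key into a single database call. A short sketch under some assumptions: databaseCall stands in for the question's slow lookup, and sync.Map stands in for the LRU cache:

package main

import (
	"fmt"
	"sync"

	"golang.org/x/sync/singleflight"
)

var (
	group singleflight.Group
	cache sync.Map // stand-in for the question's LRU cache
)

// databaseCall stands in for the question's slow database lookup.
func databaseCall(key string) string {
	return "value-for-" + key
}

// get returns the cached value, making at most one concurrent
// database call per key: goroutines that ask for the same key
// while a call is in flight share its result.
func get(key string) string {
	if v, ok := cache.Load(key); ok {
		return v.(string)
	}
	v, _, _ := group.Do(key, func() (interface{}, error) {
		value := databaseCall(key)
		cache.Store(key, value)
		return value, nil
	})
	return v.(string)
}

func main() {
	fmt.Println(get("user:42"))
}

Unlike the map of mutexes below, this serializes only the cache fill; reads of already-cached keys never block each other.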
mapofmu.go:
// Package mapofmu provides locking per-key.
// For example, you can acquire a lock for a specific user ID and all other requests for that user ID
// will block until that entry is unlocked (effectively your work load will be run serially per-user ID),
// and yet have work for separate user IDs happen concurrently.
package mapofmu

import (
	"fmt"
	"sync"
)

// M wraps a map of mutexes. Each key locks separately.
type M struct {
	ml sync.Mutex              // lock for entry map
	ma map[interface{}]*mentry // entry map
}

type mentry struct {
	m   *M          // point back to M, so we can synchronize removing this mentry when cnt==0
	el  sync.Mutex  // entry-specific lock
	cnt int         // reference count
	key interface{} // key in ma
}

// Unlocker provides an Unlock method to release the lock.
type Unlocker interface {
	Unlock()
}

// New returns an initialized M.
func New() *M {
	return &M{ma: make(map[interface{}]*mentry)}
}

// Lock acquires a lock corresponding to this key.
// This method will never return nil and Unlock() must be called
// to release the lock when done.
func (m *M) Lock(key interface{}) Unlocker {
	// read or create entry for this key atomically
	m.ml.Lock()
	e, ok := m.ma[key]
	if !ok {
		e = &mentry{m: m, key: key}
		m.ma[key] = e
	}
	e.cnt++ // ref count
	m.ml.Unlock()

	// acquire entry lock; blocks here until any current holder releases it
	e.el.Lock()
	return e
}

// Unlock releases the lock for this entry.
func (me *mentry) Unlock() {
	m := me.m

	// decrement and if needed remove entry atomically
	m.ml.Lock()
	e, ok := m.ma[me.key]
	if !ok { // entry must exist
		m.ml.Unlock()
		panic(fmt.Errorf("Unlock requested for key=%v but no entry found", me.key))
	}
	e.cnt--        // ref count
	if e.cnt < 1 { // if it hits zero then we own it and remove from map
		delete(m.ma, me.key)
	}
	m.ml.Unlock()

	// now that the map bookkeeping is handled, we unlock and let
	// anything else waiting on this key through
	e.el.Unlock()
}
mapofmu_test.go:
package mapofmu

import (
	"math/rand"
	"strconv"
	"strings"
	"sync"
	"testing"
	"time"
)

func TestM(t *testing.T) {
	r := rand.New(rand.NewSource(42))
	m := New()

	keyCount := 20
	iCount := 10000
	out := make(chan string, iCount*2)

	// run a bunch of concurrent requests for various keys,
	// the idea is to have a lot of lock contention
	var wg sync.WaitGroup
	wg.Add(iCount)
	for i := 0; i < iCount; i++ {
		go func(rn int) {
			defer wg.Done()
			key := strconv.Itoa(rn)

			// you can prove the test works by commenting the locking out and seeing it fail
			l := m.Lock(key)
			defer l.Unlock()

			out <- key + " A"
			time.Sleep(time.Microsecond) // make 'em wait a mo'
			out <- key + " B"
		}(r.Intn(keyCount))
	}
	wg.Wait()
	close(out)

	// verify the map is empty now
	if l := len(m.ma); l != 0 {
		t.Errorf("unexpected map length at test end: %v", l)
	}

	// confirm that the output always produced the correct sequence
	outLists := make([][]string, keyCount)
	for s := range out {
		sParts := strings.Fields(s)
		kn, err := strconv.Atoi(sParts[0])
		if err != nil {
			t.Fatal(err)
		}
		outLists[kn] = append(outLists[kn], sParts[1])
	}
	for kn := 0; kn < keyCount; kn++ {
		l := outLists[kn] // list of output for this particular key
		for i := 0; i < len(l); i += 2 {
			if l[i] != "A" || l[i+1] != "B" {
				t.Errorf("For key=%v and i=%v got unexpected values %v and %v", kn, i, l[i], l[i+1])
				break
			}
		}
	}
	if t.Failed() {
		t.Logf("Failed, outLists: %#v", outLists)
	}
}

func BenchmarkM(b *testing.B) {
	m := New()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// run uncontended lock/unlock - should be quite fast
		m.Lock(i).Unlock()
	}
}
I wrote a simple, similar implementation: mapmutex.
But instead of a map of mutexes, in this implementation a single mutex is used to guard the map, and each item in the map is used like a 'lock'. The map itself is just a plain, ordinary map.
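A rough sketch of that idea (my own illustration, not the actual mapmutex API) uses one mutex plus a condition variable, with the map entries acting as the per-key locks:

package keylock

import "sync"

// KeyLock serializes access per key using one mutex and a condition
// variable; the map entries themselves act as the per-key "locks".
type KeyLock struct {
	mu     sync.Mutex
	cond   *sync.Cond
	locked map[string]bool
}

func New() *KeyLock {
	kl := &KeyLock{locked: make(map[string]bool)}
	kl.cond = sync.NewCond(&kl.mu)
	return kl
}

// Lock blocks until key is free, then marks it held.
func (kl *KeyLock) Lock(key string) {
	kl.mu.Lock()
	for kl.locked[key] {
		kl.cond.Wait()
	}
	kl.locked[key] = true
	kl.mu.Unlock()
}

// Unlock releases key and wakes waiters.
func (kl *KeyLock) Unlock(key string) {
	kl.mu.Lock()
	delete(kl.locked, key)
	kl.cond.Broadcast() // wake all waiters; those whose key is still held loop back to Wait
	kl.mu.Unlock()
}

Compared with the reference-counted version above, this is simpler, at the cost of Broadcast waking every waiter rather than only those interested in the freed key.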
I've been reading about goroutines and the sync package, and my question is: do I always need to lock and unlock when reading or writing data from different goroutines?
For example, I have a variable on my server
config := make(map[string]string)
Then in different goroutines I want to read from config. Is it safe to read without using sync, or is it not?
I guess writing needs to be done using the sync package, but I am not sure about reading.
For example, I have a simple in-memory cache system:
type Cache interface {
	Get(key string) interface{}
	Put(key string, expires int64, value interface{})
}

// MemoryCache represents a memory type of cache
type MemoryCache struct {
	c  map[string]*MemoryCacheValue
	rw sync.RWMutex
}

// MemoryCacheValue represents a memory cache value
type MemoryCacheValue struct {
	value   interface{}
	expires int64
}

// NewMemoryCache creates a new memory cache
func NewMemoryCache() Cache {
	return &MemoryCache{
		c: make(map[string]*MemoryCacheValue),
	}
}

// Get retrieves something from the cache
func (m *MemoryCache) Get(key string) interface{} {
	if v, ok := m.c[key]; ok {
		return v
	}
	return nil
}

// Put stores something into the cache
func (m *MemoryCache) Put(key string, expires int64, value interface{}) {
	m.rw.Lock()
	m.c[key] = &MemoryCacheValue{
		value,
		time.Now().Unix() + expires,
	}
	m.rw.Unlock()
}
Am I acting safely here, or do I still need to lock and unlock when I only want to read?
You're diving into the world of race conditions. The basic rule of thumb is that if ANY routine writes to or changes a piece of data that can be read (or also written) by any number of other goroutines/threads, you need some sort of synchronization system in place.
For example, let's say you have that map. It has ["Joe"] = "Smith" and ["Sean"] = "Howard" in it. One goroutine wants to read the value of ["Joe"]. Another routine is updating ["Joe"] to "Cooper". Which value does the first goroutine read? It depends on which goroutine gets to the data first. That's the race condition: the behavior is undefined and unpredictable.
The easiest method to control that access is with a sync.Mutex. In your case, since some routines only need to read and not write, you can instead use a sync.RWMutex (the main difference is that an RWMutex allows any number of threads to read, as long as none are trying to write). You would bake this into the map using a structure like this:
type MutexMap struct {
	m map[string]string
	*sync.RWMutex
}
Then, in routines that need to read from the map, you would do:
func ReadSomething(o MutexMap, key string) string {
	o.RLock()         // lock for reading, blocks until the Mutex is ready
	defer o.RUnlock() // make SURE you do this, else it will be locked permanently
	return o.m[key]
}
And to write:
func WriteSomething(o MutexMap, key, value string) {
	o.Lock()         // lock for writing, blocks until the Mutex is ready
	defer o.Unlock() // again, make SURE you do this, else it will be locked permanently
	o.m[key] = value
}
Note that both of these could be written as methods of the struct, rather than functions, if desired.
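Applied to the question's MemoryCache above, the read path would take the read lock as well; a minimal sketch reusing the question's types:

// Get retrieves something from the cache, holding the read lock
// so it cannot race with a concurrent Put touching the same map.
func (m *MemoryCache) Get(key string) interface{} {
	m.rw.RLock()
	defer m.rw.RUnlock()
	if v, ok := m.c[key]; ok {
		return v
	}
	return nil
}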
You can also approach this using channels. You make a controller structure that runs in a goroutine, and you make requests to it over channels. Example:
package main

import "fmt"

type MapCtrl struct {
	m       map[string]string
	ReadCh  chan chan map[string]string
	WriteCh chan map[string]string
	QuitCh  chan struct{}
}

func NewMapController() *MapCtrl {
	return &MapCtrl{
		m:       make(map[string]string),
		ReadCh:  make(chan chan map[string]string),
		WriteCh: make(chan map[string]string),
		QuitCh:  make(chan struct{}),
	}
}

func (ctrl *MapCtrl) Control() {
	for {
		select {
		case r := <-ctrl.ReadCh:
			fmt.Println("Read request received")
			retmap := make(map[string]string)
			for k, v := range ctrl.m { // copy map, so it doesn't change in place after return
				retmap[k] = v
			}
			r <- retmap
		case w := <-ctrl.WriteCh:
			fmt.Println("Write request received with", w)
			for k, v := range w {
				ctrl.m[k] = v
			}
		case <-ctrl.QuitCh:
			fmt.Println("Quit request received")
			return
		}
	}
}

func main() {
	ctrl := NewMapController()
	defer close(ctrl.QuitCh)
	go ctrl.Control()

	m := make(map[string]string)
	m["Joe"] = "Smith"
	m["Sean"] = "Howard"
	ctrl.WriteCh <- m

	r := make(chan map[string]string, 1)
	ctrl.ReadCh <- r
	fmt.Println(<-r)
}
How can I construct a slice out of all the elements consumed from a channel (like Python's list() does)? I can use this helper function:
func ToSlice(c chan int) []int {
	s := make([]int, 0)
	for i := range c {
		s = append(s, i)
	}
	return s
}
but due to the lack of generics, I'll have to write that for every type, won't I? Is there a built-in function that implements this? If not, how can I avoid copying and pasting the above code for every single type I'm using?
If there's only a few instances in your code where the conversion is needed, then there's absolutely nothing wrong with copying the 7 lines of code a few times (or even inlining it where it's used, which reduces it to 4 lines of code and is probably the most readable solution).
If you've really got conversions between lots and lots of types of channels and slices and want something generic, then you can do this with reflection, at the cost of ugliness and a lack of static typing at the call site of ChanToSlice.
Here's a complete example of how you can use reflect to solve this problem, with a demonstration of it working for an int channel.
package main

import (
	"fmt"
	"reflect"
)

// ChanToSlice reads all data from ch (which must be a chan), returning a
// slice of the data. If ch is a 'chan T' then the return value is of type
// []T inside the returned interface.
// A typical call would be sl := ChanToSlice(ch).([]int)
func ChanToSlice(ch interface{}) interface{} {
	chv := reflect.ValueOf(ch)
	slv := reflect.MakeSlice(reflect.SliceOf(reflect.TypeOf(ch).Elem()), 0, 0)
	for {
		v, ok := chv.Recv()
		if !ok {
			return slv.Interface()
		}
		slv = reflect.Append(slv, v)
	}
}

func main() {
	ch := make(chan int)
	go func() {
		for i := 0; i < 10; i++ {
			ch <- i
		}
		close(ch)
	}()
	sl := ChanToSlice(ch).([]int)
	fmt.Println(sl)
}
You could make ToSlice() just work on interface{}'s, but the amount of code you save here will likely cost you in complexity elsewhere.
func ToSlice(c chan interface{}) []interface{} {
	s := make([]interface{}, 0)
	for i := range c {
		s = append(s, i)
	}
	return s
}
Full example at http://play.golang.org/p/wxx-Yf5ESN
That being said, as @Volker said in the comments, from the slice (haha) of code you showed, it seems like it'd be saner to either process the results in a streaming fashion or "buffer them up" at the generator and just send the slice down the channel.
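For what it's worth, on modern Go (1.18+) type parameters make the generic helper the question asks for possible without reflection; a minimal sketch:

package main

import "fmt"

// ToSlice drains ch until it is closed and returns the received values.
func ToSlice[T any](ch <-chan T) []T {
	var s []T
	for v := range ch {
		s = append(s, v)
	}
	return s
}

func main() {
	ch := make(chan int)
	go func() {
		for i := 0; i < 10; i++ {
			ch <- i
		}
		close(ch)
	}()
	fmt.Println(ToSlice(ch))
}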