I am making a cache wrapper around a database. To account for possibly slow database calls, I was thinking of a mutex per key (pseudo Go code):
mutexes = map[string]*sync.Mutex // instance variable
mutexes[key].Lock()
defer mutexes[key].Unlock()
if value, ok := cache.find(key); ok {
return value
}
value = databaseCall(key)
cache.save(key, value)
return value
However I don't want my map to grow too much. My cache is an LRU and I want to have a fixed size for some other reasons not mentioned here. I would like to do something like
delete(mutexes, key)
when all the locks on the key are over but... that doesn't look thread safe to me... How should I do it?
Note: I found this question
In Go, can we synchronize each key of a map using a lock per key? but no answer
A map of mutexes is an efficient way to accomplish this, but the map itself must also be synchronized. A reference count can be used to keep track of entries in concurrent use and to remove them when they are no longer needed. Here is a working map of mutexes, complete with a test and benchmark.
(UPDATE: This package provides similar functionality: https://pkg.go.dev/golang.org/x/sync/singleflight )
mapofmu.go
// Package mapofmu provides locking per-key.
// For example, you can acquire a lock for a specific user ID and all other requests for that user ID
// will block until that entry is unlocked (effectively your work load will be run serially per-user ID),
// and yet have work for separate user IDs happen concurrently.
package mapofmu
import (
"fmt"
"sync"
)
// M wraps a map of mutexes. Each key locks separately.
type M struct {
ml sync.Mutex // lock for entry map
ma map[interface{}]*mentry // entry map
}
type mentry struct {
m *M // point back to M, so we can synchronize removing this mentry when cnt==0
el sync.Mutex // entry-specific lock
cnt int // reference count
key interface{} // key in ma
}
// Unlocker provides an Unlock method to release the lock.
type Unlocker interface {
Unlock()
}
// New returns an initialized M.
func New() *M {
return &M{ma: make(map[interface{}]*mentry)}
}
// Lock acquires a lock corresponding to this key.
// This method will never return nil and Unlock() must be called
// to release the lock when done.
func (m *M) Lock(key interface{}) Unlocker {
// read or create entry for this key atomically
m.ml.Lock()
e, ok := m.ma[key]
if !ok {
e = &mentry{m: m, key: key}
m.ma[key] = e
}
e.cnt++ // ref count
m.ml.Unlock()
// acquire lock, will block here until every earlier holder of this key has unlocked
e.el.Lock()
return e
}
// Unlock releases the lock for this entry.
func (me *mentry) Unlock() {
m := me.m
// decrement and if needed remove entry atomically
m.ml.Lock()
e, ok := m.ma[me.key]
if !ok { // entry must exist
m.ml.Unlock()
panic(fmt.Errorf("Unlock requested for key=%v but no entry found", me.key))
}
e.cnt-- // ref count
if e.cnt < 1 { // if it hits zero then we own it and remove from map
delete(m.ma, me.key)
}
m.ml.Unlock()
// now that map stuff is handled, we unlock and let
// anything else waiting on this key through
e.el.Unlock()
}
mapofmu_test.go:
package mapofmu
import (
"math/rand"
"strconv"
"strings"
"sync"
"testing"
"time"
)
func TestM(t *testing.T) {
r := rand.New(rand.NewSource(42))
m := New()
_ = m
keyCount := 20
iCount := 10000
out := make(chan string, iCount*2)
// run a bunch of concurrent requests for various keys,
// the idea is to have a lot of lock contention
var wg sync.WaitGroup
wg.Add(iCount)
for i := 0; i < iCount; i++ {
go func(rn int) {
defer wg.Done()
key := strconv.Itoa(rn)
// you can prove the test works by commenting the locking out and seeing it fail
l := m.Lock(key)
defer l.Unlock()
out <- key + " A"
time.Sleep(time.Microsecond) // make 'em wait a mo'
out <- key + " B"
}(r.Intn(keyCount))
}
wg.Wait()
close(out)
// verify the map is empty now
if l := len(m.ma); l != 0 {
t.Errorf("unexpected map length at test end: %v", l)
}
// confirm that the output always produced the correct sequence
outLists := make([][]string, keyCount)
for s := range out {
sParts := strings.Fields(s)
kn, err := strconv.Atoi(sParts[0])
if err != nil {
t.Fatal(err)
}
outLists[kn] = append(outLists[kn], sParts[1])
}
for kn := 0; kn < keyCount; kn++ {
l := outLists[kn] // list of output for this particular key
for i := 0; i < len(l); i += 2 {
if l[i] != "A" || l[i+1] != "B" {
t.Errorf("For key=%v and i=%v got unexpected values %v and %v", kn, i, l[i], l[i+1])
break
}
}
}
if t.Failed() {
t.Logf("Failed, outLists: %#v", outLists)
}
}
func BenchmarkM(b *testing.B) {
m := New()
b.ResetTimer()
for i := 0; i < b.N; i++ {
// run uncontended lock/unlock - should be quite fast
m.Lock(i).Unlock()
}
}
I wrote a similar, simpler implementation: mapmutex
Instead of a map of mutexes, this implementation uses a single mutex to guard the map, and each item in the map acts as a 'lock'. The map itself is just a plain, ordinary map.
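The idea described above can be sketched roughly as follows. This is not the mapmutex package's actual code; `KeyLock`, `TryLock` and `Unlock` are hypothetical names, and the sketch assumes callers are willing to retry when a key is already held.

```go
package main

import (
	"fmt"
	"sync"
)

// KeyLock is a hypothetical type: one mutex guards a plain map, and the
// presence of a key in that map means "this key is locked".
type KeyLock struct {
	mu   sync.Mutex
	held map[string]struct{}
}

func NewKeyLock() *KeyLock {
	return &KeyLock{held: make(map[string]struct{})}
}

// TryLock reports whether the caller acquired the lock for key.
// It never blocks; callers that get false may retry later.
func (k *KeyLock) TryLock(key string) bool {
	k.mu.Lock()
	defer k.mu.Unlock()
	if _, locked := k.held[key]; locked {
		return false
	}
	k.held[key] = struct{}{}
	return true
}

// Unlock releases key and deletes its entry, so the map only ever
// holds the keys that are locked right now.
func (k *KeyLock) Unlock(key string) {
	k.mu.Lock()
	delete(k.held, key)
	k.mu.Unlock()
}

func main() {
	kl := NewKeyLock()
	fmt.Println(kl.TryLock("a")) // true: first holder
	fmt.Println(kl.TryLock("a")) // false: already held
	fmt.Println(kl.TryLock("b")) // true: different key
	kl.Unlock("a")
	fmt.Println(kl.TryLock("a")) // true: released above
}
```

Because entries are deleted on unlock, the map cannot grow beyond the number of keys in use, which is the property the original question asks for.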
I have the following piece of code. I'm trying to run three goroutines at a time, never more than three. That part works as expected, but the code is supposed to update a table in the DB.
The first goroutine processes the first 50 rows, the second the next 50, the third the 50 after that, and then it repeats. I don't want two goroutines processing the same rows at the same time, but because the update takes so long, this happens almost every time.
To solve this, I started flagging the rows with a new boolean column, processing. When a goroutine starts, I set it to true for all rows to be updated, and I sleep the script for 6 seconds to allow the flag to be updated.
This works for a random amount of time, but every now and then I see 2-3 jobs processing the same rows again. The method I'm using to prevent duplicate updates feels a bit janky, and I was wondering if there is a better way.
stopper := make(chan struct{}, 3)
var counter int
for {
counter++
stopper <- struct{}{}
go func(db *sqlx.DB, c int) {
fmt.Println("start")
updateTables(db)
fmt.Println("stop")
<-stopper
}(db, counter)
time.Sleep(6 * time.Second)
}
in updateTables
// data is assumed to be a slice of row structs declared elsewhere
var ids []string
err := sqlx.Select(db, &data, `select * from table_data where processing = false`)
if err != nil {
panic(err)
}
for _, row := range data {
ids = append(ids, row.Id)
}
if len(data) == 0 {
return
}
for _, row := range data {
_, err = db.Exec(`update table_data set processing = true where id = $1`, row.Id)
if err != nil {
panic(err)
}
}
// Additional row processing
I think there's a misunderstanding about how to approach goroutines in this case.
Goroutines doing this kind of work should be treated like worker threads, with channels as the means of communication between the main goroutine (which does the synchronization) and the worker goroutines (which do the actual job).
package main
import (
"log"
"sync"
"time"
)
type record struct {
id int
}
func main() {
const WORKER_COUNT = 10
recordschan := make(chan record)
var wg sync.WaitGroup
for k := 0; k < WORKER_COUNT; k++ {
wg.Add(1)
// Create the worker which will be doing the updates
go func(workerID int) {
defer wg.Done() // Marking the worker as done
for record := range recordschan {
updateRecord(record)
log.Printf("req %d processed by worker %d", record.id, workerID)
}
}(k)
}
// Feeding the records channel
for _, record := range fetchRecords() {
recordschan <- record
}
// Closing our channel as we're not using it anymore
close(recordschan)
// Waiting for all the go routines to finish
wg.Wait()
log.Println("we're done!")
}
func fetchRecords() []record {
result := []record{}
for k := 0; k < 100; k++ {
result = append(result, record{k})
}
return result
}
func updateRecord(req record) {
time.Sleep(200 * time.Millisecond)
}
You can even buffer things in the main goroutine if you need to update all 50 rows of a batch at once.
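A minimal sketch of that batching idea, assuming the rows are identified by integer ids and that `size` is greater than zero. `chunk` and `processAll` are hypothetical helpers, and the database update is replaced by a counter so the example stays self-contained.

```go
package main

import (
	"fmt"
	"sync"
)

// chunk splits ids into consecutive batches of at most size elements.
// It assumes size > 0.
func chunk(ids []int, size int) [][]int {
	var out [][]int
	for size < len(ids) {
		out = append(out, ids[:size])
		ids = ids[size:]
	}
	if len(ids) > 0 {
		out = append(out, ids)
	}
	return out
}

// processAll hands each batch to exactly one of the worker goroutines
// and returns how many times each id was processed.
func processAll(ids []int, workers, size int) map[int]int {
	batches := make(chan []int)
	seen := make(map[int]int)
	var mu sync.Mutex // guards seen
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range batches {
				// A real worker would update the whole batch here.
				mu.Lock()
				for _, id := range batch {
					seen[id]++
				}
				mu.Unlock()
			}
		}()
	}
	// The main goroutine is the only producer, so no batch is sent twice.
	for _, b := range chunk(ids, size) {
		batches <- b
	}
	close(batches)
	wg.Wait()
	return seen
}

func main() {
	ids := make([]int, 120)
	for i := range ids {
		ids[i] = i
	}
	seen := processAll(ids, 3, 50)
	fmt.Println(len(chunk(ids, 50)), len(seen)) // 3 120
}
```

Since a batch is received by exactly one worker, no flag column or sleep is needed to keep two goroutines off the same rows.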
I'm writing a concurrency-safe memo:
package mu
import (
"sync"
)
// Func represents a memoizable function, operating on a string key, to use with a Mu
type Func func(key string) interface{}
// Mu is a cache that memoizes results of an expensive computation
//
// It has a traditional implementation using mutexes.
type Mu struct {
// guards done
mu sync.RWMutex
done map[string]chan bool
memo map[string]interface{}
f Func
}
// Get a string key if it exists, otherwise computes the value and caches it.
//
// Returns the value and whether or not the key existed.
func (c *Mu) Get(key string) (interface{}, bool) {
c.mu.RLock()
_, ok := c.done[key]
c.mu.RUnlock()
if ok {
return c.get(key), true
}
c.mu.Lock()
_, ok = c.done[key]
if ok {
c.mu.Unlock()
} else {
c.done[key] = make(chan bool)
c.mu.Unlock()
v := c.f(key)
c.memo[key] = v
close(c.done[key])
}
return c.get(key), ok
}
// get returns the value of key, blocking on an existing computation
func (c *Mu) get(key string) interface{} {
<-c.done[key]
v, _ := c.memo[key]
return v
}
As you can see, there's a mutex guarding the done field, which is used to signal to other goroutines that a computation for a key is pending or done. This avoids duplicate computations (calls to c.f(key)) for the same key.
My question is about the guarantees of this code: if the computing goroutine closes the channel only after it writes to c.memo, are other goroutines that access c.memo[key] after a blocking call to <-c.done[key] guaranteed to see the result of the computation?
The short answer is yes.
We can simplify some of the code to get to the essence of why. Consider your Mu struct:
type Mu struct {
memo int
done chan bool
}
We can now define 2 functions, compute and read
func compute(r *Mu) {
time.Sleep(2 * time.Second)
r.memo = 42
close(r.done)
}
func read(r *Mu) {
<-r.done
fmt.Println("Read value: ", r.memo)
}
Here, compute is a computationally heavy task (which we can simulate by sleeping for some time)
Now, in the main function, we start a new compute go routine, along with starting some read go routines at regular intervals:
func main() {
r := &Mu{}
r.done = make(chan bool)
go compute(r)
// this one starts immediately
go read(r)
time.Sleep(time.Second)
// this one starts in the middle of computation
go read(r)
time.Sleep(2*time.Second)
// this one starts after the computation is complete
go read(r)
// This is to prevent the program from terminating immediately
time.Sleep(3 * time.Second)
}
In all three cases, we print out the result of the compute task.
Working code here
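When you close a channel in Go, every receive that is waiting on it (and every receive executed after it is closed) stops blocking and returns immediately. The Go memory model also guarantees that the close happens before any receive that returns because the channel is closed, so a write made before the close, such as the assignment to the memo value, is visible to the goroutines that get past the receive. So provided that the only place the channel is closed is right after the memo value is computed, you have that guarantee.
The only thing to be careful about is making sure that this channel isn't closed anywhere else in your code; closing a channel that is already closed panics.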
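One way to keep the close in exactly one place is to store each result next to its own ready channel. The following is a sketch of that variant, not the asker's code: `entry`, `Memo`, `NewMemo` and `countCalls` are hypothetical names, and the value is kept in the entry instead of a second map, which also avoids writing to a shared map outside the lock.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// entry holds one result; ready is closed once value has been set.
type entry struct {
	value interface{}
	ready chan struct{}
}

type Memo struct {
	mu    sync.Mutex // guards cache
	cache map[string]*entry
	f     func(key string) interface{}
}

func NewMemo(f func(key string) interface{}) *Memo {
	return &Memo{cache: make(map[string]*entry), f: f}
}

func (m *Memo) Get(key string) interface{} {
	m.mu.Lock()
	e, ok := m.cache[key]
	if !ok {
		// First request for this key: compute, then close exactly once.
		e = &entry{ready: make(chan struct{})}
		m.cache[key] = e
		m.mu.Unlock()
		e.value = m.f(key)
		close(e.ready)
	} else {
		m.mu.Unlock()
		<-e.ready // returns only after the write to e.value
	}
	return e.value
}

// countCalls runs n concurrent Gets for one key and returns how many
// times the memoized function actually ran.
func countCalls(n int) int32 {
	var calls int32
	m := NewMemo(func(key string) interface{} {
		atomic.AddInt32(&calls, 1)
		return len(key)
	})
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if m.Get("hello") != 5 {
				panic("unexpected value")
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&calls)
}

func main() {
	fmt.Println(countCalls(10)) // 1
}
```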
I'm creating a program which creates random bson.M documents and inserts them into a database.
The main goroutine generates the documents and pushes them to a buffered channel. At the same time, two goroutines fetch the documents from the channel and insert them into the database.
This process takes a lot of memory and puts too much pressure on the garbage collector, so I'm trying to implement a memory pool to limit the number of allocations.
Here is what I have so far:
package main
import (
"fmt"
"math/rand"
"sync"
"time"
"gopkg.in/mgo.v2/bson"
)
type List struct {
L []bson.M
}
func main() {
var rndSrc = rand.NewSource(time.Now().UnixNano())
pool := sync.Pool{
New: func() interface{} {
l := make([]bson.M, 1000)
for i, _ := range l {
m := bson.M{}
l[i] = m
}
return &List{L: l}
},
}
// buffered channel to store generated bson.M docs
var record = make(chan List, 3)
// start worker to insert docs in database
for i := 0; i < 2; i++ {
go func() {
for r := range record {
fmt.Printf("first: %v\n", r.L[0])
// do the insert etc
}
}()
}
// feed the channel
for i := 0; i < 100; i++ {
// get an object from the pool instead of creating a new one
list := pool.Get().(*List)
// re generate the documents
for j, _ := range list.L {
list.L[j]["key1"] = rndSrc.Int63()
}
// push the docs to the channel, and return them to the pool
record <- *list
pool.Put(list)
}
}
But it looks like one List is used 4 times before being regenerated:
> go run test.go
first: map[key1:943279487605002381 key2:4444061964749643436]
first: map[key1:943279487605002381 key2:4444061964749643436]
first: map[key1:943279487605002381 key2:4444061964749643436]
first: map[key1:943279487605002381 key2:4444061964749643436]
first: map[key1:8767993090152084935 key2:8807650676784718781]
...
Why isn't the list regenerated each time? How can I fix this?
The problem is that you have created a buffered channel with var record = make(chan List, 3). Hence this code:
record <- *list
pool.Put(list)
may return immediately, and the entry will be placed back into the pool before it has been consumed. The underlying slice is then likely to be modified in another loop iteration before your consumer has had a chance to read it. Although you are sending List as a value, remember that a []bson.M slice holds a pointer to its backing array, so every copy of the List value still points to the same memory. That is why you are seeing the duplicate output.
To fix, modify your channel to send the List pointer make(chan *List, 3) and change your consumer to put the entry back in the pool once finished, e.g:
for r := range record {
fmt.Printf("first: %v\n", r.L[0])
// do the insert etc
pool.Put(r) // Even if error occurs
}
Your producer should then send the pointer, with the pool.Put removed, i.e.
record <- list
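Put together, the fix might look like the sketch below. To stay self-contained it swaps bson.M for a plain `map[string]int64`, uses a single consumer, and records what the consumer saw instead of inserting into a database; `produceAndConsume` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sync"
)

// List stands in for the question's type, with bson.M replaced by a
// plain map so the sketch needs only the standard library.
type List struct {
	L []map[string]int64
}

// produceAndConsume sends n lists through a buffered channel and returns
// the first document's key1 value as seen by the consumer, in order.
func produceAndConsume(n int) []int64 {
	pool := sync.Pool{
		New: func() interface{} {
			l := make([]map[string]int64, 4)
			for i := range l {
				l[i] = map[string]int64{}
			}
			return &List{L: l}
		},
	}
	records := make(chan *List, 3) // pointers, not List values
	seen := make([]int64, 0, n)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for r := range records {
			seen = append(seen, r.L[0]["key1"])
			pool.Put(r) // returned only after the consumer is done with it
		}
	}()

	for i := 0; i < n; i++ {
		list := pool.Get().(*List)
		for j := range list.L {
			list.L[j]["key1"] = int64(i)
		}
		records <- list // no pool.Put here
	}
	close(records)
	wg.Wait()
	return seen
}

func main() {
	seen := produceAndConsume(100)
	fmt.Println(len(seen), seen[0], seen[99]) // 100 0 99
}
```

A list sitting in the channel is never in the pool at the same time, so the producer cannot overwrite it before the consumer has read it.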
Here is a simple concurrent map that I wrote for learning purposes
package concurrent_hashmap
import (
"hash/fnv"
"sync"
)
type ConcurrentMap struct {
buckets []ThreadSafeMap
bucketCount uint32
}
type ThreadSafeMap struct {
mapLock sync.RWMutex
hashMap map[string]interface{}
}
func NewConcurrentMap(bucketSize uint32) *ConcurrentMap {
var threadSafeMapInstance ThreadSafeMap
var bucketOfThreadSafeMap []ThreadSafeMap
for i := 0; i <= int(bucketSize); i++ {
threadSafeMapInstance = ThreadSafeMap{sync.RWMutex{}, make(map[string]interface{})}
bucketOfThreadSafeMap = append(bucketOfThreadSafeMap, threadSafeMapInstance)
}
return &ConcurrentMap{bucketOfThreadSafeMap, bucketSize}
}
func (cMap *ConcurrentMap) Put(key string, val interface{}) {
bucketIndex := hash(key) % cMap.bucketCount
bucket := cMap.buckets[bucketIndex]
bucket.mapLock.Lock()
bucket.hashMap[key] = val
bucket.mapLock.Unlock()
}
// Helper
func hash(s string) uint32 {
h := fnv.New32a()
h.Write([]byte(s))
return h.Sum32()
}
I am trying to write a simple benchmark, and I find that sequential (synchronized) access works correctly but concurrent access fails with
fatal error: concurrent map writes
Here is my benchmark, run with go test -bench=. -race
package concurrent_hashmap
import (
"testing"
"runtime"
"math/rand"
"strconv"
"sync"
)
// Concurrent does not work
func BenchmarkMyFunc(b *testing.B) {
var wg sync.WaitGroup
runtime.GOMAXPROCS(runtime.NumCPU())
my_map := NewConcurrentMap(uint32(4))
for n := 0; n < b.N; n++ {
go insert(my_map, wg)
}
wg.Wait()
}
func insert(my_map *ConcurrentMap, wg sync.WaitGroup) {
wg.Add(1)
var rand_int int
for element_num := 0; element_num < 1000; element_num++ {
rand_int = rand.Intn(100)
my_map.Put(strconv.Itoa(rand_int), rand_int)
}
defer wg.Done()
}
// This works
func BenchmarkMyFuncSynchronize(b *testing.B) {
my_map := NewConcurrentMap(uint32(4))
for n := 0; n < b.N; n++ {
my_map.Put(strconv.Itoa(123), 123)
}
}
The WARNING: DATA RACE is saying that bucket.hashMap[key] = val is causing the problem, but I am confused on why that is possible, since I lock that logic whenever write is happening.
I think I am missing something basic, can someone point out my mistake?
Thanks
Edit 1:
Not sure if this helps but here is what my mutex looks like if I don't lock anything
{{0 0} 0 0 0 0}
Here is what it looks like if I lock the write
{{1 0} 0 0 -1073741824 0}
Not sure why my readerCount is a low negative number
Edit 2:
I think I found where the issue is, but I'm not sure why I have to code it that way
The issue is
type ThreadSafeMap struct {
mapLock sync.RWMutex // This is causing problem
hashMap map[string]interface{}
}
it should be
type ThreadSafeMap struct {
mapLock *sync.RWMutex
hashMap map[string]interface{}
}
Another weird thing is what happens in Put if I add print statements inside the lock
bucket.mapLock.Lock()
fmt.Println("start")
fmt.Println(bucket)
fmt.Println(bucketIndex)
fmt.Println(bucket.mapLock)
fmt.Println(&bucket.mapLock)
bucket.hashMap[key] = val
defer bucket.mapLock.Unlock()
The following output is possible
start
start
{0x4212861c0 map[123:123]}
{0x4212241c0 map[123:123]}
That is weird, because each start printout should be followed by 4 lines of bucket info. Two start lines back to back would indicate that multiple threads are executing the code inside the lock.
Also, for some reason each bucket.mapLock has a different address even if I make bucketIndex static, which indicates that I am not even accessing the same lock.
Despite the weirdness above, changing the mutex to a pointer solves my problem.
I would love to find out why I need a pointer for the mutex, why the prints seem to indicate that multiple threads are inside the lock, and why each lock has a different address.
The problem is with the statement
bucket := cMap.buckets[bucketIndex]
bucket now contains a copy of the ThreadSafeMap at that index. Because sync.RWMutex is stored as a value, assigning the struct copies the mutex too. A map value, however, holds a reference to an underlying data structure, so the copy still refers to the same map. The code therefore locks a fresh copy of the lock on every call while writing to a single shared map, which causes the problem.
That's why the problem goes away when you change sync.RWMutex to *sync.RWMutex. It's better still to store pointers to the struct in the slice, as shown.
package concurrent_hashmap
import (
"hash/fnv"
"sync"
)
type ConcurrentMap struct {
buckets []*ThreadSafeMap
bucketCount uint32
}
type ThreadSafeMap struct {
mapLock sync.RWMutex
hashMap map[string]interface{}
}
func NewConcurrentMap(bucketSize uint32) *ConcurrentMap {
var threadSafeMapInstance *ThreadSafeMap
var bucketOfThreadSafeMap []*ThreadSafeMap
for i := 0; i < int(bucketSize); i++ { // exactly bucketSize buckets; <= would allocate one that is never used
threadSafeMapInstance = &ThreadSafeMap{sync.RWMutex{}, make(map[string]interface{})}
bucketOfThreadSafeMap = append(bucketOfThreadSafeMap, threadSafeMapInstance)
}
return &ConcurrentMap{bucketOfThreadSafeMap, bucketSize}
}
func (cMap *ConcurrentMap) Put(key string, val interface{}) {
bucketIndex := hash(key) % cMap.bucketCount
bucket := cMap.buckets[bucketIndex]
bucket.mapLock.Lock()
bucket.hashMap[key] = val
bucket.mapLock.Unlock()
}
// Helper
func hash(s string) uint32 {
h := fnv.New32a()
h.Write([]byte(s))
return h.Sum32()
}
It's possible to validate the scenario by modifying the function Put as follows
func (cMap *ConcurrentMap) Put(key string, val interface{}) {
//fmt.Println("index", key)
bucketIndex := 1
bucket := cMap.buckets[bucketIndex]
fmt.Printf("%p %p\n", &(bucket.mapLock), bucket.hashMap)
}
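The copy behaviour can also be seen in isolation. This is a small sketch with a hypothetical `bucket` type and `inspect` helper; it compares addresses instead of locking anything, and go vet would flag the by-value copy for exactly the reason discussed above.

```go
package main

import (
	"fmt"
	"sync"
)

type bucket struct {
	mu sync.Mutex
	m  map[string]int
}

// inspect reports whether a by-value copy and a pointer share the
// original bucket's mutex, and whether the copy shares the original's map.
func inspect() (copySharesMutex, ptrSharesMutex, copySharesMap bool) {
	buckets := []bucket{{m: make(map[string]int)}}

	byValue := buckets[0] // copies the struct: new mutex, same map
	byPtr := &buckets[0]  // no copy: same mutex, same map

	copySharesMutex = &byValue.mu == &buckets[0].mu
	ptrSharesMutex = &byPtr.mu == &buckets[0].mu

	byValue.m["k"] = 1 // written through the copy
	copySharesMap = buckets[0].m["k"] == 1
	return
}

func main() {
	fmt.Println(inspect()) // false true true
}
```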
I have been reading about goroutines and the sync package, and my question is: do I always need to lock and unlock when reading and writing data from different goroutines?
For example I have a variable on my server
config := make(map[string]string)
Then, in different goroutines, I want to read from config. Is it safe to read without using sync, or is it not?
I guess writing needs to be done using the sync package, but I am not sure about reading.
For example I have a simple in-memory cache system
type Cache interface {
Get(key string) interface{}
Put(key string, expires int64, value interface{})
}
// MemoryCache represents a memory type of cache
type MemoryCache struct {
c map[string]*MemoryCacheValue
rw sync.RWMutex
}
// MemoryCacheValue represents a memory cache value
type MemoryCacheValue struct {
value interface{}
expires int64
}
// NewMemoryCache creates a new memory cache
func NewMemoryCache() Cache {
return &MemoryCache{
c: make(map[string]*MemoryCacheValue),
}
}
// Get retrieves something from the cache
func (m *MemoryCache) Get(key string) interface{} {
if v, ok := m.c[key]; ok {
return v
}
return nil
}
// Put stores something into the cache
func (m *MemoryCache) Put(key string, expires int64, value interface{}) {
m.rw.Lock()
m.c[key] = &MemoryCacheValue{
value,
time.Now().Unix() + expires,
}
m.rw.Unlock()
}
Am I being safe here, or do I still need to lock and unlock when I only want to read?
You're diving into the world of race conditions. The basic rule of thumb is that if ANY routine writes to or changes a piece of data that can be read (or also written) by any number of other goroutines/threads, you need to have some sort of synchronization system in place.
For example, let's say you have that map. It has ["Joe"] = "Smith" and ["Sean"] = "Howard" in it. One goroutine wants to read the value of ["Joe"]. Another routine is updating ["Joe"] to "Cooper". Which value does the first goroutine read? It depends on which goroutine gets to the data first. That's the race condition: the behavior is undefined and unpredictable, and with Go maps the runtime may even abort the program with a fatal "concurrent map read and map write" error.
The easiest method to control that access is with a sync.Mutex. In your case, since some routines only need to read and not write, you can instead use a sync.RWMutex (main difference is that a RWMutex allows any number of threads to read, as long as none are trying to write). You would bake this into the map using a structure like this:
type MutexMap struct {
m map[string]string
*sync.RWMutex
}
Then, in routines that need to read from the map, you would do:
func ReadSomething(o MutexMap, key string) string {
o.RLock() // lock for reading, blocks until the Mutex is ready
defer o.RUnlock() // make SURE you do this, else it will be locked permanently
return o.m[key]
}
And to write:
func WriteSomething(o MutexMap, key, value string) {
o.Lock() // lock for writing, blocks until the Mutex is ready
defer o.Unlock() // again, make SURE you do this, else it will be locked permanently
o.m[key] = value
}
Note that both of these could be written as methods of the struct, rather than functions, if desired.
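Written as methods, the same idea might look like the sketch below. `SafeMap`, `NewSafeMap`, `Get` and `Set` are hypothetical names, and the mutex is held as a named value field on a struct that is only ever used through a pointer.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeMap pairs a map with the RWMutex that guards it.
type SafeMap struct {
	mu sync.RWMutex
	m  map[string]string
}

func NewSafeMap() *SafeMap {
	return &SafeMap{m: make(map[string]string)}
}

// Get takes the read lock, so any number of Gets may run at once.
func (s *SafeMap) Get(key string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

// Set takes the write lock, which excludes readers and other writers.
func (s *SafeMap) Set(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = value
}

func main() {
	sm := NewSafeMap()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			sm.Set(fmt.Sprintf("key%d", i), "value")
			sm.Get("key0") // may or may not be set yet; safe either way
		}(i)
	}
	wg.Wait()
	v, ok := sm.Get("key3")
	fmt.Println(v, ok) // value true
}
```

Keeping the map unexported behind methods means callers cannot forget the lock, which is the mistake in the question's Get.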
You can also approach this using channels. You make a controller structure that runs in a goroutine, and you make requests to it over channels. Example:
package main
import "fmt"
type MapCtrl struct {
m map[string]string
ReadCh chan chan map[string]string
WriteCh chan map[string]string
QuitCh chan struct{}
}
func NewMapController() *MapCtrl {
return &MapCtrl{
m: make(map[string]string),
ReadCh: make(chan chan map[string]string),
WriteCh: make(chan map[string]string),
QuitCh: make(chan struct{}),
}
}
func (ctrl *MapCtrl) Control() {
for {
select {
case r := <-ctrl.ReadCh:
fmt.Println("Read request received")
retmap := make(map[string]string)
for k, v := range ctrl.m { // copy map, so it doesn't change in place after return
retmap[k] = v
}
r <- retmap
case w := <-ctrl.WriteCh:
fmt.Println("Write request received with", w)
for k, v := range w {
ctrl.m[k] = v
}
case <-ctrl.QuitCh:
fmt.Println("Quit request received")
return
}
}
}
func main() {
ctrl := NewMapController()
defer close(ctrl.QuitCh)
go ctrl.Control()
m := make(map[string]string)
m["Joe"] = "Smith"
m["Sean"] = "Howard"
ctrl.WriteCh <- m
r := make(chan map[string]string, 1)
ctrl.ReadCh <- r
fmt.Println(<-r)
}
Runnable version