Sync Map possibly leading increase in ram and goroutines

Sync Map possibly leading increase in ram and goroutines - go

Hi here is the code where I make util called as collector
import (
"context"
"errors"
"sync"
"time"
)
type Collector struct {
keyValMap *sync.Map
}
func (c *Collector) LoadOrWait(key any) (retValue any, availability int, err error) {
value, status := c.getStatusAndValue(key)
switch status {
case 0:
return nil, 0, nil
case 1:
return value, 1, nil
case 2:
ctxWithTimeout, _ := context.WithTimeout(context.Background(), 5 * time.Second)
for {
select {
case <-ctxWithTimeout.Done():
return nil, 0, errRequestTimeout
default:
value, resourceStatus := c.getStatusAndValue(key)
if resourceStatus == 1 {
return value, 1, nil
}
time.Sleep(50 * time.Millisecond)
}
}
}
return nil, 0, errRequestTimeout
}
// Store ...
func (c *Collector) Store(key any, value any) {
c.keyValMap.Store(key, value)
}
func (c *Collector) getStatusAndValue(key any) (retValue any, availability int) {
var empty any
result, loaded := c.keyValMap.LoadOrStore(key, empty)
if loaded && result != empty {
return result, 1
}
if loaded && result == empty {
return empty, 2
}
return nil, 0
}
So the purpose of this util is to act as a cache where similar value is only loaded once but read many times. However when an object of Collector is passed to multiple goroutines I am facing increase in gorotines and ram usage whenever multiple goroutines are trying to use collector cache. Could someone explain if this usage of sync Map is correct. If yes then what might be the cause high number of goroutines / high ram usage

For sure, you're facing possible memory leaks due to not calling the cancel func of the newly created ctxWithTimeout context. In order to fix this change the line to these:
ctxWithTimeout, cancelFunc := context.WithTimeout(context.Background(), requestTimeout)
defer cancelFunc()
Thanks to this, you're always sure to clean up all the resources allocated once the context expires. This should address the issue of the leaks.
About the usage of sync.Map seems good to me.
Let me know if this solves your issue or if there is something else to address, thanks!

You show the code on the reader side of things, but not the code which does the request (and calls .Store(key, value)).
With the code you display :
the first goroutine which tries to access a given key will store your empty value in the map (when executing c.keyValMap.LoadOrStore(key, empty)),
so all goroutines that will come afterwards querying for the same key will enter the "query with timeout" loop -- even if the action that actually runs the request and stores its result in the cache isn't executed.
[after your update]
The code for your collector alone seems to be ok regarding resource consumption : I don't see deadlocks or multiplication of goroutines in that code alone.
You should probably look at other places in your code.
Also, if this structure only grows and never shrinks, it is bound to consume more memory. Do audit your program to evaluate how many different keys can live together in your cache and how much memory the cached values can occupy.

Related

Lock slice before reading and modifying it

My experience working with Go is recent and in reviewing some code, I have seen that while it is write-protected, there is a problem with reading the data. Not with the reading itself, but with possible modifications that can occur between the reading and the modification of the slice.
type ConcurrentSlice struct {
sync.RWMutex
items []Item
}
type Item struct {
Index int
Value Info
}
type Info struct {
Name string
Labels map[string]string
Failure bool
}
As mentioned, the writing is protected in this way:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
found := false
i := 0
for inList := range cs.Iter() {
if item.Name == inList.Value.Name{
cs.items[i] = item
found = true
}
i++
}
if !found {
cs.Lock()
defer cs.Unlock()
cs.items = append(cs.items, item)
}
}
func (cs *ConcurrentSlice) Iter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
cs.Lock()
defer cs.Unlock()
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
But between collecting the content of the slice and modifying it, modifications can occur.It may be that another routine modifies the same slice and when it is time to assign a value, it no longer exists: slice[i] = item
What would be the right way to deal with this?
I have implemented this method:
func GetList() *ConcurrentSlice {
if list == nil {
denylist = NewConcurrentSlice()
return denylist
}
return denylist
}
And I use it like this:
concurrentSlice := GetList()
concurrentSlice.UpdateOrAppend(item)
But I understand that between the get and the modification, even if it is practically immediate, another routine may have modified the slice. What would be the correct way to perform the two operations atomically? That the slice I read is 100% the one I modify. Because if I try to assign an item to a index that no longer exists, it will break the execution.
Thank you in advance!

The way you are doing the blocking is incorrect, because it does not ensure that the items you return have not been removed. In case of an update, the array would still be at least the same length.
A simpler solution that works could be the following:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
found := false
i := 0
cs.Lock()
defer cs.Unlock()
for _, it := range cs.items {
if item.Name == it.Name{
cs.items[i] = it
found = true
}
i++
}
if !found {
cs.items = append(cs.items, item)
}
}

Use a sync.Map if the order of the values is not important.
type Items struct {
m sync.Map
}
func (items *Items) Update(item Info) {
items.m.Store(item.Name, item)
}
func (items *Items) Range(f func(Info) bool) {
items.m.Range(func(key, value any) bool {
return f(value.(Info))
})
}

Data structures 101: always pick the best data structure for your use case. If you’re going to be looking up objects by name, that’s EXACTLY what map is for. If you still need to maintain the order of the items, you use a treemap
Concurrency 101: like transactions, your mutex should be atomic, consistent, and isolated. You’re failing isolation here because the data structure read does not fall inside your mutex lock.
Your code should look something like this:
func {
mutex.lock
defer mutex.unlock
check map or treemap for name
if exists update
else add
}

After some tests, I can say that the situation you fear can indeed happen with sync.RWMutex. I think it could happen with sync.Mutex too, but I can't reproduce that. Maybe I'm missing some informations, or maybe the calls are in order because they all are blocked and the order they redeem the right to lock is ordered in some way.
One way to keep your two calls safe without other routines getting in 'conflict' would be to use an other mutex, for every task on that object. You would lock that mutex before your read and write, and release it when you're done. You would also have to use that mutex on any other call that write (or read) to that object. You can find an implementation of what I'm talking about here in the main.go file. In order to reproduce the issue with RWMutex, you can simply comment the startTask and the endTask calls and the issue is visible in the terminal output.
EDIT : my first answer was wrong as I misinterpreted a test result, and fell in the situation described by OP.

tl;dr;
If ConcurrentSlice is to be used from a single goroutine, the locks are unnecessary, because the way algorithm written there is not going to be any concurrent read/writes to slice elements, or the slice.
If ConcurrentSlice is to be used from multiple goroutines, existings locks are not sufficient. This is because UpdateOrAppend may modify slice elements concurrently.
A safe version woule need two versions of Iter:
This can be called by users of ConcurrentSlice, but it cannot be called from `UpdateOrAppend:
func (cs *ConcurrentSlice) Iter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
cs.RLock()
defer cs.RUnlock()
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
and this is only to be called from UpdateOrAppend:
func (cs *ConcurrentSlice) internalIter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
// No locking
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
And UpdateOrAppend should be synchronized at the top level:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
cs.Lock()
defer cs.Unlock()
....
}
Here's the long version:
This is an interesting piece of code. Based on my understanding of the go memory model, the mutex lock in Iter() is only necessary if there is another goroutine working on this code, and even with that, there is a possible race in the code. However, UpdateOrAppend only modifies elements of the slice with lower indexes than what Iter is working on, so that race never manifests itself.
The race can happen as follows:
The for-loop in iter reads element 0 of the slice
The element is sent through the channel. Thus, the slice receive happens after the first step.
The receiving end potentially updates element 0 of the slice. There is no problem up to here.
Then the sending goroutine reads element 1 of the slice. This is when a race can happen. If step 3 updated index 1 of the slice, the read at step 4 is a race. That is: if step 3 reads the update done by step 4, it is a race. You can see this if you start with i:=1 in UpdateOrAppend, and running it with the -race flag.
But UpdateOrAppend always modifies slice elements that are already seen by Iter when i=0, so this code is safe, even without the lock.
If there will be other goroutines accessing and modifying the structure, you need the Mutex, but you need it to protect the complete UpdateOrAppend method, because only one goroutine should be allowed to run that. You need the mutex to protect the potential updates in the first for-loop, and that mutex has to also include the slice append case, because that may actually modify the slice of the underlying object.
If Iter is only called from UpdateOrAppend, then this single mutex should be sufficient. If however Iter can be called from multiple goroutines, then there is another race possibility. If one UpdateOrAppend is running concurrently with multiple Iter instances, then some of those Iter instances will read from the modified slice elements concurrently, causing a race. So, it should be such that multiple Iters can only run if there are no UpdateOrAppend calls. That is a RWMutex.
But Iter can be called from UpdateOrAppend with a lock, so it cannot really call RLock, otherwise it is a deadlock.
Thus, you need two versions of Iter: one that can be called outside UpdateOrAppend, and that issues RLock in the goroutine, and another that can only be called from UpdateOrAppend and does not call RLock.

context without channels in the same thread of execution

Can't figure out how I can cancel a task if it takes to much to time compute in the same thread of execution via context semantics?
I use this example as a reference point
https://golang.org/src/context/context_test.go
The goal here call a doWork, if doWork takes to much time to compute, GetValueWithDeadline should after a timeout return 0, or if caller called cancel that cancel a wait, (here it main is caller) or the value returned in in give a time window.
The same scenario can be done In a different way. ( separate goroutine sleep, wakeup check value etc, condition on a mutex, etc) but I really want to understand the correct way to use context.
The channel semantic I understand but here I can't achieve the desired effect, the default case
call to a doWork fault under default case and sleep.
package main
import (
"context"
"fmt"
"log"
"math/rand"
"sync"
"time"
)
type Server struct {
lock sync.Mutex
}
func NewServer() *Server {
s := new(Server)
return s
}
func (s *Server) doWork() int {
s.lock.Lock()
defer s.lock.Unlock()
r := rand.Intn(100)
log.Printf("Going to nap for %d", r)
time.Sleep(time.Duration(r) * time.Millisecond)
return r
}
// I take an example from here and it very unclear where is do work executed
// https://golang.org/src/context/context_test.go
func (s *Server) GetValueWithDeadline(ctx context.Context) int {
val := 0
select {
case <- time.After(150 * time.Millisecond):
fmt.Println("overslept")
return 0
case <- ctx.Done():
fmt.Println(ctx.Err())
return 0
default:
val = s.doWork()
}
return all
}
func main() {
rand.Seed(time.Now().UTC().UnixNano())
s := NewServer()
for i :=0; i < 10; i++ {
d := time.Now().Add(50 * time.Millisecond)
ctx, cancel := context.WithDeadline(context.Background(), d)
log.Print(s.GetValueWithDeadline(ctx))
cancel()
}
}
Thank you

There are multiple problems with your approach.
What problem contexts solve
First, the primary reason contexts were invented in Go is that they allow to unify an approach to cancellation of a set of tasks.
To explain this concept using a simple example, consider a client request to some sever; to simplify further let it be an HTTP request.
The client connects to the server, sends some data telling the server what to do to fulfill the request and then waits for the server to respond.
Let's now suppose the request requires elaborate and time-consuming processing on the server — for instance, suppose it needs to perform multiple complex queries to multiple remote database engines, do multiple HTTP requests to external services and then process the acquired results to actually produce the data the client wants.
So the client starts its request and the server goes on with all those requests.
To hide latency of individual tasks the server has to perform to fulfill the request, it runs them in separate goroutines.
Once each goroutine completes the assigned task, it communicates its result (and/or an error) back to the goroutine which handles the client's request, and so on.
Now suppose that the client fails to wait for the response to its request for whatever reason — a network outage, an explicit timeout in the client's software, the user kills the app which initiated the request etc, — there are lots of possibilities.
As you can see, there's little sense for the server to continue spending resources to finish the tasks which were logically bound to the now-dead request: there's no one to hear back the result anyway.
So it makes sense to reap those tasks once we know the request is not going to be completed, and that's where contexts come into play: you can associate each incoming request with a single context and then either pass it itself to any goroutine spawned to carry out a single task required to be done to fulfill the request, or derive another request from that and pass it instead.
Then, as soon as you cancel the "root" request, that signal is propagated through the whole tree of requests derived from the root one.
Now each goroutine which were given a context, might "listen" on it to be notified when that cancellation signal is sent, and once the goroutine notices that it might drop whatever it was busy doing and exit.
In terms of actual context.Context type that signal is called "done" — as in "we're done doing whatever that context is assotiated with", — and that's why the goroutine which wants to know it should stop doing its work listens on a special channel returned by the context's method called Done.
Back to your example
To make it work, you'd do something like:
func (s *Server) doWork(ctx context.Context) int {
s.lock.Lock()
defer s.lock.Unlock()
r := rand.Intn(100)
log.Printf("Going to nap for %d", r)
select {
case <- time.After(time.Duration(r) * time.Millisecond):
return r
case <- ctx.Done():
return -1
}
}
func (s *Server) GetValueWithTimeout(ctx context.Context, maxTime time.Duration) int {
d := time.Now().Add(maxTime)
ctx, cancel := context.WithDeadline(ctx, d)
defer cancel()
return s.doWork(ctx)
}
func main() {
const maxTime = 50 * time.Millisecond
rand.Seed(time.Now().UTC().UnixNano())
s := NewServer()
for i :=0; i < 10; i++ {
v := s.GetValueWithTimeout(context.Background(), maxTime)
log.Print(v)
}
}
(Playground).
So what happens here?
The GetValueWithTimeout method accepts the maximum time it should take the doWork method to produce a value, calculates the deadline, derives a context which cancels itself once the deadline passes from the context passed to the method and calls doWork with the new context object.
The doWork method arms its own timer to go off after a random time interval and then listens on both the context and the timer.
This one is the critical point: the code which performs some unit of work which is supposed to be cancellable must check the context to become "done" actively, by itself.
So, in our toy example, either the doWork's own timer fires first or the deadline of the generated context gets reached first; whatever happens first, makes the select statement unblock and proceed.
Note that if your "do the work" code wold be more involved — it would actually do something instead of sleeping, — you would most probably need to check on the context's status periodically, usually after performing invividual bits of that work.

Golang buffer with concurrent readers

I want to build a buffer in Go that supports multiple concurrent readers and one writer. Whatever is written to the buffer should be read by all readers. New readers are allowed to drop in at any time, which means already written data must be able to be played back for late readers.
The buffer should satisfy the following interface:
type MyBuffer interface {
Write(p []byte) (n int, err error)
NextReader() io.Reader
}
Do you have any suggestions for such an implementation preferably using built in types?

Depending on the nature of this writer and how you use it, keeping everything in memory (to be able to re-play everything for readers joining later) is very risky and might demand a lot of memory, or cause your app to crash due to out of memory.
Using it for a "low-traffic" logger keeping everything in memory is probably ok, but for example streaming some audio or video is most likely not.
If the reader implementations below read all the data that was written to the buffer, their Read() method will report io.EOF, properly. Care must be taken as some constructs (such as bufio.Scanner) may not read more data once io.EOF is encountered (but this is not the flaw of our implementation).
If you want the readers of our buffer to wait if no more data is available in the buffer, to wait until new data is written instead of returning io.EOF, you may wrap the returned readers in a "tail reader" presented here: Go: "tail -f"-like generator.
"Memory-safe" file implementation
Here is an extremely simple and elegant solution. It uses a file to write to, and also uses files to read from. The synchronization is basically provided by the operating system. This does not risk out of memory error, as the data is solely stored on the disk. Depending on the nature of your writer, this may or may not be sufficient.
I will rather use the following interface, because Close() is important in case of files.
type MyBuf interface {
io.WriteCloser
NewReader() (io.ReadCloser, error)
}
And the implementation is extremely simple:
type mybuf struct {
*os.File
}
func (mb *mybuf) NewReader() (io.ReadCloser, error) {
f, err := os.Open(mb.Name())
if err != nil {
return nil, err
}
return f, nil
}
func NewMyBuf(name string) (MyBuf, error) {
f, err := os.Create(name)
if err != nil {
return nil, err
}
return &mybuf{File: f}, nil
}
Our mybuf type embeds *os.File, so we get the Write() and Close() methods for "free".
The NewReader() simply opens the existing, backing file for reading (in read-only mode) and returns it, again taking advantage of that it implements io.ReadCloser.
Creating a new MyBuf value is implementing in the NewMyBuf() function which may also return an error if creating the file fails.
Notes:
Note that since mybuf embeds *os.File, it is possible with a type assertion to "reach" other exported methods of os.File even though they are not part of the MyBuf interface. I do not consider this a flaw, but if you want to disallow this, you have to change the implementation of mybuf to not embed os.File but rather have it as a named field (but then you have to add the Write() and Close() methods yourself, properly forwarding to the os.File field).
In-memory implementation
If the file implementation is not sufficient, here comes an in-memory implementation.
Since we're now in-memory only, we will use the following interface:
type MyBuf interface {
io.Writer
NewReader() io.Reader
}
The idea is to store all byte slices that are ever passed to our buffer. Readers will provide the stored slices when Read() is called, each reader will keep track of how many of the stored slices were served by its Read() method. Synchronization must be dealt with, we will use a simple sync.RWMutex.
Without further ado, here is the implementation:
type mybuf struct {
data [][]byte
sync.RWMutex
}
func (mb *mybuf) Write(p []byte) (n int, err error) {
if len(p) == 0 {
return 0, nil
}
// Cannot retain p, so we must copy it:
p2 := make([]byte, len(p))
copy(p2, p)
mb.Lock()
mb.data = append(mb.data, p2)
mb.Unlock()
return len(p), nil
}
type mybufReader struct {
mb *mybuf // buffer we read from
i int // next slice index
data []byte // current data slice to serve
}
func (mbr *mybufReader) Read(p []byte) (n int, err error) {
if len(p) == 0 {
return 0, nil
}
// Do we have data to send?
if len(mbr.data) == 0 {
mb := mbr.mb
mb.RLock()
if mbr.i < len(mb.data) {
mbr.data = mb.data[mbr.i]
mbr.i++
}
mb.RUnlock()
}
if len(mbr.data) == 0 {
return 0, io.EOF
}
n = copy(p, mbr.data)
mbr.data = mbr.data[n:]
return n, nil
}
func (mb *mybuf) NewReader() io.Reader {
return &mybufReader{mb: mb}
}
func NewMyBuf() MyBuf {
return &mybuf{}
}
Note that the general contract of Writer.Write() includes that an implementation must not retain the passed slice, so we have to make a copy of it before "storing" it.
Also note that the Read() of readers attempts to lock for minimal amount of time. That is, it only locks if we need new data slice from buffer, and only does read-locking, meaning if the reader has a partial data slice, will send that in Read() without locking and touching the buffer.

I linked to the append only commit log, because it seems very similar to your requirements. I am pretty new to distributed systems and the commit log so I may be butchering a couple of the concepts, but the kafka introduction clearly explains everything with nice charts.
Go is also pretty new to me, so i'm sure there's a better way to do it:
But perhaps you could model your buffer as a slice, I think a couple of cases:
buffer has no readers, new data is written to the buffer, buffer length grows
buffer has one/many reader(s):
reader subscribes to buffer
buffer creates and returns a channel to that client
buffer maintains a list of client channels
write occurs -> loops through all client channels and publishes to it (pub sub)
This addresses a pubsub real time consumer stream, where messages are fanned out, but does not address the backfill.
Kafka enables a backfill and their intro illustrates how it can be done :)
This offset is controlled by the consumer: normally a consumer will
advance its offset linearly as it reads records, but, in fact, since
the position is controlled by the consumer it can consume records in
any order it likes. For example a consumer can reset to an older
offset to reprocess data from the past or skip ahead to the most
recent record and start consuming from "now".
This combination of features means that Kafka consumers are very
cheap—they can come and go without much impact on the cluster or on
other consumers. For example, you can use our command line tools to
"tail" the contents of any topic without changing what is consumed by
any existing consumers.

I had to do something similar as part of an experiment, so sharing:
type MultiReaderBuffer struct {
mu sync.RWMutex
buf []byte
}
func (b *MultiReaderBuffer) Write(p []byte) (n int, err error) {
if len(p) == 0 {
return 0, nil
}
b.mu.Lock()
b.buf = append(b.buf, p...)
b.mu.Unlock()
return len(p), nil
}
func (b *MultiReaderBuffer) NewReader() io.Reader {
return &mrbReader{mrb: b}
}
type mrbReader struct {
mrb *MultiReaderBuffer
off int
}
func (r *mrbReader) Read(p []byte) (n int, err error) {
if len(p) == 0 {
return 0, nil
}
r.mrb.mu.RLock()
n = copy(p, r.mrb.buf[r.off:])
r.mrb.mu.RUnlock()
if n == 0 {
return 0, io.EOF
}
r.off += n
return n, nil
}

Concurrent access to maps with 'range' in Go

The "Go maps in action" entry in the Go blog states:
Maps are not safe for concurrent use: it's not defined what happens when you read and write to them simultaneously. If you need to read from and write to a map from concurrently executing goroutines, the accesses must be mediated by some kind of synchronization mechanism. One common way to protect maps is with sync.RWMutex.
However, one common way to access maps is to iterate over them with the range keyword. It is not clear if for the purposes of concurrent access, execution inside a range loop is a "read", or just the "turnover" phase of that loop. For example, the following code may or may not run afoul of the "no concurrent r/w on maps" rule, depending on the specific semantics / implementation of the range operation:
var testMap map[int]int
testMapLock := make(chan bool, 1)
testMapLock <- true
testMapSequence := 0
...
func WriteTestMap(k, v int) {
<-testMapLock
testMap[k] = v
testMapSequence++
testMapLock<-true
}
func IterateMapKeys(iteratorChannel chan int) error {
<-testMapLock
defer func() {
testMapLock <- true
}
mySeq := testMapSequence
for k, _ := range testMap {
testMapLock <- true
iteratorChannel <- k
<-testMapLock
if mySeq != testMapSequence {
close(iteratorChannel)
return errors.New("concurrent modification")
}
}
return nil
}
The idea here is that the range "iterator" is open when the second function is waiting for a consumer to take the next value, and the writer is not blocked at that time. However, it is never the case that two reads in a single iterator are on either side of a write - this is a "fail fast" iterator, the borrow a Java term.
Is there anything anywhere in the language specification or other documents that indicates if this is a legitimate thing to do, however? I could see it going either way, and the above quoted document is not clear on exactly what consititutes a "read". The documentation seems totally quiet on the concurrency aspects of the for/range statement.
(Please note this question is about the currency of for/range, but not a duplicate of: Golang concurrent map access with range - the use case is completely different and I am asking about the precise locking requirement wrt the 'range' keyword here!)

You are using a for statement with a range expression. Quoting from Spec: For statements:
The range expression is evaluated once before beginning the loop, with one exception: if the range expression is an array or a pointer to an array and at most one iteration variable is present, only the range expression's length is evaluated; if that length is constant, by definition the range expression itself will not be evaluated.
We're ranging over a map, so it's not an exception: the range expression is evaluated only once before beginning the loop. The range expression is simply a map variable testMap:
for k, _ := range testMap {}
The map value does not include the key-value pairs, it only points to a data structure that does. Why is this important? Because the map value is only evaluated once, and if later pairs are added to the map, the map value –evaluated once before the loop– will be a map that still points to a data structure that includes those new pairs. This is in contrast to ranging over a slice (which would be evaluated once too), which is also only a header pointing to a backing array holding the elements; but if elements are added to the slice during the iteration, even if that does not result in allocating and copying over to a new backing array, they will not be included in the iteration (because the slice header also contains the length - already evaluated). Appending elements to a slice may result in a new slice value, but adding pairs to a map will not result in a new map value.
Now on to iteration:
for k, v := range testMap {
t1 := time.Now()
someFunction()
t2 := time.Now()
}
Before we enter into the block, before the t1 := time.Now() line k and v variables are holding the values of the iteration, they are already read out from the map (else they couldn't hold the values). Question: do you think the map is read by the for ... range statement between t1 and t2? Under what circumstances could that happen? We have here a single goroutine that is executing someFunc(). To be able to access the map by the for statement, that would either require another goroutine, or it would require to suspend someFunc(). Obviously neither of those happen. (The for ... range construct is not a multi-goroutine monster.) No matter how many iterations there are, while someFunc() is executed, the map is not accessed by the for statement.
So to answer one of your questions: the map is not accessed inside the for block when executing an iteration, but it is accessed when the k and v values are set (assigned) for the next iteration. This implies that the following iteration over the map is safe for concurrent access:
var (
testMap = make(map[int]int)
testMapLock = &sync.RWMutex{}
)
func IterateMapKeys(iteratorChannel chan int) error {
testMapLock.RLock()
defer testMapLock.RUnlock()
for k, v := range testMap {
testMapLock.RUnlock()
someFunc()
testMapLock.RLock()
if someCond {
return someErr
}
}
return nil
}
Note that unlocking in IterateMapKeys() should (must) happen as a deferred statement, as in your original code you may return "early" with an error, in which case you didn't unlock, which means the map remained locked! (Here modeled by if someCond {...}).
Also note that this type of locking only ensures locking in case of concurrent access. It does not prevent a concurrent goroutine to modify (e.g. add a new pair) the map. The modification (if properly guarded with write lock) will be safe, and the loop may continue, but there is no guarantee that the for loop will iterate over the new pair:
If map entries that have not yet been reached are removed during iteration, the corresponding iteration values will not be produced. If map entries are created during iteration, that entry may be produced during the iteration or may be skipped. The choice may vary for each entry created and from one iteration to the next.
The write-lock-guarded modification may look like this:
func WriteTestMap(k, v int) {
testMapLock.Lock()
defer testMapLock.Unlock()
testMap[k] = v
}
Now if you release the read lock in the block of the for, a concurrent goroutine is free to grab the write lock and make modifications to the map. In your code:
testMapLock <- true
iteratorChannel <- k
<-testMapLock
When sending k on the iteratorChannel, a concurrent goroutine may modify the map. This is not just an "unlucky" scenario, sending a value on a channel is often a "blocking" operation, if the channel's buffer is full, another goroutine must be ready to receive in order for the send operation to proceed. Sending a value on a channel is a good scheduling point for the runtime to run other goroutines even on the same OS thread, not to mention if there are multiple OS threads, of which one may already be "waiting" for the write lock in order to carry out a map modification.
To sum the last part: you releasing the read lock inside the for block is like yelling to others: "Come, modify the map now if you dare!" Consequently in your code encountering that mySeq != testMapSequence is very likely. See this runnable example to demonstrate it (it's a variation of your example):
package main
import (
"fmt"
"math/rand"
"sync"
)
var (
testMap = make(map[int]int)
testMapLock = &sync.RWMutex{}
testMapSequence int
)
func main() {
go func() {
for {
k := rand.Intn(10000)
WriteTestMap(k, 1)
}
}()
ic := make(chan int)
go func() {
for _ = range ic {
}
}()
for {
if err := IterateMapKeys(ic); err != nil {
fmt.Println(err)
}
}
}
func WriteTestMap(k, v int) {
testMapLock.Lock()
defer testMapLock.Unlock()
testMap[k] = v
testMapSequence++
}
func IterateMapKeys(iteratorChannel chan int) error {
testMapLock.RLock()
defer testMapLock.RUnlock()
mySeq := testMapSequence
for k, _ := range testMap {
testMapLock.RUnlock()
iteratorChannel <- k
testMapLock.RLock()
if mySeq != testMapSequence {
//close(iteratorChannel)
return fmt.Errorf("concurrent modification %d", testMapSequence)
}
}
return nil
}
Example output:
concurrent modification 24
concurrent modification 41
concurrent modification 463
concurrent modification 477
concurrent modification 482
concurrent modification 496
concurrent modification 508
concurrent modification 521
concurrent modification 525
concurrent modification 535
concurrent modification 541
concurrent modification 555
concurrent modification 561
concurrent modification 565
concurrent modification 570
concurrent modification 577
concurrent modification 591
concurrent modification 593
We're encountering concurrent modification quite often!
Do you want to avoid this kind of concurrent modification? The solution is quite simple: don't release the read lock inside the for. Also run your app with the -race option to detect race conditions: go run -race testmap.go
Final thoughts
The language spec clearly allows you to modify the map in the same goroutine while ranging over it, this is what the previous quote relates to ("If map entries that have not yet been reached are removed during iteration.... If map entries are created during iteration..."). Modifying the map in the same goroutine is allowed and is safe, but how it is handled by the iterator logic is not defined.
If the map is modified in another goroutine, if you use proper synchronization, The Go Memory Model guarantees that the goroutine with the for ... range will observe all modifications, and the iterator logic will see it just as if "its own" goroutine would have modified it – which is allowed as stated before.

The unit of concurrent access for a for range loop over a map is the map. Go maps in action.
A map is a dynamic data structure that changes for inserts, updates and deletes. Inside the Map Implementation. For example,
The iteration order over maps is not specified and is not guaranteed
to be the same from one iteration to the next. If map entries that
have not yet been reached are removed during iteration, the
corresponding iteration values will not be produced. If map entries
are created during iteration, that entry may be produced during the
iteration or may be skipped. The choice may vary for each entry
created and from one iteration to the next. If the map is nil, the
number of iterations is 0. For statements, The Go Programming
Language Specification
Reading a map with a for range loop with interleaved inserts, updates and deletes is unlikely to be useful.
Lock the map:
package main
import (
"sync"
)
var racer map[int]int
var race sync.RWMutex
func Reader() {
race.RLock() // Lock map
for k, v := range racer {
_, _ = k, v
}
race.RUnlock()
}
func Write() {
for i := 0; i < 1e6; i++ {
race.Lock()
racer[i/2] = i
race.Unlock()
}
}
func main() {
racer = make(map[int]int)
Write()
go Write()
Reader()
}
Don't lock after the read -- fatal error: concurrent map iteration and map write:
package main
import (
"sync"
)
var racer map[int]int
var race sync.RWMutex
func Reader() {
for k, v := range racer {
race.RLock() // Lock after read
_, _ = k, v
race.RUnlock()
}
}
func Write() {
for i := 0; i < 1e6; i++ {
race.Lock()
racer[i/2] = i
race.Unlock()
}
}
func main() {
racer = make(map[int]int)
Write()
go Write()
Reader()
}
Use the Go Data Race Detector. Read Introducing the Go Race Detector.

How to use channels to safely synchronise data in Go

Below is an example of how to use mutex lock in order to safely access data. How would I go about doing the same with the use of CSP (communication sequential processes) instead of using mutex lock’s and unlock’s?
type Stack struct {
top *Element
size int
sync.Mutex
}
func (ss *Stack) Len() int {
ss.Lock()
size := ss.size
ss.Unlock()
return size
}
func (ss *Stack) Push(value interface{}) {
ss.Lock()
ss.top = &Element{value, ss.top}
ss.size++
ss.Unlock()
}
func (ss *SafeStack) Pop() (value interface{}) {
ss.Lock()
size := ss.size
ss.Unlock()
if size > 0 {
ss.Lock()
value, ss.top = ss.top.value, ss.top.next
ss.size--
ss.Unlock()
return
}
return nil
}

If you actually were to look at how Go implements channels, you'd essentially see a mutex around an array with some additional thread handling to block execution until the value is passed through. A channel's job is to move data from one spot in memory to another with ease. Therefore where you have locks and unlocks, you'd have things like this example:
func example() {
resChan := make(int chan)
go func(){
resChan <- 1
}()
go func(){
res := <-resChan
}
}
So in the example, the first goroutine is blocked after sending the value until the second goroutine reads from the channel.
To do this in Go with mutexes, one would use sync.WaitGroup which will add one to the group on setting the value, then release it from the group and the second goroutine will lock and then unlock the value.
The oddities in your example are 1 no goroutines, so it's all happening in a single main goroutine and the locks are being used more traditionally (as in c thread like) so channels won't really accomplish anything. The example you have would be considered an anti-pattern, like the golang proverb says "Don't communicate by sharing memory, share memory by communicating."

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Sync Map possibly leading increase in ram and goroutines - go

Related

Lock slice before reading and modifying it

context without channels in the same thread of execution

Golang buffer with concurrent readers

Concurrent access to maps with 'range' in Go

How to use channels to safely synchronise data in Go

Categories

Resources