GoLang sequential goroutines - go

I am new to golang and have a use case where operations on a value of a type have to run in a sequential manner where as operation on value of other type can be run concurrently.
Imagine data is coming from a streaming connection (In-order)
key_name_1, value_1
key_name_2, value_2
key_name_1, value_1
Now, key_name_1, and key_name_2 can be operated by goroutine concurrently.
But as next streamed value (3rd row) is key_name_1 again, so this operation should only be processed by goroutine if the earlier operation (1st row) has finished otherwise it should wait for the 1st operation to finish before it can apply the operation.
For the sake of discussion we can assume that operation is simply
adding the new value to previous value.
What would be the right way to achieve this in golang with highest possible performance ?
The exact use case is database changes are streamed on a queue, now if a value is getting changed its important that onto another database that operation is applied on the same sequence otherwise consistency will get impacted. Conflicts are rare, but can happen.

As a simple solution for mutual exclusivity for a given key you can just use a locked map of ref-counted locks. It's not the most optimal for high loads, but might just suffice in your case.
type processLock struct {
mtx sync.Mutex
refcount int
}
type locker struct {
mtx sync.Mutex
locks map[string]*processLock
}
func (l *locker) acquire(key string) {
l.mtx.Lock()
lk, found := l.locks[key]
if !found {
lk = &processLock{}
l.locks[key] = lk
}
lk.refcount++
l.mtx.Unlock()
lk.mtx.Lock()
}
func (l *locker) release(key string) {
l.mtx.Lock()
lk := l.locks[key]
lk.refcount--
if lk.refcount == 0 {
delete(l.locks, key)
}
l.mtx.Unlock()
lk.mtx.Unlock()
}
Just call acquire(key) before processing a key and release(key) when done with it.
Live demo.
Warning! The code above guarantees exclusivity, but not sequence. To sequentialize the unlocking you need a FIFO mutex.

Related

On which step can a goroutine be interrupted

I am writing some asynchromous code in go which basically implements in-memory caching. I have a not very fast source which I query every minute (using ticker), and save the result into a cache struct field. This field can be queried from different goroutines asynchronously.
In order to avoid using mutexes when updating values from source I do not write to the same struct field which is being queried by other goroutines but create another variable, fill it and then assign it to the queried field. This works fine since assigning operation is atomic and no race occurs.
The code looks like the following:
// this fires up when cache is created
func (f *FeaturesCache) goStartUpdaterDaemon(ctx context.Context) {
go func() {
defer kiterrors.RecoverFunc(ctx, f.logger(ctx))
ticker := time.NewTicker(updateFeaturesPeriod) // every minute
defer ticker.Stop()
for {
select {
case <-ticker.C:
f.refill(ctx)
case <-ctx.Done():
return
}
}
}()
}
func (f *FeaturesCache) refill(ctx context.Context) {
var newSources map[string]FeatureData
// some querying and processing logic
// save new data for future queries
f.features = newSources
}
Now I need to add another view of my data so I can also get it from cache. Basically that means adding one more struct field which will be queriad and filled in the same way the previous one (features) was.
I need these 2 views of my data to be in sync, so it is undesired to have, for example, new data in view 2 and old data in view 1 or the other way round.
So the only thing I need to change about refill is to add a new field, at first I did it this way:
func (f *FeaturesCache) refill(ctx context.Context) {
var newSources map[string]FeatureData
var anotherView map[string]DataView2
// some querying and processing logic
// save new data for future queries
f.features = newSources // line A
f.anotherView = anotherView // line B
}
However, for this code I'm wondering whether it satisfies my consistency requirements. I am worried that if the scheduler decides to interrupt the goroutine which runs refill between lines A nd B (check the code above) than I might get inconsistency between data views.
So I researched the problem. Many sources on the Internet say that the scheduler switches goroutines on syscalls and function calls. However, according to this answer https://stackoverflow.com/a/64113553/12702274 since go 1.14 there is an asynchronous preemtion mechanism in go scheduler which switches goroutines based on their running time in addition to previously checked signals. That makes me think that it is actually possible that refill goroutine can be interrupted between lines A and B.
Then I thought about surrounding those 2 assignments with mutex - lock before line A, unlock after line B. However, it seems to me that this doesn't change things much. The goroutine may still be interrupted between lines A and B and the data gets inconsistent. The only thing mutex achieves here is that 2 simultaneous refills do not conflict with each other which is actually impossible, because I run them in the same thread as timer. Thus it is useless here.
So, is there any way I can ensure atomicity for two consecutive assignments?
If I understand your concern correctly, you don't want to lock existing cached data while updating it(bec. it takes time to update, you want to be able to allow usage of existing cached data while updating it in another routine right ?).
Also you want to make f.features and f.anotherView updates atomic.
What about to take your data in a map[int]map[string]FeatureData and map[int]map[string]DataView2. Put the data to a new key each time and let the queries from this key(newSearchIndex).
Just tried to explain in code roughly(think below like pseudo code)
type FeaturesCache struct {
mu sync.RWMutex
features map[int8]map[string]FeatureData
anotherView map[int8]map[string]DataView2
oldSearchIndex int8
newSearchIndex int8
}
func (f *FeaturesCache) CreateNewIndex() int8 {
f.mu.Lock()
defer f.mu.Unlock()
return (f.newSearchIndex + 1) % 16 // mod 16 could be change per your refill rate
}
func (f *FeaturesCache) SetNewIndex(newIndex int8) {
f.mu.Lock()
defer f.mu.Unlock()
f.oldSearchIndex = f.newSearchIndex
f.newSearchIndex = newIndex
}
func (f *FeaturesCache) refill(ctx context.Context) {
var newSources map[string]FeatureData
var anotherView map[string]DataView2
// some querying and processing logic
// save new data for future queries
newSearchIndex := f.CreateNewIndex()
f.features[newSearchIndex] = newSources
f.anotherView[newSearchIndex] = anotherView
f.SetNewIndex(newSearchIndex) //Let the queries to new cached datas after updating search Index
f.features[f.oldSearchIndex] = nil
f.anotherView[f.oldSearchIndex] = nil
}

How to implement a singleton

I want to implement a singleton with Go. The difference between normal singleton is the instance is singleton with different key in map struct. Something like this code. I am not sure is there any data race with the demo code.
var instanceLock sync.Mutex
var instances map[string]string
func getDemoInstance(key string) string {
if value, ok := instances[key]; ok {
return value
}
instanceLock.Lock()
defer instanceLock.Unlock()
if value, ok := instances[key]; ok {
return value
} else {
instances[key] = key + key
return key + key
}
}
Yes, there is data race, you can confirm by running it with go run -race main.go. If one goroutine locks and modifies the map, another goroutine may be reading it before the lock.
You may use sync.RWMutex to read-lock when just reading it (multiple readers are allowed without blocking each other).
For example:
var (
instancesMU sync.RWMutex
instances = map[string]string{}
)
func getDemoInstance(key string) string {
instancesMU.RLock()
if value, ok := instances[key]; ok {
instancesMU.RUnlock()
return value
}
instancesMU.RUnlock()
instancesMU.Lock()
defer instancesMU.Unlock()
if value, ok := instances[key]; ok {
return value
}
value := key + key
instances[key] = value
return value
}
You can try this as well: sync.Map
Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple goroutines without additional locking or coordination. Loads, stores, and deletes run in amortized constant time.
The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys.
In these two cases, use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex.
Note: In the third paragraph it mentions why using sync.Map is beneficial rather than using Go Map simply paired up with sync.RWMutex.
So this perfectly fits your case, I guess?
Little late to answer, anyways this should help: https://github.com/ashwinshirva/api/tree/master/dp/singleton
This shows two ways to implement singleton:
Using the sync.Mutex
Using sync.Once

What happens when reading or writing concurrently without a mutex

In Go, a sync.Mutex or chan is used to prevent concurrent access of shared objects. However, in some cases I am just interested in the "latest" value of a variable or field of an object.
Or I like to write a value and do not care if another go-routine overwrites it later or has just overwritten it before.
Update: TLDR; Just don't do this. It is not safe. Read the answers, comments, and linked documents!
Update 2021: The Go memory model is going to be specified more thoroughly and there are three great articles by Russ Cox that will teach you more about the surprising effects of unsynchronized memory access. These articles summarize a lot of the below discussions and learnings.
Here are two variants good and bad of an example program, where both seem to produce "correct" output using the current Go runtime:
package main
import (
"flag"
"fmt"
"math/rand"
"time"
)
var bogus = flag.Bool("bogus", false, "use bogus code")
func pause() {
time.Sleep(time.Duration(rand.Uint32()%100) * time.Millisecond)
}
func bad() {
stop := time.After(100 * time.Millisecond)
var name string
// start some producers doing concurrent writes (DANGER!)
for i := 0; i < 10; i++ {
go func(i int) {
pause()
name = fmt.Sprintf("name = %d", i)
}(i)
}
// start consumer that shows the current value every 10ms
go func() {
tick := time.Tick(10 * time.Millisecond)
for {
select {
case <-stop:
return
case <-tick:
fmt.Println("read:", name)
}
}
}()
<-stop
}
func good() {
stop := time.After(100 * time.Millisecond)
names := make(chan string, 10)
// start some producers concurrently writing to a channel (GOOD!)
for i := 0; i < 10; i++ {
go func(i int) {
pause()
names <- fmt.Sprintf("name = %d", i)
}(i)
}
// start consumer that shows the current value every 10ms
go func() {
tick := time.Tick(10 * time.Millisecond)
var name string
for {
select {
case name = <-names:
case <-stop:
return
case <-tick:
fmt.Println("read:", name)
}
}
}()
<-stop
}
func main() {
flag.Parse()
if *bogus {
bad()
} else {
good()
}
}
The expected output is as follows:
...
read: name = 3
read: name = 3
read: name = 5
read: name = 4
...
Any combination of read: and read: name=[0-9] is correct output for this program. Receiving any other string as output would be an error.
When running this program with go run --race bogus.go it is safe.
However, go run --race bogus.go -bogus warns of the concurrent reads and writes.
For map types and when appending to slices I always need a mutex or a similar method of protection to avoid segfaults or unexpected behavior. However, reading and writing literals (atomic values) to variables or field values seems to be safe.
Question: Which Go data types can I safely read and safely write concurrently without a mutext and without producing segfaults and without reading garbage from memory?
Please explain why something is safe or unsafe in Go in your answer.
Update: I rewrote the example to better reflect the original code, where I had the the concurrent writes issue. The important leanings are already in the comments. I will accept an answer that summarizes these learnings with enough detail (esp. on the Go-runtime).
However, in some cases I am just interested in the latest value of a variable or field of an object.
Here is the fundamental problem: What does the word "latest" mean?
Suppoose that, mathematically speaking, we have a sequence of values Xi, with 0 <= i < N. Then obviously Xj is "later than" Xi if j > i. That's a nice simple definition of "latest" and is probably the one you want.
But when two separate CPUs within a single machine—including two goroutines in a Go program—are working at the same time, time itself loses meaning. We cannot say whether i < j, i == j, or i > j. So there is no correct definition for the word latest.
To solve this kind of problem, modern CPU hardware, and Go as a programming language, gives us certain synchronization primitives. If CPUs A and B execute memory fence instructions, or synchronization instructions, or use whatever other hardware provisions exist, the CPUs (and/or some external hardware) will insert whatever is required for the notion of "time" to regain its meaning. That is, if the CPU uses barrier instructions, we can say that a memory load or store that was executed before the barrier is a "before" and a memory load or store that is executed after the barrier is an "after".
(The actual implementation, in some modern hardware, consists of load and store buffers that can rearrange the order in which loads and stores go to memory. The barrier instruction either synchronizes the buffers, or places an actual barrier in them, so that loads and stores cannot move across the barrier. This particular concrete implementation gives an easy way to think about the problem, but isn't complete: you should think of time as simply not existing outside the hardware-provided synchronization, i.e., all loads from, and stores to, some location are happening simultaneously, rather than in some sequential order, except for these barriers.)
In any case, Go's sync package gives you a simple high level access method to these kinds of barriers. Compiled code that executes before a mutex Lock call really does complete before the lock function returns, and the code that executes after the call really does not start until after the lock function returns.
Go's channels provide the same kinds of before/after time guarantees.
Go's sync/atomic package provides much lower level guarantees. In general you should avoid this in favor of the higher level channel or sync.Mutex style guarantees. (Edit to add note: You could use sync/atomic's Pointer operations here, but not with the string type directly, as Go strings are actually implemented as a header containing two separate values: a pointer, and a length. You could solve this with another layer of indirection, by updating a pointer that points to the string object. But before you even consider doing that, you should benchmark the use of the language's preferred methods and verify that these are a problem, because code that works at the sync/atomic level is hard to write and hard to debug.)
Which Go data types can I safely read and safely write concurrently without a mutext and without producing segfaults and without reading garbage from memory?
None.
It really is that simple: You cannot, under no circumstance whatsoever, read and write concurrently to anything in Go.
(Btw: Your "correct" program is not correct, it is racy and even if you get rid of the race condition it would not deterministically produce the output.)
Why can't you use channels
package main
import (
"fmt"
"sync"
)
func main() {
var wg sync.WaitGroup // wait group to close channel
var buffer int = 1 // buffer of the channel
// channel to get the share data
cName := make(chan string, buffer)
for i := 0; i < 10; i++ {
wg.Add(1) // add to wait group
go func(i int) {
cName <- fmt.Sprintf("name = %d", i)
wg.Done() // decrease wait group.
}(i)
}
go func() {
wg.Wait() // wait of wait group to be 0
close(cName) // close the channel
}()
// process all the data
for n := range cName {
println("read:", n)
}
}
The above code returns the following output
read: name = 0
read: name = 5
read: name = 1
read: name = 2
read: name = 3
read: name = 4
read: name = 7
read: name = 6
read: name = 8
read: name = 9
https://play.golang.org/p/R4n9ssPMOeS
Article about channels

Is there a race condition in the golang implementation of mutex the m.state is read without atomic function

In golang if two goroutines read and write a variable without mutex and atomic, that may bring data race condition.
Use command go run --race xxx.go will detect the race point.
While the implementation of Mutex in src/sync/mutex.go use the following code
func (m *Mutex) Lock() {
// Fast path: grab unlocked mutex.
if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
if race.Enabled {
race.Acquire(unsafe.Pointer(m))
}
return
}
var waitStartTime int64
starving := false
awoke := false
iter := 0
old := m.state // This line confuse me !!!
......
The code old := m.state confuse me, because m.state is read and write by different goroutine.
The following function Test obvious has race condition problem. But if i put it in mutex.go, no race conditon will detect.
# mutex.go
func Test(){
a := int32(1)
go func(){
atomic.CompareAndSwapInt32(&a, 1, 4)
}()
_ = a
}
If put it in other package like src/os/exec.go, the conditon race problem will detect.
package main
import(
"sync"
"os"
)
func main(){
sync.Test() // race condition will not detect
os.Test() // race condition will detect
}
First of all the golang source always changes so let's make sure we are looking at the same thing. Take release 1.12 at
https://github.com/golang/go/blob/release-branch.go1.12/src/sync/mutex.go
as you said the Lock function begins
func (m *Mutex) Lock() {
// fast path where it will set the high order bit and return if not locked
if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
return
}
//reads value to decide on the lower order bits
for {
//if statements involving CompareAndSwaps on the lower order bits
}
}
What is this CompareAndSwap doing? it looks atomically in that int32 and if it is 0 it swaps it to mutexLocked (which is 1 defined as a const above) and returns true that it swapped it.
Then it promptly returns. That is its fast path. The goroutine acquired the lock and now it is running can start running it's protected path.
If it is 1 (mutexLocked) already, it doesn't swap it and returns false (it didn't swap it).
Then it reads the state and enters a loop that it does atomic compare and swaps to determine how it should behave.
What are the possible states? combinations of locked, woken and starving as you see from the const block.
Now depending on how long the goroutine has been waiting on the waitlist it will get priority on when to check again if the mutex is now free.
But also observe that only Unlock() can set the mutexLocked bit back to 0.
in the Lock() CAS loop the only bits that are set are the starving and woken ones.Yes you can have multiple readers but only one writer at any time, and that writer is the one who is holding the mutex and is executing its protected path until calling Unlock(). Check out this article for more details.
By disassemble the binary output file, The Test function in different pack generate different code.
The reason is that the compiler forbid to generate race detect instrument in the sync package.
The code is :
var norace_inst_pkgs = []string{"sync", "sync/atomic"} // https://github.com/golang/go/blob/release-branch.go1.12/src/cmd/compile/internal/gc/racewalk.go
``

How do I find out if a goroutine is done, without blocking?

All the examples I've seen so far involve blocking to get the result (via the <-chan operator).
My current approach involves passing a pointer to a struct:
type goresult struct {
result resultType;
finished bool;
}
which the goroutine writes upon completion. Then it's a simple matter of checking finished whenever convenient. Do you have better alternatives?
What I'm really aiming for is a Qt-style signal-slot system. I have a hunch the solution will look almost trivial (chans have lots of unexplored potential), but I'm not yet familiar enough with the language to figure it out.
You can use the "comma, ok" pattern (see their page on "effective go"):
foo := <- ch; // This blocks.
foo, ok := <- ch; // This returns immediately.
Select statements allows you to check multiple channels at once, taking a random branch (of the ones where communication is waiting):
func main () {
for {
select {
case w := <- workchan:
go do_work(w)
case <- signalchan:
return
// default works here if no communication is available
default:
// do idle work
}
}
}
For all the send and receive
expressions in the "select" statement,
the channel expressions are evaluated,
along with any expressions that appear
on the right hand side of send
expressions, in top-to-bottom order.
If any of the resulting operations can
proceed, one is chosen and the
corresponding communication and
statements are evaluated. Otherwise,
if there is a default case, that
executes; if not, the statement blocks
until one of the communications can
complete.
You can also peek at the channel buffer to see if it contains anything by using len:
if len(channel) > 0 {
// has data to receive
}
This won't touch the channel buffer, unlike foo, gotValue := <- ch which removes a value when gotValue == true.

Resources