Go atomic Load and Store

func resetElectionTimeoutMS(newMin, newMax int) (int, int) {
    oldMin := atomic.LoadInt32(&MinimumElectionTimeoutMS)
    oldMax := atomic.LoadInt32(&maximumElectionTimeoutMS)
    atomic.StoreInt32(&MinimumElectionTimeoutMS, int32(newMin))
    atomic.StoreInt32(&maximumElectionTimeoutMS, int32(newMax))
    return int(oldMin), int(oldMax)
}
I have a Go function like this.
What confuses me is: why do we need atomic here? What is it protecting against?
Thanks.

Atomic functions complete a task in an isolated way where all parts of the task appear to happen instantaneously or don't happen at all.
In this case, LoadInt32 and StoreInt32 ensure that an integer is stored and retrieved in a way where someone loading won't get a partial store. However, both sides need to use atomic operations for this to work correctly. The Raft example appears incorrect for at least two reasons.
First, two atomic operations do not compose into one atomic operation, so reading the old value and setting the new one on separate lines is a race condition: you may read, then someone else stores, then you store, and the "previous" value you return is stale.
Second, not everyone accessing MinimumElectionTimeoutMS is using atomic operations, which makes the use of atomics in this function effectively useless.
How would this be fixed?
func resetElectionTimeoutMS(newMin, newMax int) (int, int) {
    oldMin := atomic.SwapInt32(&MinimumElectionTimeoutMS, int32(newMin))
    oldMax := atomic.SwapInt32(&maximumElectionTimeoutMS, int32(newMax))
    return int(oldMin), int(oldMax)
}
This would ensure that oldMin is the minimum that existed before the swap. However, the entire function is still not atomic: the final outcome could be an oldMin/oldMax pair that was never passed together to a single call of resetElectionTimeoutMS. For that... just use locks; a sketch follows the atomic-load example below.
Each function would also need to be changed to do an atomic load:
func minimumElectionTimeout() time.Duration {
    min := atomic.LoadInt32(&MinimumElectionTimeoutMS)
    return time.Duration(min) * time.Millisecond
}
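And if you went the lock route instead, a minimal sketch might look like this (my illustration, assuming a package-level sync.Mutex, here called electionTimeoutMu, that every other reader and writer of these variables also uses):

var electionTimeoutMu sync.Mutex // hypothetical guard; must protect every access

func resetElectionTimeoutMS(newMin, newMax int) (int, int) {
    electionTimeoutMu.Lock()
    defer electionTimeoutMu.Unlock()
    // Both reads and both writes now happen as a single unit,
    // so the returned pair was always set together.
    oldMin := MinimumElectionTimeoutMS
    oldMax := maximumElectionTimeoutMS
    MinimumElectionTimeoutMS = int32(newMin)
    maximumElectionTimeoutMS = int32(newMax)
    return int(oldMin), int(oldMax)
}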
I recommend you carefully consider the quote VonC mentioned from the golang atomic documentation:
These functions require great care to be used correctly. Except for special, low-level applications, synchronization is better done with channels or the facilities of the sync package.
If you want to understand atomic operations, I recommend you start with http://preshing.com/20130618/atomic-vs-non-atomic-operations/. That covers the load and store operations used in your example. However, there are other uses for atomics. The Go atomic package overview covers some cool stuff like atomic swapping (the example I gave), compare-and-swap (known as CAS), and atomic addition.
A funny quote from the link I gave you:
it’s well-known that on x86, a 32-bit mov instruction is atomic if the memory operand is naturally aligned, but non-atomic otherwise. In other words, atomicity is only guaranteed when the 32-bit integer is located at an address which is an exact multiple of 4.
In other words, on common systems today, the atomic functions used in your example are effectively no-ops: those loads and stores are already atomic. That is not guaranteed, though; if you need an operation to be atomic, it is better to say so explicitly.

Considering that the package atomic provides low-level atomic memory primitives useful for implementing synchronization algorithms, I suppose it was intended to be used as:
MinimumElectionTimeoutMS isn't modified while being stored in oldMin
MinimumElectionTimeoutMS isn't modified while being set to a new value newMin.
But, the package does come with the warning:
These functions require great care to be used correctly.
Except for special, low-level applications, synchronization is better done with channels or the facilities of the sync package.
Share memory by communicating; don't communicate by sharing memory.
In this case (server.go from the Raft distributed consensus protocol), synchronizing directly on the variables might be deemed faster than putting a Mutex around the whole function.
Except, as Stephen Weinberg's answer illustrates (upvoted), this isn't how you use atomic. It only makes sure that oldMin is accurate at the moment of the swap.
See another example at "Is the two atomic style code in sync/atomic.once.go necessary?", in relation to the "memory model".
OneOfOne mentions in the comments using atomic CAS as a spinlock (very fast locking; a sketch follows the links below):
BenchmarkSpinL-8 2000 708494 ns/op 32315 B/op 2001 allocs/op
BenchmarkMutex-8 1000 1225260 ns/op 78027 B/op 2259 allocs/op
See:
sync/spinlock.go
sync/spinlock_test.go
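For illustration, a minimal CAS-based spinlock might look like this (a sketch only, not the exact code from the files above; assumes "runtime" and "sync/atomic" are imported):

type SpinLock struct {
    state int32 // 0 = unlocked, 1 = locked
}

func (s *SpinLock) Lock() {
    // Spin until we win the CAS from unlocked (0) to locked (1).
    for !atomic.CompareAndSwapInt32(&s.state, 0, 1) {
        runtime.Gosched() // yield so the current holder can make progress
    }
}

func (s *SpinLock) Unlock() {
    atomic.StoreInt32(&s.state, 0)
}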

Related

Use big.Rat with Go to get Abs() value

I am a beginner with Go and a Java developer.
I am currently working with big.Rat.
I need to get the Abs of a Rat n for which I have to write something like
n.Abs(n) or something like big.Rat{}.Abs(n)
Why didn't Go provide something like just n.Abs()?
Or am I going wrong somewhere?
Go's big package is concerned with memory allocation when it comes to its function signatures. A big.Rat consists of two big.Ints, each of which contains an array of uints. Unlike an int (a native 32- or 64-bit integer), a big.Int must thus be allocated dynamically, depending on its value. For large values this means more elements in the array.
Your proposed function signature n.Abs() would mean that a new array of the same size as n's would have to be allocated for this operation. In reality we often have the case that the original n is no longer needed, thus we can reuse its existing memory. To allow this, the Abs function takes a pointer to an existing big.Rat which might be n itself. The implementation can now reuse the memory. The caller is now in full control of what memory to use for these operations.
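To make both calling patterns concrete, here is a small sketch (my own example; assumes fmt and math/big are imported):

n := big.NewRat(-3, 4)

// Reuse n's own storage: n is overwritten with |n|.
n.Abs(n)
fmt.Println(n) // 3/4

// Or keep the source intact by supplying a separate receiver
// that will hold the result.
m := new(big.Rat).Abs(n)
fmt.Println(m) // 3/4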
This might not make the nicest API for all use cases. In fact, if you just want to do a quick calculation for a few large numbers on a computer with gigabytes of RAM, you might have preferred the n.Abs() version; but if you do numerically expensive computations with a lot of large numbers, you must be able to control your memory. Imagine doing some image manipulation on a Raspberry Pi, for example, where you are more constrained by the available memory. In this case the existing API allows you to be more efficient.

Is there a difference in Go between a counter using atomic operations and one using a mutex?

I have seen some discussion lately about whether there is a difference between a counter implemented using atomic increment/load, and one using a mutex to synchronise increment/load.
Are the following counter implementations functionally equivalent?
type Counter interface {
    Inc()
    Load() int64
}

// Atomic Implementation
type AtomicCounter struct {
    counter int64
}

func (c *AtomicCounter) Inc() {
    atomic.AddInt64(&c.counter, 1)
}

func (c *AtomicCounter) Load() int64 {
    return atomic.LoadInt64(&c.counter)
}

// Mutex Implementation
type MutexCounter struct {
    counter int64
    lock    sync.Mutex
}

func (c *MutexCounter) Inc() {
    c.lock.Lock()
    defer c.lock.Unlock()
    c.counter++
}

func (c *MutexCounter) Load() int64 {
    c.lock.Lock()
    defer c.lock.Unlock()
    return c.counter
}
I have run a bunch of test cases (Playground Link) and haven't been able to see any difference in behaviour. Running the tests on my machine, the numbers get printed out of order for all the PrintAll test functions.
Can someone confirm whether they are equivalent or if there are any edge cases where these are different? Is there a preference to use one technique over the other? The atomic documentation does say it should only be used in special cases.
Update:
The original question that prompted me to ask this was this one; however, it is now on hold, and I feel this aspect deserves its own discussion. In the answers it seemed that using a mutex would guarantee correct results, whereas atomics might not, specifically if the program is running in multiple threads. My questions are:
Is it correct that they can produce different results? (See the update below: the answer is yes.)
What causes this behaviour?
What are the tradeoffs between the two approaches?
Another Update:
I've found some code where the two counters behave differently. When run on my machine this function will finish with MutexCounter, but not with AtomicCounter. Don't ask me why you would ever run this code:
func TestCounter(counter Counter) {
    end := make(chan interface{})
    for i := 0; i < 1000; i++ {
        go func() {
            r := rand.New(rand.NewSource(time.Now().UnixNano()))
            for j := 0; j < 10000; j++ {
                k := int64(r.Uint32())
                if k >= 0 {
                    counter.Inc()
                }
            }
        }()
    }
    go func() {
        prevValue := int64(0)
        for counter.Load() != 10000000 { // Sometimes this condition is never met with AtomicCounter.
            val := counter.Load()
            if val%1000000 == 0 && val != prevValue {
                prevValue = val
            }
        }
        end <- true
        fmt.Println("Count:", counter.Load())
    }()
    <-end
}
There is no difference in behavior. There is a difference in performance.
Mutexes are slow, due to the setup and teardown, and because they block other goroutines for the duration of the lock.
Atomic operations are fast because they use an atomic CPU instruction when possible, rather than relying on external locks.
Therefore, whenever it is feasible, atomic operations should be preferred.
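If you want to measure that difference with the two implementations from the question, a benchmark sketch might look like this (the benchmark names are my own; put it in a _test.go file alongside the counters and run go test -bench=.):

func BenchmarkAtomicCounter(b *testing.B) {
    var c AtomicCounter
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            c.Inc() // contended atomic add
        }
    })
}

func BenchmarkMutexCounter(b *testing.B) {
    var c MutexCounter
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            c.Inc() // contended lock/unlock around the increment
        }
    })
}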
Alright, I'm going to attempt to self-answer for some closure. Edits are welcome.
There is some discussion about the atomic package here. But to quote the most telling comments:
The very short summary is that if you have to ask, you should probably avoid the package. Or, read the atomic operations chapter of the C++11 standard; if you understand how to use those operations safely in C++, then you are more than capable of using Go's sync/atomic package.
That said, sticking to atomic.AddInt32 and atomic.LoadInt32 is safe as long as you are just reporting statistical information, and not actually relying on the values carrying any meaning about the state of the different goroutines.
And:
What atomicity does not guarantee is any ordering of observability of values. I mean, atomic.AddInt32() does only guarantee that what this operation stores at &cnt will be exactly *cnt + 1 (with the value of *cnt being what the CPU executing the active goroutine fetched from memory when the operation started); it does not provide any guarantee that if another goroutine will attempt to read this value at the same time it will fetch that same value *cnt + 1.
On the other hand, mutexes and channels guarantee strict ordering of accesses to values being shared/passed around (subject to the rules of the Go memory model).
In regards to why the code sample in the question never finishes: this is due to the fact that the func reading the counter is in a very tight loop. When using the atomic counter, there are no synchronisation events (e.g. mutex lock/unlock, syscalls), which means the goroutine never yields control. As a result, this goroutine starves the thread it is running on and prevents the scheduler from allocating time to any other goroutines allocated to that thread, including the ones that increment the counter, so the counter never reaches 10000000.
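A quick way to confirm this explanation is to force a yield inside the tight loop (runtime.Gosched here is my addition, one possible fix among several; a channel or a short sleep would also work):

for counter.Load() != 10000000 {
    // Explicitly yield the processor so that the goroutines
    // incrementing the counter can be scheduled on this thread.
    runtime.Gosched()
}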
Atomics are faster in the common case: the compiler translates each call to a function from the sync/atomic package to a special set of machine instructions which basically operate on the CPU level — for instance, on x86 architectures, an atomic.AddInt64 would be translated to a plain ADD-class instruction carrying the LOCK prefix (see this for an example) — with the latter ensuring a coherent view of the updated memory location across all the CPUs in the system.
A mutex is a much more complicated thing as it, in the end, wraps some bit of the native OS-specific thread synchronization API (for instance, on Linux, that's futex).
On the other hand, the Go runtime is pretty much optimized when it comes to synchronization stuff (which is kinda expected, given one of the main selling points of Go), and the mutex implementation tries to avoid hitting the kernel to perform synchronization between goroutines, if possible, carrying it out completely in the Go runtime itself. This might explain the lack of noticeable difference in the timings in your benchmarks, provided the contention over the mutexes was reasonably low.
Still, I feel obliged to note — just in case — that atomics and higher-level synchronization facilities are designed to solve different tasks. Say, you can't use atomics to protect some memory state during the execution of a whole function — or even a single statement, in the general case.

Go: Seek+Write vs WriteAt performance

I've just started to study Go's filesystem operations. It seems like there are at least two ways to perform random file writes:
// 1. First set the offset, then write data
f.Seek(offset, whence)
f.Write(data)
// 2. Write by offset in one step
f.WriteAt(data, offset)
All three functions (Seek, Write, WriteAt) are implemented with different syscalls: on Unix systems, Write is implemented via syscall.Write and WriteAt via syscall.Pwrite.
Since Seek+Write perform two syscalls, whereas WriteAt requires only one syscall, should the second method be preferred for the sake of better performance?
seek()+read() and seek()+write() are each a pair of syscalls, while pread() and pwrite() are single syscalls. Fewer syscalls, more efficiency.
You can definitely go for WriteAt.
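As a small usage sketch (the file name is made up; error handling kept minimal; assumes "log" and "os" are imported):

f, err := os.OpenFile("data.bin", os.O_RDWR|os.O_CREATE, 0644)
if err != nil {
    log.Fatal(err)
}
defer f.Close()

// A single pwrite(2) on Unix: write 4 bytes at byte offset 128
// without touching the file's current seek position.
if _, err := f.WriteAt([]byte{1, 2, 3, 4}, 128); err != nil {
    log.Fatal(err)
}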

Is it safe to read a function pointer concurrently without a lock?

Suppose I have this:
go func() {
    for range time.Tick(1 * time.Millisecond) {
        a, b = b, a
    }
}()
And elsewhere:
i := a // <-- Is this safe?
For this question, it's unimportant what the value of i is with respect to the original a or b. The only question is whether reading a is safe. That is, is it possible for a to be nil, partially assigned, invalid, undefined, ... anything other than a valid value?
I've tried to make it fail but so far it always succeeds (on my Mac).
I haven't been able to find anything specific beyond this quote in The Go Memory Model doc:
Reads and writes of values larger than a single machine word behave as
multiple machine-word-sized operations in an unspecified order.
Is this implying that a single machine word write is effectively atomic? And, if so, are function pointer writes in Go a single machine word operation?
Update: Here's a properly synchronized solution
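For illustration, one properly synchronized approach (my sketch, not necessarily the linked solution) stores the function in an atomic.Value, which makes both the swap and the read safe (assumes "sync/atomic" and "time" are imported):

var fn atomic.Value // always holds a value of type func() string
fn.Store(func() string { return "a" })

go func() {
    for range time.Tick(1 * time.Millisecond) {
        fn.Store(func() string { return "b" }) // atomic replacement of the function value
    }
}()

// Elsewhere: Load never observes a partial write.
f := fn.Load().(func() string)
i := f()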
Unsynchronized, concurrent access to any variable from multiple goroutines where at least one of them is a write is undefined behavior by The Go Memory Model.
Undefined means what it says: undefined. It may be that your program will work correctly, it may be it will work incorrectly. It may result in losing memory and type safety provided by the Go runtime (see example below). It may even crash your program. Or it may even cause the Earth to explode (probability of that is extremely small, maybe even less than 1e-40, but still...).
This undefined in your case means that yes, i may be nil, partially assigned, invalid, undefined, ... anything other than either a or b. This list is just a tiny subset of all the possible outcomes.
Stop thinking that some data races are (or may be) benign or unharmful. They can be the source of the worst things if left unattended.
Since your code writes to the variable a in one goroutine and reads it in another goroutine (which tries to assign its value to another variable i), it's a data race and as such it's not safe. It doesn't matter if in your tests it works "correctly". One could take your code as a starting point, extend / build on it and result in a catastrophe due to your initially "unharmful" data race.
As related questions, read How safe are Golang maps for concurrent Read/Write operations? and Incorrect synchronization in go lang.
I strongly recommend reading the blog post by Dmitry Vyukov: Benign data races: what could possibly go wrong?
Also a very interesting blog post, which shows an example that breaks Go's memory safety with an intentional data race: Golang data races to break memory safety.
In terms of race conditions, it's not safe. In short, my understanding of a race condition is when more than one asynchronous routine (coroutines, threads, processes, goroutines, etc.) tries to access the same resource and at least one of the accesses is a write. In your example we have two goroutines reading and writing variables of function type; what matters from a concurrency point of view is that those variables occupy a memory space somewhere and we're trying to read or write to that portion of memory.
Short answer: just run your example using the -race flag with go run -race or go build -race and you'll see a detected data race.
The answer to your question, as of today, is that if a and b are not larger than a machine word, i must be equal to a or b. Otherwise, it may contain an unspecified value that is most likely an interleaving of different parts of a and b.
The Go memory model, as of the version of June 6, 2022, guarantees that even if a program contains a race condition, a memory access of a location not larger than a machine word must be atomic.
Otherwise, a read r of a memory location x that is not larger than a machine word must observe some write w such that r does not happen before w and there is no write w' such that w happens before w' and w' happens before r. That is, each read must observe a value written by a preceding or concurrent write.
The happen-before relationship here is defined in the memory model in the previous section.
The result of a racy read from a larger memory location is unspecified, but it is definitely not undefined as in the realm of C++.
Reads of memory locations larger than a single machine word are encouraged but not required to meet the same semantics as word-sized memory locations, observing a single allowed write w. For performance reasons, implementations may instead treat larger operations as a set of individual machine-word-sized operations in an unspecified order. This means that races on multiword data structures can lead to inconsistent values not corresponding to a single write. When the values depend on the consistency of internal (pointer, length) or (pointer, type) pairs, as can be the case for interface values, maps, slices, and strings in most Go implementations, such races can in turn lead to arbitrary memory corruption.

Thread safety of simultaneous updates of a variable to the same value

Is the following construct thread-safe, assuming that the elements of foo are aligned and sized properly so that there is no word tearing? If not, why not?
Note: The code below is a toy example of what I want to do, not my actual real world scenario. Obviously, there are better ways of coding the observable behavior in my example.
uint[] foo;
// Fill foo with data.
// In thread one:
for(uint i = 0; i < foo.length; i++) {
    if(foo[i] < SOME_NUMBER) {
        foo[i] = MAGIC_VAL;
    }
}
// In thread two:
for(uint i = 0; i < foo.length; i++) {
    if(foo[i] < SOME_OTHER_NUMBER) {
        foo[i] = MAGIC_VAL;
    }
}
This obviously looks unsafe at first glance, so I'll highlight why I think it could be safe:
The only two options are for an element of foo to be unchanged or to be set to MAGIC_VAL.
If thread two sees foo[i] in an intermediate state while it's being updated, only two things can happen: The intermediate state is < SOME_OTHER_NUMBER or it's not. If it is < SOME_OTHER_NUMBER, thread two will also try to set it to MAGIC_VAL. If not, thread two will do nothing.
Edit: Also, what if foo is a long or a double or something, so that updating it can't be done atomically? You may still assume that alignment, etc. is such that updating one element of foo will not affect any other element. Also, the whole point of multithreading in this case is performance, so any type of locking would defeat the purpose.
On a modern multicore processor your code is NOT thread-safe (at least in most languages) without a memory barrier. Simply put, without explicit barriers each thread can see an entirely different copy of foo from its caches.
Say your two threads run at some point in time, and at some later point in time a third thread reads foo: it could see a foo that was completely uninitialized, or the foo of either of the other two threads, or some mix of both, depending on what has happened with CPU memory caching.
My advice - don't try to be "smart" about concurrency, always try to be "safe". Smart will bite you every time. The broken double-checked locking article has some eye-opening insights into what can happen with memory access and instruction reordering in the absence of memory barriers (though specifically about Java and its (changing) memory model, it's insightful for any language).
You have to be really on top of your language's specified memory model to shortcut barriers. For example, Java allows a variable to be tagged volatile, which combined with a type which is documented as having atomic assignment, can allow unsynchronized assignment and fetch by forcing them through to main memory (so the thread is not observing/updating cached copies).
You can do this safely and locklessly with a compare-and-swap operation. What you've got looks thread safe but the compiler might create a writeback of the unchanged value under some circumstances, which will cause one thread to step on the other.
Also you're probably not getting as much performance as you think out of doing this, because having both threads writing to the same contiguous memory like this will cause a storm of MESI transitions inside the CPU's cache, each of which is quite slow. For more details on multithread memory coherence you can look at section 3.3.4 of Ulrich Drepper's "What Every Programmer Should Know About Memory".
If reads and writes to each array element are atomic (i.e. they're aligned properly with no word tearing as you mentioned), then there shouldn't be any problems in this code. If foo[i] is less than either of SOME_NUMBER or SOME_OTHER_NUMBER, then at least one thread (possibly both) will set it to MAGIC_VAL at some point; otherwise, it will be untouched. With atomic reads and writes, there are no other possibilities.
However, since your situation is more complicated, be very, very careful: make sure that foo[i] is truly only read once per loop and stored in a local variable, as in the sketch below. If you read it more than once during the same iteration, you could get inconsistent results. Even the slightest change you make to your code could immediately make it unsafe with race conditions, so comment the code heavily with big red warning signs.
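A Go rendering of that advice (names taken from the question; SOME_NUMBER and MAGIC_VAL stand in for whatever constants you use):

for i := range foo {
    v := foo[i] // read the element exactly once per iteration
    if v < SOME_NUMBER {
        foo[i] = MAGIC_VAL // the test and the write use the same observed value
    }
}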
It's bad practice; you should never be in a state where two threads access the same variable at the same time, regardless of the consequences. The example you give is oversimplified; any moderately complex example will almost always have problems associated with it.
Remember: Semaphores are your friend!
That particular example is thread-safe.
There are no intermediate states really involved here.
That particular program would not get confused.
I would suggest a Mutex on the array, though.

Resources