How can Go's race detector be aware of locks?

Adding a lock to a program with a race condition can fix the race and silence the race detector. How can Go's race detector be aware of the lock?
Someone points out that "the race detector can only detect race conditions if and when they actually occur".
Consider the following program:
package main

import (
	"sync"
	"time"
)

func main() {
	var a int
	var wg sync.WaitGroup
	workers := 2
	wg.Add(workers)
	for i := 1; i <= workers; i++ {
		go func(sleep int) {
			time.Sleep(time.Duration(sleep) * time.Second)
			a = 1
			wg.Done()
		}(i * 5)
	}
	wg.Wait()
}
One goroutine sleeps for 5 seconds, the other sleeps for 10 seconds, so in most cases they don't write a at the same physical time, yet the race detector prints a race warning every time. Why?

The race detector does not analyze the source code, it doesn't know that you added locks in the source code.
The race detector works at runtime:
When the -race command-line flag is set, the compiler instruments all memory accesses with code that records when and how the memory was accessed, while the runtime library watches for unsynchronized accesses to shared variables.
Because of this design, the race detector can only detect race conditions if and when they actually occur. So when you add proper locking / synchronization, the race condition will not happen (the if condition will not be met), and so no warning will be printed.
See this blog post for more details: Introducing the Go Race Detector
And this article: Data Race Detector
Edit for your example:
It may be that your 2 goroutines will never reach the point where they write the shared variable a at the same physical time (because the code runs so fast and the sleep time is relatively huge), but they run concurrently, in different goroutines, without explicit synchronization (a synchronization point may be channel communication, mutex lock/unlock etc.).
The race condition does not imply that access to a shared variable does happen at the same time (one of which must be a write). The race condition is also met if access to a shared variable happens concurrently (from multiple goroutines), without synchronization. This can be detected at runtime by the race detector (due to the instrumented memory access code).
Code generated by the compiler is allowed to use cached instances of the a variable in multiple goroutines, the runtime only has to guarantee that the cached instances are "refreshed" or disposed of if a synchronization point is reached. For details, see The Go Memory Model.
Also note that time.Sleep() does not guarantee that execution will continue right after the specified duration, only that execution will be suspended for at least the specified duration (so the execution may continue at a later time):
Sleep pauses the current goroutine for at least the duration d.
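To make that concrete, here is the example from the question with a sync.Mutex added (the mutex is my addition for illustration): the Unlock/Lock pairs act as synchronization points that order the two writes, so the instrumented run observes no unsynchronized access and the detector stays silent.

package main

import (
	"sync"
	"time"
)

func main() {
	var a int
	var mu sync.Mutex // added: serializes the writes to a
	var wg sync.WaitGroup
	workers := 2
	wg.Add(workers)
	for i := 1; i <= workers; i++ {
		go func(sleep int) {
			time.Sleep(time.Duration(sleep) * time.Second)
			mu.Lock() // each Unlock happens-before the next Lock
			a = 1
			mu.Unlock()
			wg.Done()
		}(i * 5)
	}
	wg.Wait()
}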

The data race detector does not do static analysis. It is not aware of your lock. It's empirical: it just notices that, when you run your code with the lock, two goroutines never access the same value without synchronization between them (where at least one access is a write).

Related

Is there any performance penalty building a Go program using the -race flag?

Hi, I was wondering if there is any performance penalty when running a Go program in production that was built using
go build -race
You can read about it in the article that describes the Go race detector at https://go.dev/doc/articles/race_detector
Quoting the Runtime Overhead section of that article:
Runtime Overhead
The cost of race detection varies by program, but for a typical program, memory usage may increase by 5-10x and execution time by 2-20x.
The race detector currently allocates an extra 8 bytes per defer and recover statement. Those extra allocations are not recovered until the goroutine exits. This means that if you have a long-running goroutine that is periodically issuing defer and recover calls, the program memory usage may grow without bound. These memory allocations will not show up in the output of runtime.ReadMemStats or runtime/pprof.
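If you want a rough number for your own workload, one way (a made-up micro-benchmark, not from the article) is to run the same benchmark with and without the flag and compare ns/op:

// overhead_test.go: a hypothetical micro-benchmark for illustration
package main

import "testing"

func BenchmarkMapWrites(b *testing.B) {
	m := make(map[int]int)
	for i := 0; i < b.N; i++ {
		m[i%1024]++ // ordinary memory accesses that -race must instrument
	}
}

Run go test -bench=. and then go test -race -bench=.; the slowdown you observe is the overhead for this particular access pattern.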

will mutex lock guarantee mem sync?

In goroutine A I do this:
mu.Lock()
n++
mu.Unlock()
and in goroutine B:
fmt.Println(n)
Can I see the changed n in goroutine B immediately? In other words, will a mutex guarantee memory synchronization?
If immediately means after the mu.Unlock(), then yes, but keep in mind that before reading the value you should also acquire the lock. For instance, on a multiprocessor (multicore) system each core has its own cache, and there may be situations where a variable's value differs between cores' caches.
For this specific situation of a single integer variable, the sync/atomic package is perhaps more beneficial: to update, use atomic.AddUint64(&n, 1), and to read, use atomic.LoadUint64(&n) (assuming n is a uint64).
If the structure of interest is a map, you could either use sync.Map or guard a plain map with a sync.RWMutex. RWMutex exposes different locks for updates (Lock()) and reads (RLock()), so you can have safe parallel reads while updates remain exclusive.
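As a sketch of the RWMutex-guarded variant (SafeMap and its methods are made-up names for illustration):

package main

import (
	"fmt"
	"sync"
)

// SafeMap is a hypothetical wrapper guarding a map with an RWMutex.
type SafeMap struct {
	mu sync.RWMutex
	m  map[string]int
}

func (s *SafeMap) Set(k string, v int) {
	s.mu.Lock() // exclusive lock: writers block everyone else
	defer s.mu.Unlock()
	s.m[k] = v
}

func (s *SafeMap) Get(k string) (int, bool) {
	s.mu.RLock() // shared lock: many readers may hold it at once
	defer s.mu.RUnlock()
	v, ok := s.m[k]
	return v, ok
}

func main() {
	s := &SafeMap{m: make(map[string]int)}
	s.Set("a", 1)
	v, _ := s.Get("a")
	fmt.Println(v)
}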

Golang assignment safety with single reader and single writer

Say I have two goroutines:
var sequence int64

// writer
for i := sequence; i < max; i++ {
	doSomethingWithSequence(i)
	sequence = i
}

// reader
for {
	doSomeOtherThingWithSequence(sequence)
}
So can I get by without atomic?
Some potential risks I can think of:
1. Reordering (for the writer, updating sequence happens before doSomething) could occur, but I can live with that.
2. sequence is not properly aligned in memory, so the reader might observe a partially updated i. Running on Linux (recent kernel) on x86_64, can we rule that out?
3. The Go compiler "cleverly optimizes" the reader, so the access to i never goes to memory but is cached in a register. Is that possible in Go?
4. Anything else?
Go's motto: Do not communicate by sharing memory; instead, share memory by communicating. This is an effective best practice most of the time.
1. If you care about ordering, you care about synchronizing the two goroutines.
2. I don't think that is possible. In any case, it is not something you should worry about if you properly design the synchronization.
3. The same as above.
4. Luckily, Go has an integrated data race detector. Try running your example with go run -race; you will probably see the race condition happening on the sequence variable. If you need the guarantees anyway, see the sketch below.
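If you decide you do want the guarantees, here is a minimal sketch using sync/atomic (the loop bound and timings are arbitrary); atomic stores and loads rule out both torn reads and register caching:

package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

func main() {
	var sequence int64

	// writer: publish each value with an atomic store
	go func() {
		for i := int64(0); i < 1000; i++ {
			atomic.StoreInt64(&sequence, i)
			time.Sleep(time.Millisecond)
		}
	}()

	// reader: atomic loads can never observe a torn value
	for j := 0; j < 5; j++ {
		fmt.Println(atomic.LoadInt64(&sequence))
		time.Sleep(time.Millisecond)
	}
}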

Are goroutines appropriate for large, parallel, compute-bound problems?

Are go-routines pre-emptively multitasked for numerical problems?
I am very intrigued by the lean design of Go, the speed, but most by the fact that channels are first-class objects. I hope the last point may enable a whole new class of deep-analysis algorithms for big data, via the complex interconnection patterns which they should allow.
My problem domain requires real-time compute-bound analysis of streaming incoming data. The data can be partitioned into between 100-1000 "problems", each of which will take between 10 and 1000 seconds to compute (i.e., their granularity is highly variable). Results must, however, all be available before the output makes sense: say 500 problems come in, and all 500 must be solved before I can use any of them. The application must be able to scale, potentially to thousands (though unlikely hundreds of thousands) of problems.
Given that I am less worried about numerical library support (most of this stuff is custom), Go seems ideal as I can map each problem to a goroutine. Before I invest in learning Go rather than, say, Julia, Rust, or a functional language (none of which, as far as I can see, have first-class channels, so for me they are at an immediate disadvantage), I need to know if goroutines are properly pre-emptively multitasked. That is, if I run 500 compute-bound goroutines on a powerful multicore computer, can I expect reasonable load balancing across all the "problems", or will I have to cooperatively "yield" all the time, 1995-style? This issue is particularly important given the variable granularity of the problem and the fact that, during compute, I usually will not know how much longer it will take.
If another language would serve me better, I am happy to hear about it, but I require that the threads (or go/coroutines) of execution be lightweight. Python's multiprocessing module, for example, is far too resource-intensive for my scaling ambitions. Just to pre-empt: I do understand the difference between parallelism and concurrency.
The Go runtime has a model where multiple Go routines are mapped onto multiple threads in an automatic fashion. No Go routine is bound to a particular thread; the scheduler may (and will) schedule Go routines on the next available thread. The number of threads a Go program uses is taken from the GOMAXPROCS environment variable and can be overridden with runtime.GOMAXPROCS(). This is a simplified description, but it is sufficient for understanding.
Go routines may yield in the following cases:
On any operation that might block, i.e. any operation that cannot return a result on the spot because it is either a (possible) blocking system-call like io.Read() or an operation that might require waiting for other Go routines, like acquiring a mutex or sending to or receiving from a channel
On various runtime operations
On function calls, if the scheduler detects that the preempted Go routine took a lot of CPU time (new in Go 1.2)
On call to runtime.Gosched() (a small sketch of explicit yielding follows the lists below)
On panic()
As of Go 1.14, tight loops can be preempted by the runtime. As a result, loops without function calls no longer potentially deadlock the scheduler or significantly delay garbage collection. This is not supported on all platforms - be sure to review the release notes. Also see issue #36365 for future plans in this area.
On various other occasions
The following things prevent a Go routine from yielding:
Executing C code. A Go routine can't yield while it's executing C code via cgo.
Calling runtime.LockOSThread(), until runtime.UnlockOSThread() has been called.
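To make the yield points concrete, here is a minimal sketch (the worker count and loop bounds are arbitrary) of CPU-bound goroutines that yield explicitly via runtime.Gosched(); on Go 1.14+ the runtime preempts such loops on its own, so the explicit call only matters on older versions:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	done := make(chan int)
	for w := 0; w < 4; w++ {
		go func(id int) {
			sum := 0
			for i := 0; i < 100000000; i++ {
				sum += i
				if i%1000000 == 0 {
					runtime.Gosched() // explicitly yield; unnecessary on Go 1.14+
				}
			}
			done <- id
		}(w)
	}
	for w := 0; w < 4; w++ {
		fmt.Println("worker", <-done, "finished")
	}
}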
Not sure I fully understand you; however, you can set runtime.GOMAXPROCS to scale to all processors, then use channels (or locks) to synchronize the data. Example:
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

const N = 100

func main() {
	runtime.GOMAXPROCS(runtime.NumCPU()) // scale to all processors
	var stuff [N]bool
	var wg sync.WaitGroup
	ch := make(chan int, runtime.NumCPU())
	done := make(chan struct{}, runtime.NumCPU())
	readerDone := make(chan struct{})
	go func() {
		for i := range ch {
			stuff[i] = true
		}
		close(readerDone)
	}()
	wg.Add(N)
	for i := range stuff {
		go func(i int) {
			for { // cpu-bound loop
				select {
				case <-done:
					fmt.Println(i, "is done")
					ch <- i
					wg.Done()
					return
				default:
				}
			}
		}(i)
	}
	go func() {
		for range stuff {
			time.Sleep(time.Microsecond)
			done <- struct{}{}
		}
		close(done)
	}()
	wg.Wait()
	close(ch)
	<-readerDone // wait for the reader goroutine before checking stuff
	for i, v := range stuff {
		if !v {
			panic(fmt.Sprintf("%d != true", i))
		}
	}
	fmt.Println("All done")
}
EDIT: Information about the scheduler at http://tip.golang.org/src/pkg/runtime/proc.c:
Goroutine scheduler
The scheduler's job is to distribute ready-to-run goroutines over worker threads.
The main concepts are:
G - goroutine.
M - worker thread, or machine.
P - processor, a resource that is required to execute Go code. M must have an associated P to execute Go code, however it can be blocked or in a syscall w/o an associated P.
Design doc at http://golang.org/s/go11sched.

Is message passing via channels in go guaranteed to be non-blocking?

In order to assess whether go is a possible option for an audio/video application, I would like to know whether message passing in go satisfies any non-blocking progress guarantees (being obstruction-free, lock-free or wait-free). In particular, the following scenarios are relevant:
Single producer single consumer:
Two threads communicate using a shared channel. Thread A only does asynchronous sends, thread B only does asynchronous receives. Suppose the OS scheduler decides to interrupt thread A at the "worst possible moment" for an indefinite amount of time. Is thread B guaranteed to finish a receive operation in a bounded number of cpu cycles or is there a (theoretical) possibility that thread A can put the channel into a state where thread B needs to wait for the OS to resume thread A?
Multiple producers:
Several threads A1, A2, A3, ... communicate with one or more others threads using a shared channel. The threads Ai only do asynchronous sends. Suppose A2, A3, ... are suspended by the OS scheduler at the "worst possible moment" for an indefinite amount of time. Is thread A1 still guaranteed to finish a send operation in a bounded number of cpu cycles? Suppose further that each thread only wants to do one send. If the program is run sufficiently long (with a "malicious" scheduler which potentially starves some threads or interrupts and resumes threads at the "worst possible moment"), is at least one send guaranteed to succeed?
I am not so much interested in typical scenarios here, but rather worst-case guarantees.
See Non-blocking algorithm (Wikipedia) for more details on obstruction-, lock- and wait-free algorithms.
Normal sends and receives are blocking operations by definition. You can do a non-blocking send or receive by using a select statement:
select {
case ch <- msg:
default:
}
(Receiving is very similar; just replace the case statement.)
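For reference, the receiving variant of the same pattern (msg is just a local name) looks like this:

select {
case msg := <-ch:
	fmt.Println("received", msg) // a value was ready
default:
	// channel empty (or no sender ready): continue without blocking
}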
The send only takes place when there is room in the channel's buffer. Otherwise the default case runs. Note that internally a mutex is still used (if I'm reading the code correctly).
The Go Memory Model doesn't require sends and receives to be non-blocking, and the current runtime implementation locks the channel for send and recv. This means, for instance, that it is possible to starve a sending or receiving go-routine if the OS-scheduler interrupts another thread running another go-routine which tries to send or receive on the same channel while it has already acquired the channel's lock.
So the answer is unfortunately no :(
(unless someone reimplements parts of the runtime using non-blocking algorithms).
You're asking whether an operation is guarantee to complete within a bounded number of cycles, which of course is not a design consideration for this language (or most underlying OSes).
If run in a single thread, Go uses cooperative multitasking between goroutines, so if one routine never yields, the others will never run. If your program runs on multiple threads (as set by GOMAXPROCS), then you can run several goroutines simultaneously, in which case the OS controls scheduling between the threads. In neither case is there a guaranteed upper bound on the time to completion for a function call.
Note that the cooperative nature of goroutines gives you some amount of control over scheduling execution: until you yield, you retain control of the thread. (This answer predates Go 1.14, whose runtime can asynchronously preempt tight loops; see the note earlier in this document.)
As for blocking behavior, see The language specification:
The capacity, in number of elements, sets the size of the buffer in the channel. If the capacity is greater than zero, the channel is asynchronous: communication operations succeed without blocking if the buffer is not full (sends) or not empty (receives), and elements are received in the order they are sent. If the capacity is zero or absent, the communication succeeds only when both a sender and receiver are ready.
Note that non-blocking sends and receives on channels can be accomplished using the select syntax already mentioned.
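A tiny sketch of the quoted buffering rule (buffer size 1 chosen for illustration):

ch := make(chan int, 1)
ch <- 1           // succeeds without blocking: the buffer has room
// ch <- 2        // would block here: buffer full, no receiver ready
fmt.Println(<-ch) // succeeds without blocking: the buffer is non-empty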
Goroutines do not own channels or the values sent on them. So, the execution status of a goroutine that has sent / is sending values on a channel has no impact on the ability of other goroutines to send or receive values on that channel, unless the channel's buffer is full, in which case all sends will block until a corresponding receive occurs, or the buffer is empty, in which case all receives will block until there is a corresponding send.
Because goroutines historically used cooperative scheduling (they had to yield to the scheduler, either through a channel operation, a syscall, or an explicit call to runtime.Gosched()), a goroutine could not be interrupted at the "worst possible time"; note, though, that since Go 1.14 the runtime can also preempt goroutines asynchronously. It is possible for a goroutine to never yield, in which case it could tie up a thread indefinitely; if you have only one thread of execution, then your other goroutines will never be scheduled. It is possible, but statistically improbable, for a goroutine to just never be scheduled. However, if all goroutines but one are blocked on sends or receives, then the remaining goroutine must be scheduled.
It is possible to get a deadlock. If I have two goroutines executing:
func Goroutine(ch1, ch2 chan int) {
	i := <-ch1
	ch2 <- i
}

...

ch1, ch2 := make(chan int), make(chan int)
go Goroutine(ch1, ch2)
go Goroutine(ch2, ch1)
Then, as should be apparent, both goroutines are waiting for the other to send a value, which will never happen.
