How to make sure code has no data races in Go?

I'm writing a microservice that calls other microservices for data that rarely updates (once a day or once a month). So I decided to create a cache, and implemented this interface:
type StringCache interface {
	Get(string) (string, bool)
	Put(string, string)
}
Internally it's just a map[string]cacheItem, where
type cacheItem struct {
	data      string
	expire_at time.Time
}
My coworker says that it's unsafe and that I need to add mutex locks in my methods, because the cache will be used in parallel by different instances of HTTP handler functions. I have a test for it, but it detects no data races, because it uses the cache from a single goroutine:
func TestStringCache(t *testing.T) {
	testDuration := time.Millisecond * 10
	cache := NewStringCache(testDuration / 2)
	cache.Put("here", "this")

	// Value put in cache should be in cache
	res, ok := cache.Get("here")
	assert.Equal(t, res, "this")
	assert.True(t, ok)

	// Values put in cache will eventually expire
	time.Sleep(testDuration)
	res, ok = cache.Get("here")
	assert.Equal(t, res, "")
	assert.False(t, ok)
}
So, my question is: how do I rewrite this test so that it detects a data race (if one is present) when run with go test -race?

First things first: the race detector in Go is not some sort of formal prover that uses static code analysis; rather, it is a dynamic tool that instruments the compiled code in a special way to try to detect data races at runtime.
What this means is that if the race detector is lucky and spots a data race, you can be sure there is a data race at the reported spot. But it also means that if the actual program flow did not trigger an existing data race, the race detector won't spot and report it.
In other words, the race detector does not produce false positives, but it is merely a best-effort tool.
So, in order to write race-free code you really have to rethink your approach.
It's best to start with this classic essay on the topic written by the author of the Go race detector, and once you have absorbed the fact that there are no benign data races, you basically train yourself to think about concurrently running threads of execution accessing your data each time you're architecting the data and the algorithms that manipulate it.
For instance, you know (at least you should know if you have read the docs) that each incoming request to an HTTP server implemented using net/http is handled by a separate goroutine.
This means that if you have a central (shared) data structure such as a cache which is to be accessed by the code which processes client requests, you do have multiple goroutines potentially accessing that shared data concurrently.
Now if you have another goroutine which updates that data, you do have a potential for a classic data race: while one goroutine is updating the data, another may read it.
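For illustration only, here is a minimal sketch of what a concurrency-safe version of the cache from the question could look like, guarding the map with a sync.RWMutex. The expiry check inside Get is an assumption about the intended semantics; cacheItem is the type shown in the question.
import (
	"sync"
	"time"
)

type safeStringCache struct {
	mu  sync.RWMutex
	m   map[string]cacheItem
	exp time.Duration
}

func (sc *safeStringCache) Get(key string) (string, bool) {
	sc.mu.RLock()
	defer sc.mu.RUnlock()
	item, ok := sc.m[key]
	if !ok || time.Now().After(item.expire_at) {
		return "", false
	}
	return item.data, true
}

func (sc *safeStringCache) Put(key, data string) {
	sc.mu.Lock()
	defer sc.mu.Unlock()
	sc.m[key] = cacheItem{
		data:      data,
		expire_at: time.Now().Add(sc.exp),
	}
}
A plain sync.Mutex would also work; RWMutex merely lets concurrent readers proceed in parallel, which suits a read-mostly cache.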
As to the question at hand, two things:
First, never ever use timers to test stuff like this. It does not work reliably (one timer-free alternative, injecting a clock, is sketched after the race report below).
Second, for such a simple case as yours, using merely two goroutines completely suffices:
package main

import (
	"testing"
	"time"
)

type cacheItem struct {
	data      string
	expire_at time.Time
}

type stringCache struct {
	m   map[string]cacheItem
	exp time.Duration
}

func (sc *stringCache) Get(key string) (string, bool) {
	if item, ok := sc.m[key]; !ok {
		return "", false
	} else {
		return item.data, true
	}
}

func (sc *stringCache) Put(key, data string) {
	sc.m[key] = cacheItem{
		data:      data,
		expire_at: time.Now().Add(sc.exp),
	}
}

func NewStringCache(d time.Duration) *stringCache {
	return &stringCache{
		m:   make(map[string]cacheItem),
		exp: d,
	}
}

func TestStringCache(t *testing.T) {
	cache := NewStringCache(time.Minute)

	ch := make(chan struct{})
	go func() {
		cache.Put("here", "this")
		close(ch)
	}()
	_, _ = cache.Get("here")
	<-ch
}
Save this as sc_test.go and then
tmp$ go test -race -c -o sc_test ./sc_test.go
tmp$ ./sc_test
==================
WARNING: DATA RACE
Write at 0x00c00009e270 by goroutine 8:
runtime.mapassign_faststr()
/home/kostix/devel/golang-1.13.6/src/runtime/map_faststr.go:202 +0x0
command-line-arguments.(*stringCache).Put()
/home/kostix/tmp/sc_test.go:27 +0x144
command-line-arguments.TestStringCache.func1()
/home/kostix/tmp/sc_test.go:46 +0x62
Previous read at 0x00c00009e270 by goroutine 7:
runtime.mapaccess2_faststr()
/home/kostix/devel/golang-1.13.6/src/runtime/map_faststr.go:107 +0x0
command-line-arguments.TestStringCache()
/home/kostix/tmp/sc_test.go:19 +0x125
testing.tRunner()
/home/kostix/devel/golang-1.13.6/src/testing/testing.go:909 +0x199
Goroutine 8 (running) created at:
command-line-arguments.TestStringCache()
/home/kostix/tmp/sc_test.go:45 +0xe4
testing.tRunner()
/home/kostix/devel/golang-1.13.6/src/testing/testing.go:909 +0x199
Goroutine 7 (running) created at:
testing.(*T).Run()
/home/kostix/devel/golang-1.13.6/src/testing/testing.go:960 +0x651
testing.runTests.func1()
/home/kostix/devel/golang-1.13.6/src/testing/testing.go:1202 +0xa6
testing.tRunner()
/home/kostix/devel/golang-1.13.6/src/testing/testing.go:909 +0x199
testing.runTests()
/home/kostix/devel/golang-1.13.6/src/testing/testing.go:1200 +0x521
testing.(*M).Run()
/home/kostix/devel/golang-1.13.6/src/testing/testing.go:1117 +0x2ff
main.main()
_testmain.go:44 +0x223
==================
--- FAIL: TestStringCache (0.00s)
testing.go:853: race detected during execution of test
FAIL
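As an aside on the first point above (not using timers in tests): expiry can be tested deterministically by injecting a clock instead of sleeping. The sketch below could live in the same sc_test.go file as above (it reuses cacheItem); the now field, the stringCacheWithClock type, and the test are assumptions, not part of the original code.
type stringCacheWithClock struct {
	m   map[string]cacheItem
	exp time.Duration
	now func() time.Time // injectable clock, so tests control "time"
}

func (sc *stringCacheWithClock) Get(key string) (string, bool) {
	item, ok := sc.m[key]
	if !ok || sc.now().After(item.expire_at) {
		return "", false
	}
	return item.data, true
}

func (sc *stringCacheWithClock) Put(key, data string) {
	sc.m[key] = cacheItem{data: data, expire_at: sc.now().Add(sc.exp)}
}

func TestExpiryWithoutSleeping(t *testing.T) {
	current := time.Now()
	sc := &stringCacheWithClock{
		m:   make(map[string]cacheItem),
		exp: time.Minute,
		now: func() time.Time { return current },
	}
	sc.Put("here", "this")
	if _, ok := sc.Get("here"); !ok {
		t.Fatal("expected value to be present before expiry")
	}
	current = current.Add(2 * time.Minute) // advance the fake clock past the expiry
	if _, ok := sc.Get("here"); ok {
		t.Fatal("expected value to be gone after expiry")
	}
}
This variant still needs the locking discussed above before it is used from multiple goroutines; the point here is only the clock injection.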

Related

Lock slice before reading and modifying it

My experience working with Go is recent, and while reviewing some code, I have seen that although it is write-protected, there is a problem with reading the data. Not with the reading itself, but with possible modifications that can occur between the reading of the slice and its modification.
type ConcurrentSlice struct {
	sync.RWMutex
	items []Item
}

type Item struct {
	Index int
	Value Info
}

type Info struct {
	Name    string
	Labels  map[string]string
	Failure bool
}
As mentioned, the writing is protected in this way:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
	found := false
	i := 0
	for inList := range cs.Iter() {
		if item.Name == inList.Value.Name {
			cs.items[i] = item
			found = true
		}
		i++
	}
	if !found {
		cs.Lock()
		defer cs.Unlock()
		cs.items = append(cs.items, item)
	}
}
func (cs *ConcurrentSlice) Iter() <-chan ConcurrentSliceItem {
	c := make(chan ConcurrentSliceItem)
	f := func() {
		cs.Lock()
		defer cs.Unlock()
		for index, value := range cs.items {
			c <- ConcurrentSliceItem{index, value}
		}
		close(c)
	}
	go f()
	return c
}
But between collecting the content of the slice and modifying it, modifications can occur. It may be that another routine modifies the same slice, and when it is time to assign a value, it no longer exists: slice[i] = item
What would be the right way to deal with this?
I have implemented this method:
func GetList() *ConcurrentSlice {
	if denylist == nil {
		denylist = NewConcurrentSlice()
		return denylist
	}
	return denylist
}
And I use it like this:
concurrentSlice := GetList()
concurrentSlice.UpdateOrAppend(item)
But I understand that between the get and the modification, even if it is practically immediate, another routine may have modified the slice. What would be the correct way to perform the two operations atomically, so that the slice I read is 100% the one I modify? Because if I try to assign an item to an index that no longer exists, it will break the execution.
Thank you in advance!
The way you are doing the locking is incorrect, because it does not ensure that the items you return have not been removed in the meantime. In the case of an update, the array would still be at least the same length.
A simpler solution that works could be the following:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
	cs.Lock()
	defer cs.Unlock()

	found := false
	for i, it := range cs.items {
		if item.Name == it.Name {
			cs.items[i] = item
			found = true
		}
	}
	if !found {
		cs.items = append(cs.items, item)
	}
}
Use a sync.Map if the order of the values is not important.
type Items struct {
	m sync.Map
}

func (items *Items) Update(item Info) {
	items.m.Store(item.Name, item)
}

func (items *Items) Range(f func(Info) bool) {
	items.m.Range(func(key, value any) bool {
		return f(value.(Info))
	})
}
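A hypothetical usage of the Items wrapper above might look like this:
items := &Items{}
items.Update(Info{Name: "users-service", Labels: map[string]string{"env": "prod"}})
items.Range(func(in Info) bool {
	fmt.Printf("%s failure=%v\n", in.Name, in.Failure)
	return true // return false to stop the iteration early
})
Note that sync.Map iterates in no particular order, which is why the answer says to use it only when ordering does not matter.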
Data structures 101: always pick the best data structure for your use case. If you're going to be looking up objects by name, that's EXACTLY what a map is for. If you still need to maintain the order of the items, you use a treemap.
Concurrency 101: like transactions, your mutex should be atomic, consistent, and isolated. You’re failing isolation here because the data structure read does not fall inside your mutex lock.
Your code should look something like this:
// Sketch only: type and field names here are placeholders.
func (s *Store) UpdateOrAppend(item Item) {
	s.mu.Lock()
	defer s.mu.Unlock()
	// Check the map (or treemap) for the name:
	// if it exists, update it; otherwise, add it.
	// For a plain Go map both cases are the same assignment.
	s.byName[item.Name] = item
}
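On the "keep the order" remark above: the Go standard library has no treemap, so one common sketch is to pair a map (for lookups by name) with a slice of keys (for ordering). Type and field names here are illustrative, not from the question:
type OrderedItems struct {
	mu     sync.Mutex
	byName map[string]Info
	order  []string // names in insertion order
}

func (o *OrderedItems) UpdateOrAppend(item Info) {
	o.mu.Lock()
	defer o.mu.Unlock()
	if _, exists := o.byName[item.Name]; !exists {
		o.order = append(o.order, item.Name) // new name: remember its position
	}
	o.byName[item.Name] = item // for a map, update and add are the same assignment
}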
After some tests, I can say that the situation you fear can indeed happen with sync.RWMutex. I think it could happen with sync.Mutex too, but I can't reproduce it. Maybe I'm missing some information, or maybe the calls happen in order because they are all blocked and the order in which they acquire the lock is somehow deterministic.
One way to keep your two calls safe without other routines getting into conflict would be to use another mutex, held for the duration of every task on that object. You would lock that mutex before your read and write, and release it when you're done. You would also have to use that mutex on any other call that writes to (or reads from) that object. You can find an implementation of what I'm talking about here in the main.go file. To reproduce the issue with RWMutex, you can simply comment out the startTask and endTask calls, and the issue becomes visible in the terminal output.
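A rough sketch of that idea, reusing the startTask/endTask names mentioned above (the linked main.go may differ in details):
type ConcurrentSlice struct {
	sync.RWMutex
	taskMu sync.Mutex // serializes whole read-then-write "tasks" on this object
	items  []Item
}

// startTask and endTask bracket a compound operation so that no other
// goroutine can interleave its own reads or writes in between.
func (cs *ConcurrentSlice) startTask() { cs.taskMu.Lock() }
func (cs *ConcurrentSlice) endTask()   { cs.taskMu.Unlock() }
A caller would then wrap the read and the subsequent write between startTask() and endTask(), while the embedded RWMutex keeps protecting the individual accesses as before.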
EDIT: my first answer was wrong, as I misinterpreted a test result and fell into the situation described by the OP.
tl;dr;
If ConcurrentSlice is to be used from a single goroutine, the locks are unnecessary, because the way the algorithm is written there cannot be any concurrent reads/writes to slice elements, or to the slice itself.
If ConcurrentSlice is to be used from multiple goroutines, the existing locks are not sufficient. This is because UpdateOrAppend may modify slice elements concurrently.
A safe version would need two versions of Iter:
This one can be called by users of ConcurrentSlice, but it cannot be called from UpdateOrAppend:
func (cs *ConcurrentSlice) Iter() <-chan ConcurrentSliceItem {
	c := make(chan ConcurrentSliceItem)
	f := func() {
		cs.RLock()
		defer cs.RUnlock()
		for index, value := range cs.items {
			c <- ConcurrentSliceItem{index, value}
		}
		close(c)
	}
	go f()
	return c
}
and this is only to be called from UpdateOrAppend:
func (cs *ConcurrentSlice) internalIter() <-chan ConcurrentSliceItem {
	c := make(chan ConcurrentSliceItem)
	f := func() {
		// No locking
		for index, value := range cs.items {
			c <- ConcurrentSliceItem{index, value}
		}
		close(c)
	}
	go f()
	return c
}
And UpdateOrAppend should be synchronized at the top level:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
	cs.Lock()
	defer cs.Unlock()
	....
}
Here's the long version:
This is an interesting piece of code. Based on my understanding of the Go memory model, the mutex lock in Iter() is only necessary if there is another goroutine working on this data, and even with that, there is a possible race in the code. However, UpdateOrAppend only modifies elements of the slice with lower indexes than what Iter is working on, so that race never manifests itself.
The race can happen as follows:
1. The for-loop in Iter reads element 0 of the slice.
2. The element is sent through the channel. Thus, the receive on the other end happens after the read in step 1.
3. The receiving end potentially updates element 0 of the slice. There is no problem up to here.
4. Then the sending goroutine reads element 1 of the slice. This is when a race can happen: if step 3 updated index 1 of the slice, the read at step 4 races with that write. You can see this if you start with i := 1 in UpdateOrAppend and run it with the -race flag.
But since i starts at 0, UpdateOrAppend only ever modifies slice elements that Iter has already sent, so this code is safe even without the lock.
If there will be other goroutines accessing and modifying the structure, you need the mutex, but you need it to protect the complete UpdateOrAppend method, because only one goroutine should be allowed to run it at a time. The mutex must protect the potential updates in the first for-loop, and it must also cover the slice append, because that may actually replace the underlying slice of the object.
If Iter is only called from UpdateOrAppend, then this single mutex should be sufficient. If however Iter can be called from multiple goroutines, then there is another race possibility. If one UpdateOrAppend is running concurrently with multiple Iter instances, then some of those Iter instances will read from the modified slice elements concurrently, causing a race. So, it should be such that multiple Iters can only run if there are no UpdateOrAppend calls. That is a RWMutex.
But Iter can be called from UpdateOrAppend while the lock is already held, so it cannot itself call RLock, otherwise it deadlocks.
Thus, you need two versions of Iter: one that can be called outside UpdateOrAppend, and that issues RLock in the goroutine, and another that can only be called from UpdateOrAppend and does not call RLock.

Thread within struct, function arguments too large for new goroutine

I created this simple app to demonstrate the issue I was having.
package main

import (
	"fmt"
	"sync"
	"unsafe"
)

type loc_t struct {
	count   [9999]int64
	Counter int64
}

func (l loc_t) rampUp(wg *sync.WaitGroup) {
	defer wg.Done()
	l.Counter += 1
}

func main() {
	wg := new(sync.WaitGroup)

	loc := loc_t{}
	fmt.Println(unsafe.Sizeof(loc))

	wg.Add(1)
	go loc.rampUp(wg)
	wg.Wait()

	fmt.Println(loc.Counter)
}
If I run the above I will get a fatal error: newproc: function arguments too large for new goroutine
runtime stack:
runtime: unexpected return pc for runtime.systemstack called from 0x0
Now the reason for that is the 2k initial stack size when go is used to spawn a background task. What's interesting is that I'm only passing a pointer to the called function. This issue happened to me in production, with a different struct obviously; everything was working for a year, and then all of a sudden it started throwing this error.
Method receivers are passed to method calls, just like any other parameter. So if the method has a non-pointer receiver, the whole struct in your case will be copied. The easiest solution would be to use a pointer receiver, if you can.
If you must use a non-pointer receiver, then you can circumvent this by not launching the method call as the goroutine but another function, possibly a function literal:
go func() {
	loc.rampUp(wg)
}()
If the loc variable may be modified concurrently (before the launched goroutine would get scheduled and copy it for the rampUp() method), you can create a copy of it manually and use that in the goroutine, like this:
loc2 := loc
wg.Add(1)
go func() {
	loc2.rampUp(wg)
}()
These solutions work because launching the new goroutine this way does not require a big initial stack, so the initial stack limit will not get in the way. And the stack size is dynamic, so after the launch it will grow as needed. Details can be read here: Does Go have an "infinite call stack" equivalent?
The issue with the stack size is, obviously, the size of the struct itself. So as your struct grows organically, you may, as I did, cross that 2k limit on the size of the call's arguments.
The above problem can be fixed by using a pointer to the struct in the function declaration.
func (l *loc_t) rampUp(wg *sync.WaitGroup) {
	defer wg.Done()
	l.Counter += 1
}
This passes a pointer to the struct, so all that goes onto the stack is the pointer instead of an entire copy of the struct.
Obviously this can have other implications, including race conditions if you're making the call from several goroutines at once. But as a fix for an ever-growing struct that suddenly starts causing this error, it works.
Anyway, hope this is helpful to someone else out there.

Is there a resource leak here?

func First(query string, replicas ...Search) Result {
	c := make(chan Result)
	searchReplica := func(i int) {
		c <- replicas[i](query)
	}
	for i := range replicas {
		go searchReplica(i)
	}
	return <-c
}
This function is from Rob Pike's 2012 slides on Go concurrency patterns. I think there is a resource leak in this function. As the function returns after the first send/receive pair happens on channel c, the other goroutines remain trying to send on channel c. So there is a resource leak here. Can anyone who knows Go well confirm this? And how can I detect this leak, and with what kind of Go tooling?
Yes, you are right (for reference, here's the link to the slide). In the above code only one launched goroutine will terminate, the rest will hang on attempting to send on channel c.
Detailing:
c is an unbuffered channel
there is only a single receive operation, in the return statement
A new goroutine is launched for each element of replicas
each launched goroutine sends a value on channel c
since there is only 1 receive from it, one goroutine will be able to send a value on it, the rest will block forever
Note that depending on the number of elements of replicas (which is len(replicas)):
if it's 0: First() would block forever (no one sends anything on c)
if it's 1: would work as expected
if it's > 1: then it leaks resources
The following modified version will not leak goroutines, by using a non-blocking send (with the help of select with default branch):
searchReplica := func(i int) {
select {
case c <- replicas[i](query):
default:
}
}
The first goroutine ready with the result will send it on channel c which will be received by the goroutine running First(), in the return statement. All other goroutines when they have the result will attempt to send on the channel, and "seeing" that it's not ready (send would block because nobody is ready to receive from it), the default branch will be chosen, and thus the goroutine will end normally.
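Assembled into the full function, the non-blocking-send fix looks like this (a sketch; only the send inside searchReplica changes):
func First(query string, replicas ...Search) Result {
	c := make(chan Result)
	searchReplica := func(i int) {
		select {
		case c <- replicas[i](query): // delivered to the waiting receiver
		default: // nobody is receiving any more: drop the result and let the goroutine exit
		}
	}
	for i := range replicas {
		go searchReplica(i)
	}
	return <-c
}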
Another way to fix it would be to use a buffered channel:
c := make(chan Result, len(replicas))
And this way the send operations would not block. And of course only one (the first sent) value will be received from the channel and returned.
Note that the solution with any of the above fixes would still block if len(replicas) is 0. To avoid that, First() should check this explicitly, e.g.:
func First(query string, replicas ...Search) Result {
	if len(replicas) == 0 {
		return Result{}
	}
	// ...rest of the code...
}
Some tools / resources to detect leaks:
https://github.com/fortytw2/leaktest
https://github.com/zimmski/go-leak
https://medium.com/golangspec/goroutine-leak-400063aef468
https://blog.minio.io/debugging-go-routine-leaks-a1220142d32c
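As an example of the first tool in the list, a leak test for First might look roughly like this. This is a sketch: the fakeSearch replica is hypothetical, and the Search type is assumed to be func(query string) Result as in the slides.
import (
	"testing"
	"time"

	"github.com/fortytw2/leaktest"
)

func TestFirstDoesNotLeakGoroutines(t *testing.T) {
	defer leaktest.Check(t)() // fails the test if goroutines started here are still running at the end

	// Hypothetical stand-in replica; the tiny sleep gives First time to reach its receive.
	fakeSearch := Search(func(query string) Result {
		time.Sleep(time.Millisecond)
		return Result{}
	})

	_ = First("golang", fakeSearch, fakeSearch, fakeSearch)
}
Against the original version of First, the blocked sender goroutines are reported; with either of the fixes above, the test should pass.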

Is `make(chan _, _)` atomic?

Is it thread-safe to modify the channel that a consumer is reading from?
Consider the following code:
package main

import "fmt"

func main() {
	channel := make(chan int, 3)
	channel_ptr := &channel

	go supplier(channel_ptr)
	go consumer(channel_ptr)

	temp := *channel_ptr
	// Important bit
	*channel_ptr = make(chan int, 5)

	more := true
	for more {
		select {
		case msg := <-temp:
			*channel_ptr <- msg
		default:
			more = false
		}
	}

	// Block main indefinitely to keep the children alive
	<-make(chan bool)
}

func consumer(c *chan int) {
	for true {
		fmt.Println(<-(*c))
	}
}

func supplier(c *chan int) {
	for i := 0; i < 5; i++ {
		(*c) <- i
	}
}
If channels and make work the way that I want them to, I should get the following properties:
The program always outputs 0 1 2 3 4
The program will never panic from trying to read from a non-initialized channel (IE, the part I labelled Important bit is atomic)
From several test runs, this seems to be true, but I can't find it anywhere in the documentation and I'm worried about subtle race conditions.
Update
Yeah, what I was doing doesn't work. This thread is probably buried at this point, but does anybody know how to dynamically resize a buffered channel?
It's not thread safe.
If you run it with the -race flag to use the race detector, you'll see the bug:
$ go run -race t.go
==================
WARNING: DATA RACE
Write at 0x00c420086018 by main goroutine:
main.main()
/Users/kjk/src/go/src/github.com/kjk/go-cookbook/start-mysql-in-docker-go/t.go:14 +0x128
Previous read at 0x00c420086018 by goroutine 6:
main.supplier()
/Users/kjk/src/go/src/github.com/kjk/go-cookbook/start-mysql-in-docker-go/t.go:37 +0x51
Goroutine 6 (running) created at:
main.main()
/Users/kjk/src/go/src/github.com/kjk/go-cookbook/start-mysql-in-docker-go/t.go:9 +0xb4
0
==================
1
2
3
==================
WARNING: DATA RACE
Read at 0x00c420086018 by goroutine 6:
main.supplier()
/Users/kjk/src/go/src/github.com/kjk/go-cookbook/start-mysql-in-docker-go/t.go:37 +0x51
Previous write at 0x00c420086018 by main goroutine:
main.main()
/Users/kjk/src/go/src/github.com/kjk/go-cookbook/start-mysql-in-docker-go/t.go:14 +0x128
Goroutine 6 (running) created at:
main.main()
/Users/kjk/src/go/src/github.com/kjk/go-cookbook/start-mysql-in-docker-go/t.go:9 +0xb4
==================
4
As a rule of thumb, you should never pass a channel as a pointer. A channel is already a pointer internally.
Stepping back a bit: I don't understand what you're trying to achieve.
I guess there's a reason you're trying to pass a channel as a pointer. The pattern for using channels in Go is: you create a channel once and you pass it around as a value. You don't pass a pointer to it, and you never modify it after creation.
In your example the problem is that you have a shared piece of memory (the memory address pointed to by channel_ptr), and you write to that memory in one thread while some other thread reads it. That's a data race.
It's not specific to a channel, you would have the same issue if it was pointer to an int and two threads were modifying the value of an int.
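For comparison, here is a sketch of the idiomatic version of the program: the channel is created once and passed around by value, so there is no shared pointer to race on (the dynamic-resizing part of the question is dropped here).
package main

import "fmt"

func main() {
	ch := make(chan int, 3)
	go supplier(ch)
	consumer(ch) // run the consumer in main so the program exits when it is done
}

func supplier(c chan int) {
	for i := 0; i < 5; i++ {
		c <- i
	}
	close(c) // tell the consumer that no more values are coming
}

func consumer(c chan int) {
	for v := range c { // ends when the channel is closed and drained
		fmt.Println(v)
	}
}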

How to use channels to safely synchronise data in Go

Below is an example of how to use a mutex lock in order to safely access data. How would I go about doing the same with CSP (communicating sequential processes) instead of using mutex locks and unlocks?
type Stack struct {
	top  *Element
	size int
	sync.Mutex
}

func (ss *Stack) Len() int {
	ss.Lock()
	size := ss.size
	ss.Unlock()
	return size
}

func (ss *Stack) Push(value interface{}) {
	ss.Lock()
	ss.top = &Element{value, ss.top}
	ss.size++
	ss.Unlock()
}

func (ss *Stack) Pop() (value interface{}) {
	ss.Lock()
	size := ss.size
	ss.Unlock()
	if size > 0 {
		ss.Lock()
		value, ss.top = ss.top.value, ss.top.next
		ss.size--
		ss.Unlock()
		return
	}
	return nil
}
If you actually were to look at how Go implements channels, you'd essentially see a mutex around an array with some additional thread handling to block execution until the value is passed through. A channel's job is to move data from one spot in memory to another with ease. Therefore, where you have locks and unlocks, you'd have something like this example:
func example() {
	resChan := make(chan int)
	go func() {
		resChan <- 1
	}()
	go func() {
		res := <-resChan
		_ = res // use the received value
	}()
}
So in the example, the first goroutine blocks on the send until the second goroutine reads the value from the channel.
To do this in Go with sync primitives instead, one would use a sync.WaitGroup: add one to the group before setting the value, mark it done once the value is set, and have the second goroutine wait on the group before reading the value.
The oddities in your example are that there are no goroutines, so it's all happening in a single main goroutine, and that the locks are being used in the traditional way (as in C threads), so channels won't really accomplish anything here. The example you have would be considered an anti-pattern; as the Go proverb says, "Don't communicate by sharing memory, share memory by communicating."
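For completeness, here is one hedged sketch of the "share memory by communicating" style applied to a stack like the one in the question: a single goroutine owns the data, and Push/Pop/Len talk to it over channels. The channel fields and the loop method are illustrative, not taken from any particular library.
package stack

type Element struct {
	value interface{}
	next  *Element
}

type Stack struct {
	pushCh chan interface{}
	popCh  chan chan interface{}
	lenCh  chan chan int
}

func NewStack() *Stack {
	s := &Stack{
		pushCh: make(chan interface{}),
		popCh:  make(chan chan interface{}),
		lenCh:  make(chan chan int),
	}
	go s.loop() // the only goroutine that ever touches top and size
	return s
}

func (s *Stack) loop() {
	var top *Element
	size := 0
	for {
		select {
		case v := <-s.pushCh:
			top = &Element{v, top}
			size++
		case reply := <-s.popCh:
			if top == nil {
				reply <- nil // empty stack
				continue
			}
			v := top.value
			top, size = top.next, size-1
			reply <- v
		case reply := <-s.lenCh:
			reply <- size
		}
	}
}

func (s *Stack) Push(value interface{}) { s.pushCh <- value }

func (s *Stack) Pop() interface{} {
	reply := make(chan interface{})
	s.popCh <- reply
	return <-reply
}

func (s *Stack) Len() int {
	reply := make(chan int)
	s.lenCh <- reply
	return <-reply
}
Whether this is actually simpler than the mutex version is debatable, which is part of the point the answer makes: for a small data structure accessed in one place, plain locks are often the more natural fit.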
