Why do we need to copy the value of u here? - go

I'm a beginner in Go and following an online course where this bit of code was used as an example:
func ConcurrentMutex(url string, fetcher Fetcher, f *fetchState) {
var done sync.WaitGroup
for _, u := range urls {
done.Add(1)
u2 := u
go func() {
defer done.Done()
ConcurrentMutex(u2, fetcher, f)
}()
//go func(u string) {
// defer done.Done()
// ConcurrentMutex(u, fetcher, f)
//}(u)
}
done.Wait()
return
}
The type of u is a string, and the way I see it, we should be able to pass u to the ConcurrentMutex call in the inner function without having to copy its value to u2. However, the professor maintained that the Go memory model meant that as we are continuously changing the value of u, it can affect the different calls to ConcurrentMutex.
I'm still not entirely sure why. Shouldn't the value of u at the point the function is called be passed to the function? I would have understood if we were passing a pointer, but as we're not, it's confusing me.
Can someone please explain how Go's memory model interprets this block?
Note: The commented out inner function was the original one used in the example in the lecture video but was changed in the lecture note. It looks to me like both are equivalent, so I guess the question applies to both.

What you are using is closure. u2 is shared between the inner function and the function surrounding it. Which means with every iteration as u2 is modified, modified value is what is visible to inner function. Better way of writing is using the code that has been commented out. By explicitly passing a value to go routine, you ensure that go routine carries it's own copy that will not be modified by surrounding function.
What go specification says about this: Go specification
Function literals A function literal represents an anonymous function.
FunctionLit = "func" Signature FunctionBody . func(a, b int, z
float64) bool { return a*b < int(z) } A function literal can be
assigned to a variable or invoked directly.
f := func(x, y int) int { return x + y } func(ch chan int) { ch <- ACK
}(replyChan) Function literals are closures: they may refer to
variables defined in a surrounding function. Those variables are then
shared between the surrounding function and the function literal, and
they survive as long as they are accessible.
Hope this answers your question.

Related

Lock slice before reading and modifying it

My experience working with Go is recent and in reviewing some code, I have seen that while it is write-protected, there is a problem with reading the data. Not with the reading itself, but with possible modifications that can occur between the reading and the modification of the slice.
type ConcurrentSlice struct {
sync.RWMutex
items []Item
}
type Item struct {
Index int
Value Info
}
type Info struct {
Name string
Labels map[string]string
Failure bool
}
As mentioned, the writing is protected in this way:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
found := false
i := 0
for inList := range cs.Iter() {
if item.Name == inList.Value.Name{
cs.items[i] = item
found = true
}
i++
}
if !found {
cs.Lock()
defer cs.Unlock()
cs.items = append(cs.items, item)
}
}
func (cs *ConcurrentSlice) Iter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
cs.Lock()
defer cs.Unlock()
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
But between collecting the content of the slice and modifying it, modifications can occur.It may be that another routine modifies the same slice and when it is time to assign a value, it no longer exists: slice[i] = item
What would be the right way to deal with this?
I have implemented this method:
func GetList() *ConcurrentSlice {
if list == nil {
denylist = NewConcurrentSlice()
return denylist
}
return denylist
}
And I use it like this:
concurrentSlice := GetList()
concurrentSlice.UpdateOrAppend(item)
But I understand that between the get and the modification, even if it is practically immediate, another routine may have modified the slice. What would be the correct way to perform the two operations atomically? That the slice I read is 100% the one I modify. Because if I try to assign an item to a index that no longer exists, it will break the execution.
Thank you in advance!
The way you are doing the blocking is incorrect, because it does not ensure that the items you return have not been removed. In case of an update, the array would still be at least the same length.
A simpler solution that works could be the following:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
found := false
i := 0
cs.Lock()
defer cs.Unlock()
for _, it := range cs.items {
if item.Name == it.Name{
cs.items[i] = it
found = true
}
i++
}
if !found {
cs.items = append(cs.items, item)
}
}
Use a sync.Map if the order of the values is not important.
type Items struct {
m sync.Map
}
func (items *Items) Update(item Info) {
items.m.Store(item.Name, item)
}
func (items *Items) Range(f func(Info) bool) {
items.m.Range(func(key, value any) bool {
return f(value.(Info))
})
}
Data structures 101: always pick the best data structure for your use case. If you’re going to be looking up objects by name, that’s EXACTLY what map is for. If you still need to maintain the order of the items, you use a treemap
Concurrency 101: like transactions, your mutex should be atomic, consistent, and isolated. You’re failing isolation here because the data structure read does not fall inside your mutex lock.
Your code should look something like this:
func {
mutex.lock
defer mutex.unlock
check map or treemap for name
if exists update
else add
}
After some tests, I can say that the situation you fear can indeed happen with sync.RWMutex. I think it could happen with sync.Mutex too, but I can't reproduce that. Maybe I'm missing some informations, or maybe the calls are in order because they all are blocked and the order they redeem the right to lock is ordered in some way.
One way to keep your two calls safe without other routines getting in 'conflict' would be to use an other mutex, for every task on that object. You would lock that mutex before your read and write, and release it when you're done. You would also have to use that mutex on any other call that write (or read) to that object. You can find an implementation of what I'm talking about here in the main.go file. In order to reproduce the issue with RWMutex, you can simply comment the startTask and the endTask calls and the issue is visible in the terminal output.
EDIT : my first answer was wrong as I misinterpreted a test result, and fell in the situation described by OP.
tl;dr;
If ConcurrentSlice is to be used from a single goroutine, the locks are unnecessary, because the way algorithm written there is not going to be any concurrent read/writes to slice elements, or the slice.
If ConcurrentSlice is to be used from multiple goroutines, existings locks are not sufficient. This is because UpdateOrAppend may modify slice elements concurrently.
A safe version woule need two versions of Iter:
This can be called by users of ConcurrentSlice, but it cannot be called from `UpdateOrAppend:
func (cs *ConcurrentSlice) Iter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
cs.RLock()
defer cs.RUnlock()
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
and this is only to be called from UpdateOrAppend:
func (cs *ConcurrentSlice) internalIter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
// No locking
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
And UpdateOrAppend should be synchronized at the top level:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
cs.Lock()
defer cs.Unlock()
....
}
Here's the long version:
This is an interesting piece of code. Based on my understanding of the go memory model, the mutex lock in Iter() is only necessary if there is another goroutine working on this code, and even with that, there is a possible race in the code. However, UpdateOrAppend only modifies elements of the slice with lower indexes than what Iter is working on, so that race never manifests itself.
The race can happen as follows:
The for-loop in iter reads element 0 of the slice
The element is sent through the channel. Thus, the slice receive happens after the first step.
The receiving end potentially updates element 0 of the slice. There is no problem up to here.
Then the sending goroutine reads element 1 of the slice. This is when a race can happen. If step 3 updated index 1 of the slice, the read at step 4 is a race. That is: if step 3 reads the update done by step 4, it is a race. You can see this if you start with i:=1 in UpdateOrAppend, and running it with the -race flag.
But UpdateOrAppend always modifies slice elements that are already seen by Iter when i=0, so this code is safe, even without the lock.
If there will be other goroutines accessing and modifying the structure, you need the Mutex, but you need it to protect the complete UpdateOrAppend method, because only one goroutine should be allowed to run that. You need the mutex to protect the potential updates in the first for-loop, and that mutex has to also include the slice append case, because that may actually modify the slice of the underlying object.
If Iter is only called from UpdateOrAppend, then this single mutex should be sufficient. If however Iter can be called from multiple goroutines, then there is another race possibility. If one UpdateOrAppend is running concurrently with multiple Iter instances, then some of those Iter instances will read from the modified slice elements concurrently, causing a race. So, it should be such that multiple Iters can only run if there are no UpdateOrAppend calls. That is a RWMutex.
But Iter can be called from UpdateOrAppend with a lock, so it cannot really call RLock, otherwise it is a deadlock.
Thus, you need two versions of Iter: one that can be called outside UpdateOrAppend, and that issues RLock in the goroutine, and another that can only be called from UpdateOrAppend and does not call RLock.

GO - Pointer or Variable in WaitGroups reference

According the following function declarations from sync package:
Add -------> func (wg *WaitGroup) Add(delta int)
Done ------> func (wg *WaitGroup) Done()
Wait ------> func (wg *WaitGroup) Wait()
I understand that all 3 of them are called by a pointer to a WaitGroup, right?
If this is correct, I don't understand in the next pice of code, why Done function is called using a pointer variable, but Add and Wait functions are called using a variable (not a pointer):
package main
import (
"fmt"
"sync"
"time"
)
func worker(id int, wg *sync.WaitGroup) {
defer wg.Done()
fmt.Printf("Worker %d starting\n", id)
time.Sleep(time.Second)
fmt.Printf("Worker %d done\n", id)
}
func main() {
var wg sync.WaitGroup
for i := 1; i <= 5; i++ {
wg.Add(1)
go worker(i, &wg)
}
wg.Wait()
}
Thanks a lot for your help.
Done, Add and Wait are called on pointer. All functions refer to a pointer receiver *WaitGroup. The fact that you declare variable as value of WaitGroup doesn't mean much as all those methods will all access and modify the variable. The only problem happens when you want to pass your variable to worker - if you try to pass it as value you will make a copy and then Done will be referring to different pointer than Add and Wait - that's why you pass it's address with &.
I think here is best explanation I so far seen on the topic: https://github.com/golang/go/wiki/MethodSets#variables
In general, when you have a variable of a type, you can pretty much call whatever you want on it. When you combine the two rules above together, the following is valid:
type List []int
func (l List) Len() int { return len(l) }
func (l *List) Append(val int) { *l = append(*l, val) }
func main() {
// A bare value
var lst List
lst.Append(1)
fmt.Printf("%v (len: %d)\n", lst, lst.Len())
// A pointer value
plst := new(List)
plst.Append(2)
fmt.Printf("%v (len: %d)\n", plst, plst.Len())
}
Note that both pointer and value methods can both be called on both pointer and non-pointer values. To understand why, let's examine the method sets of both types, directly from the spec:
List
- Len() int
*List
- Len() int
- Append(int)
Notice that the method set for List does not actually contain Append(int) even though you can see from the above program that you can call the method without a problem. This is a result of the second spec section above. It implicitly translates the first line below into the second:
lst.Append(1)
(&lst).Append(1)
Now that the value before the dot is a *List, its method set includes Append, and the call is legal.
To make it easier to remember these rules, it may be helpful to simply consider the pointer- and value-receiver methods separately from the method set. It is legal to call a pointer-valued method on anything that is already a pointer or whose address can be taken (as is the case in the above example). It is legal to call a value method on anything which is a value or whose value can be dereferenced (as is the case with any pointer; this case is specified explicitly in the spec).

How to understand this behavior of goroutine?

package main
import (
"fmt"
"time"
)
type field struct {
name string
}
func (p *field) print() {
fmt.Println(p.name)
}
func main() {
data := []field{ {"one"},{"two"},{"three"} }
for _,v := range data {
go v.print()
}
<-time.After(1 * time.Second)
}
why does this code print 3 "three" instead of "one" "two" "three" in any order?
There is a data race.
The code implicitly takes address of variable v when evaluating arguments to the goroutine function. Note that the call v.print() is shorthand for the call (&v).print().
The loop changes the value of variable v.
When goroutines execute, it so happens that v has the last value of the loop. That's not guaranteed. It could execute as you expected.
It's helpful and easy to run programs with the race detector. This data race is detected and reported by the detector.
One fix is to create another variable scoped to the inside of the loop:
for _, v := range data {
v := v // short variable declaration of new variable `v`.
go v.print()
}
With this change, the address of the inner variable v is taken when evaluating the arguments to the goroutine. There is a unique inner variable v for each iteration of the loop.
Yet another way to fix the problem is use a slice of pointers:
data := []*field{ {"one"},{"two"},{"three"} } // note '*'
for _, v := range data {
go v.print()
}
With this change, the individual pointers in the slice are passed to the goroutine, not the address of the range variable v.
Another fix is to use the address of the slice element:
data := []field{ {"one"},{"two"},{"three"} } // note '*'
for i:= range data {
v := &data[i]
go v.print()
}
Because pointer values are typically used with types having a pointer receiver, this subtle issue does not come up often in practice. Because field has a pointer receiver, it would be typical to use []*field instead of []field for the type of data in the question.
If the goroutine function is in an anonymous function, then a common approach for avoiding the issue is to pass the range variables as an argument to the anonymous function:
for _, v := range data {
go func(v field) {
v.print() // take address of argument v, not range variable v.
}(v)
}
Because the code in the question does not already use an anonymous function for the goroutine, the first approach used in this answer is simpler.
As stated above there’s a race condition it’s result depends on delays on different processes and not well defined and predictable.
For example if you add time.Sleep(1*time.Seconds) you likely to get a correct result. Because usually goroutine prints faster than 1second and will have correct variable v but it’s a very bad way.
Golang has a special race detector tool which helps to find such situations. I recommend read about it while reading testing. Definitely it’s worth it.
There’s another way - explicitly pass variable value at goroutine start:
for _, v := range data {
go func(iv field) {
iv.print()
}(v)
}
Here v will be copied to iv (“internal v”) on every iteration and each goroutine will use correct value.

Golang : interface to swap two numbers

I want to swap two numbers using interface but the interface concept is so confusing to me.
http://play.golang.org/p/qhwyxMRj-c
This is the code and playground. How do I use interface and swap two input numbers? Do I need to define two structures?
type num struct {
value interface{}
}
type numbers struct {
b *num
c *num
}
func (a *num) SwapNum(var1, var2 interface{}) {
var a num
temp := var1
var1 = var2
var2 = temp
}
func main() {
a := 1
b := 2
c := 3.5
d := 5.5
SwapNum(a, b)
fmt.Println(a, b) // 2 1
SwapNum(c, d)
fmt.Println(c, d) // 5.5 3.5
}
First of all, the interface{} type is simply a type which accepts all values as it is an interface with an empty method set and every type can satisfy that. int for example does not have any methods, neither does interface{}.
For a method which swaps the values of two variables you first need to make sure these variables are actually modifiable. Values passed to a function are always copied (except reference types like slices and maps but that is not our concern at the moment). You can achieve modifiable parameter by using a pointer to the variable.
So with that knowledge you can go on and define SwapNum like this:
func SwapNum(a interface{}, b interface{})
Now SwapNum is a function that accepts two parameters of any type.
You can't write
func SwapNum(a *interface{}, b *interface{})
as this would only accept parameters of type *interface{} and not just any type.
(Try it for yourself here).
So we have a signature, the only thing left is swapping the values.
func SwapNum(a interface{}, b interface{}) {
*a, *b = *b, *a
}
No, this will not work that way. By using interface{} we must do runtime type assertions to check whether we're doing the right thing or not. So the code must be expanded using the reflect package. This article might get you started if you don't know about reflection.
Basically we will need this function:
func SwapNum(a interface{}, b interface{}) {
ra := reflect.ValueOf(a).Elem()
rb := reflect.ValueOf(b).Elem()
tmp := ra.Interface()
ra.Set(rb)
rb.Set(reflect.ValueOf(tmp))
}
This code makes a reflection of a and b using reflect.ValueOf() so that we can
inspect it. In the same line we're assuming that we've got pointer values and dereference
them by calling .Elem() on them.
This basically translates to ra := *a and rb := *b.
After that, we're making a copy of *a by requesting the value using .Interface()
and assigning it (effectively making a copy).
Finally, we set the value of a to b with [ra.Set(rb)]5, which translates to *a = *b
and then assigning b to a, which we stored in the temp. variable before. For this,
we need to convert tmp back to a reflection of itself so that rb.Set() can be used
(it takes a reflect.Value as parameter).
Can we do better?
Yes! We can make the code more type safe, or better, make the definition of Swap type safe
by using reflect.MakeFunc. In the doc (follow the link) is an example which is very
like what you're trying. Essentially you can fill a function prototype with content
by using reflection. As you supplied the prototype (the signature) of the function the
compiler can check the types, which it can't when the value is reduced to interface{}.
Example usage:
var intSwap func(*int, *int)
a,b := 1, 0
makeSwap(&intSwap)
intSwap(&a, &b)
// a is now 0, b is now 1
The code behind this:
swap := func(in []reflect.Value) []reflect.Value {
ra := in[0].Elem()
rb := in[1].Elem()
tmp := ra.Interface()
ra.Set(rb)
rb.Set(reflect.ValueOf(tmp))
return nil
}
makeSwap := func(fptr interface{}) {
fn := reflect.ValueOf(fptr).Elem()
v := reflect.MakeFunc(fn.Type(), swap)
fn.Set(v)
}
The code of swap is basically the same as that of SwapNum. makeSwap is the same
as the one used in the docs where it is explained pretty well.
Disclaimer: The code above makes a lot of assumptions about what is given and
what the values look like. Normally you need to check, for example, that the given
values to SwapNum actually are pointer values and so forth. I left that out for
reasons of clarity.

Compare function values in Go

Normal use of function variables in Go allows them to be compared only to nil, not to one another. The reason for this (as it's been explained to me) is that, since Go has closures, the definition of equality is fuzzy. If I have two different closures with different values bound to local variables, but which use the same underlying function, should they be considered equal or unequal?
However, I do want to be able to make such a comparison. In particular, I have code like this (except, in my real code, the check is actually necessary - this is just a dummy example), where I compare a function pointer to a function literal:
func getFunc(which bool) (func ()) {
if which {
return func1
} else {
return func2
}
}
func func1() { }
func func2() { }
f := getFunc(true)
if f == func1 {
fmt.Println("func1")
} else {
fmt.Println("func2")
}
Is there any way, for example using the reflect or unsafe packages, to get this to work?
You could compare the functions by name:
f := getFunc(true)
f1 := runtime.FuncForPC(reflect.ValueOf(f).Pointer()).Name()
f2 := runtime.FuncForPC(reflect.ValueOf(func1).Pointer()).Name()
if f1 == f2 {
fmt.Println("func1")
} else {
fmt.Println("func2")
}
But this relies on both the reflect and runtime packages. Probably not a good idea to do this.
Do you really need to compare functions? I would consider an alternative if possible.
Extending upon #Luke's answer, it appears that I can directly test pointer equality. Note that this is really iffy. To quote the reflect.Value.Pointer() documentation:
If v's Kind is Func, the returned pointer is an underlying code
pointer, but not necessarily enough to identify a single function
uniquely. The only guarantee is that the result is zero if and only if
v is a nil func Value.
That said, here's what you can do:
f := getFunc(true)
f1 := reflect.ValueOf(f).Pointer()
f2 := reflect.ValueOf(func1).Pointer()
eq := f1 == f2
Note that I did run a battery of tests (which I had used to regression-test the code that resulted from #Luke's answer) against this new version, and they all passed, which leads me to believe that the warning issued in the reflect documentation may be OK to ignore, but then, ignoring documentation is really never a good idea...
If all the functions you want to compare have the same signature, you could do something like this:
type cmpFunc struct {
f func()
id uint64
}
func (c *cmpFunc) call() { c.f() }
func (c *cmpFunc) equals(other *cmpFunc) { return c.id == other.id }
makeComparable(f func()) *cmpFunc {
return &cmpFunc{f, get_uniq_id()}
}
Where get_uniq_id does what it says on the box. This gets a bit uglier because Go doesn't have () overloading, and it's more or less impossible without generics if you want to do this for functions in general. But this should work pretty well for your purposes.

Resources