Parallel table-driven testing in Go fails miserably

I have the following test function:
func TestIntegrationAppsWithProductionSelf(t *testing.T) {
	// here is where the apps slice that will act as my test suite is being populated
	apps, err := RetrieveApps(fs)
	if err != nil {
		t.Fatal(err)
	}
	var expectedOutput string
	for _, v := range apps {
		v := v
		t.Run("", func(t *testing.T) {
			t.Parallel()
			expectedOutput = "=" + v + "\n"
			cmpOpts.SingleApp = v
			t.Logf("\t\tTesting %s\n", v)
			buf, err := VarsCmp(output, cmpOpts)
			if err != nil {
				t.Fatalf("ERROR executing var comparison for %s: %s\n", v, err)
			}
			assert.Equal(t, expectedOutput, buf.String())
		})
	}
}
The test fails, even though it succeeds when I remove t.Parallel() (even keeping the sub-testing structure).
The failure (which, as said, happens only when t.Parallel() is present) is that the values passed to assert are out of sync, i.e. the assert method compares values that it shouldn't.
Why is that?
I also perform this cryptic re-assignment of the test suite variable (v := v), which I do not understand.
Edit: wondering whether it was the usage of the assert method from that package, I made the following substitution; nonetheless the end result is the same:
//assert.Equal(t, expectedOutput, buf.String())
if expectedOutput != buf.String() {
	t.Errorf("Failed! Expected %s - Actual: %s\n", expectedOutput, buf.String())
}

Let's dissect the case.
First, let's refer to the docs on testing.T.Run:
Run runs f as a subtest of t called name.
It runs f in a separate goroutine <…>
(Emphasis mine.)
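So, when you call t.Run("some_name", someFn), that someFn is run by the test framework as if you had manually done something like
go someFn(t)
Strictly speaking, t.Run then blocks until someFn returns or calls t.Parallel. Once a subtest calls t.Parallel, t.Run returns and the subtest is paused until the parent test function returns, after which all the parallel subtests run concurrently. This is why your test passes when you remove t.Parallel(): each subtest then runs to completion before the loop moves on.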
Next, let's notice that you do not pass a named function into your call to t.Run, but rather you pass it a so-called function literal; let's cite the language spec on them:
Function literals are closures: they may refer to variables defined in a surrounding function. Those variables are then shared between the surrounding function and the function literal, and they survive as long as they are accessible.
In your case, it means when the compiler compiles the body of your function literal, it makes the function "close over" any variable its body mentions, and which is not one of the formal function parameters; in your case, the only function parameter is t *testing.T, hence every other accessed variable is captured by the created closure.
In Go, when a function literal closes over a variable, it does so by retaining a reference to that variable, which the spec states explicitly: «Those variables are then shared between the surrounding function and the function literal <…>» (again, emphasis mine).
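Now notice that, up to Go 1.21, loops in Go reuse iteration variables on each iteration; that is, when you write
for _, v := range apps {
that variable v is created once in the "outer" scope of the loop and then gets reassigned on each iteration of the loop. To recap: the same variable, whose storage is located at some fixed point in memory, gets assigned a new value on each iteration. (Since Go 1.22, a for loop that declares its variables with := creates a new variable on each iteration, so v := v is no longer needed in modules that declare go 1.22 or later; variables declared outside the loop are still shared, though.)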
Now, since a function literal closes over external variables by keeping references to them (as opposed to copying their values into itself at the "time" of its definition), without that funky-looking v := v "trick" each function literal created at each call to t.Run in your loop would reference exactly the same iteration variable v of the loop.
The v := v construct declares another variable named v which is local to the loop's body and at the same time assigns it the value of the loop iteration variable v. Since the local v "shadows" loop iterator's v, the function literal declared afterwards would close over that local variable, and hence each function literal created on each iteration will close over a distinct, separate variable v.
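A small, deterministic sketch of this sharing, to make it visible without goroutines or go test. To keep the outcome independent of the Go version, it captures a variable declared before the loop and assigned inside it (the same situation as expectedOutput in the question) rather than the loop variable itself; the closures are called only after the loop has finished. The function names are illustrative.

```go
package main

import "fmt"

// callAll invokes each closure in order and collects the results.
func callAll(fns []func() string) []string {
	out := make([]string, 0, len(fns))
	for _, f := range fns {
		out = append(out, f())
	}
	return out
}

// captureShared: every closure refers to the single variable shared, which is
// declared outside the loop, so all of them observe its final value.
func captureShared(apps []string) []string {
	var shared string
	var fns []func() string
	for _, a := range apps {
		shared = "=" + a
		fns = append(fns, func() string { return shared })
	}
	return callAll(fns)
}

// captureLocal: local is declared inside the loop body, so each iteration
// creates a new variable and each closure captures its own one. This is what
// v := v achieves for the loop variable.
func captureLocal(apps []string) []string {
	var fns []func() string
	for _, a := range apps {
		local := "=" + a
		fns = append(fns, func() string { return local })
	}
	return callAll(fns)
}

func main() {
	apps := []string{"one", "two", "three"}
	fmt.Println(captureShared(apps)) // [=three =three =three]
	fmt.Println(captureLocal(apps))  // [=one =two =three]
}
```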
Why is this needed, you may ask?
This is needed because of a subtle problem with the interaction of loop iteration variable and goroutines, which is detailed on the Go wiki:
When one does something like
for _, v := range apps {
	go func() {
		// use v
	}()
}
A function literal closing over v is created and then run with the go statement, in parallel both with the goroutine that runs the loop and with all the other goroutines started on the other len(apps)-1 iterations.
These goroutines running our function literals all refer to the same v, so they all have a data race over that variable: the goroutine running the loop writes to it, and the goroutines running the function literals read from it, concurrently and without any synchronization.
By now you should see the puzzle's pieces coming together: in the code
for _, v := range apps {
	v := v
	t.Run("", func(t *testing.T) {
		t.Parallel()
		expectedOutput = "=" + v + "\n"
		// ...
the function literal passed to t.Run closes over v, expectedOutput, cmpOpts (through cmpOpts.SingleApp) and possibly other variables such as output. t.Run() runs that function literal in a separate goroutine, as documented, and since the literal calls t.Parallel(), all these goroutines run concurrently. This produces the classic data race on expectedOutput and cmpOpts.SingleApp, and on whatever else is neither v (a fresh variable on each iteration) nor t (passed as the function literal's parameter).
You can run go test -race -run=TestIntegrationAppsWithProductionSelf ./... to have the race detector report the race and fail your test.

I am going to post what actually worked, but (unless the question is closed) I will accept the answer that actually elaborates on it.
The problem was that expectedOutput was declared inside the TestIntegrationAppsWithProductionSelf function but outside the for loop (this is now reflected in the code snippet of the initial question).
What worked was to remove the var expectedOutput string statement and, within the for loop, do:
for _, v := range apps {
	v := v
	expectedOutput := "=" + v + "\n"
	t.Run("", func(t *testing.T) {
		t.Parallel()
		cmpOpts.SingleApp = v
		t.Logf("\t\tTesting %s\n", v)
		buf, err := VarsCmp(output, cmpOpts)
		if err != nil {
			t.Fatalf("ERROR executing var comparison for %s: %s\n", v, err)
		}
		//assert.Equal(t, expectedOutput, buf.String())
		if expectedOutput != buf.String() {
			t.Errorf("Failed! Expected %s - Actual: %s\n", expectedOutput, buf.String())
		}
	})
}
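Even in this version, every parallel subtest still executes cmpOpts.SingleApp = v, so the subtests keep writing to the shared cmpOpts variable (the race on cmpOpts.SingleApp pointed out in the other answer). If cmpOpts is a plain struct value without pointer, slice or map fields (its declaration is not shown, so this is an assumption), one option is to give each iteration its own copy. The sketch below models the parallel subtests with goroutines and a sync.WaitGroup so that it runs outside go test; CmpOpts and compare are hypothetical stand-ins for the question's options type and VarsCmp.

```go
package main

import (
	"fmt"
	"sync"
)

// CmpOpts is a hypothetical stand-in for the type of cmpOpts; only the
// SingleApp field comes from the question. It has no pointer, slice or map
// fields, so copying it yields a fully independent value.
type CmpOpts struct {
	SingleApp string
}

// compare is a hypothetical stand-in for VarsCmp that renders "=app\n".
func compare(opts CmpOpts) string {
	return "=" + opts.SingleApp + "\n"
}

// checkAll runs one goroutine per app, the way parallel subtests would run.
// Everything a goroutine reads is either passed in (i) or declared inside the
// loop body (opts, expected), so no mutable state is shared between them.
func checkAll(base CmpOpts, apps []string) []string {
	failures := make([]string, len(apps)) // one slot per goroutine
	var wg sync.WaitGroup
	for i, v := range apps {
		opts := base // per-iteration copy instead of mutating a shared cmpOpts
		opts.SingleApp = v
		expected := "=" + v + "\n"
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if got := compare(opts); got != expected {
				failures[i] = fmt.Sprintf("expected %q, got %q", expected, got)
			}
		}(i)
	}
	wg.Wait()
	return failures
}

func main() {
	failures := checkAll(CmpOpts{}, []string{"app1", "app2", "app3"})
	n := 0
	for _, f := range failures {
		if f != "" {
			n++
		}
	}
	fmt.Println("failures:", n) // failures: 0
}
```

In the test itself, the equivalent change would be to add opts := cmpOpts and opts.SingleApp = v next to expectedOutput, before t.Run, and pass opts to VarsCmp inside the subtest.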

Related

Can I get a mixture of := and = in Go if statements? [duplicate]

This code:
package main
import (
	"fmt"
)
func main() {
	fmt.Println("Hello, playground")
	var a bool
	var b interface{}
	b = true
	if a, ok := b.(bool); !ok {
		fmt.Println("Problem!")
	}
}
Yields this error in the golang playground:
tmp/sandbox791966413/main.go:11:10: a declared and not used
tmp/sandbox791966413/main.go:14:21: a declared and not used
This is confusing because of what the Go spec says about short variable declarations:
Unlike regular variable declarations, a short variable declaration may redeclare variables provided they were originally declared earlier in the same block (or the parameter lists if the block is the function body) with the same type, and at least one of the non-blank variables is new. As a consequence, redeclaration can only appear in a multi-variable short declaration. Redeclaration does not introduce a new variable; it just assigns a new value to the original.
So, my questions:
Why can't I redeclare the variable in the code snippet above?
Supposing I really can't, what I'd really like to do is find a way to populate variables with the output of functions while checking the error in a more concise way. So is there any way to improve on the following form for getting a value out of an error-able function?
var A RealType
if newA, err := SomeFunc(); err != nil {
	return err
} else {
	A = newA
}
This is happening because you are using the short variable declaration in an if initialization statement, which introduces a new scope.
Rather than this, where a is a new variable that shadows the existing one:
if a, ok := b.(bool); !ok {
	fmt.Println("Problem!")
}
you could do this, where a is redeclared:
a, ok := b.(bool)
if !ok {
	fmt.Println("Problem!")
}
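This works because ok is a new variable, so you can redeclare a and the passage you quoted is in effect. (You still have to read a somewhere afterwards, though; otherwise the compiler keeps reporting it as declared and not used.)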
Similarly, your second code snippet could be written as:
A, err := SomeFunc()
if err != nil {
	return err
}
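A runnable sketch of both forms, with strconv.Atoi standing in for the question's SomeFunc (an assumption for illustration; the function names are also illustrative). The second form matters when the variable has to exist before the call: declaring err as well and using plain = assigns to the existing variable in the same scope instead of creating a shadowing one.

```go
package main

import (
	"fmt"
	"strconv"
)

// viaShortDecl declares the value and the error together with := at function
// scope, so the value is still in scope after the error check.
func viaShortDecl(s string) (int, error) {
	a, err := strconv.Atoi(s)
	if err != nil {
		return 0, err
	}
	return a, nil
}

// viaAssign covers the case where a must be declared before the call:
// declaring err too and using = assigns to the existing a instead of
// shadowing it.
func viaAssign(s string) (int, error) {
	var a int
	var err error
	a, err = strconv.Atoi(s)
	if err != nil {
		return 0, err
	}
	return a, nil
}

func main() {
	fmt.Println(viaShortDecl("42")) // 42 <nil>
	fmt.Println(viaAssign("x"))     // 0 strconv.Atoi: parsing "x": invalid syntax
}
```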
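You did declare a new variable, and that is the problem: the short variable declaration in the if statement lives in a new scope, so instead of redeclaring the outer a it declares a different variable that shadows it. This is known as "shadowing".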
package main
import (
	"fmt"
)
func main() {
	fmt.Println("Hello, playground")
	// original a.
	var a bool
	var b interface{}
	b = true
	// a new a is declared here; call it a'.
	if a, ok := b.(bool); !ok {
		// a' lasts until the end of this block
		fmt.Println("Problem!")
	}
	// a' is gone, so using a here refers to the original a.
}
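A runnable sketch that makes both variables observable, with b holding true as in the question; the function names are illustrative. In shadowed, the a declared in the if statement is a different variable from the outer a; in redeclared, a, ok := ... appears in the same block as var a, so it assigns to the existing a.

```go
package main

import "fmt"

// shadowed: a, ok := ... inside the if statement declares a new a scoped to
// the if, so the outer a keeps its zero value.
func shadowed(b interface{}) bool {
	var a bool
	if a, ok := b.(bool); ok {
		fmt.Println("inner a:", a) // this is a'
	}
	return a // the outer a, never assigned
}

// redeclared: in the same block as var a, a, ok := ... reuses the existing a
// (only ok is new), so the assertion result is stored in it.
func redeclared(b interface{}) bool {
	var a bool
	a, ok := b.(bool)
	if !ok {
		fmt.Println("Problem!")
	}
	return a
}

func main() {
	var b interface{} = true
	fmt.Println(shadowed(b))   // prints "inner a: true", then false
	fmt.Println(redeclared(b)) // true
}
```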
As for your second question. That code looks great. I would probably switch it to if err == nil (and swap the if and else blocks). But that is a style thing.
The error raised by the compiler says that both declarations of variable a are unused.
You're actually declaring a twice: the short variable declaration in the if statement creates a new scope, and the a declared there shadows the previous declaration of the variable.
The actual problem you're facing is that you never use the value of either variable, which in Go is a compile-time error.
Regarding your second question, I think the shortest way to get the value and check the error is something like this:
func main() {
	a, ok := someFunction()
	if !ok {
		fmt.Println("Problem!")
	}
}

Why does returning a value through panic recovery not work with local variables in Go?

This panic/recover code works with named return values:
package main
import (
	"errors"
	"fmt"
)
func main() {
	result, err := foo()
	fmt.Println("result:", result)
	if err != nil {
		fmt.Println("err:", err)
	}
}
func foo() (result int, err error) {
	defer func() {
		if e := recover(); e != nil {
			result = -1
			err = errors.New(e.(string))
		}
	}()
	bar()
	result = 100
	err = nil
	return
}
func bar() {
	panic("panic happened")
}
Output
result: -1
err: panic happened
But why doesn't this code with local variables work?
package main
import (
	"errors"
	"fmt"
)
func main() {
	result, err := foo()
	fmt.Println("result:", result)
	if err != nil {
		fmt.Println("err:", err)
	}
}
func foo() (int, error) {
	var result int
	var err error
	defer func() {
		if e := recover(); e != nil {
			result = -1
			err = errors.New(e.(string))
		}
	}()
	bar()
	result = 100
	err = nil
	return result, err
}
func bar() {
	panic("panic happened")
}
Output
result: 0
Can anyone explain the reason, or the basic concept behind it? The Go tour basics explain it as follows:
Named return values
Go's return values may be named. If so, they are treated as variables defined at the top of the function.
So it should be the same, right?
Note that this has nothing to do with panic/recover; it is a feature of the defer statement, as Spec: Defer statements puts it:
... if the deferred function is a function literal and the surrounding function has named result parameters that are in scope within the literal, the deferred function may access and modify the result parameters before they are returned. If the deferred function has any return values, they are discarded when the function completes.
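To see that this is a property of defer rather than of panic/recover, here is a minimal sketch with no panic at all (the function names are illustrative): a deferred closure changes a named result after return 21 has set it, while the same change to an ordinary local variable is lost, because the return statement has already copied its value into the unnamed result.

```go
package main

import "fmt"

// deferredDouble: return 21 assigns n = 21, then the deferred closure runs
// and doubles n before the caller sees the value.
func deferredDouble() (n int) {
	defer func() { n *= 2 }()
	return 21
}

// deferredLocal: return local copies 21 into the unnamed result first; the
// deferred closure then doubles only local, which is no longer returned.
func deferredLocal() int {
	local := 21
	defer func() { local *= 2 }()
	return local
}

func main() {
	fmt.Println(deferredDouble()) // 42
	fmt.Println(deferredLocal())  // 21
}
```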
Spec: Return statements details this:
There are three ways to return values from a function with a result type:
The return value or values may be explicitly listed in the "return" statement. Each expression must be single-valued and assignable to the corresponding element of the function's result type.
The expression list in the "return" statement may be a single call to a multi-valued function. The effect is as if each value returned from that function were assigned to a temporary variable with the type of the respective value, followed by a "return" statement listing these variables, at which point the rules of the previous case apply.
The expression list may be empty if the function's result type specifies names for its result parameters. The result parameters act as ordinary local variables and the function may assign values to them as necessary. The "return" statement returns the values of these variables.
So basically, if you use a return statement that explicitly lists the return values, those will be used, regardless of whether the result parameters are named or not.
If the result parameters are named, they act as ordinary local variables: you can read and write them. If the result parameters are named, you may use a "naked" return statement, without listing the values to return; if you do so, the actual return values will be the values of the (named) result parameters. The same applies if your function does not reach a return statement because it panics and recovers: once the deferred functions have run, the actual return values will be the values of the named result parameters (which the deferred functions can change, and so have a say in what is returned).
If you don't use named result parameters but declare local variables, they are not special in this way: when the function returns, they are not used "automatically" as the result values (as they would be if they were named result parameters rather than local variables). So if you change them in a deferred function, that has no effect on the actual values returned. In fact, if you don't use named result parameters and your function panics and recovers, you can't specify the return values; they will be the zero values of the result types. That's why you see result: 0 (0 is the zero value for int) and no error (because error is an interface type, the zero value of interface types is nil, and you don't print the error if it's nil).
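A condensed, runnable version of the two functions from the question, to check the explanation above. One deliberate change: it builds the error with fmt.Errorf("%v", e) instead of errors.New(e.(string)), so a non-string panic value would not cause a second panic inside the deferred function.

```go
package main

import "fmt"

func bar() { panic("panic happened") }

// withNamed: the deferred closure assigns the named results after recover,
// and those values are what the caller receives.
func withNamed() (result int, err error) {
	defer func() {
		if e := recover(); e != nil {
			result = -1
			err = fmt.Errorf("%v", e)
		}
	}()
	bar()
	return 100, nil
}

// withLocals: result and err are ordinary locals. The return statement is
// never reached because of the panic, so the caller gets the zero values of
// the result types, whatever the deferred closure assigns.
func withLocals() (int, error) {
	var result int
	var err error
	defer func() {
		if e := recover(); e != nil {
			result = -1
			err = fmt.Errorf("%v", e)
		}
	}()
	bar()
	return result, err
}

func main() {
	fmt.Println(withNamed())  // -1 panic happened
	fmt.Println(withLocals()) // 0 <nil>
}
```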
See related: How to return a value in a Go function that panics?
A brief summary of @icza's answer:
Named result variables provide the return values when the function terminates without an unrecovered panic (it returns normally, or it recovers from a panic), so you can change them in a deferred recover function, and their final values become the return values.
With local variables, the compiler cannot treat them as return values until a return statement lists them. A deferred recover function may change those locals, but because of the panic the return statement was never executed, so the locals were never used as return values, and the function returns the zero values of its result types.

Implicit memory aliasing in for loop

I'm using golangci-lint and I'm getting an error on the following code:
var versions []ObjectDescription
// ... (populate versions) ...
for i, v := range versions {
	res := createWorkerFor(&v)
	// ...
}
the error is:
G601: Implicit memory aliasing in for loop. (gosec)
res := createWorkerFor(&v)
^
What does "implicit memory aliasing in for loop" mean, exactly? I could not find any error description in the golangci-lint documentation. I don't understand this error.
The warning means, in short, that you are taking the address of a loop variable.
This happens because, up to Go 1.21, the iteration variables of a for statement are reused. At each iteration, the value of the next element in the range expression is assigned to the iteration variable; v doesn't change, only its value changes. Hence, the expression &v refers to the same location in memory on every iteration. (Since Go 1.22, each iteration has its own variable, so taking &v no longer refers to a variable shared across iterations.)
With Go 1.21 or earlier, the following code prints the same memory address four times:
for _, n := range []int{1, 2, 3, 4} {
	fmt.Printf("%p\n", &n)
}
When you store the address of the iteration variable, or when you use it in a closure inside the loop, by the time you dereference the pointer, its value might have changed. Static analysis tools will detect this and emit the warning you see.
Common ways to prevent the issue are:
index the ranged slice or array. This takes the address of the actual element at the i-th position, instead of the iteration variable (map elements are not addressable, so this does not work for maps)
for i := range versions {
	res := createWorkerFor(&versions[i])
	// ...
}
reassign the iteration variable inside the loop
for _, v := range versions {
	v := v
	res := createWorkerFor(&v) // this is now the address of the inner v
	// ...
}
with closures, pass the iteration variable as argument to the closure
for _, v := range versions {
	go func(arg ObjectDescription) {
		res := createWorkerFor(&arg) // safe: arg is this goroutine's own copy
		// ...
	}(v)
}
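A sketch contrasting the first two fixes, with ObjectDescription reduced to a hypothetical one-field struct (the real type is not shown) and the helper names made up for illustration. Both give a distinct pointer per element, but they are not interchangeable: &versions[i] aliases the slice element, so writes through it change versions, while &v after v := v points to a per-iteration copy. The output does not depend on the Go version, because neither function takes the address of the loop variable itself.

```go
package main

import "fmt"

// ObjectDescription is a hypothetical stand-in for the question's type.
type ObjectDescription struct {
	Name string
}

// pointersByIndex takes the address of each slice element, so every pointer
// refers to (and aliases) a distinct element of versions.
func pointersByIndex(versions []ObjectDescription) []*ObjectDescription {
	ptrs := make([]*ObjectDescription, 0, len(versions))
	for i := range versions {
		ptrs = append(ptrs, &versions[i])
	}
	return ptrs
}

// pointersToCopies takes the address of a per-iteration copy: the pointers
// are distinct, but writes through them do not reach versions.
func pointersToCopies(versions []ObjectDescription) []*ObjectDescription {
	ptrs := make([]*ObjectDescription, 0, len(versions))
	for _, v := range versions {
		v := v
		ptrs = append(ptrs, &v)
	}
	return ptrs
}

func main() {
	versions := []ObjectDescription{{"a"}, {"b"}, {"c"}}
	byIndex := pointersByIndex(versions)
	copies := pointersToCopies(versions)
	byIndex[0].Name = "A" // modifies versions[0]
	copies[1].Name = "B"  // modifies only the copy
	fmt.Println(versions) // [{A} {b} {c}]
	fmt.Println(copies[0].Name, copies[1].Name, copies[2].Name) // a B c
}
```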
If you only dereference the pointer within the same iteration and you know for sure that nothing leaks it, you might get away with ignoring this check. However, the job of the linter is precisely to report code patterns that could cause issues, so it's a good idea to fix it anyway.
Indexing will solve the problem:
for i := range versions {
	res := createWorkerFor(&versions[i])
	// ...
}

How to understand this behavior of goroutine?

package main
import (
	"fmt"
	"time"
)
type field struct {
	name string
}
func (p *field) print() {
	fmt.Println(p.name)
}
func main() {
	data := []field{{"one"}, {"two"}, {"three"}}
	for _, v := range data {
		go v.print()
	}
	<-time.After(1 * time.Second)
}
Why does this code print "three" three times instead of "one", "two" and "three" in some order?
There is a data race.
The code implicitly takes address of variable v when evaluating arguments to the goroutine function. Note that the call v.print() is shorthand for the call (&v).print().
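The loop changes the value of variable v. (That is the behavior up to Go 1.21; since Go 1.22, each iteration has its own v, so this particular race goes away.)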
When goroutines execute, it so happens that v has the last value of the loop. That's not guaranteed. It could execute as you expected.
It's helpful and easy to run programs with the race detector. This data race is detected and reported by the detector.
One fix is to create another variable scoped to the inside of the loop:
for _, v := range data {
	v := v // short variable declaration of new variable `v`.
	go v.print()
}
With this change, the address of the inner variable v is taken when evaluating the arguments to the goroutine. There is a unique inner variable v for each iteration of the loop.
Yet another way to fix the problem is to use a slice of pointers:
data := []*field{{"one"}, {"two"}, {"three"}} // note '*'
for _, v := range data {
	go v.print()
}
With this change, the individual pointers in the slice are passed to the goroutine, not the address of the range variable v.
Another fix is to use the address of the slice element:
data := []field{{"one"}, {"two"}, {"three"}}
for i := range data {
	v := &data[i]
	go v.print()
}
Because pointer values are typically used with types having a pointer receiver, this subtle issue does not come up often in practice. Because field has a pointer receiver, it would be typical to use []*field instead of []field for the type of data in the question.
If the goroutine function is in an anonymous function, then a common approach for avoiding the issue is to pass the range variables as an argument to the anonymous function:
for _, v := range data {
	go func(v field) {
		v.print() // take address of argument v, not range variable v.
	}(v)
}
Because the code in the question does not already use an anonymous function for the goroutine, the first approach used in this answer is simpler.
As stated above, there's a race condition: its result depends on the timing of the different goroutines and is neither well defined nor predictable.
For example, if you add time.Sleep(1 * time.Second) after go v.print() in the loop, you will likely get the expected result, because a goroutine usually prints in much less than a second and so reads v before the loop changes it. But that is a very bad way to fix it.
Go has a built-in race detector (go run -race, go test -race) that helps find such situations. I recommend reading about it along with the testing documentation; it's definitely worth it.
There's another way: explicitly pass the variable's value when starting the goroutine:
for _, v := range data {
	go func(iv field) {
		iv.print()
	}(v)
}
Here v will be copied to iv (“internal v”) on every iteration and each goroutine will use correct value.
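The question waits with time.After, which only gives the goroutines one second and does not guarantee they have finished. A sketch that waits with sync.WaitGroup instead and passes each goroutine what it needs as an argument (printAll and namesInOrder are illustrative names, not from the question):

```go
package main

import (
	"fmt"
	"sync"
)

type field struct {
	name string
}

func (p *field) print() {
	fmt.Println(p.name)
}

// printAll starts one goroutine per element and waits for all of them with a
// sync.WaitGroup instead of sleeping for a fixed time. Each goroutine gets the
// address of a distinct slice element as an argument, so no goroutine reads
// the loop variable.
func printAll(data []field) {
	var wg sync.WaitGroup
	for i := range data {
		wg.Add(1)
		go func(p *field) {
			defer wg.Done()
			p.print()
		}(&data[i])
	}
	wg.Wait()
}

// namesInOrder does the same fan-out but stores each name in its own slot of
// a result slice, which makes the outcome deterministic and easy to check.
func namesInOrder(data []field) []string {
	out := make([]string, len(data))
	var wg sync.WaitGroup
	for i := range data {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			out[i] = data[i].name // each goroutine writes only out[i]
		}(i)
	}
	wg.Wait()
	return out
}

func main() {
	data := []field{{"one"}, {"two"}, {"three"}}
	printAll(data)                  // one, two, three in some order
	fmt.Println(namesInOrder(data)) // [one two three]
}
```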

How to avoid "unused variable in a for loop" error

How to avoid "unused variable in a for loop" error with code like
ticker := time.NewTicker(time.Millisecond * 500)
go func() {
	for t := range ticker.C {
		fmt.Println("Tick at", t)
	}
}()
if I actually don't use the t variable?
You don't need to assign anything; just use for range, like this (on the Go Playground):
package main
import (
	"fmt"
	"time"
)
func main() {
	ticker := time.NewTicker(time.Millisecond * 500)
	go func() {
		for range ticker.C {
			fmt.Println("Tick")
		}
	}()
	time.Sleep(time.Second * 2)
}
Use the predeclared blank identifier _. It is used as a write-only value when you don't need the actual value of a variable; it's similar to writing a value to /dev/null in Unix.
for _ = range []int{1, 2} {
	fmt.Println("One more iteration")
}
The blank identifier can be assigned or declared with any value of any type, with the value discarded harmlessly. It's a bit like writing to the Unix /dev/null file: it represents a write-only value to be used as a place-holder where a variable is needed but the actual value is irrelevant.
Update
From the Go 1.4 release notes:
Up until Go 1.3, for-range loop had two forms
for i, v := range x {
	...
}
and
for i := range x {
	...
}
If one was not interested in the loop values, only the iteration itself, it was still necessary to mention a variable (probably the blank identifier, as in for _ = range x), because the form
for range x {
	...
}
was not syntactically permitted.
This situation seemed awkward, so as of Go 1.4 the variable-free form is now legal. The pattern arises rarely but the code can be cleaner when it does.

Resources