I am running a parallel test using go test. It's a scenario-based test and test runs based on provided JSON data.
Sample JSON data looks like below
[
{
ID: 1,
dependency: []
...
},
{
ID: 2
dependency: [1]
...
},
{
ID: 3
dependency: [2]
...
},
{
ID: 4
dependency: [1,2]
...
}
]
Here ID is identifier of a testcase and dependency is referring to the testcases which should run before a testcase.
Circular dependency checker is implemented and it's making sure that no circular dependency is on JSON.
Test code look like below.
t.Run(file, func(t *testing.T) {
t.Parallel()
var tcs []TestCase
bytes := ReadFile(file, t)
json.Unmarshal(bytes, &tcs)
CheckCircularDependencies(tcs, t) // check circular dependency graph
for idx, tc := range tcs {
queues = append(queues, queitem{
idx: idx
stepID: step.ID,
status: NotStarted,
....
})
}
for idx, tc := range tcs {
UpdateWorkQueStatus(tc, InProgress, t)
ProcessQueItem(tc, t)
}
})
func ProcessQueItem(tc TestCase, t *testing.T) {
WaitForDependencies(tc, t) // this waits for all the dependency test cases to be finished
...
// DO TEST code which takes about 2-3s
// Provide a command to microservices
// Wait for 1-2s for the command to be processed
// If not processed within 1-2s then wait again for 1-2s
// Once command is processed, just check the result and finish test case
UpdateWorkQueStatus(tc, Done, t)
})
}
func WaitForDependencies(tc TestCase, t *testing.T) {
if !GoodToGoForTestCase(tc, t) {
SleepForAWhile()
WaitForDependencies(tc, t)
}
}
func SleepForAWhile() {
time.Sleep(100 * time.Millisecond)
}
Parallel test Behavior:
This is working as expected on my personal computer which has multiple processors.
But when it is working on Google Cloud (maybe 1 cpu), it's stuck and does not go ahead at all for 10mins.
Current solution:
Running tests without parallelism and doing by the order of IDs. (just removed t.Parallel() function call to avoid parallelism)
Problem:
This serial solution is taking more than 3 times than parallel solution, since most of testing time is based on sleeping time.
The test code itself also contains sleep function call.
My question
1. Is this architecture correct and can work correctly on 1 processor CPU?
2. If this architecture can work, what will be the problem of stuck?
Related
I have several tests using testify/suite package and I execute them in parallel as follows
type IntegrationSuite struct {
suite.Suite
}
func TestIntegrationSuite(t *testing.T) {
suite.Run(t, &IntegrationSuite{})
}
func (is *IntegrationSuite) TestSomething() {
is.T().Log("\tIntegration Testing something")
for i := range myTestTable {
i := i
is.T().Run("Testing "+myTestTable[i].scenarioName, func(_ *testing.T) {
is.T().Parallel()
...
func (is *IntegrationSuite) TestSomethingElse() {
is.T().Log("\tIntegration Testing something else")
for i := range myOtherTestTable {
i := i
is.T().Run("Testing "+myOtherTestTable[i].scenarioName, func(_ *testing.T) {
is.T().Parallel()
...
})
However this panics with
panic: testing: t.Parallel called multiple times [recovered]
panic: testing: t.Parallel called multiple times
How can one leverage parallelism with the specific package?
You're calling t.Parallel() on the wrong instance of testing.T.
You're spinning up multiple subtests using:
is.T().Run("Testing "+myOtherTestTable[i].scenarioName, func(_ *testing.T) {...}
But instead of marking the subtest as parallel, you're marking the outer parent test as parallel for every subtest you start:
is.T().Parallel() // marks outer test as parallel
Instead, bind the subtest func's testing.T param to a variable and call Parallel on the subtest:
is.T().Run("Testing "+myOtherTestTable[i].scenarioName, func(t *testing.T) {
t.Parallel() // this is the subtest's T instance
// test code
})
This marks all subtests as parallel within the parent test. If you also want to mark the parent test as parallel with other tests in the package, call is.T().Parallel() once at the beginning of TestSomethingElse (not within the subtest)
For more details on parallelism of subtests, see: Control of Parallelism
Please note that the testify suite is currently not "thread safe" and therefore does not support concurrency. There are cases in which tests pass successfully even though they should fail.
Please find the current status as well the reported issues here: https://github.com/stretchr/testify/issues/187
Is it possible to call a test Function from a non test go file to start the execution of tests?
e.g., I have a test function:
package API
import "testing"
func TestAPI(t *testing.T) {
...
}
I need to call this from a non test go file.
package main
import "../API"
API.TestAPI()
Can I do that?
There are some great testing patterns in Go. Unfortunately, the way that you're trying to test is not going to work.
Your unit tests should be run using the command go test <pkg. Tests should never be called from within your code itself.
Generally there are two main forms of unit testing in Go - black and white box unit testing.
White-box testing: Testing unexported functions from within the
package itself.
Black-box testing: Testing exported functions from outside the
package, emulating how other packages will interact with it.
If I have a package, example, that I'd like to test. There is a simple exported function that sums a list of numbers that are provided. There is also an unexported utility function that's used by the Sum function.
example.go
package example
func Sum(nums ...int) int {
sum := 0
for _, num := range nums {
sum = add(sum, num)
}
return sum
}
func add(a, b int) int {
return a + b
}
example_test.go : black-box testing
Notice that I'm testing the example package, but the test sits in the example_test package. Adding the _test keeps the Go compiler happy and lets it know that this is a testing package for example. Here, we may only access exported variables and functions which lets us test the behaviour that external packages will experience when importing and using our example package.
package example_test
import (
"testing"
"example"
)
func TestSum(t *testing.T) {
tests := []struct {
nums []int
sum int
}{
{nums: []int{1, 2, 3}, sum: 6},
{nums: []int{2, 3, 4}, sum: 9},
}
for _, test := range tests {
s := example.Sum(test.nums...)
if s != test.sum {
t.FailNow()
}
}
}
example_internal_test.go : white-box testing
Notice that I'm testing the example package, and the test also sits in the example package. This allows the unit tests to access unexported variables and functions.
package example
import "testing"
func TestAdd(t *testing.T) {
tests := []struct {
a int
b int
sum int
}{
{a: 1, b: 2, sum: 3},
{a: 3, b: 4, sum: 7},
}
for _, test := range tests {
s := add(test.a, test.b)
if s != test.sum {
t.FailNow()
}
}
}
I hope that this you a better understanding on how Go's testing framework is set up and should be used. I haven't used any external libraries in the examples for simplicity sake, although a very popular and powerful package to help with unit testing is github.com/stretchr/testify.
What you are attempting is possible, but not advised, as it depends on internal details of Go's testing library, which are not covered by the Go 1 Compatibly Guarantee, meaning you'll likely have to tweak this approach for every version of Go.
You should rarely, if ever, attempt this. I did this once, for a suite of API integration tests that I wanted to be able to run as normal tests, but also from a CLI tool, to test third-party implementations of the API. Now that I've done it, I'm not sure I'd do it again. I'd probably look for a different approach.
But if you can't be dissuaded, here's what you need to do:
Your tests must not be defined in a file named *_test.go. Such files are excluded from normal compilation.
You must trigger the tests to run from a file appropriately named *_test.go, for your normal test case.
You must manually orchestrate the tests in your non-test case.
In detail:
Put tests in non-test files. The testing package is just a standard package, so it's easy to include wherever you need it:
foo.go
package foo
import "testing"
func SomethingTest(t *testing.T) {
// your tests
}
Run the tests from a *_test.go file:
foo_test.go
func TestSomething(t *testing.T) {
SomethingTest(t)
}
And run the tests from your non-test files. Here you are in risky territory, as it requires using methods that are not protected by the Go 1 compatibility guarantee. You need to call the testing.MainStart method, with a manually-crafted list of tests.
This also requires implementing your own version of testing.testDeps, which is private, but an interface, so easy to implement. You can see my Go 1.9 implementation here.
tests := []testing.InternalTest{
{
Name: "TestSomething",
f: SomethingTest,
},
}
m := testing.MainStart(MyTestDeps, tests, nil, nil)
os.Exit(m.Run())
This is a very simplified example, not meant to be complete. But I hope it puts you on the right path.
When testing a database procedure invoked from an API, when it runs sequentially, it seems to run consistently within ~3s. However we've noticed that when several requests come in at the same time, this can take much longer, causing time outs. I am trying to reproduce the "several requests at one time" case as a go test.
I tried the -parallel 10 go test flag, but the timings were the same at ~28s.
Is there something wrong with my benchmark function?
func Benchmark_RealCreate(b *testing.B) {
b.ResetTimer()
for n := 0; n < b.N; n++ {
name := randomdata.SillyName()
r := gofight.New()
u := []unit{unit{MefeUnitID: name, MefeCreatorUserID: "user", BzfeCreatorUserID: 55, ClassificationID: 2, UnitName: name, UnitDescriptionDetails: "Up on the hills and testing"}}
uJSON, _ := json.Marshal(u)
r.POST("/create").
SetBody(string(uJSON)).
Run(h.BasicEngine(), func(r gofight.HTTPResponse, rq gofight.HTTPRequest) {
assert.Contains(b, r.Body.String(), name)
assert.Equal(b, http.StatusOK, r.Code)
})
}
}
Else how I can achieve what I am after?
The -parallel flag is not for running the same test or benchmark parallel, in multiple instances.
Quoting from Command go: Testing flags:
-parallel n
Allow parallel execution of test functions that call t.Parallel.
The value of this flag is the maximum number of tests to run
simultaneously; by default, it is set to the value of GOMAXPROCS.
Note that -parallel only applies within a single test binary.
The 'go test' command may run tests for different packages
in parallel as well, according to the setting of the -p flag
(see 'go help build').
So basically if your tests allow, you can use -parallel to run multiple distinct testing or benchmark functions parallel, but not the same one in multiple instances.
In general, running multiple benchmark functions parallel defeats the purpose of benchmarking a function, because running it parallel in multiple instances usually distorts the benchmarking.
However, in your case code efficiency is not what you want to measure, you want to measure an external service. So go's built-in testing and benchmarking facilities are not really suitable.
Of course we could still use the convenience of having this "benchmark" run automatically when our other tests and benchmarks run, but you should not force this into the conventional benchmarking framework.
First thing that comes to mind is to use a for loop to launch n goroutines which all attempt to call the testable service. One problem with this is that this only ensures n concurrent goroutines at the start, because as the calls start to complete, there will be less and less concurrency for the remaining ones.
To overcome this and truly test n concurrent calls, you should have a worker pool with n workers, and continously feed jobs to this worker pool, making sure there will be n concurrent service calls at all times. For a worker pool implementation, see Is this an idiomatic worker thread pool in Go?
So all in all, fire up a worker pool with n workers, have a goroutine send jobs to it for an arbitrary time (e.g. for 30 seconds or 1 minute), and measure (count) the completed jobs. The benchmark result will be a simple division.
Also note that for solely testing purposes, a worker pool might not even be needed. You can just use a loop to launch n goroutines, but make sure each started goroutine keeps calling the service and not return after a single call.
I'm new to go, but why don't you try to make a function and run it using the standard parallel test?
func Benchmark_YourFunc(b *testing.B) {
b.RunParralel(func(pb *testing.PB) {
for pb.Next() {
YourFunc(staff ...T)
}
})
}
Your example code mixes several things. Why are you using assert there? This is not a test it is a benchmark. If the assert methods are slow, your benchmark will be.
You also moved the parallel execution out of your code into the test command. You should try to make a parallel request by using concurrency. Here just a possibility how to start:
func executeRoutines(routines int) {
wg := &sync.WaitGroup{}
wg.Add(routines)
starter := make(chan struct{})
for i := 0; i < routines; i++ {
go func() {
<-starter
// your request here
wg.Done()
}()
}
close(starter)
wg.Wait()
}
https://play.golang.org/p/ZFjUodniDHr
We start some goroutines here, which are waiting until starter is closed. So you can set your request direct after that line. That the function waits until all the requests are done we are using a WaitGroup.
BUT IMPORTANT: Go just supports concurrency. So if your system has not 10 cores the 10 goroutines will not run parallel. So ensure that you have enough cores availiable.
With this start you can play a little bit. You could start to call this function inside your benchmark. You could also play around with the numbers of goroutines.
As the documentation indicates, the parallel flag is to allow multiple different tests to be run in parallel. You generally do not want to run benchmarks in parallel because that would run different benchmarks at the same time, throwing off the results for all of them. If you want to benchmark parallel traffic, you need to write parallel traffic generation into your test. You need to decide how this should work with b.N which is your work factor; I would probably use it as the total request count, and write a benchmark or multiple benchmarks testing different concurrent load levels, e.g.:
func Benchmark_RealCreate(b *testing.B) {
concurrencyLevels := []int{5, 10, 20, 50}
for _, clients := range concurrencyLevels {
b.Run(fmt.Sprintf("%d_clients", clients), func(b *testing.B) {
sem := make(chan struct{}, clients)
wg := sync.WaitGroup{}
for n := 0; n < b.N; n++ {
wg.Add(1)
go func() {
name := randomdata.SillyName()
r := gofight.New()
u := []unit{unit{MefeUnitID: name, MefeCreatorUserID: "user", BzfeCreatorUserID: 55, ClassificationID: 2, UnitName: name, UnitDescriptionDetails: "Up on the hills and testing"}}
uJSON, _ := json.Marshal(u)
sem <- struct{}{}
r.POST("/create").
SetBody(string(uJSON)).
Run(h.BasicEngine(), func(r gofight.HTTPResponse, rq gofight.HTTPRequest) {})
<-sem
wg.Done()
}()
}
wg.Wait()
})
}
}
Note here I removed the initial ResetTimer; the timer doesn't start until you benchmark function is called, so calling it as the first op in your function is pointless. It's intended for cases where you have time-consuming setup prior to the benchmark loop that you don't want included in the benchmark results. I've also removed the assertions, because this is a benchmark, not a test; assertions are for validity checking in tests and only serve to throw off timing results in benchmarks.
One thing is benchmarking (measuring time code takes to run) another one is load/stress testing.
The -parallel flag as stated above, is to allow a set of tests to execute in parallel, allowing the test set to execute faster, not to execute some test N times in parallel.
But is simple to achieve what you want (execution of same test N times). Bellow a very simple (really quick and dirty) example just to clarify/demonstrate the important points, that gets this very specific situation done:
You define a test and mark it to be executed in parallel => TestAverage with a call to t.Parallel
You then define another test and use RunParallel to execute the number of instances of the test (TestAverage) you want.
The class to test:
package math
import (
"fmt"
"time"
)
func Average(xs []float64) float64 {
total := float64(0)
for _, x := range xs {
total += x
}
fmt.Printf("Current Unix Time: %v\n", time.Now().Unix())
time.Sleep(10 * time.Second)
fmt.Printf("Current Unix Time: %v\n", time.Now().Unix())
return total / float64(len(xs))
}
The testing funcs:
package math
import "testing"
func TestAverage(t *testing.T) {
t.Parallel()
var v float64
v = Average([]float64{1,2})
if v != 1.5 {
t.Error("Expected 1.5, got ", v)
}
}
func TestTeardownParallel(t *testing.T) {
// This Run will not return until the parallel tests finish.
t.Run("group", func(t *testing.T) {
t.Run("Test1", TestAverage)
t.Run("Test2", TestAverage)
t.Run("Test3", TestAverage)
})
// <tear-down code>
}
Then just do a go test and you should see:
X:\>go test
Current Unix Time: 1556717363
Current Unix Time: 1556717363
Current Unix Time: 1556717363
And 10 secs after that
...
Current Unix Time: 1556717373
Current Unix Time: 1556717373
Current Unix Time: 1556717373
Current Unix Time: 1556717373
Current Unix Time: 1556717383
PASS
ok _/X_/y 20.259s
The two extra lines, in the end are because TestAverage is executed also.
The interesting point here: if you remove t.Parallel() from TestAverage, it will all be execute sequencially:
X:> go test
Current Unix Time: 1556717564
Current Unix Time: 1556717574
Current Unix Time: 1556717574
Current Unix Time: 1556717584
Current Unix Time: 1556717584
Current Unix Time: 1556717594
Current Unix Time: 1556717594
Current Unix Time: 1556717604
PASS
ok _/X_/y 40.270s
This can of course be made more complex and extensible...
I have Apache Beam code implementation on Go SDK as described below. The pipeline has 3 steps. One is textio.Read, other one is CountLines and the last step is ProcessLines. ProcessLines step takes around 10 seconds time. I just added a Sleep function for the sake of brevity.
I am calling the pipeline with 20 workers. When I run the pipeline, my expectation was 20 workers would run in parallel and textio.Read read 20 lines from the file and ProcessLines would do 20 parallel executions in 10 seconds. However, the pipeline did not work like that. It's currently working in a way that textio.Read reads one line from the file, pushes the data to the next step and waits until ProcessLines step completes its 10 seconds work. There is no parallelism and there is only one line string from the file throughout the pipeline. Could you please clarify me what I'm doing wrong for parallelism? How should I update the code to achieve parallelism as described above?
package main
import (
"context"
"flag"
"time"
"github.com/apache/beam/sdks/go/pkg/beam"
"github.com/apache/beam/sdks/go/pkg/beam/io/textio"
"github.com/apache/beam/sdks/go/pkg/beam/log"
"github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
)
// metrics to be monitored
var (
input = flag.String("input", "", "Input file (required).")
numberOfLines = beam.NewCounter("extract", "numberOfLines")
lineLen = beam.NewDistribution("extract", "lineLenDistro")
)
func countLines(ctx context.Context, line string) string {
lineLen.Update(ctx, int64(len(line)))
numberOfLines.Inc(ctx, 1)
return line
}
func processLines(ctx context.Context, line string) {
time.Sleep(10 * time.Second)
}
func CountLines(s beam.Scope, lines beam.PCollection) beam.PCollection
{
s = s.Scope("Count Lines")
return beam.ParDo(s, countLines, lines)
}
func ProcessLines(s beam.Scope, lines beam.PCollection) {
s = s.Scope("Process Lines")
beam.ParDo0(s, processLines, lines)
}
func main() {
// If beamx or Go flags are used, flags must be parsed first.
flag.Parse()
// beam.Init() is an initialization hook that must be called on startup. On
// distributed runners, it is used to intercept control.
beam.Init()
// Input validation is done as usual. Note that it must be after Init().
if *input == "" {
log.Fatal(context.Background(), "No input file provided")
}
p := beam.NewPipeline()
s := p.Root()
l := textio.Read(s, *input)
lines := CountLines(s, l)
ProcessLines(s, lines)
// Concept #1: The beamx.Run convenience wrapper allows a number of
// pre-defined runners to be used via the --runner flag.
if err := beamx.Run(context.Background(), p); err != nil {
log.Fatalf(context.Background(), "Failed to execute job: %v", err.Error())
}
}
EDIT:
After I got the answer about the problem might be caused by fusion, I changed the related part of the code but it did not work again.
Now the first and second step is working in parallel, however the third step ProcessLines is not working in parallel. I only made the following changes. Can anyone tell me what the problem is?
func AddRandomKey(s beam.Scope, col beam.PCollection) beam.PCollection {
return beam.ParDo(s, addRandomKeyFn, col)
}
func addRandomKeyFn(elm beam.T) (int, beam.T) {
return rand.Int(), elm
}
func countLines(ctx context.Context, _ int, lines func(*string) bool, emit func(string)) {
var line string
for lines(&line) {
lineLen.Update(ctx, int64(len(line)))
numberOfLines.Inc(ctx, 1)
emit(line)
}
}
func processLines(ctx context.Context, _ int, lines func(*string) bool) {
var line string
for lines(&line) {
time.Sleep(10 * time.Second)
numberOfLinesProcess.Inc(ctx, 1)
}
}
func CountLines(s beam.Scope, lines beam.PCollection) beam.PCollection {
s = s.Scope("Count Lines")
keyed := AddRandomKey(s, lines)
grouped := beam.GroupByKey(s, keyed)
return beam.ParDo(s, countLines, grouped)
}
func ProcessLines(s beam.Scope, lines beam.PCollection) {
s = s.Scope("Process Lines")
keyed := AddRandomKey(s, lines)
grouped := beam.GroupByKey(s, keyed)
beam.ParDo0(s, processLines, grouped)
}
Many advanced runners of MapReduce-type pipelines fuse stages that can be run in memory together. Apache Beam and Dataflow are no exception.
What's happening here is that the three steps of your pipeline are fused, and happening in the same machine. Furthermore, the Go SDK does not currently support splitting the Read transform, unfortunately.
To achieve parallelism in the third transform, you can break the fusion between Read and ProcessLines. You can do that adding a random key to your lines, and a GroupByKey transform.
In Python, it would be:
(p | beam.ReadFromText(...)
| CountLines()
| beam.Map(lambda x: (random.randint(0, 1000), x))
| beam.GroupByKey()
| beam.FlatMap(lambda k, v: v) # Discard the key, and return the values
| ProcessLines())
This would allow you to parallelize ProcessLines.
in the interests of learning more about Go, I have been playing with goroutines, and have noticed something - but am not sure what exactly I'm seeing, and hope someone out there might be able to explain the following behaviour.
the following code does exactly what you'd expect:
package main
import (
"fmt"
)
type Test struct {
me int
}
type Tests []Test
func (test *Test) show() {
fmt.Println(test.me)
}
func main() {
var tests Tests
for i := 0; i < 10; i++ {
test := Test{
me: i,
}
tests = append(tests, test)
}
for _, test := range tests {
test.show()
}
}
and prints 0 - 9, in order.
now, when the code is changed as shown below, it always returns with the last one first - doesn't matter which numbers I use:
package main
import (
"fmt"
"sync"
)
type Test struct {
me int
}
type Tests []Test
func (test *Test) show(wg *sync.WaitGroup) {
fmt.Println(test.me)
wg.Done()
}
func main() {
var tests Tests
for i := 0; i < 10; i++ {
test := Test{
me: i,
}
tests = append(tests, test)
}
var wg sync.WaitGroup
wg.Add(10)
for _, test := range tests {
go func(t Test) {
t.show(&wg)
}(test)
}
wg.Wait()
}
this will return:
9
0
1
2
3
4
5
6
7
8
the order of iteration of the loop isn't changing, so I guess that it is something to do with the goroutines...
basically, I am trying to understand why it behaves like this...I understand that goroutines can run in a different order than the order in which they're spawned, but, my question is why this always runs like this. as if there's something really obvious I'm missing...
As expected, the ouput is pseudo-random,
package main
import (
"fmt"
"runtime"
"sync"
)
type Test struct {
me int
}
type Tests []Test
func (test *Test) show(wg *sync.WaitGroup) {
fmt.Println(test.me)
wg.Done()
}
func main() {
fmt.Println("GOMAXPROCS", runtime.GOMAXPROCS(0))
var tests Tests
for i := 0; i < 10; i++ {
test := Test{
me: i,
}
tests = append(tests, test)
}
var wg sync.WaitGroup
wg.Add(10)
for _, test := range tests {
go func(t Test) {
t.show(&wg)
}(test)
}
wg.Wait()
}
Output:
$ go version
go version devel +af15bee Fri Jan 29 18:29:10 2016 +0000 linux/amd64
$ go run goroutine.go
GOMAXPROCS 4
9
4
5
6
7
8
1
2
3
0
$ go run goroutine.go
GOMAXPROCS 4
9
3
0
1
2
7
4
8
5
6
$ go run goroutine.go
GOMAXPROCS 4
1
9
6
8
4
3
0
5
7
2
$
Are you running in the Go playground? The Go playground, by design, is deterministic, which makes it easier to cache programs.
Or, are you running with runtime.GOMAXPROCS = 1? This runs one thing at a time, sequentially. This is what the Go playground does.
Go routines are scheduled randomly since Go 1.5. So, even if the order looks consistent, don't rely on it.
See Go 1.5 release note :
In Go 1.5, the order in which goroutines are scheduled has been changed. The properties of the scheduler were never defined by the language, but programs that depend on the scheduling order may be broken by this change. We have seen a few (erroneous) programs affected by this change. If you have programs that implicitly depend on the scheduling order, you will need to update them.
Another potentially breaking change is that the runtime now sets the default number of threads to run simultaneously, defined by GOMAXPROCS, to the number of cores available on the CPU. In prior releases the default was 1. Programs that do not expect to run with multiple cores may break inadvertently. They can be updated by removing the restriction or by setting GOMAXPROCS explicitly. For a more detailed discussion of this change, see the design document.