Goroutine performance

I have started to learn Go; it's fun and easy, but working with goroutines I have seen little performance benefit.
If I sequentially add 1 million numbers twice, in 2 functions:
package main

import (
    "fmt"
    "time"
)

var sumA int
var sumB int

func fSumA() {
    for i := 0; i < 1000000; i++ {
        sumA += i
    }
}

func fSumB() {
    for i := 0; i < 1000000; i++ {
        sumB += i
    }
}

func main() {
    start := time.Now()
    fSumA()
    fSumB()
    sum := sumA + sumB
    fmt.Println("Elapsed time", time.Since(start))
    fmt.Println("Sum", sum)
}
It takes 5 ms.
MacBook-Pro-de-Pedro:hello pedro$ ./bin/hello
Elapsed time 5.724406ms
Sum 999999000000
MacBook-Pro-de-Pedro:hello pedro$ ./bin/hello
Elapsed time 5.358165ms
Sum 999999000000
MacBook-Pro-de-Pedro:hello pedro$ ./bin/hello
Elapsed time 5.042528ms
Sum 999999000000
MacBook-Pro-de-Pedro:hello pedro$ ./bin/hello
Elapsed time 5.469628ms
Sum 999999000000
When I try to do the same thing with 2 goroutines:
package main

import (
    "fmt"
    "sync"
    "time"
)

var wg sync.WaitGroup
var sumA int
var sumB int

func fSumA() {
    for i := 0; i < 1000000; i++ {
        sumA += i
    }
    wg.Done()
}

func fSumB() {
    for i := 0; i < 1000000; i++ {
        sumB += i
    }
    wg.Done()
}

func main() {
    start := time.Now()
    wg.Add(2)
    go fSumA()
    go fSumB()
    wg.Wait()
    sum := sumA + sumB
    fmt.Println("Elapsed time", time.Since(start))
    fmt.Println("Sum", sum)
}
I get more or less the same result, about 5 ms. My computer is a MacBook Pro (Core 2 Duo), and I don't see any performance improvement. Could it be the processor?
MacBook-Pro-de-Pedro:hello pedro$ ./bin/hello
Elapsed time 5.258415ms
Sum 999999000000
MacBook-Pro-de-Pedro:hello pedro$ ./bin/hello
Elapsed time 5.528498ms
Sum 999999000000
MacBook-Pro-de-Pedro:hello pedro$ ./bin/hello
Elapsed time 5.273565ms
Sum 999999000000
MacBook-Pro-de-Pedro:hello pedro$ ./bin/hello
Elapsed time 5.539224ms
Sum 999999000000

Here is how you can test this with Go's own benchmark tool:
Create a test file (e.g. main_test.go).
Note: the file name has to end in _test.go!
Copy the following code there or create your own benchmarks:
package main

import (
    "sync"
    "testing"
)

var GlobalInt int

func BenchmarkCount(b *testing.B) {
    var a, c int
    count(&a, b.N)
    count(&c, b.N)
    GlobalInt = a + c // make sure the result is actually used
}

func count(a *int, max int) {
    for i := 0; i < max; i++ {
        *a += i
    }
}

var wg sync.WaitGroup

func BenchmarkCountConcurrent(b *testing.B) {
    var a, c int
    wg.Add(2)
    go countCon(&a, b.N)
    go countCon(&c, b.N)
    wg.Wait()
    GlobalInt = a + c // make sure the result is actually used
}

func countCon(a *int, max int) {
    for i := 0; i < max; i++ {
        *a += i
    }
    wg.Done()
}
Run with:
go test -bench .
Result on my Mac:
$ go test -bench .
BenchmarkCount-8 500000000 3.50 ns/op
BenchmarkCountConcurrent-8 2000000000 1.98 ns/op
PASS
ok MyPath/MyPackage 6.309s
The most important value is the time per operation (ns/op); the smaller the better. Here it is 3.50 ns/op for normal counting versus 1.98 ns/op for concurrent counting, roughly a 1.8x speedup.
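A caveat worth adding, as an assumption about the question's code rather than something the benchmark above proves: sumA and sumB are adjacent package-level variables, so the two goroutines likely write to the same CPU cache line and contend on it (false sharing). A minimal sketch that keeps each partial sum goroutine-local and merges at the end; the partialSum name and channel layout are mine, not from the question:

package main

import (
    "fmt"
    "sync"
)

// partialSum adds 0..n-1 into a local variable and sends the
// result over a channel, so the goroutines never write to
// shared memory while they run.
func partialSum(n int, out chan<- int, wg *sync.WaitGroup) {
    defer wg.Done()
    sum := 0
    for i := 0; i < n; i++ {
        sum += i
    }
    out <- sum
}

func main() {
    var wg sync.WaitGroup
    out := make(chan int, 2) // buffered so senders never block
    wg.Add(2)
    go partialSum(1000000, out, &wg)
    go partialSum(1000000, out, &wg)
    wg.Wait()
    fmt.Println("Sum", <-out+<-out)
}

On a machine where the original shows no speedup, this variant may scale closer to 2x, though with work this small the goroutine overhead still eats part of the gain.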
EDIT:
Here you can read up on Go's testing and benchmark support.

Related

Concurrency in Go - taking the same time with different numbers of CPUs

I am running a concurrent Go program in the two cases below and observed that it takes the same time irrespective of the number of CPUs it uses during execution.
Case 1: when cpuUsed = 1, the program took 3m20.973185s.
Then I increased the number of CPUs used.
Case 2: when cpuUsed = 8, the program took 3m20.9330516s.
See the Go code below for more details.
package main

import (
    "fmt"
    "math/rand"
    "runtime"
    "sync"
    "time"
)

var waitG sync.WaitGroup
var cpuUsed = 1
var maxRandomNums = 1000

func init() {
    maxCPU := runtime.NumCPU() // it'll give us the max CPU :)
    cpuUsed = 8                // getting the same time taken for 1 and 8
    runtime.GOMAXPROCS(cpuUsed)
    fmt.Printf("Number of CPUs (Total=%d - Used=%d) \n", maxCPU, cpuUsed)
}

func main() {
    start := time.Now()
    ids := []string{"routine1", "routine2", "routine3", "routine4"}
    waitG.Add(4)
    for i := range ids {
        go numbers(ids[i])
    }
    waitG.Wait()
    elapsed := time.Since(start)
    fmt.Printf("\nprogram took %s. \n", elapsed)
}

func numbers(id string) {
    rand.Seed(time.Now().UnixNano())
    for i := 1; i <= maxRandomNums; i++ {
        time.Sleep(200 * time.Millisecond)
        fmt.Printf("%s-%d ", id, rand.Intn(20)+20)
    }
    waitG.Done()
}
You will find that:
total time (3 min 20 s) = 200 s = sleep(200 ms) * loops(1000)
The four goroutines sleep in parallel, and sleeping uses no CPU, so adding CPUs cannot shorten the run.
Let's simplify your code and focus on CPU usage:
Remove the Sleep, which does not use the CPU at all.
Remove fmt.Printf, which is bound by stdio, not the CPU.
The random number does nothing but introduce uncertainty into the program, so remove it; the only CPU work in the goroutine is the "rand.Intn(20)+20", so make it a constant addition.
Increase "maxRandomNums".
Then your code will look like this; run it again:
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

var waitG sync.WaitGroup
var cpuUsed = 1
var maxRandomNums = 1000000000

func init() {
    maxCPU := runtime.NumCPU() // it'll give us the max CPU :)
    cpuUsed = 8
    runtime.GOMAXPROCS(cpuUsed)
    fmt.Printf("Number of CPUs (Total=%d - Used=%d) \n", maxCPU, cpuUsed)
}

func main() {
    start := time.Now()
    ids := []string{"routine1", "routine2", "routine3", "routine4"}
    waitG.Add(4)
    for i := range ids {
        go numbers(ids[i])
    }
    waitG.Wait()
    elapsed := time.Since(start)
    fmt.Printf("\nprogram took %s. \n", elapsed)
}

func numbers(id string) {
    // rand.Seed(time.Now().UnixNano())
    for i := 1; i <= maxRandomNums; i++ {
        // time.Sleep(200 * time.Millisecond)
        // fmt.Printf("%s-%d ", id, rand.Intn(20)+20)
        _ = i + 20 // constant-cost CPU work replacing the random number
    }
    waitG.Done()
}

golang global variable access is slow in benchmark test

Here is a simple Go benchmark test; it runs x++ in three different ways:
in a simple for loop with x declared inside the function
in a nested loop with x declared inside the function
in a nested loop with x declared as a global variable
package main

import (
    "testing"
)

var x = 0

func BenchmarkLoop(b *testing.B) {
    x := 0
    for n := 0; n < b.N; n++ {
        x++
    }
}

func BenchmarkDoubleLoop(b *testing.B) {
    x := 0
    for n := 0; n < b.N/1000; n++ {
        for m := 0; m < 1000; m++ {
            x++
        }
    }
}

func BenchmarkDoubleLoopGlobalVariable(b *testing.B) {
    for n := 0; n < b.N/1000; n++ {
        for m := 0; m < 1000; m++ {
            x++
        }
    }
}
And the result is as follows:
$ go test -bench=.
BenchmarkLoop-8 2000000000 0.32 ns/op
BenchmarkDoubleLoop-8 2000000000 0.34 ns/op
BenchmarkDoubleLoopGlobalVariable-8 2000000000 2.00 ns/op
PASS
ok github.com/cizixs/playground/loop-perf 5.597s
Obviously, the first and second methods have similar performance, while the third is much slower (about 6x).
I wonder why this is happening: is there a way to improve the performance of global variable access?
I wonder why this is happening.
The compiler optimizes away your whole code. Roughly 300 ps per op means only a no-op was "executed": incrementing a local x has no observable effect, so those loops are eliminated, while writes to the global x cannot be removed.
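A minimal sketch of one way to defeat that elimination, reusing the sink idea from the benchmark answer further up (the Sink name is mine, not from the question): publish the counter to a package-level variable after the loop. Note the compiler may still strength-reduce a trivial counting loop, so treat this as a safety net; with real work in the loop body it is enough.

package main

import "testing"

// Sink is package-level, so the final store cannot be proven
// dead and the loop feeding it must be kept.
var Sink int

func BenchmarkLoopKept(b *testing.B) {
    x := 0
    for n := 0; n < b.N; n++ {
        x++
    }
    Sink = x // publish the local result so the loop survives
}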

confusing concurrency and performance issue in Go

I have just started learning Go by watching this great course. To be clear, for years I have written only PHP, and concurrency/parallelism is new to me, so I am a little confused by this.
In this course there is a task to create a program which calculates factorials with 100 computations. I went a bit further, and to compare performance I changed it to 10000; for some reason the sequential program runs as fast as or even faster than the concurrent one.
Here I am going to provide 3 solutions: mine, the teacher's, and a sequential one.
My solution:
package main

import (
    "fmt"
)

func gen(steps int) <-chan int {
    out := make(chan int)
    go func() {
        for j := 0; j < steps; j++ {
            out <- j
        }
        close(out)
    }()
    return out
}

func factorial(in <-chan int) <-chan int {
    out := make(chan int)
    go func() {
        for n := range in {
            out <- fact(n)
        }
        close(out)
    }()
    return out
}

func fact(n int) int {
    total := 1
    for i := n; i > 0; i-- {
        total *= i
    }
    return total
}

func main() {
    steps := 10000
    for i := 0; i < steps; i++ {
        for n := range factorial(gen(10)) {
            fmt.Println(n)
        }
    }
}
execution time:
real 0m6,356s
user 0m3,885s
sys 0m0,870s
Teacher solution:
package main

import (
    "fmt"
)

func gen(steps int) <-chan int {
    out := make(chan int)
    go func() {
        for i := 0; i < steps; i++ {
            for j := 0; j < 10; j++ {
                out <- j
            }
        }
        close(out)
    }()
    return out
}

func factorial(in <-chan int) <-chan int {
    out := make(chan int)
    go func() {
        for n := range in {
            out <- fact(n)
        }
        close(out)
    }()
    return out
}

func fact(n int) int {
    total := 1
    for i := n; i > 0; i-- {
        total *= i
    }
    return total
}

func main() {
    steps := 10000
    for n := range factorial(gen(steps)) {
        fmt.Println(n)
    }
}
execution time:
real 0m2,836s
user 0m1,388s
sys 0m0,492s
Sequential:
package main

import (
    "fmt"
)

func fact(n int) int {
    total := 1
    for i := n; i > 0; i-- {
        total *= i
    }
    return total
}

func main() {
    steps := 10000
    for i := 0; i < steps; i++ {
        for j := 0; j < 10; j++ {
            fmt.Println(fact(j))
        }
    }
}
execution time:
real 0m2,513s
user 0m1,113s
sys 0m0,387s
So, as you can see, the sequential solution is fastest, the teacher's solution is second, and my solution is third.
First question: why is the sequential solution fastest?
And second: why is my solution so slow? If I understand correctly, in my solution I am creating 10000 goroutines inside gen and 10000 inside factorial, while in the teacher's solution he creates only 1 goroutine in gen and 1 in factorial. Is mine so slow because I am creating too many unneeded goroutines?
It's the difference between concurrency and parallelism: yours, your teacher's, and the sequential version are progressively less concurrent in design, but how parallel they run depends on the number of CPU cores, and there is a setup and communication cost associated with concurrency. There are no asynchronous calls in the code, so only parallelism will improve speed.
This is worth a look: https://blog.golang.org/concurrency-is-not-parallelism
Also, even with parallel cores, speedup depends on the nature of the workload; see Amdahl's law for an explanation.
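For reference, Amdahl's law says that if a fraction p of the work can be parallelized across n cores, the best possible speedup is S(n) = 1 / ((1 - p) + p/n). For example, with p = 0.5, even infinitely many cores give at most a 2x speedup; here the serialized channel sends and terminal writes keep p low.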
Let's start with some fundamental benchmarks for factorial computation.
$ go test -run=! -bench=. factorial_test.go
goos: linux
goarch: amd64
BenchmarkFact0-4 1000000000 2.07 ns/op
BenchmarkFact9-4 300000000 4.37 ns/op
BenchmarkFact0To9-4 50000000 36.0 ns/op
BenchmarkFact10K0To9-4 3000 384069 ns/op
$
The CPU time is very small, even for 10,000 iterations of factorials zero through nine.
factorial_test.go:
package main

import "testing"

func fact(n int) int {
    total := 1
    for i := n; i > 0; i-- {
        total *= i
    }
    return total
}

var sinkFact int

func BenchmarkFact0(b *testing.B) {
    for N := 0; N < b.N; N++ {
        j := 0
        sinkFact = fact(j)
    }
}

func BenchmarkFact9(b *testing.B) {
    for N := 0; N < b.N; N++ {
        j := 9
        sinkFact = fact(j)
    }
}

func BenchmarkFact0To9(b *testing.B) {
    for N := 0; N < b.N; N++ {
        for j := 0; j < 10; j++ {
            sinkFact = fact(j)
        }
    }
}

func BenchmarkFact10K0To9(b *testing.B) {
    for N := 0; N < b.N; N++ {
        steps := 10000
        for i := 0; i < steps; i++ {
            for j := 0; j < 10; j++ {
                sinkFact = fact(j)
            }
        }
    }
}
Let's look at the time for the sequential program.
$ go build -a sequential.go && time ./sequential
real 0m0.247s
user 0m0.054s
sys 0m0.149s
Writing to the terminal is obviously a major bottleneck. Let's write to a sink.
$ go build -a sequential.go && time ./sequential > /dev/null
real 0m0.070s
user 0m0.049s
sys 0m0.020s
It's still a lot more than the 0.000384069 s for the factorial computation.
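Back-of-envelope from these numbers: 0.070 s for 100,000 printed lines is roughly 0.7 µs per fmt.Println even when redirected to /dev/null, while 384069 ns for 100,000 fact calls is roughly 3.8 ns per call, so printing dominates by about two orders of magnitude.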
sequential.go:
package main

import (
    "fmt"
)

func fact(n int) int {
    total := 1
    for i := n; i > 0; i-- {
        total *= i
    }
    return total
}

func main() {
    steps := 10000
    for i := 0; i < steps; i++ {
        for j := 0; j < 10; j++ {
            fmt.Println(fact(j))
        }
    }
}
Attempts to use concurrency for such a trivial amount of parallel work are likely to fail. Go goroutines and channels are cheap, but they are not free. Also, a single channel and a single terminal are the bottleneck, the limiting factor, even when writing to a sink. See Amdahl's Law for parallel computing. See Concurrency is not parallelism.
$ go build -a teacher.go && time ./teacher > /dev/null
real 0m0.123s
user 0m0.123s
sys 0m0.022s
$ go build -a student.go && time ./student > /dev/null
real 0m0.135s
user 0m0.113s
sys 0m0.038s
teacher.go:
package main

import (
    "fmt"
)

func gen(steps int) <-chan int {
    out := make(chan int)
    go func() {
        for i := 0; i < steps; i++ {
            for j := 0; j < 10; j++ {
                out <- j
            }
        }
        close(out)
    }()
    return out
}

func factorial(in <-chan int) <-chan int {
    out := make(chan int)
    go func() {
        for n := range in {
            out <- fact(n)
        }
        close(out)
    }()
    return out
}

func fact(n int) int {
    total := 1
    for i := n; i > 0; i-- {
        total *= i
    }
    return total
}

func main() {
    steps := 10000
    for n := range factorial(gen(steps)) {
        fmt.Println(n)
    }
}
student.go:
package main

import (
    "fmt"
)

func gen(steps int) <-chan int {
    out := make(chan int)
    go func() {
        for j := 0; j < steps; j++ {
            out <- j
        }
        close(out)
    }()
    return out
}

func factorial(in <-chan int) <-chan int {
    out := make(chan int)
    go func() {
        for n := range in {
            out <- fact(n)
        }
        close(out)
    }()
    return out
}

func fact(n int) int {
    total := 1
    for i := n; i > 0; i-- {
        total *= i
    }
    return total
}

func main() {
    steps := 10000
    for i := 0; i < steps; i++ {
        for n := range factorial(gen(10)) {
            fmt.Println(n)
        }
    }
}
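As a hedged aside, beyond the original answers: if fact were expensive enough to dominate the channel traffic, a fan-out/fan-in version with one worker per core could actually beat the sequential program. A sketch under that assumption; the worker layout is illustrative, not from the course:

package main

import (
    "fmt"
    "runtime"
    "sync"
)

func fact(n int) int {
    total := 1
    for i := n; i > 0; i-- {
        total *= i
    }
    return total
}

func main() {
    in := make(chan int)
    out := make(chan int)
    var wg sync.WaitGroup

    // Fan out: one factorial worker per CPU core.
    workers := runtime.NumCPU()
    wg.Add(workers)
    for w := 0; w < workers; w++ {
        go func() {
            defer wg.Done()
            for n := range in {
                out <- fact(n)
            }
        }()
    }

    // Fan in: close out once every worker has drained in.
    go func() {
        wg.Wait()
        close(out)
    }()

    // Feed the same 10000 x 10 workload.
    go func() {
        for i := 0; i < 10000; i++ {
            for j := 0; j < 10; j++ {
                in <- j
            }
        }
        close(in)
    }()

    for n := range out {
        fmt.Println(n)
    }
}

With fact this cheap, it will still lose to the sequential loop; the pattern only pays off when the per-item work outweighs the per-item channel cost.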

Benchmarking bad results

So, I have the quicksort algorithm implemented with concurrency (and a version without it as well). Now I wanted to compare the times, so I wrote this:
func benchmarkConcurrentQuickSort(size int, b *testing.B) {
    A := RandomArray(size)
    var wg sync.WaitGroup
    b.ResetTimer()
    ConcurrentQuicksort(A, 0, len(A)-1, &wg)
    wg.Wait()
}

func BenchmarkConcurrentQuickSort500(b *testing.B) {
    benchmarkConcurrentQuickSort(500, b)
}

func BenchmarkConcurrentQuickSort1000(b *testing.B) {
    benchmarkConcurrentQuickSort(1000, b)
}

func BenchmarkConcurrentQuickSort5000(b *testing.B) {
    benchmarkConcurrentQuickSort(5000, b)
}

func BenchmarkConcurrentQuickSort10000(b *testing.B) {
    benchmarkConcurrentQuickSort(10000, b)
}

func BenchmarkConcurrentQuickSort20000(b *testing.B) {
    benchmarkConcurrentQuickSort(20000, b)
}

func BenchmarkConcurrentQuickSort1000000(b *testing.B) {
    benchmarkConcurrentQuickSort(1000000, b)
}
The results are like this:
C:\projects\go\src\github.com\frynio\mysort>go test -bench=.
BenchmarkConcurrentQuickSort500-4 2000000000 0.00 ns/op
BenchmarkConcurrentQuickSort1000-4 2000000000 0.00 ns/op
BenchmarkConcurrentQuickSort5000-4 2000000000 0.00 ns/op
BenchmarkConcurrentQuickSort10000-4 2000000000 0.00 ns/op
BenchmarkConcurrentQuickSort20000-4 2000000000 0.00 ns/op
BenchmarkConcurrentQuickSort1000000-4 30 49635266 ns/op
PASS
ok github.com/frynio/mysort 8.342s
I can believe the last one, but I definitely think that sorting a 500-element array takes longer than 1 ns. What am I doing wrong? I am pretty sure that RandomArray returns an array of the wanted size, as we can see in the last benchmark. Why does it print 0.00 ns?
func RandomArray(n int) []int {
    a := []int{}
    for i := 0; i < n; i++ {
        a = append(a, rand.Intn(500))
    }
    return a
}

// ConcurrentPartition - ConcurrentQuicksort function for partitioning the array (randomized choice of a pivot)
func ConcurrentPartition(A []int, p int, r int) int {
    index := rand.Intn(r-p) + p
    pivot := A[index]
    A[index] = A[r]
    A[r] = pivot
    x := A[r]
    j := p - 1
    i := p
    for i < r {
        if A[i] <= x {
            j++
            tmp := A[j]
            A[j] = A[i]
            A[i] = tmp
        }
        i++
    }
    temp := A[j+1]
    A[j+1] = A[r]
    A[r] = temp
    return j + 1
}

// ConcurrentQuicksort - a concurrent version of a quicksort algorithm
func ConcurrentQuicksort(A []int, p int, r int, wg *sync.WaitGroup) {
    if p < r {
        q := ConcurrentPartition(A, p, r)
        wg.Add(2)
        go func() {
            ConcurrentQuicksort(A, p, q-1, wg)
            wg.Done()
        }()
        go func() {
            ConcurrentQuicksort(A, q+1, r, wg)
            wg.Done()
        }()
    }
}
Package testing
A sample benchmark function looks like this:
func BenchmarkHello(b *testing.B) {
    for i := 0; i < b.N; i++ {
        fmt.Sprintf("hello")
    }
}
The benchmark function must run the target code b.N times. During
benchmark execution, b.N is adjusted until the benchmark function
lasts long enough to be timed reliably.
I don't see a benchmark loop in your code. Try
func benchmarkConcurrentQuickSort(size int, b *testing.B) {
    A := RandomArray(size)
    var wg sync.WaitGroup
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        ConcurrentQuicksort(A, 0, len(A)-1, &wg)
        wg.Wait()
    }
}
Output:
BenchmarkConcurrentQuickSort500-4 10000 122291 ns/op
BenchmarkConcurrentQuickSort1000-4 5000 221154 ns/op
BenchmarkConcurrentQuickSort5000-4 1000 1225230 ns/op
BenchmarkConcurrentQuickSort10000-4 500 2568024 ns/op
BenchmarkConcurrentQuickSort20000-4 300 5808130 ns/op
BenchmarkConcurrentQuickSort1000000-4 1 1371751710 ns/op
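One hedged caveat about this fix: ConcurrentQuicksort sorts A in place, so every iteration after the first re-sorts already-sorted data (the randomized pivot keeps that from degenerating, but it is not the same workload). A sketch that restores the input outside the timed region, assuming the same package and helpers as the question's code:

func benchmarkConcurrentQuickSort(size int, b *testing.B) {
    src := RandomArray(size)
    A := make([]int, size)
    var wg sync.WaitGroup
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        b.StopTimer()
        copy(A, src) // restore the unsorted input; not timed
        b.StartTimer()
        ConcurrentQuicksort(A, 0, len(A)-1, &wg)
        wg.Wait()
    }
}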

Concurrent integral calculation in Golang

I tried to compute the integral concurrently, but my program ended up being slower than computing the integral with a normal for loop. What am I doing wrong?
package main

import (
    "fmt"
    "math"
    "sync"
    "time"
)

type Result struct {
    result float64
    lock   sync.RWMutex
}

var wg sync.WaitGroup
var result Result

func main() {
    now := time.Now()
    a := 0.0
    b := 1.0
    n := 100000.0
    deltax := (b - a) / n
    wg.Add(int(n))
    for i := 0.0; i < n; i++ {
        go f(a, deltax, i)
    }
    wg.Wait()
    fmt.Println(deltax * result.result)
    fmt.Println(time.Now().Sub(now))
}

func f(a float64, deltax float64, i float64) {
    fx := math.Sqrt(a + deltax*(i+0.5))
    result.lock.Lock()
    result.result += fx
    result.lock.Unlock()
    wg.Done()
}
3 - For a performance gain, you may divide the work per CPU core, without the sync.RWMutex lock:
A ~30x optimization using channels and runtime.NumCPU(): this takes 2 ms on 2 cores and 993 µs on 8 cores, while your sample code takes 61 ms on 2 cores and 40 ms on 8 cores.
See this working sample code and its outputs:
package main

import (
    "fmt"
    "math"
    "runtime"
    "time"
)

func main() {
    nCPU := runtime.NumCPU()
    fmt.Println("nCPU =", nCPU)
    ch := make(chan float64, nCPU)
    startTime := time.Now()
    a := 0.0
    b := 1.0
    n := 100000.0
    deltax := (b - a) / n
    stepPerCPU := n / float64(nCPU)
    for start := 0.0; start < n; {
        stop := start + stepPerCPU
        go f(start, stop, a, deltax, ch)
        start = stop
    }
    integral := 0.0
    for i := 0; i < nCPU; i++ {
        integral += <-ch
    }
    fmt.Println(time.Now().Sub(startTime))
    fmt.Println(deltax * integral)
}

func f(start, stop, a, deltax float64, ch chan float64) {
    result := 0.0
    for i := start; i < stop; i++ {
        result += math.Sqrt(a + deltax*(i+0.5))
    }
    ch <- result
}
Output on 2 Cores:
nCPU = 2
2.0001ms
0.6666666685900485
Output on 8 Cores:
nCPU = 8
993µs
0.6666666685900456
Your sample code, Output on 2 Cores:
0.6666666685900424
61.0035ms
Your sample code, Output on 8 Cores:
0.6666666685900415
40.9964ms
2 - For good benchmark statistics, use a large number of samples (a big n):
As you see here, with n := 10000000.0 this takes 110 ms using 2 cores, but 215 ms on the same CPU using 1 core.
With n := 10000000.0 and a single goroutine, see this working sample code:
With n := 10000000.0 and single goroutine, see this working sample code:
package main

import (
    "fmt"
    "math"
    "time"
)

func main() {
    now := time.Now()
    a := 0.0
    b := 1.0
    n := 10000000.0
    deltax := (b - a) / n
    result := 0.0
    for i := 0.0; i < n; i++ {
        result += math.Sqrt(a + deltax*(i+0.5))
    }
    fmt.Println(time.Now().Sub(now))
    fmt.Println(deltax * result)
}
Output:
215.0123ms
0.6666666666685884
With n := 10000000.0 and 2 goroutines, see this working sample code:
package main

import (
    "fmt"
    "math"
    "runtime"
    "time"
)

func main() {
    nCPU := runtime.NumCPU()
    fmt.Println("nCPU =", nCPU)
    ch := make(chan float64, nCPU)
    startTime := time.Now()
    a := 0.0
    b := 1.0
    n := 10000000.0
    deltax := (b - a) / n
    stepPerCPU := n / float64(nCPU)
    for start := 0.0; start < n; {
        stop := start + stepPerCPU
        go f(start, stop, a, deltax, ch)
        start = stop
    }
    integral := 0.0
    for i := 0; i < nCPU; i++ {
        integral += <-ch
    }
    fmt.Println(time.Now().Sub(startTime))
    fmt.Println(deltax * integral)
}

func f(start, stop, a, deltax float64, ch chan float64) {
    result := 0.0
    for i := start; i < stop; i++ {
        result += math.Sqrt(a + deltax*(i+0.5))
    }
    ch <- result
}
Output:
nCPU = 2
110.0063ms
0.6666666666686073
1 - There is an optimum number of goroutines, and past that point increasing the number of goroutines does not reduce the program's execution time:
On a 2-core CPU, with the following code, the result is:
nCPU: 1, 2, 4, 8, 16
Time: 2.1601236s, 1.1220642s, 1.1060633s, 1.1140637s, 1.1380651s
As you see, the drop from nCPU=1 to nCPU=2 is large, but beyond that point it is small, so nCPU=2 is the optimum for this sample code on a 2-core CPU; using nCPU := runtime.NumCPU() is enough here.
package main

import (
    "fmt"
    "math"
    "time"
)

func main() {
    nCPU := 2 // 2.1601236s#1 1.1220642s#2 1.1060633s#4 1.1140637s#8 1.1380651s#16
    fmt.Println("nCPU =", nCPU)
    ch := make(chan float64, nCPU)
    startTime := time.Now()
    a := 0.0
    b := 1.0
    n := 100000000.0
    deltax := (b - a) / n
    stepPerCPU := n / float64(nCPU)
    for start := 0.0; start < n; {
        stop := start + stepPerCPU
        go f(start, stop, a, deltax, ch)
        start = stop
    }
    integral := 0.0
    for i := 0; i < nCPU; i++ {
        integral += <-ch
    }
    fmt.Println(time.Now().Sub(startTime))
    fmt.Println(deltax * integral)
}

func f(start, stop, a, deltax float64, ch chan float64) {
    result := 0.0
    for i := start; i < stop; i++ {
        result += math.Sqrt(a + deltax*(i+0.5))
    }
    ch <- result
}
Unless the work done in each goroutine takes considerably longer than the overhead of switching contexts, carrying out the task, and taking a mutex to update a value, it is faster to do it serially.
Take a look at a slightly modified version; all I've done is add a delay of 1 microsecond in the f() function.
package main

import (
    "fmt"
    "math"
    "sync"
    "time"
)

type Result struct {
    result float64
    lock   sync.RWMutex
}

var wg sync.WaitGroup
var result Result

func main() {
    fmt.Println("concurrent")
    concurrent()
    result.result = 0
    fmt.Println("serial")
    serial()
}

func concurrent() {
    now := time.Now()
    a := 0.0
    b := 1.0
    n := 100000.0
    deltax := (b - a) / n
    wg.Add(int(n))
    for i := 0.0; i < n; i++ {
        go f(a, deltax, i, true)
    }
    wg.Wait()
    fmt.Println(deltax * result.result)
    fmt.Println(time.Now().Sub(now))
}

func serial() {
    now := time.Now()
    a := 0.0
    b := 1.0
    n := 100000.0
    deltax := (b - a) / n
    for i := 0.0; i < n; i++ {
        f(a, deltax, i, false)
    }
    fmt.Println(deltax * result.result)
    fmt.Println(time.Now().Sub(now))
}

func f(a, deltax, i float64, concurrent bool) {
    time.Sleep(1 * time.Microsecond)
    fx := math.Sqrt(a + deltax*(i+0.5))
    if concurrent {
        result.lock.Lock()
        result.result += fx
        result.lock.Unlock()
        wg.Done()
    } else {
        result.result += fx
    }
}
With the delay, the result was as follows (the concurrent version is much faster):
concurrent
0.6666666685900424
624.914165ms
serial
0.6666666685900422
5.609195767s
Without the delay:
concurrent
0.6666666685900428
50.771275ms
serial
0.6666666685900422
749.166µs
As you can see, the longer it takes to complete a task, the more sense it makes to do it concurrently, if possible.
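Back-of-envelope from the numbers above: without the delay, the concurrent version spends about 50 ms across 100,000 goroutines, i.e. roughly 0.5 µs of overhead per goroutine (spawn, lock, unlock), while the actual work per point (one math.Sqrt and an add) costs on the order of nanoseconds. That is why the chunked version earlier, which batches n/NumCPU points per goroutine, comes out ahead.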
