Goroutine performance

Goroutine performance - performance

I have started to learn Go, it's fun and easy. But working with goroutines I have seen little benefit in performance.
If I try to sequentially add 1 million numbers two times in 2 functions:
package main
import (
"fmt"
"time"
)
var sumA int
var sumB int
func fSumA() {
for i := 0; i < 1000000; i++ {
sumA += i
}
}
func fSumB() {
for i := 0; i < 1000000; i++ {
sumB += i
}
}
func main() {
start := time.Now()
fSumA()
fSumB()
sum := sumA + sumB
fmt.Println("Elapsed time", time.Since(start))
fmt.Println("Sum", sum)
}
It takes 5 ms.
MacBook-Pro-de-Pedro:hello pedro$ ./bin/hello
Elapsed time 5.724406ms
Suma total 999999000000
MacBook-Pro-de-Pedro:hello pedro$ ./bin/hello
Elapsed time 5.358165ms
Suma total 999999000000
MacBook-Pro-de-Pedro:hello pedro$ ./bin/hello
Elapsed time 5.042528ms
Suma total 999999000000
MacBook-Pro-de-Pedro:hello pedro$ ./bin/hello
Elapsed time 5.469628ms
Suma total 999999000000
When I try to do the same thing with 2 goroutines:
package main
import (
"fmt"
"sync"
"time"
)
var wg sync.WaitGroup
var sumA int
var sumB int
func fSumA() {
for i := 0; i < 1000000; i++ {
sumA += i
}
wg.Done()
}
func fSumB() {
for i := 0; i < 1000000; i++ {
sumB += i
}
wg.Done()
}
func main() {
start := time.Now()
wg.Add(2)
go fSumA()
go fSumB()
wg.Wait()
sum := sumA + sumB
fmt.Println("Elapsed time", time.Since(start))
fmt.Println("Sum", sum)
}
I get more or less the same result, 5 ms. My computer is a MacBook pro (Core 2 Duo). I don't see any performance improvement. Maybe is the processor?
MacBook-Pro-de-Pedro:hello pedro$ ./bin/hello
Elapsed time 5.258415ms
Suma total 999999000000
MacBook-Pro-de-Pedro:hello pedro$ ./bin/hello
Elapsed time 5.528498ms
Suma total 999999000000
MacBook-Pro-de-Pedro:hello pedro$ ./bin/hello
Elapsed time 5.273565ms
Suma total 999999000000
MacBook-Pro-de-Pedro:hello pedro$ ./bin/hello
Elapsed time 5.539224ms
Suma total 999999000000

Here how you can test this with the golangs own benchmark tool:
Create a test go file (e.g. main_test.go).
Note: _test.go has to be the file ending!
Copy the following code there or create your own Benchmarks:
package main
import (
"sync"
"testing"
)
var GlobalInt int
func BenchmarkCount(b *testing.B) {
var a, c int
count(&a, b.N)
count(&c, b.N)
GlobalInt = a + c // make sure the result is actually used
}
func count(a *int, max int) {
for i := 0; i < max; i++ {
*a += i
}
}
var wg sync.WaitGroup
func BenchmarkCountConcurrent(b *testing.B) {
var a, c int
wg.Add(2)
go countCon(&a, b.N)
go countCon(&c, b.N)
wg.Wait()
GlobalInt = a + c // make sure the result is actually used
}
func countCon(a *int, max int) {
for i := 0; i < max; i++ {
*a += i
}
wg.Done()
}
Run with:
go test -bench .
Result on my Mac:
$ go test -bench .
BenchmarkCount-8 500000000 3.50 ns/op
BenchmarkCountConcurrent-8 2000000000 1.98 ns/op
PASS
ok MyPath/MyPackage 6.309s
Most important value is the time/op. The smaller the better. Here 3.5 ns/op for normal counting, 1.98 ns/op for concurrent counting.
EDIT:
Here you can read up on golang Testing and Benchmark.

Related

concurrency in go -Taking same time with different no of CPU

I am running a go concurrent program with the below two case and observed that It is taking same time irrespective of no of CPU it using while execution.
Case1: When cpuUsed = 1
program took 3m20.973185s.
when I am increasing the no of CPU used.
Case2: when cpuUsed = 8
program took 3m20.9330516s.
Please find the below Go code for more details.
package main
import (
"fmt"
"math/rand"
"runtime"
"sync"
"time"
)
var waitG sync.WaitGroup
var cpuUsed = 1
var maxRandomNums = 1000
func init() {
maxCPU := runtime.NumCPU() //It'll give us the max CPU :)
cpuUsed = 8 //getting same time taken for 1 and 8
runtime.GOMAXPROCS(cpuUsed)
fmt.Printf("Number of CPUs (Total=%d - Used=%d) \n", maxCPU, cpuUsed)
}
func main() {
start := time.Now()
ids := []string{"rotine1", "routine2", "routine3", "routine4"}
waitG.Add(4)
for i := range ids {
go numbers(ids[i])
}
waitG.Wait()
elapsed := time.Since(start)
fmt.Printf("\nprogram took %s. \n", elapsed)
}
func numbers(id string) {
rand.Seed(time.Now().UnixNano())
for i := 1; i <= maxRandomNums; i++ {
time.Sleep(200 * time.Millisecond)
fmt.Printf("%s-%d ", id, rand.Intn(20)+20)
}
waitG.Done()
}

you will find out:
total time (3 min 20s) = 200s = sleep(200ms) * loops(1000)
Let's simplify your code and focus on CPU usage:
Remove the Sleep, which does not use the CPU at all
fmt.Println as a stdio, does not use the CPU
Random number did nothing but introduce uncertainty into the program, remove it
The only code that takes CPU in the goroutine is the "rand.Intn(20)+20", making it a constant addition
Increase the "maxRandomNums"
then your code will be like this, run it again
package main
import (
"fmt"
"runtime"
"sync"
"time"
)
var waitG sync.WaitGroup
var cpuUsed = 1
var maxRandomNums = 1000000000
func init() {
maxCPU := runtime.NumCPU() //It'll give us the max CPU :)
cpuUsed = 8 //getting same time taken for 1 and 8
runtime.GOMAXPROCS(cpuUsed)
fmt.Printf("Number of CPUs (Total=%d - Used=%d) \n", maxCPU, cpuUsed)
}
func main() {
start := time.Now()
ids := []string{"rotine1", "routine2", "routine3", "routine4"}
waitG.Add(4)
for i := range ids {
go numbers(ids[i])
}
waitG.Wait()
elapsed := time.Since(start)
fmt.Printf("\nprogram took %s. \n", elapsed)
}
func numbers(id string) {
// rand.Seed(time.Now().UnixNano())
for i := 1; i <= maxRandomNums; i++ {
// time.Sleep(200 * time.Millisecond)
// fmt.Printf("%s-%d ", id, rand.Intn(20)+20)
_ = i + 20
}
waitG.Done()
}

golang global variable access is slow in benchmark test

Here is a simple golang benchmark test, it runs x++ in three different ways:
in a simple for loop with x declared inside function
in a nested loop with x declared inside function
in a nested loop with x declared as global variable
package main
import (
"testing"
)
var x = 0
func BenchmarkLoop(b *testing.B) {
x := 0
for n := 0; n < b.N; n++ {
x++
}
}
func BenchmarkDoubleLoop(b *testing.B) {
x := 0
for n := 0; n < b.N/1000; n++ {
for m := 0; m < 1000; m++ {
x++
}
}
}
func BenchmarkDoubleLoopGlobalVariable(b *testing.B) {
for n := 0; n < b.N/1000; n++ {
for m := 0; m < 1000; m++ {
x++
}
}
}
And the result is as following:
$ go test -bench=.
BenchmarkLoop-8 2000000000 0.32 ns/op
BenchmarkDoubleLoop-8 2000000000 0.34 ns/op
BenchmarkDoubleLoopGlobalVariable-8 2000000000 2.00 ns/op
PASS
ok github.com/cizixs/playground/loop-perf 5.597s
Obviously, the first and second methods have similar performance, while the third function is much slower(about 6x times slow).
And I wonder why this is happening, is there a way to improve performance of global variable access?

I wonder why this is happening.
The compiler optimizes away your whole code. 300ps per op means a only a noop was "executed".

confusing concurrency and performance issue in Go

now I start learning Go language by watching this great course. To be clear for years I write only on PHP and concurrency/parallelism is new for me, so I little confused by this.
In this course, there is a task to create a program which calculates factorial with 100 computations. I went a bit further and to comparing performance I changed it to 10000 and for some reason, the sequential program works same or even faster than concurrency.
Here I'm going to provide 3 solutions: mine, teachers and sequential
My solution:
package main
import (
"fmt"
)
func gen(steps int) <-chan int{
out := make(chan int)
go func() {
for j:= 0; j <steps; j++ {
out <- j
}
close(out)
}()
return out
}
func factorial(in <-chan int) <-chan int {
out := make(chan int)
go func() {
for n := range in {
out <- fact(n)
}
close(out)
}()
return out
}
func fact(n int) int {
total := 1
for i := n;i>0;i-- {
total *=i
}
return total
}
func main() {
steps := 10000
for i := 0; i < steps; i++ {
for n:= range factorial(gen(10)) {
fmt.Println(n)
}
}
}
execution time:
real 0m6,356s
user 0m3,885s
sys 0m0,870s
Teacher solution:
package main
import (
"fmt"
)
func gen(steps int) <-chan int{
out := make(chan int)
go func() {
for i := 0; i < steps; i++ {
for j:= 0; j <10; j++ {
out <- j
}
}
close(out)
}()
return out
}
func factorial(in <-chan int) <-chan int {
out := make(chan int)
go func() {
for n := range in {
out <- fact(n)
}
close(out)
}()
return out
}
func fact(n int) int {
total := 1
for i := n;i>0;i-- {
total *=i
}
return total
}
func main() {
steps := 10000
for n:= range factorial(gen(steps)) {
fmt.Println(n)
}
}
execution time:
real 0m2,836s
user 0m1,388s
sys 0m0,492s
Sequential:
package main
import (
"fmt"
)
func fact(n int) int {
total := 1
for i := n;i>0;i-- {
total *=i
}
return total
}
func main() {
steps := 10000
for i := 0; i < steps; i++ {
for j:= 0; j <10; j++ {
fmt.Println(fact(j))
}
}
}
execution time:
real 0m2,513s
user 0m1,113s
sys 0m0,387s
So, as you can see the sequential solution is fastest, teachers solution is in the second place and my solution is third.
First question: why the sequential solution is fastest?
And second why my solution is so slow? if I understanding correctly in my solution I'm creating 10000 goroutines inside gen and 10000 inside factorial and in teacher solution, he is creating only 1 goroutine in gen and 1 in factorial. My so slow because I'm creating too many unneeded goroutines?

It's the difference between concurrency and parellelism - your's, you teachers and the sequential are progressively less concurrent in design but how parallel they are depends on number of CPU cores and there is a set up and communication cost associated with concurrency. There are no asynchronous calls in the code so only parallelism will improve speed.
This is worth a look: https://blog.golang.org/concurrency-is-not-parallelism
Also, even with parallel cores speedup will be dependent on nature of the workload - google Amdahl's law for explanation.

Let's start with some fundamental benchmarks for factorial computation.
$ go test -run=! -bench=. factorial_test.go
goos: linux
goarch: amd64
BenchmarkFact0-4 1000000000 2.07 ns/op
BenchmarkFact9-4 300000000 4.37 ns/op
BenchmarkFact0To9-4 50000000 36.0 ns/op
BenchmarkFact10K0To9-4 3000 384069 ns/op
$
The CPU time is very small, even for 10,000 iterations of factorials zero through nine.
factorial_test.go:
package main
import "testing"
func fact(n int) int {
total := 1
for i := n; i > 0; i-- {
total *= i
}
return total
}
var sinkFact int
func BenchmarkFact0(b *testing.B) {
for N := 0; N < b.N; N++ {
j := 0
sinkFact = fact(j)
}
}
func BenchmarkFact9(b *testing.B) {
for N := 0; N < b.N; N++ {
j := 9
sinkFact = fact(j)
}
}
func BenchmarkFact0To9(b *testing.B) {
for N := 0; N < b.N; N++ {
for j := 0; j < 10; j++ {
sinkFact = fact(j)
}
}
}
func BenchmarkFact10K0To9(b *testing.B) {
for N := 0; N < b.N; N++ {
steps := 10000
for i := 0; i < steps; i++ {
for j := 0; j < 10; j++ {
sinkFact = fact(j)
}
}
}
}
Let's look at the time for the sequential program.
$ go build -a sequential.go && time ./sequential
real 0m0.247s
user 0m0.054s
sys 0m0.149s
Writing to the terminal is obviously a major bottleneck. Let's write to a sink.
$ go build -a sequential.go && time ./sequential > /dev/null
real 0m0.070s
user 0m0.049s
sys 0m0.020s
It's still a lot more than the 0m0.000000384069s for the factorial computation.
sequential.go:
package main
import (
"fmt"
)
func fact(n int) int {
total := 1
for i := n; i > 0; i-- {
total *= i
}
return total
}
func main() {
steps := 10000
for i := 0; i < steps; i++ {
for j := 0; j < 10; j++ {
fmt.Println(fact(j))
}
}
}
Attempts to use concurrency for such a trivial amount of parallel work are likely to fail. Go goroutines and channels are cheap, but they are not free. Also, a single channel and a single terminal are the bottleneck, the limiting factor, even when writing to a sink. See Amdahl's Law for parallel computing. See Concurrency is not parallelism.
$ go build -a teacher.go && time ./teacher > /dev/null
real 0m0.123s
user 0m0.123s
sys 0m0.022s
$ go build -a student.go && time ./student > /dev/null
real 0m0.135s
user 0m0.113s
sys 0m0.038s
teacher.go:
package main
import (
"fmt"
)
func gen(steps int) <-chan int {
out := make(chan int)
go func() {
for i := 0; i < steps; i++ {
for j := 0; j < 10; j++ {
out <- j
}
}
close(out)
}()
return out
}
func factorial(in <-chan int) <-chan int {
out := make(chan int)
go func() {
for n := range in {
out <- fact(n)
}
close(out)
}()
return out
}
func fact(n int) int {
total := 1
for i := n; i > 0; i-- {
total *= i
}
return total
}
func main() {
steps := 10000
for n := range factorial(gen(steps)) {
fmt.Println(n)
}
}
student.go:
package main
import (
"fmt"
)
func gen(steps int) <-chan int {
out := make(chan int)
go func() {
for j := 0; j < steps; j++ {
out <- j
}
close(out)
}()
return out
}
func factorial(in <-chan int) <-chan int {
out := make(chan int)
go func() {
for n := range in {
out <- fact(n)
}
close(out)
}()
return out
}
func fact(n int) int {
total := 1
for i := n; i > 0; i-- {
total *= i
}
return total
}
func main() {
steps := 10000
for i := 0; i < steps; i++ {
for n := range factorial(gen(10)) {
fmt.Println(n)
}
}
}

Benchmarking bad results

so I have the Quicksort algorithm implemented with concurrency (the one without as well). Now I wanted to compare the times. I wrote this:
func benchmarkConcurrentQuickSort(size int, b *testing.B) {
A := RandomArray(size)
var wg sync.WaitGroup
b.ResetTimer()
ConcurrentQuicksort(A, 0, len(A)-1, &wg)
wg.Wait()
}
func BenchmarkConcurrentQuickSort500(b *testing.B) {
benchmarkConcurrentQuickSort(500, b)
}
func BenchmarkConcurrentQuickSort1000(b *testing.B) {
benchmarkConcurrentQuickSort(1000, b)
}
func BenchmarkConcurrentQuickSort5000(b *testing.B) {
benchmarkConcurrentQuickSort(5000, b)
}
func BenchmarkConcurrentQuickSort10000(b *testing.B) {
benchmarkConcurrentQuickSort(10000, b)
}
func BenchmarkConcurrentQuickSort20000(b *testing.B) {
benchmarkConcurrentQuickSort(20000, b)
}
func BenchmarkConcurrentQuickSort1000000(b *testing.B) {
benchmarkConcurrentQuickSort(1000000, b)
}
The results are like this:
C:\projects\go\src\github.com\frynio\mysort>go test -bench=.
BenchmarkConcurrentQuickSort500-4 2000000000 0.00 ns/op
BenchmarkConcurrentQuickSort1000-4 2000000000 0.00 ns/op
BenchmarkConcurrentQuickSort5000-4 2000000000 0.00 ns/op
BenchmarkConcurrentQuickSort10000-4 2000000000 0.00 ns/op
BenchmarkConcurrentQuickSort20000-4 2000000000 0.00 ns/op
BenchmarkConcurrentQuickSort1000000-4 30 49635266 ns/op
PASS
ok github.com/frynio/mysort 8.342s
I can believe the last one, but I definitely think that sorting 500-element array takes longer than 1ns. What am i doing wrong? I am pretty sure that RandomArray returns array of wanted size, as we can see in the last benchmark. Why does it print out the 0.00 ns?
func RandomArray(n int) []int {
a := []int{}
for i := 0; i < n; i++ {
a = append(a, rand.Intn(500))
}
return a
}
// ConcurrentPartition - ConcurrentQuicksort function for partitioning the array (randomized choice of a pivot)
func ConcurrentPartition(A []int, p int, r int) int {
index := rand.Intn(r-p) + p
pivot := A[index]
A[index] = A[r]
A[r] = pivot
x := A[r]
j := p - 1
i := p
for i < r {
if A[i] <= x {
j++
tmp := A[j]
A[j] = A[i]
A[i] = tmp
}
i++
}
temp := A[j+1]
A[j+1] = A[r]
A[r] = temp
return j + 1
}
// ConcurrentQuicksort - a concurrent version of a quicksort algorithm
func ConcurrentQuicksort(A []int, p int, r int, wg *sync.WaitGroup) {
if p < r {
q := ConcurrentPartition(A, p, r)
wg.Add(2)
go func() {
ConcurrentQuicksort(A, p, q-1, wg)
wg.Done()
}()
go func() {
ConcurrentQuicksort(A, q+1, r, wg)
wg.Done()
}()
}
}

Package testing
A sample benchmark function looks like this:
func BenchmarkHello(b *testing.B) {
for i := 0; i < b.N; i++ {
fmt.Sprintf("hello")
}
}
The benchmark function must run the target code b.N times. During
benchmark execution, b.N is adjusted until the benchmark function
lasts long enough to be timed reliably.
I don't see a benchmark loop in your code. Try
func benchmarkConcurrentQuickSort(size int, b *testing.B) {
A := RandomArray(size)
var wg sync.WaitGroup
b.ResetTimer()
for i := 0; i < b.N; i++ {
ConcurrentQuicksort(A, 0, len(A)-1, &wg)
wg.Wait()
}
}
Output:
BenchmarkConcurrentQuickSort500-4 10000 122291 ns/op
BenchmarkConcurrentQuickSort1000-4 5000 221154 ns/op
BenchmarkConcurrentQuickSort5000-4 1000 1225230 ns/op
BenchmarkConcurrentQuickSort10000-4 500 2568024 ns/op
BenchmarkConcurrentQuickSort20000-4 300 5808130 ns/op
BenchmarkConcurrentQuickSort1000000-4 1 1371751710 ns/op

Concurrent integral calculation in Golang

I tried to compute the integral concurrently, but my program ended up being slower than computing the integral with a normal for loop. What am I doing wrong?
package main
import (
"fmt"
"math"
"sync"
"time"
)
type Result struct {
result float64
lock sync.RWMutex
}
var wg sync.WaitGroup
var result Result
func main() {
now := time.Now()
a := 0.0
b := 1.0
n := 100000.0
deltax := (b - a) / n
wg.Add(int(n))
for i := 0.0; i < n; i++ {
go f(a, deltax, i)
}
wg.Wait()
fmt.Println(deltax * result.result)
fmt.Println(time.Now().Sub(now))
}
func f(a float64, deltax float64, i float64) {
fx := math.Sqrt(a + deltax * (i + 0.5))
result.lock.Lock()
result.result += fx
result.lock.Unlock()
wg.Done()
}

3- For performance gain, you may divide tasks per CPU cores without using lock sync.RWMutex:
+30x Optimizations using channels and runtime.NumCPU(), this takes 2ms on 2 Cores and 993µs on 8 Cores, while Your sample code takes 61ms on 2 Cores and 40ms on 8 Cores:
See this working sample code and outputs:
package main
import (
"fmt"
"math"
"runtime"
"time"
)
func main() {
nCPU := runtime.NumCPU()
fmt.Println("nCPU =", nCPU)
ch := make(chan float64, nCPU)
startTime := time.Now()
a := 0.0
b := 1.0
n := 100000.0
deltax := (b - a) / n
stepPerCPU := n / float64(nCPU)
for start := 0.0; start < n; {
stop := start + stepPerCPU
go f(start, stop, a, deltax, ch)
start = stop
}
integral := 0.0
for i := 0; i < nCPU; i++ {
integral += <-ch
}
fmt.Println(time.Now().Sub(startTime))
fmt.Println(deltax * integral)
}
func f(start, stop, a, deltax float64, ch chan float64) {
result := 0.0
for i := start; i < stop; i++ {
result += math.Sqrt(a + deltax*(i+0.5))
}
ch <- result
}
Output on 2 Cores:
nCPU = 2
2.0001ms
0.6666666685900485
Output on 8 Cores:
nCPU = 8
993µs
0.6666666685900456
Your sample code, Output on 2 Cores:
0.6666666685900424
61.0035ms
Your sample code, Output on 8 Cores:
0.6666666685900415
40.9964ms
2- For good benchmark statistics, use large number of samples (big n):
As you See here using 2 Cores this takes 110ms on 2 Cores, but on this same CPU
using 1 Core this takes 215ms with n := 10000000.0:
With n := 10000000.0 and single goroutine, see this working sample code:
package main
import (
"fmt"
"math"
"time"
)
func main() {
now := time.Now()
a := 0.0
b := 1.0
n := 10000000.0
deltax := (b - a) / n
result := 0.0
for i := 0.0; i < n; i++ {
result += math.Sqrt(a + deltax*(i+0.5))
}
fmt.Println(time.Now().Sub(now))
fmt.Println(deltax * result)
}
Output:
215.0123ms
0.6666666666685884
With n := 10000000.0 and 2 goroutines, see this working sample code:
package main
import (
"fmt"
"math"
"runtime"
"time"
)
func main() {
nCPU := runtime.NumCPU()
fmt.Println("nCPU =", nCPU)
ch := make(chan float64, nCPU)
startTime := time.Now()
a := 0.0
b := 1.0
n := 10000000.0
deltax := (b - a) / n
stepPerCPU := n / float64(nCPU)
for start := 0.0; start < n; {
stop := start + stepPerCPU
go f(start, stop, a, deltax, ch)
start = stop
}
integral := 0.0
for i := 0; i < nCPU; i++ {
integral += <-ch
}
fmt.Println(time.Now().Sub(startTime))
fmt.Println(deltax * integral)
}
func f(start, stop, a, deltax float64, ch chan float64) {
result := 0.0
for i := start; i < stop; i++ {
result += math.Sqrt(a + deltax*(i+0.5))
}
ch <- result
}
Output:
nCPU = 2
110.0063ms
0.6666666666686073
1- There is an optimum point for number of Goroutines, And from this point forward increasing number of Goroutines doesn't reduce program execution time:
On 2 Cores CPU, with the following code, The result is:
nCPU: 1, 2, 4, 8, 16
Time: 2.1601236s, 1.1220642s, 1.1060633s, 1.1140637s, 1.1380651s
As you see from nCPU=1 to nCPU=2 time decrease is big enough but after this point it is not much, so nCPU=2 on 2 Cores CPU is Optimum point for this Sample code, so using nCPU := runtime.NumCPU() is enough here.
package main
import (
"fmt"
"math"
"time"
)
func main() {
nCPU := 2 //2.1601236s#1 1.1220642s#2 1.1060633s#4 1.1140637s#8 1.1380651s#16
fmt.Println("nCPU =", nCPU)
ch := make(chan float64, nCPU)
startTime := time.Now()
a := 0.0
b := 1.0
n := 100000000.0
deltax := (b - a) / n
stepPerCPU := n / float64(nCPU)
for start := 0.0; start < n; {
stop := start + stepPerCPU
go f(start, stop, a, deltax, ch)
start = stop
}
integral := 0.0
for i := 0; i < nCPU; i++ {
integral += <-ch
}
fmt.Println(time.Now().Sub(startTime))
fmt.Println(deltax * integral)
}
func f(start, stop, a, deltax float64, ch chan float64) {
result := 0.0
for i := start; i < stop; i++ {
result += math.Sqrt(a + deltax*(i+0.5))
}
ch <- result
}

Unless the time taken by the activity in the goroutine takes a lot more time than needed to switch contexts, carry out the task and use a mutex to update a value, it would be faster to do it serially.
Take a look at a slightly modified version. All I've done is add a delay of 1 microsecond in the f() function.
package main
import (
"fmt"
"math"
"sync"
"time"
)
type Result struct {
result float64
lock sync.RWMutex
}
var wg sync.WaitGroup
var result Result
func main() {
fmt.Println("concurrent")
concurrent()
result.result = 0
fmt.Println("serial")
serial()
}
func concurrent() {
now := time.Now()
a := 0.0
b := 1.0
n := 100000.0
deltax := (b - a) / n
wg.Add(int(n))
for i := 0.0; i < n; i++ {
go f(a, deltax, i, true)
}
wg.Wait()
fmt.Println(deltax * result.result)
fmt.Println(time.Now().Sub(now))
}
func serial() {
now := time.Now()
a := 0.0
b := 1.0
n := 100000.0
deltax := (b - a) / n
for i := 0.0; i < n; i++ {
f(a, deltax, i, false)
}
fmt.Println(deltax * result.result)
fmt.Println(time.Now().Sub(now))
}
func f(a, deltax, i float64, concurrent bool) {
time.Sleep(1 * time.Microsecond)
fx := math.Sqrt(a + deltax*(i+0.5))
if concurrent {
result.lock.Lock()
result.result += fx
result.lock.Unlock()
wg.Done()
} else {
result.result += fx
}
}
With the delay, the result was as follows (the concurrent version is much faster):
concurrent
0.6666666685900424
624.914165ms
serial
0.6666666685900422
5.609195767s
Without the delay:
concurrent
0.6666666685900428
50.771275ms
serial
0.6666666685900422
749.166µs
As you can see, the longer it takes to complete a task, the more sense it makes to do it concurrently, if possible.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Goroutine performance - performance

Related

concurrency in go -Taking same time with different no of CPU

golang global variable access is slow in benchmark test

confusing concurrency and performance issue in Go

Benchmarking bad results

Concurrent integral calculation in Golang

Categories

Resources