Concurrency in Go: taking the same time with different numbers of CPUs

I am running a concurrent Go program under the two cases below and observed that it takes the same amount of time regardless of the number of CPUs it uses during execution.

Case 1: cpuUsed = 1
program took 3m20.973185s.

Case 2, after increasing the number of CPUs used: cpuUsed = 8
program took 3m20.9330516s.

Please see the Go code below for more details.
package main

import (
    "fmt"
    "math/rand"
    "runtime"
    "sync"
    "time"
)

var waitG sync.WaitGroup
var cpuUsed = 1
var maxRandomNums = 1000

func init() {
    maxCPU := runtime.NumCPU() // It'll give us the max CPU :)
    cpuUsed = 8                // getting same time taken for 1 and 8
    runtime.GOMAXPROCS(cpuUsed)
    fmt.Printf("Number of CPUs (Total=%d - Used=%d)\n", maxCPU, cpuUsed)
}

func main() {
    start := time.Now()
    ids := []string{"routine1", "routine2", "routine3", "routine4"}
    waitG.Add(4)
    for i := range ids {
        go numbers(ids[i])
    }
    waitG.Wait()
    elapsed := time.Since(start)
    fmt.Printf("\nprogram took %s.\n", elapsed)
}

func numbers(id string) {
    rand.Seed(time.Now().UnixNano())
    for i := 1; i <= maxRandomNums; i++ {
        time.Sleep(200 * time.Millisecond)
        fmt.Printf("%s-%d ", id, rand.Intn(20)+20)
    }
    waitG.Done()
}

You will find that:

total time (3m20s) = 200s = sleep(200ms) * loops(1000)

Each goroutine sleeps for 1000 × 200ms = 200s in sequence, and a sleeping goroutine does not occupy a CPU, so the four goroutines overlap completely and the wall time is the same no matter how many CPUs you allow. To make the CPU count matter, let's simplify your code and focus on CPU usage:

- Remove the Sleep, which does not use the CPU at all.
- fmt.Printf is stdio; it spends its time on I/O, not on the CPU.
- The random number does nothing but introduce uncertainty into the program, so remove it. The only code that uses the CPU inside the goroutine is rand.Intn(20)+20, so make it a constant addition.
- Increase maxRandomNums.

Then your code looks like this; run it again:
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

var waitG sync.WaitGroup
var cpuUsed = 1
var maxRandomNums = 1000000000

func init() {
    maxCPU := runtime.NumCPU() // It'll give us the max CPU :)
    cpuUsed = 8                // getting same time taken for 1 and 8
    runtime.GOMAXPROCS(cpuUsed)
    fmt.Printf("Number of CPUs (Total=%d - Used=%d)\n", maxCPU, cpuUsed)
}

func main() {
    start := time.Now()
    ids := []string{"routine1", "routine2", "routine3", "routine4"}
    waitG.Add(4)
    for i := range ids {
        go numbers(ids[i])
    }
    waitG.Wait()
    elapsed := time.Since(start)
    fmt.Printf("\nprogram took %s.\n", elapsed)
}

func numbers(id string) {
    // rand.Seed(time.Now().UnixNano())
    for i := 1; i <= maxRandomNums; i++ {
        // time.Sleep(200 * time.Millisecond)
        // fmt.Printf("%s-%d ", id, rand.Intn(20)+20)
        _ = i + 20 // stand-in for the removed rand.Intn(20)+20
    }
    waitG.Done()
}

Related

Parallel execution of prime finding algorithm slows runtime

So I implemented the following prime finding algorithm in go.

1. primes = []
2. Assume all numbers are primes (vacuously true).
3. check = 2
4. If check is still assumed to be prime, append it to primes.
5. Multiply check by each prime less than or equal to its minimum factor and eliminate the results from the assumed primes.
6. Increment check by 1 and repeat steps 4 through 6 until check > limit.

Here is my serial implementation:
package main

import (
    "fmt"
    "time"
)

type numWithMinFactor struct {
    number    int
    minfactor int
}

func pow(base int, power int) int {
    result := 1
    for i := 0; i < power; i++ {
        result *= base
    }
    return result
}

func process(check numWithMinFactor, primes []int, top int, minFactors []numWithMinFactor) {
    var n int
    for i := 0; primes[i] <= check.minfactor; i++ {
        n = check.number * primes[i]
        if n > top {
            break
        }
        minFactors[n] = numWithMinFactor{n, primes[i]}
        if i+1 == len(primes) {
            break
        }
    }
}

func findPrimes(top int) []int {
    primes := []int{}
    minFactors := make([]numWithMinFactor, top+2)
    check := 2
    for power := 1; check <= top; power++ {
        if minFactors[check].number == 0 {
            primes = append(primes, check)
            minFactors[check] = numWithMinFactor{check, check}
        }
        process(minFactors[check], primes, top, minFactors)
        check++
    }
    return primes
}

func main() {
    fmt.Println("Welcome to prime finder!")
    start := time.Now()
    fmt.Println(findPrimes(1000000))
    elapsed := time.Since(start)
    fmt.Printf("Finding primes took %s\n", elapsed)
}
This runs great, producing all the primes below 1,000,000 in about 63ms (mostly printing) and the primes below 10,000,000 in 600ms on my pc. Now, I figure that no number check with 2^n < check <= 2^(n+1) has a factor > 2^n, so once I have the primes up to 2^n I can do all the multiplications and eliminations for every check in that range in parallel. My parallel implementation is as follows:
package main

import (
    "fmt"
    "sync"
    "time"
)

type numWithMinFactor struct {
    number    int
    minfactor int
}

func pow(base int, power int) int {
    result := 1
    for i := 0; i < power; i++ {
        result *= base
    }
    return result
}

func process(check numWithMinFactor, primes []int, top int, minFactors []numWithMinFactor, wg *sync.WaitGroup) {
    defer wg.Done()
    var n int
    for i := 0; primes[i] <= check.minfactor; i++ {
        n = check.number * primes[i]
        if n > top {
            break
        }
        minFactors[n] = numWithMinFactor{n, primes[i]}
        if i+1 == len(primes) {
            break
        }
    }
}

func findPrimes(top int) []int {
    primes := []int{}
    minFactors := make([]numWithMinFactor, top+2)
    check := 2
    var wg sync.WaitGroup
    for power := 1; check <= top; power++ {
        for check <= pow(2, power) {
            if minFactors[check].number == 0 {
                primes = append(primes, check)
                minFactors[check] = numWithMinFactor{check, check}
            }
            wg.Add(1)
            go process(minFactors[check], primes, top, minFactors, &wg)
            check++
            if check > top {
                break
            }
        }
        wg.Wait()
    }
    return primes
}

func main() {
    fmt.Println("Welcome to prime finder!")
    start := time.Now()
    fmt.Println(findPrimes(1000000))
    elapsed := time.Since(start)
    fmt.Printf("Finding primes took %s\n", elapsed)
}
Unfortunately, this implementation is slower, taking 600ms up to 1,000,000 and 6 seconds up to 10 million. My intuition tells me that there is potential for parallelism to improve performance, but I clearly haven't been able to achieve that, and I would greatly appreciate any input on how to improve the runtime here, or more specifically any insight as to why the parallel solution is slower.

Additionally, the parallel solution consumes more memory than the serial solution, but that is to be expected; the serial solution can grind up to 1,000,000,000 in about 22 seconds, whereas the parallel solution runs out of memory on my system (32GB ram) going for the same target. But I'm asking about runtime here, not memory use. I could, for example, use the zero-value state of the minFactors array rather than a separate isPrime []bool true state, but I think it is more readable as is.

I've tried passing a pointer for primes []int, but that didn't seem to make a difference. Using a channel instead of passing the minFactors array to the process function resulted in big memory use and much (10x-ish) slower performance. I've rewritten this algorithm a couple of times to see if I could iron anything out, but no luck. Any insights or suggestions would be much appreciated, because I think parallelism could make this faster, not 10x slower!

Per #Volker's suggestion I limited the number of goroutines in flight to something less than my pc's available logical processors with the following revision; however, I am still getting runtimes that are 10x slower than the serial implementation.
package main

import (
    "fmt"
    "sync"
    "time"
)

type numWithMinFactor struct {
    number    int
    minfactor int
}

func pow(base int, power int) int {
    result := 1
    for i := 0; i < power; i++ {
        result *= base
    }
    return result
}

func process(check numWithMinFactor, primes []int, top int, minFactors []numWithMinFactor, wg *sync.WaitGroup) {
    defer wg.Done()
    var n int
    for i := 0; primes[i] <= check.minfactor; i++ {
        n = check.number * primes[i]
        if n > top {
            break
        }
        minFactors[n] = numWithMinFactor{n, primes[i]}
        if i+1 == len(primes) {
            break
        }
    }
}

func findPrimes(top int) []int {
    primes := []int{}
    minFactors := make([]numWithMinFactor, top+2)
    check := 2
    nlogicalProcessors := 20
    var wg sync.WaitGroup
    var twoPow int
    for power := 1; check <= top; power++ {
        twoPow = pow(2, power)
        for check <= twoPow {
            for nLogicalProcessorsInUse := 0; nLogicalProcessorsInUse < nlogicalProcessors; nLogicalProcessorsInUse++ {
                if minFactors[check].number == 0 {
                    primes = append(primes, check)
                    minFactors[check] = numWithMinFactor{check, check}
                }
                wg.Add(1)
                go process(minFactors[check], primes, top, minFactors, &wg)
                check++
                if check > top {
                    break
                }
                if check > twoPow {
                    break
                }
            }
            wg.Wait()
            if check > top {
                break
            }
        }
    }
    return primes
}

func main() {
    fmt.Println("Welcome to prime finder!")
    start := time.Now()
    fmt.Println(findPrimes(10000000))
    elapsed := time.Since(start)
    fmt.Printf("Finding primes took %s\n", elapsed)
}
tl;dr: Why is my parallel implementation slower than the serial implementation, and how do I make it faster?

Per #mh-cbon's suggestion I made larger jobs for parallel processing, resulting in the following code.
package main

import (
    "fmt"
    "sync"
    "time"
)

func pow(base int, power int) int {
    result := 1
    for i := 0; i < power; i++ {
        result *= base
    }
    return result
}

func process(check int, primes []int, top int, minFactors []int) {
    var n int
    for i := 0; primes[i] <= minFactors[check]; i++ {
        n = check * primes[i]
        if n > top {
            break
        }
        minFactors[n] = primes[i]
        if i+1 == len(primes) {
            break
        }
    }
}

func processRange(start int, end int, primes []int, top int, minFactors []int, wg *sync.WaitGroup) {
    defer wg.Done()
    for start <= end {
        process(start, primes, top, minFactors)
        start++
    }
}

func findPrimes(top int) []int {
    primes := []int{}
    minFactors := make([]int, top+2)
    check := 2
    nlogicalProcessors := 10
    var wg sync.WaitGroup
    var twoPow int
    var start int
    var end int
    var stepSize int
    var stepsTaken int
    for power := 1; check <= top; power++ {
        twoPow = pow(2, power)
        stepsTaken = 0
        stepSize = (twoPow / 2) / nlogicalProcessors
        for check <= twoPow {
            start = check
            end = check + stepSize
            if stepSize == 0 {
                end = twoPow
            }
            if stepsTaken == nlogicalProcessors-1 {
                end = twoPow
            }
            if end > top {
                end = top
            }
            for check <= end {
                if minFactors[check] == 0 {
                    primes = append(primes, check)
                    minFactors[check] = check
                }
                check++
            }
            wg.Add(1)
            go processRange(start, end, primes, top, minFactors, &wg)
            if check > top {
                break
            }
            if check > twoPow {
                break
            }
            stepsTaken++
        }
        wg.Wait()
        if check > top {
            break
        }
    }
    return primes
}

func main() {
    fmt.Println("Welcome to prime finder!")
    start := time.Now()
    fmt.Println(findPrimes(1000000))
    elapsed := time.Since(start)
    fmt.Printf("Finding primes took %s\n", elapsed)
}
This runs at a similar speed to the serial implementation.
So I did eventually get a parallel version of the code to run slightly faster than the serial version, following the suggestions from #mh-cbon (see above). However, this implementation did not yield vast improvements relative to the serial implementation (50ms up to 10 million, compared to 75ms serially). Considering that allocating and writing a []int for 0:10,000,000 takes 25ms, I'm not disappointed by these results. As #Volker stated, "such stuff often is not limited by CPU but by memory bandwidth," which I believe is the case here.

I would still love to see any additional improvements, but I am somewhat satisfied with what I've gained here.

Serial code running up to 2 billion: 19.4 seconds
Parallel code running up to 2 billion: 11.1 seconds
Initializing a []int for 2 billion: 4.5 seconds
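For reference, the initialization cost quoted above can be reproduced with a sketch like the following (mine, not the poster's code; absolute numbers will vary by machine):

package main

import (
    "fmt"
    "time"
)

func main() {
    start := time.Now()
    buf := make([]int, 10_000_000) // zeroed by the runtime
    for i := range buf {
        buf[i] = i // touch every element so the pages are really written
    }
    fmt.Println(len(buf), time.Since(start))
}

Timing this write-everything loop gives a floor on what any sieve over the same slice can achieve, which supports the memory-bandwidth explanation.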

Concurrent Bubble sort in golang

Can someone explain to me how the goroutines work in the following code? I wrote it, btw.

With BubbleSortVanilla, it takes roughly 15s for a list of size 100000.

With BubbleSortOdd followed by BubbleSortEven, using odd-even phases, it takes roughly 7s. But when I just do ConcurrentBubbleSort it only takes roughly 1.4s.

I can't really understand why the single ConcurrentBubbleSort is better. Is it because of the overhead of creating the two threads, each of which also processes the same, or well, half the length of the list?

I tried profiling the code, but I'm not really sure how to see how many threads are being created, the memory usage of each thread, etc.
package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"
)

func BubbleSortVanilla(intList []int) {
    for i := 0; i < len(intList)-1; i += 1 {
        if intList[i] > intList[i+1] {
            intList[i], intList[i+1] = intList[i+1], intList[i]
        }
    }
}

func BubbleSortOdd(intList []int, wg *sync.WaitGroup, c chan []int) {
    for i := 1; i < len(intList)-2; i += 2 {
        if intList[i] > intList[i+1] {
            intList[i], intList[i+1] = intList[i+1], intList[i]
        }
    }
    wg.Done()
}

func BubbleSortEven(intList []int, wg *sync.WaitGroup, c chan []int) {
    for i := 0; i < len(intList)-1; i += 2 {
        if intList[i] > intList[i+1] {
            intList[i], intList[i+1] = intList[i+1], intList[i]
        }
    }
    wg.Done()
}

func ConcurrentBubbleSort(intList []int, wg *sync.WaitGroup, c chan []int) {
    for i := 0; i < len(intList)-1; i += 1 {
        if intList[i] > intList[i+1] {
            intList[i], intList[i+1] = intList[i+1], intList[i]
        }
    }
    wg.Done()
}

func main() {
    // defer profile.Start(profile.MemProfile).Stop()
    rand.Seed(time.Now().Unix())
    intList := rand.Perm(100000)
    fmt.Println("Read a sequence of", len(intList), "elements")
    c := make(chan []int, len(intList))
    var wg sync.WaitGroup
    start := time.Now()
    for j := 0; j < len(intList)-1; j++ {
        // BubbleSortVanilla(intList) // takes roughly 15s
        // wg.Add(2)
        // go BubbleSortOdd(intList, &wg, c) // takes roughly 7s
        // go BubbleSortEven(intList, &wg, c)
        wg.Add(1)
        go ConcurrentBubbleSort(intList, &wg, c) // takes roughly 1.4s
    }
    wg.Wait()
    elapsed := time.Since(start)
    // Print the sorted integers
    fmt.Println("Sorted List: ", len(intList), "in", elapsed)
}
Your code is not working at all. Both ConcurrentBubbleSort and BubbleSortOdd + BubbleSortEven cause a data race. Try running your code with go run -race main.go. Because of the data race, the data in the array will be incorrect after the sort, and it is not sorted either.

Why is it slow? I guess it is because of the data race, and because there are too many goroutines all racing on the same memory.
The Thread Analyzer detects data races that occur during the execution of a multi-threaded process. A data race occurs when:

- two or more threads in a single process access the same memory location concurrently, and
- at least one of the accesses is for writing, and
- the threads are not using any exclusive locks to control their accesses to that memory.
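For contrast, here is a minimal sketch (mine, not the poster's code) of a data-race-free parallel odd-even transposition sort: within a single phase, every compare-swap touches a disjoint pair of elements, so the pairs can be split across goroutines without locks, and a WaitGroup acts as the barrier between phases.

package main

import (
    "fmt"
    "math/rand"
    "sort"
    "sync"
)

// oddEvenSort runs n phases; even phases compare pairs (0,1), (2,3), ...
// and odd phases compare pairs (1,2), (3,4), ... Pairs within one phase
// never overlap, so splitting them across goroutines is race-free.
func oddEvenSort(a []int, workers int) {
    n := len(a)
    for phase := 0; phase < n; phase++ {
        first := phase % 2 // parity of the left index in this phase
        chunk := (n + workers - 1) / workers
        var wg sync.WaitGroup
        for w := 0; w < workers; w++ {
            lo, hi := w*chunk, (w+1)*chunk
            if hi > n-1 {
                hi = n - 1
            }
            if lo >= hi {
                continue
            }
            wg.Add(1)
            go func(lo, hi int) {
                defer wg.Done()
                i := lo
                if i%2 != first { // align to this phase's parity
                    i++
                }
                for ; i < hi; i += 2 {
                    if a[i] > a[i+1] {
                        a[i], a[i+1] = a[i+1], a[i]
                    }
                }
            }(lo, hi)
        }
        wg.Wait() // barrier: the next phase must not start early
    }
}

func main() {
    a := rand.Perm(10000)
    oddEvenSort(a, 4)
    fmt.Println("sorted:", sort.IntsAreSorted(a))
}

Running this with go run -race should keep the detector quiet; note that each phase pays for a full wg.Wait() barrier, which is why this pattern only pays off for large slices.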

Unable to use goroutines concurrently to find max until context is cancelled

I have successfully made a synchronous solution, without goroutines, that finds the max over a number of compute calls.
package main

import (
    "context"
    "fmt"
    "math/rand"
    "time"
)

func findMax(ctx context.Context, concurrency int) uint64 {
    var (
        max uint64 = 0
        num uint64 = 0
    )
    for i := 0; i < concurrency; i++ {
        num = compute()
        if num > max {
            max = num
        }
    }
    return max
}

func compute() uint64 {
    // NOTE: This is a MOCK implementation of the blocking operation.
    time.Sleep(time.Duration(rand.Int63n(100)) * time.Millisecond)
    return rand.Uint64()
}

func main() {
    maxDuration := 2 * time.Second
    concurrency := 10
    ctx, cancel := context.WithTimeout(context.Background(), maxDuration)
    defer cancel()
    max := findMax(ctx, concurrency)
    fmt.Println(max)
}
https://play.golang.org/p/lYXRNTDtNCI
When I attempt to use goroutines so that findMax repeatedly calls the compute function with as many goroutines as possible until the context ctx is cancelled by the caller (main), I get 0 every time instead of the expected max of the goroutines' compute calls. I have tried different ways to do it and get a deadlock most of the time.
package main

import (
    "context"
    "fmt"
    "math/rand"
    "time"
)

func findMax(ctx context.Context, concurrency int) uint64 {
    var (
        max uint64 = 0
        num uint64 = 0
    )
    for i := 0; i < concurrency; i++ {
        select {
        case <-ctx.Done():
            return max
        default:
            go func() {
                num = compute()
                if num > max {
                    max = num
                }
            }()
        }
    }
    return max
}

func compute() uint64 {
    // NOTE: This is a MOCK implementation of the blocking operation.
    time.Sleep(time.Duration(rand.Int63n(100)) * time.Millisecond)
    return rand.Uint64()
}

func main() {
    maxDuration := 2 * time.Second
    concurrency := 10
    ctx, cancel := context.WithTimeout(context.Background(), maxDuration)
    defer cancel()
    max := findMax(ctx, concurrency)
    fmt.Println(max)
}
https://play.golang.org/p/3fFFq2xlXAE
Your program has multiple problems:

1. You are spawning multiple goroutines that operate on the shared variables max and num, leading to a data race, as they are not protected (e.g., by a mutex).

2. num is modified by every worker goroutine, but it should have been local to each worker; otherwise a computed result can be lost (e.g., one worker goroutine computes a result and stores it in num, but right after that a second worker computes and replaces the value of num).

num = compute() // should be "num := compute()"

3. You are not waiting for every goroutine to finish its computation, which can produce incorrect results because not every worker's computation is taken into account even though the context wasn't cancelled. Use sync.WaitGroup or channels to fix this.
Here's a sample program that addresses most of the issues in your code:
package main

import (
    "context"
    "fmt"
    "math/rand"
    "sync"
    "time"
)

type result struct {
    sync.RWMutex
    max uint64
}

func findMax(ctx context.Context, workers int) uint64 {
    var (
        res = result{}
        wg  = sync.WaitGroup{}
    )
    for i := 0; i < workers; i++ {
        select {
        case <-ctx.Done():
            // RLock to read res.max
            res.RLock()
            ret := res.max
            res.RUnlock()
            return ret
        default:
            wg.Add(1)
            go func() {
                defer wg.Done()
                num := compute()
                // Lock so that the read from res.max and the write
                // to res.max are safe. Else, a data race could occur.
                res.Lock()
                if num > res.max {
                    res.max = num
                }
                res.Unlock()
            }()
        }
    }
    // Wait for all the goroutines to finish work, i.e., all
    // workers are done computing and updating the max.
    wg.Wait()
    return res.max
}

func compute() uint64 {
    rnd := rand.Int63n(100)
    time.Sleep(time.Duration(rnd) * time.Millisecond)
    return rand.Uint64()
}

func main() {
    maxDuration := 2 * time.Second
    concurrency := 10
    ctx, cancel := context.WithTimeout(context.Background(), maxDuration)
    defer cancel()
    fmt.Println(findMax(ctx, concurrency))
}
As #Brits pointed out in the comments, when the context is cancelled, make sure you also stop the worker goroutines from processing (if possible), because their results are no longer needed.
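A minimal sketch of that idea, assuming the blocking operation can be made context-aware (computeCtx below is a hypothetical variant of the mock compute, not part of the original code):

package main

import (
    "context"
    "fmt"
    "math/rand"
    "sync"
    "time"
)

// computeCtx is a hypothetical context-aware variant of compute: it gives
// up as soon as the context is cancelled instead of finishing its "work".
func computeCtx(ctx context.Context) (num uint64, ok bool) {
    select {
    case <-time.After(time.Duration(rand.Int63n(100)) * time.Millisecond):
        return rand.Uint64(), true
    case <-ctx.Done():
        return 0, false // cancelled: no result produced
    }
}

func findMax(ctx context.Context, workers int) uint64 {
    var (
        mu  sync.Mutex
        max uint64
        wg  sync.WaitGroup
    )
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            num, ok := computeCtx(ctx)
            if !ok {
                return // lost the race against cancellation
            }
            mu.Lock()
            if num > max {
                max = num
            }
            mu.Unlock()
        }()
    }
    wg.Wait()
    return max
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
    defer cancel()
    fmt.Println(findMax(ctx, 10))
}

A worker that is cancelled mid-work simply contributes nothing, so findMax returns whatever the workers that finished in time produced.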

Sleep by fraction of integer

I am trying to write a program that pauses for random intervals of time that are fractional (decimal) numbers of seconds.

Here is the program that is not working:
package main

import (
    "fmt"
    "math/rand"
    "time"
)

func main() {
    intervalGenerate()
}

func intervalGenerate() {
    var randint float64
    rand.Seed(time.Now().UnixNano())
    randInterval := randFloats(3, 7, 1)
    randint = randInterval[0]
    duration := time.Duration(randint) * time.Second
    fmt.Println("Sleeping for", duration)
    time.Sleep(duration)
    fmt.Println("Resuming")
}

func randFloats(min, max float64, n int) []float64 {
    res := make([]float64, n)
    for i := range res {
        res[i] = min + rand.Float64()*(max-min)
    }
    return res
}
Currently a random number is generated between 3 and 7 (including decimals), but the sleep is truncated to a whole number of seconds: converting the float64 to a time.Duration discards the fractional part before the multiplication by time.Second.

From what I understand, the reason this fails is that Sleep takes a Duration, which is an int64:

func Sleep(d Duration)

Is there a way to sleep a program for fractions of a second?
Go Playground:
https://play.golang.org/p/z-dnDBnUfxr
Use time.Millisecond, time.Microsecond, or time.Nanosecond depending on the level of granularity you need.
// sleep for 2.5 seconds
milliseconds := 2500
time.Sleep(time.Duration(milliseconds) * time.Millisecond)
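Alternatively, since a time.Duration is just an int64 count of nanoseconds, you can keep the float64 interval and multiply it by float64(time.Second) before converting, so the fraction survives. A minimal sketch adapted from the question's code:

package main

import (
    "fmt"
    "math/rand"
    "time"
)

func main() {
    rand.Seed(time.Now().UnixNano())
    seconds := 3 + rand.Float64()*4 // random float64 in [3, 7)

    // Multiply first, then convert: the fractional seconds are
    // preserved as nanoseconds instead of being truncated away.
    d := time.Duration(seconds * float64(time.Second))

    fmt.Println("Sleeping for", d)
    time.Sleep(d)
    fmt.Println("Resuming")
}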

Access with mutex has more speed on more cpu

I tried some synchronization techniques to share state between goroutines and found that the incorrect variant (without synchronization) runs more slowly than the same program with a mutex.

Given the code:
package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var wg sync.WaitGroup
    var mu sync.Mutex
    hash := make(map[string]string)
    hash["test"] = "string"
    num := 40000000
    wg.Add(num)
    start := time.Now()
    for i := 0; i < num; i++ {
        go func() {
            mu.Lock()
            _, _ = hash["test"]
            mu.Unlock()
            wg.Done()
        }()
    }
    wg.Wait()
    fmt.Println(time.Since(start))
}
This runs on my laptop, with 8 HT cores, in 9-10 seconds.

But if I just remove the sync, it runs in 11-12 seconds:
package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var wg sync.WaitGroup
    hash := make(map[string]string)
    hash["test"] = "string"
    num := 40000000
    wg.Add(num)
    start := time.Now()
    for i := 0; i < num; i++ {
        go func() {
            _, _ = hash["test"]
            wg.Done()
        }()
    }
    wg.Wait()
    fmt.Println(time.Since(start))
}
The synced version is faster and utilizes the CPU much more. The question is why?

My thought is that it has to do with how the goroutines are scheduled and with the overhead of context switching, because the larger GOMAXPROCS is, the greater the gap between these two versions. But I can't explain the real reason for what happens under the hood of the scheduler.
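The question is left open here, but one way to look under the hood of the scheduler is an execution trace. Below is a minimal sketch using the standard runtime/trace package; wrap whichever of the two variants you want to inspect where the comment indicates, then explore the recording with go tool trace trace.out:

package main

import (
    "log"
    "os"
    "runtime/trace"
)

func main() {
    f, err := os.Create("trace.out")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    // Record scheduler, goroutine and GC events for the whole run.
    if err := trace.Start(f); err != nil {
        log.Fatal(err)
    }
    defer trace.Stop()

    // ... run either version of the benchmark loop here ...
}

The trace viewer shows per-P goroutine execution, so the difference in scheduling behavior between the mutex and no-mutex versions becomes directly visible.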
