It's concurrent, but what makes it run in parallel? - go

So I'm trying to understand how parallel computing works while also learning Go. I understand the difference between concurrency and parallelism; what I'm a little stuck on, however, is how Go (or the OS) determines that something should be executed in parallel...
Is there something I have to do when writing my code, or is it all handled by the schedulers?
In the example below, I have two functions that are run in separate goroutines using the go keyword. Because the default GOMAXPROCS is the number of processors available on your machine (and I'm also explicitly setting it), I would expect that these two functions run at the same time, and thus that the output would be a mix of numbers in no particular order - and furthermore that each time it is run the output would be different. However, this is not the case. Instead, they run one after the other, and to make matters more confusing, function two runs before function one.
Code:
package main

import (
    "fmt"
    "runtime"
    "sync"
)

func main() {
    runtime.GOMAXPROCS(6)
    var wg sync.WaitGroup
    wg.Add(2)

    fmt.Println("Starting")

    go func() {
        defer wg.Done()
        for smallNum := 0; smallNum < 20; smallNum++ {
            fmt.Printf("%v ", smallNum)
        }
    }()

    go func() {
        defer wg.Done()
        for bigNum := 100; bigNum > 80; bigNum-- {
            fmt.Printf("%v ", bigNum)
        }
    }()

    fmt.Println("Waiting to finish")
    wg.Wait()

    fmt.Println("\nFinished, Now terminating")
}
Output:
go run main.go
Starting
Waiting to finish
100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Finished, Now terminating
I am following this article, although just about every example I've come across does something similar.
Concurrency, Goroutines and GOMAXPROCS
Is this working the way it should and I'm not understanding something correctly, or is my code not right?

Is there something I have to do when writing my code,
No.
or is it all handled by the schedulers?
Yes.
In the example below, I have two functions that are run in separate goroutines using the go keyword. Because the default GOMAXPROCS is the number of processors available on your machine (and I'm also explicitly setting it), I would expect that these two functions run at the same time
They might or might not, you have no control here.
and thus that the output would be a mix of numbers in no particular order - and furthermore that each time it is run the output would be different. However, this is not the case. Instead, they run one after the other, and to make matters more confusing, function two runs before function one.
Yes. Again you cannot force parallel computation.
Your test is flawed: you just don't do much in each goroutine. In your example, goroutine 2 might be scheduled to run, start running, and complete before goroutine 1 has even started. "Starting" a goroutine with go doesn't force it to start executing right away; all that happens is that a new goroutine is created which can run. From all the goroutines which can run, some are scheduled onto your processors. All this scheduling cannot be controlled; it is fully automatic. As you seem to know, this is the difference between concurrent and parallel: you have control over concurrency in Go, but not (much) over what is actually done in parallel on two or more cores.
More realistic examples with actual, long-running goroutines which do actual work will show interleaved output.
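For instance, here is a sketch of the question's program where each goroutine does some throwaway arithmetic per iteration. With that much work per step, the scheduler will almost certainly interleave the two goroutines, though the exact mixing still varies per run and per machine:

package main

import (
    "fmt"
    "sync"
)

// burn does some throwaway arithmetic so that each loop iteration
// takes long enough for the scheduler to interleave the goroutines.
func burn() int {
    sum := 0
    for i := 0; i < 1_000_000; i++ {
        sum += i
    }
    return sum
}

func main() {
    var wg sync.WaitGroup
    wg.Add(2)

    go func() {
        defer wg.Done()
        for smallNum := 0; smallNum < 20; smallNum++ {
            burn()
            fmt.Printf("%v ", smallNum)
        }
    }()

    go func() {
        defer wg.Done()
        for bigNum := 100; bigNum > 80; bigNum-- {
            burn()
            fmt.Printf("%v ", bigNum)
        }
    }()

    wg.Wait()
    fmt.Println("\ndone")
}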

It's all handled by the scheduler.
With only two loops of 20 short instructions, you will be hard pressed to see the effects of concurrency or parallelism.
Here is another toy example: https://play.golang.org/p/xPKITzKACZp

package main

import (
    "fmt"
    "runtime"
    "sync"
    "sync/atomic"
    "time"
)

const (
    ConstMaxProcs  = 2
    ConstRunners   = 4
    ConstLoopcount = 1_000_000
)

func runner(id int, wg *sync.WaitGroup, cptr *int64) {
    var times int
    for i := 0; i < ConstLoopcount; i++ {
        val := atomic.AddInt64(cptr, 1)
        if val > 1 {
            times++
        }
        atomic.AddInt64(cptr, -1)
    }
    fmt.Printf("[runner %d] cptr was > 1 on %d occasions\n", id, times)
    wg.Done()
}

func main() {
    runtime.GOMAXPROCS(ConstMaxProcs)

    var cptr int64
    wg := &sync.WaitGroup{}
    wg.Add(ConstRunners)

    start := time.Now()
    for id := 1; id <= ConstRunners; id++ {
        go runner(id, wg, &cptr)
    }
    wg.Wait()

    fmt.Printf("completed in %s\n", time.Since(start))
}
As with your example : you don't have control on the scheduler, this example just has more "surface" to witness some effects of concurrency.
It's hard to witness the actual difference between concurrency and parallelism from within the program; you can watch your processor's activity while it runs, or check the total execution time.
The playground does not give sub-second precision on its clock, so if you want to see the actual timing, copy/paste the code into a local file and tune the constants to see the various effects.
Note that some other effects (probably : branch prediction on the if val > 1 {...} check and/or memory invalidation around the shared cptr variable) make the execution very volatile on my machine, so don't expect a straight "running with ConstMaxProcs = 4 is 4 times quicker than ConstMaxProcs = 1".


What happens when reading or writing concurrently without a mutex

In Go, a sync.Mutex or chan is used to prevent concurrent access of shared objects. However, in some cases I am just interested in the "latest" value of a variable or field of an object.
Or I would like to write a value and do not care whether another goroutine overwrites it later or has just overwritten it before.
Update: TLDR; Just don't do this. It is not safe. Read the answers, comments, and linked documents!
Update 2021: The Go memory model is going to be specified more thoroughly, and there are three great articles by Russ Cox that will teach you more about the surprising effects of unsynchronized memory access. These articles summarize much of the discussion and learnings below.
Here are two variants, good and bad, of an example program, where both seem to produce "correct" output using the current Go runtime:
package main

import (
    "flag"
    "fmt"
    "math/rand"
    "time"
)

var bogus = flag.Bool("bogus", false, "use bogus code")

func pause() {
    time.Sleep(time.Duration(rand.Uint32()%100) * time.Millisecond)
}

func bad() {
    stop := time.After(100 * time.Millisecond)
    var name string

    // start some producers doing concurrent writes (DANGER!)
    for i := 0; i < 10; i++ {
        go func(i int) {
            pause()
            name = fmt.Sprintf("name = %d", i)
        }(i)
    }

    // start consumer that shows the current value every 10ms
    go func() {
        tick := time.Tick(10 * time.Millisecond)
        for {
            select {
            case <-stop:
                return
            case <-tick:
                fmt.Println("read:", name)
            }
        }
    }()

    <-stop
}

func good() {
    stop := time.After(100 * time.Millisecond)
    names := make(chan string, 10)

    // start some producers concurrently writing to a channel (GOOD!)
    for i := 0; i < 10; i++ {
        go func(i int) {
            pause()
            names <- fmt.Sprintf("name = %d", i)
        }(i)
    }

    // start consumer that shows the current value every 10ms
    go func() {
        tick := time.Tick(10 * time.Millisecond)
        var name string
        for {
            select {
            case name = <-names:
            case <-stop:
                return
            case <-tick:
                fmt.Println("read:", name)
            }
        }
    }()

    <-stop
}

func main() {
    flag.Parse()
    if *bogus {
        bad()
    } else {
        good()
    }
}
The expected output is as follows:
...
read: name = 3
read: name = 3
read: name = 5
read: name = 4
...
Any combination of read: and read: name=[0-9] is correct output for this program. Receiving any other string as output would be an error.
When running this program with go run -race bogus.go, it is safe (the race detector reports nothing).
However, go run -race bogus.go -bogus warns of the concurrent reads and writes.
For map types and when appending to slices I always need a mutex or a similar method of protection to avoid segfaults or unexpected behavior. However, reading and writing literals (atomic values) to variables or field values seems to be safe.
Question: Which Go data types can I safely read and safely write concurrently without a mutex and without producing segfaults and without reading garbage from memory?
Please explain why something is safe or unsafe in Go in your answer.
Update: I rewrote the example to better reflect the original code, where I had the concurrent-writes issue. The important learnings are already in the comments. I will accept an answer that summarizes these learnings with enough detail (esp. on the Go runtime).
However, in some cases I am just interested in the latest value of a variable or field of an object.
Here is the fundamental problem: What does the word "latest" mean?
Suppose that, mathematically speaking, we have a sequence of values Xi, with 0 <= i < N. Then obviously Xj is "later than" Xi if j > i. That's a nice simple definition of "latest" and is probably the one you want.
But when two separate CPUs within a single machine—including two goroutines in a Go program—are working at the same time, time itself loses meaning. We cannot say whether i < j, i == j, or i > j. So there is no correct definition for the word latest.
To solve this kind of problem, modern CPU hardware and Go as a programming language give us certain synchronization primitives. If CPUs A and B execute memory fence instructions, or synchronization instructions, or use whatever other hardware provisions exist, the CPUs (and/or some external hardware) will insert whatever is required for the notion of "time" to regain its meaning. That is, if the CPU uses barrier instructions, we can say that a memory load or store that was executed before the barrier is a "before" and a memory load or store that is executed after the barrier is an "after".
(The actual implementation, in some modern hardware, consists of load and store buffers that can rearrange the order in which loads and stores go to memory. The barrier instruction either synchronizes the buffers, or places an actual barrier in them, so that loads and stores cannot move across the barrier. This particular concrete implementation gives an easy way to think about the problem, but isn't complete: you should think of time as simply not existing outside the hardware-provided synchronization, i.e., all loads from, and stores to, some location are happening simultaneously, rather than in some sequential order, except for these barriers.)
In any case, Go's sync package gives you a simple high level access method to these kinds of barriers. Compiled code that executes before a mutex Lock call really does complete before the lock function returns, and the code that executes after the call really does not start until after the lock function returns.
Go's channels provide the same kinds of before/after time guarantees.
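A minimal sketch of the channel guarantee: a send on a channel happens before the corresponding receive completes, so the receiving goroutine is guaranteed to observe writes made before the send:

package main

import "fmt"

var msg string

func main() {
    done := make(chan struct{})
    go func() {
        msg = "hello" // this write happens before the send below...
        done <- struct{}{}
    }()
    <-done           // ...and the send happens before this receive completes,
    fmt.Println(msg) // so this is guaranteed to print "hello"
}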
Go's sync/atomic package provides much lower level guarantees. In general you should avoid this in favor of the higher level channel or sync.Mutex style guarantees. (Edit to add note: You could use sync/atomic's Pointer operations here, but not with the string type directly, as Go strings are actually implemented as a header containing two separate values: a pointer, and a length. You could solve this with another layer of indirection, by updating a pointer that points to the string object. But before you even consider doing that, you should benchmark the use of the language's preferred methods and verify that these are a problem, because code that works at the sync/atomic level is hard to write and hard to debug.)
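If you did go down that road anyway, a sketch of the extra indirection could look like this; it assumes Go 1.19+ for the generic atomic.Pointer (older code would reach for atomic.Value instead):

package main

import (
    "fmt"
    "sync/atomic"
)

func main() {
    var name atomic.Pointer[string]
    initial := ""
    name.Store(&initial)

    done := make(chan struct{})
    go func() {
        s := "name = 1"
        name.Store(&s) // atomically swap in a pointer to a new string
        close(done)
    }()
    <-done

    // Load always yields a pointer to a complete string; the two-word
    // string header is never observed half-written.
    fmt.Println(*name.Load())
}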
Which Go data types can I safely read and safely write concurrently without a mutex and without producing segfaults and without reading garbage from memory?
None.
It really is that simple: you cannot, under any circumstance whatsoever, read and write concurrently to anything in Go.
(Btw: your "correct" program is not correct; it is racy, and even if you got rid of the race condition it would not deterministically produce the output.)
Why not use channels?
package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup // wait group used to close the channel
    var buffer int = 1    // buffer size of the channel

    // channel carrying the shared data
    cName := make(chan string, buffer)

    for i := 0; i < 10; i++ {
        wg.Add(1) // add to wait group
        go func(i int) {
            cName <- fmt.Sprintf("name = %d", i)
            wg.Done() // decrease wait group
        }(i)
    }

    go func() {
        wg.Wait()    // wait for the wait group to reach 0
        close(cName) // then close the channel
    }()

    // process all the data
    for n := range cName {
        println("read:", n)
    }
}
The above code produced the following output (the exact order will vary from run to run):
read: name = 0
read: name = 5
read: name = 1
read: name = 2
read: name = 3
read: name = 4
read: name = 7
read: name = 6
read: name = 8
read: name = 9
https://play.golang.org/p/R4n9ssPMOeS
Article about channels

Run a benchmark in parallel, i.e. simulate simultaneous requests

When testing a database procedure invoked from an API, when it runs sequentially, it seems to run consistently within ~3s. However, we've noticed that when several requests come in at the same time, it can take much longer, causing timeouts. I am trying to reproduce the "several requests at one time" case as a Go test.
I tried the -parallel 10 go test flag, but the timings were the same at ~28s.
Is there something wrong with my benchmark function?
func Benchmark_RealCreate(b *testing.B) {
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        name := randomdata.SillyName()
        r := gofight.New()
        u := []unit{{MefeUnitID: name, MefeCreatorUserID: "user", BzfeCreatorUserID: 55, ClassificationID: 2, UnitName: name, UnitDescriptionDetails: "Up on the hills and testing"}}
        uJSON, _ := json.Marshal(u)
        r.POST("/create").
            SetBody(string(uJSON)).
            Run(h.BasicEngine(), func(r gofight.HTTPResponse, rq gofight.HTTPRequest) {
                assert.Contains(b, r.Body.String(), name)
                assert.Equal(b, http.StatusOK, r.Code)
            })
    }
}
Otherwise, how can I achieve what I am after?
The -parallel flag is not for running the same test or benchmark in parallel, in multiple instances.
Quoting from Command go: Testing flags:
-parallel n
Allow parallel execution of test functions that call t.Parallel.
The value of this flag is the maximum number of tests to run
simultaneously; by default, it is set to the value of GOMAXPROCS.
Note that -parallel only applies within a single test binary.
The 'go test' command may run tests for different packages
in parallel as well, according to the setting of the -p flag
(see 'go help build').
So basically, if your tests allow it, you can use -parallel to run multiple distinct test or benchmark functions in parallel, but not the same one in multiple instances.
In general, running multiple benchmark functions in parallel defeats the purpose of benchmarking a function, because running them in parallel usually distorts the measurements.
However, in your case code efficiency is not what you want to measure, you want to measure an external service. So go's built-in testing and benchmarking facilities are not really suitable.
Of course we could still use the convenience of having this "benchmark" run automatically when our other tests and benchmarks run, but you should not force this into the conventional benchmarking framework.
First thing that comes to mind is to use a for loop to launch n goroutines which all attempt to call the testable service. One problem with this is that this only ensures n concurrent goroutines at the start, because as the calls start to complete, there will be less and less concurrency for the remaining ones.
To overcome this and truly test n concurrent calls, you should have a worker pool with n workers, and continuously feed jobs to this worker pool, making sure there will be n concurrent service calls at all times. For a worker pool implementation, see Is this an idiomatic worker thread pool in Go?
So all in all, fire up a worker pool with n workers, have a goroutine send jobs to it for an arbitrary time (e.g. for 30 seconds or 1 minute), and measure (count) the completed jobs. The benchmark result will be a simple division.
Also note that for solely testing purposes, a worker pool might not even be needed. You can just use a loop to launch n goroutines, but make sure each started goroutine keeps calling the service and not return after a single call.
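Here is a minimal sketch of that simpler timed variant. callService is a hypothetical stand-in for your real request, and the final calls-per-second figure is the simple division mentioned above:

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
    "time"
)

// callService is a hypothetical placeholder for the real API call.
func callService() { time.Sleep(50 * time.Millisecond) }

// measureThroughput keeps n goroutines calling the service until the
// deadline, then reports completed calls per second.
func measureThroughput(n int, d time.Duration) float64 {
    var completed int64
    deadline := time.Now().Add(d)

    var wg sync.WaitGroup
    wg.Add(n)
    for i := 0; i < n; i++ {
        go func() {
            defer wg.Done()
            for time.Now().Before(deadline) {
                callService()
                atomic.AddInt64(&completed, 1)
            }
        }()
    }
    wg.Wait()

    return float64(completed) / d.Seconds()
}

func main() {
    fmt.Printf("%.1f calls/sec with 10 concurrent callers\n",
        measureThroughput(10, 2*time.Second))
}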
I'm new to go, but why don't you try to make a function and run it using the standard parallel test?
func Benchmark_YourFunc(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            YourFunc(/* your args */)
        }
    })
}
Your example code mixes several things. Why are you using assert there? This is not a test, it is a benchmark; if the assert methods are slow, your benchmark will be too.
You also moved the parallel execution out of your code and into the test command. You should try to make parallel requests using concurrency. Here is one possibility for how to start:
func executeRoutines(routines int) {
    wg := &sync.WaitGroup{}
    wg.Add(routines)

    starter := make(chan struct{})
    for i := 0; i < routines; i++ {
        go func() {
            <-starter
            // your request here
            wg.Done()
        }()
    }

    close(starter)
    wg.Wait()
}
https://play.golang.org/p/ZFjUodniDHr
We start some goroutines here, which all wait until starter is closed, so you can put your request directly after that line. To make the function wait until all the requests are done, we use a WaitGroup.
BUT IMPORTANT: Go just supports concurrency. If your system does not have 10 cores, the 10 goroutines will not run in parallel, so ensure that you have enough cores available.
With this start you can play around a little bit. You could call this function inside your benchmark, as in the sketch below, and you could also play with the number of goroutines.
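For example, one (hypothetical) way to wire it into a benchmark, reusing executeRoutines from above:

func Benchmark_Concurrent10(b *testing.B) {
    for n := 0; n < b.N; n++ {
        executeRoutines(10) // 10 simultaneous requests per iteration
    }
}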
As the documentation indicates, the parallel flag is to allow multiple different tests to be run in parallel. You generally do not want to run benchmarks in parallel because that would run different benchmarks at the same time, throwing off the results for all of them. If you want to benchmark parallel traffic, you need to write parallel traffic generation into your test. You need to decide how this should work with b.N which is your work factor; I would probably use it as the total request count, and write a benchmark or multiple benchmarks testing different concurrent load levels, e.g.:
func Benchmark_RealCreate(b *testing.B) {
    concurrencyLevels := []int{5, 10, 20, 50}
    for _, clients := range concurrencyLevels {
        b.Run(fmt.Sprintf("%d_clients", clients), func(b *testing.B) {
            sem := make(chan struct{}, clients)
            wg := sync.WaitGroup{}
            for n := 0; n < b.N; n++ {
                wg.Add(1)
                go func() {
                    name := randomdata.SillyName()
                    r := gofight.New()
                    u := []unit{{MefeUnitID: name, MefeCreatorUserID: "user", BzfeCreatorUserID: 55, ClassificationID: 2, UnitName: name, UnitDescriptionDetails: "Up on the hills and testing"}}
                    uJSON, _ := json.Marshal(u)
                    sem <- struct{}{}
                    r.POST("/create").
                        SetBody(string(uJSON)).
                        Run(h.BasicEngine(), func(r gofight.HTTPResponse, rq gofight.HTTPRequest) {})
                    <-sem
                    wg.Done()
                }()
            }
            wg.Wait()
        })
    }
}
Note here I removed the initial ResetTimer; the timer doesn't start until your benchmark function is called, so calling it as the first operation in your function is pointless. It's intended for cases where you have time-consuming setup prior to the benchmark loop that you don't want included in the benchmark results. I've also removed the assertions, because this is a benchmark, not a test; assertions are for validity checking in tests and only serve to throw off timing results in benchmarks.
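For contrast, here is a sketch of the kind of benchmark where ResetTimer does earn its keep; buildFixture and process are hypothetical placeholders:

package main

import "testing"

// buildFixture stands in for expensive setup we don't want measured.
func buildFixture() []int {
    data := make([]int, 1<<20)
    for i := range data {
        data[i] = i
    }
    return data
}

// process stands in for the code under test.
func process(data []int) int {
    sum := 0
    for _, v := range data {
        sum += v
    }
    return sum
}

func BenchmarkProcess(b *testing.B) {
    data := buildFixture() // time-consuming setup, excluded from timing
    b.ResetTimer()         // start measuring only from here
    for n := 0; n < b.N; n++ {
        process(data)
    }
}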
One thing is benchmarking (measuring the time code takes to run); another is load/stress testing.
The -parallel flag, as stated above, allows a set of tests to execute in parallel, so the test set executes faster; it does not execute some test N times in parallel.
But it is simple to achieve what you want (execution of the same test N times). Below is a very simple (really quick and dirty) example, just to clarify/demonstrate the important points, that gets this very specific situation done:
You define a test and mark it to be executed in parallel => TestAverage, with a call to t.Parallel().
You then define another test and use t.Run to launch the number of instances of the test (TestAverage) you want.
The function to test:
package math

import (
    "fmt"
    "time"
)

func Average(xs []float64) float64 {
    total := float64(0)
    for _, x := range xs {
        total += x
    }
    fmt.Printf("Current Unix Time: %v\n", time.Now().Unix())
    time.Sleep(10 * time.Second)
    fmt.Printf("Current Unix Time: %v\n", time.Now().Unix())
    return total / float64(len(xs))
}
The testing funcs:
package math

import "testing"

func TestAverage(t *testing.T) {
    t.Parallel()
    var v float64
    v = Average([]float64{1, 2})
    if v != 1.5 {
        t.Error("Expected 1.5, got ", v)
    }
}

func TestTeardownParallel(t *testing.T) {
    // This Run will not return until the parallel tests finish.
    t.Run("group", func(t *testing.T) {
        t.Run("Test1", TestAverage)
        t.Run("Test2", TestAverage)
        t.Run("Test3", TestAverage)
    })
    // <tear-down code>
}
Then just do a go test and you should see:
X:\>go test
Current Unix Time: 1556717363
Current Unix Time: 1556717363
Current Unix Time: 1556717363
And 10 secs after that
...
Current Unix Time: 1556717373
Current Unix Time: 1556717373
Current Unix Time: 1556717373
Current Unix Time: 1556717373
Current Unix Time: 1556717383
PASS
ok _/X_/y 20.259s
The two extra lines at the end are because TestAverage is executed as well.
The interesting point here: if you remove t.Parallel() from TestAverage, it will all be executed sequentially:
X:> go test
Current Unix Time: 1556717564
Current Unix Time: 1556717574
Current Unix Time: 1556717574
Current Unix Time: 1556717584
Current Unix Time: 1556717584
Current Unix Time: 1556717594
Current Unix Time: 1556717594
Current Unix Time: 1556717604
PASS
ok _/X_/y 40.270s
This can of course be made more complex and extensible...

does go ++ operator need mutex?

Does go ++ Operator need mutex?
It seems that when not using a mutex I am losing some data, but by logic ++ just adds 1 to the current value, so even if the order is incorrect, a total of 1000 increments should still happen, no?
Example:
package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    i := 0
    for r := 0; r < 1000; r++ {
        wg.Add(1)
        go func() {
            i++
            fmt.Println(i)
            wg.Done()
        }()
    }
    wg.Wait()
    fmt.Printf("%d Done", i)
}
To "just add 1 to the current value" the computer needs to read the current value, add 1, and write the new value back. Clearly ordering does matter; the standard example is:
Thread A    Thread B
--------    --------
Read: 5
            Read: 5
+1 = 6
            +1 = 6
Write: 6
            Write: 6
The value started at 5, two threads of execution each added one, and the result is 6 (when it should be 7), because B's read occurred before A's write.
But there's a more important misconception at play here: many people think that in the case of a race, the code will either read the old value, or it will read the new value. This is not guaranteed. It might be what happens most of the time. It might be what happens all the time on your computer, with the current version of the compiler, etc. But actually it's possible for code that accesses data in an unsafe/racy manner to produce any result, even complete garbage. There's no guarantee that the value you read from a variable corresponds to any value it has ever had, if you cause a race.
just add +1 value to the current value
No, it's not "just add". It's
Read current value
Compute new value (based on what was read) and write it
See how this can break with multiple concurrent actors?
If you want atomic increments, check out sync/atomic. Examples: https://gobyexample.com/atomic-counters
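For illustration, here is a sketch of the question's loop rewritten with sync/atomic, so that all 1000 increments survive:

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

func main() {
    var wg sync.WaitGroup
    var i int64
    for r := 0; r < 1000; r++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            atomic.AddInt64(&i, 1) // read-modify-write as one indivisible step
        }()
    }
    wg.Wait()
    fmt.Printf("%d Done\n", atomic.LoadInt64(&i)) // always prints 1000
}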

The variable in Goroutine not changed as expected

The code is as simple as below:
package main

import (
    "fmt"
    // "sync"
    "time"
)

var count = uint64(0)

//var l sync.Mutex

func add() {
    for {
        // l.Lock()
        // fmt.Println("Start ++")
        count++
        // l.Unlock()
    }
}

func main() {
    go add()
    time.Sleep(1 * time.Second)
    fmt.Println("Count =", count)
}
Cases:
1. Running the code without changes, you will get "Count = 0". Not expected??
2. Uncommenting only the fmt.Println("Start ++") line: you will get lots of "Start ++" output and some value for Count, like "Count = 11111". Expected??
3. Uncommenting var l sync.Mutex, l.Lock() and l.Unlock() while keeping the Println commented: you will get output like "Count = 111111111". Expected.
So... is something wrong with my usage of the shared variable...? My questions:
Why does case 1 print 0 for Count?
If case 1 is expected, why does case 2 happen?
Env:
1. go version go1.8 linux/amd64
2. 3.10.0-123.el7.x86_64
3. CentOS Linux release 7.0.1406 (Core)
You have a data race on count. The results are undefined.
Output:
$ go run -race racer.go
==================
WARNING: DATA RACE
Read at 0x0000005995b8 by main goroutine:
runtime.convT2E64()
/home/peter/go/src/runtime/iface.go:255 +0x0
main.main()
/home/peter/gopath/src/so/racer.go:25 +0xb9
Previous write at 0x0000005995b8 by goroutine 6:
main.add()
/home/peter/gopath/src/so/racer.go:17 +0x5c
Goroutine 6 (running) created at:
main.main()
/home/peter/gopath/src/so/racer.go:23 +0x46
==================
Count = 42104672
Found 1 data race(s)
$
References:
Benign data races: what could possibly go wrong?
Without any synchronisation you have no guarantees at all.
There could be multiple reasons why you see "Count = 0" in your first case:
1. You have a multi-processor or multi-core system, and one unit (CPU or core) is happily churning away at the for loop while the other sleeps for one second and then prints the line you are seeing. It would be completely legal for the compiler to generate machine code which loads the value into some register and only ever increases that register in the for loop; the memory location can be updated when the function is done with the variable, which, in the case of an infinite for loop, is never. By omitting any synchronisation, you, the programmer, told the compiler that there is no contention about that variable.
In your mutex version, the synchronisation primitives tell the compiler that there might be some other thread taking the mutex, so it needs to write the value back from the register to the memory location before unlocking the mutex. At least one can think about it like that. What really happens is that the unlock and a later lock operation introduce a happens-before relation between the two goroutines, and this gives the guarantee that we will see all writes to variables from before the unlock in one thread after the lock operation in the other thread, as described in the Go memory model, however this is implemented.
2. The Go runtime scheduler doesn't run the for loop at all until the sleep in the main goroutine is done. (That isn't likely but, if I recall correctly, there is no guarantee that it isn't happening.) Sadly, there is not much official documentation available about how the scheduler works in Go, but it can only schedule a goroutine at certain points; it is not really preemptive. The consequences of this are severe. For example, you could make your program run forever in some versions of Go by firing up as many goroutines as you had cores, each doing an endless for loop that only increments a variable. No core would be left for the main goroutine (which could end the program), and the scheduler can't preempt a goroutine stuck in an endless for loop doing only simple stuff like incrementing a variable. I don't know if that has changed by now.
As others pointed out, that is a data race; google it and read up about it.
The difference between your versions where only the fmt.Println("Start ++") line is commented or uncommented is likely just a matter of run time, as printing to a terminal can be pretty slow.
For a correct program, you need to additionally lock the mutex after the sleep in your main function and before the fmt.Println, and unlock it afterwards, as sketched below. But there can't be a deterministic expectation about the output, as the result will vary with machine/OS/...
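A sketch of that corrected version:

package main

import (
    "fmt"
    "sync"
    "time"
)

var count uint64
var l sync.Mutex

func add() {
    for {
        l.Lock()
        count++
        l.Unlock()
    }
}

func main() {
    go add()
    time.Sleep(1 * time.Second)

    l.Lock() // happens after the adder's most recent Unlock,
    c := count
    l.Unlock() // so the read is properly synchronised

    fmt.Println("Count =", c) // some large, machine-dependent value
}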

why golang sched runs differently when the variable is odd or even? Is it coincidence or doomed?

I have a sample of golang code as follows(xx.go):
package main

import "runtime"

func main() {
    c2 := make(chan int)

    go func() {
        for v := range c2 {
            println("c2 =", v, "numof routines:", runtime.NumGoroutine())
        }
    }()

    for i := 1; i <= 10001; i++ {
        c2 <- i
        //runtime.Gosched()
    }
}
When the loop count is odd, say 10001, the code will output all the numbers.
When the loop count is even, say 10000, the code will output all the numbers but the last!
Why is that?
I have tested numbers from as small as 2 to as large as 10000, and they all obey the above rules!
the ENV is as follows:
uname -a : Linux hadoopnode25232 2.6.18-308.16.1.el5 #1 SMP Tue Oct 2 22:01:43 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
go version : go version go1.1 linux/amd64
I think it has something to do with the go sched.
I assemble the code:
go tool 6g -S xx.go > xx.s
the only difference between 10000 and 10001 is:
33c33
< 0030 (xx.go:20) CMPQ AX,$10001
---
> 0030 (xx.go:20) CMPQ AX,$10000
And, last but not least, when I add runtime.Gosched(), everything runs well.
When main returns, the program terminates. It will not wait for any goroutines to finish. So whether all goroutines finish in time or not depends on the scheduler, and by extension on a fair bit of randomness, coincidence and external factors. The difference between 1e4 and 1e4+1 iterations might be one of those factors that affects the scheduling in just that one bit that makes the goroutine finish in time.
If you really require the goroutine to finish before exiting, wait for it to finish, for example by employing a sync.WaitGroup, as sketched below.
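A minimal sketch of that fix: close the channel so the range loop can end, then wait for the goroutine before main returns. With this, an even loop count like 10000 prints every number too:

package main

import (
    "runtime"
    "sync"
)

func main() {
    c2 := make(chan int)

    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        defer wg.Done()
        for v := range c2 {
            println("c2 =", v, "numof routines:", runtime.NumGoroutine())
        }
    }()

    for i := 1; i <= 10000; i++ {
        c2 <- i
    }
    close(c2) // lets the range loop terminate...
    wg.Wait() // ...and main waits for the goroutine to finish printing
}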
Unrelated to the actual issue: your code is overly complex and unidiomatic for what it's doing. You could write the goroutine function simply as follows:
for v := range c2 {
    println("c2 =", v, "numof routines:", runtime.NumGoroutine())
}
You need to sleep a second, or block for a while, to wait for the goroutine to finish!!
Go statements
A "go" statement starts the execution of a function call as an
independent concurrent thread of control, or goroutine, within the
same address space.
GoStmt = "go" Expression .
The function value and parameters are evaluated as usual in the
calling goroutine, but unlike with a regular call, program execution
does not wait for the invoked function to complete.
Once the final send, c2 <- i, is complete, the program may terminate; it does not have to wait for the invoked function to complete. You are only guaranteed that the receive, v, ok := <-c2 (which the range loop performs under the hood), completes c - 1 times, where c is your loop count, 10000 or 10001 in your tests.
