I started playing around with Go by going through the Tour of Go on the official site.
I only have basic programming experience, but when I reached the channels page I started experimenting to get my head around it, and I've ended up quite confused.
This is what I have the code as:
package main

import "fmt"

func sum(s []int, c chan int) {
    sum := 0
    s[0] = 8          // writes into the backing array shared with main's s
    s = append(s, 20) // may also write into the shared backing array
    for _, v := range s {
        sum += v
    }
    c <- sum // send sum to c
}

func main() {
    s := []int{7, 2, 8, -9, 4, 0}
    c := make(chan int)
    go sum(s[:len(s)/2], c)
    go sum(s[len(s)/2:], c)
    x, y := <-c, <-c // receive from c
    fmt.Println(x, y, x+y)
}
And this is the result I get:
[8 2 8 20 4 0]
[8 2 8 20]
[8 4 0 20]
26 32 58
[8 2 8 8 4 0]
I understand that creating a slice gives you an underlying array of the needed size, and that passing a slice to a function and modifying an element modifies that underlying array, but I'm not sure in what order the goroutines run.
It prints out the first s[0] as 7 even though the function call before it should have modified it to 8, so I'm assuming that the goroutines haven't run yet?
Then printing the entire array after the second goroutine function call prints it with all the modifications made during the first function call (modifying the first item to 8 and appending 20).
Yet the next 2 line print outs are the print outs of the slice segments from within the functions, which logically by the fact that I just said I saw the modifications done by the function call means they should have printed out BEFORE that line.
Then I'm not sure how it got the calculations or the final printout of the array.
Can someone familiar with how Go operates explain what the logical progression of this code was?

It prints out the first s[0] as 7 even though the function call before it should have modified it to 8, so I'm assuming that the goroutines haven't run yet?
Yes, this interpretation is correct. (But should not be made, see next section).
Then printing the entire array after the second goroutine function call prints it with all the modifications made during the first function call (modifying the first item to 8 and appending 20).
NO. Never ever think along these lines.
The state of your application at this point is not well defined. 1) You started goroutines which modify the state and you have absolutely no synchronisation between the three goroutines (the main one and the two you started manually). 2) Your code is racy which means it is a) wrong, racy programs are never correct and b) results in undefined behaviour.
Yet the next 2 line print outs are the print outs of the slice segments from within the functions, which logically by the fact that I just said I saw the modifications done by the function call means they should have printed out BEFORE that line.
More or less, if your program were not racy (see above).
Where is your code racy: the main.s is a slice with a backing array and you subslice it twice when starting the goroutines with go sum(...). Both subslices share the same backing array. Now inside the goroutines you write ( s[0]=8) and append (append(s, 20)) to this subslice. From both goroutines without synchronisation. The first goroutine will fulfill this append using the large enough backing array which is concurrently used by the second goroutine.
This results in concurrent write without synchronisation to the fourth element of main.s which is racy and thus undefined.
By reading twice from the channel (<-c, <-c) you introduce synchronisation between the two manually started goroutines and the main goroutine, and your program returns to serial execution (but its state is still not well defined, as it is racy).
The Println statements you use to track what is going on are problematic too: These statements print stuff which is in a not well defined state because it is accessed concurrently from different goroutines; both may print arbitrary data.
Summing up: Your program is racy, its output is undefined, there is no "reason" or "explanation" why it prints what it prints, it's pure chance.
Always try your code under the race detector (-race) and do not try to make up pseudo-explanations for racy code.
How to fix:
The divide and conquer approach is nice and okay. What is dangerous and thus easy to get wrong (as happened to you) is modifying the subslice inside the goroutine. While s[0] = 8 on its own would be harmless in your case, because the two halves do not overlap, the append interferes badly with the shared backing array.
Do not modify s inside the goroutine. If you must: Make sure you have a new backing array.
(Do not try to mix slices and concurrency. Both have some subtleties, slices the shared backing array and concurrency the problem of undefined racy behaviour. Especially if you are new to all this. Learn one, then the other, then combine both.)
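One way to follow the "new backing array" advice is to copy the subslice before touching it. The following is only a sketch under that assumption; the name local is introduced here for illustration and is not part of the original program.

```go
package main

import "fmt"

// sum works on a private copy of its input, so the write and the append
// never touch the backing array that main and the other goroutine use.
func sum(s []int, c chan<- int) {
	local := make([]int, len(s), len(s)+1) // hypothetical name; fresh backing array
	copy(local, s)
	local[0] = 8
	local = append(local, 20)
	total := 0
	for _, v := range local {
		total += v
	}
	c <- total
}

func main() {
	s := []int{7, 2, 8, -9, 4, 0}
	c := make(chan int)
	go sum(s[:len(s)/2], c)
	go sum(s[len(s)/2:], c)
	x, y := <-c, <-c // which half arrives first is still not determined
	fmt.Println(x+y, s) // 70 [7 2 8 -9 4 0]
}
```

The two goroutines now only read the shared array, so the total and the contents of s are the same on every run; only the order in which the two partial sums arrive can still vary.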


What happens to pointer to element when Go moves slice to another place in memory?

I have the following code
package main

import "fmt"

func main() {
    a := []int{1}
    b := &a[0] // pointer into the original backing array
    fmt.Println(a, &a[0], b, *b) // prints [1] 0xc00001c030 0xc00001c030 1
    a = append(a, 1, 2, 3) // exceeds the capacity of 1, so a gets a new backing array
    fmt.Println(a, &a[0], b, *b) // prints [1 1 2 3] 0xc000100020 0xc00001c030 1
}
First it creates a slice of 1 int. Its len is 1 and cap is also 1. Then I take a pointer to its first element and get the underlying pointer value in print. It works fine, as expected.
Then I add 3 more elements to the slice, making Go expand the capacity of the slice and thus copy it to another place in memory. After that I print the address (by taking a pointer) of the slice's first element, which is now different from the address stored in b.
However, when I then print the underlying value of b it also works fine. I don't understand why it works. As far as I know, the slice whose first element b points to was copied to another place in memory, so its previous memory must have been released. However, it seems to still be there.
If we look at maps, Go doesn't even allow us to create pointers to an element by key because of the exact same problem: the underlying data can be moved to another place in memory. However, it works perfectly fine with slices. Why is that? How does it really work? Is the memory not being freed because there is still a variable which points to this memory? How is it different from maps?
What happens to pointer to element when Go moves slice to another place in memory?
[W]hen I then print the underlying value of b it also works fine. I don't understand why does it work.
Why wouldn't it work?
The memory location originally pointed to is still there, unaltered. And as long as anything (such as b) still references it, it will remain usable. Once all references to that memory are removed (i.e. go out of scope), then the garbage collector may allow it to be used by something else.
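A small sketch of what that means in practice: after the append, the pointer still refers to the old array, which stays alive but no longer belongs to the slice. The function name aliasAfterAppend is made up for this illustration.

```go
package main

import "fmt"

// aliasAfterAppend takes a pointer to the first element, forces a
// reallocation with append, and then writes through the slice.
// It returns the value seen through the old pointer, the value seen
// through the slice, and whether both still refer to the same memory.
func aliasAfterAppend() (old int, cur int, same bool) {
	a := []int{1} // len 1, cap 1
	b := &a[0]
	a = append(a, 1, 2, 3) // does not fit in cap 1: new backing array
	a[0] = 42              // writes to the new array only
	return *b, a[0], b == &a[0]
}

func main() {
	fmt.Println(aliasAfterAppend()) // 1 42 false
}
```

So the pointer remains valid, but writes through the slice are no longer visible through it.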
What happens to pointer to element when Go moves slice to another place in memory?
I believe the current GC implementation does not move such objects at all, although the specification allows for this to happen. Unless you use the "unsafe" package, you are unlikely to encounter any problems even if it did move the underlying data structure.

for loop value semantic in golang

First question about Go on SO. The code below shows that n has the same address in each iteration. I am aware that some people call such a for loop value semantic, and that what is actually ranged over is a copy of the slice, not the slice itself. Why does n have the same address in each iteration? Is it because each element in the slice is copied, rather than the whole slice being copied once beforehand? If only each element from the original slice is copied, can a single memory address then be reused in each iteration?
package main

import "fmt"

func main() {
    numbers := []int{1, 2}
    for i, n := range numbers {
        fmt.Println(&n, &numbers[i])
    }
}
A sample result from go playground:
0xc000122030 0xc000122020
0xc000122030 0xc000122028
You are slightly wrong in your question, it is not a copy of the slice that is being iterated over. In Go when you pass a slice you really pass a pointer to memory and the size and capacity of that memory, this is called a slice header. The header is copied, but the copy points to the same underlying memory, meaning that when you pass a []int to a function, change the values in that function, the values will be changed in the original []int in the outer code as well.
This is in contrast to an array like [5]int which is passed by value, meaning this would really be copied when you pass it around. In Go structs, strings, numbers and arrays are passed by value. Slices are really also passed by value but as described above, the value in this case contains a pointer to memory. Passing a copy of a pointer still lets you change the memory pointed to.
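The difference between passing a slice and passing an array can be sketched like this; the two function names are made up for the illustration.

```go
package main

import "fmt"

// setFirstSlice receives a copy of the slice header; the copy still
// points at the caller's backing array.
func setFirstSlice(s []int) { s[0] = 99 }

// setFirstArray receives a copy of the whole array.
func setFirstArray(a [3]int) { a[0] = 99 }

func main() {
	s := []int{1, 2, 3}
	a := [3]int{1, 2, 3}
	setFirstSlice(s)
	setFirstArray(a)
	fmt.Println(s[0], a[0]) // 99 1
}
```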
Now to your experiment:
for i, n := range numbers
will create two variables before the loop starts: integers i and n. In each loop iteration i will be incremented by 1 and n will be assigned the value (a copy of the integer value that is) of numbers[i].
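This means there really are only two variables, i and n, which are reused across iterations; that is why &n is the same in your output. (This describes Go before 1.22. Since Go 1.22 each iteration gets its own copy of the loop variables, so the addresses of n may differ between iterations.)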
The addresses of numbers[i] are different of course, they are the memory addresses of the items in the array.
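A practical consequence, independent of how the loop variable is allocated: n is a copy of the element, so assigning to it never changes the slice. A minimal sketch, with function names invented for the example:

```go
package main

import "fmt"

// doubleViaValue assigns to the range variable, which is only a copy
// of the element, so the slice is left unchanged.
func doubleViaValue(nums []int) {
	for _, n := range nums {
		n *= 2
		_ = n
	}
}

// doubleViaIndex writes through the index into the backing array.
func doubleViaIndex(nums []int) {
	for i := range nums {
		nums[i] *= 2
	}
}

func main() {
	nums := []int{1, 2}
	doubleViaValue(nums)
	fmt.Println(nums) // [1 2]
	doubleViaIndex(nums)
	fmt.Println(nums) // [2 4]
}
```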
The Go Wiki has a Common Mistakes page talking about this exact issue. It also provides an explanation of how to avoid this issue in real code. The quick answer is that this is done for efficiency, and has little to do with the slice. n is a single variable / memory location that gets assigned a new value on each iteration.
If you want additional insight into why this happens under the hood, take a look at this post.

How can one determine the order of receiving from channels?

Consider the following example from the tour of go.
How can one determine the order of reception from the channels?
Why does x always get the first output from the goroutine?
It sounds reasonable, but I didn't find any documentation about it.
I tried to add some sleep and x still gets the input from the first executed goroutine.
c := make(chan int)
go sumSleep(s[:len(s)/2], c)
go sum(s[len(s)/2:], c)
x, y := <-c, <-c // receive from c
fmt.Println(x, y, x+y)
The sleep is before the sending to the channel.
Messages are always received in the order they are sent. That is deterministic.
However, the order of execution of any given operation across concurrent Goroutines is not deterministic. So if you have two goroutines concurrently sending on a channel, you can't know which will send first and which will send second. Same if you have two goroutines receiving on the same channel.
I tried to add some sleep and still x get the input from the first executed goroutine
In addition to what @Adrian wrote, in your code x will always get the result of the first receive on c, because of the language rules for tuple assignment:
The assignment proceeds in two phases. First, the operands of index expressions and pointer indirections (including implicit pointer indirections in selectors) on the left and the expressions on the right are all evaluated in the usual order. Second, the assignments are carried out in left-to-right order.
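To see this rule in isolation, without any scheduling involved, here is a small sketch that fills a buffered channel first and only then receives. The helper receiveTwo is hypothetical.

```go
package main

import "fmt"

// receiveTwo performs two receives in one tuple assignment. The receive
// expressions on the right are evaluated left to right, so x gets the
// first value in the channel and y the second.
func receiveTwo(c <-chan int) (int, int) {
	x, y := <-c, <-c
	return x, y
}

func main() {
	c := make(chan int, 2)
	c <- 1
	c <- 2
	fmt.Println(receiveTwo(c)) // 1 2
}
```

Which goroutine manages to send first is a separate question; the assignment only fixes that x gets whatever arrived first.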
To add a bit to Adrian's answer: we don't know in what order the two goroutines might run. If your sleep comes before your channel-send, and sleeps "long enough" [1], that will guarantee that the other goroutine can run to the point of doing its send. If both goroutines run "at the same time" and neither waits (as in the original Tour example), we cannot be sure which one will actually reach its c <- sum line first.
Running the Tour's example on the Go Playground (either directly, or through the Tour website), I actually get:
-5 17 12
in the output window, which (because we know that the -9 is in the 2nd half of the slice) tells us that the second goroutine "got there" (to the channel-send) first. In some sense, that's just luck, but when using the Go Playground, all jobs are run in a fairly deterministic environment, with a single CPU and with cooperative scheduling, so that the results are more predictable. In other words, if the second goroutine got there first on one run, it probably will on the next. If the playground used multiple CPUs and/or a less-deterministic environment, the results might change from one run to the next, but there is no guarantee of that.
In any case, assuming your code does what you say (and I believe it does), this:
go sumSleep(s[:len(s)/2], c)
go sum(s[len(s)/2:], c)
has the first sender wait and the second sender run first. But that's what we already observed actually happened when we let the two routines race. To see a change, we'd need to make the second sender delay.
I made a modified version of the example in the Go Playground here that prints more annotations. With the delay inserted in the 2nd half sum, we see the first-half sum as x:
2nd half: sleeping for 1s
1st half: sleeping for 0s
1st half: sending 17
2nd half: sending -5
17 -5 12
as we can expect since one second is "long enough".
[1] How long is "long enough"? Well, that depends: how fast are our computers? How much other stuff are they doing? If the computer is fast enough, a delay of a few milliseconds, or even a few nanoseconds, may be enough. If our computer is a really old one or very busy with other higher priority tasks, a few milliseconds might not be enough time. If the problem is sufficiently big, one second might not be enough time. It's often unwise to choose some particular amount of time, if you can control this better by some sort of synchronization operation, and usually you can. For instance, using a sync.WaitGroup variable allows you to wait for n goroutines (for some runtime value of n) to call a Done function before your own goroutine proceeds.
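As a rough sketch of the sync.WaitGroup idea applied to the Tour example: each goroutine writes its result into its own slot, so the order of the results no longer depends on timing. The function sumHalves and the results array are names chosen for this illustration only.

```go
package main

import (
	"fmt"
	"sync"
)

// sumHalves sums the two halves of s concurrently. Each goroutine
// writes only to its own element of results, and wg.Wait() makes the
// caller wait until both have called Done.
func sumHalves(s []int) [2]int {
	var results [2]int
	var wg sync.WaitGroup
	halves := [][]int{s[:len(s)/2], s[len(s)/2:]}
	for idx, part := range halves {
		wg.Add(1)
		go func(idx int, part []int) {
			defer wg.Done()
			for _, v := range part {
				results[idx] += v
			}
		}(idx, part)
	}
	wg.Wait()
	return results
}

func main() {
	r := sumHalves([]int{7, 2, 8, -9, 4, 0})
	fmt.Println(r[0], r[1], r[0]+r[1]) // 17 -5 12
}
```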
Playground code, copied to StackOverflow for convenience
package main

import (
    "fmt"
    "time"
)

func sum(s []int, c chan int, printme string, delay time.Duration) {
    sum := 0
    for _, v := range s {
        sum += v
    }
    fmt.Printf("%s: sleeping for %v\n", printme, delay)
    time.Sleep(delay) // the delay comes before the send
    fmt.Printf("%s: sending %d\n", printme, sum)
    c <- sum
}

func main() {
    s := []int{7, 2, 8, -9, 4, 0}
    c := make(chan int)
    go sum(s[:len(s)/2], c, "1st half", 0*time.Second)
    go sum(s[len(s)/2:], c, "2nd half", 1*time.Second)
    x, y := <-c, <-c
    fmt.Println(x, y, x+y)
}

Why adding items to map during its iteration produce inconsistent result?

From Go Spec:
If map entries are created during iteration, that entry may be produced during the iteration or may be skipped.
So what I expect from that statement is that the following code should at least print number 1, while how many more numbers get printed is unpredictable and differs each time you run the program:
package main

import "fmt"

func main() {
    test := make(map[int]int)
    test[1] = 1
    j := 2
    for i, v := range test {
        fmt.Println(i, v)
        test[j] = j // add a new entry during iteration
        j++
    }
}
On my own laptop (Go version 1.8) at maximum it prints till 8, in playground (still version 1.8) it prints exactly till 3!
I don't care much about the result from the playground since its Go is not vanilla, but I wonder why on my local machine it never prints more than 8. I even tried adding more items in each iteration to raise the chance of going over 8, but it makes no difference.
EDIT: my own explanation based on @Schwern's answer
When the map is created with the make function and without any size parameter, only 1 bucket is allocated, and in Go each bucket holds 8 elements. So when the range starts it sees that the map has only 1 bucket and iterates at most 8 times. If I use a size parameter bigger than 7, like make(map[int]int, 8), two buckets are created and there is a chance of getting more than 8 iterations over the added items.
This is an issue inherent in the design of most hash tables. Here's a simple explanation hand waving a lot of unnecessary detail.
Under the hood, a hash table is an array. Each key is mapped onto an element in the array using a hash function. For example, "foo" might map to element 8, "bar" might map to element 4, and so on. Some elements are empty.
for k,v := range hash iterates through this array in whatever order they happen to appear. The ordering is unpredictable to avoid a collision attack.
When you add to a hash, it adds to the underlying array. It might even have to allocate a new, larger array. It's unpredictable where that new key will land in the hash's array.
So if you add more pairs while you're iterating through the hash, any pair that gets put into the array before the current index won't be seen; the iteration has already passed that point. Anything that gets put after might be seen; the iteration has yet to reach that point, but the array might get reallocated and the pairs possibly rehashed.
but I wonder why on my local it never prints more than 8
Because the underlying array is probably of length 8. Go likely allocates the underlying array in powers of 2 and probably starts at 8. The range hash probably starts by checking the length of the underlying array and will not go further, even if it's grown.
Long story short: don't add keys to a hash while iterating through it.
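If new entries really have to be derived from the existing ones, one common way around the problem is to take a snapshot of the keys first and only then write to the map. A minimal sketch; addDoubles is a made-up example function.

```go
package main

import "fmt"

// addDoubles adds the entry k*2 for every key k that is in the map when
// the function is called. The keys are collected before any insert, so
// the result does not depend on map iteration order.
func addDoubles(m map[int]int) {
	keys := make([]int, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	for _, k := range keys {
		m[k*2] = k * 2
	}
}

func main() {
	m := map[int]int{1: 1, 2: 2, 3: 3}
	addDoubles(m)
	fmt.Println(len(m)) // 5 (keys 1, 2, 3, 4, 6)
}
```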

Speedup problems with go

I wrote a very simple program in Go to test the performance of a parallel program. It factorizes a big semiprime number by trial division. Since no communication is involved, I expected an almost perfect speedup. However, the program seems to scale very badly.
I timed the program with 1, 2, 4, and 8 processes, running on a computer with 8 cores (real, not HT), using the system time command. The number I factorized is "28808539627864609". Here are my results:
cores    time (sec)    speedup
1        60.0153       1
2        47.358        1.27
4        34.459        1.75
8        28.686        2.10
How to explain such bad speedups? Is it a bug in my program, or is it a problem with go runtime? How could I get better performances? I'm not talking about the algorithm by itself (I know there are better algorithms to factorize semiprime numbers), but about the way I parallelized it.
Here is the source code of my program:
package main

import (
    "flag"
    "fmt"
    "math/big"
    "runtime"
)

func factorize(n *big.Int, start int, step int, c chan *big.Int) {
    var m big.Int
    i := big.NewInt(int64(start))
    s := big.NewInt(int64(step))
    z := big.NewInt(0)
    for {
        m.Mod(n, i)
        if m.Cmp(z) == 0 {
            c <- i
        }
        i.Add(i, s)
    }
}

func main() {
    var np *int = flag.Int("n", 1, "Number of processes")
    flag.Parse()
    runtime.GOMAXPROCS(*np)
    var n big.Int
    n.SetString(flag.Arg(0), 10) // Uses number given on command line
    c := make(chan *big.Int)
    for i := 0; i < *np; i++ {
        go factorize(&n, 2+i, *np, c)
    }
    fmt.Println(<-c)
}
Problem really seems to be related to Mod function. Replacing it by Rem gives better but still imperfect performances and speedups. Replacing it by QuoRem gives 3 times faster performances, and perfect speedup. Conclusion: it seems memory allocation kills parallel performances in Go. Why? Do you have any references about this?
big.Int methods generally have to allocate memory, usually to hold the result of the computation. The problem is that there is just one heap and all memory operations are serialized. In this program, the numbers are fairly small and the (parallelizable) computation time needed for things like Mod and Add is small compared to the non-parallelizable operations of repeatedly allocating all the tiny little bits of memory.
As far as speeding it up, there is the obvious answer of don't use big.Ints if you don't have to. Your example number happens to fit in 64 bits. If you plan on working with really big big numbers though, the problem will kind of go away on its own. You will spend much more time doing computations, and the time spent in the heap will be relatively much less.
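Since the example number fits in 64 bits, the same strided trial division can be sketched with plain uint64 arithmetic, which allocates nothing inside the loop. This is an illustration rather than a drop-in replacement: the names trial and smallestFactor are invented here, every worker runs to completion instead of stopping at the first result, and the i*i bound assumes n is small enough that the square does not overflow.

```go
package main

import "fmt"

// trial tests start, start+step, start+2*step, ... as divisors of n and
// sends the first one that divides n, or 0 if none is found up to sqrt(n).
func trial(n, start, step uint64, out chan<- uint64) {
	for i := start; i*i <= n; i += step {
		if n%i == 0 {
			out <- i
			return
		}
	}
	out <- 0
}

// smallestFactor splits the candidates across workers goroutines and
// returns the smallest non-trivial factor of n, or 0 if n has none.
func smallestFactor(n uint64, workers int) uint64 {
	out := make(chan uint64, workers)
	for w := 0; w < workers; w++ {
		go trial(n, 2+uint64(w), uint64(workers), out)
	}
	var best uint64
	for w := 0; w < workers; w++ {
		if f := <-out; f != 0 && (best == 0 || f < best) {
			best = f
		}
	}
	return best
}

func main() {
	fmt.Println(smallestFactor(10403, 4)) // 101 (10403 = 101 * 103)
}
```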
There is a bug in your program, by the way, although it's not related to performance. When you find a factor and return the result on the channel, you send a pointer to the local variable i. This is fine, except that you don't break out of the loop then. The loop in the goroutine continues incrementing i and by the time the main goroutine gets around to fishing the pointer out of the channel and following it, the value is almost certain to be wrong.
After sending i through the channel, i should be replaced with a newly allocated big.Int:
if m.Cmp(z) == 0 {
    c <- i
    i = new(big.Int).Set(i)
}
This is necessary because there is no guarantee when fmt.Println will process the integer received in the statement fmt.Println(<-c). It isn't unusual for fmt.Println to cause goroutine switching, so if i isn't replaced with a newly allocated big.Int and the run-time switches back to executing the for-loop in function factorize, then the for-loop will overwrite i before it is printed, in which case the program won't print out the 1st integer sent through the channel.
The fact that fmt.Println can cause goroutine switching means that the for-loop in function factorize may potentially consume a lot of CPU time between the moment when the main goroutine receives from channel c and the moment when the main goroutine terminates. Something like this:
run factorize()
<-c in main()
call fmt.Println()
continue running factorize() // Unnecessary CPU time consumed
return from fmt.Println()
return from main() and terminate program
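One possible way to avoid both problems (the overwritten value and the wasted CPU time) is to send a copy of i and to stop the workers once a result has been received. The sketch below assumes a done channel and a small wrapper, firstFactorOf, that are not part of the original program; polling done on every iteration also adds a little overhead of its own.

```go
package main

import (
	"fmt"
	"math/big"
)

// factorize tries start, start+step, ... as divisors of n. It sends a
// copy of the first divisor it finds and then returns. It also returns
// as soon as done is closed.
func factorize(n *big.Int, start, step int64, c chan<- *big.Int, done <-chan struct{}) {
	var q, r big.Int
	i := big.NewInt(start)
	s := big.NewInt(step)
	for {
		select {
		case <-done:
			return
		default:
		}
		q.QuoRem(n, i, &r)
		if r.Sign() == 0 {
			select {
			case c <- new(big.Int).Set(i): // receiver gets its own value
			case <-done:
			}
			return
		}
		i.Add(i, s)
	}
}

// firstFactorOf (hypothetical helper) starts the workers, waits for the
// first result, and closes done so the remaining workers stop.
func firstFactorOf(n int64, workers int) int64 {
	c := make(chan *big.Int)
	done := make(chan struct{})
	defer close(done)
	num := big.NewInt(n)
	for w := 0; w < workers; w++ {
		go factorize(num, int64(2+w), int64(workers), c, done)
	}
	return (<-c).Int64()
}

func main() {
	fmt.Println(firstFactorOf(15, 2)) // 3
}
```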
Another reason for the small multi-core speedup is memory allocation. The function (*Int).Mod is internally using (*Int).QuoRem and will create a new big.Int each time it is called. To avoid the memory allocation, use QuoRem directly:
func factorize(n *big.Int, start int, step int, c chan *big.Int) {
    var q, r big.Int
    i := big.NewInt(int64(start))
    s := big.NewInt(int64(step))
    z := big.NewInt(0)
    for {
        q.QuoRem(n, i, &r) // q and r are reused on every iteration
        if r.Cmp(z) == 0 {
            c <- i
            i = new(big.Int).Set(i)
        }
        i.Add(i, s)
    }
}
Unfortunately, the goroutine scheduler in Go release r60.3 contains a bug which prevents this code from using all CPU cores. When the program is started with -n=2 (GOMAXPROCS=2), the run-time will utilize only 1 thread.
Go weekly release has a better run-time and can utilize 2 threads if n=2 is passed to the program. This gives a speedup of approximately 1.9 on my machine.
Another potential contributing factor to multi-core slowdown has been mentioned in the answer by user "High Performance Mark". If the program is splitting the work into multiple sub-tasks and the result comes only from 1 sub-task, it means that the other sub-tasks may do some "extra work". Running the program with n>=2 may in total consume more CPU time than running the program with n=1.
To learn how much extra work is being done, you may want to (somehow) print out the values of all i's in all goroutines at the moment the program is exiting the function main().
I don't read go so this is probably the answer to a question which is not what you asked. If so, downvote or delete as you wish.
If you were to make a plot of 'time to factorise integer n' against 'n' you would get a plot that goes up and down somewhat randomly. For any n you choose there will be an integer in the range 1..n that takes longest to factorise on one processor. If your parallelisation strategy is to distribute the n integers across p processors one of those processors will take at least the time to factorise the hardest integer, then the time to factorise the rest of its load.
Perhaps you've done something similar?
