I have a Go application that goes through pages of a website and is supposed to download every link on each page. It looks something like this (I don't know the number of pages beforehand, so that part is done synchronously):
page := 0
results := getPage(page)
c := make(chan *http.Response)
for len(results) > 0 {
    for _, result := range results {
        go myProxySwitcher.downloadChan(result.URL, c)
        fmt.Println(myProxySwitcher.counter)
    }
    page++
    results = getPage(page)
    myProxySwitcher.counter++
}
The twist is that every 10 requests I want to change the proxy I use to connect to the website. To do this, I made a struct with a counter field:
type ProxySwitcher struct {
    proxies []string
    client  *http.Client
    counter int
}
I then increment the counter each time a request is made from downloadChan:
func (p *ProxySwitcher) downloadChan(url string, c chan *http.Response) {
    p.counter++
    proxy := p.proxies[int(p.counter/10)%len(p.proxies)]
    res := p.client.Get(url, proxy)
    c <- res
}
When it does the downloads, it doesn't appear that the counter is synchronized between goroutines. How can I synchronize the value of the counter between goroutines?
The output I get from those Printlns is:
1
1
1
1
1
1
2
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
5
5
5
And I am expecting
1
2
3
4
5
...
You have a race condition in your code.
In the first snippet, you're modifying the counter field from the "main" goroutine:
// ...
myProxySwitcher.counter++
In the third snippet, you're also modifying that counter from a different goroutine:
// ...
p.counter++
This is illegal code in Go. By definition, the results are undefined. To understand why, you'll have to go over the Go Memory Model. Hint: it likely won't be an easy read.
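To see the problem in isolation, here is a minimal, self-contained sketch (not your program, just the same pattern reduced to its essence): two goroutines increment the same counter with no synchronization, the final value is unpredictable, and running it with the race detector (go run -race) reports the conflict.

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    counter := 0

    wg.Add(2)
    for i := 0; i < 2; i++ {
        go func() {
            defer wg.Done()
            for j := 0; j < 1000; j++ {
                counter++ // unsynchronized read-modify-write: a data race
            }
        }()
    }
    wg.Wait()

    // The result is not guaranteed to be 2000; `go run -race` flags the racy increments above.
    fmt.Println(counter)
}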
To fix it, you need to ensure synchronization. There are many ways to do it.
One way to do it, as suggested in a comment on your question, is to use a mutex. The example below is kinda messy, as it would need some refactoring of the main loop, but this is how you would synchronize access to the counter:
type ProxySwitcher struct {
    proxies []string
    client  *http.Client
    mu      sync.Mutex
    counter int
}
func (p *ProxySwitcher) downloadChan(url string, c chan *http.Response) {
    p.mu.Lock()
    p.counter++
    // gotta read it from p while holding
    // the lock to use it below
    counter := p.counter
    p.mu.Unlock()

    // here you use counter rather than p.counter,
    // since you don't hold the lock anymore
    proxy := p.proxies[int(counter/10)%len(p.proxies)]
    res := p.client.Get(url, proxy)
    c <- res
}
// ... the loop ...
for len(results) > 0 {
    for _, result := range results {
        go myProxySwitcher.downloadChan(result.URL, c)

        // this is kinda messy, would need some heavier
        // refactoring, but this should fix the race:
        myProxySwitcher.mu.Lock()
        fmt.Println(myProxySwitcher.counter)
        myProxySwitcher.mu.Unlock()
    }
    page++
    results = getPage(page)

    // same... it's messy, needs refactoring
    myProxySwitcher.mu.Lock()
    myProxySwitcher.counter++
    myProxySwitcher.mu.Unlock()
}
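If you want to tidy that up a bit, one option (just a sketch; next and current are helper names I'm making up, not part of your code) is to keep all the locking inside small methods on ProxySwitcher, so neither the loop nor downloadChan touches mu or counter directly:

// next increments the counter while holding the lock and returns the new value.
func (p *ProxySwitcher) next() int {
    p.mu.Lock()
    defer p.mu.Unlock()
    p.counter++
    return p.counter
}

// current returns the counter's value while holding the lock.
func (p *ProxySwitcher) current() int {
    p.mu.Lock()
    defer p.mu.Unlock()
    return p.counter
}

downloadChan would then start with counter := p.next(), the Println in the loop would print p.current(), and the per-page increment would become a bare p.next().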
Alternatively, you could change that counter to e.g. uint64, and then use the sync/atomic package to perform goroutine-safe operations:
type ProxySwitcher struct {
    proxies []string
    client  *http.Client
    counter uint64
}

func (p *ProxySwitcher) downloadChan(url string, c chan *http.Response) {
    counter := atomic.AddUint64(&p.counter, 1)
    // here you use counter rather than p.counter, since that's your local copy
    proxy := p.proxies[int(counter/10)%len(p.proxies)]
    res := p.client.Get(url, proxy)
    c <- res
}
// ... the loop ...
for len(results) > 0 {
    for _, result := range results {
        go myProxySwitcher.downloadChan(result.URL, c)
        counter := atomic.LoadUint64(&myProxySwitcher.counter)
        fmt.Println(counter)
    }
    page++
    results = getPage(page)
    atomic.AddUint64(&myProxySwitcher.counter, 1)
}
I'd probably use this last version, as it's cleaner and we don't really need a mutex.
Related
When reading from channels in Go, I observed that there isn't perfect synchronization between the publishing function and the consuming function. Strangely enough, if it were just a quirk of CPU scheduling, I would have expected different results at least some of the time. Instead, the consumer in main seems to consume and print two values at a time.
Consider the following example:
package main

import (
    "fmt"
)

func squares(ch chan int) {
    for i := 0; i < 9; i++ {
        val := i * i
        fmt.Printf("created val %v \n", val)
        ch <- i * i
        fmt.Printf("After posting val %v \n", val)
    }
    close(ch)
}

func main() {
    c := make(chan int)
    go squares(c)
    for val := range c {
        fmt.Println(val)
    }
}
No matter how many times I run it on the Go Playground, I see the following output.
created val 0
After posting val 0
created val 1
0
1
After posting val 1
created val 4
After posting val 4
created val 9
4
9
After posting val 9
created val 16
After posting val 16
created val 25
16
25
After posting val 25
created val 36
After posting val 36
created val 49
36
49
After posting val 49
created val 64
After posting val 64
64
Shouldn't I be expecting the following, because the send would block the squares goroutine until main has printed the value?
created val 0
0
After posting val 0
...
If not, why not? And if I want perfect synchronization like the above, how should I achieve it?
You are using an unbuffered channel, so this is what happens.
NOTE: This is not meant as a technically 100% accurate description.
1) The main goroutine starts. It runs until for val := range c { and is then put to "sleep", as there is no value to be read from c.
2) The squares goroutine that was just created is "awoken". It creates a value and can post it, because the other goroutine is already "waiting" for a value on the channel. It then creates another value and can't post it, as the channel now blocks.
3) The main goroutine is "awoken", reads the value, prints it, reads the next value (which the other goroutine is already waiting to push), prints it, and then is stuck again, as there is no new value available.
4) The squares goroutine is "awoken", as it was able to push its value. It prints "After posting", creates a new value, posts it (the other routine is already waiting for one), and creates another one. Then it gets stuck, as the other routine is not ready to receive the next value.
5) Back to 3).
If you want a "smoother" workflow where the routines don't synchronize on every single value that is being passed through the channel, make a buffered channel:
c := make(chan int, 3)
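For instance, here is the same squares program with a buffer of 3 (only a sketch of the suggestion above): the sender can now run up to three values ahead of the receiver before a send blocks, so the "created"/"After posting" lines and the printed values interleave much more loosely.

package main

import "fmt"

func squares(ch chan int) {
    for i := 0; i < 9; i++ {
        val := i * i
        fmt.Printf("created val %v \n", val)
        ch <- val // blocks only when the buffer already holds 3 unread values
        fmt.Printf("After posting val %v \n", val)
    }
    close(ch)
}

func main() {
    c := make(chan int, 3) // buffered: capacity 3
    go squares(c)
    for val := range c {
        fmt.Println(val)
    }
}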
I do not understand why the following program prints 0 1 2. I thought it would print 2 2 2.
package main

import (
    "fmt"
)

func main() {
    var funcs []func()
    for i := 0; i < 3; i++ {
        idx := i
        funcs = append(funcs, func() { fmt.Println(idx) })
    }
    for _, f := range funcs {
        f()
    }
}
My reasoning for expecting 2 2 2 is that each run of the for loop shares the same scope (e.g., the 2nd run of the loop does not terminate the scope of the 1st run; the scope is shared). Thus the reference to idx is shared by the anonymous functions created in each run of the loop, so when the loop ends, all 3 functions share the same reference to idx, whose value is 2.
So I think the question boils down to: does a new run (e.g., i == 2) of the for loop end the scope of the previous run (e.g., i == 1)? I would appreciate it if the answer could point me to the Go spec (I could not find the spec mentioning this).
From the spec (https://golang.org/ref/spec#For_statements):
A "for" statement specifies repeated execution of a block.
Each of those blocks has its own scope, and they are not nested or shared. But:
Each "if", "for", and "switch" statement is considered to be in its
own implicit block.
So the variable i in your snippet is shared, and
for i := 0; i < 3; i++ {
    funcs = append(funcs, func() { fmt.Println(i) })
}
for _, f := range funcs {
    f()
}
will print 3 3 3 as expected.
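Another way to get 0 1 2, equivalent in effect to the idx := i line in the question (a small sketch; makePrinter is a helper name I'm inventing here), is to pass the loop variable as a function argument, so each closure captures its own parameter rather than the shared i:

package main

import "fmt"

// makePrinter returns a closure over its own parameter n,
// so each returned function holds a distinct value.
func makePrinter(n int) func() {
    return func() { fmt.Println(n) }
}

func main() {
    var funcs []func()
    for i := 0; i < 3; i++ {
        funcs = append(funcs, makePrinter(i))
    }
    for _, f := range funcs {
        f() // prints 0, 1, 2
    }
}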
In the interest of learning more about Go, I have been playing with goroutines and have noticed something, but I'm not sure exactly what I'm seeing, and hope someone out there might be able to explain the following behaviour.
The following code does exactly what you'd expect:
package main

import (
    "fmt"
)

type Test struct {
    me int
}

type Tests []Test

func (test *Test) show() {
    fmt.Println(test.me)
}

func main() {
    var tests Tests
    for i := 0; i < 10; i++ {
        test := Test{
            me: i,
        }
        tests = append(tests, test)
    }

    for _, test := range tests {
        test.show()
    }
}
It prints 0 through 9, in order.
Now, when the code is changed as shown below, it always prints the last one first, no matter which numbers I use:
package main

import (
    "fmt"
    "sync"
)

type Test struct {
    me int
}

type Tests []Test

func (test *Test) show(wg *sync.WaitGroup) {
    fmt.Println(test.me)
    wg.Done()
}

func main() {
    var tests Tests
    for i := 0; i < 10; i++ {
        test := Test{
            me: i,
        }
        tests = append(tests, test)
    }

    var wg sync.WaitGroup
    wg.Add(10)
    for _, test := range tests {
        go func(t Test) {
            t.show(&wg)
        }(test)
    }
    wg.Wait()
}
This prints:
9
0
1
2
3
4
5
6
7
8
The order of iteration of the loop isn't changing, so I guess it has something to do with the goroutines...
Basically, I am trying to understand why it behaves like this. I understand that goroutines can run in a different order than the order in which they're spawned, but my question is why this always runs like this, as if there's something really obvious I'm missing...
As expected, the output is pseudo-random:
package main

import (
    "fmt"
    "runtime"
    "sync"
)

type Test struct {
    me int
}

type Tests []Test

func (test *Test) show(wg *sync.WaitGroup) {
    fmt.Println(test.me)
    wg.Done()
}

func main() {
    fmt.Println("GOMAXPROCS", runtime.GOMAXPROCS(0))

    var tests Tests
    for i := 0; i < 10; i++ {
        test := Test{
            me: i,
        }
        tests = append(tests, test)
    }

    var wg sync.WaitGroup
    wg.Add(10)
    for _, test := range tests {
        go func(t Test) {
            t.show(&wg)
        }(test)
    }
    wg.Wait()
}
Output:
$ go version
go version devel +af15bee Fri Jan 29 18:29:10 2016 +0000 linux/amd64
$ go run goroutine.go
GOMAXPROCS 4
9
4
5
6
7
8
1
2
3
0
$ go run goroutine.go
GOMAXPROCS 4
9
3
0
1
2
7
4
8
5
6
$ go run goroutine.go
GOMAXPROCS 4
1
9
6
8
4
3
0
5
7
2
$
Are you running in the Go playground? The Go playground, by design, is deterministic, which makes it easier to cache programs.
Or are you running with GOMAXPROCS set to 1 (runtime.GOMAXPROCS(1))? That runs one goroutine at a time, sequentially, which is what the Go playground does.
Goroutines have been scheduled randomly since Go 1.5. So even if the order looks consistent, don't rely on it.
See the Go 1.5 release notes:
In Go 1.5, the order in which goroutines are scheduled has been changed. The properties of the scheduler were never defined by the language, but programs that depend on the scheduling order may be broken by this change. We have seen a few (erroneous) programs affected by this change. If you have programs that implicitly depend on the scheduling order, you will need to update them.
Another potentially breaking change is that the runtime now sets the default number of threads to run simultaneously, defined by GOMAXPROCS, to the number of cores available on the CPU. In prior releases the default was 1. Programs that do not expect to run with multiple cores may break inadvertently. They can be updated by removing the restriction or by setting GOMAXPROCS explicitly. For a more detailed discussion of this change, see the design document.
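If you want to reproduce the playground-like, single-threaded behaviour locally, one thing to experiment with (a sketch for exploration only; the language still makes no ordering guarantee) is forcing the number of scheduling threads to 1 before spawning the goroutines:

package main

import (
    "fmt"
    "runtime"
    "sync"
)

func main() {
    // One OS thread executing goroutines, similar to what the playground does.
    // The observed order may become more stable, but it is still not guaranteed.
    runtime.GOMAXPROCS(1)

    var wg sync.WaitGroup
    wg.Add(10)
    for i := 0; i < 10; i++ {
        go func(n int) {
            defer wg.Done()
            fmt.Println(n)
        }(i)
    }
    wg.Wait()
}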
For 3 different and distinct values of c, the following loop:
for _, c := range u.components { // u.components has 3 members
    fmt.Printf("%v %v", c.name, c.channel) // prints 3 distinct name/channel pairs
    go c.Listen()                          // a method of c that listens on channel c.channel
}
...launches 3 identical goroutines in which c.name and c.channel are identical.
The long version (commented, short code):
https://play.golang.org/p/mMQb_5jLjm
This is my first Go program, so I'm sure I'm missing something obvious. Any ideas?
Thank you.
The call to c.Listen() is closing over the variable c, which is passed via a pointer to Listen, and each iteration changes that value. It's easier to visualize if you look at the method call as a method expression:
go (*component).Listen(&c)
https://golang.org/doc/faq#closures_and_goroutines
Create a new c on each iteration to prevent the previous value from being overwritten:
for _, c := range u.components { // u.components has 3 members
    c := c
    fmt.Printf("%v %v", c.name, c.channel) // prints 3 distinct name/channel pairs
    go c.Listen()                          // a method of c that listens on channel c.channel
}
Or use the value contained in the slice directly:
for i := range u.components {
    go u.components[i].Listen()
}
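A third option, in the spirit of the FAQ entry linked above, is to pass c to the goroutine as an argument, so each goroutine receives its own copy. Below is a self-contained sketch; the component type, its name field, and this Listen body are stand-ins of mine, since the real ones aren't shown in the question:

package main

import (
    "fmt"
    "sync"
)

// Stand-in for the asker's type; the real one also has a channel
// that Listen reads from.
type component struct {
    name string
}

func (c *component) Listen() {
    fmt.Println("listening as", c.name)
}

func main() {
    components := []component{{"a"}, {"b"}, {"c"}}

    var wg sync.WaitGroup
    for _, c := range components {
        wg.Add(1)
        // c is copied when the go statement executes, so each goroutine
        // works on its own component value instead of the shared loop variable.
        go func(c component) {
            defer wg.Done()
            c.Listen()
        }(c)
    }
    wg.Wait()
}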
Here is my code:
package main

import "fmt"

func main() {
    var whatever [5]struct{}
    for i := range whatever {
        fmt.Println(i)
    } // part 1

    for i := range whatever {
        defer func() { fmt.Println(i) }()
    } // part 2

    for i := range whatever {
        defer func(n int) { fmt.Println(n) }(i)
    } // part 3
}
Output:
0
1
2
3
4
4
3
2
1
0
4
4
4
4
4
Question: what's the difference between part 2 and part 3? Why does part 2 output "4 4 4 4 4" instead of "4 3 2 1 0"?
The 'part 2' closure captures the variable 'i'. When the code in the closure (later) executes, the variable 'i' has the value it had in the last iteration of the range statement, i.e. '4'. Hence the
4 4 4 4 4
part of the output.
The 'part 3' closure doesn't capture any outer variables. As the spec says:
Each time the "defer" statement executes, the function value and parameters to the call are evaluated as usual and saved anew but the actual function is not invoked.
So each of the deferred function calls has a different value for the 'n' parameter. It is the value the 'i' variable had at the moment the defer statement was executed. Hence the
4 3 2 1 0
part of the output because:
... deferred calls are executed in LIFO order immediately before the surrounding function returns ...
The key point to note is that the 'f()' in 'defer f()' is not executed when the defer statement executes
but
the expression 'e' in 'defer f(e)' is evaluated when the defer statement executes.
I would like to offer another example to improve understanding of the defer mechanism. Run this snippet as-is first, then switch the order of the statements marked (A) and (B), and see the result for yourself.
package main

import (
    "fmt"
)

type Component struct {
    val int
}

func (c Component) method() {
    fmt.Println(c.val)
}

func main() {
    c := Component{}
    defer c.method() // statement (A)
    c.val = 2        // statement (B)
}
I keep wondering what the correct keywords or concepts to apply here are. It looks like the expression c.method is evaluated, thus returning a function bound to the current state of the component c (like taking a snapshot of the component's internal state).
I guess the answer involves not only the defer mechanism but also how functions with value or pointer receivers work. Note that if you change method to use a pointer receiver, the defer prints c.val as 2, not 0.
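To make that observation concrete, here is a small sketch of my own (byValue and byPointer are names I'm adding for the comparison; they are not from the snippet above). With a value receiver, the receiver is copied when the defer statement executes; with a pointer receiver, only the address is taken at that point, and the field is read when the deferred call actually runs:

package main

import "fmt"

type Component struct {
    val int
}

// Value receiver: defer c.byValue() copies c when the defer statement
// executes, so the later assignment to c.val is not visible to it.
func (c Component) byValue() { fmt.Println("value receiver:", c.val) }

// Pointer receiver: defer c.byPointer() evaluates &c when the defer
// statement executes, but c.val is only read when the call runs.
func (c *Component) byPointer() { fmt.Println("pointer receiver:", c.val) }

func main() {
    c := Component{}
    defer c.byValue()   // prints "value receiver: 0"
    defer c.byPointer() // prints "pointer receiver: 2"
    c.val = 2
}

Deferred calls run in LIFO order, so the pointer-receiver line is printed first.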