I was reading a blog post about when to use goroutines, and I saw the example pasted below; the part I'm asking about is lines 61 to 65. But I don't get the purpose of using a channel here.
It seems that he is iterating over the channel to retrieve the messages inside the goroutines.
But why not use a string slice directly?
58 func findConcurrent(goroutines int, topic string, docs []string) int {
59     var found int64
60
61     ch := make(chan string, len(docs))
62     for _, doc := range docs {
63         ch <- doc
64     }
65     close(ch)
66
67     var wg sync.WaitGroup
68     wg.Add(goroutines)
69
70     for g := 0; g < goroutines; g++ {
71         go func() {
72             var lFound int64
73             for doc := range ch {
74                 items, err := read(doc)
75                 if err != nil {
76                     continue
77                 }
78                 for _, item := range items {
79                     if strings.Contains(item.Description, topic) {
80                         lFound++
81                     }
82                 }
83             }
84             atomic.AddInt64(&found, lFound)
85             wg.Done()
86         }()
87     }
88
89     wg.Wait()
90
91     return int(found)
92 }
This code is providing an example of a way of distributing work (finding strings within documents) amongst multiple goroutines. Basically, the code is starting goroutines and feeding them documents to search via a channel.
But why not use a string slice directly?
It would be possible to use a string slice and a variable (let's call it count) to track which item in the slice you were up to. You would have some code like this (a little long-winded, to demonstrate a point):
for {
    if count >= len(docarray) {
        break
    }
    doc := docarray[count]
    count++
    // Process the document
}
However, you would hit synchronization issues. For example, what happens if two goroutines (running on different processor cores) reach if count >= len(docarray) at the same time? Without something to prevent this, they might both end up processing the same item in the slice (and potentially skipping the next element, because they both run count++).
Synchronization between goroutines is complex, and issues can be very hard to debug. Using channels hides a lot of this complexity from you and makes it more likely that your code will work as expected (it does not solve all issues; note the use of atomic.AddInt64(&found, lFound) in the example code to prevent another potential issue that would result from multiple goroutines writing to a variable at the same time).
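For comparison, here is a minimal sketch of what the slice-plus-counter approach needs once the missing synchronization is added (docarray, the worker count, and the mutex-based design are my own illustration, not the blog author's code); the check and the increment must happen under the same lock so that no two goroutines can claim the same index:

package main

import (
    "fmt"
    "sync"
)

func main() {
    docarray := []string{"doc1", "doc2", "doc3", "doc4"}

    var mu sync.Mutex // guards count
    count := 0

    var wg sync.WaitGroup
    for g := 0; g < 2; g++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for {
                mu.Lock()
                if count >= len(docarray) {
                    mu.Unlock()
                    return
                }
                doc := docarray[count]
                count++
                mu.Unlock()
                fmt.Println("processing", doc) // each doc is claimed exactly once
            }
        }()
    }
    wg.Wait()
}

The channel in the original example replaces all of this bookkeeping: each receive hands a document to exactly one goroutine.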
The author seems to just be using a contrived example to illustrate how channels work. Perhaps it would be desirable for him to come up with a more realistic example. But he does say:
Note: There are several ways and options you can take when writing a concurrent version of add. Don’t get hung up on my particular implementation at this time. If you have a more readable version that performs the same or better I would love for you to share it.
So it seems clear he wasn't trying to write the best code for the job, just something to illustrate his point.
He is using a buffered channel, so I don't think the channel is doing any special work here; any normal string slice would also do the same.
When reading from channels in Go, I observed that it does not follow perfect synchronization between the publishing function and the consuming function. Strangely enough, if it were a quirk of CPU scheduling, I would have gotten different results at least some of the time. The consumer in main seems to consume values two at a time and print them.
Consider the following example:
package main

import (
    "fmt"
)

func squares(ch chan int) {
    for i := 0; i < 9; i++ {
        val := i * i
        fmt.Printf("created val %v \n", val)
        ch <- i * i
        fmt.Printf("After posting val %v \n", val)
    }
    close(ch)
}

func main() {
    c := make(chan int)
    go squares(c)
    for val := range c {
        fmt.Println(val)
    }
}
No matter how many times I run it on the Go Playground, I see the following output.
created val 0
After posting val 0
created val 1
0
1
After posting val 1
created val 4
After posting val 4
created val 9
4
9
After posting val 9
created val 16
After posting val 16
created val 25
16
25
After posting val 25
created val 36
After posting val 36
created val 49
36
49
After posting val 49
created val 64
After posting val 64
64
Shouldn't I be expecting the following, since Go would block the squares function until main has printed the value?
created val 0
0
After posting val 0
...
If not, then why? And if I want perfect synchronization like the above, how should I achieve it?
You are using an unbuffered channel, so this is what happens.
NOTE: This is not meant as a technically 100% accurate description.
1) The main routine starts. It runs until for val := range c {. Then it is put to "sleep", as there is no value to be read from c.
2) The goroutine for squares, which was just created, is "awoken". It creates a value and can post it, as the other goroutine is already "waiting" for a value on the channel. Then it creates another value and can't post it, as the channel now blocks.
3) The main routine is "awoken", reads the value, prints it, reads the next value (which the other goroutine is already waiting to push), prints it, and then is stuck again as there is no new value available.
4) The goroutine for squares is "awoken", as it was able to push its value. It prints "After posting", creates a new value, posts it as the other routine is already waiting for one, and creates another one. Then it gets stuck, as the other routine is not ready to receive the next value.
5) Back to 3).
If you want a "smoother" workflow where the routines don't synchronize on every single value that is being passed through the channel, make a buffered channel:
c := make(chan int, 3)
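To see what the buffer changes, here is a separate minimal sketch (my own toy example, not from the question) showing that sends on a buffered channel complete immediately until the buffer is full, so the producer can run ahead of the consumer instead of synchronizing on every value:

package main

import "fmt"

func main() {
    c := make(chan int, 3)

    // With a buffer of 3, these sends complete immediately,
    // even though no receiver is running yet.
    c <- 1
    c <- 2
    c <- 3
    fmt.Println(len(c), cap(c)) // prints: 3 3

    close(c)
    for v := range c { // a closed buffered channel still yields its buffered values
        fmt.Println(v)
    }
}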
So I'm trying to understand how parallel computing works while also learning Go. I understand the difference between concurrency and parallelism; however, what I'm a little stuck on is how Go (or the OS) determines that something should be executed in parallel...
Is there something I have to do when writing my code, or is it all handled by the schedulers?
In the example below, I have two functions that are run in separate goroutines using the go keyword. Because the default GOMAXPROCS is the number of processors available on your machine (and I'm also explicitly setting it), I would expect these two functions to run at the same time, and thus the output to be a mix of numbers in no particular order, and furthermore that each time it is run the output would be different. However, this is not the case. Instead, they run one after the other, and to make matters more confusing, function two runs before function one.
Code:
package main

import (
    "fmt"
    "runtime"
    "sync"
)

func main() {
    runtime.GOMAXPROCS(6)
    var wg sync.WaitGroup
    wg.Add(2)

    fmt.Println("Starting")

    go func() {
        defer wg.Done()
        for smallNum := 0; smallNum < 20; smallNum++ {
            fmt.Printf("%v ", smallNum)
        }
    }()

    go func() {
        defer wg.Done()
        for bigNum := 100; bigNum > 80; bigNum-- {
            fmt.Printf("%v ", bigNum)
        }
    }()

    fmt.Println("Waiting to finish")
    wg.Wait()

    fmt.Println("\nFinished, Now terminating")
}
Output:
go run main.go
Starting
Waiting to finish
100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Finished, Now terminating
I am following this article, although just about every example I've come across does something similar.
Concurrency, Goroutines and GOMAXPROCS
Is this working the way it should and I'm not understanding something correctly, or is my code not right?
Is there something I have to do when writing my code,
No.
or is it all handled by the schedulers?
Yes.
In the example below, I have two functions that are run in separate goroutines using the go keyword. Because the default GOMAXPROCS is the number of processors available on your machine (and I'm also explicitly setting it), I would expect these two functions to run at the same time
They might or might not; you have no control here.
and thus the output to be a mix of numbers in no particular order, and furthermore that each time it is run the output would be different. However, this is not the case. Instead, they run one after the other, and to make matters more confusing, function two runs before function one.
Yes. Again you cannot force parallel computation.
Your test is flawed: you just don't do much in each goroutine. In your example, goroutine 2 might be scheduled to run, start running, and complete before goroutine 1 has even started. "Starting" a goroutine with go doesn't force it to start executing right away; all that happens is that a new goroutine is created which can run. From all goroutines which can run, some are scheduled onto your processors. All this scheduling cannot be controlled; it is fully automatic. As you seem to know, this is the difference between concurrent and parallel. You have control over concurrency in Go, but not (much) over what is actually executed in parallel on two or more cores.
More realistic examples with actual, long-running goroutines which do actual work will show interleaved output.
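For instance, a sketch along those lines (the workload and the counts are arbitrary choices of mine): each goroutine burns some CPU between prints, so the scheduler gets a chance to interleave the two. The output usually mixes A and B lines, though the exact order still varies from run to run:

package main

import (
    "fmt"
    "sync"
)

// busyWork burns some CPU; its result is used below so the
// compiler cannot discard the loop.
func busyWork(n int) int {
    sum := 0
    for i := 0; i < n; i++ {
        sum += i % 7
    }
    return sum
}

func main() {
    var wg sync.WaitGroup
    wg.Add(2)
    go func() {
        defer wg.Done()
        for i := 0; i < 10; i++ {
            v := busyWork(1_000_000)
            fmt.Printf("A%d:%d ", i, v%10)
        }
    }()
    go func() {
        defer wg.Done()
        for i := 0; i < 10; i++ {
            v := busyWork(1_000_000)
            fmt.Printf("B%d:%d ", i, v%10)
        }
    }()
    wg.Wait()
    fmt.Println()
}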
It's all handled by the scheduler.
With only two loops of 20 quick iterations each, you will be hard-pressed to see the effects of concurrency or parallelism.
Here is another toy example: https://play.golang.org/p/xPKITzKACZp
package main

import (
    "fmt"
    "runtime"
    "sync"
    "sync/atomic"
    "time"
)

const (
    ConstMaxProcs  = 2
    ConstRunners   = 4
    ConstLoopcount = 1_000_000
)

func runner(id int, wg *sync.WaitGroup, cptr *int64) {
    var times int
    for i := 0; i < ConstLoopcount; i++ {
        val := atomic.AddInt64(cptr, 1)
        if val > 1 {
            times++
        }
        atomic.AddInt64(cptr, -1)
    }
    fmt.Printf("[runner %d] cptr was > 1 on %d occasions\n", id, times)
    wg.Done()
}

func main() {
    runtime.GOMAXPROCS(ConstMaxProcs)

    var cptr int64
    wg := &sync.WaitGroup{}
    wg.Add(ConstRunners)

    start := time.Now()
    for id := 1; id <= ConstRunners; id++ {
        go runner(id, wg, &cptr)
    }
    wg.Wait()

    fmt.Printf("completed in %s\n", time.Since(start))
}
As with your example: you don't have control over the scheduler; this example just has more "surface" to witness some effects of concurrency.
It's hard to witness the actual difference between concurrency and parallelism from within the program; you can view your processor's activity while it runs, or check the global execution time.
The playground does not give sub-second precision on its clock, so if you want to see the actual timing, copy/paste the code into a local file and tune the constants to see various effects.
Note that some other effects (probably branch prediction on the if val > 1 {...} check and/or cache invalidation around the shared cptr variable) make the execution very volatile on my machine, so don't expect a straight "running with ConstMaxProcs = 4 is 4 times quicker than ConstMaxProcs = 1".
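One rough way to make that comparison from within the program (a sketch; the workload size and the GOMAXPROCS values are arbitrary choices of mine) is to run the same CPU-bound job under several GOMAXPROCS settings and compare the wall-clock times; on a multi-core machine the elapsed time should typically drop as GOMAXPROCS grows:

package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// spin is a CPU-bound workload; its result is stored so the
// compiler cannot discard the loop.
func spin(n int) int {
    sum := 0
    for i := 0; i < n; i++ {
        sum += i % 7
    }
    return sum
}

func main() {
    const runners = 4
    results := make([]int, runners) // each goroutine writes only its own slot
    for _, procs := range []int{1, 2, 4} {
        runtime.GOMAXPROCS(procs)
        var wg sync.WaitGroup
        start := time.Now()
        for r := 0; r < runners; r++ {
            wg.Add(1)
            go func(r int) {
                defer wg.Done()
                results[r] = spin(50_000_000)
            }(r)
        }
        wg.Wait()
        fmt.Printf("GOMAXPROCS=%d: %s\n", procs, time.Since(start))
    }
}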
I want to dynamically generate a slice of bytes and prefill them with a value. In this case, if difficulty is 3, I want to generate []byte("000").
I have the working code below; is there any way to optimize this into a one-liner (i.e. initialize it without having to run a for loop and set each element manually)?
var targetPrefix []byte = make([]byte, difficulty)
for i := 0; i < difficulty; i++ {
    targetPrefix[i] = 48 // UTF-8 (and ASCII) code for '0'
}
I guess it depends what you mean by optimize. If you mean performance, then no, not really. Ultimately, when you "request" some memory, it has to be iterated across to set the values. This was often more noticeable when writing C/C++ after using malloc: if you looked at the contents of the memory, it might be a bunch of 0s, but it's likely to be a bunch of random values.
If however you mean to just write less code and utilize something that already exists you could take a look at bytes.Repeat:
targetPrefix := bytes.Repeat([]byte("0"), difficulty)
In Go, write your code as:
package main

import "fmt"

func main() {
    difficulty := 7
    targetPrefix := make([]byte, difficulty)
    for i := range targetPrefix {
        targetPrefix[i] = '0'
    }
    fmt.Println(targetPrefix)
}
Playground: https://play.golang.org/p/QrxEvsnRKMC
Output:
[48 48 48 48 48 48 48]
Or, also in Go, write:
package main

import (
    "bytes"
    "fmt"
)

func main() {
    difficulty := 7
    targetPrefix := bytes.Repeat([]byte{'0'}, difficulty)
    fmt.Println(targetPrefix)
}
Playground: https://play.golang.org/p/Setx4kXTo1_H
Output:
[48 48 48 48 48 48 48]
Assuming you had 80 bytes of data and only the last 4 bytes were constantly changing, how would you efficiently hash the total 80 bytes using Go? In essence, the first 76 bytes are the same, while the last 4 bytes keep changing. Ideally, you want to keep a copy of the hash digest for the first 76 bytes and just keep changing the last 4.
You can try the following examples on the Go Playground. Benchmark results are at the end.
Note: the implementations below are not safe for concurrent use; I intentionally made them like this to be simpler and faster.
Fastest when using only the public API (always hashes all input)
The general concept and interface of Go's hash algorithms is the hash.Hash interface. It does not allow you to save the state of the hasher, or to return or rewind to a saved state. So, using the public hash APIs of the Go standard library, you always have to calculate the hash from the start.
What the public API does offer is reusing an already constructed hasher to calculate the hash of a new input, using the Hash.Reset() method. This is nice, as no (memory) allocations are needed to calculate multiple hash values. You may also take advantage of the optional slice that can be passed to Hash.Sum(), to which the current hash is appended. This is nice, as no allocations are needed to receive the hash results either.
Here's an example that takes advantage of these:
type Cached1 struct {
    hasher hash.Hash
    result [sha256.Size]byte
}

func NewCached1() *Cached1 {
    return &Cached1{hasher: sha256.New()}
}

func (c *Cached1) Sum(data []byte) []byte {
    c.hasher.Reset()
    c.hasher.Write(data)
    return c.hasher.Sum(c.result[:0])
}
Test data
We'll use the following test data:
var fixed = bytes.Repeat([]byte{1}, 76)
var variantA = []byte{1, 1, 1, 1}
var variantB = []byte{2, 2, 2, 2}
var data = append(append([]byte{}, fixed...), variantA...)
var data2 = append(append([]byte{}, fixed...), variantB...)
var c1 = NewCached1()
First let's get authentic results (to verify if our hasher works correctly):
fmt.Printf("%x\n", sha256.Sum256(data))
fmt.Printf("%x\n", sha256.Sum256(data2))
Output:
fb8e69bdfa2ad15be7cc8a346b74e773d059f96cfc92da89e631895422fe966a
10ef52823dad5d1212e8ac83b54c001bfb9a03dc0c7c3c83246fb988aa788c0c
Now let's check our Cached1 hasher:
fmt.Printf("%x\n", c1.Sum(data))
fmt.Printf("%x\n", c1.Sum(data2))
Output is the same:
fb8e69bdfa2ad15be7cc8a346b74e773d059f96cfc92da89e631895422fe966a
10ef52823dad5d1212e8ac83b54c001bfb9a03dc0c7c3c83246fb988aa788c0c
Even faster but may break (in future Go releases): hashes only the last 4 bytes
Now let's see a less flexible solution which truly calculates the hash of the fixed first 76 bytes only once.
The hasher of the crypto/sha256 package is the unexported sha256.digest type (more precisely a pointer to this type):
// digest represents the partial evaluation of a checksum.
type digest struct {
    h     [8]uint32
    x     [chunk]byte
    nx    int
    len   uint64
    is224 bool // mark if this digest is SHA-224
}
A value of the digest struct type basically holds the current state of the hasher.
What we may do is feed the hasher the fixed first 76 bytes, and then save this struct value. When we need to calculate the hash of some 80-byte data where the first 76 bytes are the same, we use this saved value as a starting point and then feed in the varying last 4 bytes.
Note that it's enough to simply save this struct value, as it contains no pointers and no descriptor types like slices and maps; otherwise we would also have to make a copy of those, but we're "lucky". So this solution would need adjustment if a future implementation of crypto/sha256 were to add a pointer or slice field, for example.
Since sha256.digest is unexported, we can only use reflection (the reflect package) to achieve our goals, which inherently adds some delay to the computation.
Example implementation that does this:
type Cached2 struct {
    origv   reflect.Value
    hasherv reflect.Value
    hasher  hash.Hash
    result  [sha256.Size]byte
}

func NewCached2(fixed []byte) *Cached2 {
    h := sha256.New()
    h.Write(fixed)

    c := &Cached2{origv: reflect.ValueOf(h).Elem()}
    hasherv := reflect.New(c.origv.Type())
    c.hasher = hasherv.Interface().(hash.Hash)
    c.hasherv = hasherv.Elem()

    return c
}

func (c *Cached2) Sum(data []byte) []byte {
    // Set state of the fixed hash:
    c.hasherv.Set(c.origv)

    c.hasher.Write(data)
    return c.hasher.Sum(c.result[:0])
}
Testing it:
var c2 = NewCached2(fixed)
fmt.Printf("%x\n", c2.Sum(variantA))
fmt.Printf("%x\n", c2.Sum(variantB))
Output is again the same:
fb8e69bdfa2ad15be7cc8a346b74e773d059f96cfc92da89e631895422fe966a
10ef52823dad5d1212e8ac83b54c001bfb9a03dc0c7c3c83246fb988aa788c0c
So it works.
The "ultimate", fastest solution
Cached2 could be faster if reflection were not involved. If we want an even faster solution, we can simply make a copy of the sha256.digest type and its methods in our own package, so we can use it directly without having to resort to reflection.
If we do this, we will have access to the digest struct value, and we can simply make a copy of it like:
var d digest
// init d
saved := d
And restoring it is like:
d = saved
I simply "cloned" the crypto/sha256 package to my workspace, and changed / exported the digest type as Digest just for demonstration purposes. Then using this mysha256.Digest type I implemented Cached3 like this:
type Cached3 struct {
    orig   mysha256.Digest
    result [sha256.Size]byte
}

func NewCached3(fixed []byte) *Cached3 {
    var d mysha256.Digest
    d.Reset()
    d.Write(fixed)

    return &Cached3{orig: d}
}

func (c *Cached3) Sum(data []byte) []byte {
    // Make a copy of the fixed hash:
    d := c.orig

    d.Write(data)
    return d.Sum(c.result[:0])
}
Testing it:
var c3 = NewCached3(fixed)
fmt.Printf("%x\n", c3.Sum(variantA))
fmt.Printf("%x\n", c3.Sum(variantB))
Output again is the same. So this works too.
Benchmarks
We can benchmark performance with this code:
func BenchmarkCached1(b *testing.B) {
    for i := 0; i < b.N; i++ {
        c1.Sum(data)
        c1.Sum(data2)
    }
}

func BenchmarkCached2(b *testing.B) {
    for i := 0; i < b.N; i++ {
        c2.Sum(variantA)
        c2.Sum(variantB)
    }
}

func BenchmarkCached3(b *testing.B) {
    for i := 0; i < b.N; i++ {
        c3.Sum(variantA)
        c3.Sum(variantB)
    }
}
Benchmark results (go test -bench . -benchmem):
BenchmarkCached1-4 1000000 1569 ns/op 0 B/op 0 allocs/op
BenchmarkCached2-4 2000000 926 ns/op 0 B/op 0 allocs/op
BenchmarkCached3-4 2000000 872 ns/op 0 B/op 0 allocs/op
Cached2 is approximately 41% faster than Cached1, which is quite noticeable and nice. Cached3 only gives a "little" performance boost compared to Cached2, another 6%. Cached3 is 44% faster than Cached1.
Also note that none of the solutions use any allocations which is also nice.
Conclusion
For that extra 40% or 44%, I would probably not go for the Cached2 or Cached3 solutions. Of course it really depends on how important the performance is to you. If it is important, I think the Cached2 solution presents a fine compromise between minimum added complexity and the noticeable performance gain. It does pose a threat as future Go implementations may break it; if it is a problem, Cached3 solves this by copying the current implementation (and also improves its performance a little).
Say, for some very simple Go code:
package main

import "fmt"

func plus(a int, b int) int {
    return a + b
}

func plusPlus(a, b, c int) int {
    return a + b + c
}

func main() {
    ptr := plus
    ptr2 := plusPlus
    fmt.Println(ptr)
    fmt.Println(ptr2)
}
This has the following output:
0x2000
0x2020
What is going on here? This doesn't look like a function pointer, or any kind of pointer for that matter, that one would find on the stack. I also understand that Go, while offering some nice low-level functionality in the threading department, requires an OS to function; C works across all computer platforms and operating systems can be written in it, while Go needs an operating system and in fact only works on a few right now. Do these very regular-looking function pointers mean that this runs on a VM? Or is the compiler just linked to low-level C functions?
Go does not run on a virtual machine.
From the view of the language specification, ptr and ptr2 are function values. They can be called as ptr(1, 2) and ptr2(1, 2, 3).
Diving down into the implementation, the variables ptr and ptr2 are pointers to func values. See the Function Call design document for information on func values. Note the distinction between the language's "function" value and the implementation's "func" value.
Because the reflection API used by the fmt package indirects through the func values to get the pointer to print, the call to fmt.Println(ptr) prints the actual address of the plus function.
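If you want to see that mapping explicitly, here is a small sketch using only standard-library calls: reflect exposes the same code pointer that fmt prints, and runtime can map it back to the function's name.

package main

import (
    "fmt"
    "reflect"
    "runtime"
)

func plus(a, b int) int { return a + b }

func main() {
    // reflect yields the code pointer that fmt ends up printing.
    pc := reflect.ValueOf(plus).Pointer()
    fmt.Printf("%#x\n", pc)

    // runtime maps that pointer back to the function's name.
    fmt.Println(runtime.FuncForPC(pc).Name()) // main.plus
}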
Go doesn't run on a virtual machine. Those are the actual addresses of the functions.
On my machine (go 1.4.1, Linux amd64) the program prints
0x400c00
0x400c20
which are different from the values in your example, but still pretty low. Checking the compiled code:
$ nm test | grep 'T main.plus'
0000000000400c00 T main.plus
0000000000400c20 T main.plusPlus
these are the actual addresses of the functions. func plus compiles to a mere 19 bytes of code, so plusPlus appears only 32 (0x20) bytes later to satisfy optimal alignment requirements.
For the sake of curiosity, here's the disassembly of func plus from objdump -d, which should dispel any doubt that Go compiles to native code:
0000000000400c00 <main.plus>:
400c00: 48 8b 5c 24 08 mov 0x8(%rsp),%rbx
400c05: 48 8b 6c 24 10 mov 0x10(%rsp),%rbp
400c0a: 48 01 eb add %rbp,%rbx
400c0d: 48 89 5c 24 18 mov %rbx,0x18(%rsp)
400c12: c3 retq
They are function values:
package main

import "fmt"

func plus(a int, b int) int {
    return a + b
}

func plusPlus(a, b, c int) int {
    return a + b + c
}

func main() {
    funcp := plus
    funcpp := plusPlus
    fmt.Println(funcp)
    fmt.Println(funcpp)
    fmt.Println(funcp(1, 2))
    fmt.Println(funcpp(1, 2, 3))
}
Output:
0x20000
0x20020
3
6