I am learning Go, and one of its most powerful features is concurrency. I used to write PHP scripts, which execute strictly line by line, so channels and goroutines are difficult for me to understand.
Is there any website or other resource (books, articles, etc.) with tasks that can be processed concurrently, so I can practice concurrency in Go? It would be great if at the end I could see a solution with comments and explanations of why it is done that way and why that solution is better than others.
Just as an example, here is a task that confuses me and that I don't know how to approach: I need to make a kind of parser that receives a starting point (e.g. http://example.com), navigates the whole website (example.com/about, example.com/best-hotels/, etc.), takes some text from each page (e.g. by selector, like h1.title and p.description), and then, after the whole website has been crawled, gives me a slice of parsed content.
I know how to make requests and how to extract information using selectors, but I don't know how to organize communication between all the goroutines.
Thank you for any information and links. I hope this will also help others with the same problem in the future.
There are lots of resources online about concurrency patterns in Go -- those three I got from a quick Google search. But if you have something specific in mind, I think I can address that too.
It looks like you want to crawl a website and get information from its many pages concurrently, depositing that "information" into a common location (i.e. a slice). The way to go here is to use a channel, which is a thread-safe (multiple goroutines can access it without fear) data structure for channeling data from one thread/goroutine to another.
And of course the go keyword in Go is how to spawn a goroutine.
So, for example, in func main() (using a sync.WaitGroup so we know when every fetch is done and the channel can be closed):
// get a listOfWebpages
dataChannel := make(chan string)
var wg sync.WaitGroup
for _, webpage := range listOfWebpages {
	wg.Add(1)
	go func(url string) {
		defer wg.Done()
		fetchDataFromWebpage(url, dataChannel)
	}(webpage)
}
go func() {
	wg.Wait()
	close(dataChannel) // close once every fetch is done, so the range loop below can finish
}()
// the dataChannel is concurrently filled with the data you send to it
for x := range dataChannel {
	fmt.Println(x) // print the header or whatever you scraped from the webpage
}
The goroutines will be functions which scrape websites and feed the dataChannel (you mentioned you know how to scrape websites already). Something like this:
func fetchDataFromWebpage(url string, c chan string) {
	data := scrapeWebsite(url)
	c <- data // send the data to the thread-safe channel
}
If you're having trouble understanding how to use concurrency tools such as channels, mutex locks, or WaitGroups -- maybe you should start by trying to understand why concurrency can be problematic in the first place :) I find the best illustration of that is the Dining Philosophers Problem: https://en.wikipedia.org/wiki/Dining_philosophers_problem
Five silent philosophers sit at a round table with bowls of spaghetti. Forks are placed between each pair of adjacent philosophers.
Each philosopher must alternately think and eat. However, a philosopher can only eat spaghetti when they have both left and right forks. Each fork can be held by only one philosopher and so a philosopher can use the fork only if it is not being used by another philosopher. After an individual philosopher finishes eating, they need to put down both forks so that the forks become available to others. A philosopher can take the fork on their right or the one on their left as they become available, but cannot start eating before getting both forks.
If practice is what you're looking for, I recommend implementing this problem so that it fails, and then trying to fix it using concurrency patterns :) -- there are other problems like this available too! And creating the problem is one step towards understanding how to solve it!
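For example, here's a minimal sketch of the naive approach in Go, with one sync.Mutex per fork. As written it can deadlock if every philosopher grabs their left fork at the same time -- which is exactly the failure you'd then practice fixing (e.g. by ordering the fork acquisition or limiting how many philosophers sit down at once):
package main

import (
	"fmt"
	"sync"
)

func main() {
	const n = 5
	forks := make([]sync.Mutex, n) // one mutex per fork
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			left, right := id, (id+1)%n
			// Naive order: left fork first, then right. If all five
			// philosophers grab their left fork simultaneously, every
			// right fork is already held and the program deadlocks.
			forks[left].Lock()
			forks[right].Lock()
			fmt.Printf("philosopher %d is eating\n", id)
			forks[right].Unlock()
			forks[left].Unlock()
		}(i)
	}
	wg.Wait()
}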
If you're having more trouble just understanding how to use channels, then aside from reading up on them, you can more simply think of channels as queues that can safely be accessed and modified from concurrent goroutines.
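For instance, a buffered channel behaves like a bounded FIFO queue that any goroutine can safely push to or pop from:
package main

import "fmt"

func main() {
	q := make(chan int, 3) // buffered channel: a bounded, thread-safe FIFO queue
	q <- 1
	q <- 2
	fmt.Println(<-q) // 1 -- first in, first out
	fmt.Println(<-q) // 2
}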
I have read https://golang.org/ref/mem, but there are some parts which are still unclear to me.
For instance, in the section "Channel communication" it says: "The write to a happens before the send on c", but I don't know why that is the case. I am copying the sample code from that page below to provide context.
var c = make(chan int, 10)
var a string

func f() {
	a = "hello, world"
	c <- 0
}

func main() {
	go f()
	<-c
	print(a)
}
From the point of view of a single goroutine that assertion is true; however, from the point of view of another goroutine it cannot be inferred from the guarantees the text has mentioned so far.
So my question is: are there other guarantees which are not explicitly stated in this document? For instance, can we say that a synchronization primitive such as a send on a channel ensures that statements placed before it will not be moved after it by the compiler? And what about the statements that come after it -- can we say they will not be moved before the synchronization primitive?
What about the operations offered in the atomic package? Do they provide the same guarantees as channel operations?
can we say that a synchronization primitive such as a send on a channel ensures that statements placed before it will not be moved after it by the compiler?
That is exactly what the memory model says. Within a single goroutine, the compiler may rearrange execution as long as the effects of write operations become visible in the order they appear in the program. So if you set a = 1 at some point and read a later, the compiler knows not to move the write to after the read. Across multiple goroutines, channels and locks are the synchronization points: anything that happened before a channel or lock operation is visible to other goroutines once the synchronization point is reached, and the compiler will not move code around in a way that lets a write operation cross that synchronization boundary.
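If it helps, the memory model document makes the same point with a lock instead of a channel; roughly like this (my paraphrase of its locking example, so double-check it against the page):
package main

import "sync"

var l sync.Mutex
var a string

func f() {
	a = "hello, world"
	l.Unlock()
}

func main() {
	l.Lock() // take the lock before starting f
	go f()
	l.Lock() // blocks until f calls l.Unlock()
	print(a) // guaranteed to observe the write to a
}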
The sync/atomic operations provide guarantees as well, and there have been discussions about whether to add them to the memory model, but they are not explicitly stated at the moment. There is an open issue about it:
https://github.com/golang/go/issues/5045
As the title says, I am referring to the Go package sync.Map: can its functions be considered atomic? Mainly the Load, Store, LoadOrStore, and Delete functions.
I also built a simple example on the Go Playground; is it guaranteed that only one goroutine can enter the code in lines 15 - 17? From my testing it seems to be guaranteed.
Please help me understand.
The godoc page for the sync package says: "Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple goroutines without additional locking or coordination."
This statement guarantees that there's no need for additional mutexes or synchronization across goroutines. I wouldn't call that claim "atomic" (which has a very precise meaning), but it does mean that you don't have to worry about multiple goroutines being able to enter a LoadOrStore block (with the same key) like in your example.
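For illustration, here is a small sketch of what that guarantee buys you (the key and the "session" values are names I made up):
package main

import (
	"fmt"
	"sync"
)

func main() {
	var m sync.Map

	// Even if many goroutines call LoadOrStore with the same key at once,
	// exactly one value ends up stored; every caller gets that same value back.
	actual, loaded := m.LoadOrStore("player42", "session-A")
	fmt.Println(actual, loaded) // session-A false -- we stored it

	actual, loaded = m.LoadOrStore("player42", "session-B")
	fmt.Println(actual, loaded) // session-A true -- the existing value was returned
}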
To profile my app, I want to know how many goroutines are waiting to write to or read from a channel; I can't find anything relevant in the reflect package.
I could maintain an explicit counter of course, but I'd expect the Go runtime to already know this, so I'm trying to avoid reinventing the wheel.
So, is there a way to do that without maintaining the counter manually?
To track overall load you are probably looking for runtime.NumGoroutine()
https://golang.org/pkg/runtime/#NumGoroutine
Though it's not exactly the number of blocked goroutines, it should be very close; the count of blocked goroutines shouldn't exceed runtime.NumGoroutine() - GOMAXPROCS.
To track goroutines per channel you can do the following:
Use https://golang.org/pkg/runtime/pprof/#Do to label the goroutines that work with a specific channel.
Use net/http/pprof to get the current profile and parse the output -- see this answer for details: https://stackoverflow.com/a/38414527/1975086. Or you can look into how net/http/pprof obtains that information and get it within your app in a typed way.
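A rough sketch of both ideas; the label key "channel" and the value "playerUpdates" are arbitrary names of my own, and the sleep is only there so the workers have started before we count:
package main

import (
	"context"
	"fmt"
	"runtime"
	"runtime/pprof"
	"time"
)

func worker(ch <-chan int) {
	for range ch { // blocks here while waiting to read
	}
}

func main() {
	ch := make(chan int)

	// Label the goroutines serving this particular channel so they can be
	// told apart in pprof output.
	for i := 0; i < 3; i++ {
		go pprof.Do(context.Background(), pprof.Labels("channel", "playerUpdates"), func(ctx context.Context) {
			worker(ch)
		})
	}

	time.Sleep(100 * time.Millisecond)
	fmt.Println("total goroutines:", runtime.NumGoroutine()) // main plus the 3 workers
}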
There's a map[PlayerId]Player used to check whether a player is online and to perform state alterations given their ID. This must be done from multiple goroutines concurrently.
For now I plan to use streamrail's concurrent map, but what about a regular map and synchronization using channels?
Should it always be preferred in Go?
Should it be preferred in certain circumstances?
Are they basically just two ways to accomplish the same thing?
BTW, I know the slogan:
Don't communicate by sharing memory; share memory by communicating.
but there are locking mechanisms in the stdlib, and nothing in the docs says not to use them at all.
Start with the simplest approach: a map and RWMutex.
I cannot recommend using a concurrency library unless it is widely used and tested (see https://github.com/streamrail/concurrent-map/issues/6 for an example).
Note that even if you use github.com/streamrail/concurrent-map you will still need to implement your own synchronisation (using an RWMutex) in the following scenario:
if _, ok := m[k]; !ok {
	m[k] = newPlayer()
}
If your game is super popular and played by many players you will find that this approach doesn't scale but I would worry about it only if it becomes a problem.
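For example, a minimal sketch of the plain map + RWMutex approach; PlayerID and Player here are placeholders for your own types:
package game

import "sync"

type PlayerID string

type Player struct {
	Online bool
	// ... whatever state you need
}

type Registry struct {
	mu      sync.RWMutex
	players map[PlayerID]*Player
}

func NewRegistry() *Registry {
	return &Registry{players: make(map[PlayerID]*Player)}
}

// Get takes only a read lock, so many goroutines can look players up at once.
func (r *Registry) Get(id PlayerID) (*Player, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	p, ok := r.players[id]
	return p, ok
}

// GetOrCreate does the check-and-insert under a single write lock,
// avoiding the race shown in the snippet above.
func (r *Registry) GetOrCreate(id PlayerID) *Player {
	r.mu.Lock()
	defer r.mu.Unlock()
	p, ok := r.players[id]
	if !ok {
		p = &Player{}
		r.players[id] = p
	}
	return p
}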
One of Go's slogans is Do not communicate by sharing memory; instead, share memory by communicating.
I am wondering whether Go allows two different Go-compiled binaries running on the same machine to communicate with one another (i.e. client-server), and how fast that would be compared to boost::interprocess in C++. All the examples I've seen so far only illustrate communication between goroutines within the same program.
A simple Go example (with separate client and server code) would be much appreciated!
One of the first things I thought of when I read this was Stackless Python. The channels in Go remind me a lot of Stackless Python's channels, but that's likely because (a) I've used it and (b) I've never touched the language/ideas they actually came from.
I've never attempted to use channels as IPC, but that's probably because the alternative is likely much safer. Here's some pseudocode:
program1
chan = channel()
ipc = IPCManager(chan, None)
send_to_other_app(ipc.underlying_method)
chan.send("Ahoy!")
program2
chan = channel()
recv_from_other_app(underlying_method)
ipc = IPCManager(chan, underlying_method)
ahoy = chan.recv()
If you use a traditional IPC method, you can have channels on each side that wrap the communication on top of it. That leads to some implementation issues I can't even begin to think about how to tackle, and likely a few unexpected race conditions.
However, I agree; the ability to communicate via processes using the same flexibility of Go channels would be phenomenal (but I fear unstable).
Wrapping a simple socket with channels on each side gets you almost all of the benefits, however.
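For instance, here is a rough sketch of that wrapping idea: each side turns a net.Conn into channels, and newline-delimited text messages are just an assumption to keep it short.
package ipc

import (
	"bufio"
	"fmt"
	"net"
)

// recvChan turns a connection into a channel of incoming lines.
// The channel is closed when the peer disconnects.
func recvChan(conn net.Conn) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		sc := bufio.NewScanner(conn)
		for sc.Scan() {
			out <- sc.Text()
		}
	}()
	return out
}

// sendChan turns a connection into a channel of outgoing lines.
func sendChan(conn net.Conn) chan<- string {
	in := make(chan string)
	go func() {
		for msg := range in {
			fmt.Fprintln(conn, msg)
		}
	}()
	return in
}
Each program dials or accepts its own net.Conn as usual and hands it to these helpers; the rest of the code then only sees channels.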
Rob has said that they are thinking a lot about how to make channels work as a (network-)transparent RPC. This doesn't work at the moment, but obviously it is something they want to take the time to get right.
In the meantime you can use the gob package which, while not a perfect and seamless solution, already works quite well.
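As a starting point, here is a bare-bones sketch of two separate binaries talking over TCP with gob; the Message type and the port are made up:
// server.go
package main

import (
	"encoding/gob"
	"fmt"
	"net"
)

type Message struct {
	Text string
}

func main() {
	ln, err := net.Listen("tcp", "localhost:4242")
	if err != nil {
		panic(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		go func(c net.Conn) {
			defer c.Close()
			var msg Message
			if err := gob.NewDecoder(c).Decode(&msg); err == nil {
				fmt.Println("received:", msg.Text)
			}
		}(conn)
	}
}

// client.go
package main

import (
	"encoding/gob"
	"net"
)

type Message struct {
	Text string
}

func main() {
	conn, err := net.Dial("tcp", "localhost:4242")
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	if err := gob.NewEncoder(conn).Encode(Message{Text: "Ahoy!"}); err != nil {
		panic(err)
	}
}
This is not a channel-like abstraction yet, but combined with the socket-wrapping idea above you can get fairly close.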
I've looked at doing a similar thing for wrapping the MPI library. My current thinking is to use something like
func SendHandler(comm Comm) {
	// Look for sends to a process
	for {
		i := <-comm.To
		comm.Send(i, dest) // dest is assumed to be the destination rank, set up elsewhere
	}
}

func ReceiveHandler(comm Comm) {
	// Look for receives from a process
	// Ping the handler to read out
	for {
		_ = <-comm.From
		i := comm.Recv(source) // source is assumed to be the source rank, set up elsewhere
		comm.From <- i
	}
}
where comm.Send and comm.Recv wrap a C communications library. I'm not sure how you would set up a channel between two different programs, though; I have no experience with that sort of thing.