Should goroutine/channel-based mechanism replace a concurrent map? - go

There's a map[PlayerId]Player to check whether player is online and perform state alterations knowing his ID. This must be done from multiple goroutines concurrently.
For now I plan to use streamrail's concurrent map, but what about a regular map and synchronization using channels?
Should it always be preferred in Go?
Should it be preferred in certain circumstances?
Are they basically just two ways to accomplish the same thing?
BTW, I know the slogan:
don't communicate by sharing memory share memory by communicating
but there are locking mechanisms in stdlib and no words in docs about not using them at all.

Start with the simplest approach: a map and RWMutex.
I cannot recommend using concurrency library unless it is widely used and tested (see https://github.com/streamrail/concurrent-map/issues/6 for example).
Note that even if you use github.com/streamrail/concurrent-map you will still need to implement your own synhronisation (use RWMutex) in the following scenario:
if _, ok = m[k]; !ok {
m[k] = newPlayer()
}
If your game is super popular and played by many players you will find that this approach doesn't scale but I would worry about it only if it becomes a problem.

Related

Why does Go uses channels to send and receive data between goroutines instead of using normal variables?

I could not find anything about this question except this explanation by Wikipedia https://en.wikipedia.org/wiki/Channel_(programming). But I'm not satisfied with the explanation.
What problem do channels solve?
Why don't we just use normal variables to send and receive data instead?
If by "normal variables" you mean, for example, a slice that multiple goroutines write to and read from, then this is a guaranteed way to get data races (you don't want to get data races). You can avoid concurrent access by using some kind of synchronization (such as Mutex or RWLock).
At this point, you
reinvented channels (which are basically that, a slice under a mutex)
spent more time than you needed to and still your solution is inferior (there's no syntax support, you can't use your slices in select, etc.)
Channels solve the problem of concurrent read and write. Basically, prevent the situation when one goroutine reads a variable and another one writes the same variable.
Also channels may have buffer, so you can write several values before locking.
Of course, you don't have to use channels. There are other ways to send data between goroutines. For example, you can use atomic operations when assigning or reading a value from a shared variable, or use mutex whenever you access it.

Why do a Beam io need beam.AddFixedKey+beam.GroupByKey to work properly?

I'm working on a Beam IO for Elasticsearch in Golang and at the moment I have a working draft version but, only managed to make it work by doing something that's not clear to me why do I need it.
Basically I looked at existing IO's and found that writes only work if I add the following:
x := beam.AddFixedKey(s, pColl)
y := beam.GroupByKey(s, x)
A full example is in the existing BigQuery IO
Basically I would like to understand why do I need both AddFixedKey followed by a GroupByKey to make it work. Also checked the issue BEAM-3860, but doesn't have much more details about it.
Those two transforms essentially function as a way to group all elements in a PCollection into one list. For example, its usage in the BigQuery example you posted allows grouping the entire input PCollection into a list that gets iterated over in the ProcessElement method.
Whether to use this approach depends how you are implementing the IO. The BigQuery example you posted performs its writes as a batch once all elements are available, but that may not be the best approach for your use case. You might prefer to write elements one at a time as they come in, especially if you can parallelize writes among different workers. In that case you would want to avoid grouping the input PCollection together.

How to understand and practice in concurrency with Go?

I am learning Go, and one of most powerful features is concurrency. I wrote PHP scripts before, they executed line-by-line, that's why it is difficult for me to understand channels and goroutines.
Is there are any website or any other resources (books, articles, etc.) where I can see a task that can be processed concurrently, so I can practice in concurrency with Go? It would be great, if at the end I can see the solution with comments and explanations why we do it this way and why this solution is better then others.
Just for example, here is the task that confuses me and I don't know how to approach: i need to make kinda parser, that receive start point (e.g.: http://example.com), and start navigating whole website (example.com/about, example.com/best-hotels/, etc.), and took some text parts from the each page (e.g., by selector, like h1.title and p.description) and then, after all website crawled, I receive a slice of parsed content.
I know how to make requests, how to get information using selector, but I don't know how to organize communication between all the goroutines.
Thank you for any information and links. Hope this would help others with the same problem in future.
so there are lots of resources online about concurrency patterns in go -- those three I got from a quick google search. But if you have something specific in mind, I think I can address that too.
Looks like you want to crawl a website and get information from it's many pages concurrently, depositing that "information" into a common location (ie. a slice). The way to go here is to use a chan, chaonlinennel, which is a thread-safe (multiple threads can access it without fear) data-structure for channeling data from one thread/goroutine to another.
And of course the go keyword in Go is how to spawn a goroutine.
so for example, in a func main() thread:
// get a listOfWebpages
dataChannel := make(chan string)
for _, webpage := range listOfWebpages {
go fetchDataFromWebpage(webpage, dataChannel)
}
// the dataChannel will be concurrently filled with the data you send to it
for x := range dataChannel {
fmt.Println(x) // print the header or whatever you scraped from webpage
}
The goroutines will be functions which scrape websites and feed the dataChannel (you mentioned you know how to scrape websites already). Something like this:
func fetchDataFromWebpage(url string, c chan string) {
data := scrapeWebsite(url)
c <- data // send the data to thread safe channel
}
If your having trouble understanding how to use concurrent tools, such as channels, mutex locks, or WaitGroups -- maybe you should start by trying to understand why concurrency can be problematic :) I find the best illustration of that (to me) is the Dining Philosophers Problem, https://en.wikipedia.org/wiki/Dining_philosophers_problem
Five silent philosophers sit at a round table with bowls of spaghetti. Forks are placed between each pair of adjacent philosophers.
Each philosopher must alternately think and eat. However, a philosopher can only eat spaghetti when they have both left and right forks. Each fork can be held by only one philosopher and so a philosopher can use the fork only if it is not being used by another philosopher. After an individual philosopher finishes eating, they need to put down both forks so that the forks become available to others. A philosopher can take the fork on their right or the one on their left as they become available, but cannot start eating before getting both forks.
If practice is what you're looking for, I recommend implementing this problem, so that it fails, and then trying to fix it using concurrent patterns :) -- there are other problems like this available to! And creating the problem is one step towards understanding how to solve it!
If you're having more trouble just understanding how to use Channels, aside from reading up on it, you can more simply think about channels as queues which can safely be accessed/modified from concurrent threads.

Is there a preferred way to design signal or event APIs in Go?

I am designing a package where I want to provide an API based on the observer pattern: that is, there are points where I'd like to emit a signal that will trigger zero or more interested parties. Those interested parties shouldn't necessarily need to know about each other.
I know I can implement an API like this from scratch (e.g. using a collection of channels or callback functions), but was wondering if there was a preferred way of structuring such APIs.
In many of the languages or frameworks I've played with, there has been standard ways to build these APIs so that they behave the way users expect: e.g. the g_signal_* functions for glib based applications, events and addEventListener() for JavaScript DOM apps, or multicast delegates for .NET.
Is there anything similar for Go? If not, is there some other way of structuring this type of API that is more idiomatic in Go?
I would say that a goroutine receiving from a channel is an analogue of an observer to a certain extent. An idiomatic way to expose events in Go would be thus IMHO to return channels from a package (function). Another observation is that callbacks are not used too often in Go programs. One of the reasons is also the existence of the powerful select statement.
As a final note: some people (me too) consider GoF patterns as Go antipatterns.
Go gives you a lot of tools for designing a signal api.
First you have to decide a few things:
Do you want a push or a pull model? eg. Does the publisher push events to the subscribers or do the subscribers pull events from the publisher?
If you want a push system then having the subscribers give the publisher a channel to send messages on would work really well. If you want a pull method then just a message box guarded with a mutex would work. Other than that without knowing more about your requirements it's hard to give much more detail.
I needed an "observer pattern" type thing in a couple of projects. Here's a reusable example from a recent project.
It's got a corresponding test that shows how to use it.
The basic theory is that an event emitter calls Submit with some piece of data whenever something interesting occurs. Any client that wants to be aware of that event will Register a channel it reads the event data off of. This channel you registered can be used in a select loop, or you can read it directly (or buffer and poll it).
When you're done, you Unregister.
It's not perfect for all cases (e.g. I may want a force-unregister type of event for slow observers), but it works where I use it.
I would say there is no standard way of doing this because channels are built into the language. There is no channel library with standard ways of doing things with channels, there are simply channels. Having channels as built in first class objects frees you from having to work with standard techniques and lets you solve problems in the simplest most natural way.
There is a basic Golang version of Node EventEmitter at https://github.com/chuckpreslar/emission
See http://itjumpstart.wordpress.com/2014/11/21/eventemitter-in-go/

Shared memory vs. Go channel communication

One of Go's slogans is Do not communicate by sharing memory; instead, share memory by communicating.
I am wondering whether Go allows two different Go-compiled binaries running on the same machine to communicate with one another (i.e. client-server), and how fast that would be in comparison to boost::interprocess in C++? All the examples I've seen so far only illustrate communication between same-program routines.
A simple Go example (with separate client and sever code) would be much appreciated!
One of the first things I thought of when I read this was Stackless Python. The channels in Go remind me a lot of Stackless Python, but that's likely because (a) I've used it and (b) the language/thoughts that they actually came from I've never touched.
I've never attempted to use channels as IPC, but that's probably because the alternative is likely much safer. Here's some psuedocode:
program1
chan = channel()
ipc = IPCManager(chan, None)
send_to_other_app(ipc.underlying_method)
chan.send("Ahoy!")
program2
chan = channel()
recv_from_other_app(underlying_method)
ipc = IPCManager(chan, underlying_method)
ahoy = chan.recv()
If you use a traditional IPC method, you can have channels at each side that wrap their communication on top of it. This leads to some issues in implementation, which I can't even think about how to tackle, and likely a few unexpected race conditions.
However, I agree; the ability to communicate via processes using the same flexibility of Go channels would be phenomenal (but I fear unstable).
Wrapping a simple socket with channels on each side gets you almost all of the benefits, however.
Rob has said that they are thinking a lot about how to make channels work as a (network) transparent RPC, this doesn't work at the moment, but obviously this is something they want to take the time to get it right.
In the meantime you can use the gob package which, while not a perfect and seamless solution, works quite well already.
I've looked at doing a similar thing for wrapping the MPI library. My current thinking is to use something like
func SendHandler(comm Comm){
// Look for sends to a process
for {
i := <-comm.To;
comm.Send(i,dest);
}
}
func ReceiveHandler(comm Comm){
// Look for recieves from a process
// Ping the handler to read out
for {
_ = <-comm.From;
i := comm.Recv(source);
comm.From <- i;
}
}
where comm.Send and comm.Recv wrap a c communications library. I'm not sure how you do the setting up of a channel for two different programs though, I've no experience in that sort of thing.

Resources