Shared memory vs. Go channel communication - go

One of Go's slogans is Do not communicate by sharing memory; instead, share memory by communicating.
I am wondering whether Go allows two different Go-compiled binaries running on the same machine to communicate with one another (i.e. client-server), and how fast that would be in comparison to boost::interprocess in C++? All the examples I've seen so far only illustrate communication between same-program routines.
A simple Go example (with separate client and sever code) would be much appreciated!

One of the first things I thought of when I read this was Stackless Python. The channels in Go remind me a lot of Stackless Python, but that's likely because (a) I've used it and (b) the language/thoughts that they actually came from I've never touched.
I've never attempted to use channels as IPC, but that's probably because the alternative is likely much safer. Here's some psuedocode:
program1
chan = channel()
ipc = IPCManager(chan, None)
send_to_other_app(ipc.underlying_method)
chan.send("Ahoy!")
program2
chan = channel()
recv_from_other_app(underlying_method)
ipc = IPCManager(chan, underlying_method)
ahoy = chan.recv()
If you use a traditional IPC method, you can have channels at each side that wrap their communication on top of it. This leads to some issues in implementation, which I can't even think about how to tackle, and likely a few unexpected race conditions.
However, I agree; the ability to communicate via processes using the same flexibility of Go channels would be phenomenal (but I fear unstable).
Wrapping a simple socket with channels on each side gets you almost all of the benefits, however.

Rob has said that they are thinking a lot about how to make channels work as a (network) transparent RPC, this doesn't work at the moment, but obviously this is something they want to take the time to get it right.
In the meantime you can use the gob package which, while not a perfect and seamless solution, works quite well already.

I've looked at doing a similar thing for wrapping the MPI library. My current thinking is to use something like
func SendHandler(comm Comm){
// Look for sends to a process
for {
i := <-comm.To;
comm.Send(i,dest);
}
}
func ReceiveHandler(comm Comm){
// Look for recieves from a process
// Ping the handler to read out
for {
_ = <-comm.From;
i := comm.Recv(source);
comm.From <- i;
}
}
where comm.Send and comm.Recv wrap a c communications library. I'm not sure how you do the setting up of a channel for two different programs though, I've no experience in that sort of thing.

Related

is concurrent write on stdout threadsafe?

below code does not throw a data race
package main
import (
"fmt"
"os"
"strings"
)
func main() {
x := strings.Repeat(" ", 1024)
go func() {
for {
fmt.Fprintf(os.Stdout, x+"aa\n")
}
}()
go func() {
for {
fmt.Fprintf(os.Stdout, x+"bb\n")
}
}()
go func() {
for {
fmt.Fprintf(os.Stdout, x+"cc\n")
}
}()
go func() {
for {
fmt.Fprintf(os.Stdout, x+"dd\n")
}
}()
<-make(chan bool)
}
I tried multiple length of data, with variant https://play.golang.org/p/29Cnwqj5K30
This post says it is not TS.
This mail does not really answer the question, or I did not understand.
Package documentation of os and fmt dont mention much about this. I admit i did not dig the source code of those two packages to find further explanations, they appear too complex to me.
What are the recommendations and their references ?
I'm not sure it would qualify as a definitive answer but I'll try to provide some insight.
The F*-functions of the fmt package merely state they take a value of a type implementing io.Writer interface and call Write on it.
The functions themselves are safe for concurrent use — in the sense it's OK to call any number of fmt.Fwhaveter concurrently: the package itself is prepared for that,
but when it comes to concurrently writing to the same value of a type implementing io.Writer, the question becomes more complex because supporting of an interface in Go does not state anything about the real type concurrency-wise.
In other words, the real point of where the concurrency may or may not be allowed is deferred to the "writer" which the functions of fmt write to.
(One should also keep in mind that the fmt.*Print* functions are allowed to call Write on its destination any number of times during a single invocation, in a row, — as opposed to those provided by the stock package log.)
So, we basically have two cases:
Custom implementations of io.Writer.
Stock implementations of it, such as *os.File or wrappers around sockets produced by the functions of net package.
The first case is the simple one: whatever the implementor did.
The second case is harder: as I understand, the Go standard library's stance on this (albeit not clearly stated in the docs) in that the wrappers it provides around "things" provided by the OS—such as file descriptors and sockets—are reasonably "thin", and hence whatever semantics they implement, is transitively implemented by the stdlib code running on a particular system.
For instance, POSIX requires that write(2) calls are atomic with regard to one another when they are operating on regular files or symbolic links. This means, since any call to Write on things wrapping file descriptors or sockets actually results in a single "write" syscall of the target system, you might consult the docs of the target OS and get the idea of what will happen.
Note that POSIX only tells about filesystem objects, and if os.Stdout is opened to a terminal (or a pseudo-terminal) or to a pipe or to anything else which supports the write(2) syscall, the results will depend on what the relevant subsystem and/or the driver implement—for instance, data from multiple concurrent calls may be interspersed, or one of the calls, or both, may just be failed by the OS—unlikely, but still.
Going back to Go, from what I gather, the following facts hold true about the Go stdlib types which wrap file descriptors and sockets:
They are safe for concurrent use by themselves (I mean, on the Go level).
They "map" Write and Read calls 1-to-1 to the underlying object—that is, a Write call is never split into two or more underlying syscalls, and a Read call never returns data "glued" from the results of multiple underlying syscalls.
(By the way, people occasionally get tripped by this no-frills behaviour — for example, see this or this as examples.)
So basically when we consider this with the fact fmt.*Print* are free to call Write any number of times per a single call, your examples which use os.Stdout, will:
Never result in a data race — unless you've assigned the variable os.Stdout some custom implementation, — but
The data actually written to the underlying FD will be intermixed in an unpredictable order which may depend on many factors including the OS kernel version and settings, the version of Go used to build the program, the hardware and the load on the system.
TL;DR
Multiple concurrent calls to fmt.Fprint* writing to the same "writer" value defer their concurrency to the implementation (type) of the "writer".
It's impossible to have a data race with "file-like" objects provided by the Go stdlib in the setup you have presented in your question.
The real problem will be not with data races on the Go program level but with the concurrent access to a single resource happening on level of the OS. And there, we do not (usually) speak about data races because the commodity OSes Go supports expose things one may "write to" as abstractions, where a real data race would possibly indicate a bug in the kernel or in the driver (and the Go's race detector won't be able to detect it anyway as that memory would not be owned by the Go runtime powering the process).
Basically, in your case, if you need to be sure the data produced by any particular call to fmt.Fprint* comes out as a single contiguous piece to the actual data receiver provided by the OS, you need to serialize these calls as the fmt package provides no guarantees regarding the number of calls to Write on the supplied "writer" for the functions it exports.
The serialization may either be external (explicit, that is "take a lock, call fmt.Fprint*, release the lock") or internal — by wrapping the os.Stdout in a custom type which would manage a lock, and using it).
And while we're at it, the log package does just that, and can be used straight away as the "loggers" it provides, including the default one, allow to inhibit outputting of "log headers" (such as the timestamp and the name of the file).

How to understand and practice in concurrency with Go?

I am learning Go, and one of most powerful features is concurrency. I wrote PHP scripts before, they executed line-by-line, that's why it is difficult for me to understand channels and goroutines.
Is there are any website or any other resources (books, articles, etc.) where I can see a task that can be processed concurrently, so I can practice in concurrency with Go? It would be great, if at the end I can see the solution with comments and explanations why we do it this way and why this solution is better then others.
Just for example, here is the task that confuses me and I don't know how to approach: i need to make kinda parser, that receive start point (e.g.: http://example.com), and start navigating whole website (example.com/about, example.com/best-hotels/, etc.), and took some text parts from the each page (e.g., by selector, like h1.title and p.description) and then, after all website crawled, I receive a slice of parsed content.
I know how to make requests, how to get information using selector, but I don't know how to organize communication between all the goroutines.
Thank you for any information and links. Hope this would help others with the same problem in future.
so there are lots of resources online about concurrency patterns in go -- those three I got from a quick google search. But if you have something specific in mind, I think I can address that too.
Looks like you want to crawl a website and get information from it's many pages concurrently, depositing that "information" into a common location (ie. a slice). The way to go here is to use a chan, chaonlinennel, which is a thread-safe (multiple threads can access it without fear) data-structure for channeling data from one thread/goroutine to another.
And of course the go keyword in Go is how to spawn a goroutine.
so for example, in a func main() thread:
// get a listOfWebpages
dataChannel := make(chan string)
for _, webpage := range listOfWebpages {
go fetchDataFromWebpage(webpage, dataChannel)
}
// the dataChannel will be concurrently filled with the data you send to it
for x := range dataChannel {
fmt.Println(x) // print the header or whatever you scraped from webpage
}
The goroutines will be functions which scrape websites and feed the dataChannel (you mentioned you know how to scrape websites already). Something like this:
func fetchDataFromWebpage(url string, c chan string) {
data := scrapeWebsite(url)
c <- data // send the data to thread safe channel
}
If your having trouble understanding how to use concurrent tools, such as channels, mutex locks, or WaitGroups -- maybe you should start by trying to understand why concurrency can be problematic :) I find the best illustration of that (to me) is the Dining Philosophers Problem, https://en.wikipedia.org/wiki/Dining_philosophers_problem
Five silent philosophers sit at a round table with bowls of spaghetti. Forks are placed between each pair of adjacent philosophers.
Each philosopher must alternately think and eat. However, a philosopher can only eat spaghetti when they have both left and right forks. Each fork can be held by only one philosopher and so a philosopher can use the fork only if it is not being used by another philosopher. After an individual philosopher finishes eating, they need to put down both forks so that the forks become available to others. A philosopher can take the fork on their right or the one on their left as they become available, but cannot start eating before getting both forks.
If practice is what you're looking for, I recommend implementing this problem, so that it fails, and then trying to fix it using concurrent patterns :) -- there are other problems like this available to! And creating the problem is one step towards understanding how to solve it!
If you're having more trouble just understanding how to use Channels, aside from reading up on it, you can more simply think about channels as queues which can safely be accessed/modified from concurrent threads.

Should goroutine/channel-based mechanism replace a concurrent map?

There's a map[PlayerId]Player to check whether player is online and perform state alterations knowing his ID. This must be done from multiple goroutines concurrently.
For now I plan to use streamrail's concurrent map, but what about a regular map and synchronization using channels?
Should it always be preferred in Go?
Should it be preferred in certain circumstances?
Are they basically just two ways to accomplish the same thing?
BTW, I know the slogan:
don't communicate by sharing memory share memory by communicating
but there are locking mechanisms in stdlib and no words in docs about not using them at all.
Start with the simplest approach: a map and RWMutex.
I cannot recommend using concurrency library unless it is widely used and tested (see https://github.com/streamrail/concurrent-map/issues/6 for example).
Note that even if you use github.com/streamrail/concurrent-map you will still need to implement your own synhronisation (use RWMutex) in the following scenario:
if _, ok = m[k]; !ok {
m[k] = newPlayer()
}
If your game is super popular and played by many players you will find that this approach doesn't scale but I would worry about it only if it becomes a problem.

Can goroutine like erlang spawn process across multiple hosts transparently?

It is said that if configured erlang with cookie setting, the erlang's process could be run across different machines, and this is transparent to the caller.
Is that possible for goroutine run like this?
This is not a feature of the language, no. However, since there's no way in the language to ask about goroutines (for example, to get a thread ID or control them from a different goroutine like in some other languages), as long as you could set up transparent communication mechanisms (for example, channels that work over the network), you could create a similar effect. In fact, Rob Pike, one of the creators of Go, has toyed around in the past with a package he called "netchan" to do exactly this, but couldn't get the semantics right, and so he hasn't published a finalized version yet. It's definitely something he's still interested in, though, and would certainly be consistent with the Go approach to abstraction.

How do Erlang actors differ from OOP objects?

Suppose I have an Erlang actor defined like this:
counter(Num) ->
receive
{From, increment} ->
From ! {self(), new_value, Num + 1}
counter(Num + 1);
end.
And similarly, I have a Ruby class defined like this:
class Counter
def initialize(num)
#num = num
end
def increment
#num += 1
end
end
The Erlang code is written in a functional style, using tail recursion to maintain state. However, what is the meaningful impact of this difference? To my naive eyes, the interfaces to these two things seem much the same: You send a message, the state gets updated, and you get back a representation of the new state.
Functional programming is so often described as being a totally different paradigm than OOP. But the Erlang actor seems to do exactly what objects are supposed to do: Maintain state, encapsulate, and provide a message-based interface.
In other words, when I am passing messages between Erlang actors, how is it different than when I'm passing messages between Ruby objects?
I suspect there are bigger consequences to the functional/OOP dichotomy than I'm seeing. Can anyone point them out?
Let's put aside the fact that the Erlang actor will be scheduled by the VM and thus may run concurrently with other code. I realize that this is a major difference between the Erlang and Ruby versions, but that's not what I'm getting at. Concurrency is possible in other languages, including Ruby. And while Erlang's concurrency may perform very differently (sometimes better), I'm not really asking about the performance differences.
Rather, I'm more interested in the functional-vs-OOP side of the question.
In other words, when I am passing messages between Erlang actors, how is it different than when I'm passing messages between Ruby objects?
The difference is that in traditional languages like Ruby there is no message passing but method call that is executed in the same thread and this may lead to synchronization problems if you have multithreaded application. All threads have access to each other thread memory.
In Erlang all actors are independent and the only way to change state of another actor is to send message. No process have access to internal state of any other process.
IMHO this is not the best example for FP vs OOP. Differences usually manifest in accessing/iterating and chaining methods/functions on objects. Also, probably, understanding what is "current state" works better in FP.
Here, you put two very different technologies against each other. One happen to be F, the other one OO.
The first difference I can spot right away is memory isolation. Messages are serialized in Erlang, so it is easier to avoid race conditions.
The second are memory management details. In Erlang message handling is divided underneath between Sender and Receiver. There are two sets of locks of process structure held by Erlang VM. Therefore, while Sender sends the message he acquires lock which is not blocking main process operations (accessed by MAIN lock). To sum up, it gives Erlang more soft real-time nature vs totally random behaviour on Ruby side.
Looking from the outside, actors resemble objects. They encapsulate state and communicate with the rest of the world via messages to manipulate that state.
To see how FP works, you must look inside an actor and see how it mutates state. Your example where the state is an integer is too simple. I don't have the time to provide full example, but I'll sketch the code. Normally, an actor loop looks like following:
loop(State) ->
Message = receive
...
end,
NewState = f(State, Message),
loop(NewState).
The most important difference from OOP is that there are no variable mutations i.e. NewState is obtained from the State and may share most of the data with it, but the State variable always remains the same.
This is a nice property, since we never corrupt current state. Function f will usually perform a series of transformation to turn State into NewState. And only if/when it completely succeeds we replace the old state with the new one by calling loop(NewState).
So the important benefit is consistency of our state.
The second benefit I found is cleaner code, but it takes some time getting used to it. Generally, since you cannot modify variable, you will have to divide your code in many very small functions. This is actually nice, because your code will be well factored.
Finally, since you cannot modify a variable, it is easier to reason about the code. With mutable objects you can never be sure whether some part of your object will be modified, and it gets progressively worse if using global variables. You should not encounter such problems when doing FP.
To try it out, you should try to manipulate some more complex data in a functional way by using pure erlang structures (not actors, ets, mnesia or proc dict). Alternatively, you might try it in ruby with this
Erlang includes the message passing approach of Alan Kay's OOP (Smalltalk) and the functional programming from Lisp.
What you describe in your example is the message approach for OOP. The Erlang processes sending messages are a concept similar to Alan Kay's objects sending messages. By the way, you can retrieve this concept implemtented also in Scratch where parallel running objects send messages between them.
The functional programming is how you code the processes. For instance, variables in Erlang cannot be modified. Once they have been set, you can only read them. You have also a list data structure which works pretty much like Lisp lists and you have fun which are insprired by Lisp's lambda.
The message passing on one side, and the functional on the other side are quite two separate things in Erlang. When coding real life erlang applications, you spend 98% of your time doing functional programming and 2% thinking about messages passing, which is mainly used for scalability and concurrency. To say it another way, when you come to tackly complex programming problem, you will probably use the FP side of Erlang to implement the details of the algo, and use the message passing for scalability, reliability, etc...
What do you think of this:
thing(0) ->
exit(this_is_the_end);
thing(Val) when is_integer(Val) ->
NewVal = receive
{From,F,Arg} -> NV = F(Val,Arg),
From ! {self(), new_value, NV},
NV;
_ -> Val div 2
after 10000
max(Val-1,0)
end,
thing(NewVal).
When you spawn the process, it will live by its own, decreasing its value until it reach the value 0 and send the message {'EXIT',this_is_the_end} to any process linked to it, unless you take care of executing something like:
ThingPid ! {self(),fun(X,_) -> X+1 end,[]}.
% which will increment the counter
or
ThingPid ! {self(),fun(X,X) -> 0; (X,_) -> X end,10}.
% which will do nothing, unless the internal value = 10 and in this case will go directly to 0 and exit
In this case you can see that the "object" lives its own live by itself in parallel with the rest of the application, that it can interact with the outside almost without any code, and that the outside can ask him to do things you didn't know when you wrote and compile the code.
This is a stupid code, but there are some principle that are used to implement application like mnesia transaction, the behaviors... IMHO the concept is really different, but you have to try to think different if you want to use it correctly. I am pretty sure that it is possible to write "OOPlike" code in Erlang, but it will be extremely difficult to avoid concurrency :o), and at the end no advantage. Have a look at OTP principle which gives some tracks about the application architecture in Erlang (supervision trees, pool of "1 single client servers", linked processes, monitored processes, and of course pattern matching single assignment, messages, node clusters ...).

Resources