GobEncoder for Passing Anonymous Function via RPC - go

I'm trying to build a system that will execute a function on multiple machines, passing the function anonymously via RPC to each worker machine (a la MapReduce) to execute on some subset of data. Gob doesn't support encoding functions, though the docs for GobEncoder say that "A type that implements GobEncoder and GobDecoder has complete control over the representation of its data and may therefore contain things such as private fields, channels, and functions, which are not usually transmissible in gob streams" so it seems possible.
Any examples of how this might work? I don't know much about how this encoding/decoding should be done with Gob.

IMHO this won't work. While it is true that a type implementing Gob{En,De}coder can (de)serialize unexported fields of structs, it is still impossible to (de)serialize code: Go is statically compiled and linked, without runtime code generation capabilities (which would circumvent compile-time type safety).
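For instance, a quick sketch showing gob refusing a function value at the top level (the exact error text may vary between Go versions):

    // Sketch: encoding/gob documents that functions and channels cannot be sent
    // in a gob, and encoding such a value at the top level fails with an error.
    package main

    import (
        "bytes"
        "encoding/gob"
        "fmt"
    )

    func main() {
        var buf bytes.Buffer
        double := func(x int) int { return x * 2 }
        err := gob.NewEncoder(&buf).Encode(double)
        fmt.Println(err) // non-nil: gob cannot represent values of func type
    }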
In short: you cannot serialize functions, only data. Your workers must provide the functions you want to execute. Take a look at net/rpc.
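Since only data can go over the wire, one common pattern is to compile the candidate functions into the worker binary and send just a function name plus its input. A minimal, hypothetical sketch with the standard net/rpc package (the names Worker, TaskArgs and lineCount are made up for illustration):

    // Hypothetical worker sketch using the standard net/rpc package. Only the
    // name of a function (plus its input data) is sent over the wire; the
    // functions themselves must already be compiled into the worker binary.
    package main

    import (
        "errors"
        "net"
        "net/rpc"
    )

    // TaskArgs names a function known to the worker and carries its input.
    type TaskArgs struct {
        FuncName string
        Data     []string
    }

    // tasks is the worker-side registry of functions it is willing to run.
    var tasks = map[string]func([]string) int{
        "lineCount": func(lines []string) int { return len(lines) },
    }

    type Worker struct{}

    // Run looks up the requested function by name and applies it to the data.
    func (w *Worker) Run(args TaskArgs, reply *int) error {
        f, ok := tasks[args.FuncName]
        if !ok {
            return errors.New("unknown task: " + args.FuncName)
        }
        *reply = f(args.Data)
        return nil
    }

    func main() {
        rpc.Register(new(Worker))
        ln, err := net.Listen("tcp", ":9876")
        if err != nil {
            panic(err)
        }
        rpc.Accept(ln) // serve coordinator requests forever
    }

A coordinator would then rpc.Dial("tcp", workerAddr) and issue client.Call("Worker.Run", TaskArgs{FuncName: "lineCount", Data: chunk}, &reply); only the name and the data travel, never the code itself.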

You may want to try GoCircuit, which provides a framework that basically lets you do this:
http://www.gocircuit.org/
It works by copying your binary to the remote machine(s), starting it, then doing an RPC that effectively says "execute function X with args A, B, ..."

Related

is concurrent write on stdout threadsafe?

The code below does not trigger a data race:
package main

import (
    "fmt"
    "os"
    "strings"
)

func main() {
    x := strings.Repeat(" ", 1024)
    go func() {
        for {
            fmt.Fprintf(os.Stdout, x+"aa\n")
        }
    }()
    go func() {
        for {
            fmt.Fprintf(os.Stdout, x+"bb\n")
        }
    }()
    go func() {
        for {
            fmt.Fprintf(os.Stdout, x+"cc\n")
        }
    }()
    go func() {
        for {
            fmt.Fprintf(os.Stdout, x+"dd\n")
        }
    }()
    <-make(chan bool)
}
I tried multiple lengths of data, with a variant at https://play.golang.org/p/29Cnwqj5K30
This post says it is not thread-safe.
This mail does not really answer the question, or I did not understand it.
The package documentation of os and fmt doesn't mention much about this. I admit I did not dig into the source code of those two packages to find further explanations; they appear too complex to me.
What are the recommendations, and what are their references?
I'm not sure this would qualify as a definitive answer, but I'll try to provide some insight.
The F*-functions of the fmt package merely state that they take a value of a type implementing the io.Writer interface and call Write on it.
The functions themselves are safe for concurrent use, in the sense that it's OK to call any number of the fmt.Fprint* functions concurrently: the package itself is prepared for that.
But when it comes to concurrently writing to the same value of a type implementing io.Writer, the question becomes more complex, because implementing an interface in Go says nothing about the real type's behaviour with respect to concurrency.
In other words, the real point where concurrency may or may not be allowed is deferred to the "writer" which the functions of fmt write to.
(One should also keep in mind that the fmt.*Print* functions are allowed to call Write on their destination any number of times during a single invocation, in a row, as opposed to those provided by the stock package log.)
So, we basically have two cases:
Custom implementations of io.Writer.
Stock implementations of it, such as *os.File or the wrappers around sockets produced by the functions of the net package.
The first case is the simple one: the behaviour is whatever the implementor made it.
The second case is harder: as I understand it, the Go standard library's stance on this (albeit not clearly stated in the docs) is that the wrappers it provides around "things" provided by the OS, such as file descriptors and sockets, are reasonably "thin", and hence whatever semantics they implement is transitively implemented by the stdlib code running on a particular system.
For instance, POSIX requires that write(2) calls are atomic with regard to one another when they operate on regular files or symbolic links. This means that, since any call to Write on something wrapping a file descriptor or socket actually results in a single "write" syscall on the target system, you can consult the docs of the target OS to get an idea of what will happen.
Note that POSIX only talks about filesystem objects, and if os.Stdout is opened to a terminal (or pseudo-terminal), a pipe, or anything else which supports the write(2) syscall, the results will depend on what the relevant subsystem and/or driver implements: for instance, data from multiple concurrent calls may be interspersed, or one of the calls, or both, may simply be failed by the OS (unlikely, but still possible).
Going back to Go, from what I gather, the following facts hold true about the Go stdlib types which wrap file descriptors and sockets:
They are safe for concurrent use by themselves (I mean, on the Go level).
They "map" Write and Read calls 1-to-1 to the underlying object—that is, a Write call is never split into two or more underlying syscalls, and a Read call never returns data "glued" from the results of multiple underlying syscalls.
(By the way, people occasionally get tripped up by this no-frills behaviour; see this or this for examples.)
So, when we combine this with the fact that the fmt.*Print* functions are free to call Write any number of times per single invocation, your example, which uses os.Stdout, will:
Never result in a data race (unless you have assigned the variable os.Stdout some custom implementation), but
The data actually written to the underlying FD will be intermixed in an unpredictable order which may depend on many factors, including the OS kernel version and settings, the version of Go used to build the program, the hardware, and the load on the system.
TL;DR
Multiple concurrent calls to fmt.Fprint* writing to the same "writer" value defer their concurrency to the implementation (type) of the "writer".
It's impossible to have a data race with "file-like" objects provided by the Go stdlib in the setup you have presented in your question.
The real problem will be not with data races on the Go program level but with concurrent access to a single resource happening on the level of the OS. And there, we do not (usually) speak about data races, because the commodity OSes Go supports expose the things one may "write to" as abstractions, where a real data race would likely indicate a bug in the kernel or in a driver (and Go's race detector would not be able to detect it anyway, as that memory is not owned by the Go runtime powering the process).
Basically, in your case, if you need to be sure that the data produced by any particular call to fmt.Fprint* reaches the actual data receiver provided by the OS as a single contiguous piece, you need to serialize those calls, as the fmt package provides no guarantees regarding the number of Write calls it makes on the supplied "writer".
The serialization may either be external (explicit, that is, "take a lock, call fmt.Fprint*, release the lock") or internal (wrap os.Stdout in a custom type which manages a lock, and use that instead).
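A minimal sketch of the external variant wrapped in a helper (the names stdoutMu and printfSerialized are made up for illustration):

    // Minimal sketch (names are made up): serialize whole fmt.Fprintf calls by
    // taking a lock around each one, so every message reaches os.Stdout as one
    // contiguous piece regardless of how many Write calls fmt decides to make.
    package main

    import (
        "fmt"
        "os"
        "sync"
    )

    var stdoutMu sync.Mutex

    func printfSerialized(format string, args ...interface{}) {
        stdoutMu.Lock()
        defer stdoutMu.Unlock()
        fmt.Fprintf(os.Stdout, format, args...)
    }

    func main() {
        var wg sync.WaitGroup
        for i := 0; i < 4; i++ {
            wg.Add(1)
            go func(id int) {
                defer wg.Done()
                printfSerialized("goroutine %d says hello\n", id)
            }(i)
        }
        wg.Wait()
    }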
And while we're at it, the log package does just that, and can be used straight away, as the "loggers" it provides, including the default one, allow you to suppress the output of "log headers" (such as the timestamp and the name of the file).
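For instance, a tiny sketch of that approach (assuming nothing beyond the stock log and os packages):

    // Sketch: the stock log package takes a lock around each message and emits
    // it with a single Write; a flags value of 0 suppresses the "log headers".
    package main

    import (
        "log"
        "os"
    )

    func main() {
        logger := log.New(os.Stdout, "", 0)
        done := make(chan struct{})
        go func() {
            logger.Println("from another goroutine")
            close(done)
        }()
        logger.Println("from main")
        <-done
    }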

Confused about performance implications of Sync

I have a question about the marker trait Sync after reading Extensible Concurrency with the Sync and Send Traits.
Java's "synchronize" means blocking, so I was very confused about how a Rust struct with Sync implemented whose method is executed on multiple threads would be effective.
I searched but found no meaningful answer. I'm thinking about it this way: every thread will get the struct's reference synchronously (blocking), but call the method in parallel, is that true?
Java: Accesses to this object from multiple threads become a synchronized sequence of actions when going through this codepath.
Rust: It is safe to access this type synchronously through a reference from multiple threads.
(The two points above are not canonical definitions; they are just demonstrations of how similar words can be used in sentences to obtain different meanings.)
synchronized is implemented as a mutual-exclusion lock at runtime. Sync is a compile-time promise about the runtime properties of a specific type, which allows other types to depend on those properties through trait bounds. A Mutex just happens to be one way to provide Sync behavior; immutable types usually provide it too, without any runtime cost.
Generally you shouldn't rely on words having exactly the same meaning in different contexts. Java IO stream != java collection stream != RxJava reactive stream ~= tokio Stream. C volatile != java volatile. etc. etc.
Ultimately the prose matters a lot more than the keywords, which are just shorthands.

Explain/Give example of "Hide pointer operations" in Code Complete 2

I am reading Code Complete 2, Chapter 7.1, and I don't understand the point the author makes below.
7.1 Valid Reasons to Create a Routine
Hide pointer operations
Pointer operations tend to be hard to read and error prone. By isolating them in routines (or a class, if appropriate), you can concentrate on the intent of the operation rather than the mechanics of pointer manipulation. Also, if the operations are done in only one place, you can be more certain that the code is correct. If you find a better data type than pointers, you can change the program without traumatizing the routines that would have used the pointers.
Please explain or give an example of this point.
Essentially, the advice is a specific example of data hiding. It boils down to this:
Stick to object-oriented design and hide your data within objects.
In the case of pointers, the norm is to NEVER expose pointers to "internal" data structures as public members. Rather, make them private and expose ONLY certain meaningful manipulations that are allowed to be performed on them as public member functions.
Portable / Easy to maintain
The added advantage (as explained in the section quoted) is that a change to the internal data structures never forces the external API to change. Only the internal implementation of the publicly exposed member functions needs to be modified to handle any changes.
Code re-use / Easy to debug
Also, pointer manipulations are now NOT copy-pasted and littered all around the code with no idea of what exactly they do. They are limited to the member functions, which are written with full knowledge of how exactly the internal data structures are manipulated.
For example, if we have a table of data which the user is allowed to add rows to,
Do NOT expose
pointers to the head/tail of the table,
pointers to the individual elements.
Instead, create a table object that exposes the functions
addNewRowTop(newData)
addNewRowBottom(newData)
addNewRow(position, newData)
To take this further, we implement addNewRowTop() and addNewRowBottom() by simply calling addNewRow() with the proper position, which is derived from internal state of the table object (a rough sketch follows below).
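Here is a rough sketch of such a table object, written in Go since the book's advice is language-agnostic; all names are illustrative, and Go's exported-name capitalization is used:

    // Illustrative sketch: the slice (and any pointer arithmetic a C version
    // would need) stays private; callers only see row-level operations.
    package main

    import "fmt"

    type Row struct {
        Cells []string
    }

    type Table struct {
        rows []Row // internal representation; never exposed directly
    }

    // AddNewRow inserts newData at the given position, clamping out-of-range values.
    func (t *Table) AddNewRow(position int, newData Row) {
        if position < 0 {
            position = 0
        }
        if position > len(t.rows) {
            position = len(t.rows)
        }
        t.rows = append(t.rows, Row{})
        copy(t.rows[position+1:], t.rows[position:])
        t.rows[position] = newData
    }

    // AddNewRowTop and AddNewRowBottom are thin wrappers over AddNewRow, so the
    // insertion logic lives in exactly one place.
    func (t *Table) AddNewRowTop(newData Row)    { t.AddNewRow(0, newData) }
    func (t *Table) AddNewRowBottom(newData Row) { t.AddNewRow(len(t.rows), newData) }

    func main() {
        var t Table
        t.AddNewRowBottom(Row{Cells: []string{"b"}})
        t.AddNewRowTop(Row{Cells: []string{"a"}})
        fmt.Println(t.rows) // [{[a]} {[b]}]
    }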

How to properly use Golang packages in the standard library or third-party with Goroutines?

Hi Golang programmers,
First of all, I apologize if my question is not very clear initially, but I'm trying to understand the proper usage pattern when writing Golang code that uses goroutines on top of the standard library or other libraries.
Let me elaborate: suppose I import some package that I didn't have a hand in writing and that I want to utilize. Let's say this package does a simple HTTP GET request to a website such as Flickr, for example. If I want a concurrent request, I can just prefix the function call with the go keyword. But how do I know that this package doesn't already do some internal go calls itself when doing the request, thereby making my go calls redundant?
Do Golang packages typically say in the documentation that their method is "greened"? Or perhaps they provide two versions of a method, one that is green and one that is plain synchronous?
In my quest to understand Go idioms and usage patterns, I feel that even when using packages in the standard library I can't be sure whether my go statements are necessary. I suppose I could profile the calls or write test code, but it feels odd to have to figure out whether a func is already "green".
I suppose another possibility is that it's up to me to study the source code of whatever I'm using and understand how it should be used and if the go keyword is necessary.
If anybody can shed some light on this or point me to the right documentation, or even a Golang screencast, I'd much appreciate it. I think Rob Pike briefly mentions in one talk that a good client API written in Go is just written in a typical synchronous manner, and it's up to the caller of that API to choose whether to make it green or not.
Thanks for your time,
-Ralph
If a function or method returns some value(s), or has a side effect like that of io.Reader.Read, then it's necessarily a synchronous thing. Unless documented otherwise, no safety for concurrent use by multiple goroutines should be assumed.
If it accepts a closure (callback) or a channel, or if it returns a channel, then it is often an asynchronous thing. If that's the case, it's normally either obvious or explicitly documented. Asynchronous stuff like this is usually safe for concurrent use by multiple goroutines.
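As a sketch of the synchronous-API-plus-caller-side-concurrency pattern described above (the URL and the result type are placeholders, not part of any particular library's API):

    // Sketch: http.Get is synchronous; the *caller* adds concurrency by wrapping
    // it in a goroutine and delivering the result over a channel.
    package main

    import (
        "fmt"
        "net/http"
    )

    type result struct {
        status string
        err    error
    }

    func main() {
        ch := make(chan result, 1)
        go func() {
            resp, err := http.Get("https://example.com/")
            if err != nil {
                ch <- result{err: err}
                return
            }
            resp.Body.Close()
            ch <- result{status: resp.Status}
        }()

        // ... do other work here while the request is in flight ...

        r := <-ch
        if r.err != nil {
            fmt.Println("request failed:", r.err)
            return
        }
        fmt.Println("got:", r.status)
    }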

What differences are allowed between IDL of CORBA server and client?

What I think I know so far is that the CORBA specification as such doesn't allow any differences between the IDL the server program uses and the IDL the client program uses.
However, in practice, certain differences are bound to work (pretty) universally, because the underlying communication mechanism is very probably GIOP (at least IIOP) and some differences are bound to not be detectable via IIOP.
What I'd like to establish is which differences between the server and client IDL are universally allowed between arbitrary ORBs, as long as GIOP/IIOP is used.
For example, so far I assume it works to:
Add any type/interface to the server IDL, as long as the types the client IDL knows about aren't touched and no unknown new types are sent back to the client.
Add a method to an existing interface on the server side; the client should be able to continue to call objects with this interface, even though its IDL doesn't list said method. (This seems to be answered with yes here.)
Add a member to the end of an enum, as long as the client never sees this new value.
Add a member to a union, as long as the client never sees this Union type with the discriminator set to the new value.
My aim is to get to something like a short list of things one can do to an existing IDL to extend "the server" with new features without having to recompile existing clients against the modified IDL.
Yes, the server and client method sets need not match completely, as methods are accessed by name (the operation field in the GIOP message) and independently of one another. In other words, a GIOP call includes the method name as a string, and the parameters are then encoded as that operation expects them. See the example of the CORBA tie and CORBA stub.
Yes, if you create and export a new interface, it is just a new interface. It can be bound to any name service independently of the others, and clients unaware of this new interface simply will not be able to use it. They will still be able to use the known types bound to the same name service.
Yes, GIOP writes enums as unsigned longs: the first value is always encoded as zero, and successive identifiers take ascending numeric values in order of declaration from left to right. Hence it is perfectly safe to append new enum identifiers, but not to remove or reorder them.
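To illustrate why appending is safe, here is a simplified Go sketch of positional encoding (illustrative only, not a real GIOP marshaller; the enum names are made up):

    // Simplified illustration: enum identifiers are encoded by declaration
    // position, so appending a new identifier leaves every existing value's
    // wire encoding unchanged, while removing or reordering would not.
    package main

    import "fmt"

    var colorV1 = []string{"RED", "GREEN", "BLUE"}
    var colorV2 = []string{"RED", "GREEN", "BLUE", "PURPLE"} // appended only

    func encode(idents []string, name string) uint32 {
        for i, id := range idents {
            if id == name {
                return uint32(i)
            }
        }
        panic("unknown identifier")
    }

    func main() {
        // A v2 server encodes GREEN exactly as a v1 client expects it.
        fmt.Println(encode(colorV1, "GREEN"), encode(colorV2, "GREEN")) // 1 1
    }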
Reading the GIOP specification helps a lot. It is also very instructive to look at the code generated by the IDL compiler and at how it changes when something is changed in the IDL.
Surely it is not good practice to use mismatching IDLs merely out of a lack of care, as it is also easy to introduce incompatible changes. This probably only makes sense in cases where you can no longer reach and update the client because it has already been released to the user.
