Does loading a map element in Go imply copying? - go

Suppose there is a code snippet like this.
mapper := make(map[int]SomeStructType)
mapper[0] = SomeStructType{}
somestruct := mapper[0] // load mapper[0] to 'somestruct'
At the last line, does that mean mapper[0] is copied to somestruct in all situations, even if somestruct is only ever read afterward?
If so, is there any way to make a reference to a map element (mapper[0] here) like in C/C++, so that I can reference it through an alias while avoiding unnecessary object copy? I tried to make a pointer to a map element, but apparently, Go does not allow me to do so.

The simple answer is no: you cannot take a reference to (or the address of) a map element, so the assignment does copy the value. In Go, the map implementation may move data around, so references would get invalidated and it wouldn't be safe. In C you define your own data structure, so it's up to you how this is done; in Go, maps are implemented in the Go runtime, and it can't guess at your intent.
The solution you're looking for, I think, is to keep pointers in the map, i.e.:
mapper := make(map[int]*SomeStructType)
Now accessing mapper elements will just "copy" a pointer (typically a single word), which is very cheap.
somestruct := mapper[0] // copies a pointer
It's very common to use pointer types in Go, so you wouldn't be doing anything too magical or unusual by defining mapper like this.
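The difference between the two map declarations can be seen in a small sketch. Point is a hypothetical stand-in for SomeStructType, and the helper names are made up for illustration.

```go
package main

import "fmt"

// Point is a hypothetical stand-in for SomeStructType.
type Point struct{ X, Y int }

// byValue stores structs directly; reading an element yields a copy.
func byValue() (inMap, local int) {
	m := map[int]Point{0: {X: 1}}
	p := m[0] // copies the struct out of the map
	p.X = 99  // changes only the local copy
	return m[0].X, p.X
}

// byPointer stores pointers; reading an element copies only the pointer.
func byPointer() (inMap, local int) {
	m := map[int]*Point{0: &Point{X: 1}}
	p := m[0] // copies the pointer, not the struct
	p.X = 99  // changes the struct the map also points to
	return m[0].X, p.X
}

func main() {
	fmt.Println(byValue())   // 1 99
	fmt.Println(byPointer()) // 99 99
}
```

With struct values in the map, the usual way to change an element is to read it, modify the copy, and store it back under the same key.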

Related

Golang For Range Pointers

I'm new to Golang and I recently had the same problem as described in this question: Strange golang "append" behavior
So I'm wondering if it's basically never appropriate to use the object copy in the for range loop in anything outside of the scope of that loop– like passing it to a separate function, appending it (as described in the question), and so on.
Is it almost always more appropriate to access an object like this if you plan on mutating it, adding it to a list outside of the scope of that loop, and so on, because on the next loop the pointer you added will change?
for index := range myList {
	doSomething(&myList[index])
}
You mostly need to keep in mind that Go is pass by value. If you are perfectly OK using a copy, fully understanding that mutations won't be reflected in the original and that copies of huge structs can be costly, there are valid reasons for passing either a copy or a pointer.
If you need to make modifications, either pass a pointer as you show here, or store pointers in the slice to start with, e.g. []*MyStruct. That reflects a deliberate design choice: the spec specifies that a struct is passed by copying it.
Under the covers, a slice abstracts an array, but is merely a small struct, known as a slice header, holding a pointer to the array plus a length and a capacity. When you pass a slice around, the header is passed by value, but since it consists of just three words, it's not expensive to copy. This is the reason you have to reassign the variable when using append for the change to persist, e.g. s = append(s, value). You can pass a pointer to a slice around in your code, but doing so isn't idiomatic Go.
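A minimal sketch of the points above, using a hypothetical Item type: the range variable is a copy, indexing reaches the real element, and an append inside a function only changes that function's copy of the slice header.

```go
package main

import "fmt"

// Item is a hypothetical element type used only for this illustration.
type Item struct{ N int }

// bumpCopies modifies the per-iteration copy, so the slice is unchanged.
func bumpCopies(items []Item) {
	for _, it := range items {
		it.N++ // it is a copy of the element
	}
}

// bumpInPlace takes the address of each element through its index.
func bumpInPlace(items []Item) {
	for i := range items {
		p := &items[i]
		p.N++
	}
}

// appendLocal grows only its own copy of the slice header.
func appendLocal(items []Item) {
	items = append(items, Item{N: 100})
}

func main() {
	items := []Item{{1}, {2}}
	bumpCopies(items)
	fmt.Println(items) // [{1} {2}]
	bumpInPlace(items)
	fmt.Println(items) // [{2} {3}]
	appendLocal(items)
	fmt.Println(len(items)) // 2
	items = append(items, Item{N: 100})
	fmt.Println(len(items)) // 3
}
```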

By reference or value

If I had an instance of the following struct:
type Node struct {
	id      string
	name    string
	address string
	conn    net.Conn
	enc     json.Encoder
	dec     json.Decoder
	in      chan *Command
	out     chan *Command
	clients map[string]ClientNodesContainer
}
I am failing to understand when I should pass a struct by reference and when I should pass it by value (considering that I do not want to make any changes to that instance). Is there a rule of thumb that makes it easier to decide?
All I could find is to pass a struct by value when it's small or inexpensive to copy, but does small really mean smaller than a 64-bit address, for example?
I would be glad if someone could point out some more obvious rules.
The rule is very simple:
There is no concept of "pass/send by reference" in Go, all you can do is pass by value.
Regarding the question whether to pass the value of your struct or a pointer to your struct (this is not call by reference!):
If you want to modify the value inside the function or method: Pass a pointer.
If you do not want to modify the value:
    If your struct is large: Use pointer.
    Otherwise: It just doesn't matter.
All this thinking about how much a copy costs you is wasted time. Copies are cheap, even for medium sized structs. Passing a pointer might be a suitable optimization after profiling.
Your struct is not large. A large struct contains fields like wholeWorldBuf [1000000]uint64.
Tiny structs like yours might or might not benefit from passing a pointer and anybody who gives advice which one is better is lying: It all depends on your code and call patterns.
If you run out of sensible options and profiling shows that time is spent copying your structs: Experiment with pointers.
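One way to run such an experiment is sketched below. It assumes a hypothetical 4 KB struct named Big and uses testing.Benchmark from the standard library, which can be called from an ordinary program. The numbers depend entirely on the machine, the compiler version, and the call pattern, so treat them as a starting point for your own measurements rather than a verdict.

```go
package main

import (
	"fmt"
	"testing"
)

// Big is a hypothetical struct, large enough that copying might matter.
type Big struct{ buf [4096]byte }

// sumValue receives a full copy of the struct.
//
//go:noinline
func sumValue(b Big) int { return int(b.buf[0]) + int(b.buf[len(b.buf)-1]) }

// sumPointer receives only a pointer to the struct.
//
//go:noinline
func sumPointer(b *Big) int { return int(b.buf[0]) + int(b.buf[len(b.buf)-1]) }

// sink keeps the results alive so the calls are not optimized away.
var sink int

func main() {
	var b Big
	byValue := testing.Benchmark(func(tb *testing.B) {
		for i := 0; i < tb.N; i++ {
			sink += sumValue(b)
		}
	})
	byPointer := testing.Benchmark(func(tb *testing.B) {
		for i := 0; i < tb.N; i++ {
			sink += sumPointer(&b)
		}
	})
	fmt.Println("value:  ", byValue)
	fmt.Println("pointer:", byPointer)
}
```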
The principle of "usually pass values for small structs you don't intend to mutate" I agree with, but this struct, right now, is 688 bytes on x64, most of those in the embedded json structs. The breakdown is:
16*4=64 for the three strings (pointer/len pairs) and the net.Conn (an interface value)
208 for the embedded json.Encoder
392 for the embedded json.Decoder
8*3=24 for the three chan/map values (must be pointers)
Here's the code I used to get that, though you need to save it locally to run it because it uses unsafe.Sizeof.
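The measuring code itself is not reproduced here. A sketch of how such a measurement could be taken is below; it is not the original program. The command and clientNodes types are placeholders for the question's Command and ClientNodesContainer, and the reported sizes vary with the platform and the Go version, so they need not match the figures quoted above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"unsafe"
)

// command and clientNodes are placeholders for types the question does not show.
type command struct{}
type clientNodes struct{}

// node mirrors the field layout of the question's Node struct.
type node struct {
	id, name, address string
	conn              net.Conn
	enc               json.Encoder
	dec               json.Decoder
	in, out           chan *command
	clients           map[string]clientNodes
}

// wordSize is the size of one pointer on the current platform.
func wordSize() uintptr { return unsafe.Sizeof(uintptr(0)) }

// fieldSizes reports the in-struct size of representative fields.
func fieldSizes() map[string]uintptr {
	var n node
	return map[string]uintptr{
		"string":       unsafe.Sizeof(n.id),
		"net.Conn":     unsafe.Sizeof(n.conn),
		"json.Encoder": unsafe.Sizeof(n.enc),
		"json.Decoder": unsafe.Sizeof(n.dec),
		"chan":         unsafe.Sizeof(n.in),
		"map":          unsafe.Sizeof(n.clients),
		"whole node":   unsafe.Sizeof(n),
	}
}

func main() {
	for name, size := range fieldSizes() {
		fmt.Printf("%-12s %d bytes\n", name, size)
	}
}
```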
You could store *Encoder/*Decoder pointers instead of the values, leaving you at 104 bytes. I think it's reasonable to keep as-is and pass *Nodes around, though.
The Go code review comments on receiver type say "How large is large? Assume it's equivalent to passing all its elements as arguments to the method. If that feels too large, it's also too large for the receiver." There is room for differences of opinion here, but for me, nine values, some multiple words, "feels large" even before getting precise numbers.
In the "Pass Values" section the review comments doc says "This advice does not apply to large structs, or even small structs that might grow." It doesn't say that the same standard for "large" applies to normal parameters as to receivers, but I think that's a reasonable approach.
Part of the subtlety of determining largeness, alluded to in the code review doc, is that many things in Go are internally pointers or small, reference-containing structs: slices (but not fixed-size arrays), strings, interface values, function values, channels, maps. Your struct may own a []byte pointing to a 64KB buffer, or a big struct via an interface, but still be cheap to copy. I wrote some about this in another answer, but nothing I said there prevents one from having to make some less-than-scientific judgement calls.
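To make the slice-versus-array distinction concrete, here is a small sketch with two hypothetical structs that both "own" 64 KB: one through a []byte, one through a fixed-size array. Copying the first copies only the slice header and shares the buffer; copying the second copies every byte.

```go
package main

import (
	"fmt"
	"unsafe"
)

// withSlice and withArray are hypothetical types for this illustration.
type withSlice struct{ buf []byte }
type withArray struct{ buf [64 * 1024]byte }

// copySharesSlice reports whether a copy still sees writes made via the original.
func copySharesSlice() bool {
	a := withSlice{buf: make([]byte, 64*1024)}
	b := a // copies only the slice header
	a.buf[0] = 7
	return b.buf[0] == 7
}

// copySharesArray does the same for the array-backed struct.
func copySharesArray() bool {
	var a withArray
	b := a // copies all 64 KB
	a.buf[0] = 7
	return b.buf[0] == 7
}

// sizes returns the in-memory size of each struct value.
func sizes() (slice, array uintptr) {
	return unsafe.Sizeof(withSlice{}), unsafe.Sizeof(withArray{})
}

func main() {
	fmt.Println(copySharesSlice(), copySharesArray()) // true false
	fmt.Println(sizes())
}
```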
It's interesting to see what standard library methods do. bytes.Replace has ten words' worth of args, so that's at least sometimes OK to copy onto the stack. On the other hand go/ast tends to pass around references (either * pointers or interface values) even for non-mutating methods on its not-huge structs. Conclusion: it can be hard to reduce it to a couple simple rules. :)
Most of the time you should pass a pointer, for example by using a pointer receiver:
func (n *Node) exampleFunc() {
	...
}
The only situation where you would pass an instance by value is when you want to be sure that the original is safe from changes.

Move Semantics in Golang

This is from Bjarne Stroustrup's The C++ Programming Language, Fourth Edition 3.3.2.
We didn’t really want a copy; we just wanted to get the result out of
a function: we wanted to move a Vector rather than to copy it.
Fortunately, we can state that intent:
class Vector {
    // ...
    Vector(const Vector& a); // copy constructor
    Vector& operator=(const Vector& a); // copy assignment
    Vector(Vector&& a); // move constructor
    Vector& operator=(Vector&& a); // move assignment
};
Given that definition, the compiler will choose the move constructor
to implement the transfer of the return value out of the function.
This means that r=x+y+z will involve no copying of Vectors. Instead,
Vectors are just moved. As is typical, Vector’s move constructor is
trivial to define...
I know Golang supports traditional passing by value and passing by reference using Go style pointers.
Does Go support "move semantics" the way C++11 does, as described by Stroustrup above, to avoid the useless copying back and forth? If so, is this automatic, or does it require us to do something in our code to make it happen.
Note: A few answers have been posted - I have to digest them a bit, so I haven't accepted one yet - thanks.
The breakdown is as follows:
Everything in Go is passed by value.
But there are five built-in "reference types" which are passed by value as well but internally they hold references to separately maintained data structure: maps, slices, channels, strings and function values (there is no way to mutate the data the latter two reference).
Your own answer, @Vector, is incorrect in that nothing in Go is passed by reference. Rather, there are types with reference semantics. Values of them are still passed by value (sic!).
Your confusion supposedly stems from the fact that your mind is currently burdened by C++, Java, etc., while these things in Go are done mostly "as in C".
Take arrays and slices for instance. An array is passed by value in Go, but a slice is a packed struct containing a pointer (to an underlying array) and two platform-sized integers (the length and the capacity of the slice), and it's the value of this structure which is copied — a pointer and two integers — when it's assigned or returned etc. Should you copy a "bare" array, it would be copied literally — with all its elements.
The same applies to channels and maps. You can think of the types defining channels and maps as being declared something like this:
type Map struct {
	impl *mapImplementation
}
type Chan struct {
	impl *chanImplementation
}
(By the way, if you know C++, you should be aware that some C++ code uses this trick to lower exposure of the details into header files.)
So when you later have
m := make(map[int]string)
you could think of it as m having the type Map and so when you later do
x := m
the value of m gets copied, but it contains just a single pointer, and so both x and m now reference the same underlying data structure. Was m copied by reference ("move semantics")? Surely not! Do values of type map and slice and channel have reference semantics? Yes!
Note that these three types of this kind are not at all special: implementing your custom type by embedding in it a pointer to some complicated data structure is a rather common pattern.
In other words, Go allows the programmer to decide what semantics they want for their types. And Go happens to have five built-in types which have reference semantics already (while all the other built-in types have value semantics). Picking one semantics over the other does not affect the rule of copying everything by value in any way. For instance, it's fine to have pointers to values of any kind of type in Go, and assign them (so long as they have compatible types); these pointers will be copied by value.
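The pattern of giving your own type reference semantics can be sketched as follows. The registry type and its methods are hypothetical names chosen for this example; the point is only that copying a value that wraps a pointer behaves like copying a map.

```go
package main

import "fmt"

// registry is a hypothetical type with reference semantics: every copy of a
// registry value shares the same state through the impl pointer.
type registry struct{ impl *registryImpl }

type registryImpl struct{ names []string }

func newRegistry() registry { return registry{impl: &registryImpl{}} }

// add uses a value receiver, yet the change is visible through every copy.
func (r registry) add(name string) { r.impl.names = append(r.impl.names, name) }

func (r registry) size() int { return len(r.impl.names) }

// mapAlias copies a map value and writes through the copy.
func mapAlias() int {
	m := make(map[int]string)
	x := m // copies the map value; both refer to the same table
	x[1] = "one"
	return len(m)
}

// registryAlias does the same with the custom type.
func registryAlias() int {
	r := newRegistry()
	c := r // copies the struct, which holds a single pointer
	c.add("a")
	return r.size()
}

func main() {
	fmt.Println(mapAlias(), registryAlias()) // 1 1
}
```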
Another angle to look at this is that many Go packages (standard and 3rd-party) prefer to work with pointers to (complex) values. One example is os.Open() (which opens a file on a filesystem) returning a value of the type *os.File. That is, it returns a pointer and expects the calling code to pass this pointer around. Surely, the Go authors might have declared os.File to be a struct containing a single pointer, essentially making this value have reference semantics but they did not do that. I think the reason for this is that there's no special syntax to work with the values of this type so there's no reason to make them work as maps, channels and slices. KISS, in other words.
Recommended reading:
"Go Data Structures"
"Go Slices: Usage and Internals"
"Arrays, slices (and strings): The mechanics of 'append'"
A thread on golang-nuts; pay close attention to the reply by Rob Pike.
The Go Programming Language Specification
Calls
In a function call, the function value and arguments are evaluated in
the usual order. After they are evaluated, the parameters of the call
are passed by value to the function and the called function begins
execution. The return parameters of the function are passed by value
back to the calling function when the function returns.
In Go, everything is passed by value.
Rob Pike
In Go, everything is passed by value. Everything.
There are some types (pointers, channels, maps, slices) that have
reference-like properties, but in those cases the relevant data
structure (pointer, channel pointer, map header, slice header) holds a
pointer to an underlying, shared object (pointed-to thing, channel
descriptor, hash table, array); the data structure itself is passed by
value. Always.
Always.
-rob
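As a rough check of what "passed by value" means for a return value, the sketch below (with made-up function names) returns a large slice from a function and compares the address of its first element inside and outside the function. Only the slice header is copied on return; the backing array is not.

```go
package main

import "fmt"

// makeBig builds a slice and returns it together with the address of its
// first element as seen inside the function.
func makeBig(n int) ([]int, *int) {
	s := make([]int, n)
	return s, &s[0]
}

// sameBacking reports whether the caller's slice still points at the array
// that was allocated inside makeBig.
func sameBacking() bool {
	s, first := makeBig(1000000)
	return &s[0] == first
}

func main() {
	fmt.Println(sameBacking()) // true
}
```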
It is my understanding that Go, as well as Java and C# never had the excessive copying costs of C++, but do not solve ownership transference to containers. Therefore there is still copying involved. As C++ becomes more of a value-semantics language, with references/pointers being relegated to i) smart-pointer managed objects inside classes and ii) dependence references, move semantics solves the problem of excessive copying. Note that this has nothing to do with "pass by value", nowadays everyone passes objects by Reference (&) or Const Reference (const &) in C++.
Let's look at this (1) :
BigObject BO(big,stuff,inside);
vector<BigObject> vo;
vo.reserve(1000000);
vo.push_back(BO);
Or (2)
vector<BigObject> vo;
vo.reserve(1000000);
vo.push_back(BigObject(big,stuff,inside));
Although you're passing by reference to the vector vo, in C++03 there was a copy inside the vector code.
In the second case, there is a temporary object that has to be constructed and then is copied inside the vector. Since it can only be accessed by the vector, that is a wasteful copy.
However, in the first case, our intent could be just to give control of BO to the vector itself. C++11 and later allow this:
(1, C++11 and later)
vector<BigObject> vo;
vo.reserve(1000000);
vo.emplace_back(big,stuff,inside); // constructs the element in place
Or (2, C++11 and later)
vector<BigObject> vo;
vo.reserve(1000000);
vo.push_back(BigObject(big,stuff,inside)); // the temporary is moved if BigObject is movable
From what I've read, it is not clear that Java, C# or Go are exempt from the same copy duplication that C++03 suffered from in the case of containers.
The old-fashioned COW (copy-on-write) technique, also had the same problems, since the resources will be copied as soon as the object inside the vector is duplicated.
Stroustrup is talking about C++, which allows you to pass containers, etc by value - so the excessive copying becomes an issue.
In Go (as in Delphi, Java, etc.), when you pass a built-in container type such as a map or slice, the value you pass refers to shared underlying data, so it's a non-issue. Regardless, you don't have to deal with it or worry about it in Go; the compiler just does what it needs to do, and from what I've seen thus far, it's doing it right.
Thanks to @KerrekSB for putting me on the right track.
@KerrekSB, I hope this is the right answer. If it's wrong, you bear no responsibility. :)

Purely functional equivalent of weakhashmap?

Weak hash tables like Java's weak hash map use weak references to track the collection of unreachable keys by the garbage collector and remove bindings with that key from the collection. Weak hash tables are typically used to implement indirections from one vertex or edge in a graph to another because they allow the garbage collector to collect unreachable portions of the graph.
Is there a purely functional equivalent of this data structure? If not, how might one be created?
This seems like an interesting challenge. The internal implementation cannot be pure because it must collect (i.e. mutate) the data structure in order to remove unreachable parts but I believe it could present a pure interface to the user, who could never observe the impurities because they only affect portions of the data structure that the user can, by definition, no longer reach.
That's an interesting concept. One major complication in a "purely functional" setting would be that object identity is not normally observable in a "purely functional" sense. I.e., if I copy an object or create a new identical one, in Java it's expected that the clone is not the original. But in a functional setting, it is expected that the new one be semantically identical to the old one, even though the garbage collector will treat it differently.
So, if we allow object identity to be a part of the semantics, it would be sound, otherwise probably not. In the latter case, even if a hack could be found (I thought of one, described below), you're likely to have the language implementation fighting you all over the place because it's going to do all sorts of things to exploit the fact that object identity is not supposed to be observable.
One 'hack' that popped into my mind would be to use unique-by-construction values as keys, so that for the most part value equality will coincide with reference equality. For example, I have a library I use personally in Haskell with the following in its interface:
data Uniq s
getUniq :: IO (Uniq RealWorld)
instance Eq (Uniq s)
instance Ord (Uniq s)
A hash map like you describe would probably mostly-work with these as key, but even here I can think of a way it might break: Suppose a user stores a key in a strict field of some data structure, with the compiler's "unbox-strict-fields" optimization enabled. If 'Uniq' is just a newtype wrapper to a machine integer, there may no longer be any object to which the GC can point and say "that's the key"; so when the user goes and unpacks his key to use it, the map may have forgotten about it already. (Edit: This particular example can obviously be worked around; make Uniq's implementation be something that can't be unboxed like that; the point is just that it's tricky precisely because the compiler is trying to be helpful in a lot of ways we might not expect)
TL;DR: I wouldn't say it can't be done, but I suspect that in many cases "optimizations" will either break or be broken by a weak hash map implementation, unless object identity is given first-class observable status.
Purely functional data-structures can't change from the user perspective. So, if I get a key from a hash-map, wait, and then get the same key again, I have to get the same value. I can hold onto keys, so they can't disappear.
The only way it could work is if the API gives me the next generation and the values aren't collected until all references to the past versions of the container are released. Users of the data-structure are expected to periodically ask for new generations to release weakly held values.
EDIT (based on comment): I understand the behavior you want, but you can't pass this test with a map that releases objects:
FunctionalWeakHashMap map = new FunctionalWeakHashMap();
{ // make scope to make o have no references
    Object o = new SomeObject();
    map["key"] = o;
} // at this point I lose all references to o, and the reference is weak
// wait as much time as you think it takes for that weak reference to collect,
// force it, etc
Assert.isNotNull(map["key"]); // this must be true or map is not persistent
I am suggesting that this test could pass
FunctionalWeakHashMap map = new FunctionalWeakHashMap();
{ // make scope to make o have no references
    Object o = new SomeObject();
    map["key"] = o;
} // at this point I lose all references to o, and the reference is weak in the map
// wait as much time as you think it takes for that weak reference to collect,
// force it, etc
map = map.nextGen();
Assert.isNull(map["key"]);
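The generation idea can be sketched in Go under a strong simplifying assumption: instead of real weak references, a caller-supplied reachable callback stands in for what the garbage collector would know. All names here (genMap, insert, lookup, nextGen) are hypothetical. Each generation is never mutated, so an old generation keeps answering lookups the same way, and only the new generation drops entries.

```go
package main

import "fmt"

// genMap is a hypothetical persistent map: no method mutates the receiver.
type genMap struct{ entries map[string]string }

// insert returns a new map containing the old entries plus k -> v.
func (g genMap) insert(k, v string) genMap {
	next := make(map[string]string, len(g.entries)+1)
	for key, val := range g.entries {
		next[key] = val
	}
	next[k] = v
	return genMap{entries: next}
}

// lookup reads from this generation only.
func (g genMap) lookup(k string) (string, bool) {
	v, ok := g.entries[k]
	return v, ok
}

// nextGen returns a new generation without the entries that reachable rejects.
// The callback is an assumption standing in for the garbage collector.
func (g genMap) nextGen(reachable func(key string) bool) genMap {
	next := make(map[string]string)
	for key, val := range g.entries {
		if reachable(key) {
			next[key] = val
		}
	}
	return genMap{entries: next}
}

func main() {
	g1 := genMap{}.insert("key", "value")
	g2 := g1.nextGen(func(string) bool { return false })
	_, inOld := g1.lookup("key")
	_, inNew := g2.lookup("key")
	fmt.Println(inOld, inNew) // true false
}
```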

Getting a reflect.Type from a name

If I have the name of a type (e.g. "container/vector"), is there a way to look up the reflect.Type that has the given name? I'm trying to write a simple database-backed workqueue system, and it would be very difficult without this feature.
I can't see how this would be possible in any trivial way (or at all), since name resolution is part of the compiler/linker, not the runtime.
However, http://github.com/nsf/gocode might offer up some ideas. Though I'm pretty sure that works by processing the .a files in $GOROOT, so I still don't see how you'd get the reflect.Type. Maybe if the exp/eval package was more mature?
Of course if you know all the possible types you'll encounter, you could always make a map of the reflect.Type. But I'm assuming you're working with unpredictable input, or you would've thought of that.
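The only way to create a reflect.Type is by having a concrete value of the intended type first. When this answer was written you couldn't even create composite types, such as a slice ([]T), from a base type (T); newer versions of the reflect package provide constructors such as reflect.SliceOf for that.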
The only way to go from a string to a reflect.Type is by entering the mapping yourself.
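mapping := map[string]reflect.Type{
	"string": reflect.TypeOf(""), // reflect.Typeof was later renamed to reflect.TypeOf
	// container/vector has since been removed from the standard library
	"container/vector": reflect.TypeOf(new(vector.Vector)),
	/* ... */
}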
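For a work queue, a hand-written mapping like this is usually paired with reflect.New to build a fresh value from the stored name. The sketch below assumes a hypothetical job type and helper name; the keys are arbitrary labels you choose, not names the runtime resolves for you.

```go
package main

import (
	"fmt"
	"reflect"
)

// job is a hypothetical payload type for a work queue.
type job struct{ ID int }

// typesByName is the hand-maintained mapping from labels to types.
var typesByName = map[string]reflect.Type{
	"string": reflect.TypeOf(""),
	"job":    reflect.TypeOf(job{}),
}

// newByName returns a pointer to a fresh zero value of the named type.
func newByName(name string) (interface{}, bool) {
	t, ok := typesByName[name]
	if !ok {
		return nil, false
	}
	return reflect.New(t).Interface(), true
}

func main() {
	v, ok := newByName("job")
	fmt.Printf("%T %v\n", v, ok) // *main.job true
}
```

The returned pointer can then be handed to a decoder such as json.Unmarshal to fill in the fields.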

Resources