By reference or value - go

If I had an instance of the following struct:
type Node struct {
	id      string
	name    string
	address string
	conn    net.Conn
	enc     json.Encoder
	dec     json.Decoder
	in      chan *Command
	out     chan *Command
	clients map[string]ClientNodesContainer
}
I am failing to understand when I should pass a struct by reference and when I should pass it by value (assuming that I do not want to make any changes to that instance). Is there a rule of thumb that makes it easier to decide?
All I could find is "pass a struct by value when it's small or inexpensive to copy", but does "small" really mean, for example, smaller than a 64-bit address?
I would be glad if someone could point out some more obvious rules.

The rule is very simple:
There is no concept of "pass/send by reference" in Go, all you can do is pass by value.
Regarding the question whether to pass the value of your struct or a pointer to your struct (this is not call by reference!):
If you want to modify the value inside the function or method: Pass a pointer.
If you do not want to modify the value:
    If your struct is large: Use a pointer.
    Otherwise: It just doesn't matter.
All this thinking about how much a copy costs you is wasted time. Copies are cheap, even for medium sized structs. Passing a pointer might be a suitable optimization after profiling.
Your struct is not large. A large struct contains fields like wholeWorldBuf [1000000]uint64.
Tiny structs like yours might or might not benefit from passing a pointer and anybody who gives advice which one is better is lying: It all depends on your code and call patterns.
If you run out of sensible options and profiling shows that time is spent copying your structs: Experiment with pointers.
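The difference between the two options can be sketched as follows. This is only an illustration: the Node here is a trimmed-down, hypothetical version of the struct in the question, and renameCopy and renameInPlace are made-up names.

```go
package main

import "fmt"

// Node is a trimmed-down, hypothetical version of the struct in the question.
type Node struct {
	id   string
	name string
}

// renameCopy receives a copy of the Node; the caller's value is untouched.
func renameCopy(n Node, name string) Node {
	n.name = name
	return n
}

// renameInPlace receives a copy of the pointer; both copies point at the same Node.
func renameInPlace(n *Node, name string) {
	n.name = name
}

func main() {
	n := Node{id: "1", name: "old"}

	renamed := renameCopy(n, "copy")
	fmt.Println(n.name, renamed.name) // old copy

	renameInPlace(&n, "new")
	fmt.Println(n.name) // new
}
```

In both calls the argument is passed by value; in the second case the value that gets copied happens to be a pointer.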

The principle of "usually pass values for small structs you don't intend to mutate" I agree with, but this struct, right now, is 688 bytes on x64, most of those in the embedded json structs. The breakdown is:
16*4=64 for the three strings (pointer/len pairs) and the net.Conn (an interface value)
208 for the embedded json.Encoder
392 for the embedded json.Decoder
8*3=24 for the three chan/map values (must be pointers)
Here's the code I used to get that, though you need to save it locally to run it because it uses unsafe.Sizeof.
You could store *json.Encoder/*json.Decoder pointers instead of the struct values, leaving you at 104 bytes. I think it's reasonable to keep the struct as-is and pass *Node values around, though.
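A measurement along these lines can be sketched with unsafe.Sizeof. This is not the original linked code: Command and ClientNodesContainer are empty placeholders (the question does not show them), SlimNode is a hypothetical name for the pointer-based variant, and the size of Node depends on the Go version and architecture because json.Encoder and json.Decoder are free to change.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"unsafe"
)

// Placeholders: the question does not show these types.
type Command struct{}
type ClientNodesContainer struct{}

// Node mirrors the struct from the question.
type Node struct {
	id      string
	name    string
	address string
	conn    net.Conn
	enc     json.Encoder
	dec     json.Decoder
	in      chan *Command
	out     chan *Command
	clients map[string]ClientNodesContainer
}

// SlimNode (hypothetical) holds the encoder and decoder by pointer.
type SlimNode struct {
	id      string
	name    string
	address string
	conn    net.Conn
	enc     *json.Encoder
	dec     *json.Decoder
	in      chan *Command
	out     chan *Command
	clients map[string]ClientNodesContainer
}

// wordSize is the size of a pointer on the current platform.
const wordSize = unsafe.Sizeof(uintptr(0))

func nodeSize() uintptr     { return unsafe.Sizeof(Node{}) }
func slimNodeSize() uintptr { return unsafe.Sizeof(SlimNode{}) }

func main() {
	// Node's size varies with the encoding/json implementation.
	fmt.Println("Node:    ", nodeSize(), "bytes")
	// SlimNode is 13 words: 3 strings (2 each), 1 interface (2), 5 pointer-sized fields.
	fmt.Println("SlimNode:", slimNodeSize(), "bytes =", slimNodeSize()/wordSize, "words")
}
```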
The Go code review comments on receiver type say "How large is large? Assume it's equivalent to passing all its elements as arguments to the method. If that feels too large, it's also too large for the receiver." There is room for differences of opinion here, but for me, nine values, some multiple words, "feels large" even before getting precise numbers.
In the "Pass Values" section the review comments doc says "This advice does not apply to large structs, or even small structs that might grow." It doesn't say that the same standard for "large" applies to normal parameters as to receivers, but I think that's a reasonable approach.
Part of the subtlety of determining largeness, alluded to in the code review doc, is that many things in Go are internally pointers or small, reference-containing structs: slices (but not fixed-size arrays), strings, interface values, function values, channels, maps. Your struct may own a []byte pointing to a 64KB buffer, or a big struct via an interface, but still be cheap to copy. I wrote some about this in another answer, but nothing I said there prevents one from having to make some less-than-scientific judgement calls.
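To make the slice-versus-array point concrete, here is a small sketch. The type names withSlice and withArray are made up for the illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// withSlice owns a slice; copying it copies only the slice header.
type withSlice struct {
	buf []byte
}

// withArray owns a fixed-size array; copying it copies every element.
type withArray struct {
	buf [64 * 1024]byte
}

func main() {
	s := withSlice{buf: make([]byte, 64*1024)}
	var a withArray

	fmt.Println(unsafe.Sizeof(s)) // three words (pointer, length, capacity)
	fmt.Println(unsafe.Sizeof(a)) // 65536

	c := s                // copies the header only
	c.buf[0] = 42         // writes through to the shared backing array
	fmt.Println(s.buf[0]) // 42

	b := a                // copies all 64KB
	b.buf[0] = 42         // changes the copy only
	fmt.Println(a.buf[0]) // 0
}
```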
It's interesting to see what standard library methods do. bytes.Replace has ten words' worth of args, so that's at least sometimes OK to copy onto the stack. On the other hand go/ast tends to pass around references (either * pointers or interface values) even for non-mutating methods on its not-huge structs. Conclusion: it can be hard to reduce it to a couple simple rules. :)

Most of the time you should pass by reference (that is, pass a pointer), like:
func (n *Node) exampleFunc() {
...
}
The only situation in which you would want to pass an instance by value is when you want to be sure that your instance is safe from changes.

Related

Does loading a map element in Go imply copying?

Suppose there is a code snippet like this.
mapper := make(map[int]SomeStructType)
mapper[0] = SomeStructType{}
somestruct := mapper[0] // load mapper[0] to 'somestruct'
At the last line, does that mean mapper[0] is copied to somestruct in all situations, like even if somestruct is ever used as a read-only constant afterward?
If so, is there any way to make a reference to a map element (mapper[0] here) like in C/C++, so that I can reference it through an alias while avoiding unnecessary object copy? I tried to make a pointer to a map element, but apparently, Go does not allow me to do so.
The simple answer is NO. In Go, the map implementation may move data around, so references would get invalidated and it wouldn't be safe. In C you define your own data structure, so it's up to you how this is done; in Go, maps are implemented in the Go runtime, and it can't guess at your intent.
The solution you're looking for, I think, is keep pointers in the map, i.e.:
mapper := make(map[int]*SomeStructType)
Now accessing mapper elements will just "copy" a pointer (typically a single word), which is very cheap.
somestruct := mapper[0] // copies a pointer
It's very common to use pointer types in Go, so you wouldn't be doing anything too magical or unusual by defining mapper like this.
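A minimal sketch of the difference, assuming a SomeStructType with a single count field (the question does not show the type's fields):

```go
package main

import "fmt"

// SomeStructType stands in for the type from the question; its field is an assumption.
type SomeStructType struct {
	count int
}

func main() {
	byValue := map[int]SomeStructType{0: {}}
	v := byValue[0] // copies the struct out of the map
	v.count++
	fmt.Println(byValue[0].count) // 0: the map still holds the original

	// byValue[0].count++ would not compile: map elements are not addressable.
	byValue[0] = v                // write the modified copy back explicitly
	fmt.Println(byValue[0].count) // 1

	byPointer := map[int]*SomeStructType{0: {}}
	p := byPointer[0] // copies only the pointer
	p.count++
	fmt.Println(byPointer[0].count) // 1: both pointers refer to the same struct
}
```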

Golang For Range Pointers

I'm new to Golang and I recently had the same problem as described in this question: Strange golang "append" behavior
So I'm wondering if it's basically never appropriate to use the object copy from the for range loop for anything outside the scope of that loop, like passing it to a separate function, appending it (as described in the question), and so on.
Is it almost always more appropriate to access an object like this if you plan on mutating it, adding it to a list outside of the scope of that loop, and so on, because on the next loop the pointer you added will change?
for index := range myList {
doSomething(&myList[index])
}
You mostly need to keep in mind that Go is pass by value. There are perfectly valid reasons for passing either a copy or a pointer: using a copy is fine as long as you fully understand that mutations won't be reflected in the original and that copies of huge structs can be costly.
If you need to make modifications, either pass a pointer as you show here, or store pointers in the slice to start with, e.g. []*MyStruct. Actually, that brings to light a deliberate design choice: the spec specifically says that a struct is passed by copying.
Under the covers, a slice abstracts an array, but is merely a struct, known as a slice header, with a pointer to the array. When you pass a slice around, the header is passed by value, but it consists of just a pointer and two integers (length and capacity), so it's not expensive to copy. This is the reason you have to reassign the variable when using append for the result to persist, e.g. s = append(s, value). You can pass a pointer to a slice around in your code, but doing so isn't idiomatic Go.
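A short sketch of both effects: the range variable being a copy, and append needing the result to be reassigned. The item type and the function names are invented for the example.

```go
package main

import "fmt"

type item struct{ n int }

// bumpCopies tries to mutate through the range variable, which is a copy.
func bumpCopies(items []item) {
	for _, it := range items {
		it.n++ // modifies the copy only
	}
}

// bumpInPlace takes the address of each element, as in the question.
func bumpInPlace(items []item) {
	for i := range items {
		p := &items[i]
		p.n++
	}
}

// appendLocal appends to its own copy of the slice header;
// the caller's header (and so its length) is unchanged.
func appendLocal(items []item) {
	items = append(items, item{n: 99})
}

// appendReturn hands the updated header back to the caller.
func appendReturn(items []item) []item {
	return append(items, item{n: 99})
}

func main() {
	items := []item{{1}, {2}}

	bumpCopies(items)
	fmt.Println(items) // [{1} {2}]

	bumpInPlace(items)
	fmt.Println(items) // [{2} {3}]

	appendLocal(items)
	fmt.Println(len(items)) // 2

	items = appendReturn(items)
	fmt.Println(len(items)) // 3
}
```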

Ruby Fiddle: Nested union of structs within a struct

I have seen this question asked a few times on this site, and others, and have yet to see a legitimate and actual answer aside from posting links to the Ruby docs that don't answer the question either.
Is it possible, and if so how, to have a union of structs within another struct using Ruby Fiddle? Everything within the documentation simply indicates how to create structs/unions using primitive types, but not how it would be possible to nest them, as is a common convention to do.
Fiddle::CParser cannot parse anything other than primitive types, and the same goes for manual creation using signatures.
I have attempted to simply use TYPE_VOIDP and use that pointer as the address at which to create the struct, which I was fairly sure should work, but I get only junk, as if the address is incorrect. I imagine this is because not enough memory is allocated, but since the structs within the union are different sizes, I cannot allocate it ahead of time, leading me in circles.
The basic format is something like this (this is pseudo-code just to give the idea):
struct1 = [float, float, float, int]
struct2 = [int, int]
struct3 = [float, enum, enum, float, double, float]
structMaster = [
int, // Determines the type of the struct within the union
char[16],
char[16],
union[struct1, struct2, struct3]
]
I have looked extensively over all the Fiddle documentation, and it never indicates if this is even possible. I am familiar with Fiddle::CStructBuilder and related classes, and posting links to it is not the answer, as I have seen in other posts asking a similar question.
I have successfully accomplished this with the old Win32API and binary blobs, but am now trying to accomplish it with Fiddle, and am getting very frustrated.
EDIT:
I did manage to find some success by calculating the offset and reading the memory directly, and casting the pointer to the proper type, but I would still love to know if there is a way to do this cleaner with a nested struct, instead of the "hackish" way I am accomplishing it.

Move Semantics in Golang

This is from Bjarne Stroustrup's The C++ Programming Language, Fourth Edition, 3.3.2.
We didn’t really want a copy; we just wanted to get the result out of
a function: we wanted to move a Vector rather than to copy it.
Fortunately, we can state that intent:
class Vector {
// ...
Vector(const Vector& a); // copy constructor
Vector& operator=(const Vector& a); // copy assignment
Vector(Vector&& a); // move constructor
Vector& operator=(Vector&& a); // move assignment
};
Given that definition, the compiler will choose the move constructor
to implement the transfer of the return value out of the function.
This means that r=x+y+z will involve no copying of Vectors. Instead,
Vectors are just moved. As is typical, Vector’s move constructor is
trivial to define...
I know Golang supports traditional passing by value and passing by reference using Go style pointers.
Does Go support "move semantics" the way C++11 does, as described by Stroustrup above, to avoid the useless copying back and forth? If so, is this automatic, or does it require us to do something in our code to make it happen.
Note: A few answers have been posted - I have to digest them a bit, so I haven't accepted one yet - thanks.
The breakdown is as follows:
Everything in Go is passed by value.
But there are five built-in "reference types" which are passed by value as well but internally they hold references to separately maintained data structure: maps, slices, channels, strings and function values (there is no way to mutate the data the latter two reference).
Your own answer, @Vector, is incorrect in that nothing in Go is passed by reference. Rather, there are types with reference semantics. Values of them are still passed by value (sic!).
Your confusion supposedly stems from the fact that your mind is currently burdened by C++, Java etc., while these things in Go are done mostly "as in C".
Take arrays and slices for instance. An array is passed by value in Go, but a slice is a packed struct containing a pointer (to an underlying array) and two platform-sized integers (the length and the capacity of the slice), and it's the value of this structure which is copied — a pointer and two integers — when it's assigned or returned etc. Should you copy a "bare" array, it would be copied literally — with all its elements.
The same applies to channels and maps. You can think of types defining channels and maps as declared something like this:
type Map struct {
	impl *mapImplementation
}
type Channel struct {
	impl *channelImplementation
}
(By the way, if you know C++, you should be aware that some C++ code uses this trick to lower exposure of the details into header files.)
So when you later have
m := make(map[int]string)
you could think of it as m having the type Map and so when you later do
x := m
the value of m gets copied, but it contains just a single pointer, and so both x and m now reference the same underlying data structure. Was m copied by reference ("move semantics")? Surely not! Do values of type map and slice and channel have reference semantics? Yes!
Note that these three types of this kind are not at all special: implementing your custom type by embedding in it a pointer to some complicated data structure is a rather common pattern.
In other words, Go allows the programmer to decide what semantics they want for their types. And Go happens to have five built-in types which have reference semantics already (while all the other built-in types have value semantics). Picking one semantics over the other does not affect the rule of copying everything by value in any way. For instance, it's fine to have pointers to values of any kind of type in Go, and assign them (so long as they have compatible types); these pointers will be copied by value.
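The distinction between copying by value and having reference semantics can be checked directly. The following is a sketch with made-up function names:

```go
package main

import "fmt"

// mapAssign copies a map value; both variables refer to the same hash table.
func mapAssign() string {
	m := make(map[int]string)
	x := m
	x[1] = "one"
	return m[1]
}

// arrayAssign copies an array; all elements are duplicated.
func arrayAssign() (int, int) {
	a := [3]int{1, 2, 3}
	b := a
	b[0] = 100
	return a[0], b[0]
}

// sliceAssign copies a slice header; both headers point at the same array.
func sliceAssign() int {
	s := []int{1, 2, 3}
	t := s
	t[0] = 100
	return s[0]
}

func main() {
	fmt.Println(mapAssign())   // one
	fmt.Println(arrayAssign()) // 1 100
	fmt.Println(sliceAssign()) // 100
}
```

In all three cases the assignment copies a value; what differs is whether that value contains a pointer to shared data.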
Another angle to look at this is that many Go packages (standard and 3rd-party) prefer to work with pointers to (complex) values. One example is os.Open() (which opens a file on a filesystem) returning a value of the type *os.File. That is, it returns a pointer and expects the calling code to pass this pointer around. Surely, the Go authors might have declared os.File to be a struct containing a single pointer, essentially making this value have reference semantics but they did not do that. I think the reason for this is that there's no special syntax to work with the values of this type so there's no reason to make them work as maps, channels and slices. KISS, in other words.
Recommended reading:
"Go Data Structures"
"Go Slices: Usage and Internals"
"Arrays, slices (and strings): The mechanics of 'append'"
A thread on golang-nuts; pay close attention to the reply by Rob Pike.
The Go Programming Language Specification
Calls
In a function call, the function value and arguments are evaluated in
the usual order. After they are evaluated, the parameters of the call
are passed by value to the function and the called function begins
execution. The return parameters of the function are passed by value
back to the calling function when the function returns.
In Go, everything is passed by value.
Rob Pike
In Go, everything is passed by value. Everything.
There are some types (pointers, channels, maps, slices) that have
reference-like properties, but in those cases the relevant data
structure (pointer, channel pointer, map header, slice header) holds a
pointer to an underlying, shared object (pointed-to thing, channel
descriptor, hash table, array); the data structure itself is passed by
value. Always.
Always.
-rob
It is my understanding that Go, as well as Java and C# never had the excessive copying costs of C++, but do not solve ownership transference to containers. Therefore there is still copying involved. As C++ becomes more of a value-semantics language, with references/pointers being relegated to i) smart-pointer managed objects inside classes and ii) dependence references, move semantics solves the problem of excessive copying. Note that this has nothing to do with "pass by value", nowadays everyone passes objects by Reference (&) or Const Reference (const &) in C++.
Let's look at this (1) :
BigObject BO(big,stuff,inside);
vector<BigObject> vo;
vo.reserve(1000000);
vo.push_back(BO);
Or (2)
vector<BigObject> vo;
vo.reserve(1000000);
vo.push_back(BigObject(big,stuff,inside));
Although you're passing by reference to the vector vo, in C++03 there was a copy inside the vector code.
In the second case, there is a temporary object that has to be constructed and then is copied inside the vector. Since it can only be accessed by the vector, that is a wasteful copy.
However, in the first case, our intent could be just to give control of BO to the vector itself. C++11 and later allow this:
(1, C++11)
vector<BigObject> vo;
vo.reserve(1000000);
vo.emplace_back(big,stuff,inside);
Or (2, C++11)
vector<BigObject> vo;
vo.reserve(1000000);
vo.push_back(BigObject(big,stuff,inside));
From what I've read, it is not clear that Java, C# or Go are exempt from the same copy duplication that C++03 suffered from in the case of containers.
The old-fashioned COW (copy-on-write) technique also had the same problems, since the resources will be copied as soon as the object inside the vector is duplicated.
Stroustrup is talking about C++, which allows you to pass containers, etc by value - so the excessive copying becomes an issue.
In Go (as in Delphi, Java, etc.), when you pass a container type, it is always a reference, so it's a non-issue. Regardless, you don't have to deal with it or worry about it in Go: the compiler just does what it needs to do, and from what I've seen thus far, it's doing it right.
Thanks to @KerrekSB for putting me on the right track.
@KerrekSB - I hope this is the right answer. If it's wrong, you bear no responsibility. :)
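Go has no move constructors, but for slices the effect the question asks about can be observed directly: returning a slice from a function copies only its header, not its elements. A minimal sketch (build is a made-up helper):

```go
package main

import "fmt"

// build returns a large slice together with the address of its first
// element as seen inside the function.
func build(n int) ([]int, *int) {
	s := make([]int, n)
	return s, &s[0]
}

func main() {
	s, inside := build(1_000_000)
	// Same address: the caller's slice uses the same backing array,
	// so no elements were copied on return.
	fmt.Println(&s[0] == inside) // true
}
```

A fixed-size array returned by value is, by the language rules, copied as a whole, so this only applies to types that hold a pointer internally.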

Return concrete or abstract datatypes?

I'm in the middle of reading Code Complete, and towards the end of the book, in the chapter about refactoring, the author lists a bunch of things you should do to improve the quality of your code while refactoring.
One of his points was to always return as specific types of data as possible, especially when returning collections, iterators etc. So, as I've understood it, instead of returning, say, Collection<String>, you should return HashSet<String>, if you use that data type inside the method.
This confuses me, because it sounds like he's encouraging people to break the rule of information hiding. Now, I understand this when talking about accessors, that's a clear cut case. But, when calculating and mangling data, and the level of abstraction of the method implies no direct data structure, I find it best to return as abstract a datatype as possible, as long as the data doesn't fall apart (I wouldn't return Object instead of Iterable<String>, for example).
So, my question is: is there a deeper philosophy behind Code Complete's advice of always returning as specific a data type as possible, and allow downcasting, instead of maintaining a need-to-know-basis, that I've just not understood?
I think it is simply wrong for most cases. It has to be:
be as lenient as possible, be as specific as needed
In my opinion, you should always return List rather than LinkedList or ArrayList, because the difference is more an implementation detail than a semantic one. The guys behind the Google collections API for Java take this one step further: they return (and expect) iterators where that's enough. But they also recommend returning ImmutableList, -Set, -Map etc. where possible, to show the caller that they don't have to make a defensive copy.
Beside that, I think the performance of the different list implementations isn't the bottleneck for most applications.
Most of the time one should return an interface or perhaps an abstract type that represents the value being returned. If you are returning a list of X, then use List. This ultimately provides maximum flexibility if the need arises to change the list type.
Maybe later you realise that you want to return a linked list or a read-only list etc. If you put in a concrete type, you're stuck and it's a pain to change. Using the interface solves this problem.
@Gishu
If your API requires that clients cast straight away most of the time, your design is suckered. Why bother returning X if clients need to cast to Y?
Can't find any evidence to substantiate my claim but the idea/guideline seems to be:
Be as lenient as possible when accepting input. Choose a generalized type over a specialized type. This means clients can use your method with different specialized types. So an IEnumerable or an IList as an input parameter would mean that the method can run off an ArrayList or a ListItemCollection. It maximizes the chance that your method is useful.
Be as strict as possible when returning values. Prefer a specialized type if possible. This means clients do not have to second-guess or jump through hoops to process the return value. Also, specialized types have greater functionality. If you choose to return an IList or an IEnumerable, the number of things the caller can do with your return value drastically reduces, e.g. if you return an IList instead of an ArrayList, then to get the number of elements returned (via the Count property) the client must downcast. But such downcasting defeats the purpose: it works today but won't tomorrow (if you change the type of the returned object). So for all practical purposes, the client can't get a count of elements easily, leading them to write mundane boilerplate code (in multiple places or as a helper method).
The summary here is it depends on the context (exceptions to most rules). E.g. if the most probable use of your return value is that clients would use the returned list to search for some element, it makes sense to return a List Implementation (type) that supports some kind of search method. Make it as easy as possible for the client to consume the return value.
I could see how, in some cases, having a more specific data type returned could be useful. For example knowing that the return value is a LinkedList rather than just List would allow you to do a delete from the list knowing that it will be efficient.
I think that, while designing interfaces, you should design a method to return as abstract a data type as possible. Returning a specific type would make it clearer what the method returns.
Also, I would understand it in this way:
Return as abstract a data type as possible = return as specific a data type as possible
i.e. when your method is supposed to return any collection data type return collection rather than object.
Tell me if I'm wrong.
A specific return type is much more valuable because it:
reduces possible performance issues with discovering functionality with casting or reflection
increases code readability
does NOT in fact, expose more than is necessary.
The return type of a function is specifically chosen to cater to ALL of its callers. It is the calling function that should USE the return variable as abstractly as possible, since the calling function knows how the data will be used.
Is it only necessary to traverse the structure? is it necessary to sort the structure? transform it? clone it? These are questions only the caller can answer, and thus can use an abstracted type. The called function MUST provide for all of these cases.
If, in fact, the most specific use case you have right now is Iterable<String>, then that's fine. But more often than not, your callers will eventually need more details, so start with a specific return type; it doesn't cost anything.

Resources